Finding Old Web Pages

Last updated Jul. 12, 2023.
by Greg R.

The Web changes constantly, and sometimes the page that had just the information you needed yesterday (or last month, or two years ago) is no longer available today. At other times you may want to see how a page's content or design has changed. Several sources offer copies of Web pages as they used to exist.

While Google's cache is probably the best known, the others are important alternatives: they may hold pages not available at Google or the Wayback Machine, and they may have an archived copy from a different date. The table below gives the name of each service, the way to find the archived page, and notes that give some idea of how old a page the archive may contain.

Service	How to find the archived page	Notes
Multiple Copies of Pages
Wayback Machine	Enter URL in search box to view	From the Internet Archive; covers late 1996 up to six to ten months ago
Single "Cached" Copy of a Page
Yahoo!	`Cached` link to view	Estimate from yesterday to 3 months old, no date given
Google	`cache:URL` or `Cached` link to view	Estimate from yesterday to 3 months old, no date given
Gigablast	`[cached]` link to view	From recent to a year old, gives date of cache
ScrubTheWeb	`Cached` link to view	Small database, from 1-3 months old, no date given
IncyWincy	`cached` link to view	Small database based on ODP, about 6 months old, gives date of cache
Family Source	`Cached` link to view	Small database, 1 million+ "family friendly" pages. About 1 month old. Date on search results page, not cached page.
Daypop	`Cached date` link to view	Last two weeks, blog postings and news articles, gives date of cache
Feedster	`Cached` link to view	Typically caches only the first few lines from blog & news RSS feeds
BoardReader	`Cached` link to view	Web forum postings only, date unreliable
Blogging Ecosystem	`c` link to view	Very small: top linked and linking blogs only
Services that Used to Have a Cached Copy
SearchEdu, SearchGov, SearchMil	All from MaxBot	These used to have their own database and cached copies. As of 2003, SearchGov and SearchEdu just give Google results. SearchMil no longer has cached copies.
Google News	Formerly `cache:URL` to view	Cached capability removed in March 2003.
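
The two lookups in the table that take a URL directly (the Wayback Machine and Google's `cache:URL` query) can be sketched in a few lines of Python. This is only an illustration, not either service's official interface: the `https://web.archive.org/web/` address pattern is an assumption about how the Wayback Machine lays out its archive addresses, and the function names `wayback_url` and `cache_query` are hypothetical helpers made up for this example. Nothing here contacts a server; the code only builds strings.

```python
# Assumed address pattern for Wayback Machine captures (not stated in the article).
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(page_url: str, timestamp: str = "") -> str:
    """Build a Wayback Machine address for page_url.

    timestamp is a leading part of YYYYMMDDhhmmss (for example "2002").
    With no timestamp, "*" is used, which asks for the list of all captures.
    """
    if timestamp and not (timestamp.isdigit() and len(timestamp) <= 14):
        raise ValueError("timestamp must be 1 to 14 digits (YYYYMMDDhhmmss)")
    return f"{WAYBACK_PREFIX}{timestamp or '*'}/{page_url}"


def cache_query(page_url: str) -> str:
    """Build the cache:URL query string mentioned in the Google row."""
    return f"cache:{page_url}"


if __name__ == "__main__":
    print(wayback_url("http://example.com/"))
    print(wayback_url("http://example.com/", "2002"))
    print(cache_query("example.com/page.html"))
```

Whether a given address or query returns anything depends on the service still offering the feature and on the page having been captured at all.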

Note that none of these services includes all Web pages. A robots.txt file or a <meta name="ROBOTS" content="NOINDEX"> tag in the header of a file can prohibit crawling of the page. Google and others should look for a <meta name="ROBOTS" content="NOARCHIVE"> tag in the header and not cache such pages, but the exclusions do not always work. Other possible ways to resurrect a dead link include checking your local browser's cache if you visited the page recently, or hoping that someone else copied and posted the file on the Web.
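
To see how the two exclusion mechanisms above look from a crawler's side, here is a minimal sketch using only Python's standard library. It is an assumption about how a well-behaved crawler might apply the rules, not a description of what Google or any other service actually does. The names `RobotsMetaParser`, `meta_directives`, and `crawl_allowed`, and the user agent `ExampleBot`, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="ROBOTS" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # content may hold several comma-separated directives
            for item in attr.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def meta_directives(html_text: str) -> set:
    """Return the lowercased robots directives found in html_text."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


def crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check url against the text of a robots.txt file (no network access)."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    page = '<html><head><meta name="ROBOTS" content="NOARCHIVE"></head></html>'
    print(crawl_allowed(robots, "ExampleBot", "http://example.com/private/a.html"))
    print("noarchive" in meta_directives(page))
```

A page that passes the robots.txt check but carries NOARCHIVE could be indexed yet should not appear behind a `Cached` link, which is one reason a page may be searchable but missing from every archive in the table.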

For more details on searching the Wayback Machine, see my article "The Wayback Machine: The Web’s Archive." ONLINE 26(2): 59-61, Mar.-Apr. 2002.