Finding Old Web Pages
Last updated Jul. 12, 2023.
by Greg R.
The Web changes constantly, and sometimes the page that had just the
information you needed yesterday (or last month or two years ago) is not
available today. At other times you may want to see how a page's content or
design has changed. There are several sources for finding Web pages as they used
to exist.
While Google's cache is probably the best known, the others are important
alternatives: they may have pages not available at Google or the Wayback
Machine, and they may have an archived page from a different date. The table
below lists the name of each service, the way to find the archived page, and
notes that give some idea of how old a page the archive may contain.
Service | How to find the archived page | Notes

Multiple copies of pages
Wayback Machine | Enter URL in search box to view | From late 1996 to six to ten months ago; from the Internet Archive

Single "Cached" Copy of a Page
Yahoo! | Cached link to view | Estimate from yesterday to 3 months old, no date given
Google | cache:URL or Cached link to view | Estimate from yesterday to 3 months old, no date given
Gigablast | [cached] link to view | From recent to a year old, gives date of cache
ScrubTheWeb | Cached link to view | Small database, from 1-3 months old, no date given
IncyWincy | cached link to view | Small database based on ODP, about 6 months old, gives date of cache
Family Source | Cached link to view | Small database, 1 million+ "family friendly" pages. About 1 month old. Date on search results page, not cached page.
Daypop | Cached date link to view | Last two weeks, blog postings and news articles, gives date of cache
Feedster | Cached link to view | Typically caches only the first few lines from blog & news RSS feeds
BoardReader | Cached link to view | Web forum postings only, date unreliable
Blogging Ecosystem | c link to view | Very small: top linked and linking blogs only

Services that Used to Have a Cached Copy
SearchEdu, SearchGov, SearchMil | All from MaxBot | These used to have their own database and cached copies. As of 2003, SearchGov and SearchEdu just give Google results. SearchMil no longer has cached copies.
Google News | Formerly cache:URL to view | Cached capability removed in March 2003.
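The two URL-based lookups in the table can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the web.archive.org address pattern is an assumption that does not come from the table above, which only says to enter the URL in the Wayback Machine's search box.

```python
def google_cache_query(url: str) -> str:
    """Build the text to type into the search box for the cache:URL syntax."""
    return f"cache:{url}"


def wayback_lookup_url(url: str, timestamp: str = "*") -> str:
    """Build a Wayback Machine address for an archived page.

    Assumption: archived copies are served under
    https://web.archive.org/web/<timestamp>/<url>, where "*" asks for the
    list of all stored copies and a value such as "2002" asks for a copy
    near that date.
    """
    return f"https://web.archive.org/web/{timestamp}/{url}"


if __name__ == "__main__":
    page = "example.com/page.html"  # placeholder address
    print(google_cache_query(page))
    print(wayback_lookup_url(page))
```

Since the Wayback Machine keeps multiple copies of a page, the optional timestamp argument is where a date preference would go; the single-copy caches offer no such choice.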
Note that none of these include all Web pages. A
robots.txt file or a <meta
name="ROBOTS" content="NOINDEX"> in the header of a file can prohibit the
crawling or indexing of the page. Google and others should look for a <meta name="ROBOTS"
content="NOARCHIVE"> in the header and not cache such pages. But the exclusions
do not always work. Other possible ways to resurrect a dead link include
checking your local browser's cache if you visited the page recently, or hoping
that someone else copied and posted the file on the Web.
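To see whether a page asks not to be indexed or cached, the ROBOTS meta tag can be read with Python's standard html.parser module. The sketch below is a minimal illustration, not how any search engine actually does it; the class and function names are hypothetical, and it only inspects HTML text you already have in hand.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="ROBOTS" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser already lowercases tag and attribute names;
        # attribute values are lowercased here for comparison.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of lowercase robots directives declared in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="ROBOTS" content="NOINDEX, NOARCHIVE"></head></html>'
    found = robots_directives(page)
    print("noindex requested:", "noindex" in found)
    print("noarchive requested:", "noarchive" in found)
```

A page that declares NOARCHIVE is less likely to turn up in any of the caches listed above, although, as noted, the exclusions do not always work.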
For more details on searching the Wayback Machine, see my article "The
Wayback Machine: The Web’s Archive." ONLINE 26(2): 59-61, Mar.-Apr.
2002.