Search Engine Showdown

Search Engine Statistics: Database Total Size Estimates
by Greg R. Notess


Search Engine       Range in millions
Fast                280 to 308
Northern Light      219 to 241
AltaVista           197 to 217
Google!             140 to 154
Excite              135 to 149
Anzwers             73 to 81
iWon Inktomi        43 to 48
Yahoo!'s Inktomi    58 to 64
AOL Inktomi         61 to 67
Lycos               41 to 45
Infoseek            48 to 52
HotBot              36 to 39
WebCrawler          5 to 6
Data from: Feb. 21, 2000
Based on the sizes reported by Fast and Northern Light and the percentages from the relative size analysis
Fast: 307,591,128 reported
Northern Light: 219,427,737 reported
Check today's Northern Light count. (Will open a new browser window)

Fast has grown considerably since November but has shrunk slightly since the January supplemental comparison. AltaVista still does not live up to its claim of 270 million pages. Both Excite and Google showed significant growth since November, and Northern Light continues its steady growth. Note also the figures below, which factor in the dead link analysis and so represent an even better estimate of the number of accessible pages. They apply each engine's dead link percentage, counting any 400-family error or other failure to connect as a dead link.

Engine              Adjusted for dead links (range in millions)
Fast Search         273 to 300
Northern Light      207 to 228
AltaVista           170 to 187
Google!             134 to 147
Excite              123 to 136
Anzwers             72 to 80
MSN Inktomi         71 to 78
HotBot              35 to 38
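
The dead link adjustment amounts to scaling each range by the share of links that still resolve. The following is a minimal Python sketch of that arithmetic. The helper name adjust_for_dead_links is hypothetical, and the 2.5% dead link rate used for Fast is an assumption back-computed from the two tables on this page (273 / 280), not a percentage published in this analysis.

```python
def adjust_for_dead_links(low, high, dead_fraction):
    """Scale a (low, high) size range, in millions, by the share of live links."""
    if not 0.0 <= dead_fraction <= 1.0:
        raise ValueError("dead_fraction must be between 0 and 1")
    live_share = 1.0 - dead_fraction
    return low * live_share, high * live_share


# Fast's unadjusted range from the first table; 0.025 is an assumed rate,
# inferred from the tables rather than taken from the dead link analysis.
low, high = adjust_for_dead_links(280, 308, 0.025)
print(f"{low:.0f} to {high:.0f} million")
```

With that assumed rate, the sketch lands on the 273 to 300 million range listed for Fast in the adjusted table. Other engines would need their own dead link percentages.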

These estimates are given as a wide range, based on the exact counts obtained from Fast and Northern Light on the date of the comparison. Those total numbers of Web pages in their databases are used as the starting point. To estimate the size of each of the other databases, that number is multiplied by the ratio of the search engine's total hits, from the 25 searches used in the relative size analysis, to the total found by Fast or by Northern Light. Using each of the two as the baseline in turn gives the two ends of the range. While the terms used for the 25 searches were not chosen completely at random, they were chosen from a variety of subject areas and countries so as to meet the criteria outlined in the methodology.
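
The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name estimate_range is hypothetical, the hit totals in the usage lines are invented rather than taken from the 25 searches, and the sketch assumes the range comes from using each of the two reported counts as the baseline in turn. Only the two reported database counts come from this page.

```python
FAST_REPORTED = 307_591_128            # reported by Fast, Feb. 21, 2000
NORTHERN_LIGHT_REPORTED = 219_427_737  # reported by Northern Light, same date


def estimate_range(engine_hits, fast_hits, nl_hits,
                   fast_count=FAST_REPORTED, nl_count=NORTHERN_LIGHT_REPORTED):
    """Return (low, high) size estimates, one per baseline engine."""
    if fast_hits <= 0 or nl_hits <= 0:
        raise ValueError("baseline hit totals must be positive")
    # Scale each reported database size by the engine's share of hits
    # relative to that baseline engine.
    via_fast = fast_count * engine_hits / fast_hits
    via_nl = nl_count * engine_hits / nl_hits
    return min(via_fast, via_nl), max(via_fast, via_nl)


# Hypothetical hit totals over the 25 searches (NOT the study's data).
low, high = estimate_range(engine_hits=900, fast_hits=1280, nl_hits=1000)
print(f"{low / 1e6:.0f} to {high / 1e6:.0f} million")
```

The width of the resulting range depends on how far the Fast-to-Northern Light hit ratio departs from the ratio of their reported database sizes.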

Northern Light has a technique that can be used to obtain an up-to-date count of the Web pages in its database. Limit the search to the World Wide Web only and enter: search OR NOT search. The resulting number should be the current size of the Web database. It works with most common terms. The OR NOT operation finds every record which has the term as well as every record which does not have the term. Fast provided me with a similar technique (which unfortunately I am not permitted to disclose) which gives an exact count of the records in its database.
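
Why the OR NOT query returns every record can be shown with a toy example. The Python sketch below uses a small in-memory list as a stand-in for a search index. The function name count_or_not and the sample documents are hypothetical, and nothing here reflects how Northern Light actually implements its Boolean search.

```python
def count_or_not(documents, term):
    """Count records matching `term OR NOT term` in a toy in-memory index."""
    all_ids = set(range(len(documents)))
    # Records whose text contains the term as a whole word.
    with_term = {i for i, text in enumerate(documents)
                 if term.lower() in text.lower().split()}
    # Records that do not contain the term.
    without_term = all_ids - with_term
    # The union of the two sets is, by construction, every record.
    return len(with_term | without_term)


docs = ["web search engines", "database size estimates", "search statistics"]
print(count_or_not(docs, "search") == len(docs))  # True
```

Because the two sets are complements of each other, the count equals the total number of records whatever term is chosen.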

AltaVista used to have a similar technique, although it was far from accurate. With their changes on Oct. 25, 1999, it no longer works. However, just in case it starts working again, the trick was to enter an asterisk * in their Advanced Search Boolean box and check the Count Documents box. An earlier technique of entering +* in the Simple Search used to give a much higher number, but it no longer works either. Trying url:http found "about 86,940,437" on Feb. 23, 2000.

So why are there discrepancies between the claimed sizes and these estimates? Bear in mind that these are very rough estimates. Beyond the limitation of basing the estimates on a small number of searches and on only the numbers reported by Fast and Northern Light, there are several factors which may explain these results.

The Inktomi-based search engines (HotBot, MSN, and Yahoo!'s search engine component) are run on clusters of computers. According to Inktomi, at any point in time, some of the computers may be down for backup or other maintenance. Consequently, the entire database may not be searchable at any given moment. My estimates thus reflect what was available to be searched at the time the searches were run.

AltaVista will time out on some searches and deliver only partial results. Since my numbers are based on the actual number of hits found, that may cause AltaVista's size to be under-represented. On the other hand, if Inktomi and AltaVista do not have their full databases available to searchers, what is the use of that extra size if it is inaccessible? These estimates may well give a better sense of the size of the accessible portion of the search engine databases.

See also the previous Total Size Estimates:
Nov. 1999
Sept. 1999
Aug. 1999
May 1999
March 1999
Jan. 1999

While decisions about which Web search engine to use should not be based on size alone, this information is especially important when looking for very specific keywords, phrases, and areas of specialized interest. See also the following statistical analyses: