Search Engine Showdown

Search Engine Statistics: Database Total Size Estimates
by Greg R. Notess
Search Engine          Estimated Size
Fast Search            212,968,062
Northern Light         153,586,380
AltaVista              133,918,781
Google!                 91,466,390
Anzwers                 87,711,980
Yahoo!'s Inktomi        72,281,009
Snap                    49,134,553
MSN Web Search          45,242,366
HotBot                  43,571,826
Excite                  40,265,189
Infoseek                39,162,977
Lycos                   32,584,148
Date: Aug. 4-5, 1999
Based on Northern Light reported size and percentage from relative size analysis
Northern Light: 153,586,380 reported & claimed
Fast: 200 million claimed
AltaVista: 173,214,649 reported

Reported numbers above are from the techniques described below. Claimed sizes above are from press releases or sources such as the Search Engine Watch Size Report.

Since only Northern Light can provide an exact count of the size of its Web database on a given date, I used the number of hits reported by Northern Light as the starting point and then estimated the size of the other databases by multiplying that number by the percentage of total hits from the 25 searches used in the relative size analysis. While the terms used for the 25 searches were not chosen completely at random, they were drawn from a variety of subject areas and countries.
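The arithmetic behind these estimates can be sketched in a few lines of Python. This is only an illustration, assuming the percentage is each engine's total hits relative to Northern Light's total hits over the same searches. The function name estimate_size and the hit totals are hypothetical and are not the actual figures from the relative size analysis. Only the Northern Light size comes from the list above.

```python
def estimate_size(reference_size, reference_hits, engine_hits):
    """Scale the reference database size by the ratio of hit totals.

    Assumption: an engine's size is proportional to the number of hits
    it returns on the same set of searches as the reference engine.
    """
    if reference_hits <= 0:
        raise ValueError("reference_hits must be positive")
    return round(reference_size * engine_hits / reference_hits)


# Northern Light's reported size, taken from the list above.
NORTHERN_LIGHT_SIZE = 153_586_380

# Hypothetical hit totals over the 25 searches (illustrative only,
# not the real data from the relative size analysis).
hypothetical_hits = {
    "Northern Light": 1000,
    "Engine A": 1250,
    "Engine B": 500,
}

reference_hits = hypothetical_hits["Northern Light"]
estimates = {
    name: estimate_size(NORTHERN_LIGHT_SIZE, reference_hits, hits)
    for name, hits in hypothetical_hits.items()
}

for name, size in estimates.items():
    print(f"{name}: {size:,}")
```

With these made-up totals, an engine that finds 25 percent more hits than Northern Light is estimated at 25 percent more than Northern Light's reported size, and the reference engine always maps back to its own reported number.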

Both Northern Light and AltaVista offer techniques for getting an up-to-the-moment count. On Northern Light, use the Power Search limited to the Web and search for count or not count. On AltaVista, enter a * in the Advanced Search Boolean box and check the Count Documents box. The technique used in March, +* on the Simple Search, was returning numbers in the area of 250 million in May. Note the large difference in results when the Count Documents box is checked as opposed to when it is not. These numbers have varied greatly in the past month, even from hour to hour.

So why are there discrepancies between the claimed sizes and these estimates? Beyond the limitation of basing the estimates on a small number of searches and on Northern Light's reported numbers, there are several factors that may explain these results.

The Inktomi-based search engines (HotBot, Snap, and MSN Web Search) are run on clusters of computers. According to Inktomi, at any point in time, some of the computers may be down for backup or other maintenance. Consequently, the entire database may not be available for searching at a given moment. My estimates thus reflect what was available to be searched at the time the searches were run.

AltaVista will time out on some searches and deliver only partial results. Since my numbers are based on the actual number of hits found, that may cause AltaVista's size to be under-represented. On the other hand, if Inktomi and AltaVista do not have their full databases available to searchers, what is the use of that extra size if it is inaccessible? These estimates may well give a better sense of the size of the accessible portion of the search engine databases.
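To put the gap in numbers, the estimates can be compared with the claimed or reported sizes listed above. The short Python sketch below uses only figures from this page. The helper name share_of_stated is hypothetical, and the comparison is a rough illustration rather than part of the original analysis.

```python
# Estimated sizes from the list above.
estimated = {
    "Fast": 212_968_062,
    "AltaVista": 133_918_781,
}

# Stated sizes from this page: Fast is a claimed figure,
# AltaVista is a reported figure.
stated = {
    "Fast": 200_000_000,
    "AltaVista": 173_214_649,
}


def share_of_stated(estimate, stated_size):
    """Return the estimate as a fraction of the stated size."""
    return estimate / stated_size


ratios = {
    name: share_of_stated(estimated[name], stated[name])
    for name in estimated
}

for name, ratio in ratios.items():
    print(f"{name}: estimate is {ratio:.1%} of the stated size")
```

A ratio below 100 percent means the estimate falls short of the stated size, which is the pattern the AltaVista time-outs would produce. A ratio above 100 percent means the estimate exceeds the stated size.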

See also the previous Total Size Estimates:
May 1999
March 1999
Jan. 1999

While decisions about which Web search engine to use should not be based on size alone, this information is especially important when looking for very specific keywords, phrases, and areas of specialized interest. See also the following statistical analyses: