Search Engine Statistics: Relative Size Showdown
Data from search engine analysis run on March 4-6, 2002.
Google Solidly in Lead
This size showdown compared ten search engines, with MSN Search, HotBot, and iWon representing the Inktomi partners. The analysis used 25 small single-word queries. Google found more total hits than any other search engine. In addition, it placed first on 23 of the 25 searches, more than any of the others. WiseNut displaced Fast for second place and found more hits than Google on two of the searches. Fast's database, represented by AllTheWeb, came in third. Northern Light remained fourth despite being generally considered unavailable. AltaVista moved up to fifth, followed by HotBot in sixth. MSN, Teoma, iWon, and Direct Hit filled out the bottom four. The following chart gives the total verified number of search results from all 25 searches. Since the exact same queries were used in August 2001, the third column gives that number for comparison.
This comparison is based on the reported number of hits from each database, with the number verified by visiting the last page of results whenever possible.
The number of records that many search engines can display is often different from the number that the search engine first reports. Results here are search engine results unclustered by site.
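The counting rule and the tallies described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used for the study: the function names, the engine and query labels, and the sample numbers are all hypothetical. The fallback rule (use the reported number only when the display cap was reached) is modeled on how capped result lists are handled in the database notes.

```python
from collections import Counter


def effective_count(reported, displayed, display_limit):
    """Pick the count to record for one query on one engine.

    Assumed rule: trust the number of hits actually reachable on the last
    results page, unless the engine's display cap was hit and the engine
    reported more, in which case fall back to the reported number.
    """
    if displayed >= display_limit and reported > displayed:
        return reported
    return displayed


def summarize(counts):
    """counts maps engine -> {query: hits}; returns (totals, first_places)."""
    totals = {engine: sum(per_query.values()) for engine, per_query in counts.items()}
    queries = sorted(set().union(*(per_query.keys() for per_query in counts.values())))
    first_places = Counter()
    for query in queries:
        # On a tie, max() keeps the engine listed first in `counts`.
        best = max(counts, key=lambda engine: counts[engine].get(query, 0))
        first_places[best] += 1
    return totals, dict(first_places)


# Hypothetical numbers, for illustration only.
sample = {
    "EngineA": {"q1": 120, "q2": 40},
    "EngineB": {"q1": 90, "q2": 75},
}
totals, first_places = summarize(sample)
print(totals)        # {'EngineA': 160, 'EngineB': 165}
print(first_places)  # {'EngineA': 1, 'EngineB': 1}
```

Keeping the per-query counts, rather than only the totals, is what makes it possible to report both the overall ranking and the number of searches on which each engine placed first.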
While this comparison is not a measure based on precision, recall, or relevance, it is an important indicator of the number of records that a searcher can find. It measures the effective database size. For earlier size showdown winners, see the links to older reports and the top three from each at the bottom of this page.
Specific Database Notes
Google includes some results (URLs) that it has not actually indexed. In addition, Google includes and indexes Adobe Acrobat PDF documents and many other file types such as Microsoft Word and PostScript documents. When it counts all the indexed pages, unindexed URLs, and other file formats, it claims over 2 billion. But as these examples show, the effective size is less, since most searchers will see very few of the unindexed URLs. For more details and an example of an unindexed URL, see Google Database Components and Google's Unindexed URLs.
Google clusters results by site and will only display two pages per site, with additional hits available under the [ More results from . . . ] link. The numbers used here come from Google's follow-up search, which it makes available (when it finds fewer than 1,000 results) via a note after the last record that states: "In order to show you the most relevant results, we have omitted some entries very similar to those already displayed. If you like, you can repeat the search with the omitted results included." Clicking the "repeat the search" option resulted in unclustered listings.
WiseNut claims over 1.5 billion records. While it clusters by site, that feature was turned off via the preferences for this comparison. WiseNut has grown since the August comparison. Only 300 results could be displayed, so on the three searches where a higher number was reported, the reported number was used.
AllTheWeb uses the Fast database, as does Lycos. The AllTheWeb advanced search, with site clustering and the offensive content filter turned off, was used for this comparison, but the results are almost exactly the same on Lycos.
Northern Light announced in January that it "will no longer be providing free Web search capabilities to the general public." While the free Web search at northernlight.com is gone, it is still available at NLResearch.com. That version was used for this comparison. Since Northern Light automatically recognizes and searches the English forms of word variants and plurals, only non-plural terms were used. Only the Web portion of Northern Light was searched, not their Special Collection. Northern Light also clusters hits by site, with no ability to disable the site clustering. The number of reported hits was used, rather than trying to verify the number under each site. Northern Light is typically fairly accurate in its counts and presents both the total number of hits and the number of sites.
Surprisingly, Northern Light showed an increase since August, despite shutting down its better-known free search.
AltaVista showed an increase since August, a pleasant change for the struggling search engine. Since AltaVista clusters results, this analysis used the Advanced Search with the option set so that results were not clustered by site. Each search result set was checked, and only the number of hits available for display was counted.
HotBot uses an Inktomi database, which used to pull records from the Inktomi Gigadoc database. While it will also retrieve results from other databases such as Direct Hit and the Open Directory, only Inktomi results were counted. The power search was used for this comparison, with 100 hits displayed at a time. Although site clustering was turned off (with the best pages only filter), it still clustered results by site. Therefore, for this comparison, the advanced search was used, all the top-level domains in the results were noted, and the search was re-run using the domain limitation with all found top-level domains ORed together. Though tedious, this effectively turned off the site clustering to find HotBot's total number of available hits. Often, I had to reload a page several times to get the full result set. It appears that when HotBot gives a "Sorry, your search yielded no results" message, reloading the page may sometimes bring up additional hits.
MSN Search uses an Inktomi database, which also used to pull records from the Inktomi Gigadoc database. While it will retrieve results from other databases such as LookSmart, RealNames, and Direct Hit, only Inktomi results were counted. The advanced search was used for this comparison. For those searches that pulled up more than 200 results (MSN's display limit), the total was calculated by segmenting the results: the search was run once with an additional term added and once with that same term excluded, and the two counts were added together.
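The segmenting approach described for MSN Search works because adding a term and then excluding the same term splits the result set into two non-overlapping pieces whose counts sum to the whole. Below is a minimal Python sketch of that idea against a toy in-memory collection. It is not how the study was actually run (the text describes a manual process); `count_hits`, `segmented_total`, the sample documents, and the split terms are all hypothetical. The sketch also assumes that a piece still at the cap can be split again with another term.

```python
DISPLAY_LIMIT = 200  # MSN's display limit, as stated in the text


def count_hits(docs, must, must_not=frozenset(), limit=DISPLAY_LIMIT):
    """Toy stand-in for an engine: count matching documents, capped at `limit`.

    Each document is a set of words. A document matches when it contains
    every word in `must` and none of the words in `must_not`.
    """
    matches = sum(1 for words in docs if must <= words and not (must_not & words))
    return min(matches, limit)


def segmented_total(docs, must, must_not, split_terms, limit=DISPLAY_LIMIT):
    """Estimate the uncapped hit count by splitting capped result sets.

    If a query reaches the display cap, run it once with an extra term
    required and once with that term excluded, then add the two counts.
    If the split terms run out, the capped count is returned as-is, so
    the result can still be an undercount.
    """
    seen = count_hits(docs, must, must_not, limit)
    if seen < limit or not split_terms:
        return seen
    term, rest = split_terms[0], split_terms[1:]
    with_term = segmented_total(docs, must | {term}, must_not, rest, limit)
    without_term = segmented_total(docs, must, must_not | {term}, rest, limit)
    return with_term + without_term


# Hypothetical collection: 500 documents containing "widget";
# the first 150 also contain "blue" and the next 180 contain "red".
docs = []
for i in range(500):
    words = {"widget"}
    if i < 150:
        words.add("blue")
    elif i < 330:
        words.add("red")
    docs.append(words)

print(count_hits(docs, {"widget"}))                                    # 200
print(segmented_total(docs, {"widget"}, frozenset(), ["blue", "red"]))  # 500
```

The choice of split term matters in practice: a term that divides the results roughly in half needs fewer follow-up searches than one that appears in almost all or almost none of the hits.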
MSN had a significant drop in the number of results found since August 2001.
Teoma is still in beta. While Teoma said they planned to have a larger database when it launches, the results above show a decline in numbers since last August. In addition, at this point, there is no free submission to the Teoma database, only paid submission and crawling. It will be interesting to see if they have been able to grow the database by the next comparison.
iWon no longer has an advanced search and first displays results from Overture. None of the very specific searches used here found any paid positioning hits from Overture. Only the Inktomi results on iWon were counted. However, as these numbers show, iWon finds only a relatively small number of hits from the Inktomi database.
Direct Hit continues to find far fewer results than the others. Like Teoma, Direct Hit is owned by Ask Jeeves.
Older Reports with Largest Three at that Time
More details on the study's methodology provide an example of the comparison process used here. See also Why does size matter? While decisions about which Web search engine to use should not be based on size alone, this information is especially important when looking for very specific keywords, phrases, and areas of specialized interest. See also the following statistical analyses:
A Notess.com Web Site ©1999-2006 by Greg R. , all rights reserved