Data from search engine analysis run on August 14, 2001.
Google Stays in Lead
This size showdown compared nine search engines, with MSN Search and HotBot representing the largest of the Inktomi partners. The analysis used 25 small single-word queries. Google found more total hits than any other search engine and placed first on 19 of the 25 searches, more than any of the others. Fast's database, represented by All the Web, came in second but placed first on just 1 of the 25 searches. Newcomer WiseNut ranked third. Northern Light moved up to fourth, followed by HotBot in fifth. AltaVista ranked sixth but did place first on two of the 25 searches. These rankings are based on the total number of verified search results from all 25 searches. The exact total number of hits for each of the search engines is as follows:
However, finding more total hits does not mean that Google will always find more hits on an individual search. On some of the 25 searches, other search engines found more than Google. In all, Google did not find more than the others in 6 of the 25 searches.
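The two measures used above (total hits across all queries, and the number of queries on which an engine placed first) can be sketched as follows. This is only an illustration: the engine names and counts are made up, and crediting a tie to every engine that reached the top count is an assumption, since the report does not say how ties were handled.

```python
def summarize(hits_by_engine):
    """hits_by_engine maps an engine name to its per-query hit counts.

    Every list must use the same query order. Returns the total hits per
    engine, the number of first-place finishes per engine, and the engines
    ranked by total hits (largest first).
    """
    totals = {name: sum(counts) for name, counts in hits_by_engine.items()}
    firsts = {name: 0 for name in hits_by_engine}
    num_queries = len(next(iter(hits_by_engine.values())))
    for q in range(num_queries):
        best = max(counts[q] for counts in hits_by_engine.values())
        # Assumption: a tie counts as a first place for each tied engine.
        for name, counts in hits_by_engine.items():
            if counts[q] == best:
                firsts[name] += 1
    ranking = sorted(totals, key=totals.get, reverse=True)
    return totals, firsts, ranking


# Hypothetical counts for three made-up engines on four queries.
example = {
    "EngineA": [120, 45, 300, 10],
    "EngineB": [100, 60, 250, 10],
    "EngineC": [80, 20, 310, 5],
}
totals, firsts, ranking = summarize(example)
```

In this made-up data, the engine with the largest total still loses or ties on some individual queries, which is the same pattern the paragraph above describes for Google.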
This comparison is based on the reported number of hits from each database, verified by visiting the last page of results whenever possible.
The number of records that many search engines can display is often different from the number that the search engine first reports. Results are not clustered by site.
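The counting rule described above can be sketched roughly as below. The function name and parameters are hypothetical. The fallback to the reported number follows what the database notes on this page say for engines whose display cap prevents reaching the last page of results; it is a simplification, not the exact procedure used for every engine.

```python
def effective_hits(reported, displayed, display_limit=None):
    """Return the count to record for one query on one engine.

    reported:      number of hits the engine claims on the first page
    displayed:     number of hits actually reachable by paging to the end
    display_limit: maximum number of results the engine will show, if any
    """
    if display_limit is not None and reported > display_limit:
        # The last page cannot be reached, so the claim cannot be
        # verified; fall back to the reported number.
        return reported
    # Otherwise the verified (displayed) number is used.
    return displayed
```

For example, `effective_hits(350, 312, 400)` records the verified 312, while `effective_hits(950, 400, 400)` falls back to the reported 950.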
While this comparison is not a measure based on precision, recall, or relevance, it is an important indicator of the number of records that a searcher can find. It measures the effective database size. For earlier size showdown winners, see the links to older reports and the top three from each at the bottom of this page.
Specific Database Notes
Google includes some results (URLs) that it has not actually indexed. When it counts all the indexed and unindexed URLs, it claims over 1.4 billion. But as these examples show, the effective size is considerably less, since most searchers will see very few of the unindexed URLs.
These URLs that have not been crawled can be readily identified by the lack of an extract and of the "cached" link.
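As a rough illustration of that identification rule, the sketch below separates fully indexed results from URL-only ones. The record layout (`extract` and `cached_url` fields) and the sample records are hypothetical and are not Google's actual result format.

```python
def is_unindexed(result):
    """Treat a result with neither an extract nor a cached link as URL-only."""
    return not result.get("extract") and not result.get("cached_url")


# Hypothetical result records for illustration only.
results = [
    {"url": "http://example.com/a",
     "extract": "Some text from the page",
     "cached_url": "http://cache.example/a"},
    {"url": "http://example.com/b", "extract": "", "cached_url": None},
]

indexed = [r for r in results if not is_unindexed(r)]
unindexed = [r for r in results if is_unindexed(r)]
```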
Google also clusters results by site and will display only two pages per site, with additional hits available under the [ More results from . . . ] link. The numbers used here come from Google's follow-up search, which it makes available (when it finds fewer than 1,000 results) via a note after the last record that states: "In order to show you the most relevant results, we have omitted some entries very similar to those already displayed. If you like, you can repeat the search with the omitted results included." Clicking the "repeat the search" option resulted in unclustered listings. Google has also started indexing PDF files, unlike all the other search engines compared here. Thus, the total numbers for Google include a few unindexed URLs and some PDF files in addition to the fully indexed Web pages that the other search engines count.
Fast is available at several sites, most notably All the Web and Lycos. All the Web, with site clustering turned off, was used for this comparison, but the results are almost exactly the same on Lycos.
WiseNut is a new search engine which claims over 1.4 billion records. While it clusters by site, that feature was turned off via the preferences for this comparison. WiseNut showed its young age in that some searches returned exact duplicates (including the same URL). The image to the right is an example of the duplicates found. The total number of results was not adjusted for these duplicates, so it is disproportionately high. Only 400 results could be displayed, so on the three searches where a higher number was reported, the reported number was used.
Northern Light automatically recognizes and searches the English forms of word variants and plurals. For that reason, only non-plural terms were used. Only the Web portion of Northern Light was searched, not their Special Collection. Northern Light also clusters hits by site, with no ability to disable the site clustering. The number of reported hits was used, rather than trying to verify the number under each site. Northern Light is typically fairly accurate in its counts and presents both the total number of hits and the number of sites.
HotBot uses an Inktomi database which pulls records from the Inktomi Gigadoc database. While it will also retrieve results from other databases such as Direct Hit and the Open Directory, all of the searches used for this study appeared to retrieve only Inktomi results. The power search was used for this comparison, with 100 hits displayed at a time and site clustering turned off (best pages only filter).
AltaVista clusters results, but this analysis used the Advanced Search with the option set so that results were not clustered by site. Each search result set was checked, and only the number of hits available for display was counted. The advanced search can display only the first 1,000 results, but none of the search terms found more than that number.
MSN Search uses an Inktomi database which pulls records from the Inktomi Gigadoc database. While it will also retrieve results from other databases such as LookSmart, RealNames, and Direct Hit, all of the searches used for this study appeared to retrieve only Inktomi results. The advanced search was used for this comparison. For those searches which pulled up more than 200 results (MSN's display limit), the total number was figured by segmenting the results by domain (using AND domain:com, then NOT domain:com, for example) and then totaling the results.
Teoma is included for the first time in these studies. It is still in beta, and Teoma says it plans to have a larger database when it launches. Only 200 results could be displayed, so on the three searches where a higher number was reported, the reported number was used.
Direct Hit continues to find far fewer results than the others. On one search, Direct Hit found many more results, but they did not include the search term, so it was given a zero for that search.
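The domain segmentation used to get past MSN Search's 200-result display limit can be sketched as follows. This is a minimal illustration under stated assumptions: `make_fake_engine` is a hypothetical in-memory stand-in for a live search engine, each record is represented only by its top-level domain, and the record counts are made up. The idea is that "AND domain:com" and "NOT domain:com" split the result set into non-overlapping segments, each small enough to be displayed in full, so their sum gives the total.

```python
def make_fake_engine(record_domains, limit):
    """Build a stand-in for a search engine that shows at most `limit` hits.

    record_domains holds one top-level domain per matching record.
    The returned function mimics a query restricted to one domain
    (include) and/or excluding a set of domains (exclude).
    """
    def count_matching(include, exclude):
        n = sum(1 for d in record_domains
                if (include is None or d == include) and d not in exclude)
        return min(n, limit)  # the engine never displays more than `limit`
    return count_matching


def segmented_total(count_matching, domains, limit):
    """Total the hits over disjoint domain segments.

    One segment per listed domain (AND domain:x), plus one remainder
    segment that excludes all of them (NOT domain:x ...).
    """
    segments = [count_matching(d, ()) for d in domains]
    segments.append(count_matching(None, tuple(domains)))
    if any(n >= limit for n in segments):
        # A segment at the display limit may be truncated, so the sum
        # would undercount; more segments would be needed.
        raise ValueError("a segment reached the display limit")
    return sum(segments)


# Made-up result set: 250 matching records, more than the 200 limit.
records = ["com"] * 150 + ["org"] * 60 + ["edu"] * 40
engine = make_fake_engine(records, limit=200)

unsegmented = engine(None, ())                      # capped at the limit
total = segmented_total(engine, ["com"], limit=200)  # 150 + 100
```

Because every record falls into exactly one segment, no record is counted twice. If a segment still hits the display limit, it would need to be split again with a further domain restriction.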
Search Engines Not Included
iWon now primarily uses results from GoTo. The advanced search page is gone, and only a few results can be found from the Inktomi database.
Excite clusters by site and provides no capability for searching all languages simultaneously (it defaults to English only). Due to the impossibility of combining all the language records in one search and the difficulty in unclustering every domain report, Excite was not included in this analysis.
NBCi is now entirely using results from GoTo, so it was not included.
More details on the study's methodology provide an example of the comparison process used here. See also Why does size matter? While decisions about which Web search engine to use should not be based on size alone, this information is especially important when looking for very specific keywords, phrases, and areas of specialized interest. See also the following statistical analyses:
A Notess.com Web Site ©1999-2023 by Greg R. Notess, all rights reserved
Search Engine Showdown Greg's Writings Greg's Presentations