Data from search engine analysis run on April 7, 2001.
+ Google Finds Most, Most Often
Well, after far too long, I have finally updated this comparative size showdown. This showdown compared seven of the largest search engines, with MSN Search and iWon's advanced search representing the largest of the Inktomi partners. This analysis used 25 small single word queries. Google found more total hits than any other search engine. In addition, it placed first on 13 of the 25 searches, more than any of the others. Fast's database, represented by All the Web, came in second and placed first on 4 of the 25 searches. MSN Search ranked a very close third, moving Inktomi up into third place and making MSN the largest of the many Inktomi partners, surpassing iWon which formerly held the top Inktomi spot. AltaVista moved up to fourth followed by Northern Light, which slipped to fifth place. When analyzed using the total number of verified search results from all 25 searches, the Google database ranked first. The exact total number of hits for each of the search engines is as follows:
However, just because Google found more total hits does not mean that on individual searches it will always find more hits. On some of the 25 searches, other search engines found more than Google. In 11 of the searches, Google did not find more than others.
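The two measures discussed above, total hits across all searches and the number of searches on which an engine placed first, can be tallied along these lines. This is only a minimal sketch: the engine names and counts are made up, the function name `summarize` is hypothetical, and crediting every tied engine with a first place is an assumption, since the report does not say how ties were handled.

```python
from collections import Counter

def summarize(hits_by_engine):
    """Return (totals, first_place_counts) for per-query hit counts.

    hits_by_engine maps an engine name to a list of verified hit counts,
    one per query, with every list in the same query order.
    """
    lengths = {len(counts) for counts in hits_by_engine.values()}
    if len(lengths) != 1:
        raise ValueError("every engine needs one count per query")

    # Total verified hits across all queries, per engine.
    totals = {engine: sum(counts) for engine, counts in hits_by_engine.items()}

    # Number of queries on which each engine had the highest count.
    first_places = Counter()
    num_queries = lengths.pop()
    for i in range(num_queries):
        best = max(counts[i] for counts in hits_by_engine.values())
        for engine, counts in hits_by_engine.items():
            if counts[i] == best:  # assumption: ties credit every tied engine
                first_places[engine] += 1
    return totals, dict(first_places)

# Made-up numbers for three hypothetical engines and four queries.
sample = {
    "EngineA": [120, 45, 300, 10],
    "EngineB": [100, 60, 250, 10],
    "EngineC": [90, 30, 310, 5],
}
totals, firsts = summarize(sample)
ranking = sorted(totals, key=totals.get, reverse=True)
```

In this made-up sample the engine with the largest total does not win every query, which is the same pattern the paragraph above describes.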
This comparison is based on the reported number of hits from each database, verified by visiting the last page of results whenever possible.
The number of records that many search engines can display is often different from the number that the search engine first reports. Results are not clustered by site.
While this comparison is not a measure based on precision, recall, or relevance, it is an important indicator of the number of records that a searcher can find. It measures the effective database size. For earlier size showdown winners, see the links to older reports and the top three from each at the bottom of this page.
Specific Database Notes
Google
includes some results (URLs) that it has not actually indexed. When it counts all the indexed and unindexed URLs, it claims over one billion. But as these examples show, the effective size is considerably less, since most searchers will see very few of the unindexed URLs.
These URLs that have not been crawled can be readily identified by the lack of both an extract and the "cached" link.
Google also clusters results by site and will display only two pages per site, with additional hits available under the [ More results from . . . ] link. The numbers used here come from Google's follow-up search, which it makes available (when it finds fewer than 1,000 results) via a note after the last record which states: "In order to show you the most relevant results, we have omitted some entries very similar to those already displayed. If you like, you can repeat the search with the omitted results included." Clicking the "repeat the search" option resulted in unclustered listings. Google has also started indexing PDF files, unlike all the other search engines compared here. Thus, the total numbers for Google include a few unindexed URLs and some PDF files in addition to fully indexed Web pages like those counted for the other search engines.
Fast is available at several sites, most notably All the Web and Lycos. All the Web was used for this comparison, but the results are almost exactly the same on Lycos.
MSN Search uses an Inktomi database which pulls records from the Inktomi GEN3 database. While it will also retrieve results from other databases such as LookSmart, RealNames, and Direct Hit, all of the searches used for this study appeared to retrieve only Inktomi results. The advanced search was used for this comparison. For those searches which pulled up more than 200 results (MSN's display limit), the total number was figured by segmenting the results by domain (using AND domain:com, then NOT domain:com, for example) and then totaling the results.
AltaVista clusters results, but this analysis used the Advanced Search with the option set so that results were not clustered by site. Each search result set was checked, and only the number of hits available for display was counted. Because the advanced search can display only the first 1,000 results, the search terms used here all found no more than that number.
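The domain segmentation described for MSN Search amounts to splitting one capped result set into non-overlapping pieces that each fit under the display limit, then adding the pieces. A rough sketch follows, assuming a hypothetical `count_displayable` callable that returns how many results an engine will actually display for a query string. The query syntax simply mirrors the AND domain:com / NOT domain:com example and may not match any real engine exactly.

```python
def segmented_total(count_displayable, term, domains, display_limit=200):
    """Total displayable hits for term, splitting the result set by domain.

    count_displayable is a hypothetical callable: query string -> number of
    results the engine will display (never more than display_limit).
    """
    whole = count_displayable(term)
    if whole < display_limit:
        return whole  # under the cap, so the count is already complete

    # One segment per listed domain, plus a remainder excluding all of them.
    queries = [f"{term} AND domain:{d}" for d in domains]
    queries.append(term + "".join(f" NOT domain:{d}" for d in domains))

    total = 0
    for query in queries:
        count = count_displayable(query)
        if count >= display_limit:
            # This piece may itself be truncated and needs a finer split.
            raise ValueError(f"segment still hits the display limit: {query}")
        total += count
    return total

# Made-up counts standing in for an engine with a 200-result cap.
fake_counts = {
    "zebra": 200,                  # capped at the display limit
    "zebra AND domain:com": 150,
    "zebra NOT domain:com": 95,
}
total = segmented_total(fake_counts.__getitem__, "zebra", ["com"])
```

The sum is only trustworthy when every segment comes in under the limit, which is why the sketch raises an error rather than adding a possibly truncated count.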
Northern Light automatically recognizes and searches the English forms of word variants and plurals. For that reason, only non-plural terms were used. Only the Web portion of Northern Light was searched, not their Special Collection. Northern Light also clusters hits by site, with no ability to disable the site clustering. The number of reported hits was used, rather than trying to verify the number under each site. Northern Light is typically fairly accurate in its counts and presents both the total number of hits and the number of sites.
iWon Advanced Search uses an Inktomi database which also pulls records from the Inktomi GEN3 database. On the basic iWon search, only one page per Web site is shown. Since the Advanced Search shows all pages, unclustered by site, it was used for this analysis.
Excite provides no capability for searching all languages simultaneously (it defaults to English only). So, in essence, Excite separates the different language records into their own databases, and non-English pages can be searched only by specifically selecting them in the Advanced Search. Because the records cannot be combined, these numbers reflect only the size of Excite's largest and most commonly searched database segment: English-language pages. Past size showdowns used the total number of pages from all languages. While a few more pages can be found in other languages, especially for some of the search terms used, most searchers will not have the patience to try the search in all the different languages.
Other Search Engines were not included in this study; only those listed above were compared.
More details on the study's methodology provide an example of the comparison process used here.
While decisions about which Web search engine to use should not be based on size alone, this information is especially important when looking for very specific keywords, phrases, and areas of specialized interest. See also the following statistical analyses:
A Notess.com Web Site ©1999-2023 by Greg R. Notess, all rights reserved