Fast 300 Million Special Supplement Report
by Greg R. Notess
On January 12, 2000, Fast Search and Transfer, Inc. launched a new 300 million URL database at the All the Web search site. This special report is based on a comparison run that day of the sizes of the three largest Web search engines, using an analysis of unique queries and their verified displayed results.

The results show that the Fast database has grown significantly with this new version, relative to the other two search engines. On almost all searches used for this study, Fast finds more hits than either AltaVista or Northern Light. Fast also finds a greater total number of results for the combined 25 searches: 31% more than Northern Light and 33% more than AltaVista.

In addition, the older version of the Fast database claimed 200 million records while the new one claims 300 million. Comparing the results from previous Search Engine Showdown size analyses with the results for the same searches on the new database verifies the significant size increase (48% based on that comparison).

Methodology and Specific Results

See separate pages for methodology, query terms, and detailed results.

Relative Size Results

Total Verified Results from 25 searches:
Most first place finishes: Out of the 25 searches, Fast found more hits than either Northern Light or AltaVista 20 times. AltaVista retrieved the most hits on three searches. Northern Light pulled up more than either Fast or AltaVista on two of the 25 searches.

Estimated total size

Because both Fast and Northern Light make precise counts of the records in their databases available, an estimated range for the total size of each database can be calculated. The method takes the ratio between the total number of hits found by each search engine for the 25 searches and applies it to the precise total database size numbers.
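As a rough illustration of the ratio method described above, the sketch below scales a known database size by relative hit totals. It is a minimal sketch under stated assumptions, not the calculation used for this report. The relative hit totals are reconstructed from the rounded percentages in the summary (31% and 33%). The only exact record count used is the Fast figure this report gives for the day of the comparison (311,936,470). The function name `estimate_size` and the dictionary layout are hypothetical. The report also worked from Northern Light's exact count to produce a range; that count is not given in this text, so it is not reproduced here.

```python
def estimate_size(known_size: int, known_hits: float, other_hits: float) -> int:
    """Scale a known database size by the ratio of verified hit totals."""
    if known_hits <= 0:
        raise ValueError("known_hits must be positive")
    return round(known_size * other_hits / known_hits)


# Exact Fast record count reported for the day of the comparison.
FAST_RECORDS = 311_936_470

# ASSUMPTION: relative hit totals derived from the rounded summary figures
# (Fast found 31% more than Northern Light and 33% more than AltaVista).
# The report's own calculation used unrounded hit counts.
relative_hits = {
    "Fast": 1.0,
    "Northern Light": 1.0 / 1.31,
    "AltaVista": 1.0 / 1.33,
}

estimates = {
    name: estimate_size(FAST_RECORDS, relative_hits["Fast"], hits)
    for name, hits in relative_hits.items()
}

for name, size in estimates.items():
    print(f"{name}: roughly {size / 1e6:.0f} million records")
```

Because the inputs are rounded percentages, the output should be read as an order-of-magnitude illustration only. Running the same calculation from Northern Light's exact count would give a second set of figures, which is why the estimate is a range.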
Fast provided me with a technique (which I am not permitted to disclose) that gives an exact count of the records in its database. On the day of the comparison, the count was 311,936,470. Northern Light has a similar technique involving an OR NOT operation and limiting results to the Web only.
For example,
The disparity between the figures demonstrates the limitations of this method of estimation and its broad margin of error. They should be considered only very general estimates. Percentages and final counts are rounded (percentages were calculated to 15 places after the decimal for determining the total estimate). See detailed results for more information on the calculations.

Commentary and Analysis

This report confirms Fast's claims of a greatly enlarged database of Web pages used by the All the Web search engine. The study shows a significant increase in size since the previous versions and a considerable advance in size over Fast's closest competitors (in terms of search engine database size): AltaVista and Northern Light. See Why Size Matters.

Fast has stated its objective to continue growing its database and is aiming to deliver a 400 million record database in April 2000. In the past, the Fast database at All the Web decreased in size after a major size increase was announced, since Fast was still revising the database by removing duplicates. To produce the current 300 million record database, Fast states that it crawled 700 million pages and then removed duplicates, spam, and some pornography. So the bulk of duplicate removal was done before this version of the database was made publicly available.

Fast also states that while it has not run a refresh spider in the past, it will now do so on a two-week cycle. This should help keep the Fast database much more current than it has been in the past.

If Fast can deliver growth to 400 million records in April and continue that growth beyond then, it is well positioned to maintain the title of largest search engine and provide a much higher comparative target for the other search engines to reach.

Cautions

The numbers reported here are not a measure based on precision, recall, or relevance but only on the raw database size as measured by the actual results that a search engine can deliver.
This report does not factor in dead links, duplication, or overlap, nor does it evaluate the actual presence of search terms within the results. Since the Web and the search engine databases change constantly, this study only accurately reflects the database output on the day the searches were run. AltaVista's database may well be larger, but since its search engine will time out and give partial results first, this study reflects the hits that would actually be found by a person doing the specified search on the day of the study.

This independent contract report by Notess.com Consulting has been prepared for and funded by Fast Search and Transfer, Inc. It uses the exact same techniques and methodology as the regular Search Engine Showdown comparison. The 25 search terms were all different from the usual ones but were chosen as outlined above without input from Fast. The funding covered running this special study and comparison of the three largest search engines in conjunction with the launch of Fast's database.
A Notess.com Web Site ©1999-2023 by Greg R. Notess, all rights reserved