Showdown News Vol. 2 No. 2
= = = = = = = = = = = = = = = = = = = = = = = = = =
UPDATED SEARCH ENGINE SHOWDOWN STATISTICS
New size, change, and dead link statistics are available. Fast continues to be the largest of the search engines. Northern Light and AltaVista remain in second and third place respectively. Excite and Google both showed gains and rank fourth and fifth, ahead of all of the Inktomi-based search engines.
This iteration of the size comparisons used a combination of new terms and old, and showed an even greater variation in search engine sizes. With 25 unique, single-word queries, the total number of hits retrieved ranged from WebCrawler's 191 hits (yes, that was their total for all 25 searches) to Fast's 9881. The Fast database retrieved the greatest number of hits on 18 of the 25 searches, while AltaVista found the most on five of them, Northern Light on two, and Excite on one. If you add that up, you'll see there was a tie for first place on one search.
The total size estimates now display a range, since I can get exact database size counts from both Fast and Northern Light. The range, in millions, shows the two figures that result when each of the two exact counts is used. These estimates put Fast in the 300 million range; Northern Light and AltaVista in the 200 million range; Excite and Google in the 130 to 150 million range; and the Inktomi databases in the 30 to 80 million range. For perspective, the WebCrawler estimate is only five to six million.
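For readers who want to check the arithmetic behind the tie and the spread in total hits, here is a minimal Python sketch. It uses only the figures quoted above; the variable and function names are made up for illustration and are not part of any Showdown tool.

```python
# Figures quoted in the size comparison above; all names here are illustrative.
TOTAL_SEARCHES = 25
first_place_counts = {"Fast": 18, "AltaVista": 5, "Northern Light": 2, "Excite": 1}
total_hits = {"WebCrawler": 191, "Fast": 9881}


def surplus_first_places(counts, searches):
    """Return how many more first-place finishes were recorded than searches run."""
    return sum(counts.values()) - searches


def hit_ratio(hits, larger, smaller):
    """Return how many times more hits one engine retrieved than another."""
    return hits[larger] / hits[smaller]


surplus = surplus_first_places(first_place_counts, TOTAL_SEARCHES)
ratio = hit_ratio(total_hits, "Fast", "WebCrawler")

print(f"First-place finishes recorded: {sum(first_place_counts.values())}")
print(f"Surplus over {TOTAL_SEARCHES} searches: {surplus}")
print(f"Fast retrieved about {ratio:.1f} times as many hits as WebCrawler")
```

A surplus of one first-place finish over the 25 searches is what a single two-way tie would produce, which matches the tie mentioned above.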
FEWER DEAD LINKS
On the dead links side, this analysis showed some significant improvements. Of the eight search engines compared using a total of 300 records, most had fewer than 10% dead links, and Fast and the three Inktomi databases had fewer than 3%. AltaVista fared the worst at 13.7%, while Excite was second worst at 8.7%.
Looking at database coverage over time, Northern Light continues its steady growth and was joined by Fast, AltaVista, Excite, Google, and MSN in demonstrating growth since the November comparisons. Fast did show a slight decrease since January, when it first launched its 300 million plus database. AOL, Yahoo, Infoseek, iWon, HotBot, and Anzwers all found fewer hits than they did on the same seven searches last November, with iWon showing the biggest decline.
The largest search engines certainly continue to make strides towards indexing more and more of the public Web. With estimates of over one and a half billion pages on the public, indexable Web, there is still a great deal that is not yet covered. At least the search engine databases are not remaining static. It is also heartening to see the decrease in dead links. While they are not yet keeping up with the whole Web, at least they are getting better at keeping their databases fresh.
Stay tuned for more comparisons. The overlap and unique analyses should be updated by next week on the site. It should be interesting to see if overlap has increased now that the largest search engines are including more Web pages.
WEBMAP
Back in January, Inktomi announced its WebMap project. Their WebMap database has over one billion documents. However, Inktomi does not make its full WebMap database available to its partners or in any other publicly searchable form. Instead, they cull about 100 million records from it, which are then made available to their partners. Since their partners can choose different options and parameters, this seems to explain why the actual search results from Inktomi databases appear to represent searchable databases that are smaller than 100 million.
It is intriguing to note that three of the large search engines crawl many more pages than they make available via their search engine. Inktomi has over one billion, of which only about a tenth is in their searchable database. Fast claimed it spidered 700 million pages to build an index of 300 million. Excite claims to have crawled 900 million to produce a database of 250 million (although my estimate puts them only at about 140 million). Whatever their true size, there is no question that they are excluding hundreds of millions of pages. While many of these are presumably duplicates or search engine spam, that still seems an incredibly high number to exclude.
LEFT-HAND TRUNCATION
Ever try searching for a term, such as a chemical name, where you would like to put the wildcard symbol at the beginning of a search term? Now you can. Sometime in the past several months, some of the Inktomi search engines quietly introduced this left-hand truncation capability. Currently, HotBot, Snap, and Anzwers all support using the asterisk * for searching anywhere in the search term. A search term can even have more than one asterisk. The truncation symbol can be used at the beginning of a term, anywhere in the middle, or at the end. It can represent any number of characters (including zero). For example, a search on
ALTAVISTA RELATED SEARCHES
AltaVista has added the ability to find related pages. At the end of some of the search results, a "Related pages" link is available. Clicking on that link will search for additional hits
similar to the one chosen. Searchers can even use this feature directly from the search box by using the new field search of "like:" which can be used on both the advanced and simple searches. It behaves a bit differently than other field searches in that it cannot be combined with additional terms. If a
To use like: in the advanced search, it must be in the "Sort by" box. It does not work if it is put in the advanced search Boolean box. To try it, put a complete URL after the like: field. Well, the http:// is optional, but the rest should be complete. For example, try a search such as
ALTAVISTA SEARCH CENTERS
AltaVista separated its multimedia search into three distinct tabs, now known as Search Centers: Images, Video, and MP3/Audio. Each search center has a separate page with its own portal-style content. They also have new search features including the ability to limit by collection. The information and links below the search box are now specific to the search area. For example, on the Images Search Center, there are picture-related directory sections, image discussion groups, a special Image Toolkit, and shopping links to camcorders, scanners, and cameras.
In addition, their Advanced Search now links to a beta version of an Advanced Search Center. Like the other Search Centers, this has information and links aimed at a specific audience:
librarians, researchers, and Web masters. The rest of the search features remain the same. However, it does offer one small but very convenient change. In the current Advanced Search, when you enter
something in the Boolean box, you have to click the search button to run the search. In the beta Advanced Search Center there is still a button, but you can also just press the Enter key after typing
the search.
MSN PAID SPONSORSHIPS
AltaVista tried it. GoTo lives on it. And now Microsoft will try some paid positioning on its MSN Search. Like AltaVista's abortive attempt, the paid keyword placement will be differentiated from the regular search results. The prototype shows them in the left-hand margin with other advertisements under the heading of Sponsors. It is not supposed to affect the ranking of the regular results.
One rather intriguing tool that Microsoft has made available along with information about their proposal is their "How much does it cost for Keyword listing on MSN Search?" search box. Put a
term or phrase in there, and it reports on how many times that query was searched last month, as well as the current monthly and daily bids. This is one of the first opportunities to see how
frequently specific words and phrases are searched, or at least were searched last month at MSN.
MORE (AND LESS) CACHE OPTIONS
Google has changed the way it displays its cached pages. The heading now is more explanatory, but it also no longer includes the date information that the page reported at the time that Google's spider indexed it and added it to their database.
Meanwhile, MaxBot.com now offers SearchEdu.com, SearchGov.com, and SearchMil.com. All three of these search engines only index pages from one specific top level domain (edu, gov, and mil, of course). They also offer cached versions of the page similar to the way Google does. This provides searchers with another opportunity for finding dead pages or looking at previous incarnations of existing pages.
RECENT SEARCH ARTICLES
Greg R. Notess. "Search Engine Inconsistencies." Online 24(2): 66-68, Mar.-Apr. 2000.
Greg R. Notess. "PubScience: Evolution or Devolution?" EContent 23(1): 64-67, Feb.-Mar. 2000.
Greg R. Notess. "Internet Search Engine Update." from the March 2000 Online is available online.
= = = = = = = = = = = = = = = = = = = = = = = = = =
A Notess.com Web Site ©1999-2023 by Greg R. Notess, all rights reserved