Search Engine Showdown

Showdown News Vol. 2 No. 2

= = = = = = = = = = = = = = = = = = = = = = = = = =
SHOWDOWN NEWS
The Search Engine Showdown Online Newsletter
Feb. 24, 2000 Vol. 2 No. 2
By Greg R. Notess, Search Engine Showdown
http://SearchEngineShowdown.com
= = = = = = = = = = = = = = = = = = = = = = = = = =

    Updated search engine statistics
    Fewer dead links
    Left-hand truncation
    More (and less) cache options

UPDATED SEARCH ENGINE SHOWDOWN STATISTICS

New size, change, and dead link statistics are available. Fast continues to be the largest of the search engines. Northern Light and AltaVista also remain in second and third places respectively. Excite and Google both showed gains and rank fourth and fifth, ahead of all of the Inktomi-based search engines. This iteration of the size comparisons used a combination of new terms and old, and showed an even greater variation in search engine sizes.

With 25 unique, single word queries, the total number of hits retrieved ranged from WebCrawler's 191 hits (yes, that was their total hits for all 25 searches) to Fast's 9881. The Fast database retrieved the greatest number of hits on 18 of the 25 searches, while AltaVista found the most on five of them, Northern Light on two, and Excite on one. If you add that up, you'll see there was a tie for first place on one search.

The total size estimates now display a range, since I can get exact database size counts from both Fast and Northern Light. The range in millions shows the two figures when both exact counts are used. These estimate Fast in the 300 million range, Northern Light and AltaVista in the 200 million range; Excite and Google in the 130 to 150 million range; and the Inktomi databases in the 30 to 80 million range. For perspective, the WebCrawler estimate is only five to six million.
<http://www.searchengineshowdown.com/stats/>
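
As a rough illustration of how a size estimate can follow from hit counts, here is a minimal sketch. It assumes simple proportional scaling against one engine with a known database size. The newsletter does not spell out the exact method, so the function name and the round 300 million figure for Fast are assumptions; the hit totals are the ones quoted above.

```python
def estimate_size_millions(engine_hits, reference_hits, reference_size_millions):
    """Scale a known database size by the ratio of total hits retrieved."""
    return reference_size_millions * engine_hits / reference_hits


FAST_HITS = 9881            # total hits over the 25 searches (from the text)
WEBCRAWLER_HITS = 191       # total hits over the 25 searches (from the text)
FAST_SIZE_MILLIONS = 300    # assumption: round figure for "the 300 million range"

webcrawler_estimate = estimate_size_millions(
    WEBCRAWLER_HITS, FAST_HITS, FAST_SIZE_MILLIONS
)
print(f"WebCrawler: roughly {webcrawler_estimate:.1f} million pages")
```

Under these assumptions the WebCrawler figure lands between five and six million, in line with the estimate above. Repeating the calculation against Northern Light's exact count would give the other end of the range; its hit total is not quoted here, so it is left out.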

FEWER DEAD LINKS

On the dead links side, this analysis showed some significant improvements. Of the eight search engines compared using a total of 300 records, most had fewer than 10% dead links, and Fast and the three Inktomi databases had fewer than 3% dead links. AltaVista fared the worst at 13.7% while Excite was second worst at 8.7%.

Looking at database coverage over time, Northern Light continues its steady growth and was joined by Fast, AltaVista, Excite, Google, and MSN in demonstrating growth since the November comparisons. Fast did show a slight decrease since January when it first launched its 300 million plus database. AOL, Yahoo, Infoseek, iWon, HotBot, and Anzwers all found fewer hits than they did on the same seven searches last November, with iWon showing the biggest decline.
<http://www.searchengineshowdown.com/stats/>

The largest search engines certainly continue to make strides towards indexing more and more of the public Web. With estimates of over one and a half billion pages on the public, indexable Web, there is still a great deal that is not yet covered. At least the search engine databases are not remaining static. It is also heartening to see the decrease in dead links. While they are not yet keeping up with the whole Web, at least they are getting better at keeping their databases fresher.

Stay tuned for more comparisons. The overlap and unique analyses should be updated by next week on the site. It should be interesting to see if overlap has increased now that the largest search engines are including more Web pages.

WEBMAP

Back in January, Inktomi announced its WebMap project. Their WebMap database has over one billion documents. However, Inktomi does not make its full WebMap database available to its partners or in any other publicly searchable form. Instead they cull about 100 million records from it, which are then made available to their partners. Since their partners can choose different options and parameters, this seems to explain why the actual search results from Inktomi databases appear to represent searchable databases that are smaller than 100 million.
<http://www.inktomi.com/webmap/>

It is intriguing to note that three of the large search engines crawl many more pages than they make available via their search engine. Inktomi has over one billion, of which only about a tenth is in their searchable database. Fast claimed it spidered 700 million pages to build an index of 300 million. Excite claims to have crawled 900 million to produce a database of 250 million (although my estimate puts them only at about 140 million).
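
To put those crawl and index figures side by side, the short sketch below works out what share of each crawl ends up searchable. The numbers are the ones quoted above, in millions; "over one billion" is rounded down to 1000 as an assumption, so the Inktomi share is an upper bound.

```python
# (pages crawled, pages searchable) in millions, as quoted in the text.
# Assumption: "over one billion" is treated as exactly 1000 million.
crawl_vs_index = {
    "Inktomi": (1000, 100),
    "Fast": (700, 300),
    "Excite (claimed)": (900, 250),
    "Excite (estimated)": (900, 140),
}


def indexed_share(crawled, searchable):
    """Fraction of crawled pages that make it into the searchable database."""
    return searchable / crawled


shares = {name: indexed_share(c, s) for name, (c, s) in crawl_vs_index.items()}

for name, share in shares.items():
    print(f"{name}: {share:.0%} searchable, {1 - share:.0%} excluded")
```

On these figures, every one of the three leaves well over half of what it crawls out of the searchable database.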

Whatever their true size, there is no question that they are excluding hundreds of millions of pages. While many of these are presumably duplicates or search engine spam, that still seems an incredibly high number to exclude.

LEFT-HAND TRUNCATION

Ever try searching for a term, such as a chemical name, where you would like to put the wildcard symbol at the beginning of a search term? Now you can. Sometime in the past several months, some of the Inktomi search engines quietly introduced this left-hand truncation capability. Currently, HotBot, Snap, and Anzwers all support using the asterisk * for searching anywhere in the search term. A search term can even have more than one asterisk.

The truncation symbol can be used at the beginning of a term, anywhere in the middle, or at the end. It can represent any number of characters (including zero). For example, a search on *siloxane* can now find polydimethylsiloxane, thyldisiloxane, dimethylsiloxanes, and polyvinylmethylsiloxane all at the same time. While you may not have any particular search that requires using this, keep it in mind for the rare search that can benefit from the trick. It opens up some interesting possibilities.
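
For readers who want to see the matching rule spelled out, here is a minimal sketch of asterisk truncation as described above: each asterisk stands for any number of characters, including zero, and may appear anywhere in the term. This is only an illustration using a regular expression, not how the Inktomi engines implement it; the function names and the case-insensitive matching are assumptions.

```python
import re


def wildcard_to_regex(pattern):
    """Turn a term with * wildcards into a compiled regular expression."""
    # Escape the literal pieces, then let each asterisk match any run
    # of characters, including none at all.
    pieces = [re.escape(chunk) for chunk in pattern.split("*")]
    return re.compile(".*".join(pieces), re.IGNORECASE)


def matches(pattern, term):
    """True if the whole term fits the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


terms = [
    "polydimethylsiloxane",
    "dimethylsiloxanes",
    "polyvinylmethylsiloxane",
    "silicone",
]
hits = [term for term in terms if matches("*siloxane*", term)]
print(hits)
```

With the pattern *siloxane*, the first three terms match and "silicone" does not. Dropping the trailing asterisk (*siloxane) would also exclude the plural "dimethylsiloxanes", since nothing may follow the stem.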

ALTAVISTA RELATED SEARCHES

AltaVista has added the ability to find related pages. At the end of some of the search results, a "Related pages" link is available. Clicking on that link will search for additional hits similar to the one chosen. Searchers can even use this feature directly from the search box by using the new field search of "like:" which can be used on both the advanced and simple searches. It behaves a bit differently than other field searches in that it cannot be combined with additional terms. If a like:[URL] search is combined with other terms, the additional words will just be ignored.

To use like: in the advanced search, it must be in the "Sort by" box. It does not work if it is put in the advanced search Boolean box. To try it, put a complete URL after the like: field. Well, the http:// is optional, but the rest should be complete. For example, try a search such as
like:www.sla.org
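
The same query can be put together programmatically. The small sketch below only builds the text to type into the search box; it drops the optional http:// and leaves the rest of the URL intact. The function name is hypothetical, and nothing here talks to AltaVista itself.

```python
def like_query(url):
    """Build a like: query string, dropping the optional http:// prefix."""
    url = url.strip()
    if url.lower().startswith("http://"):
        url = url[len("http://"):]
    # No extra terms are appended, since like: ignores them anyway.
    return "like:" + url


print(like_query("http://www.sla.org"))
```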

ALTAVISTA SEARCH CENTERS

AltaVista separated its multimedia search into three distinct tabs, now known as Search Centers: Images, Video, and MP3/Audio. Each search center has a separate page with its own portal-style content. They also have new search features including the ability to limit by collection. The information and links below the search box are now specific to the search area. For example, on the Images Search Center, there are picture-related directory sections, image discussion groups, a special Image Toolkit, and shopping links to camcorders, scanners, and cameras.
<http://doc.altavista.com/company_info/press/pr020700.shtml>

In addition, their Advanced Search now links to a beta version of an Advanced Search Center. Like the other Search Centers, this has information and links aimed at a specific audience: librarians, researchers, and Web masters. The rest of the search features remain the same. However, it does offer one small but very convenient change. In the current Advanced Search, when you enter something in the Boolean box, you have to click the search button to run the search. In the beta Advanced Search Center there is still a button, but you can also just press the Enter key after typing the search.
<http://jump.altavista.com/as_prev>

MSN PAID SPONSORSHIPS

AltaVista tried it. GoTo lives on it. And now Microsoft will try some paid positioning on its MSN Search. Like AltaVista's abortive attempt, the paid keyword placement will be differentiated from the regular search results. The prototype shows them in the left-hand margin with other advertisements under the heading of Sponsors. It is not supposed to affect the ranking of the regular results.

One rather intriguing tool that Microsoft has made available along with information about their proposal is their "How much does it cost for Keyword listing on MSN Search?" search box. Put a term or phrase in there, and it reports on how many times that query was searched last month, as well as the current monthly and daily bids. This is one of the first opportunities to see how frequently specific words and phrases are searched, or at least were searched last month at MSN.
<http://keywords.bcentral.com/>

MORE (AND LESS) CACHE OPTIONS

Google has changed the way it displays its cached pages. The heading now is more explanatory, but it also no longer includes the date information that the page reported at the time that Google's spider indexed it and added it to their database. Meanwhile, MaxBot.com now offers SearchEdu.com, SearchGov.com, and SearchMil.com. All three of these search engines only index pages from one specific top level domain (edu, gov, and mil, of course). They also offer cached versions of the page similar to the way Google does. This provides searchers with another opportunity for finding dead pages or looking at previous incarnations of existing pages.
<http://www.maxbot.com/>

RECENT SEARCH ARTICLES

Greg R. Notess. "Search Engine Inconsistencies." Online 24(2): 66-68, Mar.-Apr. 2000.
This article demonstrates and discusses many of the inconsistencies that occur with the search engines. It is a useful supplement to the information on the Search Engine Showdown Inconsistencies page.
<http://www.onlinemag.net/OL2000/net3.html>
<http://SearchEngineShowdown.com/inconsistent.shtml>

Greg R. Notess. "PubScience: Evolution or Devolution?" EContent 23(1): 64-67, Feb.-Mar. 2000.
For librarians and researchers in the physical sciences, this article discusses OSTI's new PubScience and compares it to the Energy Science and Technology bibliographic database.
<http://www.ecmag.net/EC2000/web2.html>

Greg R. Notess. "Internet Search Engine Update," from the March 2000 issue of Online, is available online.
It covers more search engine news and changes during December and early January.
<http://www.onlinemag.net/OL2000/engine3.html>

= = = = = = = = = = = = = = = = = = = = = = = = = =
Showdown News: copyright © 2000, Greg R. Notess
This may be forwarded to others if it is forwarded
in its entirety including this copyright notice.

= = = = = = = = = = = = = = = = = = = = = = = = = =
For more information about Showdown News,
including subscription information, see
http://searchengineshowdown.com/lists/news.shtml
Questions or problems? mailto:greg@notess.com
= = = = = = = = = = = = = = = = = = = = = = = = = =