Greg R. Notess
ON THE NET
The Many Faces of Inktomi
DATABASE, April 1999
In the bibliographic world, a database is known by its name and generally includes the same content regardless of the search platform or vendor. Unfortunately, experience with bibliographic databases rarely predicts how databases are delivered on the Net. Inktomi is an interesting example. The Inktomi database underlies several Web search engines, which leads many people to assume that it is the same database no matter which Inktomi partner is used to search it. However, assumptions and the Internet do not always mix well. The scope and content of the various Inktomi databases actually show that different, albeit similar, databases are available through the Inktomi partners. Some Inktomi partners find more results than others, and they also find different results.
When Inktomi went commercial, it took a different approach from Lycos and most other search engine companies. Instead of marketing its database and search software itself, Inktomi packages the database and search engine as a product that it sells, unbranded, to customers such as Yahoo!, Microsoft, and Snap!. Inktomi's first customer was Wired Digital's HotBot. For a long time (as the Net measures time), HotBot was the only access to the Inktomi database. However, as its recent addition of new clients shows, Inktomi's database is now available via several other companies. Inktomi sells other Internet-related products, such as a shopping search engine and its traffic server, but this column examines only its search engine product.
Yahoo! switched from AltaVista to Inktomi as its back-end search engine in July 1998. When a Yahoo! search finds no results in Yahoo!'s own directory, the search defaults to hits from Inktomi. When hits are found in Yahoo!'s directory, the Inktomi results follow them or can be reached directly by clicking on the Web Pages category. Snap! is the directory, search engine, and portal from CNET, backed by the media might of NBC, and it uses Inktomi for its search engine. Like Yahoo!, Snap! defaults to an Inktomi search whenever the Snap! directory fails to find a hit. In addition, by choosing Snap!'s Advanced Search, users can skip the directory and search Snap!'s version of Inktomi directly.
On another front, Microsoft uses the Inktomi database and search engine on the Microsoft Network (MSN) Web Search page (http://search.msn.com/). On the general MSN search page, Microsoft still points to Infoseek, AltaVista, Lycos, and GoTo, in addition to its own search service, which is listed as MSN Web Search. Interestingly, one of the services Microsoft points to, GoTo, is another Inktomi client. GoTo is best known for its practice of selling placement in search results. Any site that has paid for positioning on a given search term shows up first, but the rest of the results come from GoTo's version of Inktomi. Inktomi has several other clients. Some use smaller versions of its database, while others provide a national presence. These include Aeneid, GeoCities, N2H2, Goo, Anzwers, Canada.com, UKMax, and RadarUOL.
On most searches, HotBot and Snap! found the largest number of hits....
Inktomi, on the other hand, offers a database of indexed Web pages. It includes over 100 million of them and, like all databases of Web pages, its entire content must be continually refreshed. Inktomi hosts the database for all of its partners on its own hardware and at its own site, so the natural assumption is that every Inktomi client searches the same database. If that were true, there would be no need to search more than one Inktomi partner. However, the databases are not the same. The partners do not all report the total number of hits, or even count hits in exactly the same way. Even so, there are some rather large disparities in the number of hits between Yahoo! and HotBot and between MSN and Snap!. The Inktomi databases have a large share of their content in common, but they differ in several ways.
Inktomi itself encourages its clients to differentiate their databases from those of other Inktomi customers. Some of the differences are more obvious than others. GoTo and Snap! are both supposed to include additional records in their versions of the Inktomi database. GoTo adds records for the Web pages of its paying customers. Snap! adds records for the pages listed in its directory. Even though Inktomi hosts all these databases on its servers, each of its customers can still have its own customized version. While the majority of the records in each database may be shared with the others, do not be surprised to find some unique items in each.
... the GeoCities site is so large, a typical Web spider cannot crawl the entire site without putting a rather high load on the system.
Bear in mind, however, that HotBot's recent change in how it reports the number of hits creates some unique challenges. HotBot groups pages from the same host together. Under each hit is a link to "See results from this site only," even if there is only one hit from that host. Unlike Infoseek, HotBot offers no way to ungroup the search results. As a result, HotBot reports hit counts in two different ways. For a small search, it reports the number of sites that have hits. To get the true number of pages, you must check each of the "See results from this site only" links for additional hits. For larger searches, HotBot always reports the number of hits as a multiple of ten. This is closer to the actual number of records found, but it is only an estimate.
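The gap between a count of sites and a count of pages can be sketched in a few lines of Python. This is a hypothetical illustration only. The URLs are invented placeholders, and the code models the host-grouping idea described above, not HotBot's actual software.

```python
from collections import defaultdict
from urllib.parse import urlparse

def group_by_host(urls):
    """Group result URLs by host name, the way a grouped results page does."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlparse(url).netloc].append(url)
    return dict(groups)

# Hypothetical result list: placeholder URLs, not real search output.
results = [
    "http://www.example.com/a.html",
    "http://www.example.com/b.html",
    "http://www.example.org/index.html",
    "http://www.example.net/x.html",
    "http://www.example.net/y.html",
    "http://www.example.net/z.html",
]

groups = group_by_host(results)
site_count = len(groups)                                   # what a grouped display reports
page_count = sum(len(pages) for pages in groups.values())  # what expanding every group reveals
print(site_count, page_count)  # prints: 3 6
```

In this sketch, the site count understates the page count whenever any host contributes more than one hit. That is why each grouped entry has to be expanded before you can compare totals.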
Just because both GoTo and Snap! found 39 hits on "foillan" does not mean that they are the exact same 39 hits. Actually, GoTo listed 39 hits, but 15 of them were duplicates. Snap! had 39 unique hits, including several not listed by GoTo. At least the 39 hits from Snap! and the 40 from HotBot should be almost the same, right? No, only 27 of Snap!'s 39 showed up on HotBot. And what about MSN and Yahoo!? More confusion. While far more unique hits showed up on Snap! and HotBot, both MSN and Yahoo! had some unique hits on "foillan" that were not found using the other Inktomi incarnations.
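The comparison above does two things: it removes duplicates from one engine's list, then checks which hits two engines share. Both steps can be expressed with ordinary set operations. The sketch below is a hypothetical Python illustration with placeholder URLs. It does not reproduce the actual "foillan" results or any engine's own duplicate handling.

```python
def unique_hits(hits):
    """Drop duplicate URLs while keeping the original ranking order."""
    return list(dict.fromkeys(hits))

def compare(a, b):
    """Return (shared, only_in_a, only_in_b) for two result lists."""
    set_a, set_b = set(a), set(b)
    return set_a & set_b, set_a - set_b, set_b - set_a

# Hypothetical hit lists: placeholder URLs, not real search output.
engine_a = ["http://a.example/1", "http://a.example/2", "http://a.example/2",
            "http://a.example/3", "http://a.example/1"]
engine_b = ["http://a.example/2", "http://a.example/3", "http://a.example/4"]

a_unique = unique_hits(engine_a)
shared, only_a, only_b = compare(a_unique, engine_b)
print(len(engine_a), len(a_unique))  # listed hits vs. unique hits
print(sorted(shared), sorted(only_a), sorted(only_b))
```

Two lists of the same length can still differ in both their duplicates and their membership, and set differences show exactly which hits are unique to each engine.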
Running the same search on the Australian Anzwers and on Canada.com finally did produce identical retrieval sets, even sorted in the same order. Comparing those results to the other search engines showed that, while the grouping was different on HotBot, the individual hits were the same. Why does this happen? Part of the answer is that Inktomi runs several clusters of computers, and different databases use different clusters. Inktomi plans to have all the databases running on the same cluster. When that happens, the partner databases should become much more similar.
Snap!'s Advanced Search comes in as a close second, leaving out only the personal page depth limit and the continental breakdowns (although the word stemming option is listed as "all forms of the word" in the initial drop-down menu rather than as a separate choice). Anzwers has almost as many search features as Snap!, but defaults to searching Australian sites only. It does not have the language limit, word stemming, or a URL-only display.
The MSN Web Search leaves out the page depth limit, the person search, the Swedish language limit, word stemming, and a scripted date range. Canada.com's SuperSearch also lacks many of the advanced options, such as the title words search, language limits, domain limits, and several media types. GeoCities offers only language and page content limits in its Advanced Search. Yahoo! and GoTo offer none of these advanced searching capabilities via a scripted advanced search screen.
For general searching, Snap!, Anzwers, HotBot, and Canada.com offer the most search features and the largest Inktomi databases. If you have not used Snap! as a search engine before, try bookmarking its advanced search screen, which works much as HotBot used to. The national search engines are also interesting: Anzwers from Australia and Canada.com. Comprehensive searchers should also try GeoCities' version of the Inktomi search for a few extra pages.
Inktomi has been the behind-the-scenes player in the world of Web search engines. For the information professional, it is important to be aware of the differences among the various Inktomi partners' products. When one version is unavailable, it can also be very helpful to know that other, albeit different, versions of Inktomi's database can still be searched.
Communications to the author should be addressed to Greg R. Notess, Montana State University Libraries, Bozeman, MT 59717-0332; 406/994-6563; greg@notess.com ; http://www.notess.com.
Copyright © 1999, Online Inc. All rights reserved.