In the last issue of _ONLINE_, On the Nets compared search techniques and capabilities of different indexes for the World-Wide Web [1]. As the high-quality information resources available on the Internet continue to multiply, the need for indexes to these resources grows apace. While none of the indexes reviewed offer the kind of sophisticated Boolean and field searching that is standard on CD-ROM and online databases, even the best retrieval system will be useless if the information in the database is inaccurate or incomplete. The ease with which Web pages can be brought online, changed or deleted means that no index can be either comprehensive or entirely accurate. Yet despite their weaknesses, the available indexes to the World-Wide Web and other Internet resources fulfill an important role in information retrieval on the Internet.

The best of the indexes, including archie, veronica, Lycos, and WebCrawler, have been developed by researchers or academicians and freely disseminated on the Net. These databases of Net resources have been highly used but are beset with system overloads and database maintenance problems. Can the commercial sector find a more efficient and effective way to provide such databases? With the advent of the commercial InfoSeek service, users can decide for themselves.

Businesses are scrambling to find the magic combination for successful Internet marketing. InfoSeek demonstrates great Net savvy in offering useful free services and an attractive pricing structure. InfoSeek is a budding online databank, aiming to tap into the Internet market. It gives subscribers access to full-text and bibliographic databases. Anyone can search the InfoSeek WWW Pages database, but only subscribers can see more than the first ten hits. By combining limited free access and reputable databases, InfoSeek makes an aggressive play for the Internet online market.

INFOSEEK

To what does this Santa Clara, California company aspire?
"InfoSeek is a new full-text search service that makes finding information easy. You can search WWW pages, Usenet News, over 50 computer magazines, newspaper newswires and press releases, company profiles, movie reviews, technical support databases, and much more" [2]. In answer to the question whether InfoSeek is cheaper than DIALOG and CompuServe, the online documentation states that "in most cases, InfoSeek is the lowest cost information search and retrieval service available." InfoSeek appears to be aiming for the Internet-and computer-user market, combining computer, news and business databases with an easy interface and competitive pricing. The databases available for searching from this new databank are a modest selection of standard commercial offerings combined with some unique Internet databases: WWW Pages, Usenet News, Wire Services, Cineman Reviews, Computer Select, MDX Health Digest, full text of ComputerWorld and InfoWorld, and two of the Hoover Business databases. Figure 1 shows the InfoSeek search screen with the list of available databases. The "Wire Services" heading includes AP Online, BusinessWire, PR Newswire, Newsbytes News Network, and Reuters Business Report. InfoSeek has also stated that it plans to add Medline and unnamed databases in business, finance, health, sports and national news within the next six months. PRICING InfoSeek uses a transaction fee-based pricing. Each search request counts as one transaction and each document retrieval request counts as one transaction. Transaction charges range from $0.10 to $0.20, depending on which of the three subscription plans is chosen. The standard plan costs $9.95/month and includes 100 transactions, with each additional transaction costing $0.10. The light use plan costs $1.95/month and includes ten transactions, with each additional transaction costing $0.15. The occasional plan has no monthly charge but transactions cost $0.20 each. 
No other per-minute or per-record charges apply, except for the premium collections that have additional access charges. Site license discounts are also available.

In what should prove to be a very effective marketing move, InfoSeek supplements its commercial offerings with several free services. Its WWW Pages database is available free of charge via Netscape's Internet Search page at http://home.netscape.com/home/internet-search.html. Searching is free, but the display is limited to a maximum of ten references. (Registered users can display up to 200 per transaction.) InfoSeek has an extensive and well-organized "Frequently Asked Questions about InfoSeek" file. The FAQ is available to everyone for free searching at http://www.infoseek.com/FAQQuery. In addition to the free search of the WWW Pages database and the free searching of its FAQ, InfoSeek also gives new users a free trial run. The demonstration account lasts for one month or 100 transactions and includes a $5 credit for either additional transactions or access to the Premium collections.

WWW PAGES DATABASE

The most heavily used InfoSeek database is the WWW Pages--not surprising since InfoSeek offers limited free access. InfoSeek claims to have the largest index of WWW pages, but that depends on how you count pages. A single Web document could be composed of multiple HTML files and include links to even more URLs. According to an InfoSeek comparison between InfoSeek and Lycos in February of 1995, the InfoSeek database included more than 214,000 URLs while Lycos included over 318,000. Since Lycos includes http, gopher, and FTP URLs, and InfoSeek includes only http URLs (those that use the WWW protocol), InfoSeek still claims a larger database. It bases that claim on the size of the file containing the raw data: InfoSeek's was 813MB compared to Lycos' 634MB.
But part of the reason that the raw data measurement is larger for InfoSeek is that InfoSeek indexes the full text of the documents while Lycos does not index entire pages, only the title, headings, subheadings, hypertext links and the "100 highest weighted words" in the page. Which method is more effective may well depend on the specific search.

Results from an InfoSeek WWW Pages search can be seen in Figure 2. The top line of each record is the title of the document and is highlighted as the hypertext link to the resource. The title is followed by a brief description taken from the beginning of the body of the document. The URL of the resource on the next line is followed by a page-size designation in kilobytes, which can be especially useful to those on a slow connection.

InfoSeek updates its WWW database weekly, paying special attention to submitted URLs and ones mentioned in the press. In addition, an update is run on the entire database once a month. This ensures that any content changes in the thousands of pages in the database are correctly indexed. Maintaining currency in an Internet index is a delicate balancing act. On the one hand, network documents change so quickly and often that almost daily verification is necessary to maintain currency. On the other hand, frequent verification of thousands of resources involves a huge amount of bandwidth and an undue burden on all of the individual pages. The InfoSeek once-a-month approach strikes a happy medium.

LYCOS AND WEBCRAWLER

InfoSeek's WWW Pages database is an impressive index, but how does it compare to the other two major indexes, Lycos and WebCrawler? The numbers given in the InfoSeek comparison mentioned earlier do not quite tell the whole story. While the comparison mentions the more than 300,000 explored URLs, it neglects to point out that Lycos included over a million unexplored URLs with descriptions.
By April of 1995, Lycos boasted over 3.3 million "unique URLs," including the explored and unexplored. Some of the additional URLs can be attributed to the Lycos inclusion of FTP and gopher resources. The numbers for WebCrawler are also confusing. Its database includes over 100,000 "explored" documents and another 900,000 "unexplored" documents.

So which of the three is the most comprehensive? None of them alone. Any attempt at the impossible "comprehensive" Internet search must include at least all three. Searching for very distinctive keywords to try on all three, I explored some Japanese Web sites that included references to the Oyodo River and the Hyga orange. Yet none of the three indexes found these pages or any reference to them based on a simple single-word search of "oyodo" or "hyga". With other single- and multiple-word searches, each of the three databases turned up unique items not seen on the other two. In general, Lycos had the highest number of hits but less precision than InfoSeek. Some of the Lycos records are duplicates or too dated to be of use. WebCrawler usually had fewer hits than either of the other two, but occasionally WebCrawler would retrieve relevant documents not found by either InfoSeek or Lycos.

AVAILABILITY

One major problem with existing Internet indexes is that they have become overwhelmed with use and can be difficult to reach. With a free service, popularity rarely attracts the necessary capital for upgrading equipment to handle the increased load. The "Big Lycos" database is often so busy that search requests are refused. WebCrawler has the same problem. The ready availability of InfoSeek at all hours is a significant advantage over the free indexes--one for which many may be willing to pay. However, even the commercial InfoSeek is not without its availability problems. InfoSeek states up front that the free ten-hit WWW Pages search is not its priority and will not always be available.
In addition, even the commercial version was not available at all times. While InfoSeek's availability is much, much better than that of Lycos and WebCrawler, it is not yet perfect.

WEB POSITIONING

Companies, libraries and any other organization that would like to establish an Internet presence should be aware of the major Internet databases. Does your library or company have a home page? If so, a good test of any Web database is to try to find that local home page. In the event that it is not listed, all three of the major WWW indexes give an opportunity to register the URL of your home pages. In submitting URLs, be sure to avoid any typos. Depending on the index, it may take a few days or several weeks for the submitted URLs to show up in the database.

The Usenet News database available from InfoSeek presents another important opportunity. Since it can be difficult to guess which of the thousands of newsgroups may contain mention of a specific organization or person, InfoSeek lets you search across all of them at once. Was a competitor recently mentioned in rec.humor.funny or a complaint posted in misc.consumers? The Usenet database also can be used as a way to determine which newsgroups most frequently discuss certain topics.

FUTURE WISHES

As Powell points out, one of the great advantages to the Web and its HyperText Markup Language is that documents are structured [3]. HTML documents can have titles, headings for major sections, and named hypertext links. While the database gathering tools may look at these specific fields in gathering their data, and Lycos returns search results in records with definite field labels, none of the indexes provide a simple field-searching option. Although Pinkerton, the WebCrawler developer, notes that "titles are an optional part of an HTML document, and 20% of the documents that the WebCrawler visits do not have them," [4] the ability to restrict certain words to the title or named hypertext links of a document could help improve precision.
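To make the idea of field searching concrete, here is a minimal Python sketch that pulls words out of the title, headings, and link text of an HTML page and lets a search be restricted to one of those fields. It is purely illustrative and does not describe how InfoSeek, Lycos, or WebCrawler work internally. The names `FieldIndexer`, `index_page`, `search`, and the sample pages at example.org are all hypothetical. Only Python's standard `html.parser` module is used.

```python
import re
from html.parser import HTMLParser

# Map HTML tags to the searchable field they feed (an assumed field set).
FIELD_FOR_TAG = {"title": "title", "a": "link"}
FIELD_FOR_TAG.update({f"h{n}": "heading" for n in range(1, 7)})


class FieldIndexer(HTMLParser):
    """Collect the words of a page, both overall and per field."""

    def __init__(self):
        super().__init__()
        self.depth = {"title": 0, "heading": 0, "link": 0}
        self.words = {"any": set(), "title": set(), "heading": set(), "link": set()}

    def handle_starttag(self, tag, attrs):
        field = FIELD_FOR_TAG.get(tag)
        if field:
            self.depth[field] += 1

    def handle_endtag(self, tag):
        field = FIELD_FOR_TAG.get(tag)
        if field and self.depth[field] > 0:
            self.depth[field] -= 1

    def handle_data(self, data):
        tokens = set(re.findall(r"[a-z0-9]+", data.lower()))
        self.words["any"] |= tokens
        # Credit the words to every field whose element is currently open.
        for field, depth in self.depth.items():
            if depth > 0:
                self.words[field] |= tokens


def index_page(html):
    """Return a dict mapping field name to the set of words found in it."""
    indexer = FieldIndexer()
    indexer.feed(html)
    indexer.close()
    return indexer.words


def search(pages, term, field="any"):
    """Return the sorted URLs whose chosen field contains the term."""
    term = term.lower()
    return sorted(url for url, html in pages.items()
                  if term in index_page(html)[field])


# Two made-up pages; the second has no title, as Pinkerton notes is common.
PAGES = {
    "http://example.org/a.html":
        "<html><head><title>Oyodo River guide</title></head>"
        "<body><h1>Rivers</h1><p>Fishing and boating.</p></body></html>",
    "http://example.org/b.html":
        "<html><body><p>A trip along the Oyodo river. See "
        "<a href='a.html'>the river guide</a>.</p></body></html>",
}

if __name__ == "__main__":
    print(search(PAGES, "oyodo"))                 # full-text search
    print(search(PAGES, "oyodo", field="title"))  # title-restricted search
```

In this toy example a full-text search for "oyodo" matches both pages, while the title-restricted search matches only the first. The untitled second page can never be found by a title search, which is exactly the trade-off Pinkerton's figure points to: field restriction can raise precision, but only at some cost in recall.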
Only one InfoSeek database can be searched at a time (with a separate transaction cost for each). Multiple database searching could be a major time saver for the busy searcher; so could the addition of a current awareness service to InfoSeek's services.

Continued active expansion of the WWW Pages database will be essential to maintaining the database as an effective indexing tool. If gopher, telnet, and FTP resources are not added to the WWW Pages database, perhaps InfoSeek will develop a new database to cover those resources. Until that is accomplished, InfoSeek can be considered only a partial search tool for Internet resources.

InfoSeek is not without its problems. It is only a small databank with a sophisticated but limited search language. The WWW Pages and Usenet News archives are important databases that have been combined with significant computer science and general databases. A bit more growth in the number of available databases is still needed. Yet the savvy shown in its current marketing approach and pricing structure may uniquely position InfoSeek to become a major player in the online information marketplace. Even if it never lives up to that potential, the WWW Pages database offers a significant, although far from comprehensive, step in the right direction for creating access to the wealth of Internet information resources.

REFERENCES

[1] Notess, Greg R. "Searching the World-Wide Web: Lycos, WebCrawler, and More." _ONLINE_ 19, No. 4 (July/August 1995): pp. 48-53.

[2] "InfoSeek Home Page." http://www.infoseek.com/

[3] Powell, James. "Adventures with the World Wide Web: Creating a HyperText Library Information System." _DATABASE_ 17, No. 1 (Feb. 1994): pp. 59-66.

[4] Pinkerton, Brian. "Finding What People Want: Experiences with the WebCrawler." Electronic Proceedings of the Second World Wide Web Conference '94: Mosaic and the Web. http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/Searching/pinkerton/WebCrawler.html (1994).

Communications to the author should be addressed to Greg R. Notess, Montana State University Libraries, Bozeman, MT 59717-0332, 406/994-6563; Internet--greg@notess.com ; http://www.notess.com.
Copyright © 1995, Online Inc. All rights reserved.