As in the print publishing world, the development of finding-aids and indexes must wait for the development of the resources. When anonymous FTP resources multiplied, archie appeared. With the growth of gophers, veronica was born. The explosive growth of World-Wide Web resources in the past year has inspired several contenders for the title of "best Web search engine." The different keyword indexes of Web resources feature a wide variety of search interfaces and capabilities. No clear winner has emerged yet, and the diversity of search engines and databases provides the information professional with multiple choices.

There are many Web keyword indexes, but the best-known are:

* Lycos
* WebCrawler
* World-Wide Web Worm
* Harvest Broker
* CUI

Just as World-Wide Web clients can speak other protocols and connect to gopher, telnet and FTP resources, some Web indexes include more than just Web documents. Some of these search engines permit Boolean searches and other sophisticated search options, but all suffer from the problem of overload.

SYSTEM OVERLOAD

A major problem inherent in successful Internet keyword indexes is that as soon as a particular search tool becomes useful and well-known, it is flooded with users. This in turn makes it less dependable, since the original server is unable to handle the increased load. This happened with the first archie server at McGill University and then with the first veronica server.

For both archie and veronica, a partial solution has been to divide the load by multiplying the servers. Many archie servers on different continents now handle the thousands of daily archie searches. The dispersion of veronica servers has occurred along similar lines. This approach has helped, but only partially. Even as generous hosts on the Net set up more servers, Internet use is multiplying.
The result is that even with a dozen or more veronica servers, the load (determined by the number of simultaneous search requests) is still too high. It is not uncommon to try an archie or veronica search and get a failed search response due to high system load.

The same situation occurs with Web finding-aids. When a particular index establishes a reputation for successful searches, it attracts a huge increase in traffic. Then users can no longer depend on that resource and must look for an alternative. Most search options for the Web have not yet resulted in a multiplication of servers, but that time may soon arrive. Meanwhile, the different indexes provide alternatives when a particular favorite is unavailable or unbearably slow.

LYCOS

Lycos, a project hosted by the computer science department at Carnegie-Mellon University, is one of the best-known and most popular indexing tools for the World-Wide Web. When Netscape Navigator was first widely released in late 1994, the people at Netscape Communications Corporation wisely set up a page that listed various Internet search tools (http://www.netscape.com/home/internet-search.html). In one quick and dirty comparison, they ranked the tools by the results of a simple search on "surf." Lycos retrieved the most documents and was therefore listed first among the Internet search tools. Due to its prominence on the Netscape Internet Search page, Lycos' load has increased so greatly that it can be difficult to get any response at all.

Although the Lycos database is one of the largest among the finding-tools, there are other reasons that Lycos searches result in a high number of hits. A single-word search on Lycos defaults to automatic truncation, so the search on "surf" also retrieves documents containing "surface." On multiple-word searches, Lycos defaults to an OR operation. Although the search results are ranked and give preference to records that have all the search terms, the OR default still retrieves many irrelevant records.
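The two Lycos defaults described above, automatic truncation and an OR search ranked by the number of matching terms, can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function name, the sample records and the simple count-based ranking are hypothetical, and the actual Lycos scoring method is not documented here.

```python
def lycos_like_search(query, documents):
    """Rank documents by how many query terms they match (OR search).

    Each term is treated as a prefix (automatic truncation), so "surf"
    also matches "surface". Illustration only; not Lycos's actual code.
    """
    terms = query.lower().split()
    results = []
    for doc_id, text in documents.items():
        words = text.lower().split()
        # A term counts as matched if any word in the document starts with it.
        matched = sum(1 for t in terms if any(w.startswith(t) for w in words))
        if matched > 0:  # OR semantics: one matching term is enough
            results.append((matched, doc_id))
    # Documents matching more terms come first; ties are broken by doc_id.
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in results]


# Hypothetical sample records, keyed by a made-up record ID.
docs = {
    "a": "surf reports for california beaches",
    "b": "surface chemistry of silicon wafers",
    "c": "california wine guide",
}

print(lycos_like_search("surf", docs))             # truncation pulls in "surface"
print(lycos_like_search("surf california", docs))  # OR search, best match first
```

In this sketch the two-word search returns all three records, even though only one contains both terms. That is the source of the irrelevant hits noted above.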
In the Lycos technical documentation, the developers say, "We plan to upgrade the search engine's language at some future point to implement more standard Boolean operators. We will definitely add...spelling correction and phonetic and semantic match capabilities." Until that time, the efficiency of Lycos is severely limited. For single keyword searches it works well, but multiple-word searches are not as successful.

The current search engine has a few advanced features. While truncation is the default, an exact search can be specified by adding a period at the end of the search term. Also, preceding a search term with a dash designates that term as a negative indicator. "Documents containing that word have their match score reduced, but they may still be retrieved if the other terms in your query are present." You can use these two tools to obtain a more precise search. For example, the search "surf. -silicon" would result primarily in records with the term "surf" but not terms such as "surface," and it would also mostly exclude pages from Silicon Graphics about its "Silicon Surf" service.

The search options and database development for Lycos continue to change. From the main Lycos home page (http://lycos.cs.cmu.edu/) there may be several options (Figure 1). The databases may be numbered Lycos1, Lycos2, Lycos2a or Lycos10. The actual designation has changed over time. The Lycos page also offers a small Lycos database and a big Lycos database. The smaller database is less likely to be overwhelmed.

The output of a Lycos search can appear cryptic. At the top of the search report is the number of documents found matching at least one search term and a list of matching words. Each record includes the requisite hypertext link to the found URL, but also includes a hypertext link to a document with the record's ID number and weighted score.
In addition, the date of the document's last update in the Lycos database, the size of the page in bytes, the number of links within the document, the title, an outline and the search keys found in the document are listed. With the default "verbose" display, the record also includes a sometimes lengthy excerpt from the actual document.

WEBCRAWLER

WebCrawler, developed by Brian Pinkerton at the University of Washington (http://webcrawler.cs.washington.edu), has a much simpler interface and provides results in an easy-to-browse, single-line report. The database WebCrawler searches is not as large as the Lycos database, but it is substantial nonetheless.

WebCrawler has a single line for entering the search statement (Figure 2). For a multiword search, it defaults to a Boolean AND search. Just uncheck the button below the search line to run an OR search. There are no nesting or adjacency features. While there is no truncation symbol, WebCrawler does automatically strip "endings" and convert search terms to all lowercase. The example given in the documentation is that "NeXT Computers becomes next computer." Based on the samples I tried, "endings" appears to refer only to plurals, either a final "s" or an "es," and not to other suffixes.

While the options are limited, WebCrawler's Boolean capabilities make it the first choice for a search needing an AND, at least until Lycos develops its Boolean capabilities. Unfortunately, WebCrawler can sometimes be as difficult to reach as Lycos. Once again, it is a victim of its own success.

WWWW AND HARVEST

The World-Wide Web Worm (WWWW) indexes Web document titles and embedded references to other Web resources. Thus it is a smaller database than Lycos or WebCrawler, which also index parts of the full text of the documents themselves. The Worm works well for those familiar with UNIX and the egrep "regular expression." For example, OR is designated with a pipe | symbol, and .* represents "any amount of intervening text."
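For readers unfamiliar with regular expressions, the two operators just mentioned can be tried out in a few lines of Python. This is a minimal sketch and not the Worm's own code. Python's re module is not egrep, although | and .* behave the same way in both. The sample titles and the search_titles helper are invented for illustration, and the case-insensitive matching is an assumption.

```python
import re

# Hypothetical document titles standing in for an index of Web page titles.
titles = [
    "Montana State University Libraries",
    "University of Washington Computer Science",
    "Gopher Jewels",
    "Libraries of the State of Montana",
]


def search_titles(pattern, titles):
    """Return the titles that match the regular expression, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)  # case handling is an assumption
    return [t for t in titles if regex.search(t)]


# The pipe means OR: a title matches if either word appears.
print(search_titles(r"gopher|washington", titles))

# .* means "any amount of intervening text": "montana" must come first,
# followed somewhere later in the title by "libraries".
print(search_titles(r"montana.*libraries", titles))
```

The second search finds only the first title. The last title contains both words, but in the opposite order, so a .* pattern is not a true Boolean AND.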
WWWW has been widely used and can be even more difficult to reach than Lycos or WebCrawler. However, it shows a message saying that it will be moving to a larger machine soon, which will allow more than the current maximum of 25 connections.

The Harvest Broker, using the Glimpse search engine, provides a much fuller range of Boolean capabilities. This search option goes under a variety of lengthy names: "Query Interface to the WWW Home Pages Broker" or "The Harvest Information Discovery and Access System." Both WAIS and Glimpse are used with Harvest, and the Glimpse search at http://harvest.cs.colorado.edu/Harvest/brokers/www-home-pages/ features full Boolean operators with parenthetical nesting. Searches can be either case-sensitive or case-insensitive, and truncation is only available as all or nothing. Either the "Keywords match on word boundaries" box is checked, which designates an exact search on the search terms, or it is not checked and the engine truncates all search terms. In addition, the Glimpse version of the Harvest Broker supports field searching of title, URL and keywords.

While it is comforting to have some more standard Boolean operations available, Harvest Broker has two major limitations. First, it is confusing, starting with the lack of a distinct name. The documentation describes Harvest as "an integrated set of tools to gather, extract, organize, search, cache and replicate relevant information across the Internet." Unfortunately, the tools are not integrated well enough to make sense to most users. A bit more work on the human-machine interface could improve Harvest Broker greatly. Second, the database needs to be expanded greatly before Harvest approaches the depth of coverage available from Lycos or WebCrawler.

CUI

Another smaller, more refined index option comes from the Centre Universitaire d'Informatique (CUI).
Its Web catalog derives from several well-known listings of Web pages--NCSA's What's New pages, CERN's Virtual Library Subject Catalog, Scott Yanoff's _Internet Services List_, John December's _Computer-Mediated Communication Information Sources_ and _Internet Tools Summary_, and a few others. Searches can be based on Perl regular expressions. As with the WWW Worm, | works for OR, and .* for AND (but terms must be in the specified order).

CUI works well for finding major resources and for broad keyword searches. The nature of the component databases can result in some redundancy. The descriptions of the resources in the databases may be brief or lengthy, so the success of a search is determined by how well the source is described. Even so, it can help find the better-known Web resources.

DON'T FORGET VERONICA

The Web is rapidly replacing gopher as the standard Internet publishing medium, yet even so, gophers offer many information resources. Veronica should certainly remain in the arsenal of search tools for a comprehensive search.

As noted above, veronica servers are often overburdened and too busy to respond to a new request. For this reason, some hint about which servers are least busy can save a considerable amount of time. At Washington & Lee University (gopher://liberty.uc.wlu.edu:70/11/gophers/Veronica) the gopher server does just that. Periodically, it automatically checks each of its known veronica servers. Then it ranks the least busy servers first and lists the servers which did not respond at all.

Veronica's strength is that the search statement can include standard Boolean operators (AND, OR, NOT) and nested arguments (designated by parentheses). The default operator on a multiword search is AND. Veronica recognizes the asterisk as an end truncation symbol. Veronica even supports limits. Searches can be limited by gopher type--directory, text file, image, etc.

The major limitation with veronica is the database itself.
While the best Web-based finding aids index entire HTML documents and can include gopher and FTP resources, veronica is limited to menu listings. In addition, the menu listings may not make much sense out of the context of the upper-level menu titles. With the capabilities of the gopher+ protocol to invoke an external Web browser, some Web documents are now included on gopher menus. However, only a few Web documents are retrieved in a veronica search.

USE YAHOO FOR A SUBJECT APPROACH

While the keyword search of the search engines described previously is a primary method for tracking down Internet resources, using a classified or subject listing of resources can also be effective. Just as there are numerous keyword search options, there are many subject listings as well. One of the best subject lists is Yahoo, available at Stanford and at a mirror site from Netscape.

Yahoo offers a keyword search of the entries included in the subject listing. Although the database is small compared with the other keyword search options, it presents very clear options, including case-sensitive matching, either Boolean AND or OR, and substring or complete word searches. Yahoo can be a good source for finding the best-known resources. Also, Yahoo lists over 40 other Web indexes under http://akebono.stanford.edu/yahoo/Reference/Searching_the_Web/ for those trying for a comprehensive Internet search.

SEARCH MORE THAN ONE WITH CUSI

With so many keyword indexes to Internet resources, the next step is to find a resource that searches all of them. CUSI (Configurable Unified Search Engine) provides one form that can then search various Web search engines. The advantage of the CUSI front end is that the keywords need to be entered only once; then, one at a time, the search can be sent to different indexes (Figure 3).

CUSI is one of the few Web index services that has multiple servers. Start at the URL listed for CUSI in the sidebar. Then choose the server closest to you.
For Lynx users, most of the CUSI sites do not work, so try the CUSI Radio Button version at http://www.scs.unr.edu/~cbmr/net/search/cusi-r.html.

CUSI includes search options for many different search engines in the following categories:

* World-Wide Web (WWW) Indices
* Other Internet Indices
* People and Organizations
* Bibliographic
* Computer and Network-Related
* Reference Works

CUSI also includes a link to a multithreaded query page from http://www.sun.fi/mtq/mtquery.html. This runs simultaneous searches in each of the selected indexes. While this option and the CUSI approach seem like the answer to the often time-consuming process of Internet searching, they can take just as long. One problem is that the links to the other indexes may no longer be accurate. In addition, the special features and check boxes that some keyword indexes have may not be available from within the CUSI form. The output is determined by the actual index, and therefore varies greatly in format.

COMMERCIAL FUTURE?

One possible solution to the overload problem is to limit the number of users by charging for the service. The question becomes whether a commercial entity can make enough profit from an index to develop an easy-to-use, yet powerful interface to a well-maintained database. At least one company is giving it a try.

InfoSeek Corporation may offer a glimpse of how online services of the future may be configured. InfoSeek has a variety of indexes, including one to WWW pages, the past four weeks of Usenet news, Computer Select and wire services. It also offers a demonstration database and a one-month free subscription. After the free trial, customers can either pay $9.95/month for up to 100 transactions, and $0.10 a transaction thereafter, or choose one of the other subscription plans. InfoSeek offers some useful resources, but due to the way in which many users search the Net, the fees could add up quickly.
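To see how quickly those fees could add up, the arithmetic for the $9.95 plan can be worked through in a short Python sketch. It uses only the figures quoted above ($9.95 covering the first 100 transactions, then $0.10 each). The function name and the sample transaction counts are hypothetical, and the other subscription plans are not modeled.

```python
def monthly_cost_cents(transactions, base_cents=995, included=100, extra_cents=10):
    """Monthly charge in cents for the $9.95 plan described in the text.

    Amounts are kept in whole cents to avoid floating-point rounding.
    """
    extra = max(0, transactions - included)  # transactions beyond the first 100
    return base_cents + extra * extra_cents


# Hypothetical usage levels for one month.
for count in (50, 100, 250, 1000):
    cents = monthly_cost_cents(count)
    print(f"{count:5d} transactions: ${cents // 100}.{cents % 100:02d}")
```

By this calculation, a searcher running 250 transactions in a month would pay $24.95, and one running 1,000 would pay $99.95.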
(_See Greg's August DATABASE column for an in-depth review of InfoSeek and its content.--NG_)

The developers of these Web searching tools should be commended for their hard work and creativity. However, what is needed in the literature is a detailed comparison of the efficacy of the different search options. Until there is a consensus on the best keyword indexing of the Net, information professionals must choose their first try carefully. For single keyword searches of a large database, use Lycos. For multiword searches with an AND, try WebCrawler. For gopher resources, try veronica. And for a time-consuming comprehensive search, use CUSI.

Communications to the author should be addressed to Greg R. Notess, Montana State University Libraries, Bozeman, MT 59717-0332, 406/994-6563, Internet--align@gemini.oscs.montana.edu; http://notess.com.