by Greg R. Notess from Online 23(3):20-22, May-June 1999
Online has certainly become mainstream. A decade ago, the talk was all about end-user searching, but it never took off with the general public any more than CompuServe and America Online could grab the attention of the masses without the multimedia hyperlinked capabilities of the World Wide Web. The Web offers everyone with Internet access and a bit of knowledge the opportunity to become a publisher, to share information with others. So, in a sense, the advent of the Web and Windows-based graphical browsers was the dawning of the Internet age.
As the information content available on the Web grew, the Web public needed ways to find it. Since all of the information was in digital form and was generally free to all virtual visitors, creating a software program to retrieve and index the content was relatively simple. Dump the indexed files into a database, add Web-based search capabilities on top, and voila! Finally, an online database with search features that the general public not only wanted to use, but clamored to use. So much so, that the database could be supported merely on advertising dollars.
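The "retrieve, index, and search" recipe described above can be sketched in a few lines. This is only a minimal illustration, not a description of how any particular search engine is built: the page texts, the example.com URLs, and the function names are all invented for the example, and the search simply requires every query word to appear on a page.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Hypothetical "retrieved" pages standing in for a crawl of the Web.
PAGES = {
    "http://example.com/a": "Montana weather and travel",
    "http://example.com/b": "Travel guides for Montana libraries",
    "http://example.com/c": "Weather forecasts",
}

if __name__ == "__main__":
    index = build_index(PAGES)
    print(sorted(search(index, "montana travel")))
```

A real service would add a crawler, relevance ranking, and storage on disk; the sketch shows only the word-to-page lookup at the core of the idea.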
And what have these large databases of indexed Web pages come to be called? Search engines, even though the term describes only the search component of the information system, and even though many other search engines are in use in non-Internet products and other software programs.
The Web search engines have broad popular appeal. They are the tools that have finally made end-user searching a reality. And while they cater to the general public, the Web search engines are increasingly important tools for the information professional as well. Not ones to disdain free searching or free information, we make good use of these search tools, even while we sometimes wish for the precision and advanced capabilities of our traditional command line systems with accurate Boolean processing, extensive field searching capabilities, advanced output and display options, truncation, search saves, controlled vocabulary, current awareness services, and a targeted subject focus.
As more and more substantial, current, and even authoritative resources become available on the Web, the Internet becomes an ever more important source to search. And effective searching is easier the better we understand how our search tools work.
This issue of ONLINE focuses on search engines, especially the Web search engines. So what is the state of the search engines in this new age of the Internet? How do they work? Which ones have which commands and when are metasearch engines a helpful alternative? What can be used on an intranet? Where are they going in the future? Read on for an overview and some detailed analysis by search engine industry experts.
Randolph Hock, in his detailed Features and Commands Comparison Chart, provides an easy-to-use reference for the professional searcher. Going beyond the usual information presented in such charts, Hock provides more details on Boolean operators, output options, and special features, all from the vantage point of the professional searcher. In addition, he gives clear descriptions of each of the features covered in the chart and even notes the default search operator. Study the chart and accompanying commentary closely since there are many features listed that may be useful to add to your search arsenal.
Danny Sullivan, the well-known keeper of Search Engine Watch and the Webmaster's Guide to Search Engines, offers "Crawling Under the Hood: An Update on Search Engine Technology," which provides an excellent overview of some of the hot issues facing the search engine companies. Based in part on his many conversations with search engine representatives, Sullivan discusses their concerns with database size and freshness, the search engines' perspectives on what users want, and a sense of what professional searchers would like to see.
Martin Courtois' and Michael W. Berry's study on the ranking of search results takes a unique approach, using searches the way that novice searchers might enter them and evaluating the top results for how well they match users' criteria for selecting hits to view. Not only is this an important insight into the general searcher's experience with search engines, but it contains some important lessons for the professional searcher as well. Given some of the changes in the ways search engines rank their results since the time of the study, it also lays the groundwork for a comparison at a later date to see whether or not the ranking algorithms have improved.
Chris Sherman, the Mining Company's Web Search Guide, offers a look into his crystal ball and discusses the future of the Web search engines. Sherman points out the potential impact of some of the newer standards, such as RDF and XML, and reports on the search engines' views on how personal assistants and collective wisdom may be used to provide more relevant answers.
The implications of his observation that search engines may be moving from the portal concept to the "sticky" destination site bear some consideration for those of us who continue to dig for the obscure and hard-to-find answers. If the portal sites indeed become the Internet's media companies, delivering answers from their own files and relegating their generic full-text search results to the background, where do the professional searchers turn when they need a more comprehensive search? A move towards serving up the most frequently requested information on the part of some portal/destinations/Web media companies could end up dividing them from other newer search engines that focus primarily on the more comprehensive, full-text searching of Web sites. At this point, it might be efficient to turn to metasearch engines. Check this issue's metasearch engine article for a listing of products and insight into their functionality as another tool for searchers.
Beyond Web search engine features and alternatives, what else is going on in the area of search engine capabilities? Attempts to properly interpret the users' questions, to provide more relevant results, and to more directly answer the information need are common themes driving much of the development. But Susan Feldman digs below the surface and takes a look at the state of natural language processing in search engines. Her article describes the recent history and use of natural language processing as well as where current research on it is heading. It may well join up at even deeper levels with some of the Web search engines, as they explore how well it will help them on the quest for better answers.
In all of this talk of evaluating relevance, improving relevance, natural language processing, and more sophisticated search systems, where does that leave Boolean searching and precision? At this point, Boolean persists. As Sullivan notes, we still want Boolean searching and we want better and more reliable Boolean and field searching. Some search engines that did not offer Boolean at first added it later. Google! is the most recent example. At first, it offered neither Boolean nor phrase searching nor the + - system. Then, in its February beta release, it added phrase searching, the - for a NOT operation, and the + for forcing the inclusion of stop-words.
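To make the + - system concrete, here is a small sketch of how a query using quoted phrases, a leading - for NOT, and a leading + to force a stop-word might be split into its parts. It is an illustration only and does not reflect how Google! or any other engine actually parses queries; the stop-word list and the function name are assumptions made for the example.

```python
import re

# Assumed stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}

# An optional + or - sign, followed by either a "quoted phrase" or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

def parse_query(query):
    """Split a query into phrases, required words, excluded items, and plain terms."""
    parsed = {"phrases": [], "required": [], "excluded": [], "terms": []}
    for sign, phrase, word in TOKEN.findall(query):
        text = (phrase or word).lower()
        if sign == "-":
            parsed["excluded"].append(text)   # - acts as NOT
        elif phrase:
            parsed["phrases"].append(text)    # quoted text is kept together
        elif sign == "+":
            parsed["required"].append(text)   # + keeps the word even if it is a stop-word
        elif text not in STOP_WORDS:
            parsed["terms"].append(text)      # unmarked stop-words are dropped
    return parsed

if __name__ == "__main__":
    print(parse_query('"to be" -hamlet +the who'))
```

In this sketch the query "to be" -hamlet +the who yields one phrase, one excluded word, one forced stop-word, and one ordinary term, while an unmarked "the" would simply be dropped.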
Not just relegated to the Internet, search engines are important tools for intranets as well. For many companies, intranets are beginning to reach critical mass, bulging with a variety of departmental information. Intranet managers are finding that prudent and informed application of search engine software adds much needed functionality to the corporate intranet and eases the search difficulties many end-users face. Darlene Fichter offers excellent practical advice on some of the options and considerations when choosing an intranet search engine. Included is a chart that compares some of the better known intranet search engines, ranging from products that can handle small sites to those designed for larger sites and larger budgets. The overview of features, including installation, scalability, content organization, and price, provides an informative starting point for the local search engine decision and planning process.
In my own ON THE NET column in this issue, I explore how the relevance ranking algorithms used by the search engines have failed in the past and how relevance is now rising above its previous levels. Even if most of the improvements are in practical add-ons that appear before the main search results, the outcome does achieve more helpful and relevant results for many searches. Another ONLINE columnist, Hal Kirkwood, contributes to this issue's theme with a special BOOKMARK CENTRAL. Hal highlights three collections now available on the Bookmark Central Web site (http://www.onlineinc.com/bookmark/). The first is a reviewed collection of search engines, metasearch engines, and search engines for intranets. The second collection was contributed by Chris Sherman and contains links to a variety of Web search help resources. Lastly, Hal highlights columnist Paula Berinstein's collection of image search engines.
As the search engines improve their relevance ranking and natural language processing, the information professional needs to start adjusting search habits. Certain questions will be well answered by the search engines' improved relevance features. Other queries will continue to work best as Boolean searches. AltaVista might be successful on one search and fall flat on another. Trying that same search on Northern Light or Google! might prove more successful.
Support search engine diversity! The information professional is best served with multiple search tools and multiple search strategies. Sherman notes that convergence is coming. Certainly search engines are quickly aligning with traditional media companies. However, let us sincerely hope that convergence does not also mean a convergence of search engines into one single portal. While a multiplicity of search engines means constantly keeping up with the latest features and changes, it also gives us a larger collection of tools with which to find the answers we seek.
The Web is dynamic. The information resources available on it are immense. The search engines will change, develop, grow, and maybe even improve. Do not expect consistency between search engines. New features will emerge and old ones, even some we really liked, may be abandoned. Whole new search engines will emerge, and maybe a few more will die. But for now, delve into this issue and explore the world of search engines in this new online age of the Internet.
Greg R. Notess (greg@notess.com; http://www.notess.com/) is a reference librarian at Montana State University.
-------------------------------------------------------------------------------
COPYRIGHT 1999 Online, Inc.
-------------------------------------------------------------------------------