Greg R. Notess
Reference Librarian
Montana State University

ON THE NET

Refining the Internet in '97

DATABASE, December 1997
Copyright © Online Inc.





As each year passes, the dawning of the Age of the Internet recedes further into the dim technological past. Command-line use of ftp, gopher, and arcane UNIX commands is almost forgotten as Web users concentrate their energy on comparing the latest browser upgrades, arguing the merits of their favorite Internet search engines, or boasting of their latest information "find" on the Internet. There is no question that the Net has become a major pathway to information. In the past year, users and producers have pushed refinements to the ways in which information is published, accessed, and discovered on the Net.

DATABASES ON THE WEB

This past year has seen databases of all kinds move onto the Web. Databanks, newspapers, niche databases, and others launched Web versions. For example, more library catalogs moved from telnet connections to full Web access. DIALOG, as well as other commercial online services, not only offered Web access to their professional service, but opened up Web versions for end-users.

It seems that almost all public databases are planning some Web access, with private databases following suit. The huge intranet market makes access to proprietary databases via the Web an attractive option, and legacy databases are moving to corporate intranet access. Commercial databases are finding ways to offer Web access with login or domain restrictions that provide access only to authorized users. Government information is flowing across the Net, with national, regional, and local governments jumping quickly into the stream.

WEB DATABASES AND THEIR SEARCH ENGINES

The main Internet search engines with their huge databases of Web pages have not been idle during the year. They have all been refining their search engines and adding new features. They have, as always, been positioning themselves as the primary starting point for Web exploration. And while none of them is clearly in that position now, there have been some important changes.

Northern Light

The big news in 1997 was the August introduction of a major new Web search engine, Northern Light. Designed with the help of librarians, Northern Light includes both a database of Web pages on par with the likes of HotBot and AltaVista and its own "Special Collection" of thousands of full-text articles. These articles are available for fees of $1 to $4 and are delivered online with a money-back guarantee.

Northern Light is noteworthy for more than its ability to search both its Special Collection and the Web. Its database of Web sites ranks quite favorably in size with the other major contenders. In addition, it has made a significant advance over other Web search engines with its Custom Folders. These folders are an early attempt at offering an important sorting capability for Web search results. Hits are sorted into several folders that might include keywords, type, and source. This sorting feature is one of the best improvements any Web search engine has made toward easier browsing for relevant documents. Northern Light is still under development, but it is a search engine to watch.

AltaVista

AltaVista went through some graphic redesign of its search pages and added new search features. One addition is a language limit that uses artificial intelligence to determine the language of individual Web pages. This language limit includes a few dozen common languages, but is far from comprehensive. Searchers can choose one or more of the listed languages, but selecting more than one language requires using the preferences setting. While Chinese, Korean, and Japanese pages can be detected with this limit, AltaVista does not support search terms submitted in the characters of these languages.

AltaVista's LiveTopics experiment became an official part of its search engine during the year. No longer called LiveTopics, the same feature is now available after running any search and then choosing the "refine" option. AltaVista then suggests possible related terms to add or exclude from further searching. In addition to the refine capabilities, AltaVista has added another feature to make it easier to browse results. A small icon at the left in the search results opens a specific hit in a new window. This keeps the original search results window open for easy browsing.

Users can also customize AltaVista search options. This new preferences feature can set the advanced or simple search as the default and can select specific displays. Unlike HotBot, it does not use cookies to store these options. Instead, it gives a new URL to use for a bookmark.

HotBot

HotBot has been busy making changes in 1997, with more in the works for the end of the year. Like so many of its competitors, HotBot added some subject access. The subject categories at the bottom of its search page link to selected sources from the WIRED site. The subject list is known as the Wired Cybrarian and is a small but useful collection of reference links.

HotBot also has expanded its search capabilities. One of the most useful is the addition of title searching. This field search for keywords in Web page titles is available as "words in title." Title searching has been available on AltaVista and Infoseek with use of the "title:term" syntax. HotBot added this important field search capability in the drop-down menu, which has made it a bit harder to notice, but it will also accept the "title:term" syntax. While HotBot should be commended for adding this capability, unfortunately it does not seem as reliable as the title search from AltaVista and Infoseek. This may have been a one-time problem with its title search, but on a number of searches, HotBot failed to find pages that contained the search term in the title, even though these pages were found in the HotBot database using other searches.
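As a rough illustration of what a "title:term" field search does, the Python sketch below matches a one-word query either against page titles only or against title and body text together. It is a minimal sketch, not the method any of the search engines named above actually uses; the sample pages, the `words` helper, and the `run_query` function are all hypothetical.

```python
import re

# Hypothetical sample pages; the URLs are illustrative only.
PAGES = [
    {"url": "http://www.name.edu/",
     "title": "Name University",
     "body": "Welcome to the library and campus pages"},
    {"url": "http://www.name.edu/library/",
     "title": "Library Home",
     "body": "Hours, catalog, and databases"},
]


def words(text):
    """Lowercase a string and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def run_query(query, pages=PAGES):
    """Return URLs of pages matching a one-word query.

    A "title:" prefix limits matching to the page title; otherwise the
    title and body are both searched.
    """
    prefix, sep, rest = query.partition(":")
    title_only = bool(sep) and prefix.lower() == "title"
    term = (rest if title_only else query).lower()
    hits = []
    for page in pages:
        text = page["title"] if title_only else page["title"] + " " + page["body"]
        if term in words(text):
            hits.append(page["url"])
    return hits
```

In this sketch, `run_query("library")` returns both sample pages, while `run_query("title:library")` returns only the page whose title contains the word. Comparing the two result lists is also a simple way to spot the kind of missed title matches described above.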

Another new HotBot search feature is the page type limit. This offers four radio button choices: Any, Front Page, Index Page, and Page Depth. Any is the default option. The Page Depth button uses a default value of three, but it can be changed to other numbers as well. Choosing the Front Page option limits the search to pages that are the central pages on a specific computer. For example, it will find http://www.name.edu, but not http://www.name.edu/~jsmith/. This is a great way to limit a search and to cut out all those subsidiary pages on a site. The Index Page limit is similar, but can pick up top level pages within a directory or subdirectory, in addition to central pages. The index page limit can find central pages for individuals and departments within a larger site. For example, it would find http://www.name.edu/~jsmith/index.html, but not http://www.name.edu/~jsmith/bio.html. In addition, the Index Page limit will find pages that use either the http://www.name.edu/~jsmith/ or the http://www.name.edu/~jsmith/index.html format (since these refer to the same page). Unfortunately, HotBot may get rid of this new page depth search option before the end of the year, so do not get too accustomed to its presence.
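The distinction between the Front Page and Index Page limits can be sketched in a few lines of Python. This is only an illustration built from the URL examples in the paragraph above, not HotBot's actual logic; the function names and the assumption that index.html or index.htm is a directory's default file are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: these file names act as a directory's default page.
INDEX_NAMES = {"index.html", "index.htm"}


def path_segments(url):
    """Return the non-empty path segments of a URL."""
    return [seg for seg in urlparse(url).path.split("/") if seg]


def is_front_page(url):
    """True for the central page of a host, such as http://www.name.edu."""
    segs = path_segments(url)
    return not segs or (len(segs) == 1 and segs[0].lower() in INDEX_NAMES)


def is_index_page(url):
    """True for the top page of any directory, including the front page."""
    path = urlparse(url).path
    if path == "" or path.endswith("/"):
        return True  # a bare host or a directory URL
    segs = path_segments(url)
    return segs[-1].lower() in INDEX_NAMES
```

Under these assumptions, http://www.name.edu counts as both a front page and an index page, http://www.name.edu/~jsmith/ and http://www.name.edu/~jsmith/index.html count as index pages only, and http://www.name.edu/~jsmith/bio.html counts as neither.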

Lycos

Lycos has made significant steps forward in its search software with the introduction of its Lycos Pro version. This finally supports full Boolean and phrase searching. It was in beta testing for a while during the year, then became available directly from the top Lycos page. Lycos has also finally stopped automatically truncating all search terms. While Lycos Pro is still a click away, many of its commands (including Boolean operators) can now be used from the main Lycos page. The many new capabilities that Lycos Pro provides make Lycos a worthy competitor to the other search engines. It features proximity operators that none of the other primary Web search engines offer. Unfortunately, Lycos still suffers from a smaller and less complete database than Northern Light, HotBot, AltaVista, Excite, and Infoseek. It still claims to be the largest by counting all the URLs in its database, but its search results do not support the claim.
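For readers unfamiliar with proximity operators, the general idea is to require that two terms occur within a set number of words of each other. The Python sketch below shows that idea only; it does not reproduce Lycos Pro's syntax or implementation, and the `near` function and its `max_distance` parameter are hypothetical names.

```python
import re


def positions(term, tokens):
    """Return every index at which term occurs in the token list."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def near(text, term_a, term_b, max_distance):
    """True if term_a and term_b occur within max_distance words of each other."""
    tokens = re.findall(r"\w+", text.lower())
    pos_a = positions(term_a.lower(), tokens)
    pos_b = positions(term_b.lower(), tokens)
    return any(abs(a - b) <= max_distance for a in pos_a for b in pos_b)
```

With the sample sentence "The library catalog moved to the Web", the words "library" and "web" are five positions apart, so `near` matches them with a `max_distance` of 5 but not with a `max_distance` of 3. A plain Boolean AND would match in both cases.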

Infoseek and Excite

One attractive new feature of the Infoseek search engine is its post-processing option. After running a search, a follow-up search can be run against the results. It is not as good as the ability to combine search sets, but at least it is a step in the right direction. Just check the "Search only these results" button. Infoseek also suggests possible related topics.
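The difference between a follow-up search and true search-set combination can be sketched with Python sets. This is a minimal illustration over a made-up collection, not Infoseek's implementation; the `DOCS` data and the `search` function with its `within` parameter are hypothetical.

```python
# Hypothetical mini-collection: document id -> text.
DOCS = {
    1: "montana state university library catalog",
    2: "library catalog on the web",
    3: "web search engine comparison",
    4: "montana web sites",
}


def search(term, within=None):
    """Return ids of documents containing term.

    If within is given, only those earlier results are searched,
    which mimics a "search only these results" follow-up.
    """
    candidates = DOCS.keys() if within is None else within
    return {doc_id for doc_id in candidates if term in DOCS[doc_id].split()}


# Follow-up search: can only narrow an earlier result set.
first = search("library")
refined = search("web", within=first)

# Set combination: saved sets can be merged or subtracted as well.
montana = search("montana")
either = first | montana
library_not_montana = first - montana
```

The follow-up search can only shrink the first result set, which amounts to an implicit AND. Saved search sets, by contrast, can also be merged or subtracted, which is why combining sets is the more flexible capability.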

Excite has some similar follow-up search options. It has featured its "More Like This" link for years, but it now also suggests possible related terms that can be added to the search by checking a box next to the term. Excite's big change this past year was to move toward a channel metaphor, structuring much of its content around broad topics. However, that change related to the subject categories more than the straight Web search itself. Its Power Search provides a form to help construct a query, but except for its ability to change the number of results shown, all that it offers can be requested directly in the search box by users who know the proper commands.

DATABASE SIZE

The Web continues to grow and expand. This is driven not only by newcomers setting up Web pages and by new information resources being made available on the Web, but also by the continual redesign of Web sites. Often, on the first attempt at putting up Web pages, information content is listed on one long page. As the Web designers refine their site design, it becomes apparent that the information is far easier to navigate when it is organized into multiple pages. This process can cause a site of ten pages to easily become 100 pages with no increase in information content.

Yet even as the total number of Web pages increased, the databases of Web pages seemed to stagnate. Surprisingly, 1997 saw little growth in the overall size of the Web databases. True, some pages were taken down or ceased to exist for other reasons. However, the overall trend on the Web has been toward growth. Most of the Web databases reported a similar number of hits, or even slightly fewer hits, than the same keywords found in 1996. HotBot still claims about 50 million pages and the others have not shown any large growth in the total number of pages.

What does this mean for the information seeker? Be sure not to expect a comprehensive search from any single Web database or even from all of them. There are plenty of unindexed Web pages and information resources. Use the Web search engines as one tool in the information quest, but recognize their limitations and lack of comprehensiveness.

WEB BROWSERS

The primary tool of the information seeker on the Internet has continued to be the Web browser. Almost all users have gravitated to one of the two major players in this field: Netscape's Navigator or Microsoft's Internet Explorer. These two continued to dominate the market for Web client software. Microsoft seems to have gained some ground on Netscape's market share, but the majority of Web users still seem to use a Netscape browser.

The Web browser is now just one part of the Internet software included in the two products. Both have grown so large that home users should expect to wait at least an hour, probably much more, to download the entire installation file. Netscape introduced its new Netscape Communicator software suite midyear. As in 1996, the suite included email and newsgroup capabilities, but the 1997 version made significant improvements in those capabilities. Communicator also includes features for HTML editing, collaboration, calendaring and scheduling, and push technologies. Internet Explorer's version four counters with many of the same features, but ties all of it in more closely to the Microsoft Windows 95 operating system and will be even more integrated with Windows 98.

For the information professional, keeping up with the latest versions of these browsers remains an important consideration, since some sites use only the newest features to deliver information content. Except for the push technologies, most of the other new features may have had little impact on the information seeker. They primarily refined navigation ability, HTML display, and layout. Netscape users can enjoy Navigator's new Personal Toolbar that permits the inclusion of personal favorites on one of the top toolbars. This ability to customize the browser desktop makes for even quicker access to your most-used Web sites.

PUSH

The hot technology in 1997 has been "push." In one sense, this is a step backward in technological development, toward broadcast media. Push technology uses the Internet to send information content directly to a user. Like turning on a television or radio, the information is pushed across the computer screen. The advantage over traditional broadcast media is that more user customization is possible. The user determines which channels to subscribe to and even which sections. PointCast was one of the early examples of push technology on the Net. The constantly running stock ticker is a small-scale example of push technology.

Microsoft's version of push technology, included in its Internet Explorer, is referred to as "channels" and uses its Channel Definition Format. Netscape added its Netcaster component to the Communicator suite. This relies on standard HTML and JavaScript to deliver content. While Microsoft and Netscape argue the merits of their respective approaches, the question that remains to be answered is whether, and how much, the public wants to use push technology. While it can be very useful for quickly changing information like stock quotes, is a steady stream of news on your computer more distracting or informative?

The push clients can allow the user to determine the interval at which channels are updated in addition to choosing which channels to subscribe to. The updates can run in the background while the user is working on other projects or using other software on the computer.

Overall, 1997 has been a year of refinements to the Internet as an information conduit and as an information resource. More tools, more content, and more users have become involved with the Net. As the Internet becomes more ubiquitous, it is no longer a question of when to use the Net, but how.


Communications to the author should be addressed to Greg R. Notess, Montana State University Libraries, Bozeman, MT 59717-0332; 406/994-6563; greg@notess.com; http://www.notess.com.

Copyright © 1997, Online Inc. All rights reserved.