On the Nets: The InfoSeek Databases

by Greg R. Notess

DATABASE, August 1995
Copyright © Online Inc.

----------------------------------------------------------------

     In the last issue of _ONLINE_, On the Nets compared search 
techniques and capabilities of different indexes for the World-Wide 
Web [1]. As the high-quality information resources available on the 
Internet continue to multiply, the need for indexes to these resources 
grows apace. While none of the indexes reviewed offer the kind of 
sophisticated Boolean and field searching that is standard on CD-ROM 
and online databases, even the best retrieval system will be useless if 
the information in the database is inaccurate or incomplete. The ease 
with which Web pages can be brought online, changed or deleted means 
that no index can be either comprehensive or entirely accurate. Yet 
despite their weaknesses, the available indexes to the World-Wide Web 
and other Internet resources fulfill an important role in information 
retrieval on the Internet.
     The best of the indexes, including archie, veronica, Lycos, and 
WebCrawler, have been developed by researchers or academicians and 
freely disseminated on the Net. These databases of Net resources have 
been highly used but are beset with system overloads and database 
maintenance problems. Can the commercial sector find a more 
efficient and effective way to provide such databases? With the 
advent of the commercial InfoSeek service, users can decide for 
themselves. 
     Businesses are scrambling to find the magic combination for 
successful Internet marketing. InfoSeek demonstrates great Net savvy 
in offering useful free services and an attractive pricing structure. 
InfoSeek is a budding online databank, aiming to tap into the Internet 
market. It gives subscribers access to full-text and bibliographic 
databases. Anyone can search the InfoSeek WWW Pages database, but 
only subscribers can see more than the first ten hits. By combining 
limited free access and reputable databases, InfoSeek makes an 
aggressive play for the Internet online market.

INFOSEEK
     To what does this Santa Clara, California company aspire? 
"InfoSeek is a new full-text search service that makes finding 
information easy. You can search WWW pages, Usenet News, over 50 
computer magazines, newspaper newswires and press releases, 
company profiles, movie reviews, technical support databases, and 
much more" [2]. In answer to the question whether InfoSeek is cheaper 
than DIALOG and CompuServe, the online documentation states that "in 
most cases, InfoSeek is the lowest cost information search and 
retrieval service available." InfoSeek appears to be aiming for the 
Internet- and computer-user market, combining computer, news and 
business databases with an easy interface and competitive pricing. 
     The databases available for searching from this new databank are a 
modest selection of standard commercial offerings combined with 
some unique Internet databases: WWW Pages, Usenet News, Wire 
Services, Cineman Reviews, Computer Select, MDX Health Digest, full 
text of ComputerWorld and InfoWorld, and two of the Hoover Business 
databases. Figure 1 shows the InfoSeek search screen with the list of 
available databases. The "Wire Services" heading includes AP Online, 
BusinessWire, PR Newswire, Newsbytes News Network, and Reuters 
Business Report. InfoSeek has also stated that it plans to add Medline 
and unnamed databases in business, finance, health, sports and 
national news within the next six months.

PRICING
     InfoSeek uses transaction fee-based pricing. Each search request 
counts as one transaction and each document retrieval request counts 
as one transaction. Transaction charges range from $0.10 to $0.20, 
depending on which of the three subscription plans is chosen. The 
standard plan costs $9.95/month and includes 100 transactions, with 
each additional transaction costing $0.10. The light use plan costs 
$1.95/month and includes ten transactions, with each additional 
transaction costing $0.15. The occasional plan has no monthly charge 
but transactions cost $0.20 each. No other per-minute or per-record 
charges apply, except for the premium collections that have additional 
access charges. Site license discounts are also available. 
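     The plan arithmetic above can be worked through in a few lines of 
Python. This is only a sketch: the dollar figures come from the plans 
just described, while the names PLANS, monthly_cost_cents and 
cheapest_plan are illustrative and not part of any InfoSeek interface. 
Premium collection charges and site license discounts are left out.

```python
# Plan figures are those quoted in the article, held in cents to avoid
# floating-point rounding. The plan keys and function names are
# hypothetical labels chosen for this example.
PLANS = {
    "standard":   {"monthly_cents": 995, "included": 100, "extra_cents": 10},
    "light":      {"monthly_cents": 195, "included": 10,  "extra_cents": 15},
    "occasional": {"monthly_cents": 0,   "included": 0,   "extra_cents": 20},
}


def monthly_cost_cents(plan, transactions):
    """Monthly charge in cents for a number of search/retrieval transactions."""
    p = PLANS[plan]
    extra = max(0, transactions - p["included"])
    return p["monthly_cents"] + extra * p["extra_cents"]


def cheapest_plan(transactions):
    """Name of the plan with the lowest monthly charge for this usage level."""
    return min(PLANS, key=lambda name: monthly_cost_cents(name, transactions))


if __name__ == "__main__":
    for n in (5, 10, 63, 64, 200):
        plan = cheapest_plan(n)
        dollars = monthly_cost_cents(plan, n) / 100
        print(f"{n:>3} transactions: {plan} plan, ${dollars:.2f}")
```

     On these figures, the occasional plan is cheapest for fewer than 
ten transactions a month, the light plan from ten through 63, and the 
standard plan from 64 transactions up.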
     In what should prove to be a very effective marketing move, 
InfoSeek supplements their commercial offerings with several free 
services. Their WWW Pages database is available free of charge via 
Netscape's Internet Search page at 
http://home.netscape.com/home/internet-search.html. Searching is 
free, but the display is limited to a maximum of only ten references. 
(Registered users can display up to 200 per transaction.) InfoSeek has 
an extensive and well-organized Frequently Asked Questions about 
InfoSeek file. The FAQ is available to everyone for free searching at 
http://www.infoseek.com/FAQQuery. In addition to the free search of 
the WWW Pages database and the free searching of their FAQ, 
InfoSeek also gives new users a free trial run. The demonstration 
account lasts for one month or 100 transactions and includes a $5 credit for 
either additional transactions or access to the Premium collections.

WWW PAGES DATABASE
     The most heavily used InfoSeek database is the WWW Pages--not 
surprising since InfoSeek offers limited free access. InfoSeek claims 
to have the largest index of WWW pages, but that depends on how you 
count pages. A single Web document could be composed of multiple 
HTML files and include links to even more URLs. According to an 
InfoSeek comparison between InfoSeek and Lycos in February of 1995, 
the InfoSeek database included more than 214,000 URLs while Lycos 
included over 318,000. Since Lycos includes http, gopher, and FTP 
URLs, and InfoSeek only includes http URLs or those that use the WWW 
protocol, InfoSeek still claims a larger database. They based their 
claim on the size of the file containing the raw data: InfoSeek's was 
813MB compared to Lycos' 634MB. But part of the reason that the raw 
data measurement is larger for InfoSeek is that InfoSeek indexes the 
full text of the documents while Lycos does not index entire pages, 
only the title, headings, subheadings, hypertext links and the "100 
highest weighted words" in the page. The method that is most 
effective may well depend on the specific search.
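     One rough way to see the effect of full-text indexing on the raw 
data figures is to divide each file size by the matching URL count. 
The sketch below uses only the numbers quoted above. It assumes 1MB = 
1024KB, and because the URL counts are reported as minimums ("more 
than" and "over"), the averages are approximate upper bounds. The 
function name kb_per_url is illustrative.

```python
# Figures from InfoSeek's February 1995 comparison, as quoted in the text.
# URL counts are lower bounds, so the per-URL averages are only approximate.
REPORTED = {
    "InfoSeek": {"raw_mb": 813, "urls": 214_000},
    "Lycos":    {"raw_mb": 634, "urls": 318_000},
}


def kb_per_url(raw_mb, urls, kb_per_mb=1024):
    """Average kilobytes of raw data held per indexed URL."""
    return raw_mb * kb_per_mb / urls


for name, d in REPORTED.items():
    avg = kb_per_url(d["raw_mb"], d["urls"])
    print(f"{name}: about {avg:.1f}KB of raw data per URL")
```

     This works out to roughly 3.9KB per URL for InfoSeek against 
about 2.0KB for Lycos, which is consistent with one service storing 
full text and the other storing selected parts of each page.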
     Results from an InfoSeek WWW Pages search can be seen in Figure 
2. The top line of each record is the title of the document and is 
highlighted as the hypertext link to the resource. The title is followed 
by a brief description taken from the beginning of the body of the 
document. The URL of the resource on the next line is followed by a 
page-size designation in kilobytes, which can be especially useful to 
those on a slow connection.
     InfoSeek updates its WWW database weekly, paying special 
attention to submitted URLs and ones mentioned in the press. In 
addition, an update is run on the entire database once a month. This 
ensures that any content changes in the thousands of pages in the 
database are correctly indexed. Maintaining currency in an Internet 
index is a delicate balancing act. On the one hand, network documents 
change so quickly and often that almost daily verification is necessary 
to maintain currency. On the other hand, frequent verification of 
thousands of resources involves a huge amount of bandwidth and an 
undue burden on all of the individual pages. The InfoSeek once-a-month 
approach strikes a happy medium.

LYCOS AND WEBCRAWLER
     InfoSeek's WWW Pages database is an impressive index, but how 
does it compare to the other two major indexes, Lycos and 
WebCrawler? The numbers given in the InfoSeek comparison mentioned 
earlier do not quite tell the whole story. While the comparison 
mentions the over 300,000 explored URLs, it neglects to point out that 
Lycos included over a million unexplored URLs with descriptions. By 
April of 1995, Lycos boasted over 3.3 million "unique URLs," including 
the explored and unexplored. Some of the additional URLs can be 
attributed to the Lycos inclusion of FTP and gopher resources. The 
numbers for WebCrawler are also confusing. Its database includes over 
100,000 "explored" documents and another 900,000 "unexplored" 
documents. 
     So which of the three is the most comprehensive? None of them 
alone. Any attempt at the impossible "comprehensive" Internet search 
must include at least all three. Searching for very distinctive 
keywords to try on all three, I explored some Japanese Web sites that 
included references to the Oyodo River and the Hyga orange. Yet none of 
the three indexes found these pages or any reference to them based on 
a simple single-word search of "oyodo" or "hyga". With other single- and 
multiple-word searches, each of the three databases turned up unique 
items not seen on the other two. In general, Lycos had the highest 
number of hits but less precision than InfoSeek. Some of the Lycos 
records are duplicates or too dated to be of use. WebCrawler usually 
had fewer hits than either of the other two, but occasionally WebCrawler 
would retrieve relevant documents not found by either InfoSeek or 
Lycos.

AVAILABILITY
     One major problem with existing Internet indexes is that they have 
become overwhelmed with use and can be difficult to reach. With a 
free service, popularity rarely attracts the necessary capital for 
upgrading equipment to handle the increased load. The "Big Lycos" 
database is often so busy that search requests are refused. 
WebCrawler has the same problem. The ready availability of InfoSeek 
at all hours is a significant advantage over the free indexes--one for 
which many may be willing to pay. However, even the commercial 
InfoSeek is not without its availability problems. InfoSeek states up 
front that the free, ten-hit WWW Pages search is not its priority and will 
not always be available. In addition, even the commercial version was 
not available at all times. While InfoSeek's availability is far 
better than that of Lycos and WebCrawler, it is not yet perfect.

WEB POSITIONING
     Companies, libraries and any other organization that would like to 
establish an Internet presence should be aware of the major Internet 
databases. Does your library or company have a home page? If so, a 
good test of any Web database is to try to find that local home page. If 
it is not listed, all three of the major WWW indexes 
give an opportunity to register the URL of your home pages. In 
submitting URLs, be sure to avoid any typos. Depending on the index, it 
may take a few days or several weeks for the submitted URLs to show 
up in the database. 
     The Usenet News database available from InfoSeek presents another 
important opportunity. Since it can be difficult to guess which of the 
thousands of newsgroups may mention a specific organization or 
person, it helps that an InfoSeek search covers all of them. 
Was a competitor recently mentioned in rec.humor.funny or a complaint 
posted in misc.consumers? The Usenet database also can be used as a 
way to determine which newsgroups most frequently discuss certain 
topics.

FUTURE WISHES
     As Powell points out, one of the great advantages to the Web and 
its HyperText Markup Language is that documents are structured [3]. 
HTML documents can have titles, headings for major sections, and 
named hypertext links. While the database gathering tools may look at 
these specific fields in gathering their data, and Lycos returns search 
results in records with definite field labels, none of the indexes 
provide a simple field-searching option. Although Pinkerton, the 
WebCrawler developer, notes that "titles are an optional part of an 
HTML document, and 20% of the documents that the WebCrawler visits 
do not have them," [4] the ability to restrict certain words to the title 
or named hypertext links of a document could help improve precision. 
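     To illustrate what such a field-searching option might look like, 
the sketch below uses Python's standard html.parser module to pull 
the words out of a page's title, headings and named links, and then 
tests whether a word occurs in one chosen field. It does not describe 
how InfoSeek, Lycos or WebCrawler work; the class FieldExtractor, the 
function matches and the sample page are all hypothetical.

```python
import re
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect the words found in the title, headings, and link text."""

    # HTML tag -> field name used for searching (an illustrative mapping)
    FIELD_TAGS = {"title": "title", "h1": "heading", "h2": "heading",
                  "h3": "heading", "a": "link"}

    def __init__(self):
        super().__init__()
        self.fields = {"title": [], "heading": [], "link": []}
        self._open = []  # fields whose tags are currently open

    def handle_starttag(self, tag, attrs):
        if tag in self.FIELD_TAGS:
            self._open.append(self.FIELD_TAGS[tag])

    def handle_endtag(self, tag):
        field = self.FIELD_TAGS.get(tag)
        if field in self._open:
            self._open.remove(field)

    def handle_data(self, data):
        words = re.findall(r"[a-z0-9]+", data.lower())
        for field in set(self._open):
            self.fields[field].extend(words)


def matches(html, word, field):
    """True if `word` occurs in the given field of the HTML document."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return word.lower() in parser.fields[field]


# A made-up page used only to exercise the sketch.
PAGE = """<html><head><title>Oyodo River Guide</title></head>
<body><h1>Visiting the river</h1>
<p>Local orange growers sell fruit nearby.
<a href="map.html">Area map</a></p></body></html>"""

print(matches(PAGE, "oyodo", "title"))   # word appears in the title
print(matches(PAGE, "orange", "title"))  # word appears only in the body
```

     A page with no title simply has an empty title field, so a 
title-restricted search would miss it, which is the limitation that 
Pinkerton's 20% figure points to.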
     Only one InfoSeek database can be searched at a time (with a 
separate transaction cost for each). Multiple database searching could 
be a major time saver for the busy searcher; so could the addition of a 
current awareness service to InfoSeek's services. Continued active 
expansion of the WWW Pages database will be essential to maintaining 
the database as an effective indexing tool. If gopher, telnet, and FTP 
resources are not added to the WWW Pages database, perhaps InfoSeek 
will develop a new database to cover those resources. Until that is 
accomplished, InfoSeek can be considered only a partial search for 
Internet resources.
     InfoSeek is not without its problems. It is only a small databank 
with a sophisticated but limited search language. The WWW Pages and 
Usenet News archives are important databases that have been 
combined with significant computer science and general databases. A 
bit more growth in the number of available databases is still needed. 
Yet, the savvy shown in its current marketing approach and pricing 
structure may uniquely position InfoSeek to become a major player in 
the online information marketplace. Even if they never live up to that 
potential, the WWW Pages database offers a significant, although far 
from comprehensive, step in the right direction for creating access to 
the wealth of Internet information resources.

REFERENCES
[1] Notess, Greg R. "Searching the World-Wide Web: Lycos, WebCrawler, 
and More." _ONLINE_ 19, No. 4 (July/August 1995): pp. 48-53.

[2] "InfoSeek Home Page." http://www.infoseek.com/

[3] Powell, James. "Adventures with the World Wide Web: Creating a 
HyperText Library Information System." _DATABASE_ 17, No. 1 (Feb. 
1994): pp. 59-66.

[4] Pinkerton, Brian. "Finding What People Want: Experiences with the 
WebCrawler." Electronic Proceedings of the Second World Wide Web 
Conference '94: Mosaic and the Web. 
http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/Searching/pinkerton/WebCrawler.html (1994).

----------------------------------------------------------------

Communications to the author should be addressed to Greg R. Notess, Montana State University Libraries, Bozeman, MT 59717-0332, 406/994-6563; Internet--greg@notess.com; http://www.notess.com.

Copyright © 1995, Online Inc. All rights reserved.