Greg R. Notess
Reference Librarian
Montana State University

Web Wanderings

PubScience: Evolution or Devolution

EContent, February 2000
Copyright © Online Inc.




Editor's note: Changing the name of this magazine to EContent has caused me to look more closely at some of our columns. The ON THE NET column by Greg Notess, winner of this year's Gale Group Information Authorship Award for Best Columnist, has had the same name here as in our sister publication ONLINE. This does not help readers discern the difference between the publications. Greg and I decided that WEB WANDERINGS better described what we have in mind for his EContent column: an authoritative explanation of Web content that is of value to the serious researcher, Web content that Greg identifies in his Internet wanderings. We are, therefore, re-naming the column WEB WANDERINGS.

--Marydee Ojala


MEDLINE, the venerable and well-indexed database of citations and abstracts for the health sciences literature, is free on the Web. Perhaps this sparked a trend. Produced by the U.S. federal government, MEDLINE had always been an inexpensive database on traditional online hosts, but the advent of the Web created opportunities for MEDLINE to become freely available on a variety of sites. Medical Web sites use MEDLINE as a draw and an easy way to add content. The National Library of Medicine itself offers free versions of MEDLINE, first via PubMed and then with Internet Grateful Med. Of the two, it is PubMed that has received the greater attention. The name PubMed is becoming better known than MEDLINE.

Indeed, PubMed brought and continues to bring some significant advantages to its version of the MEDLINE database. First of all, it makes MEDLINE available to the Internet public free of charge. Second, it increases the speed with which recent records become available. PubMed includes records from Pre-MEDLINE, the database that holds citation information for the most recent articles before they reach the indexing and abstracting stage. In addition, PubMed gets direct citation submissions from publishers. Thus, PubMed is able to offer a version of MEDLINE which has the most recent citations to articles in the health sciences literature. Taking the idea of article access one step further, PubMed adds links to online versions of full-text articles that are available from certain publishers.

PHYSICAL SCIENCE LITERATURE

But MEDLINE and PubMed only cover the health sciences. The bulk of physical science literature is not included, for obvious reasons, since the producers are the National Institutes of Health and its National Library of Medicine. Meanwhile, over in the Department of Energy (DOE) lies another huge bibliographic resource--the Energy Science and Technology Database--and another information agency--the Office of Scientific and Technical Information (OSTI).

Based on the PubMed model, but focused on the physical sciences and other energy-related disciplines, PubScience (http://pubsci.osti.gov/) was born in October 1999. Available free on the Web to all comers and providing a wealth of citations, the product is aimed at the scientific community. While its initial release is likely to grow and change at the usual Internet pace, the current incarnation provides some nice advantages and raises some issues of concern for the information community.

PUBSCIENCE CONTENT

The stated purpose of PubScience is to "facilitate searching and accessing peer reviewed journal literature in the physical sciences and other energy-related disciplines." Thus, it does not cover all the physical sciences, but rather only that portion considered to be of interest to the energy community. Nor does it try to cover articles beyond the peer-reviewed journal literature.

So what is the underlying database behind PubScience? As with so much of the Web these days, PubScience actually includes information from multiple databases and sources. First of all, a significant portion of the Energy Science and Technology Database is included, with full indexing and abstracts. The archive is listed as going back to 1974, but there are citations available all the way back to the 1800s.

What about the country of origin access restriction that applies when searching the versions of Energy Science and Technology available from Dialog and other vendors? Take a look at the Dialog Bluesheet for File 103, which still states that "Use of this database is restricted to users located within the United States and the following countries: Belgium, Brazil, Canada, Denmark, Finland...."

U.S. ONLY

For PubScience, the Department of Energy purged all records that came from outside the United States and thus avoids the problems with the information exchange agreement restriction. Yet that also means that the PubScience version of Energy Science and Technology is a smaller version of the database. Since the Dialog Bluesheet says that roughly 50% of the content is from non-U.S. sources, this is a significant reduction.

On the other hand, following in the footsteps of PubMed, PubScience includes additional content as well. Over a dozen publishers have agreements with PubScience and add citations and abstracts from their publications to the PubScience database. OSTI is negotiating with other publishers, so expect this list to grow in the future.

The current list includes the following:

American Association for the Advancement of Science
American Physical Society
Blackwell Science
Cambridge University Press
Institute of Physics Publishing
The MIT Press
Royal Society of Chemistry
Society for Industrial and Applied Mathematics
Springer-Verlag
Taylor and Francis Publishers

The full list of publishers provides citations for over 1,000 periodicals. The vast majority are peer-reviewed journals, although some other publications are included.

THE FUTURE OF EST AND INDEXING

Nowhere in the publicity or the online documentation does PubScience mention the issue of indexing. Energy Science and Technology is a well-indexed database using controlled vocabulary. The publisher contributions of citations do not use that controlled vocabulary. Some of the publisher contributions do, however, have keywords. These are not displayed, but are searchable. Keep in mind, however, that they are publisher-supplied keywords with no connection to the Energy Science and Technology thesaurus.

The publisher connection has the advantage of getting basic bibliographic information into the database quickly. Quality indexing and maintenance of an up-to-date thesaurus take time and money. Will the general scientific community for which PubScience is intended even notice that some of the records do not have indexing? Probably no more than have noticed that some of the PubMed records have no indexing. In addition, the publisher connections make it easy to link to full-text articles available on the publishers' sites.

For now, EST continues on as before, and new records with full indexing from the controlled vocabulary are added to the database. However, OSTI's budget has been greatly reduced over the past several years. Given the expense of indexing and strong budget pressures, the future of EST as an indexed bibliographic database is in doubt. The agency is currently debating the maintenance of the database, although no decision is expected in the immediate future.

In some information systems, full-text searching replaces subject indexing. On PubScience, the full text of articles is available from the publishers, but only if the user (or the user's institution) has a subscription. But full-text availability does not equate with full-text searching. Only the citations (and abstracts and keywords if supplied) are currently searchable.

THE PUBSCIENCE SEARCH INTERFACE

The search options on PubScience seem driven in part by the conglomeration of content. It has two search options: basic and advanced. The basic search has a search box, a publisher limit, and a full-text limit. The Select Publisher option includes all the participating publishers as well as All, DOE Energy Database, and Archive. Choosing DOE Energy Database limits the search to the EST database and is marked "citations only" to make it clear that there are no links to the full-text articles. While All searches both the EST database and the publisher contributions, it is not searching the whole of PubScience. All only covers the most recent ten years. To get the older records from EST, choose the Archive (down at the bottom of the list) which includes everything over ten years old. There is no option for searching the entire database.
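Since no single choice covers the whole database, a search that spans many years has to be run more than once. The following is a minimal sketch of that decision, not anything PubScience itself provides. The function name is hypothetical, and the exact boundary between All and Archive is an assumption: the sketch treats a record as Archive material when it is more than ten calendar years old.

```python
def scopes_needed(first_year, last_year, current_year, window=10):
    """Return the Select Publisher choices needed to cover a range of years.

    Assumption: "All" holds records up to `window` years old and "Archive"
    holds anything older. The real cutoff is not documented in this column.
    """
    if first_year > last_year:
        raise ValueError("first_year must not be later than last_year")
    cutoff = current_year - window  # oldest year assumed to fall under "All"
    scopes = []
    if last_year >= cutoff:
        scopes.append("All")
    if first_year < cutoff:
        scopes.append("Archive")
    return scopes


# A 1985-1999 search run in 2000 straddles the cutoff, so it takes two passes.
print(scopes_needed(1985, 1999, 2000))
```

Under these assumptions, a search limited to the 1990s needs only All, while anything reaching back to the 1970s or 1980s must be repeated against Archive.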

The Advanced Search screen has the same two limits available from the Basic Search. In addition, it provides the ability to search for words in the title, author, or keyword fields. Three fielded boxes are available with options to join them with an AND, OR, or NOT. The Advanced Search also has the option to limit by date.

In the search boxes on both options, search terms can be single words or a phrase. If more than one word is put in a box, the words are searched as a phrase. PubScience supports direct use of the Boolean operators AND, OR, and NOT, which must be entered in uppercase, but nesting with parentheses is not supported. The asterisk (*) is available for truncation.
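To make these rules concrete, here is a rough sketch of how a search-box string could be read under them. It is an illustration only and says nothing about how PubScience is actually implemented. The function names are hypothetical, and several details are assumptions: that lowercase "and", "or", and "not" are treated as ordinary words in a phrase, that matching ignores case, and that the asterisk truncates at the end of a word.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}


def parse_query(query):
    """Split a search-box string into phrases and uppercase Boolean operators."""
    if "(" in query or ")" in query:
        raise ValueError("nesting with parentheses is not supported")
    parts = []
    phrase = []
    for word in query.split():
        if word in OPERATORS:  # only the uppercase forms count as operators
            if phrase:
                parts.append(("PHRASE", " ".join(phrase)))
                phrase = []
            parts.append(("OP", word))
        else:
            phrase.append(word)  # adjacent words build up a phrase
    if phrase:
        parts.append(("PHRASE", " ".join(phrase)))
    return parts


def phrase_to_regex(phrase):
    """Turn a phrase into a regex; a trailing '*' truncates that word (assumed)."""
    words = []
    for word in phrase.split():
        if word.endswith("*"):
            words.append(re.escape(word[:-1]) + r"\w*")
        else:
            words.append(re.escape(word))
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


print(parse_query("solar cell* AND silicon"))
```

In this sketch, "solar cell*" stays together as one phrase, and its pattern matches a title containing "solar cells" but not one that mentions only "solar energy".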

Although search sets cannot be combined, follow-up searching is available. At the top of the results set, another search box is available for adding another term that will be searched only within the current retrieval set.

The search results are sorted in reverse accession number order. No user-specified sorting is available. So there is currently no option for sorting by publication date, author, or journal name.
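Reverse accession number order reflects when a record entered the database, which need not match when the article was published. The sketch below shows the difference using made-up records. The field names, accession numbers, and function names are all hypothetical, and the re-sorting is something a searcher would have to do after saving the citations, since PubScience does not offer it.

```python
from operator import itemgetter

# Made-up sample records for illustration only; not real PubScience data.
records = [
    {"accession": 1003, "year": 1997, "author": "Baker", "journal": "Journal B"},
    {"accession": 1002, "year": 1999, "author": "Adams", "journal": "Journal C"},
    {"accession": 1001, "year": 1998, "author": "Clark", "journal": "Journal A"},
]


def as_displayed(recs):
    """Order records the way the results list does: newest accession first."""
    return sorted(recs, key=itemgetter("accession"), reverse=True)


def resorted(recs, field, descending=False):
    """Re-sort saved records by any field, such as year, author, or journal."""
    return sorted(recs, key=itemgetter(field), reverse=descending)


print([r["year"] for r in as_displayed(records)])
print([r["year"] for r in resorted(records, "year", descending=True)])
```

With this sample, the most recently added record is the oldest article, so the two orderings disagree.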

ACCESS TO PUBSCIENCE

In addition to the DOE's Office of Scientific and Technical Information, the Government Printing Office is also a partner in PubScience. According to various documents and press releases, PubScience is available at either http://www.osti.gov/pubsci or http://www.access.gpo.gov/su_docs/. However, at this point in time, the first simply redirects to the shorter URL of http://pubsci.osti.gov/ and the second links to it, so the shorter URL is the easiest way to connect to PubScience.

The current incarnation has a section entitled Related Links. While empty now, the intent of this section is to provide links to commercial, value-added services that can provide supplemental information. For example, it may include links to commercial search services, publisher pages, and document delivery services. Since these links are still under negotiation, there is the potential that the participating companies could be linked in other ways as well.

THE FUTURE AND THE FALLOUT

Are PubScience and PubMed indicators of a trend? Certainly, more and more information resources that used to cost money are becoming available for free on the Web. The Encyclopedia Britannica, MEDLINE, ERIC, reference sources, news, government publications, and many others have moved from fee-based systems to the world of free and open access to all. Typically supported by advertising (or by the federal government), the move towards free access is paired with tighter budgets. And it likely is easier to get funding to put an information product online than it is to get funding for continuing the same old labor-intensive indexing and building of controlled vocabularies.

Is this a fundamental move in the industry towards bibliographic and full-text databases that have no subject indexing? With an increasing number of links to full-text articles, will search access move towards full-text searching and away from controlled vocabulary? And if so, should we consider that evolution in the information retrieval business or devolution?

THE DEMISE OF CONTROLLED VOCABULARIES

As with many things, the answer is probably both. Economic and political pressure for free access to information is likely to continue. Given the expense of indexing, that will be a likely area for cuts. So information resources may evolve towards free Web-based abstract services with links to pay-per-transaction ecommerce sites for purchasing articles. And the same information products may well devolve to become citation and abstract databases without subject indexing. That leaves open the chance for the commercial sector to provide subject indexing as a value-added service, which will then leave it up to us as information consumers to decide whether the additional subject indexing is worth the price when some level of free access is available.

Many of us have bemoaned the lack of controlled vocabulary indexing of the Web in general. Now budget constraints at OSTI and the momentum of the Web are pushing PubScience to sacrifice indexing for broader access. It is too early to call this a trend, but not too early to keep urging database providers to maintain quality subject indexing as they move towards the Web.

Despite my concerns about this move away from indexing in favor of quick citation access, the DOE should still be commended for the PubScience product. It freely offers a significant collection of bibliographic citations, many with indexing and others with links to full-text articles. Its search interface is functional, if not yet very sophisticated. It could use enhancements such as the ability to limit to a specific publication and the addition of user-specified sorting by date, author, and source. Even so, PubScience provides an important bibliographic resource. Its status as a free resource guarantees that the physical science community will use it. Expect to see more growth in its use and penetration, just as PubMed has seen.


Communications to the author should be addressed to Greg R. Notess, Montana State University Libraries, Bozeman, MT 59717-0332; 406/994-6563; greg@notess.com ; http://www.notess.com.