Greg R. Notess
Reference Librarian
Montana State University

Web Wanderings

Finding Online Government Publications

EContent, June 2000
Copyright © Online Inc.





In the brave new world of the Internet, online information comes in many forms. The ability to link from section to section within a Web page, combined with the limited space for Web-page reading that is available on a typical computer monitor, makes for a very different kind of presentation of material. Information displayed on a Web page is often most easily absorbed when it is in a bulleted list of points or outline form. A full-length article presented on a Web page may be broken into sections, so that users only have to scroll down a few screens. (This also allows for increased advertising, as ads can be placed around an article that has been split up onto several Web pages.)

Yet for all the information available on such Web pages, there are many times when someone needs more in-depth information, details, and lengthy explanations--material that would be available in full-length reports, books, articles, and other such documents.

And the Web contains many such documents. Long full-text publications have been tossed up on Web sites, especially as PDF versions but also in HTML, Word, and many other formats. The U.S. federal government is one significant source for such documents. Since U.S. government documents are made freely available without copyright restrictions, the Web is a natural and convenient avenue for dissemination. While commercial publishers are also moving toward providing more substantial publications on the Web, these are often only available to those willing to pay a subscription fee.

As more and more U.S. government publications of all kinds are made available on the Web, the challenge then becomes to find these documents. Like the rest of the Web, they shift locations, servers, and format. Some of the documents disappear completely, while others just become unlinked from other pages on the site. Others maintain the same URL year after year.

Fortunately, there are several excellent sources for trying to track down the online locations of these government publications. A look at several of the best sites, combined with some knowledge of general techniques, can greatly simplify the online hunt.

BET ON THIS SITE

The U.S. Government Printing Office (GPO) offers several important Web sites. One of the better ones is the unimaginatively named Browse Electronic Titles (BET), with an unwieldy URL (http://www.access.gpo.gov/su_docs/dpos/btitles.html). The page lists online government documents available on the Internet including both monographic and serial titles. It focuses on documents available on official federal agency sites.

The list is divided by agency, with no search capability beyond that built into the browser. In each agency section, titles are listed alphabetically. Each entry includes title, item number, Superintendent of Documents (SuDoc) number, and a link to the resource. Many of the links use Permanent Uniform Resource Locators (PURLs) rather than pointing directly to the site itself. For more information on the GPO's use of PURLs, see its information page and searchable database (http://purl.access.gpo.gov/).
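
The indirection behind a PURL can be sketched in a few lines of Python. This is only an illustration of the idea, not GPO's actual resolver: the table, the two function names, and the example.gov addresses are all hypothetical.

```python
# Hypothetical PURL table: persistent address -> current location.
# None of these addresses are real; they only illustrate the indirection.
purl_table = {
    "http://purl.example.gov/doc/100": "http://agency.example.gov/pubs/report.pdf",
}


def resolve(purl, table):
    """Return the current location for a PURL, or None if it is unknown."""
    return table.get(purl)


def move_document(purl, new_location, table):
    """Record a new location; pages that cite the PURL need no change."""
    table[purl] = new_location
```

When a document moves, only the table entry is updated, so a catalog record that cites the PURL keeps working.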

While BET is not the easiest tool to use when searching for a specific document, it provides an excellent overview of the kinds of documents available online, the range of document types, and the diversity of locations.

OTHER SEARCH TOOLS

For alphabetical access by agency name and by title, there is a non-government site called Uncle Sam Migrating Government Publications (http://www.lib.memphis.edu/gpo/mig.htm). It is shorter than BET, since it only includes serial and periodical government documents. On the other hand, it includes both official and unofficial sites that may host such publications, and is thus in some ways broader than BET.

For keyword searching beyond simply browsing for a title, there is the classic government documents bibliographic database, the Monthly Catalog of Publications (http://www.access.gpo.gov/su_docs/dpos/adpos400.html), known online as the Catalog of U.S. Government Publications. It includes bibliographic records for both print and online government publications. Like BET, it often uses PURLs.

For a more comprehensive search, the Monthly Catalog offers a variety of search options. It even displays full MARC records and a link to locate depository libraries that should own a specific document. The version of this database on GPO's site only goes back to 1994, but when looking for documents on the Web, that is just about the right time frame.

However, the Catalog of U.S. Government Publications is neither comprehensive in its coverage of online documents nor completely up-to-date. Try a title search on world factbook and look at the variety of URLs. Some work while others point to dead ends. The current URL is available via the PURL and at least one of the records.

While these sites are excellent starting points, other approaches will turn up additional online government documents not found through any of these sites. Plenty of agencies post documents that never find their way into the depository library system. Others put up online versions of their publications, but never inform GPO of the location or their availability.

SEARCH STRATEGIES

The general Web search engines may help to locate some of these online documents. However, bear in mind that many documents are made available in Adobe PDF format or even in some other formatted file type. Because of this, the text within the documents cannot be found by using a general search engine. The title of the document may be linked on a Web page, so it could be found with a search engine, but some agencies only link their documents through the report numbers or even via a local bibliographic database.

If the links are contained in a bibliographic database file, those records may well be unseen by the search engines, since they would be behind a script or form interface. For the same reasons, such documents would likely also not be found using any site search that the agency might have available.

In working on updating my book, Government Information on the Internet, I was able to observe firsthand the variety of locations where federal government sites put their online documents. Based on that experience, there are several standard places to look. In all these cases, the first step of the process is to identify the agency and find its Web site. Once connected to the agency's Web site, the hunt begins. In many cases, the proper section is easy to find. Links under Publications, Documents, or Reports are certainly the first place to look. However, even when such sections exist, they often do not contain all the online documents.

WHERE TO LOOK

For example, a Publications page may consist of a bibliographic list of published articles from the agency's researchers. Some of these lists have no full text available online. Other government sites simply make their publications catalog available. Plenty of sites do include full-text online documents in such sections. But even for those that do, there may be additional publications available on the site in other sections as well.

Press releases are the most obvious example. Often listed under News, Press, What's New, or Public Affairs, government press releases can contain substantial information content. In addition, these sections may include additional publications beyond press releases. Some government sites that only have a publication catalog on their Publications page actually have some online publications, but only on the press release page.

If you know a particular publication's title, try searching for that using the local site-search capability. Also, look carefully at publications lists or a catalog. Some link to a few online documents in the midst of a long list of print publications. Technical reports might be located under a Research section. Consumer information pamphlets might only be available under a section for the public. Agency annual reports are often available on sites but not included via the links listed earlier. Some sites stick their annual report under an About the Agency or similar section.

FOIA

In the past few years, many government sites have added a Freedom of Information Act (FOIA) or an Electronic Reading Room section to meet the requirements of the Electronic Freedom of Information Act Amendments of 1996. How these sections are actually used varies greatly. Some sites simply include information about the FOIA regulations and how to make a FOIA request to the agency.

Meanwhile, other sites gather all of their publications under this section, especially some of those using the Electronic Reading Room label. They include lists of online publications, print publications, press releases, reports, and other documents, along with the FOIA instructions.
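
The section labels named so far can be turned into a rough checklist. The Python sketch below takes the link labels from an agency home page and flags the ones worth opening. The grouping, the keyword matching, and the function name are my own illustration, and real sites will use labels that this simple list misses.

```python
# Section labels named in this article, grouped by the kind of document
# each section tends to hold. The grouping itself is only illustrative.
SECTION_LABELS = {
    "publications": ["publications", "documents", "reports"],
    "press releases": ["news", "press", "what's new", "public affairs"],
    "technical reports": ["research"],
    "annual reports": ["about"],
    "reading room": ["foia", "freedom of information", "electronic reading room"],
}


def sections_to_check(link_labels):
    """Return (label, kind) pairs for home-page links worth checking."""
    matches = []
    for label in link_labels:
        lowered = label.lower()
        for kind, keywords in SECTION_LABELS.items():
            if any(keyword in lowered for keyword in keywords):
                matches.append((label, kind))
                break  # one kind per link is enough
    return matches
```

For example, given the labels Contact Us, About the Agency, Press Room, and FOIA, the function keeps the last three and drops Contact Us.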

THE DATABASE APPROACH

Some of the larger research-oriented sites have extensive collections of online information, with everything from multimegabyte data sets to online image archives to a variety of databases.

When I searched for an online version of a technical report from a researcher at the NASA Lewis Research Center, none of the usual site locations had that particular report available in full text online, even though other documents were available. With some more digging, one database of technical reports looked like a possibility for at least providing more bibliographic details about the report. As with many bibliographic databases, there was no browse access to the records. Instead, they needed to be searched. This meant that the information in the database would be inaccessible via any regular search engine.

On this site's page for the database, there was little description of the content of the database. Only after running the search did it become clear that the database not only provided bibliographic information about the report, but also a link to a full-text PDF version housed on the site's FTP server.

BACK TO THE SEARCH ENGINES

Tracking fugitive online documents can require flexibility in approach. If all these strategies fail to track down the document, try the search engines again, searching for title words of the publication at several levels. Start with the agency's own site-search capabilities. Then, if it is a subsidiary office, try a site search on the parent agency's site.

Next, try a search with some of the largest of the general search engines such as Fast, AltaVista, and Northern Light. Sometimes, they will find pages on a site that the agency's own site search fails to uncover.
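
The order of these searches, narrowest first, can be written down as a short Python sketch. It only builds a list of steps to work through by hand; it does not query any site or search engine, and the function name and the host names in the usage note are hypothetical.

```python
def search_plan(title, agency_site, parent_site=None):
    """List search steps in the order suggested above, narrowest first."""
    steps = [f"site search on {agency_site} for: {title}"]
    if parent_site:
        # Only relevant when the agency is a subsidiary office.
        steps.append(f"site search on {parent_site} for: {title}")
    for engine in ("Fast", "AltaVista", "Northern Light"):
        steps.append(f"{engine} search for: {title}")
    return steps
```

Calling search_plan("Annual Report", "office.example.gov", "agency.example.gov") yields five steps: the two site searches, then one for each general search engine.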

If it is a document that may have been available in the past and has for some reason been de-linked or removed altogether, try using a search engine such as Google that keeps cached copies of older pages. For government sites, there are additional alternatives for retrieving copies of older pages. SearchGov.com and SearchMil.com are search engines that limit their crawling to their respective top-level domains of .gov and .mil. Like Google, both keep cached copies of the pages they crawl and make those pages available with the search results.
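
Limiting a crawl or a result list to the .gov and .mil top-level domains comes down to a test on the host name. The following Python sketch shows one way to write that test using the standard library. It is an assumption about the general technique, not a description of how SearchGov.com or SearchMil.com work internally, and the function name is my own.

```python
from urllib.parse import urlparse


def in_government_domains(url, suffixes=(".gov", ".mil")):
    """True if the URL's host name ends in one of the given domains."""
    # hostname is None for strings that are not full URLs.
    host = urlparse(url).hostname or ""
    # str.endswith accepts a tuple of suffixes, so pass suffixes as a tuple.
    return host.lower().endswith(suffixes)
```

With this test, http://www.access.gpo.gov/su_docs/dpos/btitles.html passes and http://www.notess.com does not.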

Looking at the older, cached version of a page may find a copy of a particular document that was formerly available, or one that is on a Web server that has crashed. The older pages can also sometimes give clues as to a possible new location for a site or individual document.

As the government turns more and more to the Web for providing online access to its many publications, using sites and strategies such as these will become even more essential for tracking down documents. While agencies and their Webmasters pay close attention to the location and availability of their latest reports, the ones from last year may be forgotten. The links to them may disappear, even while they remain at the same location. They may be moved to new directories or into an online database.

As the Web ages, sites are constantly revised, and new agencies arise out of the old ones. Some agencies, such as the Environmental Protection Agency in February, temporarily take their entire Web site offline. Military sites are significantly cutting back the information they make available on the Web, citing security concerns. The bibliographic result: more and more documents begin to go fugitive. In the long run, new initiatives may find ways to ensure ready access to the publications for years to come. In the meantime, using the available techniques may begin to bridge that gap, and help find those lost, online full-text government documents.


Communications to the author should be addressed to Greg R. Notess, Montana State University Libraries, Bozeman, MT 59717-0332; 406/994-6563; greg@notess.com ; http://www.notess.com.