Greg R. Notess
Reference Librarian
Montana State University

ON THE NET

Searching for Current News

DATABASE, June 1999
Copyright © Online Inc.





News stories are available for free on the Internet from many sites. The major U.S. news media companies offer a substantial number of breaking stories on their Web sites, while countless community news media resources offer local coverage. In the past six years, the information landscape has changed from one in which the only sources of free news were local radio and television stations to one that includes the Web and its wealth of local, national, and international news resources.

Most of the portals, search engines, and subject directories offer top headlines, stock quotes, sports scores, and other popular news items. Finding breaking news and popular topics is generally straightforward on the Web of today. But what about a searchable database of news? What is available on the Internet for free that provides the most current and some portion of a back file of news articles? And how do these free Web products compare to the commercial news databases?

NEWS DATABASES

Although professional searchers have several databases at their disposal, note that the general Web search engines are not very effective for searching news sites. Since news sites are usually updated daily, and most of the general Web search engines do not crawl and index such sites that frequently, they will not include the most current postings. Sometimes they are useful for finding older articles. However, when general Web search engines include records for older articles in their database, too often either the link has moved or the page is just no longer available online for free.

Instead, searching for news beyond today's top headlines calls for the use of specialized, searchable news databases. TotalNEWS, News Index, Excite's NewsTracker, Northern Light's Current News, and the news sections on Yahoo!, HotBot, and Infoseek all offer such a searchable database.

All these news databases index freely available news stories published on the Web by a local newspaper, network television news, or other recognized news sources. In many cases, these Web-accessible stories duplicate what would be found in the print (or other) media, but in some cases these sites offer Web-only stories.

These news databases are built just as the general Web search engines build their databases. Spider programs visit all the designated news sites and index the available pages. The news database spiders are programmed to visit far fewer sites, but they visit them much more frequently to be able to keep up with the most current news stories. Some news databases cover newswire stories. Rather than being indexed by a spider traveling to external sites, the wire stories may be loaded directly and then indexed.
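
The revisit pattern described above can be pictured with a minimal sketch. It does not represent how any of these services is actually built; the source names, the intervals, and the `due_sites` and `crawl` helpers are all hypothetical, chosen only to show a short list of sources being polled on a tight schedule while a general Web page waits weeks for its next visit.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str             # hypothetical label for a site
    revisit_minutes: int  # how often the spider should return
    last_visit: int = 0   # minute of the last crawl (0 = start of the run)


def due_sites(sources, now):
    """Return the names of sources whose revisit interval has elapsed."""
    return [s.name for s in sources if now - s.last_visit >= s.revisit_minutes]


def crawl(sources, now):
    """Mark every due source as visited at `now` and return their names."""
    visited = due_sites(sources, now)
    for s in sources:
        if s.name in visited:
            s.last_visit = now
    return visited


# Assumed intervals: news sources are polled every 30 or 60 minutes,
# while a general Web page is revisited only every three weeks.
sources = [
    Source("news-site-a", revisit_minutes=30),
    Source("news-site-b", revisit_minutes=60),
    Source("general-web-page", revisit_minutes=60 * 24 * 21),
]

first = crawl(sources, now=30)   # ['news-site-a']
second = crawl(sources, now=60)  # ['news-site-a', 'news-site-b']
print(first, second)
```

In this sketch the general Web page is never due during the first hour, which is the gap the specialized news databases are meant to close.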

TotalNEWS

TotalNEWS (http://www.totalnews.com) is one of the two search services not associated with the main search engines and portals. TotalNEWS indexes hundreds of Web sites from news media publishers and has one of the most extensive back files of any of these types of databases. An occasional article from more than a year ago may show up, although it is not clear which of its sources' archives are searchable and for how far back. Note that the date displayed on the results screen is the day the spider found the article, not the date of the actual story itself. While this back file is more extensive than those of the other free Web news databases, which cover only a week or a month, it is nothing close to the archive coverage from NEXIS, NewsBank, or other commercial news databases.

TotalNEWS defaults to a Boolean AND operation on searches with more than one term. No OR, nesting, or + or - symbols can be used. Phrase searching is available with the unofficial standard of surrounding the phrase with double quotes. TotalNEWS also exhibits some peculiar behavior. After finding several records on one search, I tried searching for a phrase displayed by one of the results. That secondary search found nothing, not even the record found by the previous search. Apparently, the TotalNEWS spider does not successfully index the full content of all the pages in its database.
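
As a rough illustration of what a default AND with quoted phrases means in practice, here is a minimal sketch. It is not TotalNEWS code; the `parse_query`, `tokenize`, and `matches` functions and the sample story are assumptions made up for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_query(query):
    """Split a query into quoted phrases and the remaining single terms."""
    phrases = re.findall(r'"([^"]+)"', query)
    remainder = re.sub(r'"[^"]*"', " ", query)
    return phrases, tokenize(remainder)


def matches(query, text):
    """Implicit AND: every term and every quoted phrase must occur."""
    phrases, terms = parse_query(query)
    words = tokenize(text)
    joined = " " + " ".join(words) + " "
    has_terms = all(term in words for term in terms)
    has_phrases = all(
        " " + " ".join(tokenize(phrase)) + " " in joined for phrase in phrases
    )
    return has_terms and has_phrases


story = "Montana State University opens a new library wing in Bozeman."
print(matches("library bozeman", story))          # True: both terms present
print(matches("library kalamazoo", story))        # False: one term missing
print(matches('"state university" wing', story))  # True: phrase and term
print(matches('"university state"', story))       # False: wrong word order
```

A phrase search run against the full text of a page that the first search retrieved should always succeed in a matcher like this one, which is why the failed secondary search suggests incomplete indexing.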

News Index

News Index (http://www.newsindex.com) is the other news search engine without ties to a general portal. I mentioned this service in my March 1999 ONLINE column for its current awareness tool: News Index Delivered. Yet News Index can also be searched directly. I see a political bias on its top page, but the search engine side does not reflect any bias in its results. It covers numerous news resources, but, unfortunately, provides no list of the sources. On the positive side, News Index does state on the top of the results screen the last time sites were indexed. Look for the "current catalog created at..." statement.

News Index does not display dates, but the documentation specifically states that it is not an archive. News Index is designed to help find current articles. Even so, it appears to have about a week's worth of stories. Given the lack of a list of sources and only a relatively recent time frame, News Index delivers surprisingly high numbers of hits when compared to the archive available from TotalNEWS.

Northern Light's Current News

Northern Light provides free, searchable access to the past two weeks of over 70 newswires in its Current News (http://www.northernlight.com/news.html). The older newswire stories are available in their fee-based Special Collection. Northern Light Current News is updated every 15 minutes, more frequently than any of the other searchable news databases listed here. However, it differs from some of the other services in that Northern Light indexes newswire stories only. This is not an index of news Web sites.

Searching under the Current News tab works like other searching in Northern Light. Full Boolean and phrase searching, the + and - symbols, and truncation are all supported. There are no stopwords and no case sensitivity. Results can be sorted by date or relevance, and broad subject categories can be used as limits. With its 15-minute updates, Northern Light's Current News is the most up-to-date of these searchable news databases, at least for the newswire stories.

HotBot News Channel

The HotBot News Channel (http://news.hotbot.com) serves up a searchable database of almost two dozen of the better-known news Web sites, including CNN, CBS News, ABC News, MSNBC, ESPN, the Los Angeles Times, USA Today, the Washington Post, and the New York Times. It refreshes its news database every 30 minutes (although it will sometimes misreport the times on stories, labeling them with a time in the near future). As such, it is the most current database for news from media Web sites.

On the back end, it supports date searching across up to a month of archived news. Search features include almost full Boolean searching, supporting AND, OR, and nesting. Despite the advice above the search box, the NOT operator results in an error message. Truncation, phrase searching, and date sorting are available. So while the HotBot News Channel covers neither the most sites nor the oldest material, it offers good search capabilities and is the most current of the news Web site indexes.

Infoseek News

The Infoseek headline news page (http://www.infoseek.com/news) offers several different news databases to search: News Wires, National News, and News Web Sites. The search syntax and capabilities are the same for each, but the databases are different. Look at the bottom of the page, below the links to popular stories, for the search box with the choice of databases. The News Wires database includes wires such as Reuters, PR Newswire, and Business Wire. The National News database is more like the other news databases that index major media Web sites. It covers resources such as ABC News, CNN, USA Today, and MSNBC. The News Web Sites database, whose name sounds like a better description of what the National News database actually is, appears to simply run a search in the Infoseek directory with no particular focus on news. Use the News Wires and the National News databases for real news searching.

The search features are like the general Infoseek features. It supports case recognition, the + - system, and phrase searching, but has no Boolean operators or truncation. Unlike most of the other news databases, there are no stopwords in Infoseek's databases. The Advanced News search offers several breakdowns of the News databases, including separate searching of Reuters, PR, and Business Wire. There is an option of All News Sources which sounds like it would combine both the News Wires and the National News databases. Unfortunately, it actually just searches the News Wires database and not the National News sites.

Excite's NewsTracker

While Infoseek comes close to combining a searchable database of newswires and news Web sites, it so far has failed to integrate the two news sources into a single search. Excite's NewsTracker (http://nt.excite.com) now does just that. NewsTracker originally only searched the most current pages of news media Web sites. Now it includes some of their archived pages. In most cases, the archive depth seems to range from a week to several months, although some older records are in the database.

More recently, Excite added newswires to the search as well. These are separated and displayed above the news Web sites. The wires include only very recent news. By presenting them on the same page as the Web news results, which reach further back, Excite offers a broad spectrum of news resources.

Yahoo! News

Yahoo! News (http://dailynews.yahoo.com) also combines newswires and Web sites, but its main emphasis is the wires. Unlike Excite, Yahoo! fails to provide a list of sources. Search results do cite sources such as Reuters, AP, PR Newswire, and Business Wire, with an occasional news Web site, such as the Washington Post.

Searches on Yahoo! News default to an AND, but no Boolean operators can be used. The + and - symbols and phrase searching are available. Yahoo!'s News section is designed more for browsing than searching, but it functions as both.

SEARCH FEATURES

The search features chart summarizes many of the features of the news search engines discussed here. The Default column lists what happens when two or more terms are entered without Boolean operators or other special punctuation. The Boolean column lists Full for those that support all three basic Boolean operators (AND, OR, and NOT) and nesting. That column also notes when a + can be used to require a term (AND) and a - to exclude a term (NOT). Only Infoseek has case sensitivity, which is invoked only if one or more letters are entered as uppercase. The Dates column shows how far back the database goes, while the Updates column refers to the frequency with which the index is refreshed. The Source column identifies whether each news search engine builds its database(s) from locally loaded newswires, from external media Web sites, or from both.
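
To make the + and - convention concrete, here is a minimal sketch of a matcher with a default OR, the combination the chart lists for Infoseek. It is an illustration only, not any engine's implementation: the `plus_minus_match` function and the sample headline are hypothetical, case is ignored for simplicity, and unmarked terms are assumed to be optional whenever a + term is present.

```python
def plus_minus_match(query, text):
    """Apply the +/- convention with a default OR on unmarked terms.

    +term  the term must appear (acts like AND)
    -term  the term must not appear (acts like NOT)
    term   optional; at least one must appear when no + terms are given
    """
    words = set(text.lower().split())
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)
    if any(term in words for term in excluded):
        return False
    if not all(term in words for term in required):
        return False
    if required:
        return True
    return any(term in words for term in optional)


headline = "tosco refinery reports maritime spill"
print(plus_minus_match("maritime kalamazoo", headline))    # True: default OR
print(plus_minus_match("+maritime +kalamazoo", headline))  # False: + acts as AND
print(plus_minus_match("maritime -tosco", headline))       # False: - acts as NOT
print(plus_minus_match("+tosco -mongolia", headline))      # True
```

The first two queries show why the Default column matters: the same two words retrieve the headline under a default OR but not when both are required.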

Some of these news databases are integrated, to a degree, with general search engines. Excite and Yahoo! present easy access to their news search engine results from the results of a general search. Infoseek takes a slightly different approach in that a news search can be chosen from the main Infoseek search box on the top of the Go page. Northern Light and HotBot have separate links for their news searches.

SIZE COMPARISONS

Considering the variable update frequency, availability of back files, and number of news sources, how do these various news search engines actually perform? Since Yahoo!, Northern Light, and TotalNEWS automatically search both plural and singular word forms, an accurate comparison needs to use terms without plurals or other word variants. For a quick comparison, I tried the terms shown in the accompanying chart. All were entered in lowercase. These four search examples demonstrate a wide variation in results. Northern Light found the most for "maritime" but relatively few for "kalamazoo." TotalNEWS scored well for "mongolia" but closer to the bottom on "tosco."

In short, the news search engines show no consistency in quantity of results. Since they index different sources and cover different periods of time, one might expect a single service to have considerably more coverage than the others. But if these four search examples are representative of the larger databases, the service with the most results varies depending on the search term and the date of the search.
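
The lack of a consistent leader can be checked directly against the hit counts in the results chart. The short script below simply copies those numbers and finds the top and bottom service for each term; the variable and function names are my own.

```python
# Hit counts copied from the "Number of Results" chart in this article.
TERMS = ["mongolia", "kalamazoo", "maritime", "tosco"]
HITS = {
    "TotalNEWS": [94, 31, 43, 16],
    "News Index": [6, 19, 58, 20],
    "Excite's NewsTracker": [13, 42, 69, 7],
    "Northern Light's Current News": [42, 12, 117, 20],
    "HotBot News Channel": [37, 18, 79, 38],
    "Infoseek National News": [7, 36, 20, 6],
    "Infoseek Wires": [23, 3, 61, 11],
    "Yahoo! News": [7, 5, 40, 25],
}


def leader(term_index):
    """Service with the most hits for one search term."""
    return max(HITS, key=lambda engine: HITS[engine][term_index])


def trailer(term_index):
    """Service with the fewest hits for one search term."""
    return min(HITS, key=lambda engine: HITS[engine][term_index])


leaders = {term: leader(i) for i, term in enumerate(TERMS)}
trailers = {term: trailer(i) for i, term in enumerate(TERMS)}

for term in TERMS:
    print(f"{term}: most = {leaders[term]}, fewest = {trailers[term]}")
print("distinct leaders:", len(set(leaders.values())))  # 4
```

Each of the four terms has a different leader, so no single service comes out on top in this small sample.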

OTHER STRATEGIES

As with all kinds of Web searches, none of these are comprehensive. In comparing news search engines to commercial news databases from NEXIS, Dialog, or Dow Jones, the commercial ones have much more coverage, especially in terms of the older news stories. With more powerful search features and larger databases on the commercial news sources, why even bother with the free Web news search engines?

For one reason, some of the Web-based news resources are not indexed by the commercial databases. And they are free. While certainly not yet up to the standards of their commercial cousins, they are improving in both search features and in database size, scope, and coverage. Northern Light offers very current newswires while HotBot's News provides frequent indexing of Web-based news sites. Both can sometimes be more current than many of their commercial cousins.

One of the biggest drawbacks with these searchable databases of news is their slim coverage of older material. Yet more of the news sites, from Pathfinder's Time to local newspapers, are establishing ever-deeper archives. Since none of the search engines covers all the archives, each archive needs to be searched separately. Just go directly to the news site itself.

For a metasite providing links to individual news archives on the Web, try the SLA News Division's "Newspaper Archives on the Web" (http://metalab.unc.edu/slanews/internet/archives.html). Arranged by state, the basic table format features the name of the paper, its city, links to the archives, date range of the archive, and cost, if any.

News is a popular information commodity on the Web. While many sites offer top headlines for browsing, researchers looking for more details can delve into these news search engines for a look beyond the headlines.


Featured Sites

Excite's NewsTracker
http://nt.excite.com

HotBot News Channel
http://news.hotbot.com

Infoseek News
http://www.infoseek.com/news

News Index
http://www.newsindex.com

Newspaper Archives on the Web
http://metalab.unc.edu/slanews/internet/archives.html

Northern Light's Current News
http://www.northernlight.com/news.html

TotalNEWS
http://www.totalnews.com

Yahoo! News
http://dailynews.yahoo.com


Basic Search Features and Database Scope of News Search Engines

Service                        Default  Boolean       Case  Phrase  Dates    Updates     Source
TotalNEWS                      AND      AND           No    Yes     Year     hours       Web sites
News Index                     OR       AND, OR       No    No      Week     hour        Web sites
Excite's NewsTracker           OR       Full, +, -    No    Yes     Months   hours       Wires & Web
Northern Light's Current News  AND      Full, +, -    No    Yes     2 weeks  15 minutes  Newswires
HotBot News Channel            Phrase   AND, OR, ( )  No    Yes     Month    30 minutes  Web sites
Infoseek News                  OR       +, -          Yes   Yes     Month    hours       Wires & Web
Yahoo! News                    AND      +, -          No    Yes     Week     hours       Wires & Web


Number of Results from Each News Search Engine

Search term                    mongolia  kalamazoo  maritime  tosco
TotalNEWS                      94*       31         43        16
News Index                     6*        19         58        20
Excite's NewsTracker           13        42*        69        7
Northern Light's Current News  42        12         117*      20
HotBot News Channel            37        18         79        38*
Infoseek National News         7         36         20*       6*
Infoseek Wires                 23        3*         61        11
Yahoo! News                    7         5          40        25
Highest and lowest number of hits for each search term are marked with an asterisk.


Communications to the author should be addressed to Greg R. Notess, Montana State University Libraries, Bozeman, MT 59717-0332; 406/994-6563; greg@notess.com; http://www.notess.com.

Copyright © 1999, Online Inc. All rights reserved.