[On the Nets]
Greg R. Notess
Reference Librarian
Montana State University

----------------------------------------------------------------

Email Address Databases


DATABASE, October 1996
Copyright © Online Inc.

Featured Databases
Four11 Directory Service
http://www.four11.com/

Internet Address Finder
http://www.iaf.net/

OKRA
http://okra.ucr.edu/okra/

Usenet Addresses Database
http://usenet-addresses.mit.edu/

WhoWhere?
http://www.whowhere.com/

"Can you help me find someone's email address?" Finding people on the Internet can be a very easy or a very difficult process. A successful search depends less upon the searcher's skill than on whether or not the available databases contain the requested information and whether or not it is accurate. For many years, the standard response to the email address question has been quite simple. Call the person and ask them. Tools such as finger and its derivative netfind provided a beginning for searching, but none of these tools provided comprehensive coverage and there existed no central repository of all email addresses. That repository still does not exist, but a number of databases begin to offer a partial solution to the problem. These databases contain millions of email addresses, phone numbers, and other information. Let's see how they stack up.

LOOKING UP EMAIL ADDRESSES

The smart searcher uses multiple databases and strategies to find someone's email address. No one source will do, and even with all the available databases, some email addresses will still elude the searcher. If the person's organization is known, one of the best strategies is to visit the organization's Web site to see whether it maintains its own local email directory. Searching for a friend at the University of Southern North Dakota? Try www.usnd.edu and look for an online directory. Lost contact with a colleague at the Superbig Sprockets Company library? Check www.supersprocket.com for an employee directory. These directories are often the most reliable source available.

Unfortunately, many organizations do not offer such directories. In that case, try one of the general Internet-wide databases. These general email databases feature different search options, but they all display the email addresses in their results as hypertext links. In other words, just click on the address in one of the resulting records to send email to it. Since the directories want to grow in size and usefulness, people can add new records and sometimes even modify existing ones.

Many people on the Internet have multiple email addresses. This can happen in two ways. First, there can be different ways of referring to the same machine. Sending email to my address, greg@notess.com, is the same as sending to notess@comp-unltd.com; both messages end up on the same machine in the same account. Second, some people have more than one Internet account or more than one email address on the same machine. This makes it difficult to count entries in an email database accurately. Databases that include every email variant have a larger raw number of records but offer much less precision. Even so, it is usually easy to browse through the resulting longer list of false drops. The more common problem is not finding any hits at all. For the broadest search, try all of the following email databases.

FOUR11

One of the largest and best organized email address databases is Four11 Directory Service, the self-proclaimed "Internet white pages." More than just an email directory service, Four11 also includes a U.S. phone number database and a much smaller database of NetPhone users. These are separate databases, currently without links between them. Four11 is named after the traditional telephone 411 directory assistance number.

Like many of these directories, Four11 does not go into great detail about the source of its database content. In their FAQ they claim to be "the Internet's largest white page directory with over 6.5 million listings." Further down, they list the sources as self-registration, public sources, and auto-registration. To their credit, Four11 has attracted over half a million self-registrations. The public sources are not identified, except for a brief statement that they are primarily Usenet. The auto-registration comes from Internet service providers registering their users. Four11's merger earlier this year with a competitor, LookUp, may also have added to their database.

The email database search options include first name, last name, domain, city, state, and country. One unique feature is the "smart name" check box, which interprets some nicknames and variant forms of first names. With this option checked, a search on Bob also retrieves Robert. One drawback of the Four11 search is that it automatically truncates first and last names. Searching for a last name of Johns will produce many hits on Johnson and Johnston, though the exact matches on Johns are displayed first.
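Four11 does not say how its smart name and truncation features work. As a rough illustration only, the Python sketch below shows one way nickname expansion and automatic prefix truncation could combine. The nickname table, the record layout, and the `search` function are hypothetical, and the `flexible` flag simply switches truncation on or off.

```python
# Hypothetical nickname table; Four11's actual "smart name" list is not published.
NICKNAMES = {"bob": {"robert"}, "bill": {"william"}, "jim": {"james"}}


def first_name_variants(first, smart_name):
    """Return the first-name forms to search for, optionally adding nickname variants."""
    first = first.lower()
    variants = {first}
    if smart_name:
        variants |= NICKNAMES.get(first, set())
    return variants


def search(records, first, last, smart_name=False, flexible=True):
    """Match (first, last, email) records; with flexible=True, names are prefix-matched."""
    last = last.lower()
    firsts = first_name_variants(first, smart_name)
    hits = []
    for rec_first, rec_last, email in records:
        rf, rl = rec_first.lower(), rec_last.lower()
        if flexible:  # automatic truncation: "johns" also matches "johnson"
            first_ok = any(rf.startswith(f) for f in firsts)
            last_ok = rl.startswith(last)
        else:  # truncation off: whole names must match
            first_ok = rf in firsts
            last_ok = rl == last
        if first_ok and last_ok:
            hits.append((rec_first, rec_last, email))
    # Exact last-name matches sort ahead of truncation hits; sorted() is stable.
    return sorted(hits, key=lambda hit: hit[1].lower() != last)


records = [
    ("Robert", "Johnson", "rjohnson@example.com"),
    ("Bob", "Johns", "bjohns@example.org"),
    ("Bob", "Johnston", "bjohnston@example.net"),
    ("Alice", "Johns", "ajohns@example.com"),
]
print(search(records, "Bob", "Johns"))
print(search(records, "Bob", "Johns", smart_name=True))
print(search(records, "Bob", "Johns", flexible=False))
```

With truncation on, the Johns search also pulls in Johnston; adding the smart name option brings in Robert Johnson as well, while the exact match on Johns still sorts to the top.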

Four11 has a voluntary, no-fee registration, which is partly used to build their database. Unregistered searchers are limited to output of 50 names. Registered users can retrieve 100 and have an advanced search capability that lets them search additional categories drawn from registration information, such as past organization, research topics, past location, and interests. Best of all, registered users can turn off the automatic truncation by unchecking the "Flexible Search" check box.

The Four11 email directory seems to return far fewer duplicate entries than some of its competitors. One reason is that it properly interprets a multiterm query as an AND. Other databases default to OR, which makes the search results for "Jane Smith" rather lengthy.
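The difference between the two defaults is easy to see in a toy example. The sketch below assumes simple whole-word matching against a name string; the `matches_all` and `matches_any` helpers are hypothetical and are not any service's actual code. It runs the "Jane Smith" query both ways:

```python
def matches_all(record, query):
    """AND semantics: every query term must appear as a word in the record."""
    words = record.lower().split()
    return all(term in words for term in query.lower().split())


def matches_any(record, query):
    """OR semantics: at least one query term must appear as a word in the record."""
    words = record.lower().split()
    return any(term in words for term in query.lower().split())


records = ["Jane Smith", "Jane Doe", "John Smith", "Mary Jones"]
and_hits = [r for r in records if matches_all(r, "Jane Smith")]
or_hits = [r for r in records if matches_any(r, "Jane Smith")]
print(and_hits)  # ['Jane Smith']
print(or_hits)   # ['Jane Smith', 'Jane Doe', 'John Smith']
```

In a database of millions of names, the OR result would include every Jane and every Smith, which is why the AND default keeps the list short.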

INTERNET ADDRESS FINDER

With about four million entries, the Internet Address Finder (IAF) is a substantial competitor to Four11. Like Four11, it claims public databases as its source, mentioning Usenet specifically. It also has a self-registration option. The search fields are first name, last name, organization, and domain. Truncation is available by using the asterisk, but it is not automatic. One additional and very useful feature is searching by email address. This reverse lookup takes an address and returns additional information about its owner.
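IAF does not describe its query syntax beyond the asterisk, so the Python sketch below rests on assumptions: it treats a trailing asterisk as the only truncation symbol, stores records as simple dictionaries, and uses hypothetical helper names. It contrasts explicit truncation with the automatic kind and adds a reverse lookup by address:

```python
def term_matches(value, term):
    """Exact, case-insensitive match unless the term ends in '*', which requests truncation."""
    value, term = value.lower(), term.lower()
    if term.endswith("*"):
        return value.startswith(term[:-1])
    return value == term


def last_name_search(records, term):
    """Return records whose last name matches the term (truncated only with '*')."""
    return [r for r in records if term_matches(r["last"], term)]


def reverse_lookup(records, email):
    """Return every record for an email address.

    Compares case-insensitively, a simplification: strictly, only the domain
    part of an address is case-insensitive.
    """
    email = email.lower()
    return [r for r in records if r["email"].lower() == email]


# Hypothetical records.
records = [
    {"first": "Jane", "last": "Johnson", "org": "Superbig Sprockets", "email": "jjohnson@example.com"},
    {"first": "Pat", "last": "Johns", "org": "USND", "email": "pjohns@example.edu"},
]
print([r["last"] for r in last_name_search(records, "Johns")])   # ['Johns']
print([r["last"] for r in last_name_search(records, "Johns*")])  # ['Johnson', 'Johns']
print(reverse_lookup(records, "PJohns@example.edu"))
```

Making truncation explicit puts the searcher in control: "Johns" finds only Johns, and "Johns*" deliberately widens the net.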

The IAF results display includes more than just the name and email address. It can also include an organization affiliation and address, the date the record was last updated, and a guess at a link to the organization's Web site. IAF generally guesses the URL by simply adding www to the front of the domain name in the email address, which is often inaccurate.
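The guessing rule is simple enough to write in a few lines, and writing it out shows why it misfires. This Python sketch (the `guess_homepage` name is hypothetical, and the rule is the one described above, not IAF's code) just prefixes www to whatever follows the @ sign:

```python
def guess_homepage(email):
    """Guess an organization's Web address by prefixing 'www.' to the email domain."""
    domain = email.rsplit("@", 1)[1].lower()
    return "http://www." + domain + "/"


print(guess_homepage("greg@notess.com"))   # http://www.notess.com/
print(guess_homepage("jdoe@cs.usnd.edu"))  # http://www.cs.usnd.edu/
```

For greg@notess.com the guess happens to be right. For an address on a departmental machine such as cs.usnd.edu, the rule yields www.cs.usnd.edu, which may not exist at all, rather than the university's main site at www.usnd.edu.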

Also note that the organization field shows only the owner of the domain name, which may or may not be the individual's actual organization. The organization address is also not very dependable. Typically it is the address of the department maintaining the Internet connection, not necessarily the main address of the organization itself. While not quite as large as Four11 (at least at the time of this writing), IAF does bring up unique addresses. It can also bring up more alternate email addresses for the same individual than some of its competitors.

OKRA

The strangely named OKRA, subtitled "net.citizen Directory Service," claims well over five million entries in its email directory. Only one search line is provided, not multiple fields like the other services. As their FAQ boasts, "OKRA uses a complex set of statistical relationships to determine which words are most important for a given query. This eliminates the need for word grouping, boolean operators, and the very annoying 'multiple input box' syndrome." Personally, I would rather use Boolean and field searching than trust a "complex set of statistical relationships." Despite the black box approach, OKRA is a substantial collection of email addresses. They should also be credited for including a current statistics box on the top level page, listing the number of database records and the number of searches run.

For simple searches, the single-line entry works fine. However, OKRA search results contain many duplicate addresses, so take the five million figure with a hefty spoonful of salt, since a percentage of those records are duplicates. The results display takes a while to get used to, particularly the annoying smiling yellow ball, whose degree of grimace represents how closely an answer matches the search query. OKRA displays only the name, email address, and date first entered, unlike the more detailed display from IAF. OKRA limits output to 50 records.

WHOWHERE?

Yet another email lookup contender is WhoWhere?, which claims to be "the largest worldwide directory of email addresses." Yet I could not find even an approximate total number of records in the WhoWhere? database. On some queries, it came close to Four11 in quantity, but other searches fared less well. Its content, like that of the others, comes from "public sources available on the Internet" and self-registration.

The WhoWhere? input form takes a middle road between the single-line OKRA and the multiple fields of Four11 and IAF. WhoWhere? uses two search boxes: the first for names and the second for "information about the person," including city, state, country, company, or email provider. Except for common names, a simple name search usually works best. Searches can be run as either "all matches" or "only exact matches." All matches is the default. An exact match requires that the full name, both first and last, match exactly. However, the exact match does not extend to terms entered in the second search box; WhoWhere? still treats any terms in the "information about this person" section as keywords.

WhoWhere? takes a different approach to searching than the other databases. Results are listed as highly relevant, probably relevant, or possibly relevant (denoted with different colored dots rather than the obnoxious faces that OKRA uses). The latter two categories use fuzzy matches, where Gibson matches Johnson, Jane matches Janet, and Baker matches Barker, Becker, and Whitaker. WhoWhere? then offers another option, the email wizard, which uses even fuzzier logic to broaden the search. While such search functions are interesting, they are useful only when you are not sure of the person's name. Most people trying to find an email address know the name of the person. It would be interesting to know how many people find useful results in the possibly relevant set.
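WhoWhere? does not document its matching method. The sketch below is a stand-in, not WhoWhere?'s algorithm: it grades candidate names with the similarity ratio from Python's standard difflib.SequenceMatcher and sorts them into the three relevance tiers using cutoffs (0.8 and 0.6) that are purely hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical cutoffs; WhoWhere? never published how it assigned its tiers.
PROBABLE_CUTOFF = 0.8
POSSIBLE_CUTOFF = 0.6


def relevance(query, name):
    """Classify a candidate name against the query name by character similarity."""
    q, n = query.lower(), name.lower()
    if q == n:
        return "highly relevant"
    score = SequenceMatcher(None, q, n).ratio()
    if score >= PROBABLE_CUTOFF:
        return "probably relevant"
    if score >= POSSIBLE_CUTOFF:
        return "possibly relevant"
    return None  # too dissimilar to list at all


for candidate in ["Baker", "Barker", "Becker", "Whitaker", "Smith"]:
    print(candidate, relevance("Baker", candidate))
```

With these cutoffs, Barker lands in the probably relevant tier, Becker and Whitaker in the possibly relevant tier, and Smith is dropped. This simple measure scores Gibson against Johnson at only about 0.46 and would not pair them, so whatever WhoWhere? used was presumably looser or based on different criteria.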

USENET ADDRESSES

Besides the tools mentioned previously, one longtime standard database to search is the Usenet addresses database. This database contains a clearly delimited subset of Internet email addresses. The database description states that it includes "addresses collected from Usenet postings between July 1991 and February 1996. . .There is no guarantee that the addresses are correct or usable." Thus, only people who have posted a message on Usenet news will be included in this database. While that by itself greatly narrows the scope of this database, remember that many email lists are mirrored on Usenet, so those addresses will be included as well. This database is also one of the primary components of most of the other email directory services.

The mobility of Usenet posters creates both a problem and an advantage for this database. The problem is that many people move on and change email addresses, and this database may not include their new ones. Many Usenet users are college students. Assuming that some of them actually graduate and move on to a new email address, the database then includes many defunct addresses. To help with this problem, the server displays results in reverse chronological order. Given the displayed date, the user can then make an informed decision about how reliable a given address may be. In addition, the server tries to link to the local directory service on each address's host. Many of these links do not work, but those that do provide more current verification of the email address.
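Reverse chronological display amounts to sorting the hits by posting date, newest first. A minimal sketch, assuming each hit is an (address, date) pair; the `newest_first` name, the addresses, and the dates are made up for illustration:

```python
from datetime import date


def newest_first(hits):
    """Sort (email, last_posted) pairs so the most recent posting comes first."""
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical hits for one person, spread over the database's 1991-1996 range.
hits = [
    ("jdoe@student.usnd.edu", date(1992, 3, 14)),
    ("jane.doe@supersprocket.com", date(1995, 11, 2)),
    ("jdoe@cs.usnd.edu", date(1993, 6, 30)),
]
for email, posted in newest_first(hits):
    print(posted.isoformat(), email)
```

The most recent address appears at the top, and the date beside each older one tells the searcher how stale it is likely to be.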

The positive side to the mobility factor (if you are not bothered by the privacy implications) is that it can be used to track a person's activities over time. Someone may have posted news in college, on the first job, from an individual account, and from the current job. With all these as separate records in the database, it is possible to make some interesting deductions about an individual's recent life.

WHO'S THE BEST?

No single one of these databases is clearly better than the rest. Certainly, none of them yet approaches comprehensiveness. Even all these databases combined probably do not cover half the total number of Internet email addresses. For the few distinctive surnames that I used in comparing these databases, Four11 gave the highest number of non-duplicate retrievals, but it was not far ahead of OKRA, IAF, and WhoWhere?. The Usenet addresses database also gave numbers similar to those three, but it contains more out-of-date addresses.

Using a single distinctive name alone to compare the numbers is misleading. In comparing these results, I excluded false drops due to the automatic truncation and multiple entries for the same person. Raw counts of the number of hits are much different. Each database has unique records that may not be found in the other databases. The most efficient strategy is just to search them one at a time until you find the answer.

PROMOTION VERSUS PRIVACY

Since these databases all accept user-added entries, you can help all of them grow and become more accurate. Add your email address and those of others at your organization. If you want to be sure that email addresses at your organization can be found in these databases, see if the systems administrator has a list of email addresses that can be contributed. That should work for all but the Usenet addresses database. To be found in that one, post something on Usenet. It is not even necessary to take part in a newsgroup discussion; just post a test message to a test newsgroup such as alt.test.

On the other hand, do you want to advertise your email address? Most Web pages are designed for public viewing. Email addresses tend to be a bit more personal. Will your email address be used for direct marketing? Are you going to receive tons of junk mail by registering? Would you prefer not to be found? The email database providers all address this concern. It is a common question in their FAQs. The companies take steps to limit the number of addresses that can be retrieved from the database to discourage abuse by direct marketing companies. They also state that they do not sell their lists.

Privacy-conscious users can even remove themselves from some of the directories. The providers prefer that you not do so, but they do provide a removal service. As an alternative, WhoWhere? offers a WhoThere? service, which maintains a listing for you in the directory but does not display your email address. Instead, if someone wants to send you an email message, it goes through WhoWhere? first, which forwards it to you.

These options for privacy protection also create the potential for abuse by others. Could someone else go in and remove your email address? Yes, although the company will send an email message to you confirming the deletion. Other possible abuses could include giving misleading or inaccurate listings for others. The companies have begun to grapple with these database integrity issues, but they may not have easy answers. Fortunately, there have not been many reports of abuse yet.

Netscape has had a page for Internet white page directories for some time now (http://home.netscape.com/home/internet-white-pages.html). With Navigator version 3, it has been renamed People and is available from the directory buttons as well as the drop-down Directory menu. It includes the databases mentioned here, along with other email databases, phone number databases, and home page directories. Yahoo includes even more under Reference: White Pages: Individuals (http://www.yahoo.com/Reference/White_Pages/Individuals/). Even as the universe of email addresses continues to explode, these email address databases begin to provide a means for searchers to track down at least some portion of the uncounted millions in use.

----------------------------------------------------------------

Communications to the author should be addressed to Greg R. Notess, Montana State University Libraries, P.O. Box 173320, Bozeman, MT 59717-3320; 406/994-6563; greg@notess.com; http://www.notess.com.

Copyright © 1996, Online Inc. All rights reserved.