As of March 25, 2004, AlltheWeb no longer uses its own database from FAST Web search. Instead, it draws on the database of Yahoo!, its new owner. Many of the advanced features listed below are now gone.
This review will continue to be available to provide an historical record of the search capabilities AlltheWeb used to have.
Originally from Fast Search & Transfer, AlltheWeb was the company's demonstration site when it launched in May 1999. On Dec. 7, 1999, it introduced its Advanced Search, and in 2003 it added full Boolean searching. AlltheWeb is now owned by Overture (which also owns AltaVista). Overture expects to merge the two databases later in 2003, but the search site may remain as a separate destination with separate search features.
The FAST databases are also available at sites like Lycos. See the FAST Review for more information.
Strengths:
* It's big, fast, and search-focused
* Full Boolean searching, no stop words, and other advanced features
* Indexes full Web pages, PDF files, Microsoft Word files, text within Flash files, and other file types
* Database frequently refreshed
* Customization
Weaknesses:
* No cached copy of pages
* Not as well-known as others
* It lacks proximity operators and truncation
Default Operation:
The default setting is an "all of the words" search, so multiple search terms are combined with an AND operation. In the Advanced Search, the default used to be "any of the words" but has since changed to an AND with "all of the words."
Boolean Searching:
Full Boolean searching is available from the Advanced Search
page but only if the "boolean expression" option is chosen from the top
drop-down box that defaults to "all of the words." For operators, AlltheWeb uses
'and,' 'or,' and 'andnot.' Note that the NOT operator is the relatively unusual
'andnot' with no space. Search terms can be nested using parentheses. Operators
can be in lower or upper case. If no Boolean operator is used between multiple
words and they are not searched as a phrase, AlltheWeb will give zero results.
Another option when "boolean expression" is chosen is to use a 'rank' operator.
This is supposed to boost the importance of records containing that term,
although some strange results can happen when using it. The example they give is
florida and golf andnot "Arnold Palmer" rank LPGA.
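As a rough illustration of the syntax described above, here is a minimal Python sketch that assembles a "boolean expression" query string from lists of terms. The function names (term, boolean_query) and their parameters are hypothetical helpers invented for this example, not part of AlltheWeb. The sketch only builds the text that would be typed into the search box.

```python
def term(text):
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    return f'"{text}"' if " " in text else text


def boolean_query(all_of, any_of=(), none_of=(), rank=None):
    """Build a query using the and / or / andnot / rank operators.

    all_of  -- terms joined with 'and' (at least one is expected)
    any_of  -- terms joined with 'or' inside parentheses
    none_of -- terms excluded with 'andnot'
    rank    -- optional term whose presence should boost a record
    """
    clauses = [term(t) for t in all_of]
    if any_of:
        # Nest the OR group in parentheses, as the review describes.
        clauses.append("(" + " or ".join(term(t) for t in any_of) + ")")
    query = " and ".join(clauses)
    for t in none_of:
        query += f" andnot {term(t)}"
    if rank:
        query += f" rank {term(rank)}"
    return query


# Reproduces the example from AlltheWeb's own documentation.
print(boolean_query(["florida", "golf"], none_of=["Arnold Palmer"], rank="LPGA"))
```

An explicit operator is always placed between terms, since a boolean expression with bare adjacent words would return zero results.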
From all other search forms, AlltheWeb allows the use of a + for AND, a - for NOT, and multiple words in parentheses such as (term1 term2) for OR, but no Boolean operators. The advanced search has a drop down menu with choices for "all of the words," "any of the words," or "the exact phrase," and the same choice can be made for the basic search via the customize link. If no + or - marks are used and the drop down menu is not changed, the search will run an automatic AND operation. The Advanced Search offers several lines for adding additional terms or phrases connected to the search as "Should Include," "Must Include," or "Must Not Include," but note that multiple words entered in those boxes will automatically be treated as phrases, with no way to turn that behavior off.
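The +, -, and parentheses syntax for the other search forms can be sketched the same way. This is only an illustration: simple_query is a hypothetical helper name, and combining the + or - marks with a quoted phrase is an assumption rather than something this review confirms.

```python
def simple_query(must=(), must_not=(), any_of=()):
    """Build a query for the non-Boolean search forms.

    must     -- terms prefixed with + (AND)
    must_not -- terms prefixed with - (NOT)
    any_of   -- terms grouped in parentheses (OR)
    """
    def quote(text):
        # Assumption: multi-word terms are quoted so they stay together as a phrase.
        return f'"{text}"' if " " in text else text

    parts = [f"+{quote(t)}" for t in must]
    parts += [f"-{quote(t)}" for t in must_not]
    if any_of:
        parts.append("(" + " ".join(quote(t) for t in any_of) + ")")
    return " ".join(parts)


print(simple_query(must=["florida", "golf"], must_not=["palmer"], any_of=["lpga", "pga"]))
# prints: +florida +golf -palmer (lpga pga)
```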
Proximity Searching:
Phrase searching is available by using "double quotes" around a phrase, by marking the "exact phrase" check box, or by choosing "the exact phrase" from a drop down menu.
Field Searching:
In the Advanced Search under Word Filters, the drop down box at the far right allows four field searches: select the field to search within from that box. As of July 2001, the "in the link name" option was removed, and command line versions of these field searches were introduced. Below is a list of the fields available (see the Query Language section of their help pages for more details).
| Command Line | Drop Down | Explanation |
| --- | --- | --- |
| no prefix needed | | The default search. |
| url: | | Pages have the term(s) somewhere in the URL (host name, path, or filename). |
| link: or link.all: | | Pages that link to this URL or portion of a URL. |
| title: or normal.title: | | Hits have the term(s) in the HTML title element. |
| site: | none, under domain filters instead | A better, more exact match for the domain name. Introduced in Sept. 2002, the site: command is shorter to type, more common at other search engines, and more of an exact match. For example, site:www.total.com finds different results than site:www.total.com.au. |
| language: | | Specify a language limit, using two letter language codes. |
| filesize: | Under results restrictions | Specify the file size of each of the results. Can be an exact number or a range (use <, >, and [ ]). |
| filetype: | Under results restrictions | For the file type limit, use pdf, msword, flash, rtf, powerpoint, excel, postscript, wordperfect. |

The site: command can be used with two additional operators, the caret ^ and the asterisk *. The ^ anchors the domain while the * unanchors it. In other words, site:^total.com will not match either www.total.com or total.com.au, and site:*total.com* will match total.com, www.total.com, and total.com.au. The * and ^ can be used within the same query, as in site:*total.com^, which spells out the default: the end anchored but not the beginning.
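The anchoring rules for the site: command are easier to follow with a small model. The Python sketch below is a guess at the matching logic based only on the examples in this review. The function name site_matches is hypothetical, and the use of plain string prefix and suffix tests (rather than matching on whole domain labels) is an assumption, since AlltheWeb's actual implementation is not documented here.

```python
def site_matches(pattern, host):
    """Return True if host satisfies a site: pattern.

    A leading ^ anchors the beginning; a leading * (or nothing) leaves it open.
    A trailing * unanchors the end; a trailing ^ (or nothing) keeps it anchored.
    """
    start_anchored = pattern.startswith("^")
    end_anchored = not pattern.endswith("*")
    core = pattern.strip("^*").lower()  # the domain text without its markers
    host = host.lower()

    if start_anchored and end_anchored:
        return host == core
    if start_anchored:
        return host.startswith(core)
    if end_anchored:
        # Assumption: a simple suffix test, so "subtotal.com" would also match.
        return host.endswith(core)
    return core in host


hosts = ["total.com", "www.total.com", "total.com.au"]
for pattern in ["total.com", "^total.com", "*total.com*", "*total.com^"]:
    print(pattern, [h for h in hosts if site_matches(pattern, h)])
```

Under these assumptions, the unmarked pattern and *total.com^ give the same matches, which is consistent with the default of anchoring the end but not the beginning.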
The following are no longer documented, but
still work as of Aug. 2003:
| Command Line | Drop Down | Explanation |
| --- | --- | --- |
| url.all: | | Same as url: above. Pages have the term(s) somewhere in the URL (host name, path, or filename). |
| url.domain: | | Pages with the specified term anywhere in the domain name. |
| url.tld: | none | Pages within the specified top level domain. |
| url.host: | none | Pages with the specified host name. |
| normal.titlehead: | none | Hits have the term(s) in the HTML title element or elsewhere within a HEADER tag. |
| link.extension: | none | Matches pages that contain files with the specified extension. |
The following field search was available from the advanced search prior to July 8, 2001 but is no longer available as a drop down choice or as a command line option.
| Drop Down | Explanation |
| --- | --- |
| in the link name | Pages that have the term(s) in the anchor or linked text. |
Limits:
AlltheWeb has limits for language, domain, regional domains, date, embedded content, and file types. For the language limits, 49 different languages are available, more than at any other search engine. Only one language limit at a time can be used in the basic and advanced searches, but through the customize option, multiple languages (up to 8) can be chosen as limits. AlltheWeb will also identify a searcher's country from the IP address and default to the main language or languages of that country plus English. This will appear on the simple search form as the default language limit, with an "Any Language" option as well. To change the default, just click on the language to go directly to the language section of the preferences pages. Here is the full list as of Aug. 2003:
Afrikaans
Albanian
Arabic
Basque
Bulgarian
Byelorussian
Catalan
Chinese (simplified and traditional)
Croatian
Czech
Danish
Dutch
English
Estonian
Faeroese
Finnish
French
Frisian
Galician
German
Greek
Hebrew
Hungarian
Icelandic
Indonesian
Italian
Japanese
Korean
Latin
Latvian
Lithuanian
Malay
Norwegian
Polish
Portuguese
Romanian
Russian
Serbian
Slovak
Slovenian
Spanish
Swahili
Swedish
Thai
Turkish
Ukrainian
Vietnamese
Welsh
The Advanced Search also has Domain Filters, which can be used to limit results to pages at specific domains as well as to exclude pages from specific domains. Several domains can be included in either box, separated by spaces. The filters handle top level domains like .com, host names like name.org, and three part host names such as science.youru.edu.
They have also added a regional domain limit for areas like South East Asia and Europe, based on top level domain groupings.
In 2001 the advanced search added limits for date and document size. AlltheWeb
is the only search engine with a document size limit. The date limit was changed
in Sept. 2002 to accept any dates. Previously it only offered the following
choices:
last month
last 3 months
last 6 months
last 9 months
last year
In Sept. 2002, the following limits were added to the advanced search page. There is a new "Embedded Content" option for finding pages that contain (or do not contain) the following file types: images, audio, video, RealVideo or RealAudio, Flash, Java applets, JavaScript, and VBScript.
File format limits are available in the advanced search for PDF files, Flash files, and Microsoft Word files.
They expanded to other file types in the summer of 2003. The following syntax can be used directly in the search box:
filetype:pdf
filetype:flash
filetype:msword
filetype:rtf
filetype:powerpoint
filetype:excel
filetype:postscript
filetype:wordperfect
filetype:staroffice
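For anyone scripting queries, the file type limit can be added to a search with a small helper like the Python sketch below. The function name limit_to_filetype is hypothetical, and appending the limit after the search terms with a space is an assumption. The review only states that the syntax can be typed directly in the search box.

```python
# File type values taken from the list above.
FILETYPES = (
    "pdf", "flash", "msword", "rtf", "powerpoint",
    "excel", "postscript", "wordperfect", "staroffice",
)


def limit_to_filetype(query, filetype):
    """Append a filetype: limit to a query, rejecting unlisted file types."""
    filetype = filetype.lower()
    if filetype not in FILETYPES:
        raise ValueError(f"unsupported file type: {filetype}")
    # Assumption: the limit is simply added after the search terms.
    return f"{query} filetype:{filetype}"


print(limit_to_filetype('"search engine" review', "pdf"))
# prints: "search engine" review filetype:pdf
```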
Unique: AlltheWeb is the only search engine that offers a page size limit, an IP limit, and the ability to search text in Macromedia Flash files.
Stop Words:
All words are searched. There are no known stop words. However, some query rewrites may discard common words from the query.
Sorting:
By default, sites are sorted in order of perceived relevance. Only one page per domain is displayed, unless the Customize option for site collapsing is turned off. Sites clustered under the one page per domain are not marked, and most users will not realize that more hits from that domain might be available.
To change the ranking, you can use the advanced search, select "boolean
expression," and add a rank keyword to boost records containing whatever keyword you used. There is no option for sorting alphabetically or by date.
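The one page per domain behavior (site collapsing) can be pictured with a short Python sketch. This is an illustration of the general technique, not AlltheWeb's code: the function name collapse_by_site and the sample URLs are made up, and grouping by the full host name is an assumption about what "domain" means here.

```python
from urllib.parse import urlsplit


def collapse_by_site(ranked_urls):
    """Keep only the highest-ranked URL for each host, preserving rank order."""
    seen = set()
    collapsed = []
    for url in ranked_urls:
        # Assumption: results are grouped by the full host name of the URL.
        host = (urlsplit(url).hostname or "").lower()
        if host not in seen:
            seen.add(host)
            collapsed.append(url)
    return collapsed


results = [
    "http://www.example.com/golf/",
    "http://www.example.com/golf/lpga.html",
    "http://example.org/florida",
    "http://www.example.com/about",
]
print(collapse_by_site(results))
# prints: ['http://www.example.com/golf/', 'http://example.org/florida']
```

The second and fourth results vanish without any marker, which is exactly why a user may not realize more hits from that site exist.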
Display:
AlltheWeb displays the title, a three line keyword-in-context (KWIC) extract, a description from the META Description tag or the Open Directory (if either exists), the URL, and the file size for each hit. In the regular search, AlltheWeb only displays 10 records at a time. At the bottom of the Advanced Search, the user can choose to see 10, 25, 50, 75, or 100 results at a time. Alternatively, the number of results can be set as a preference through the "customize" link.
Other Notes: AlltheWeb introduced a variety of quick links, bookmark
shortcuts, and search options for Internet Explorer, Netscape, Opera, and Mac OS
/ Sherlock from their
Search Tools
page. You can use these tricks to search AlltheWeb directly from the address
box, by highlighting a term on a Web page and then clicking a bookmark, and
more.