Review of Google
Last updated Jul. 12, 2023.
by Greg R.
Google has become for many the pre-eminent Web search engine. In Feb. 1999 it moved from Alpha test version to Beta and on Sept. 21, 1999 it officially launched.
Since that time it has made its mark with its relevance ranking based on link analysis, cached pages, and aggressive growth. Since its beta release, it has had phrase searching and the - for NOT, but it did not add an OR operation until Oct. 2000. In Dec. 2000, it added title searching. In June 2000 it announced a database of over 560 million pages, which grew to over 600 million by the end of 2000 and then 1.5 billion in Dec. 2001. The 2+ billion reported on their home page as of April 2002 includes indexed pages, unindexed URLs, and other file formats.
By Nov. 2002, they moved their claim up to 3 billion, and in Feb. 2004 it went to 4 billion.
Databases:
- Web: Indexed Web pages (also includes URLs that it has not fully indexed). Additional file types in the Web database include PDF, .ps, .doc, .xls, .txt, .ppt, .rtf, .asp, .wpd, and more. See Google Database Components for more details.
- Ads: Paid advertisements usually shown on the right side (or top) under a "Sponsored Links" heading
- Images: Picture database
- Groups: Usenet news database
- News: Past 30 days of Web-based news sites
- Directory: A version of the Open Directory with entries ranked in Google's PageRank order
- Froogle: Shopping and product search (still in beta as of March 2004)
- Catalog Search:
Scanned were all still in beta as of Aug. 2003.
Google also has a PageRank version of the Open Directory, and above their regular results, hits from their own news headlines database, stock quotes,
calculator/conversions, and a phone number database may display. In addition it offers several specialized subsets: a government database of the
.gov and .mil sites; University searches; a Linux search; an Apple/Macintosh search; and a Microsoft search.
The Google database is used by AOL, iWon, and at Netscape's Search site. Yahoo! switched from Inktomi to Google in July 2000, reaffirmed and more closely integrated Google results in Oct. 2002, and then dropped Google in Feb. 2004. BBCi used Google from May 2002 until March 2003, when they switched to Inktomi.
Strengths:
* Size and scope: It is now the largest, and includes PDF, DOC, PS, and many other file types
* Relevance based on sites' linkages and authority
* Cached archive of Web pages as they looked when indexed
* Additional databases: Google Groups, News, Directory, etc.
Weaknesses: See also the Google Inconsistencies Page
* Limited search features: no nesting, no truncation, does not support full Boolean
* Link searches must be exact and are incomplete
* Only indexes first 101 KB of a Web page and about 120 KB of PDFs
* May search for plural/singular, synonyms, and grammatical variants without telling you
Default Operation:
Multiple search terms are processed as an AND operation by default. Phrase matches are ranked higher.
Boolean Searching:
Google uses an automatic Boolean AND between terms and has slowly been moving towards more Boolean support; however, it does not yet support the AND operator, NOT operator, or full Boolean searching with the ability to nest operators. In Feb. 1999, Google added the - symbol to perform a NOT function. In Oct. 2000, they added the ability to use an OR (which must be in upper case) to do some Boolean OR operations. See the Boolean Searching on Google page for more details on how to get Google to do certain kinds of Boolean searches.
The + could once be used to require a term, but since the default operation was AND, it was never really needed, and for a while it caused the following message to appear:
Google always searches for pages containing all the words in your query, so you do not need to use + in front of words.
However, the + can be used to force a search on stop words and to require Google to search for only that exact term without any possible plural/singular, synonyms, and grammatical variants.
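As a rough illustration of the operators described above, the following Python sketch assembles a query string from plain terms (implicit AND), an upper-case OR group, and excluded terms marked with a leading -. The function name build_query and its parameters are made up for this example and are not part of any Google tool.

```python
def build_query(all_terms=(), any_terms=(), exclude=(), phrase=None):
    """Compose a query string from the operators the review describes.

    Hypothetical helper for illustration only.
    """
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')            # phrase search uses double quotes
    parts.extend(all_terms)                     # plain terms are ANDed by default
    if any_terms:
        parts.append(" OR ".join(any_terms))    # OR must be in upper case
    parts.extend(f"-{term}" for term in exclude)  # leading - works as NOT
    return " ".join(parts)


print(build_query(all_terms=["search"], any_terms=["engine", "directory"], exclude=["yahoo"]))
```

Since nesting is not supported, the sketch makes no attempt to add parentheses around the OR group.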
Proximity Searching:
In Feb. 1999, Google added phrase searching designated in the usual manner by enclosing the phrase in "double quotes." Google also detects phrase matches even when the quotes are not used and usually ranks phrase matches higher. No other proximity searching is directly available. However, using the wildcard word within a phrase trick described below, the unofficial Google API Proximity Search tool can reproduce proximity searching up to a distance of 3 words.
Truncation:
No truncation is available. Some automatic plural searching and word stemming occurs for English words and can be turned off by using the plus sign in front of each term that should not be stemmed. However, within phrases, there is a trick which can be used for a wildcard word. Use an asterisk * within a phrase search to match any word in that position. So, for example, to find "a little neglect may breed mischief" when you are not sure of the second to last word, search "a little neglect may * mischief".
Multiple asterisks can be used as in "a little * * * mischief". This is the only way Google supports a wildcard symbol.
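The wildcard word trick can be sketched in a few lines of Python. The first helper writes an asterisk for each unknown word in a phrase. The second generates phrase variants with zero to three wildcard words between two terms, which is one plausible way a proximity tool could build on the trick; how the unofficial Proximity Search tool really works is not described here, so treat that part as a guess. Both function names are invented for this example.

```python
def wildcard_phrase(words):
    """Quote a phrase, writing * for each unknown word (given as None)."""
    return '"' + " ".join("*" if w is None else w for w in words) + '"'


def proximity_variants(first, second, max_gap=3):
    """Phrases with 0..max_gap wildcard words between two terms, in both orders.

    Assumption: this only imitates proximity searching with phrase wildcards.
    """
    variants = []
    for gap in range(max_gap + 1):
        middle = [None] * gap
        variants.append(wildcard_phrase([first, *middle, second]))
        variants.append(wildcard_phrase([second, *middle, first]))
    return variants


print(wildcard_phrase(["a", "little", "neglect", "may", None, "mischief"]))
for query in proximity_variants("neglect", "mischief", max_gap=1):
    print(query)
```

Each variant would have to be run as a separate search, since the phrases cannot be nested into one Boolean expression.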
While not exactly truncation, the synonym operator is a tilde ~ placed before a search term, with no space, to tell Google to look for synonyms. So a search on yosemite ~trails will find pages that have terms like 'hiking,' 'rides,' and 'maps.' This synonym finder will sometimes include plural, singular, or other grammatical variants as well, so the same search also found matches with 'trail' and 'trailer.' The ~ can thus be used to get something a bit closer to truncation, but not very close. Bear in mind that the ~ only works in Google's Web database and only for English language terms.
Case Sensitivity:
Google has no case sensitive searching. Using either lower or upper case results in the same hits.
Field Searching:
Google offers several field searches connected with entering URLs. In the December 2000 revision of its advanced search form, it added several title and URL field searches.
Note that most field searches cannot be combined with other query words. In other words, a search entered such as uniqueword link:name.com will be processed as if only the field search were present, as in link:name.com. The uniqueword is ignored. The intitle: and inurl: fields can be combined with other search terms, but allintitle: and allinurl: cannot.
Field | Explanation |
intitle: | Finds pages that have the term(s) in the HTML title element. Can be combined with other search terms. Example: intitle:search engines should find 'search' in the title and 'engines' anywhere in the page. |
inurl: | Finds pages that have the term(s) somewhere in the URL (host name, path, or filename). Can be combined with other search terms. Example: inurl:searchenginewatch |
allintitle: | Finds pages that have the term(s) in the HTML title element. Example: allintitle:search engines |
link: | Finds pages which contain hypertext links to the exact specified URL. Example: link:notess.com/search finds pages with links to this site. |
allinurl: | Finds pages that have the term(s) somewhere in the URL (host name, path, or filename). Example: allinurl:searchenginewatch |
site: | Finds pages from the designated Web site. Paths and file names cannot be included. Example: site:notess.com |
allinanchor: | Finds pages that have the term(s) somewhere in the links to the page. |
related: | Invokes GoogleScout to find other pages similar in linkage patterns to the given URL and at a similar hierarchical level. The URL must be exact. In other words, related:notess.com and related:www.notess.com find different results. |
numrange: | Finds a range of numbers. Either 5..11 or numrange:5-11 works. See number searching section below. |
pricerange: | Finds a range of numbers prefixed by the $ sign. Either $5..11 or pricerange:5-11 works. See number searching section below. |
flink: | Used to find pages linked from the given URL. No longer working as of Oct. 30, 1999. Example: flink:notess.com |
Before the official release in Sept. 1999, clicking the small bar graph at the beginning of a displayed hit would automatically run a link: search, but that graphic disappeared with the official launch. Another field search which can be used is related:[URL] which invokes GoogleScout to find other pages similar in linkage patterns to the given URL.
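The rule about combining field searches with other terms can be captured in a small Python sketch. The helper below is hypothetical: it builds a field:value query and refuses extra terms for any field not named above as combinable, rather than letting those terms be silently ignored. The COMBINABLE set covers only intitle: and inurl:, the two prefixes this section names as combinable.

```python
# Fields this section says can be mixed with ordinary search terms.
COMBINABLE = {"intitle", "inurl"}


def field_query(field, value, extra_terms=()):
    """Build 'field:value' plus optional extra terms (illustrative helper)."""
    if extra_terms and field not in COMBINABLE:
        # The extra words would simply be ignored, so flag the problem early.
        raise ValueError(f"{field}: cannot be combined with other terms")
    return " ".join([f"{field}:{value}", *extra_terms])


print(field_query("intitle", "search", ["engines"]))
print(field_query("link", "notess.com/search"))
```

Raising an error is a design choice for the sketch; a gentler version could just drop the extra terms to mirror what the search engine does.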
Limits:
Google has language, domain, date, filetype, and adult content limits. The date limit, added in July 2001, is only available on the Advanced Search page. Only three options are available: Past 3 Months, Past 6 Months, or Past Year.
The file type limit, added along with the addition of other file types to the Google index, was added to the Advanced Search page in Nov. 2001. The Advanced Search page only offers file type limits under the label of File Formats for PDF, Word (.doc), Excel (.xls), PowerPoint (.ppt), and Rich Text Format (.rtf). Using the filetype: prefix, the file type limit can also be used for PostScript (.ps), Text (.txt), .htm, WordPerfect (.wpd), and other file extensions. To use the prefix command, just put the extension immediately after filetype: as in differentials filetype:ps .
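A minimal Python sketch of the prefix form follows. The helper name with_filetype is invented for illustration; it only strips a leading dot so that the extension sits immediately after filetype: as the text requires.

```python
def with_filetype(terms, extension):
    """Append a filetype: limit, placing the bare extension right after the colon."""
    ext = extension.lstrip(".").lower()  # '.PS' and 'ps' both become 'ps'
    return f"{terms} filetype:{ext}"


print(with_filetype("differentials", ".ps"))
```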
Google introduced the language limit in April 2000 with eleven languages, which was expanded to 24 as of Aug. 2000. Russian was added in July 2001, Arabic and Turkish in Nov. 2001, and then in early 2002 Catalan, Croatian, Indonesian, Serbian, Slovak, and Slovenian joined the group, for the following 34 language limit options. These are available on the Advanced Search page and the Language Tools page.
- Arabic
- Bulgarian
- Catalan
- Chinese (Simplified & Traditional)
- Croatian
- Czech
- Danish
- Dutch
- English
- Estonian
- Finnish
- French
- German
- Greek
- Hebrew
- Hungarian
- Icelandic
- Indonesian
- Italian
- Japanese
- Korean
- Latvian
- Lithuanian
- Norwegian
- Polish
- Portuguese
- Romanian
- Russian
- Serbian
- Slovak
- Slovenian
- Spanish
- Swedish
- Turkish
To choose more than one at a time use the preferences page, which also offers a choice for which of 14 languages the surrounding text will be displayed in.
In May 2000, a family filter was added which tries to exclude adult Web pages. Turn it on from the preferences page or in the Advanced Search. By default, "Moderate Filtering" is turned on, which is supposed to "Filter explicit images only." In other words, the Web (or text) search is not filtered, but the image results are. The Strict Filtering option, which will "filter both explicit text and explicit images," turns on the filter for the Web and Images databases (and the Directory) but not Groups, News, and Froogle. Some ads will be filtered. Also, the Strict Filtering option will block all results for certain words. However, none of the filters, even the Strict Filtering, blocks all explicit content.
The Advanced Search offers a domain limit, which can be used to limit results to those from the specified domain or to exclude results from a specified domain.
Stop Words:
Google does ignore frequent words. Its documentation mentions terms such as 'the', 'of', 'and', and 'or'. However, it also notes that these can be searched by putting + in front of them. As of March 2000, 'the' was a stop word that could not be searched even with the + sign. But by 2002, 'the' could be searched with the plus. Be sure to only place the + in front of stop words. If a + is placed in front of a non-stop word in the same query, all + signs will be ignored. As of Nov. 2001, stop words within a phrase no longer require a + sign and will automatically be searched. Also, if only stop words are entered even without phrase markings, they will be searched.
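The stop word rules above can be sketched in Python as follows. The STOP_WORDS set holds only the four examples named in the documentation, so it is certainly incomplete, and the function name is made up for this illustration. The helper adds + only to stop words and leaves a query made up entirely of stop words alone, since those are searched anyway.

```python
# Only the examples the documentation mentions; the real list is not published here.
STOP_WORDS = {"the", "of", "and", "or"}


def mark_stop_words(terms):
    """Prefix + to stop words only, never to ordinary terms."""
    if all(term.lower() in STOP_WORDS for term in terms):
        return " ".join(terms)  # a query of only stop words needs no + signs
    return " ".join(
        f"+{term}" if term.lower() in STOP_WORDS else term for term in terms
    )


print(mark_stop_words(["lord", "of", "the", "rings"]))
print(mark_stop_words(["the", "of"]))
```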
Sorting:
Results are sorted by relevance, which is determined by Google's PageRank analysis of links from other pages, with greater weight given to authoritative sites. Pages are also clustered by site. Only two pages per site will be displayed, with the second indented. Others are available via the [ More results from . . . ] link. If the search finds fewer than 1,000 results when clustered with two pages per site and you page forward to the last page, the following message will appear after the last record:
In order to show you the most relevant results, we have omitted some entries very similar to the 63 already displayed.
If you like, you can repeat the search with the omitted results included.
Clicking the "repeat the search" option will bring up more pages, some of which are near or exact duplicates of pages already found while others are pages that were clustered under a site listing. However, clicking on that link will not necessarily retrieve all results that have been clustered under a site.
You can also just add &filter=0 to the end of a search results URL. To see all results available on Google, you need to check under each site cluster as well as use the "repeat the search" option.
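Adding the filter=0 parameter by hand is easy to get wrong when the URL already has a query string, so here is a small Python sketch using the standard urllib.parse module. The function name and the sample URL (including its q parameter) are assumptions made for the example; only the filter=0 parameter itself comes from the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def include_omitted_results(url):
    """Return the results URL with filter=0 set, replacing any existing filter value."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "filter"
    ]
    params.append(("filter", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Illustrative URL only; the q parameter is an assumption.
print(include_omitted_results("http://www.google.com/search?q=notess"))
```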
Display:
The display includes the title, URL, a brief extract showing text near the search terms, the file size, and for many hits, a link to a cached copy of the page. This cached copy is from Google's index and may be older than the version currently available on the Web. The cached copy will display highlighted search terms. If more than one search term is used, each has a different color highlighting.
The default output is 10 hits per screen, but the searcher can also choose 20, 30, 50, or 100 hits at a time on the preferences page. In June 1999, numeric relevance scores and "phrase match" or "partial phrase match" indicators were removed.
In Sept. 1999, the graphic relevancy bar with its link to a link: search was removed. At the same time, a GoogleScout link was added. GoogleScout is now just labeled as "Similar pages" and finds other pages similar in linkage patterns to the displayed hit. In April 2000, Google started clustering results by site. Formerly, hits from the same site would be listed indented under the first. As of April 2000, only the first two hits are displayed (with the second one indented) and the rest are available under a [ More results from hostname ] link.
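To make the clustering behavior concrete, here is a Python sketch that groups a ranked list of URLs by host name, shows the first two per host with the second indented, and adds a "More results" line when a host has additional hits. This only imitates the display described above; it is not Google's algorithm, and the function name and sample URLs are invented.

```python
from urllib.parse import urlsplit


def cluster_by_site(ranked_urls, per_site=2):
    """Group ranked URLs by host and keep at most per_site visible per host."""
    groups = {}  # host -> URLs in rank order (dicts preserve insertion order)
    for url in ranked_urls:
        groups.setdefault(urlsplit(url).hostname, []).append(url)

    display = []
    for host, urls in groups.items():
        for position, url in enumerate(urls[:per_site]):
            # Every hit after the first from the same site is indented.
            display.append(("    " if position else "") + url)
        if len(urls) > per_site:
            display.append(f"    [ More results from {host} ]")
    return display


hits = ["http://a.com/1", "http://b.com/1", "http://a.com/2", "http://a.com/3"]
for line in cluster_by_site(hits):
    print(line)
```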
With the addition of non-HTML files in 2001, Google added two notes to the display to identify those files. Before the title in the first line of the display, [PDF] or [PS] or [XLS] is used to denote the different file format. On some, a second line of the display lists
File Format: PDF/Adobe Acrobat - View as Text.
Around Aug. 2001, Google started refreshing the indexing of certain pages (those with daily updates) more frequently than the rest of the database. These were marked with "Fresh!" after the URL and size. In Dec. 2001, this tag was changed to list the indexing date. As of Feb. 2002, 3 million pages were being refreshed on an almost daily basis.
Special Search Features:
Cached Pages: Google was the first general search engine to provide access to pages as they were at the time they were indexed, designated as "cached" pages. For alternative sources for cached pages see the archives page.
Character Searching: Google is also the only search engine that searches for some characters. As of Sept. 2003, it would search for the ampersand & and the underscore _ characters by themselves or as part of a character string. In other words, a search on adv_search gets different results than "adv search" and &tc differs from tc. While it would not search # or + in most cases, it does differentiate c#, c++, c+, and c. It does not, however, differentiate c*, c+@, or c+-, interpreting c* as c and both c+- and c+@ as c+. (These c+ type strings are all various programming languages.) In March 2004, it started searching for the $ when it precedes a number (see more details on number searching below). Other punctuation marks may change the sorting of results.
Tomi Häsä reports that searching for I/O works, as does searching for sharped musical pitches as in a#, c#, f#, and g#. The &, + and _ also can be used one or more times in the middle or at the end of a character or a word or between characters and words as in a+, a_, C++, page_count, a&b&c, and i&&.
Number Searching: (New, March 2004) Google handles numbers in some special ways and can search for a range of numbers. When it searches for numbers, it also finds numbers with and without commas. The number range search finds decimal numbers within the range as well. In other words, a search like chennai 565011 finds pages with 565,011 while the number range of 5..11 will match numbers such as 5, 7, 9, 11, and 7.23. Both plain number searches and number range searches can be combined with other terms and can be included in phrase searching.
Number Range Searching Syntax:
- Smaller number, two periods, larger number, as in 5..11 or 987.6..1001.34
- Use the prefix numrange, a colon, and then the smaller number, a dash, and the larger number, as in numrange:5-11 or numrange:987.6-1001.34
- For open-ended ranges, just leave off one of the numbers. For example, to search for all numbers equal to or larger than 534, use either 534.. or numrange:534- . To find only numbers smaller than 16, use either ..16 or numrange:-16 (0..16 or numrange:0-16 also works).
- A variant of the number search is the price range search. Currently, it recognizes the dollar ($) sign when it is placed immediately in front of the number with no space, but it does not yet recognize the pound (£), Yen (¥), or Euro (€) characters. Either use the $ sign with the .. syntax as in $5..11 (the second number could also have the $ sign, but it is not required, while the first number must have it to work), or use pricerange:5-11 without the $ sign.
Number Searching Syntax Notes:
- Be sure to put the smaller number first or the range operation won't work
- Must be positive numbers (although you may find negatives, the - sign is
not searched and is interpreted as a NOT operator)
- Numbers and number ranges can be used within a phrase search
- A plain number search also will match a number with a comma. In other
words,
2001 finds pages with 2001 but not pages with 2,001 while
2001..2001
- A number range search, like most other Google searches, will also find pages that do not contain the number but are linked from other pages that contain the number in the linking anchor text. Check the cached copy of the page for a header saying "These terms only appear in links pointing to this page."
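The number range syntax and the notes above can be pulled together in a short Python sketch. The function number_range and its parameters are hypothetical. It produces either the two-period form or the prefix form, allows open-ended ranges, and rejects reversed or negative bounds. Because open-ended price ranges are not described in this review, the sketch requires both bounds when price=True.

```python
def number_range(low=None, high=None, style="dots", price=False):
    """Build a number range expression in '..' form or numrange:/pricerange: form."""
    if low is None and high is None:
        raise ValueError("give at least one bound")
    if price and (low is None or high is None):
        raise ValueError("this sketch needs both bounds for a price range")
    for bound in (low, high):
        if bound is not None and bound < 0:
            raise ValueError("negative numbers fail: the - sign is read as NOT")
    if low is not None and high is not None and low > high:
        raise ValueError("put the smaller number first")

    lo = "" if low is None else str(low)
    hi = "" if high is None else str(high)
    if style == "dots":
        # In the '..' form the $ sign goes on the first number.
        return ("$" if price else "") + f"{lo}..{hi}"
    prefix = "pricerange" if price else "numrange"
    return f"{prefix}:{lo}-{hi}"


print(number_range(5, 11))
print(number_range(534, style="prefix"))
print(number_range(5, 11, price=True))
```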
Documentation
Google Help Pages
Google Zeitgeist (search patterns and trends)
Press Releases