Search Engine Showdown

Google Special Report: Database Components

Data from search engine analysis run on March 4-6, 2002.
by Greg R.

Google's Multifaceted Database

Google's Web database has several facets of interest to searchers. This article compares Google's reports on the size of these components of its Web database to actual search results and what searchers can expect to find.

Google's Reported Numbers

Google includes some results (URLs) that it has not actually indexed. In addition, it includes non-HTML file types such as PDF and PostScript. According to a company press release from Dec. 11, 2001, along with additional information from company representatives, the breakdown is roughly as follows:

Component                    In millions    Percent
Indexed Web Pages                  1,465      73.1%
Unindexed URLs                       500        25%
Other file types                      35      1.75%
Daily Reindexed Web Pages              3      0.15%
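
As a quick check on the figures above, the total and the percentage shares can be recomputed from the four counts. This is a minimal sketch: the counts are the ones reported above, while the names `components` and `shares` are illustrative only.

```python
# Counts in millions, copied from the breakdown above.
components = {
    "Indexed Web Pages": 1465,
    "Unindexed URLs": 500,
    "Other file types": 35,
    "Daily Reindexed Web Pages": 3,
}


def shares(counts):
    """Return the total and each component's share of it, in percent."""
    total = sum(counts.values())
    return total, {name: 100 * n / total for name, n in counts.items()}


total, percent = shares(components)

print(f"Total: {total:,} million")
for name, value in percent.items():
    print(f"{name}: {value:.2f}%")
```

The four counts sum to 2,003 million, which matches the "over 2 billion" figure, and the shares agree with the reported percentages once rounded.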

Counting all the above, Google reports over 2 billion "Web documents." However, an analysis of the results from 25 very specific searches suggests that the effective size is considerably smaller, since most searchers will see very few of the unindexed URLs. What are these four categories?

Indexed Web Pages
Regular search engine results -- Web pages whose words have been indexed.
Unindexed URLs
URLs for Web pages or documents that Google's spider has not actually visited and that have therefore not been indexed. See my page on Google's Unindexed URLs for more details and an example.
Other File Types
Web-accessible documents that are not Web pages, such as Adobe Acrobat PDF, Microsoft Word, PostScript, Excel, PowerPoint, WordPerfect, and other files.
Daily Reindexed Web Pages
These are regular indexed Web pages like those in the first category, except that Google has identified them as frequently updated and therefore reindexes them every day or so. In Google's results, these pages display the date they were last refreshed after the URL and size.

Analysis of 25 Google Searches

So what can we expect from our searches? The upper pie chart to the right represents a total of 8,371 results from 25 small, specific, one-word searches on Google. The largest slice includes both the Indexed Web Pages and the Daily Reindexed Web Pages, since the latter were not counted separately. The lower pie chart shows the subdivision of the Other File Types and the key identifies the file extension and the actual number of results for each type. PDFs are by far the most numerous.
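
A tally like the one behind the lower chart can be sketched as a count of result URLs by file extension. The following is an illustration only, not the method actually used for this analysis: the extension list, the function names, and the sample URLs are all assumptions, and the example.com addresses are placeholders rather than real results. A URL alone cannot show whether Google has indexed the page, so unindexed URLs are not separated out here.

```python
from collections import Counter
from posixpath import splitext
from urllib.parse import urlparse

# Assumed extensions for the "Other File Types" category (hypothetical list).
OTHER_FILE_EXTENSIONS = {"pdf", "ps", "doc", "xls", "ppt", "wpd"}


def classify(url):
    """Return the file extension if it is a known non-HTML type, else 'web page'."""
    path = urlparse(url).path  # drops any query string or fragment
    ext = splitext(path)[1].lower().lstrip(".")
    return ext if ext in OTHER_FILE_EXTENSIONS else "web page"


def tally(urls):
    """Count result URLs per category."""
    return Counter(classify(u) for u in urls)


# Placeholder URLs, not actual search results.
sample = [
    "http://example.com/index.html",
    "http://example.com/report.pdf",
    "http://example.org/paper.PS",
    "http://example.net/",
    "http://example.com/slides.ppt?download=1",
]

counts = tally(sample)
for category, n in counts.most_common():
    print(f"{category}: {n}")
```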

See some of my other statistical analyses such as the relative size and change over time for more search engine database information.

A Notess.com Web Site
©1999-2006 by Greg R. , all rights reserved