4 How Google crawls, indexes and serves up web pages
In simple terms, searching the web is like looking in a huge book with a mammoth index that tells you exactly where everything is located. When a potential customer performs a Google search, our programs check that index to determine the most relevant results to serve. The three key processes are crawling, indexing, and serving results.
Crawling is the process by which new and updated pages are added to the Google index. We use a huge set of computers to fetch (crawl) billions of web pages. The program that does the fetching is called Googlebot (also known as a robot, bot, or spider). Googlebot uses an algorithmic process to determine which sites to crawl, how often, and how many pages to fetch from each site. Google doesn't accept payment to crawl a site more frequently, and the search side of our business is separate from our AdWords service.
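To make the idea of deciding which sites to crawl and how many pages to fetch from each one more concrete, here is a minimal sketch of a breadth-first crawler with a per-site page budget. It is an illustration only and not how Googlebot works: the function crawl, the parameter max_pages_per_site, and the in-memory TOY_WEB dictionary (which stands in for real HTTP fetches) are all hypothetical names chosen for this example.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(seed_urls, fetch_links, max_pages_per_site=2):
    """Breadth-first crawl that stops fetching a site once its budget is spent."""
    queue = deque(seed_urls)
    seen = set(seed_urls)          # URLs already queued, to avoid duplicates
    fetched_per_site = {}          # host name -> number of pages fetched so far
    crawled = []                   # URLs in the order they were fetched

    while queue:
        url = queue.popleft()
        site = urlparse(url).netloc
        if fetched_per_site.get(site, 0) >= max_pages_per_site:
            continue               # this site's page budget is used up
        fetched_per_site[site] = fetched_per_site.get(site, 0) + 1
        crawled.append(url)

        # Queue every newly discovered link for a later visit.
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled


# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "http://a.example/": ["http://a.example/1", "http://b.example/"],
    "http://a.example/1": ["http://a.example/2"],
    "http://a.example/2": [],
    "http://b.example/": ["http://a.example/"],
}

result = crawl(["http://a.example/"], lambda url: TOY_WEB.get(url, []))
print(result)
# ['http://a.example/', 'http://a.example/1', 'http://b.example/']
```

In this toy run, the third page on a.example is discovered but never fetched because the site has already reached its budget of two pages. A real crawler would also need politeness delays, robots.txt handling, and revisit scheduling, none of which are modeled here.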
Indexing follows crawling: Googlebot processes each page it crawls to compile a massive index of all the words it sees and their locations on each page. In addition, we process information included in key content tags and attributes, such as title tags and alt attributes. Googlebot can process many, but not all, content types. For example, we cannot process the content of some rich media files or dynamic pages.
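An index of words and their locations on each page is commonly called an inverted index. The following is a minimal sketch of one, assuming plain-text pages and a very simple tokenizer. The names build_index, search, and the sample pages are hypothetical and only illustrate the idea; they do not describe Google's actual index.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the pages it appears on and its positions there."""
    index = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word][url].append(position)
    return index


def search(index, query):
    """Return the set of pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # index.get avoids creating empty entries for unknown words.
    matches = [set(index.get(word, {})) for word in words]
    return set.intersection(*matches)


# Hypothetical sample pages.
pages = {
    "page1": "Fresh coffee beans roasted daily",
    "page2": "Coffee grinders and coffee makers",
}

index = build_index(pages)
print(dict(index["coffee"]))          # {'page1': [1], 'page2': [0, 3]}
print(search(index, "coffee beans"))  # {'page1'}
```

Storing positions as well as page identifiers is what makes it possible to answer not just "which pages contain this word" but also "where on the page does it occur".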
Serving results begins when a user enters a query: our machines search the index for matching pages and return the most relevant results. Relevancy is determined by over 200 factors, only one of which is PageRank, a measure of the importance of a page based on the incoming links from other pages. Each link to a page on your site from another site adds to your site's PageRank. But not all links are equal: Google works hard to identify spam links and other practices that negatively impact search results. The most valuable links are those given based on the quality of your content.
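The idea that incoming links raise a page's importance can be sketched with the classic, publicly described PageRank iteration. This is a simplified illustration, not the ranking used in production search: the damping value of 0.85, the iteration count, the function name pagerank, and the sample link graph are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank by repeated redistribution of rank along links.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to "home"; nothing links to "orphan".
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home"],
    "orphan": ["home"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this graph, "home" ends up with the highest score because three pages link to it, while "orphan" keeps only the baseline share because no page links to it. The sketch treats all links as equal, so it does not capture the spam detection mentioned above.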
For your site to rank well in search results pages, it's important to make sure that Google can crawl and index it correctly. Our Webmaster Guidelines outline best practices to improve your site's ranking.
Google's "Did you mean" and Autocomplete features are designed to help users save time by displaying related terms, common misspellings, and popular queries. The keywords these features use are generated automatically by our web crawlers and search algorithms. We display these predictions only when we think they might save the user time. If a site ranks well for a keyword, it's because we've algorithmically determined that its content is more relevant to the user's query.
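As a rough illustration of how spelling suggestions and query completions can be produced automatically, the sketch below uses Python's standard difflib module for fuzzy matching and a simple prefix filter ordered by popularity. It is a stand-in technique, not the method behind Google's features: the functions did_you_mean and autocomplete, the cutoff of 0.8, and the query counts are hypothetical.

```python
import difflib


def did_you_mean(query, known_queries, cutoff=0.8):
    """Suggest the closest known query, or None if the query is already known
    or nothing is similar enough."""
    if query in known_queries:
        return None
    matches = difflib.get_close_matches(query, known_queries, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def autocomplete(prefix, query_counts, limit=3):
    """Return known queries starting with the prefix, most popular first."""
    candidates = [q for q in query_counts if q.startswith(prefix)]
    # Sort by descending count, then alphabetically to break ties.
    return sorted(candidates, key=lambda q: (-query_counts[q], q))[:limit]


# Hypothetical query log: query text -> number of times it was searched.
query_counts = {
    "coffee beans": 120,
    "coffee grinder": 80,
    "coffee table": 200,
    "tea kettle": 50,
}

suggestion = did_you_mean("cofee beans", list(query_counts))
completions = autocomplete("coffee", query_counts)
print(suggestion)   # coffee beans
print(completions)  # ['coffee table', 'coffee beans', 'coffee grinder']
```

Returning None when there is no close match mirrors the point above that a prediction is shown only when it is likely to save the user time.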