Written on Saturday, September 30, 2006 by Gemini
Google searches harness one of the world's most powerful supercomputers. A search, which usually takes less than half a second, is the result of a complex journey that typically makes at least two stops, often thousands of miles apart.
1. Google creates its own version of the Internet, using automated programmes called Googlebots, which crawl the Web in search of new information. Web sites known to be important and frequently modified are scanned every few minutes; less frequently updated sites may be scanned only every few weeks.
2. Googlebots feed key information from a Web page to Google's central network: the URL, the full text of the page, references to images and other embedded files, and specific information the site owner creates about the page, called metadata.
3. At the central network, the information is indexed: every word that could be used in a search query is listed along with information referencing the Web sites where the word can be found.
4. The index is broken into "shards" and sent to data centers (facilities made up of thousands of servers wired together) around the world. Because centers may have slightly different versions of the index, depending on when they received the last update, users in different places may get slightly different results for the same search.
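The revisit policy in step 1 can be sketched as a priority queue keyed on each site's next due time. This is a minimal Python sketch, assuming each site already has a fixed revisit interval in minutes; the function `run_schedule` and the example URLs are hypothetical, and a real crawler would derive intervals from how often pages actually change and would fetch each page where the comment indicates.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class CrawlTask:
    next_visit: float                          # minute at which the site is due for a recrawl
    url: str = field(compare=False)            # ordering uses next_visit only
    interval: float = field(compare=False)     # hypothetical revisit interval, in minutes


def run_schedule(sites: dict[str, float], horizon: float) -> list[tuple[float, str]]:
    """Simulate recrawls up to `horizon` minutes; `sites` maps URL -> revisit interval."""
    queue = [CrawlTask(0.0, url, interval) for url, interval in sites.items()]
    heapq.heapify(queue)
    visits = []
    # Always take the site whose next visit is due soonest
    while queue and queue[0].next_visit <= horizon:
        task = heapq.heappop(queue)
        visits.append((task.next_visit, task.url))  # a real crawler would fetch the page here
        task.next_visit += task.interval
        heapq.heappush(queue, task)
    return visits
```

For example, `run_schedule({"news.example.com": 5, "blog.example.org": 20160}, horizon=20)` visits the news site at minutes 0, 5, 10, 15 and 20, but the blog (revisited every two weeks) only once.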
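Steps 3 and 4 describe what is usually called an inverted index. The sketch below is a toy version under stated assumptions: the helpers `build_index` and `shard_index` are hypothetical, words are found with a simple regular expression, and shards are split by document (hashing each URL with CRC-32). That is one common partitioning scheme, not a documented detail of Google's system.

```python
import re
import zlib
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Inverted index: word -> {url: [positions of the word on that page]}."""
    index = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word][url].append(position)
    # Convert the nested defaultdicts into plain dicts
    return {word: dict(postings) for word, postings in index.items()}


def shard_index(index: dict[str, dict[str, list[int]]], num_shards: int) -> list[dict]:
    """Hypothetical document-partitioned sharding: all postings for one URL go to one shard."""
    shards = [defaultdict(dict) for _ in range(num_shards)]
    for word, postings in index.items():
        for url, positions in postings.items():
            # CRC-32 gives the same shard for a URL on every run (unlike Python's salted hash)
            shard_id = zlib.crc32(url.encode("utf-8")) % num_shards
            shards[shard_id][word][url] = positions
    return [dict(shard) for shard in shards]
```

Each shard is then a smaller inverted index covering only its own pages, so a data center can search all of its shards side by side and merge the answers.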
SEARCHING AND RANKING
When people search Google, they are asking the company to find every instance of the term in its index and rank the corresponding documents by their relevance.
1. The user types a search query. The typical query is only two or three words, which can make finding the most relevant results challenging, and roughly 1 in 10 queries is misspelled.
2. Before Google provides any information, it identifies the searcher's location through his or her Internet Protocol (IP) address. The IP address helps speed up the search by sending the request to the nearest data center, and it allows Google to identify geographically appropriate ads.
3. The query is sent to the central network, then redirected to the nearest data center.
4. At the data center, the query terms are run through the index; the matching results are sent back to the central network, then on to the user, each with a summary of the Web page called a "snippet".
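Steps 2 and 3 route the query to the closest data center. This minimal sketch assumes the IP address has already been turned into an approximate latitude and longitude (that lookup is not shown). It then picks the center with the smallest great-circle distance. The names and coordinates in `DATA_CENTERS` are hypothetical and do not describe Google's real sites.

```python
import math

# Hypothetical data centers: name -> (latitude, longitude) in degrees
DATA_CENTERS = {
    "dc-us-west": (37.4, -122.1),
    "dc-us-east": (39.0, -77.5),
    "dc-europe": (53.3, -6.3),
}


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def nearest_data_center(user_location: tuple[float, float]) -> str:
    """Pick the data center closest to a user's (already geolocated) position."""
    return min(DATA_CENTERS, key=lambda name: haversine_km(user_location, DATA_CENTERS[name]))
```

A user located near London, `(51.5, -0.1)`, would be routed to `"dc-europe"`. Real routing also weighs network latency and load, which straight-line distance ignores.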
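Step 4 can be sketched as a lookup in each shard of the index that keeps only the pages containing every query word, then attaches a short excerpt around the first match. The pages, the two-shard split, and the helpers `build_shard`, `search` and `snippet` are all hypothetical. Real snippet generation and result merging are considerably more elaborate.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_shard(pages: dict[str, str]) -> dict[str, set[str]]:
    """Small inverted index for one shard: word -> set of URLs containing it."""
    index: dict[str, set[str]] = {}
    for url, text in pages.items():
        for word in set(tokenize(text)):
            index.setdefault(word, set()).add(url)
    return index


def snippet(text: str, terms: list[str], radius: int = 4) -> str:
    """Return up to `radius` words either side of the first word matching a query term."""
    words = text.split()
    for i, word in enumerate(words):
        if any(term in tokenize(word) for term in terms):
            return " ".join(words[max(i - radius, 0): i + radius + 1])
    return " ".join(words[: 2 * radius + 1])


def search(query: str, shards: list[dict[str, set[str]]], pages: dict[str, str]) -> list[tuple[str, str]]:
    """Find pages containing every query word, in any shard, each paired with a snippet."""
    terms = tokenize(query)
    hits: set[str] = set()
    for shard in shards:  # shards are independent, so they could be searched in parallel
        postings = [shard.get(term, set()) for term in terms]
        if postings:
            hits |= set.intersection(*postings)
    return [(url, snippet(pages[url], terms)) for url in sorted(hits)]


# Hypothetical pages held by one data center, split across two shards
PAGES = {
    "example.com/tea": "Green tea is brewed from unoxidised leaves of the tea plant.",
    "example.com/coffee": "Coffee is brewed from roasted beans, unlike green tea.",
    "example.com/garden": "Growing a tea plant in the garden needs acidic soil.",
}
SHARDS = [
    build_shard({url: PAGES[url] for url in ("example.com/tea", "example.com/coffee")}),
    build_shard({"example.com/garden": PAGES["example.com/garden"]}),
]
```

Here `search("green tea", SHARDS, PAGES)` returns the coffee and tea pages (the garden page lacks "green"), with snippets such as `"Green tea is brewed from"`.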
THE "SECRET SAUCE"
Google determines which Web sites are most relevant to a search term by using its "secret sauce", a formula that weighs more than 200 measurements, such as the number of times the search term appears on the Web page, the number of visitors to the page, and PageRank: the number of sites linking to the page and the popularity of those sites.
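The PageRank idea (a page matters more when popular pages link to it) can be computed by power iteration. A toy score can then blend it with how often the term appears on the page. In this sketch the damping factor of 0.85 is the value usually cited from the original PageRank paper. The two weights in `rank_results` and the example link graph are invented. Google's actual formula combines far more signals and is not public.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iterations: int = 100) -> dict[str, float]:
    """Power-iteration PageRank; `links` maps each page to the pages it links to."""
    pages = set(links) | {target for targets in links.values() for target in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share, modelling a surfer jumping to a random page
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank in equal parts to the pages it links to
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over every page
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


def rank_results(term: str, pages: dict[str, str], links: dict[str, list[str]],
                 text_weight: float = 1.0, link_weight: float = 10.0) -> list[str]:
    """Toy blend of two signals: term count plus weighted PageRank (weights are invented)."""
    term = term.lower()
    pr = pagerank(links)
    scores = {}
    for url, text in pages.items():
        words = text.lower().split()
        if term in words:
            scores[url] = text_weight * words.count(term) + link_weight * pr.get(url, 0.0)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical link graph: a, b and d all link to c
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
```

With this graph, page `c` (linked from three pages) gets the highest PageRank. When `a` and `c` each mention "tea" once, `c` therefore ranks first. If `a` mentions it three times, the term count outweighs the link score and `a` comes first.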