Monday 27 May 2013

How Search Engines Work

The Web would be virtually unusable without Search Engines, which let us quickly track down information on almost any topic under the sun. Understanding how Search Engines work gives you a much better chance of finding what you are looking for, faster.
Search Engines grew out of the Information Retrieval field, which has been around ever since documents began to be digitised. Rather than indexing a specific set of relatively similar documents within one company or organisation, however, Search Engines have to index pages in multiple formats, across millions of websites, with information changing by the minute. On top of this, they have to deal with webmasters trying to cheat the system to get their pages to the top of the rankings.
Search Engines are a kind of “know-where” – that is, they tell you where you are likely to find out about a certain topic. They don’t know the answer themselves.

The Index
Search Engines build up an index of the Web by sending out special programs, called Robots or Spiders, that jump from hyperlink to hyperlink to discover as many pages as they can reach. This index is updated all the time, and pages that are considered more important are checked more frequently.
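To make the idea of jumping from hyperlink to hyperlink concrete, here is a minimal sketch of a Spider in Python. It is only an illustration: the `TOY_WEB` dictionary, the `.example` addresses and the function names are all made up, and a real Spider would download pages over the network rather than read them from a dictionary.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical miniature "web": address -> HTML source.
TOY_WEB = {
    "a.example/": '<a href="b.example/">B</a> <a href="c.example/">C</a>',
    "b.example/": '<a href="a.example/">A</a> <a href="d.example/">D</a>',
    "c.example/": "<p>No links here.</p>",
    "d.example/": '<a href="c.example/">C</a>',
    "e.example/": '<a href="a.example/">A</a>',  # no other page links here
}


def crawl(start_url, fetch):
    """Visit every page reachable from start_url, nearest pages first."""
    seen = {start_url}          # addresses already queued, to avoid loops
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:        # dead link: nothing to read, move on
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


pages = crawl("a.example/", TOY_WEB.get)
print(pages)
```

Notice that `e.example/` is never visited. It links out to another page, but nothing links to it, so a Spider that only follows hyperlinks has no way of finding it.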
When you search for a word or phrase, the search engine looks through its index for pages that contain those words. It then lists the pages it finds in order of relevance – with the first item being the one most likely to contain what you are looking for.
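One common way to make this kind of lookup fast is an "inverted index", which maps each word to the pages that contain it. The sketch below assumes that structure; the page addresses, the page text and the function names are invented for the example, and it leaves out ordering by relevance entirely.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a Spider brought back.
PAGES = {
    "ship.example/titanic": "The Titanic sank in 1912 on her maiden voyage",
    "film.example/titanic": "Titanic is a 1997 film about the ship",
    "cook.example/soup": "A recipe for tomato soup",
}


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of addresses whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the addresses that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(PAGES)
print(sorted(search(index, "Titanic")))
print(sorted(search(index, "titanic film")))
```

The index is built once, ahead of time. Answering a search is then a matter of looking up each word, not reading every page again.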

If you and the search engine were having a conversation, it would go like this.

You: Hello Google. Can you please tell me where I might find out about “Titanic”?
Google: I’ve looked in the index of all the pages I’ve visited. I’ve found 13,443,233 webpages that either contain the word “titanic” or are probably related to it. I think the most likely place is here (first page). Second is this place (second page)... and this one has the least chance of being what you are after (last page).

Search Engines never reveal the exact formula they use to rank pages, but some of the main considerations include:

  • The title of the page
  • How many times the phrase appears in the main text of the page
  • Whether the phrase appears in bold on the page
  • How many other pages link to the page
  • The domain name of the website
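
As a rough illustration of how considerations like these could be combined, the sketch below adds up a score for each page. The weights are entirely made up, since the real formulas are secret and far more complicated, and the `Page` fields and example pages are hypothetical too.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str
    text: str
    bold_text: str       # text that appears in bold on the page
    inbound_links: int   # how many other pages link to this one


# Hypothetical weights, chosen only for this illustration.
WEIGHTS = {"title": 5.0, "body": 1.0, "bold": 2.0, "links": 0.5, "domain": 3.0}


def score(page, phrase):
    """Add up one contribution for each consideration in the list."""
    phrase = phrase.lower()
    domain = page.url.split("/")[0].lower()
    total = 0.0
    if phrase in page.title.lower():
        total += WEIGHTS["title"]
    total += WEIGHTS["body"] * page.text.lower().count(phrase)
    if phrase in page.bold_text.lower():
        total += WEIGHTS["bold"]
    total += WEIGHTS["links"] * page.inbound_links
    if phrase in domain:
        total += WEIGHTS["domain"]
    return total


def rank(pages, phrase):
    """Order pages from highest score to lowest."""
    return sorted(pages, key=lambda p: score(p, phrase), reverse=True)


pages = [
    Page("films.example/1997", "Films of 1997",
         "Titanic won many awards.", "", 4),
    Page("titanic-history.example/story", "The Titanic Story",
         "The Titanic was a liner. The Titanic sank in 1912.", "Titanic", 10),
    Page("boats.example/list", "Famous Ships: Titanic",
         "A list of ships.", "", 1),
]

ranked = rank(pages, "titanic")
for page in ranked:
    print(score(page, "titanic"), page.url)
```

A simple sum like this is easy to game, for example by repeating a phrase many times on a page, which is one reason the real formulas are kept secret.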

Caching
Most search engines store a copy of each page they index on their own servers; this copy is called the cache. This allows users to see the content of a page even if the server it is hosted on is down when they want to view it.
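The idea can be sketched in a few lines of Python. Everything here is hypothetical: `PageCache`, `fake_fetch` and the page address are invented for the example, and the "server" is just a flag that we switch off to simulate an outage.

```python
class PageCache:
    """Keeps the most recent copy of each page the search engine has read."""

    def __init__(self):
        self._copies = {}

    def store(self, url, content):
        self._copies[url] = content

    def get(self, url):
        """Return the stored copy, or None if the page was never cached."""
        return self._copies.get(url)


# Simulated website: the flag stands in for whether the real server is up.
server = {"up": True}


def fake_fetch(url):
    if not server["up"]:
        raise ConnectionError(url + " is unreachable")
    return "<h1>Titanic</h1>"


url = "ship.example/titanic"
cache = PageCache()

# While the site is up, the search engine reads the page and keeps a copy.
cache.store(url, fake_fetch(url))

# Later the site goes down, so a live visit fails...
server["up"] = False
try:
    fake_fetch(url)
    live_ok = True
except ConnectionError:
    live_ok = False

# ...but the stored copy can still be shown to the user.
print(live_ok, cache.get(url))
```

The cached copy is only as fresh as the search engine's last visit, so it may be out of date compared with the live page.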
