In the classic 1986 film "Short Circuit", the main character is a cute robot with an acute thirst for knowledge, known only as 'Number 5'.
One of Robot Number 5's most engaging traits is that whenever he stumbles across anything interesting, he rushes off enthusiastically to collect and index the data with cries of "Input! Input!".
Robots designed to collect and index data really do exist, but they are not made of exotic metals: they are software programs, used by Search Engines to gather information about the relevance of your web site to a particular search term.
These robots (sometimes called spiders or crawlers) are smart but not very selective. Unless you give visiting Search Engine robots unambiguous ground rules, excluding them from areas you don't want them to enter, every file on your web site will be treated as "Input!" and is likely to get indexed.
"But," I hear you ask, "I want to get indexed by Search Engines, so why is this a problem?" Indexing everything sounds smart on the surface; however, as part of a coherent web site promotion and Search Engine optimization strategy, it has a number of important disadvantages:
* Search Engine spiders should be actively discouraged from visiting areas where sensitive information might be stored.
* Indiscriminately indexing everything can seriously dilute the relevancy of your web site's overall theme and can produce a sub-optimal rank in Search Engine listings.
* Allowing a Search Engine spider to index everything can even, inadvertently, lead some Search Engines to perceive your web site as containing spam, which can result in your site being blacklisted.
* For multilingual web sites it's imperative to focus English language robots on the relevant English language pages, and to direct robots from international Search Engines, which might be looking for Spanish, German or French language resources, to the appropriately localized content areas of your site.
* Search Engine robots can essentially only "read" text. Dynamic content and graphical components cannot be read or indexed, so a site built largely from them can be effectively invisible to Search Engines.
* Some robots "rapid fire" requests, causing severe server load problems that can degrade your visitors' browsing experience and ultimately cost you business.
The answer to these problems lies in having a robot exclusion file on your web server.
Robot exclusion files, normally in the form of "robots.txt", are plain ASCII text files which reside in the document root directory of a web server (so that they can be fetched as, for example, http://www.example.com/robots.txt) and are used to set access permissions and control the actions of robots or spiders. Most of the major US and international Search Engines deploy spiders which look for a robots.txt file when they visit a web site. There is an agreed industry standard for robots.txt files, the Robots Exclusion Standard, and in order to work as anticipated robots.txt has to be correctly formatted and placed in the proper location on the web server. Once uploaded to your server, robots.txt tells individual spiders which parts of a web site they should not visit or index. Compliance is voluntary: well-behaved robots honour the file, but it does not actually block access, so it is no substitute for password protection of genuinely sensitive material. Used in conjunction with Search Engine optimization tools and/or services, robots.txt can significantly improve your site's chances of that all-important first page listing on the major US and international Search Engines by focusing individual spiders on specific content.
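As a minimal sketch of what such a file looks like, and of how a compliant robot might interpret it, the Python below parses an example robots.txt with the standard library's `urllib.robotparser` module and asks which URLs two robots may fetch. The rules, the paths, the `www.example.com` host and the `ExampleBot` and `OtherBot` user agents are hypothetical, chosen only to illustrate the format. `Allow` and `Crawl-delay` were not part of the original standard, and not every robot honours them.

```python
# A minimal sketch using Python's standard urllib.robotparser module.
# The rules, paths, host and user agents below are hypothetical.
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Group for every robot: keep out of the private area, pace requests
User-agent: *
Disallow: /private/
Crawl-delay: 10

# Group for a hypothetical Spanish-language engine: /es/ only
User-agent: ExampleBot
Allow: /es/
Disallow: /
"""


def robots_url(page_url: str) -> str:
    """robots.txt lives at the root of the host that serves the page."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    print(robots_url("https://www.example.com/es/index.html"))
    rp = build_parser(ROBOTS_TXT)
    for agent in ("OtherBot", "ExampleBot"):
        for path in ("/index.html", "/private/report.html", "/es/index.html"):
            allowed = rp.can_fetch(agent, "https://www.example.com" + path)
            print(f"{agent:<10} {path:<22} {'allowed' if allowed else 'blocked'}")
    # A robot with no group of its own falls back to the "*" group's delay.
    print("Crawl-delay for OtherBot:", rp.crawl_delay("OtherBot"))
```

Python's parser applies the first matching rule within a group, whereas some engines apply the most specific (longest) match instead; listing the narrower `Allow` line before the broad `Disallow` keeps both interpretations in agreement.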
Although it is only a small ASCII text file, robots.txt allows a significant degree of fine tuning to be applied to your Search Engine optimization program. Used intelligently, robots.txt can do a big job, significantly improving your knowledge about, and control of, visiting Search Engine robots. This is particularly the case where a web site owner either wishes to deliver specific content optimized for a particular Search Engine, or has paid for an accelerated Search Engine listing service, where it would be useful to track the activity of the robot associated with that specific paid-for service.
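One way to track a particular robot, sketched here under stated assumptions, is to scan the web server's access log for its user-agent string and note whether it fetched robots.txt. The sketch assumes logs in the Apache/NCSA "combined" format; the `TRACKED_ROBOTS` list, the `ExampleBot` name and the sample log lines (which use documentation IP addresses) are hypothetical.

```python
# A minimal sketch of tracking robot visits in an access log.
# Assumes the Apache/NCSA "combined" log format; TRACKED_ROBOTS and
# SAMPLE_LOG are hypothetical illustrations, not real data.
import re
from collections import Counter

LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Substrings that identify the robots we want to follow.
TRACKED_ROBOTS = ("Googlebot", "bingbot", "ExampleBot")


def robot_activity(lines):
    """Count requests per tracked robot, and how many were for /robots.txt."""
    hits = Counter()
    robots_txt_fetches = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # not a combined-format line; skip it
        agent = match.group("agent").lower()
        for robot in TRACKED_ROBOTS:
            if robot.lower() in agent:
                hits[robot] += 1
                if match.group("path") == "/robots.txt":
                    robots_txt_fetches[robot] += 1
                break  # attribute each request to one robot only
    return hits, robots_txt_fetches


SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [10/Oct/2023:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:14:02:11 +0000] "GET /es/index.html HTTP/1.1" 200 4096 "-" '
    '"ExampleBot/1.0"',
    '198.51.100.23 - - [10/Oct/2023:14:05:00 +0000] "GET /index.html HTTP/1.1" 200 5120 '
    '"https://www.example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

if __name__ == "__main__":
    hits, fetches = robot_activity(SAMPLE_LOG)
    for robot in TRACKED_ROBOTS:
        print(f"{robot}: {hits[robot]} requests, {fetches[robot]} robots.txt fetches")
```

User-agent strings are self-reported and easily forged, so where it matters (for example, checking what a paid-for service actually delivered) it is worth confirming a robot's identity, such as with the reverse DNS lookup that several major Search Engines document for their crawlers.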
Just as Robot Number 5 gathered more and more input and transformed it into useful information, so web site owners can use the data generated by the interaction between robots.txt, visiting spiders and their web logs to gain a significant competitive advantage.
Author: Ken Garner