Robots.txt and Link Exchanges


Robots.txt files are used to instruct search engine robots on which areas of a web site they should not visit or index. For example, you can instruct robots not to view or index your image folder.

This file can be used by shady webmasters to instruct search engine robots not to visit or index the links pages or directory of their web site.

Impact of robots.txt on link exchanges

For link exchanges, the use of robots.txt means that your link on the link partner’s page will not be visited, indexed or followed to your web site by search engine robots. This practice is more “sneaky” than using meta tags, JavaScript or redirected links, because nothing on the links page itself gives it away; the instruction sits in a separate file.

Robots.txt will not stop visitors from coming to your site from this link.

Detecting robots.txt

You need a little understanding of robots.txt to detect its use. First, open the robots.txt file, which sits in the root directory of the web site. For example:

http://www.[domain name].com/robots.txt

If this page does not exist, the web site doesn’t use a robots.txt file, so the webmaster cannot be using one to turn search engine robots away. If it does exist, look for:

User-agent: *
Disallow: /

This means that the web site will instruct ALL search engine robots (User-agent: *) not to visit or index the entire site (Disallow: /).
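As a rough illustration of the rule above, the following Python sketch feeds those two lines to the standard library's urllib.robotparser and asks whether a few robots may fetch a links page. The domain example.com, the page path and the robot names are placeholders chosen for this example, not anything taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule set discussed above, as it would appear in robots.txt.
rules = [
    "User-agent: *",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)  # parse in-memory lines; no network request is made

# Placeholder URL and robot names, used only for illustration.
url = "http://www.example.com/links/partners.html"
for robot in ("Googlebot", "Bingbot", "SomeOtherBot"):
    allowed = parser.can_fetch(robot, url)
    print(robot, "allowed" if allowed else "blocked")
```

Because the rule applies to every user agent and every path, each robot in the loop should be reported as blocked.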

The “User-agent” may be targeted to specific search engine robots, for example

User-agent: Googlebot

The “Disallow” may also be targeted specifically to the links directory, for example

Disallow: /links

This Disallow example blocks every URL that starts with http://www.[domain name].com/links, so it covers a page such as /links.html as well as everything inside a /links/ directory.
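The targeted rules can be tested in the same spirit. The sketch below assumes a robots.txt that combines the two examples (Googlebot only, /links only) and uses Python's urllib.robotparser to show which paths are affected. The helper is_blocked, the domain example.com and the sample paths are all hypothetical names introduced for this illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content combining the two examples above.
RULES = [
    "User-agent: Googlebot",
    "Disallow: /links",
]

# example.com stands in for the link partner's domain.
SITE = "http://www.example.com"


def is_blocked(robot: str, path: str) -> bool:
    """Return True if the rules tell `robot` not to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(RULES)  # parse the in-memory lines; no network access
    return not parser.can_fetch(robot, SITE + path)


# Disallow matches by prefix, so anything starting with /links is covered.
for path in ("/links", "/links/page2.html", "/links.html", "/about.html"):
    print(path, "blocked" if is_blocked("Googlebot", path) else "allowed")

# Robots not named in a User-agent line are unaffected by this rule set.
print("Bingbot on /links:", "blocked" if is_blocked("Bingbot", "/links") else "allowed")
```

To check a live site instead of in-memory lines, RobotFileParser also provides set_url() and read(), which download the robots.txt file before you call can_fetch().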

1 Comment »

  1. Hey, it might be nice to mention here that there are also positive elements of robots.txt that you can use.

    For example you could use a line to tell robots where your sitemap is:

    Sitemap: http://www.[domain name].com/site-map.xml

    Comment by Chris — May 19, 2008
