Tuesday, July 19, 2011

How to write a robots.txt file

In a text editor, create a file named robots.txt. Note that the name must be all lower case, even if your Web pages are hosted on a Windows Web server. You'll need to save this file to the root of your Web server. For example:

http://www.mydomain.com/robots.txt
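
You can also check a live robots.txt file programmatically. Here's a minimal sketch using Python's standard urllib.robotparser module (www.mydomain.com is just the placeholder from the example above; point it at a real site before running it):

from urllib import robotparser

# Point the parser at the robots.txt file at the root of the server
rp = robotparser.RobotFileParser()
rp.set_url("http://www.mydomain.com/robots.txt")
rp.read()  # fetches and parses the file

# Ask whether a given robot may crawl a given URL
print(rp.can_fetch("Googlebot", "http://www.mydomain.com/index.html"))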

The format of the robots.txt file is:

User-agent: the name of the robot the record applies to

Disallow: the files or directories that robot must not crawl
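
To see this format in action without touching a live server, you can feed the lines straight to urllib.robotparser; the /private/ folder here is made up for the illustration:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",        # applies to every robot
    "Disallow: /private/",  # robots must not crawl this (made-up) folder
])
print(rp.can_fetch("AnyBot", "http://www.mydomain.com/private/page.html"))  # False
print(rp.can_fetch("AnyBot", "http://www.mydomain.com/index.html"))         # True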

You can use a wildcard to address all robots at once:

User-agent: *

Note that * on its own is the only wildcard the User-agent line supports; a pattern such as A* will not work. To address a whole family of robots, rely on substring matching instead: robots compare the User-agent value against their own name as a case-insensitive substring, so a record for Googlebot also applies to Googlebot-Image.
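
urllib.robotparser implements this substring match, so a quick sketch can confirm that a record addressed to Googlebot also binds Googlebot-Image (the /photos/ folder is hypothetical):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Disallow: /photos/",
])
# The Googlebot record applies to Googlebot-Image by substring match
print(rp.can_fetch("Googlebot-Image", "http://www.mydomain.com/photos/a.jpg"))  # False
# A robot with no matching record is unrestricted
print(rp.can_fetch("SomeOtherBot", "http://www.mydomain.com/photos/a.jpg"))     # True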

The disallow lines can specify files or directories:
Don't allow robots to view any files on the site:

Disallow: /

Don't allow robots to view the index.html file:

Disallow: /index.html
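
A short urllib.robotparser check shows the effect of blocking a single file (about.html is just a made-up page):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /index.html",
])
print(rp.can_fetch("AnyBot", "http://www.mydomain.com/index.html"))  # False
print(rp.can_fetch("AnyBot", "http://www.mydomain.com/about.html"))  # True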

If you leave the Disallow value blank, all files can be retrieved. For example, you might want Googlebot to see everything on your site:

User-agent: Googlebot

Disallow:
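
You can verify that an empty Disallow blocks nothing (the page name is illustrative):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Disallow:",  # empty value: nothing is blocked
])
print(rp.can_fetch("Googlebot", "http://www.mydomain.com/anything.html"))  # True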

If you disallow a directory, then all files below it will be disallowed as well:

Disallow: /norobots/
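
With /norobots/ disallowed, everything underneath it is off limits too, as a quick sketch confirms (the page names are made up):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /norobots/",
])
print(rp.can_fetch("AnyBot", "http://www.mydomain.com/norobots/deep/page.html"))  # False
print(rp.can_fetch("AnyBot", "http://www.mydomain.com/visible.html"))             # True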

You can also use multiple Disallow lines for one User-agent, to deny access to multiple areas:

User-agent: *

Disallow: /cgi-bin/

Disallow: /images/
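
Each Disallow line adds another blocked area, which a short check makes visible (the file names are illustrative):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /images/",
])
print(rp.can_fetch("AnyBot", "http://www.mydomain.com/cgi-bin/form.cgi"))  # False
print(rp.can_fetch("AnyBot", "http://www.mydomain.com/images/logo.gif"))   # False
print(rp.can_fetch("AnyBot", "http://www.mydomain.com/about.html"))        # True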

You can include comments in your robots.txt file by putting a pound sign (#) at the front of the line to be commented:

# Allow Googlebot anywhere

User-agent: Googlebot

Disallow:

A robot obeys the most specific User-agent record that matches it. For example, if you address Googlebot by name in one record, Googlebot will follow that record and ignore a record lower down that is addressed to the wildcard *:

# Allow Googlebot anywhere

User-agent: Googlebot

Disallow:

# Allow no other bots on the site

User-agent: *

Disallow: /
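
Python's urllib.robotparser resolves the records the same way, so you can confirm the behavior of this exact file:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "# Allow Googlebot anywhere",
    "User-agent: Googlebot",
    "Disallow:",
    "",
    "# Allow no other bots on the site",
    "User-agent: *",
    "Disallow: /",
])
print(rp.can_fetch("Googlebot", "http://www.mydomain.com/index.html"))     # True
print(rp.can_fetch("SomeOtherBot", "http://www.mydomain.com/index.html"))  # False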
