A robots.txt file is a simple text file you can create to tell search engine spiders (robots) that you don't want them to crawl your site, or certain parts of it.
1.Open a text editor. It doesn't matter which one you use; on a PC, Notepad (found under "Accessories") works just fine.
2.To block a particular robot from particular files or directories, add an entry in the following format (you can list as many Disallow lines as you need):
User-Agent: [Spider or Bot name]
Disallow: [Directory or File Name]
Disallow: [Directory or File Name]
For example:
User-Agent: Googlebot
Disallow: /mywebsite/private.html
where "Googlebot" is the robot sent out by Google, and "private.html" is the file in the directory "mywebsite" that you do not want the robot to index.
3.To block all robots from a particular file, use an asterisk as the User-Agent:
User-Agent: *
Disallow: /mywebsite/private.html
4.To block all robots from your entire site, disallow the root directory:
User-Agent: *
Disallow: /
5.If you want to allow all robots to access your whole site, simply add the asterisk as before, and leave the Disallow section empty, as follows:
User-Agent: *
Disallow:
6.Save the file as robots.txt and place it in the root directory of your website, so that it can be found at, for example, http://www.mywebsite.com/robots.txt.
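Once the file is in place, you can sanity-check your rules before a crawler ever sees them. The short Python sketch below uses the standard library's urllib.robotparser to parse the Googlebot example from earlier and ask whether a given URL may be fetched; the www.mywebsite.com addresses are just the placeholder site used in this post.

from urllib.robotparser import RobotFileParser

# The rules from the Googlebot example above (placeholder site).
rules = """\
User-Agent: Googlebot
Disallow: /mywebsite/private.html
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# The private page is blocked for Googlebot; the rest of the site is not.
print(rp.can_fetch("Googlebot", "http://www.mywebsite.com/mywebsite/private.html"))  # False
print(rp.can_fetch("Googlebot", "http://www.mywebsite.com/index.html"))              # True

In a real setup you would point the parser at your live file with set_url("http://www.mywebsite.com/robots.txt") followed by read(), instead of passing the rules in as a string.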