Limiting Spider Access with robots.txt
Spiders and crawlers that access your website use your bandwidth, just like the real visitors who browse your site.
On a WordPress site they also trigger PHP calls, which increases your site’s CPU load and memory use. This is all normal, except when you see certain bots hitting your site a few hundred times a day. In that case you may want to deny them access using your robots.txt file. Hopefully they will follow the rules and leave you alone. If they don’t, you may need to ban them by IP address. Try your robots.txt file first, though, and see what happens. It usually works.
Here’s an example of a robots.txt file that disallows access to specific crawlers. Your robots.txt file should be in your website’s root directory. It is just a plain text file containing something similar to the following.
# this one blocks all robots from the cgi-bin folder
User-agent: *
Disallow: /cgi-bin/

# this one blocks MJ12bot from all folders
User-agent: MJ12bot
Disallow: /

# this one blocks Yandex
User-agent: Yandex
Disallow: /

# This one keeps all robots out.
# User-agent: *
# Disallow: /
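Before uploading the file, it can help to check that the rules do what you expect. The following is a minimal sketch using Python's standard `urllib.robotparser` module. The `ROBOTS_TXT` string repeats the example rules, and `is_allowed` is a hypothetical helper name chosen for illustration. Real crawlers may interpret edge cases differently from this parser, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The example rules, one directive per line (copied here so the
# script is self-contained; adjust to match your own file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/

User-agent: MJ12bot
Disallow: /

User-agent: Yandex
Disallow: /
"""


def is_allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if the rules let `user_agent` fetch `path`.

    `user_agent` should be the crawler's short token (e.g. "MJ12bot"),
    not the full User-Agent header it sends.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent, path in [
        ("MJ12bot", "/"),
        ("Yandex", "/blog/"),
        ("Googlebot", "/blog/"),
        ("Googlebot", "/cgi-bin/test.cgi"),
    ]:
        print(agent, path, is_allowed(agent, path))
```

A crawler that matches its own `User-agent` group follows only that group, while every other crawler falls back to the `*` group, which is why Googlebot in this sketch is kept out of `/cgi-bin/` only.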
If you have a WordPress site, using a plugin like StatPress Reloaded will give you a list of spiders that are visiting your site and you can see how often they are crawling and what impact they have.
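If you are not running WordPress, or would rather look at the raw server log, a rough count of crawler hits can be pulled from the access log instead. The sketch below assumes the log uses the common "combined" format, where the user agent is the last quoted field on each line. The `BOT_TOKENS` list and the `count_bot_hits` function are hypothetical names for illustration, and the sample lines are made up, so adapt both to your own server.

```python
import re
from collections import Counter

# Hypothetical list of crawler name fragments to look for.
BOT_TOKENS = ["MJ12bot", "Yandex", "Googlebot", "bingbot"]

# Assumes the user agent is the last double-quoted field on the line,
# as in the "combined" access log format.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_bot_hits(log_lines, tokens=BOT_TOKENS):
    """Count log lines per crawler token (case-insensitive substring match)."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # line does not end with a quoted field; skip it
        user_agent = match.group(1).lower()
        for token in tokens:
            if token.lower() in user_agent:
                counts[token] += 1
                break  # count each request once
    return counts


if __name__ == "__main__":
    # Made-up sample lines; in practice, iterate over your log file.
    sample = [
        '192.0.2.1 - - [01/Jan/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8)"',
        '192.0.2.2 - - [01/Jan/2024:10:00:05 +0000] "GET /about HTTP/1.1" 200 734 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/120.0"',
    ]
    for bot, hits in count_bot_hits(sample).most_common():
        print(bot, hits)
```

Any bot that shows up with hundreds of hits a day is a candidate for a `Disallow` rule in robots.txt.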




