Block bots from crawling your application.

Search engines like Google and Bing run bots (Googlebot and Bingbot) that crawl your web application so that its pages can be indexed for search. But you might not want these bots to crawl your app. How do you stop them?

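Bots look for a file called robots.txt at the root of your site, so it has to be served at /robots.txt. Where it lives on disk depends on your server setup; it is usually something like /home/username/public/robots.txt. To block every bot from the whole site, put this in the file: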

User-Agent: *
Disallow: /

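User-Agent: * means the rules that follow apply to all bots.
Disallow: / means those bots may not crawl any path on the site.
Keep in mind that robots.txt is a request, not an access control: well-behaved bots such as Googlebot and Bingbot honor it, but nothing forces a bot to.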

User-Agent: *
Disallow:

An empty Disallow blocks nothing, so this allows all the bots (*) to crawl the whole site.

User-Agent: Googlebot
Disallow: /

This blocks only Googlebot from crawling the site. Other bots are not affected.
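If you want to check how a rule like this is interpreted before deploying it, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the example.com URL are made up for illustration, and Python's parser is only one implementation, so real crawlers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The Googlebot-only rule from above.
GOOGLEBOT_ONLY = """\
User-Agent: Googlebot
Disallow: /
"""

# example.com is a placeholder domain, not a real deployment.
print(is_allowed(GOOGLEBOT_ONLY, "Googlebot", "https://example.com/page"))
print(is_allowed(GOOGLEBOT_ONLY, "Bingbot", "https://example.com/page"))
```

The first call should report that Googlebot is blocked, and the second that Bingbot is still allowed, since no rule mentions it.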

Now what if you want to block only a particular file or folder from these bots? Here is how you could do that:

User-agent: *
Disallow: /path_to_some_file/filename.pdf
Disallow: /logs/
Disallow: /tmp/

This tells all the bots not to crawl the file /path_to_some_file/filename.pdf or anything under the /logs/ and /tmp/ directories. Everything else on the site can still be crawled.
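To see which paths these rules cover, here is a small sketch using Python's standard urllib.robotparser module. The function name blocked_paths, the bot name SomeBot and the sample paths are assumptions for illustration only. Python's parser matches rules by simple path prefix, which is enough for the rules shown here.

```python
from urllib.robotparser import RobotFileParser

# The file and folder rules from above.
PATH_RULES = """\
User-agent: *
Disallow: /path_to_some_file/filename.pdf
Disallow: /logs/
Disallow: /tmp/
"""


def blocked_paths(robots_txt: str, user_agent: str, paths: list[str]) -> list[str]:
    """Return the subset of paths that user_agent may not crawl (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# Sample paths, invented for this example.
sample = [
    "/index.html",
    "/path_to_some_file/filename.pdf",
    "/logs/app.log",
    "/tmp/cache/data.bin",
]
print(blocked_paths(PATH_RULES, "SomeBot", sample))
```

Only the PDF and the paths under /logs/ and /tmp/ should come back as blocked; /index.html stays crawlable.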

