Block bots from crawling your application.

Search engines such as Google and Bing use bots (Googlebot and Bingbot) that crawl your web application so its pages can be indexed and shown in search results. Sometimes, though, you do not want these bots to crawl your app. How do you stop them?

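The rules live in a file called robots.txt in the root directory of your site, for example /home/username/public/robots.txt, so that it is served at /robots.txt. The exact filesystem path depends on your hosting setup. Crawlers that honor the standard read this file before crawling. To block every bot from the whole site, it should contain: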

User-Agent: *
Disallow: /

User-Agent: * matches all bots.
Disallow: / tells the matched bots not to crawl any path on the site.

User-Agent: *
Disallow:

Leaving Disallow empty allows all bots (*) to crawl the whole site.
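One way to sanity-check these two rule sets before deploying them is Python's standard-library urllib.robotparser module. The sketch below is only an illustration: the helper name is_allowed and the example.com URL are made up for this post, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Hypothetical helper: True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


BLOCK_ALL = "User-Agent: *\nDisallow: /\n"
ALLOW_ALL = "User-Agent: *\nDisallow:\n"

url = "https://example.com/index.html"  # placeholder URL, not a real app

print(is_allowed(BLOCK_ALL, "Googlebot", url))  # False
print(is_allowed(ALLOW_ALL, "Googlebot", url))  # True
```

Parsing the rules from a string keeps the check local, so you can try out a change without touching the live file.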

User-Agent: Googlebot
Disallow: /

This blocks only Googlebot from crawling the site; other bots are unaffected.
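To see that a rule aimed at one bot leaves the others alone, the same rules can be fed to Python's urllib.robotparser. This is a rough sketch, not how Google itself evaluates the file; the page URL is a placeholder and "Bingbot" simply stands in for any other crawler.

```python
from urllib.robotparser import RobotFileParser

GOOGLEBOT_ONLY = """\
User-Agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(GOOGLEBOT_ONLY.splitlines())

page = "https://example.com/about"  # placeholder URL

# Only the bot named in the rule group is blocked.
for bot in ("Googlebot", "Bingbot"):
    print(bot, parser.can_fetch(bot, page))
# Googlebot False
# Bingbot True
```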

What if you want to block only a particular file or folder from these bots? Here is how you could do that:

User-Agent: *
Disallow: /path_to_some_file/filename.pdf
Disallow: /logs/
Disallow: /tmp/

This stops all bots from crawling /path_to_some_file/filename.pdf and anything under the /logs/ and /tmp/ directories. Everything else stays crawlable.
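The path rules above can be checked the same way. In this sketch the file names under /logs/ and /tmp/, the /index.html page, and the example.com host are invented for illustration; only the three Disallow lines come from the example above.

```python
from urllib.robotparser import RobotFileParser

PATH_RULES = """\
User-Agent: *
Disallow: /path_to_some_file/filename.pdf
Disallow: /logs/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(PATH_RULES.splitlines())

# Hypothetical paths: three that match a Disallow prefix, one that does not.
paths = [
    "/path_to_some_file/filename.pdf",
    "/logs/app.log",
    "/tmp/cache/session",
    "/index.html",
]

results = {p: parser.can_fetch("Bingbot", "https://example.com" + p) for p in paths}
for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")
# Only /index.html is reported as allowed.
```

Each Disallow value is matched as a prefix of the URL path, which is why /logs/ covers every file beneath that directory.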
