What the French toast is a robots.txt file?
If you’re like me, you’ve probably asked this question shortly after digging into working with websites. My journey into SEO led me to ask this very question. Maybe you’re on the same path I was, or maybe you’re on a completely different one and we just happen to be meeting at the robots.txt trail junction. Either way, I’ve done my homework, and I’m here to help you where no one was there to help me.
Robots.txt is in no way as cool as it sounds. I’ll be honest: I thought it was a complicated file full of directions robots could use to take over the world. As it turns out, I was somewhat correct, just not about the world domination part. A robots.txt file does tell robots what to do; they just happen to be boring robots, the kind search engines use to crawl websites. The robots.txt file tells these crawlers which pages (and/or files) they can or can’t request from your website.
So how does robots.txt work?
When a crawler reaches a new website, it checks the robots.txt file first. Think of it like MapQuest (does anyone still use MapQuest?) for search engine crawlers. The robots.txt file manages the traffic, pointing crawlers toward important pages and away from duplicate or unimportant ones. This leaves the crawler more time to analyze the important stuff on your website.
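To make that concrete, here’s a minimal sketch of what a robots.txt file looks like. It’s just a plain text file that lives at the root of your site; the paths and sitemap URL below are made up for illustration:

```
User-agent: *
Disallow: /duplicates/
Disallow: /admin/
Allow: /

Sitemap: https://www.example.com/sitemap.xml
```

`User-agent: *` means the rules apply to all crawlers, `Disallow` lines steer them away from paths you don’t want crawled, and the optional `Sitemap` line points them straight to your list of important pages.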
Now, I know I literally just told you that a robots.txt file tells crawlers which pages (and/or files) they can or can’t request from your website. However (and this is a big however), it should not be used to keep a crawler from indexing and displaying your website on Google (or other search engines). That should be done with a noindex tag or directive instead. Even though you told Google “DON’T LOOK HERE DUDE,” your page can still be indexed via links from other pages. If this happens, your page can still be shown in search results, and it’s not pretty. Just look at it
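For reference, the noindex directive is a meta tag that goes in the `<head>` of the page you want kept out of search results:

```html
<meta name="robots" content="noindex">
```

One catch: the crawler has to be able to fetch the page to see this tag, so don’t block that same page in robots.txt, or the noindex will never be read.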
shudders. Are you as ashamed of this as I am? Well, you should be! I know I’m not clicking that search result, and neither are you.
To recap: a robots.txt file cannot help robots take over the world. Instead, it is used to let search engine crawlers know which pages (and/or content) are important and should be crawled and analyzed. It’s very important to remember that it should not be used to keep a website from being indexed and shown on a search results page; use the noindex tag for that.
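If you want to sanity-check how your rules will be read, Python’s standard library ships a robots.txt parser, `urllib.robotparser`. Here’s a small sketch that feeds it the hypothetical rules from earlier and asks what a crawler may fetch:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block everything under /duplicates/ for all crawlers.
rules = """\
User-agent: *
Disallow: /duplicates/
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A blocked path and an allowed path (example.com URLs are placeholders).
print(parser.can_fetch("*", "https://www.example.com/duplicates/page.html"))  # False
print(parser.can_fetch("*", "https://www.example.com/index.html"))            # True
```

In practice you’d point the parser at a live file with `set_url(...)` and `read()`, but parsing the lines directly like this is handy for testing rules before you publish them.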