June 12th, 2013, 11:05 AM
Question regarding robots.txt
Hey, this is probably a stupid question (the best kind!) but I keep getting conflicting answers, so I hope someone here can help.
Let's say I have this directory: www.mysite.com/folder1/page.php
And in my robots.txt file, which is located at www.mysite.com/robots.txt, I have disallowed indexing for /folder1/.
Now, this would normally disallow indexing for anything within /folder1, right? But what if page.php has this metatag: <meta name="robots" content="index, follow">
Which takes precedence? I contend that since I've disallowed indexing at the top level, page.php will never be indexed. Other people tell me I'm wrong, and that the <meta name="robots" content="index, follow"> metatag will override the robots.txt file and cause the page to be indexed.
Does anyone know the answer? Thanks!
June 13th, 2013, 12:35 AM
The robots.txt file takes precedence over meta tags. You're already telling the bot not to crawl content in /folder1, so the crawler will likely never reach /folder1/page.php to read the meta tags there.
A better way to do this is
The robots.txt file tells the crawler what it may fetch, but your pages can still be indexed if they're linked from some other source that Google happens to crawl.
June 13th, 2013, 05:22 AM
You can use this code in your website's .htaccess file.
June 13th, 2013, 08:30 PM
robots.txt always takes precedence
Google's order of operations:
robots.txt at crawl level
page read and meta tag at index level
won't even get to index level if crawl level is disallowed
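To illustrate the order of operations above, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the "ExampleBot" user agent, and the crawl_decision helper are assumptions made up for illustration; this is not how any particular search engine is implemented, only a model of "crawl check first, meta tag second".

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt for the site in the question (hypothetical content).
ROBOTS_TXT = """\
User-agent: *
Disallow: /folder1/
"""


def crawl_decision(robots_txt, url, user_agent="ExampleBot"):
    """Model the two levels: robots.txt is checked before the page is fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # Crawl level: a well-behaved bot asks robots.txt before requesting the URL.
    if not parser.can_fetch(user_agent, url):
        # The page is never downloaded, so its meta robots tag is never seen.
        return "blocked at crawl level; meta tag never read"

    # Index level: only now would the bot fetch the page and read its meta tag.
    return "fetched; meta tag is read at index level"


print(crawl_decision(ROBOTS_TXT, "http://www.mysite.com/folder1/page.php"))
print(crawl_decision(ROBOTS_TXT, "http://www.mysite.com/other/page.php"))
```

The first URL is stopped by the Disallow rule, so whatever page.php says in its meta tag makes no difference to a bot that obeys robots.txt.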
June 18th, 2013, 05:32 AM
First of all, decide whether you want this file indexed or not. If you do, allow it in robots.txt. If you don't, remove the robots meta tag from the page source. There is no logic in disallowing the page in robots.txt while also giving it an "index" meta tag.
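If you want to spot that kind of contradiction on your own site, a rough sketch along these lines might help. It uses only Python's standard library; the RobotsMetaFinder class, the is_contradictory helper, and the "ExampleBot" user agent are hypothetical names invented for this example, and it assumes you already have the robots.txt text and the page HTML as strings.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from a <meta name="robots"> tag, if present."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives = [d.strip().lower() for d in content.split(",")]


def is_contradictory(robots_txt, url, html, user_agent="ExampleBot"):
    """True when robots.txt blocks the URL but the page asks to be indexed."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    finder = RobotsMetaFinder()
    finder.feed(html)

    blocked = not rules.can_fetch(user_agent, url)
    return blocked and "index" in finder.directives


page = '<html><head><meta name="robots" content="index, follow"></head></html>'
rules = "User-agent: *\nDisallow: /folder1/"
print(is_contradictory(rules, "http://www.mysite.com/folder1/page.php", page))
```

A True result means the two settings disagree, and one of them should be changed depending on whether you want the page indexed.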