#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2012
    Posts
    6
    Rep Power
    0

    Question regarding robots.txt


    Hey, this is probably a stupid question (the best kind!) but I keep getting conflicting answers, so I hope someone here can help.

    Let's say I have this directory: www.mysite.com/folder1/page.php

    And in my robots.txt file, which is located at www.mysite.com/robots.txt, I have disallowed indexing for /folder1/.

    Now, this would normally disallow indexing for anything within /folder1, right? But what if page.php has this metatag: <meta name="robots" content="index, follow">

    Which takes precedence? I contend that since I've disallowed indexing at the top level, page.php will never be indexed. Other people tell me I'm wrong, and that the <meta name="robots" content="index, follow"> metatag will override the robots.txt file and cause the page to be indexed.

    Does anyone know the answer? Thanks!
  2. #2
  3. No Profile Picture
    Online Strategist
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2012
    Location
    Moratuwa, Sri Lanka
    Posts
    112
    Rep Power
    24
    The robots.txt file takes precedence over meta tags. You're already telling the bot not to crawl anything in /folder1/, so the crawler will likely never fetch /folder1/page.php and never read the meta tags there.
    A better way to do this is:
    Disallow: /folder1/
    Allow: /folder1/page.php

    The robots.txt file tells the crawler what it may fetch, but your pages can still be indexed if they're linked from some other source that Google happens to crawl.
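
For anyone who wants to see how the Disallow/Allow pair above resolves, here is a minimal sketch of the "most specific rule wins" matching described in the Robots Exclusion Protocol (RFC 9309): the longest matching path decides, and a tie goes to Allow. The function name `is_allowed` and the `rules` list are made up for illustration, and the sketch ignores wildcards and user-agent groups, so treat it as a rough model rather than how any particular crawler is implemented.

```python
def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, path_prefix) tuples, for example
    ("Disallow", "/folder1/"). Hypothetical helper: plain prefix matching
    only, no wildcard or user-agent handling.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        # An empty rule path or a non-matching prefix is skipped.
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # Longest match wins; on a tie, Allow is preferred.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("Disallow", "/folder1/"), ("Allow", "/folder1/page.php")]
print(is_allowed(rules, "/folder1/page.php"))   # True
print(is_allowed(rules, "/folder1/other.php"))  # False
```

Note that some parsers apply the first matching rule instead of the longest one, so listing the Allow line before the Disallow line is the safer order if you need to support those as well.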
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    56
    Rep Power
    0
    You can use these directives in your website's robots.txt file (they do not belong in .htaccess):

    Disallow: /folder1/
    Allow: /folder1/page.php
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    10
    Rep Power
    0
    robots.txt always takes precedence.

    Google's order of operations:

    robots.txt is checked at the crawl level

    the page is read and the meta tag is applied at the index level

    The crawler won't even get to the index level if the crawl level is disallowed.
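
The two-stage order above can be sketched roughly like this. It is a simplified model under my own assumptions: the function name `indexing_outcome` and the returned strings are hypothetical, and it does not cover the case where a blocked URL still gets listed because other sites link to it.

```python
def indexing_outcome(crawl_allowed, meta_robots=None):
    """Model the crawl-then-index order for a single page.

    `crawl_allowed` is the robots.txt verdict for the URL; `meta_robots`
    is the content of the page's robots meta tag, if it has one.
    """
    # Stage 1: robots.txt. A blocked page is never fetched,
    # so its meta tag is never seen.
    if not crawl_allowed:
        return "not crawled; meta tag never read"

    # Stage 2: the page was fetched, so the meta tag now applies.
    directives = {d.strip().lower() for d in (meta_robots or "").split(",")}
    if "noindex" in directives or "none" in directives:
        return "crawled; excluded from index"
    return "crawled; eligible for indexing"


# The situation from the original question: /folder1/ is disallowed,
# but page.php carries "index, follow".
print(indexing_outcome(False, "index, follow"))
# not crawled; meta tag never read
```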
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    13
    Rep Power
    0
    First of all, decide whether you want this file indexed or not. If you do, allow it in robots.txt. If you don't, remove the robots meta tag from the page source. There is no logic in disallowing the page in robots.txt while also adding a meta tag that says to index it.
