#1
robots.txt should not have regular expressions...but many do?!?
From Here
Quote:
So why do some sites use *'s in their Disallow lines? Check out sun, espn, etc. I am writing a small-scale web robot, and I am just curious what to do when I come to a Disallow line with a * in it. Should I just ignore it since it does not comply? I check META tags too, so if a page has a noindex tag I won't index it, but most sites do not use these.
#2
Well, as far as I am concerned that web site is the standard for compliance. However, you will likely find that individual robots / spiders have created their own extensions to it, which is likely where the use of * anywhere in a Disallow line comes from.
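For a small robot, one way to cope with those lines is to honor the common extension rather than ignore it. Below is a minimal sketch in Python, assuming the convention some large crawlers document: `*` matches any run of characters and a trailing `$` anchors the end of the path. That convention is not part of the original standard, and the function names `disallow_to_regex` and `is_disallowed` are made up for this example.

```python
import re

def disallow_to_regex(pattern):
    """Compile one Disallow value into a regex.

    Assumption (an extension, not the original standard):
    '*' matches any run of characters, a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then rejoin the pieces with '.*'
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(path, disallow_values):
    """Return True if path is blocked by any of the Disallow values."""
    for value in disallow_values:
        if not value:
            continue  # an empty Disallow line blocks nothing
        # re.match anchors at the start, which gives the usual prefix matching
        if disallow_to_regex(value).match(path):
            return True
    return False

print(is_disallowed("/docs/report.pdf", ["/*.pdf$"]))
print(is_disallowed("/private/index.html", ["/private/"]))
```

A plain value with no `*` still behaves as a simple prefix match, so lines that do follow the standard are handled the same way as before. Erring toward treating a wildcard line as a block is the more polite choice for a robot, since it avoids crawling something the site owner meant to exclude.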
#3
You might find your answer at
http://www.searchengineworld.com/ro...ts_tutorial.htm

BUT I've got one for you: what should the permissions on the robots.txt file be? Tks
#4
-rw-r--r--
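That string is octal mode 644: read/write for the owner, read-only for group and everyone else. A minimal sketch of setting and checking it from Python follows; the helper names `describe_mode` and `make_world_readable` and the constant `ROBOTS_MODE` are made up for this example, and it assumes a Unix-like host.

```python
import os
import stat

ROBOTS_MODE = 0o644  # owner rw, group r, others r

def describe_mode(mode):
    """Render permission bits the way `ls -l` shows them for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)

def make_world_readable(path):
    """Apply mode 644 to the file at path (for example a robots.txt)."""
    os.chmod(path, ROBOTS_MODE)

print(describe_mode(ROBOTS_MODE))
```

From a shell, `chmod 644 robots.txt` does the same thing.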