Dev Shed Lounge
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
User Name:
Password:
Remember me
Go Back   Dev Shed ForumsOtherDev Shed Lounge

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Rate Thread Display Modes
 
Unread Dev Shed Forums Sponsor:
  #1  
Old July 22nd, 2003, 12:38 PM
quiknot quiknot is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jul 2003
Posts: 10 quiknot User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
robots.txt should not have regular expressions...but many do?!?

From Here
Quote:
Note also that regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "Disallow: /tmp/*" or "Disallow: *.gif".

So why do some sites use *'s in their Disallow lines? Check out sun, espn, etc.
I am writing a small scale web robot, and I am just curious what to do when I come to a disallow line with a * in it. Should I just ignore it since it does not comply? I check META tags too, so if has a noindex tag I won't index it, but most sites do not use these.

Reply With Quote
  #2  
Old July 22nd, 2003, 01:12 PM
SilkySmooth's Avatar
SilkySmooth SilkySmooth is offline
Newbie :P
Dev Shed Frequenter (2500 - 2999 posts)
 
Join Date: Jan 2001
Location: In the PHP Engine :-)
Posts: 2,880 SilkySmooth User rank is Sergeant (500 - 2000 Reputation Level)SilkySmooth User rank is Sergeant (500 - 2000 Reputation Level)SilkySmooth User rank is Sergeant (500 - 2000 Reputation Level)SilkySmooth User rank is Sergeant (500 - 2000 Reputation Level)SilkySmooth User rank is Sergeant (500 - 2000 Reputation Level) 
Time spent in forums: 11 h 32 m 23 sec
Reputation Power: 15
Well as far as I am concerned that web site is the standard for compliance, however you will likely find that specific individual robots / spiders have created their own standards which is likely where the use of * anywhere comes into play.
__________________
---------------------
-- SilkySmooth --
---------------------
Proxy | Little Directory

Reply With Quote
  #3  
Old August 15th, 2003, 04:49 PM
grand grand is offline
Junior Member
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jul 2003
Posts: 19 grand User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
You might find your answer at
http://www.searchengineworld.com/ro...ts_tutorial.htm

BUT I've got one for you:
What should be the permission or Robots.txt file??

Tks

Reply With Quote
  #4  
Old August 15th, 2003, 04:54 PM
drgroove's Avatar
drgroove drgroove is offline
pushing envelopes, not pencils
Dev Shed God 2nd Plane (6000 - 6499 posts)
 
Join Date: Feb 2002
Posts: 6,225 drgroove User rank is First Lieutenant (10000 - 20000 Reputation Level)drgroove User rank is First Lieutenant (10000 - 20000 Reputation Level)drgroove User rank is First Lieutenant (10000 - 20000 Reputation Level)drgroove User rank is First Lieutenant (10000 - 20000 Reputation Level)drgroove User rank is First Lieutenant (10000 - 20000 Reputation Level)drgroove User rank is First Lieutenant (10000 - 20000 Reputation Level)drgroove User rank is First Lieutenant (10000 - 20000 Reputation Level)drgroove User rank is First Lieutenant (10000 - 20000 Reputation Level) 
Time spent in forums: 1 Day 4 h 48 m 15 sec
Reputation Power: 174
-rw-r--r--
__________________
Give a person code, and they'll hack for a day; Teach them how to code, and they'll hack forever.
Analyze twice; hack once.
The world's first existential ITIL question: If a change is released into production without a ticket to track it,
was it actually released?


About DrGroove: ITIL-Certified IT Process Engineer - Enterprise Application Architect -
Freelance IT Journalist - Devshed Moderator - Funk Bassist Extraordinaire


Reply With Quote
Reply

Viewing: Dev Shed ForumsOtherDev Shed Lounge > robots.txt should not have regular expressions...but many do?!?


Thread Tools  Search this Thread 
Search this Thread:

Advanced Search
Display Modes  Rate This Thread 
Rate This Thread:


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
View Your Warnings | New Posts | Latest News | Latest Threads | Shoutbox
Forum Jump



 Free IT White Papers!
 
How to Present Effectively Online
This white paper offers practical and actionable advice on the key steps that any presenter should consider as they plan and execute a Webinar or online meeting.

 
Open Source Security Myths
Open Source Software (OSS) is computer software whose source code is available to the general public with relaxed or non-existent intellectual property restrictions (or arrangement such as the public domain), and is usually developed with the input of many contributors.

 
Power and Cooling Capacity Management for Data Centers
This paper describes the principles for achieving power and cooling capacity management.

 
Scalable, Fault-Tolerant NAS for Oracle - The Next Generation
For several years NAS has been evolving as a storage alternative for Oracle databases, and for good reason: NAS is quite often the simplest, most cost-effective storage approach for Oracle. Learn about the benefits that HP's approach to scalable NAS brings to Oracle environments in this comprehensive white paper.

 
Understanding Web Application Security Challenges
This white paper discusses many common threats and preventive measures for Web application security, and explains what you can do to help protect your organization.

 

Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
  
 





© 2003-2009 by Developer Shed. All rights reserved. DS Cluster 6 hosted by Hostway
Stay green...Green IT