Web Design Help
 
  #1  
Old December 25th, 2003, 06:24 AM
tenaka tenaka is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Nov 2003
Posts: 112 tenaka User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 5
faulty robots.txt???

hi guys,

my robots.txt looks like:
Quote:
User-agent: *
Disallow: /cgi-bin/
Disallow: /gallery/
Disallow: /images/
Disallow: /stat_www/
Disallow: /stat_www_old/
Disallow: /survey/
Disallow: /templates/


When I look at the logfile I see strange things: Googlebot is regularly visiting and indexing pages, but this one:
Quote:
66.196.65.36 - Mozilla/5.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)
only comes to my site, reads robots.txt over and over, and then leaves again. Today it read it 7 times and nothing else.

Btw, which chmod permissions do I have to set on robots.txt?
And how do some people manage to get a 404 error when retrieving my robots.txt???
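
On the chmod question: on a typical Unix host the file only has to be readable by the web server user, and 644 (owner read/write, everyone else read-only) is a common choice. A minimal Python sketch, assuming a Unix-like filesystem; the temporary file merely stands in for robots.txt and the helper names are made up for illustration.

```python
import os
import stat
import tempfile


def is_world_readable(path):
    """Return True if the 'others' read bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


def make_world_readable(path):
    """Set rw-r--r-- (0644) on path and return the resulting mode bits."""
    os.chmod(path, 0o644)
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    # A throwaway file stands in for robots.txt (assumption for the demo).
    with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as handle:
        demo_path = handle.name
    try:
        os.chmod(demo_path, 0o600)  # owner-only: a web server running as another user could not read it
        print("readable by others before:", is_world_readable(demo_path))
        print("mode after fix:", oct(make_world_readable(demo_path)))
    finally:
        os.remove(demo_path)
```

Wrong permissions would normally show up as a 403 rather than a 404, so this probably does not explain the 404 entries on its own.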

Am I doing something wrong?
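
One way to sanity-check the rules quoted above is to feed them to a standard parser. The sketch below uses Python's `urllib.robotparser`; the test paths (`/index.html`, `/cgi-bin/search.pl`, `/gallery/trip.jpg`) are hypothetical, and the result only shows how this parser reads the file, not how Slurp itself interprets it.

```python
from urllib.robotparser import RobotFileParser

# The rules exactly as posted in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /gallery/
Disallow: /images/
Disallow: /stat_www/
Disallow: /stat_www_old/
Disallow: /survey/
Disallow: /templates/
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical paths, chosen only to exercise the rules.
for path in ("/", "/index.html", "/cgi-bin/search.pl", "/gallery/trip.jpg"):
    verdict = "allowed" if parser.can_fetch("Slurp", path) else "blocked"
    print(f"{path}: {verdict}")
```

If the home page and ordinary pages come out as allowed, the file itself is syntactically fine and the crawler's behaviour has to be explained some other way.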

Last edited by tenaka : December 25th, 2003 at 06:51 AM.

  #2  
Old December 27th, 2003, 08:16 AM
tenaka tenaka is offline
Quote:
66.196.65.36 - Mozilla/5.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)
Date Page Status Referer
12/27 03:37 /robots.txt 200 -
12/27 04:47 /robots.txt 200 -
12/27 05:50 /robots.txt 200 -
12/27 06:55 /robots.txt 200 -
12/27 08:17 /robots.txt 200 -
12/27 10:43 /robots.txt 200 -
12/27 12:25 /robots.txt 200 -
66.196.90.246 - Mozilla/5.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)
Date Page Status Referer
12/27 03:21 /robots.txt 200 -
12/27 07:22 /robots.txt 200 -
12/27 11:28 /robots.txt 200 -


And that's all that slurp does on my site.
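
For anyone who wants to confirm this kind of pattern without eyeballing the report, here is a rough Python sketch that tallies page and status per row. It assumes the five-column "date, time, page, status, referer" layout shown in the quote above; the function name is made up for illustration.

```python
from collections import Counter

# Rows copied from the first crawler IP in the excerpt above.
LOG_EXCERPT = """\
Date Page Status Referer
12/27 03:37 /robots.txt 200 -
12/27 04:47 /robots.txt 200 -
12/27 05:50 /robots.txt 200 -
12/27 06:55 /robots.txt 200 -
12/27 08:17 /robots.txt 200 -
12/27 10:43 /robots.txt 200 -
12/27 12:25 /robots.txt 200 -
"""


def tally_requests(text):
    """Count (page, status) pairs in 'date time page status referer' rows."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row, blank lines, and anything else that is not a data row.
        if len(fields) != 5 or not fields[3].isdigit():
            continue
        _date, _time, page, status, _referer = fields
        counts[(page, int(status))] += 1
    return counts


for (page, status), hits in tally_requests(LOG_EXCERPT).items():
    print(f"{page} -> {status}: {hits} hits")
```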

  #3  
Old January 2nd, 2004, 02:32 PM
tenaka tenaka is offline
I just got an answer from inktomi tech support:

Quote:
When attempting to crawl content from your site, we periodically re-read the content in /robots.txt to check for changes. We also periodically re-read the root page of sites as a check on the site status. So the repeated access to your home page and /robots.txt is not unexpected.

Our crawler knows of a few thousand URLs in w*w.yoursite.com, but nearly all of them are in your /cgi-bin directory so we do not actually access the content. Your home page only includes a couple of links to other pages in w*w.yoursite.com, and those URLs are crawled and are included in our search database. If there is other content in w*w.yoursite.de that you would like to see indexed by Slurp, be sure to publish some navigation links from which the crawler can discover the content.

For more information about Slurp, see our search FAQ at http://support.inktomi.com/searchfaq.html.

Regards,
Inktomi Tech Support
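
The advice about navigation links can be checked roughly by listing the same-site links a crawler could discover on the home page. This is only a sketch with Python's standard `html.parser`; the HTML snippet, the base URL `http://www.example.com/`, and the class and function names are all placeholders, not anything taken from the real site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.base_url, href))


def internal_links(html, base_url):
    """Return links on the page that point to the same host as base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    return [link for link in collector.links if urlparse(link).netloc == host]


# Placeholder home page with two internal links and one external link.
SAMPLE_HTML = (
    '<a href="/travel/trip.html">Trip</a> '
    '<a href="about.html">About</a> '
    '<a href="http://other.example.org/">Elsewhere</a>'
)

for link in internal_links(SAMPLE_HTML, "http://www.example.com/"):
    print(link)
```

If the list is short, or most entries point into disallowed directories, a crawler that respects robots.txt has very little left to fetch.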


Here is my answer to them:
Quote:
I was just wondering because other bots like Googlebot do come back and read my web pages again from time to time, and I am quite sure that slurp has not come back to reread my content for several weeks, although I stated a revisit time of 10 days in my meta tags.

I also expected those indexed pages from my cgi-bin to disappear from your index, as they no longer exist. The only explanation would be that there are still links pointing to them, wouldn't it?


Maybe this helps others too, and maybe someone else can shed some more light on this.

  #4  
Old January 3rd, 2004, 08:11 AM
tenaka tenaka is offline
talking about 404 errors:

66.77.73.162 - FAST-WebCrawler/3.8 (crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler)

Date Page Status Referer
01/03 15:01 /robots.txt 404 -

This is the first time in a long time that FAST has visited me, and it did not find my robots.txt???

Of course I have one and it is ok..

  #5  
Old January 7th, 2004, 01:38 PM
tenaka tenaka is offline
I just thought this would fit in. Here is another excerpt from my access logfile:

Quote:
66.196.72.46 - Mozilla/5.0 (Slurp/cat; slurp@inktomi.com; http://www.inktomi.com/slurp.html)
Date Page Status Referer
01/07 19:37 /free-sex-lesbian-story.htm 404 -


Never had anything like that on my server!

At least Googlebot and FAST are now busy reindexing my site, although slurp is still only interested in my robots.txt.

I have been thinking about removing robots.txt for a week or so to see what happens. Btw, this is not a critical site; it is just a personal site I made to show pictures from my last travels, and I am not finished with the design. The whole point is that I am trying out my SEO skills on this unimportant page: I am trying to push it up in the rankings just to see if I can do it. It doesn't matter if I remove robots.txt and all the crap in each and every subdirectory gets indexed. If I put the file back after one week, will the robots return, and will the directories I forbid to spiders disappear from the search engines as long as no one is linking to them?

  #6  
Old January 9th, 2004, 10:07 AM
tenaka tenaka is offline
I removed robots.txt 3 days ago. Everything's normal except that slurp is now busy with my site: it is checking all the pages it had indexed months/years ago. I hope that when it is finished with those nonexistent pages it will start indexing the new pages.

The conclusion: slurp had problems with my robots.txt - have a look at the first post here and see if there is something wrong with my robots.txt file.

  #7  
Old January 18th, 2004, 05:46 PM
tenaka tenaka is offline
I put robots.txt back and slurp/cat left.

Now only slurp/si is visiting.

Quote:
66.196.65.36 - Mozilla/5.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)
Date Page Status Referer
01/18 01:17 /robots.txt 200 -
01/18 02:17 /robots.txt 200 -
01/18 03:36 /robots.txt 200 -
01/18 04:40 /robots.txt 200 -
01/18 05:44 /robots.txt 200 -
01/18 06:46 /robots.txt 200 -
01/18 07:49 /robots.txt 200 -
01/18 09:06 /robots.txt 200 -
01/18 10:09 /robots.txt 200 -
01/18 11:13 /robots.txt 200 -
01/18 12:24 /robots.txt 200 -
01/18 13:33 /robots.txt 200 -
01/18 14:43 /robots.txt 200 -
01/18 15:43 /robots.txt 200 -
01/18 17:03 /robots.txt 200 -
01/18 18:18 /robots.txt 200 -
01/18 19:41 /robots.txt 200 -
01/18 20:45 /robots.txt 200 -
01/18 21:47 /robots.txt 200 -
01/18 23:00 /robots.txt 200 -
01/19 00:22 /robots.txt 200 -
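
To put a number on how often the file is re-read, the gaps between consecutive fetches can be computed from rows like the ones above. A small Python sketch: the year is not in the log, so 2004 is assumed from the post date, and a log that crosses New Year would need extra handling. The function name is made up for illustration.

```python
from datetime import datetime

# First four rows of the excerpt above.
ROWS = [
    "01/18 01:17 /robots.txt 200 -",
    "01/18 02:17 /robots.txt 200 -",
    "01/18 03:36 /robots.txt 200 -",
    "01/18 04:40 /robots.txt 200 -",
]


def fetch_gaps_minutes(rows, year=2004):
    """Return the minutes between consecutive rows (year is an assumption)."""
    times = []
    for row in rows:
        date_part, time_part = row.split()[:2]
        times.append(
            datetime.strptime(f"{year}/{date_part} {time_part}", "%Y/%m/%d %H:%M")
        )
    return [
        int((later - earlier).total_seconds() // 60)
        for earlier, later in zip(times, times[1:])
    ]


print(fetch_gaps_minutes(ROWS))  # [60, 79, 64]
```

So in this sample robots.txt is being re-fetched roughly once an hour.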


And slurp/cat, which was indexing my site, left. There is nothing wrong with my robots.txt, and I have other content than what I excluded. Do you think I should write to them again???

© 2003-2008 by Developer Shed. All rights reserved. DS Cluster 2 hosted by Hostway