  #1  
April 18th, 2008, 11:41 PM
bholabhala
Data miners, a big problem...

I need some serious help. My website is being crawled 24x7, and that slows the site down to the point where 30% of the traffic is showing as HTTP code 206 in my AWStats. I need to stop these automated crawlers from accessing my site.

These crawlers may include email extractors, data miners, etc.

Any help will be highly appreciated.

  #2  
April 26th, 2008, 04:19 AM
atlantisstorm
Have you tried setting up a robots.txt file in your website's root directory?
  #3  
May 9th, 2008, 03:20 AM
bholabhala
Yes

I have a robots.txt in my root directory, and this is what it says:
Code:
 User-agent: *
 Disallow: /admin

Is this not enough to stop those bots?

  #4  
May 9th, 2008, 09:35 PM
Scorpions4ever
Not if the bot doesn't respect robots.txt. You can do a few things:
1. Configure your webserver or your firewall to disallow by IP address.
2. Configure your webserver to disallow particular user-agents if the crawl bots use distinct user agents.
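
To make the two options above concrete, here is a minimal sketch of the deny logic in Python. It assumes the site can run behind a WSGI application, which the thread does not say; on a typical setup the same rules would more likely go into the web server or firewall configuration. The networks and user-agent strings are placeholders made up for illustration, and the names is_blocked and deny_crawlers are hypothetical.

```python
import ipaddress

# Hypothetical deny lists. These values are placeholders, not taken from
# the thread; real entries would come from your own access logs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]
BLOCKED_AGENT_SUBSTRINGS = ["emailcollector", "badbot"]


def is_blocked(remote_addr, user_agent):
    """Return True if the client IP or its user agent is on a deny list."""
    addr = ipaddress.ip_address(remote_addr)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    # Case-insensitive substring match; a missing header counts as empty.
    agent = (user_agent or "").lower()
    return any(token in agent for token in BLOCKED_AGENT_SUBSTRINGS)


def deny_crawlers(app):
    """WSGI middleware: answer 403 to blocked clients, pass others through."""
    def wrapper(environ, start_response):
        remote_addr = environ.get("REMOTE_ADDR", "0.0.0.0")
        user_agent = environ.get("HTTP_USER_AGENT")
        if is_blocked(remote_addr, user_agent):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapper
```

Keep in mind that a user agent is just a request header and is easy to fake, so the user-agent check only catches bots that identify themselves honestly. Blocking by IP address is harder to dodge, but the list needs upkeep when a crawler changes addresses.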
© 2003-2008 by Developer Shed. All rights reserved.