November 22nd, 1999, 03:19 PM
We run a busy site powered by PHP and MySQL. We provide a lot of free training content and have over 35,000 files totalling over 500 MB.
We are getting mass downloaded far too often and our bandwidth costs are crippling us.
Does anyone know a way to stop this mass downloading of our site?
Using PHP, we check the browser type against our list of banned browsers and don't provide the links if a banned browser is being used. This has helped. However, the banned browser list grows every day, a new offline browser becomes available nearly every day, and most of them let you spoof the browser type anyway.
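To make the check concrete, here is a rough sketch of the banned-browser test. It is written in Python only to keep it short and testable; the same logic ports directly to PHP. The function name and the blocklist entries are illustrative assumptions, not our actual list.

```python
# Hypothetical blocklist: lowercase substrings of User-Agent strings sent by
# offline browsers. These entries are examples only, not a real production list.
BANNED_AGENT_SUBSTRINGS = ("teleport", "webzip", "httrack", "wget")


def is_banned_agent(user_agent):
    """Return True if the User-Agent header matches a blocklist entry."""
    if not user_agent:
        # A missing header tells us nothing, so it is not treated as banned here.
        return False
    ua = user_agent.lower()
    return any(fragment in ua for fragment in BANNED_AGENT_SUBSTRINGS)


print(is_banned_agent("Teleport Pro/1.29"))
print(is_banned_agent("Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"))
```

As noted above, this only catches clients that identify themselves honestly; a spoofed User-Agent passes straight through.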
We are thinking of introducing a session-based site that monitors timestamps against session IDs and bans users that download too quickly. Does anyone have any thoughts on this?
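A minimal sketch of the idea, again in Python for brevity rather than PHP. The class name, the limit of 30 requests, and the 60-second window are all assumptions picked for illustration; a real version would keep this state in the session store or in MySQL rather than in memory.

```python
import time
from collections import defaultdict, deque


class DownloadThrottle:
    """Ban a session that makes too many requests inside a sliding time window."""

    def __init__(self, max_requests=30, window_seconds=60.0):
        # Both thresholds are illustrative guesses, not tuned values.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # session id -> recent request timestamps
        self._banned = set()

    def allow(self, session_id, now=None):
        """Record one request; return False if the session is (or becomes) banned."""
        if now is None:
            now = time.time()
        if session_id in self._banned:
            return False
        hits = self._hits[session_id]
        # Forget timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.max_requests:
            self._banned.add(session_id)
            return False
        return True


throttle = DownloadThrottle(max_requests=3, window_seconds=10.0)
for t in (0.0, 1.0, 2.0, 3.0):
    print(t, throttle.allow("session-a", now=t))
```

One known weakness: a client that throws away its session cookie gets a fresh session ID on every request, so the same counter would probably also need to be keyed on IP address.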
Any help would be greatly appreciated.
November 22nd, 1999, 11:13 PM
We're actually seeing a similar problem here at DevShed at this very moment.
When I wrote my own banner software for 32bit.com some time ago, I wanted to avoid sending banners to robots, so I made my robots.txt file a php3 file that recorded the IP and user agent of the robot. If it came back to the site within a fixed time period, I would not bother serving ads to it. I ought to clean up that code and put it out. Many of the offline browsers I've seen will read the robots.txt file, even if they then choose to ignore it. That's not a solution, but it could help catch a few more.
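Until I dig out the original, here is a rough reconstruction of the idea, sketched in Python rather than php3. The class and method names and the one-hour period are assumptions for illustration; this is not the code that ran on 32bit.com.

```python
import time


class RobotCatcher:
    """Remember clients that fetched robots.txt and flag them if they return soon."""

    def __init__(self, quiet_period=3600.0):
        # How long a robots.txt fetch keeps a client flagged; one hour is a guess.
        self.quiet_period = quiet_period
        self._seen = {}  # (ip, user_agent) -> time of last robots.txt fetch

    def record_robots_fetch(self, ip, user_agent, now=None):
        """Call this from the script that serves robots.txt."""
        if now is None:
            now = time.time()
        self._seen[(ip, user_agent)] = now

    def is_probable_robot(self, ip, user_agent, now=None):
        """Call this on ordinary page requests before serving ads or links."""
        if now is None:
            now = time.time()
        fetched_at = self._seen.get((ip, user_agent))
        if fetched_at is None:
            return False
        return now - fetched_at < self.quiet_period


catcher = RobotCatcher(quiet_period=60.0)
catcher.record_robots_fetch("10.0.0.1", "ExampleBot/1.0", now=0.0)
print(catcher.is_probable_robot("10.0.0.1", "ExampleBot/1.0", now=30.0))
```

Ordinary browsers never request robots.txt, so a hit on it is a reasonable (though not conclusive) sign of an automated client.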
Sometimes it's just as easy to block a whole Class C at the firewall, but that's probably not what you intend to do.
November 23rd, 1999, 04:01 AM
Thanks for the reply. It is a shame that your site is being targeted too. I suppose that's what happens when you offer valued information free of charge - some users just want more than is on offer ;o)
Since posting, I have delved deeper, and I am sure there is a solution in session management, perhaps with PHPLIB. I have requested that our ISP install it on our server.
I am going to try something out over the next week or so with session IDs and timestamps, and also tie that in with our banned user agent routines.
If you would like to know how I get on or perhaps even help out, drop me an email at email@example.com and perhaps we can get something in place that will at least stop all but the very sneaky.
November 23rd, 1999, 12:57 PM
Another idea I've toyed with is to use the "robot catcher" to feed "robot-friendly" pages that may not look quite as pretty, but that are optimized to show up in good places in the search engines. Gotta love PHP!