
March 23rd, 2006, 08:44 AM
Filtering search spider "clicks"
I posted this question over in the PHP forum because that's the language our ad management software is written in, but when I saw this forum I thought it would probably be more appropriate here. My apologies. Mods, if this is a violation, you can delete the topic from the other forum.
The advertising management software we're using on our site does not filter out "clicks" incurred when search spiders follow the ad links. I'm trying to figure out the best way to do that. I have some ideas, but it doesn't seem like any of them are a complete solution.
1) Use a robots.txt file to tell search spiders not to access the URL of the click processing script. This would probably reduce spider clicks dramatically, but it won't do anything about the spam bots that ignore robots.txt.
2) Check the HTTP_REFERER variable and throw out clicks with no referrer. After some limited observation, I found that while this did eliminate all the spider clicks, it also threw out about half the legitimate clicks. I guess some browsers do not send the referrer data.
3) Check the IP address or user-agent against a list of known spiders. The downside of this is that the list has to be constantly updated.
4) Check the time of the last click from that IP address. If the click is within a second of the last click, disregard it, and if comparing to a list of known bots, add the IP to the list. The downside of this is that some people use "browser accelerators" that automatically follow links on a page, so that when the user clicks on a link the page has already loaded. It's difficult (impossible?) to distinguish between that user's real clicks and their accelerator following the ad links.
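To make ideas 3 and 4 a bit more concrete, here is a rough sketch of how they might be combined. It is written in Python for illustration rather than in the PHP our software uses, and everything in it is an assumption on my part: the `ClickFilter` class, the `is_countable` method, the one-second threshold, and the very short list of user-agent substrings are all made up for the example, not taken from any real ad package.

```python
import time

# Hypothetical, deliberately tiny list of user-agent substrings.
# A real list would be much longer and would need regular updating.
KNOWN_BOT_SUBSTRINGS = ("googlebot", "slurp", "msnbot", "crawler", "spider")

# Assumed threshold: clicks from one IP closer together than this are dropped.
MIN_SECONDS_BETWEEN_CLICKS = 1.0


class ClickFilter:
    """Decides whether a click should be counted (illustrative only)."""

    def __init__(self, min_gap=MIN_SECONDS_BETWEEN_CLICKS):
        self.min_gap = min_gap
        self.last_click = {}  # IP address -> timestamp of the last click seen

    def is_countable(self, ip, user_agent, now=None):
        if now is None:
            now = time.time()

        # Idea 3: reject clicks whose user-agent matches a known spider.
        ua = (user_agent or "").lower()
        if any(token in ua for token in KNOWN_BOT_SUBSTRINGS):
            return False

        # Idea 4: reject a click that follows the previous click from the
        # same IP too quickly. The timestamp is updated either way.
        previous = self.last_click.get(ip)
        self.last_click[ip] = now
        if previous is not None and now - previous < self.min_gap:
            return False

        return True
```

Each call to `is_countable` would sit in the click processing script before the click is recorded. Note that this does nothing about the browser accelerator problem: a prefetched ad link from a real user with a normal user-agent still passes both checks.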
Any suggestions as to a better approach? Anyone know how the server stats programs differentiate between real visitors and search spiders?
Thanks!