Help Please -how to track Website changes?
December 1st, 2012, 11:25 AM
Registered User
Join Date: Aug 2007
Posts: 29
Time spent in forums: 3 h 56 m 26 sec
Reputation Power: 0
Hi,
I posted this question in the PHP forum, and one of the members said that, given a preference, they would do this in Perl.
My question here is, would anyone be interested in helping develop this? How tough does this sound?
Ideally, I'd like to get some sort of email notice, whenever a certain change is made (removal of certain links).
ANOTHER MEMBER'S APPROACH:
This would not necessarily need PHP. If I were doing this I would periodically grab the pages of interest and store them somewhere. I'd then do a diff on the latest page and the corresponding previously stored page. I would then analyze those diffs and make the appropriate notifications.
As a matter of personal preference I would use Perl to do this.
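A rough sketch of the store-and-diff idea described above, written in Python rather than Perl only so it runs with the standard library alone. The function name `removed_lines_since_snapshot` and the snapshot file layout are assumptions made for illustration, and fetching the page is left out so the example works offline.

```python
import difflib
from pathlib import Path


def removed_lines_since_snapshot(snapshot_path, new_text):
    """Diff new_text against the stored snapshot, then store new_text.

    Returns the lines that were present in the previous snapshot but are
    missing now. On the first run there is nothing to compare against,
    so the result is an empty list.
    """
    path = Path(snapshot_path)
    old_text = path.read_text(encoding="utf-8") if path.exists() else None

    # Always save the latest copy so the next run compares against it.
    path.write_text(new_text, encoding="utf-8")

    if old_text is None:
        return []

    # ndiff prefixes lines that exist only in the old text with "- ".
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return [line[2:] for line in diff if line.startswith("- ")]
```

Each scheduled run would pass in the freshly downloaded page text; a non-empty result is the cue to send a notification. A plain line diff like this reports any change to a line, so purely cosmetic edits would also show up.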
HERE IS MY ORIGINAL QUESTION BELOW:
Sorry if I have posted this in completely the wrong place. I have no knowledge of web programming whatsoever.
I was hoping someone could help out.
Is there a way to track certain changes to other websites?
I work for a manufacturers' sales rep firm. Ideally, we want to rep the best manufacturers. As you can imagine, the best are usually already taken, unless a manufacturer feels its rep firm isn't doing well for them. At that point, the rep firm gets dropped, and usually the rep firm will remove the manufacturer from its website.
That's where we want to take action and pick up these manufacturers while they are open. However, it's very hard to find out without constantly checking every rep firm's website. Either that, or just word of mouth.
It would be great if there was a way to track the removal of, say, certain links (manufacturers' links) or images from a certain web page. Ideally, we would want to get notified when a change occurs, and then we can go see what link was removed.
For example, this rep firm lists all their manufacturers here:
http://www.yando.com/caline.htm
Say CREE Microwave is removed. We would want some way to be notified when that happens.
Any ideas on how to achieve this?

December 1st, 2012, 11:37 AM
!~ /m$/
Join Date: May 2004
Location: Reno, NV
It's easy in Perl, and in several other languages as well. A simple Perl script could be launched from cron to check websites at whatever interval you wanted.
Are you looking to do the work yourself, or to hire someone? If you are doing this yourself, you'll want to start with LWP, specifically the LWP::UserAgent module, and give the resulting pages to HTML::Parser.
Not mandatory, but the List::Compare module would be really handy as well.
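To give a feel for the parse-and-compare step, here is a minimal sketch in Python (one of the "other languages"), using only the standard library. It is not the Perl module API: `html.parser` stands in for HTML::Parser and a dictionary comparison stands in for List::Compare. The names `LinkCollector`, `extract_links` and `removed_links` are made up for this example, and downloading the page is omitted so it runs offline.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href -> link text for every <a href="..."> on a page."""

    def __init__(self):
        super().__init__()
        self.links = {}
        self._href = None   # href of the anchor currently being read
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the link text.
            self.links[self._href] = " ".join("".join(self._text).split())
            self._href = None


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def removed_links(old_html, new_html):
    """Return links present in old_html but absent from new_html."""
    old, new = extract_links(old_html), extract_links(new_html)
    return {href: text for href, text in old.items() if href not in new}
```

Comparing extracted links instead of raw HTML means a layout change alone does not trigger a notice, as long as the links themselves are still found by the parser.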

December 1st, 2012, 12:28 PM
Registered User
Join Date: Aug 2007
Posts: 29
Time spent in forums: 3 h 56 m 26 sec
Reputation Power: 0
Hi Keith,
Definitely not myself. I took a C++ class 15 years ago, and that's about the extent of my programming knowledge.
Are you, or someone else you can recommend, willing to help out?
Just let me know what you would charge, and I can talk to my boss and the rest of my team about it.
If you could PM me the contact details, I can contact you.
Thanks

December 2nd, 2012, 09:37 AM
!~ /m$/
Join Date: May 2004
Location: Reno, NV
Someone else here may be interested, but I don't hire out for some reason I'm not sure of. I think there is probably a better forum, or another section of the forum for jobs like that.
The one thing to be aware of is that your competitors' websites will probably change their presentation from time to time, which could necessitate changes to a tracking script. In other words, it's the sort of script that is going to need occasional maintenance.

December 6th, 2012, 05:54 AM
Registered User
Join Date: Dec 2012
Posts: 5
Time spent in forums: 58 m 25 sec
Reputation Power: 0
If you just want a notification when the page changes, you could do the following:
1. Grab the web page via LWP::Simple
2. Calculate an MD5 checksum of the page
- Take into account any variable content in the HTML that you may need to strip first, such as headers containing generation dates/times.
3. Compare against the most recently stored checksum. If it differs, send an SMS or email to select recipients.
I can code this in Perl, PHP or C# if you're interested.
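For illustration, steps 2 and 3 above might look something like this Python sketch (Python only so it runs with the standard library; the same idea carries over to Perl). The patterns in `VOLATILE_PATTERNS` are guesses at what a page might embed, not something taken from any real site, and the names `page_checksum` and `has_changed` are made up for this example. Fetching the page and sending the SMS or email are left out.

```python
import hashlib
import re

# Assumed examples of content that changes on every request and should
# not count as a real change. Adjust these for the pages being watched.
VOLATILE_PATTERNS = [
    re.compile(r"<!--.*?-->", re.DOTALL),                    # HTML comments
    re.compile(r"\d{1,2}:\d{2}(:\d{2})?\s*(AM|PM)?", re.I),  # clock times
]


def page_checksum(html):
    """Return an MD5 hex digest of the page with volatile parts removed."""
    for pattern in VOLATILE_PATTERNS:
        html = pattern.sub("", html)
    # Collapse whitespace so reformatting alone does not change the hash.
    normalized = " ".join(html.split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def has_changed(html, previous_checksum):
    """Return (changed, current_checksum) for the freshly fetched page."""
    current = page_checksum(html)
    return current != previous_checksum, current
```

The caller would store the returned checksum after each run and notify when `changed` is true. A checksum only says that something changed, not which link went away, so someone still has to look at the page afterwards.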