January 28th, 2005, 11:36 AM
Hashing the page with MD5 is the only real way to do this since, as you mentioned, not all pages have the same headers. Still, there is a problem doing this with dynamic pages, particularly those that include dynamically generated ads: the page will be different every time, but the content that you're actually interested in won't have changed.
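For what it's worth, here is a rough sketch of the hashing idea using the standard hashlib module. The function names (page_fingerprint, has_changed) are just made up for illustration, and it assumes you already have the page body as bytes from whatever you use to download it.

```python
import hashlib


def page_fingerprint(body: bytes) -> str:
    """Return the MD5 hex digest of a downloaded page body."""
    return hashlib.md5(body).hexdigest()


def has_changed(old_fingerprint: str, body: bytes) -> bool:
    """Compare a stored fingerprint against a freshly downloaded body."""
    return page_fingerprint(body) != old_fingerprint


# Hypothetical page bodies: only the ad differs between the two downloads.
first = b"<p>Story A</p><div class='ad'>Ad 1</div>"
second = b"<p>Story A</p><div class='ad'>Ad 2</div>"

stored = page_fingerprint(first)
print(has_changed(stored, first))   # same bytes, so no change is reported
print(has_changed(stored, second))  # the ad changed, so the hash changes too
```

The second call shows the dynamic page problem: the story is identical, but the hash of the whole page still differs.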
You could download the page and use a module like filecmp or difflib to find the exact changes. This could even make it possible for you to select the parts of the page you are interested in (on a per-change basis). Support for marking a page as Dynamic would also be handy.
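Something along these lines might work for the difflib approach. It's only a sketch: changed_lines and the is_dynamic callback are names I invented, and the "ad" check is a stand-in for however you would really recognise the parts of a page to ignore.

```python
import difflib


def changed_lines(old_text, new_text, is_dynamic=lambda line: False):
    """Return the added/removed lines between two versions of a page.

    Lines for which is_dynamic(line) is true are dropped before diffing,
    so they never show up as changes.
    """
    old = [l for l in old_text.splitlines() if not is_dynamic(l)]
    new = [l for l in new_text.splitlines() if not is_dynamic(l)]
    # ndiff prefixes removed lines with "- " and added lines with "+ ".
    return [l for l in difflib.ndiff(old, new) if l.startswith(("- ", "+ "))]


# Hypothetical page versions: the story and the ad both change.
old_page = "<h1>News</h1>\n<p>Story A</p>\n<div class='ad'>Ad 1</div>"
new_page = "<h1>News</h1>\n<p>Story B</p>\n<div class='ad'>Ad 2</div>"

all_changes = changed_lines(old_page, new_page)
real_changes = changed_lines(
    old_page, new_page, is_dynamic=lambda line: "class='ad'" in line
)

for line in real_changes:
    print(line)
```

With the filter in place only the story lines are reported, which is roughly what marking part of a page as dynamic would buy you.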
It's a tricky problem to solve, but a good one if you can solve it well!
Hope this helps,
programming language development: www.netytan.com – Hula