June 17th, 2013, 02:33 PM
How would you make a "duplicate page analyzer"?
I am trying to find the best way to write a PHP script that analyzes two web pages and returns the difference between them as a percentage.
Something similar to this tool:
I have been struggling to write a script that returns similar results, and it looks harder than I expected. A simple line-by-line comparison doesn't seem to be the best approach, and hashing each file only tells you whether the pages are exactly the same; it gives no way to quantify how much one page differs from the other. How would you approach this kind of problem? How would you build the engine for such a script?
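To make the question more concrete, here is a rough sketch of one direction that might work: strip the markup, break the visible text into overlapping word groups ("shingles"), and report how little the two sets overlap (Jaccard distance). This is only an assumption about what such a tool could do, not how the tool mentioned above works. It is written in Python purely for brevity; the function names (`visible_words`, `shingles`, `difference_percent`) and the shingle size of 3 are hypothetical choices. In PHP, the built-in `similar_text()` and `levenshtein()` functions might serve as a simpler starting point.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_words(html):
    """Return the lowercase words of the page text, with markup removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"\w+", " ".join(parser.chunks).lower())


def shingles(words, size=3):
    """Return the set of overlapping word groups of the given size."""
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def difference_percent(html_a, html_b, size=3):
    """Jaccard distance between the two pages' shingle sets, as a percentage."""
    a = shingles(visible_words(html_a), size)
    b = shingles(visible_words(html_b), size)
    if not a and not b:
        return 0.0  # two empty pages are treated as identical
    jaccard = len(a & b) / len(a | b)
    return round((1.0 - jaccard) * 100, 2)


if __name__ == "__main__":
    page_a = "<html><body><p>one two three four</p></body></html>"
    page_b = "<html><body><p>one two three five</p></body></html>"
    print(difference_percent(page_a, page_b))
```

Unlike a hash, this gives a graded result, and unlike a line-by-line comparison it ignores differences in markup and line breaks. Whether it matches what a real duplicate-content checker reports is something I can't confirm.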
Any thoughts and ideas are very welcome!
Thank you in advance.
Virtual Sheet Music