June 17th, 2013, 01:33 PM
How would you make a "duplicate page analyzer"?
I am trying to find the best way to write a PHP script that analyzes two webpages and returns the difference between them as a percentage.
Something similar to this tool:
I have been struggling to write a script that returns similar results, and it looks harder than expected. A simple line-by-line comparison doesn't seem to be the best approach, and hashing each file only tells you whether the pages are exactly the same; it gives no way to quantify how much one page differs from the other. How would you approach this kind of problem? How would you build the engine for such a script?
Any thoughts and ideas are very welcome!
Thank you in advance.
June 17th, 2013, 02:20 PM
The special keyword to search for is "diff".
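To make the idea concrete, here is a minimal sketch of a diff-based difference score. It is written in Python rather than PHP purely for illustration, and it is one possible approach, not how any particular online tool works. The function names (visible_words, difference_percent) are made up for this example, the tag stripping is a crude regex rather than a real HTML parser, and fetching the two pages is left out. It compares the pages word by word instead of line by line and turns the matched portion into a percentage.

```python
import difflib
import re


def visible_words(html):
    """Crude text extraction (an assumption of this sketch, not a real parser):
    drop <script>/<style> blocks, strip remaining tags, split into lowercase words."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return text.lower().split()


def difference_percent(html_a, html_b):
    """Return how different two pages are, from 0.0 (same words) to 100.0."""
    words_a = visible_words(html_a)
    words_b = visible_words(html_b)
    # ratio() is 2*M/T: M = number of matched words, T = total words in both pages.
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    return round((1.0 - matcher.ratio()) * 100, 2)


# Three of four words match, so the pages are 25% different.
print(difference_percent("<p>one two three four</p>",
                         "<p>one two three five</p>"))  # 25.0
```

Comparing words rather than lines keeps a single changed word from marking a whole paragraph as different. In PHP, the built-in similar_text() function can also report a similarity percentage through its third argument, which may be enough for short pages.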
June 18th, 2013, 04:09 PM
Wow, thank you very much, I didn't know about that. I think that's really it!
I appreciate your help.