#1

    How would you make a "duplicate page analyzer"?


    Hello everyone.

    I am trying to find the best way to write a PHP script that analyzes two web pages and returns, as a percentage, how different they are.

    Something similar to this tool:
    http://www.webconfs.com/similar-page-checker.php


    I have been struggling to write a script that returns similar results, and it looks harder than expected. A simple line-by-line comparison doesn't seem to be the best approach. Hashing each file only tells you whether the pages are exactly the same, so there is no way to quantify how much one page differs from the other. How would you approach this kind of problem? How would you build the engine for such a script?
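One possible engine, sketched here under stated assumptions rather than as the method the webconfs tool actually uses (its algorithm isn't described in this thread). It strips the HTML down to visible words, splits the text into overlapping k-word "shingles", and reports the Jaccard similarity (shared shingles divided by all distinct shingles) as a percentage. It is written in Python for brevity. The function names `visible_words`, `shingles` and `similarity_percent` are hypothetical, and `k = 3` is an arbitrary choice. PHP itself ships `similar_text()`, which can return a character-based similarity percentage, and `levenshtein()` for edit distance. Either may be enough for short texts.

```python
import hashlib
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def visible_words(html: str) -> list[str]:
    """Strip markup and return lowercase word tokens (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts).lower()
    return re.findall(r"\w+", text)


def shingles(words: list[str], k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences; short texts become one shingle."""
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity_percent(html_a: str, html_b: str, k: int = 3) -> float:
    """Jaccard similarity of the two pages' shingle sets, as a percentage."""
    a = shingles(visible_words(html_a), k)
    b = shingles(visible_words(html_b), k)
    if not a and not b:
        return 100.0  # two empty pages are treated as identical
    return 100.0 * len(a & b) / len(a | b)


if __name__ == "__main__":
    page1 = "<html><body><p>The quick brown fox jumps over the lazy dog.</p></body></html>"
    page2 = "<html><body><p>The quick brown fox leaps over the lazy dog.</p></body></html>"
    # A hash comparison can only answer "same or not", never "how different".
    same_hash = hashlib.md5(page1.encode()).hexdigest() == hashlib.md5(page2.encode()).hexdigest()
    print(same_hash)                                       # False
    print(f"{similarity_percent(page1, page2):.1f}% similar")  # 40.0% similar
```

Changing a single word removes up to k shingles from each side. That explains why one edited word drops this short example to 40%. A larger k makes the score stricter, a smaller k more forgiving. Because only visible words are compared, markup and whitespace changes don't affect the result.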

    Any thoughts and ideas are very welcome!

    Thanks in advance to anyone who can help.
    Fabrizio Ferrari

    Virtual Sheet Music
    http://www.virtualsheetmusic.com

#2

#3
    Wow, thank you very much; I didn't know about that. I think that's exactly it!

    I appreciate your help.

    Best,
    Fab.