#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2013
    Posts
    3
    Rep Power
    0

    Checking large strings of data for errors


    Hi,

    If I have a very long string of data, such as an essay, and a nearly identical copy of it with a few minor changes and/or errors, how would I go about comparing the two and counting the differences?

    When comparing short strings (one sentence), I have used a server and a client with an engine (EssayEngine) which extends java.rmi.Remote.

    In the client program, I have declared one final String holding the original sentence and another final String holding the changed sentence.

    I then get the engine through a registry lookup and call its check method to compare the two strings. This only shows the position of the first error.

    int h = engine.check(string1, string2);
    (This is wrapped in a try catch.)

    I just don't know how to scale up the program to check for multiple errors and count them. Any pointers would be great, thanks.
  2. #2
  3. Contributing User
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Aug 2010
    Location
    Eastern Florida
    Posts
    3,719
    Rep Power
    348
    Can the long Strings be broken up into smaller Strings that can be compared?
    If not, then the two Strings could be compared character by character.
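As a rough sketch of that character-by-character idea, the following counts every position where the two Strings differ. The class and method names (PositionalCompare, countMismatches) are made up for illustration and are not part of the EssayEngine from the first post.

```java
// Hypothetical helper, not the EssayEngine from the thread.
class PositionalCompare {

    /**
     * Counts the positions at which a and b hold different characters.
     * Characters past the end of the shorter String each count as one difference.
     */
    static int countMismatches(String a, String b) {
        int shared = Math.min(a.length(), b.length());
        // Start with the length difference: those trailing characters have no partner.
        int count = Math.abs(a.length() - b.length());
        for (int i = 0; i < shared; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // One character replaced by another, same length.
        System.out.println(countMismatches("abcde", "abxde"));
    }
}
```

This only gives a sensible count when the changes are substitutions that leave every other character in the same position.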
    A problem is re-syncing the comparison once a difference has been found.
    For example with these two Strings:
    abxcde
    vs
    abcde
    The first one has an extra x in it. When the compare finds that x in the first string does not match c in the second string, what should it do? If the x is treated as an insert, then the rest of the characters will match.
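One common way to handle that re-synching is to count the minimum number of single-character inserts, deletes and replacements needed to turn one String into the other (the Levenshtein edit distance). The sketch below is one possible approach, not necessarily what the engine in the first post does, and the names EditDistance and distance are made up for illustration.

```java
// Hypothetical helper, not the EssayEngine from the thread.
class EditDistance {

    /**
     * Returns the Levenshtein distance between a and b: the smallest number of
     * single-character inserts, deletes and replacements that turns a into b.
     * Uses two rows of the dynamic-programming table to keep memory at O(b.length()).
     */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning an empty prefix of a into the first j characters of b takes j inserts.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            // Turning the first i characters of a into an empty String takes i deletes.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int replace = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), replace);
            }
            // The row just filled becomes the previous row for the next character of a.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // The extra x is treated as a single insert.
        System.out.println(distance("abxcde", "abcde"));
    }
}
```

The running time grows with the product of the two lengths, so for very long essays it may be worth splitting the text into words or sentences first and comparing those units instead of single characters.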
    Last edited by NormR; June 11th, 2013 at 07:22 AM.
