#1
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2004
    Posts
    40
    Rep Power
    10

    comparing lines of two text files


    I receive a large text file at least once a day containing all order details (sample below).

    Each day's file is the previous one with any new orders appended to the end, which I then process. However, I also need to check whether any of the old data has changed and, if so, create an output file containing all the data for both the new and the amended jobs.

    Is there a quick and easy way to compare two large files so that I can extract just the new lines and the amended lines?

    T62/6098C,MW6030,Emb Flower Scoop,0027,17721,10,Orange,02142562,01-JUL-05,1,BSHRU37,25,WL08A,,,,PB01A,CL03B,,,,,SMB,WL10A,ST01S,,,PS01S,,,,,,,,,43,00,TR,020000002142562000 17
    T62/6098C,MW6030,Emb Flower Scoop,0027,17721,12,Orange,02142579,01-JUL-05,1,BSHRU37,25,WL08A,,,,PB01A,CL03B,,,,,SMB,WL10A,ST01S,,,PS01S,,,,,,,,,48,00,TR,020000002142579000 15
    T62/6098C,MW6030,Emb Flower Scoop,0027,17721,14,Orange,09235809,01-JUL-05,1,BSHRU41,25,WL08A,,,,PB01A,CL03B,,,,,SMB,WL10A,ST01S,,,PS01S,,,,,,,,,51,00,TR,020000009235809000 11
    T62/6098C,MW6030,Emb Flower Scoop,0027,17721,16,Orange,09235816,01-JUL-05,1,BSHRU41,25,WL08A,,,,PB01A,CL03B,,,,,SMB,WL10A,ST01S,,,PS01S,,,,,,,,,47,00,TR,020000009235816000 19
    T62/6098C,MW6030,Emb Flower Scoop,0027,17721,18,Orange,09235823,01-JUL-05,1,BSHRU41,25,WL08A,,,,PB01A,CL03B,,,,,SMB,WL10A,ST01S,,,PS01S,,,,,,,,,31,00,TR,020000009235823000 17
    T62/2307D,PU1242TJ,Broderie Gypsy,0024,17722,10,Rose,07760532,08-JUL-05,1,BSHRU37,21,WL08A,,,,,CL03B,,,,,SMB,WL10A,ST01S,,,PS01S,,,,,,,,,110,00,TR,02000000776053200015
    T62/2307D,PU1242TJ,Broderie Gypsy,0024,17722,12,Rose,07760556,08-JUL-05,1,BSHRU37,21,WL08A,,,,,CL03B,,,,,SMB,WL10A,ST01S,,,PS01S,,,,,,,,,110,00,TR,02000000776055600011
    T62/2307D,PU1242TJ,Broderie Gypsy,0024,17722,14,Rose,07760563,08-JUL-05,1,BSHRU41,21,WL08A,,,,,CL03B,,,,,SMB,WL10A,ST01S,,,PS01S,,,,,,,,,120,00,TR,02000000776056300019
  2. #2
  3. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Feb 2005
    Posts
    588
    Rep Power
    64

    Use readlines() to read the two files into two lists of lines (list1 and list2), then convert each list to a set of lines (set1 = set(list1); on Python 2.3, which has no built-in set type, use set1 = sets.Set(list1)). You can then build a third set from the difference between the two (set3 = set1 - set2). This set contains the lines of set1 that are not in set2.
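
A minimal sketch of the set approach described above, assuming Python 3 (where `set` is built in) and files small enough to fit in memory. The function names `lines_only_in_first` and `read_lines` are made up for illustration.

```python
def read_lines(path):
    """Read a whole file into a list of lines (hypothetical helper)."""
    with open(path) as f:
        return f.readlines()


def lines_only_in_first(lines1, lines2):
    """Return the set of lines that occur in lines1 but not in lines2."""
    return set(lines1) - set(lines2)
```

Passing today's file first and yesterday's second, e.g. `lines_only_in_first(read_lines("today.txt"), read_lines("yesterday.txt"))` with hypothetical file names, would give the new and amended lines as an unordered set.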
  4. #3
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Feb 2004
    Location
    London, England
    Posts
    1,585
    Rep Power
    1373
    The solution that Dietrich gives will sort-of work, but (a) it will lose the ordering of the lines and remove duplicates, (b) if a line has been modified then you will end up with the original in one set and the new line in the other set, with no way to match them up, and (c) it reads both files into memory, which could cause problems if the files are very large.

    An alternative is to use the difflib module. This is a library for building tools like the Unix 'diff' program. It does exactly what you need, and a lot more. For example, when a line has changed it can show you where in the line the change is.
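
As a rough sketch of how difflib could be applied here (one possible approach, not necessarily what was intended above), `difflib.SequenceMatcher` can compare the two lists of lines and report which stretches of the new file were inserted or replaced. The function name `new_and_amended_lines` is made up for illustration, and Python 3 is assumed.

```python
import difflib


def new_and_amended_lines(old_lines, new_lines):
    """Return lines of new_lines that were added or changed, in file order."""
    # autojunk=False turns off the heuristic that treats very frequent
    # lines as junk when the sequences are long.
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    changed = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "insert": lines present only in the new file.
        # "replace": a stretch of old lines swapped for different new lines.
        if tag in ("replace", "insert"):
            changed.extend(new_lines[j1:j2])
    return changed
```

Unlike the set version, this keeps the original ordering and duplicates. Note that it still holds both lists of lines in memory, and matching can be slow on very large inputs.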

    Dave - The Developers' Coach
    Last edited by DevCoach; June 6th, 2005 at 12:44 PM.
  6. #4
  7. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Feb 2005
    Posts
    588
    Rep Power
    64

    Thanks Dave!

    Looked into help('difflib'), very interesting! Didn't know it existed!
