#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2004
    Posts
    45
    Rep Power
    11

    how to store Diff


    Is there a way I can compare two files and store the diff in a third file?
    Code:
    import os 
    
    file1=('num1.log')
    file2=('num2.log')
    file3=('num3.log')
    for i in file(file1):
        i=i.split()
        print (' '.join(i)),
    print "\n"
    for j in file(file2):
        j=j.split()
        print(' '.join(j)),
  2. #2
  3. Mini me.
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Location
    Cambridge, UK
    Posts
    783
    Rep Power
    13
    Do you require it in a specific format?
    You can call an external program like diff:

    Code:
    import os
    os.system("diff %s %s > %s"%(file1,file2,file3))
    BTW you don't need ( ) around the print statements and filename assignments.

    grim
    Last edited by Grim Archon; July 7th, 2004 at 07:10 AM.
  4. #3
  5. Mini me.
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Location
    Cambridge, UK
    Posts
    783
    Rep Power
    13
    You could also try the difflib module. See the Python docs for an example.
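As a rough illustration of the difflib suggestion, here is a minimal sketch that writes a unified diff of two text files to a third file. It is written for Python 3 (the thread's own code is Python 2), and the helper name `write_diff` is made up for this example; only `difflib.unified_diff` comes from the standard library.

```python
import difflib


def write_diff(path_a, path_b, path_out):
    """Write a unified diff of two text files to path_out.

    Returns the number of diff lines written (0 if the files are identical).
    """
    with open(path_a) as fa, open(path_b) as fb:
        a_lines = fa.readlines()
        b_lines = fb.readlines()

    # unified_diff yields header lines plus lines prefixed with ' ', '-' or '+'
    diff = list(difflib.unified_diff(a_lines, b_lines,
                                     fromfile=path_a, tofile=path_b))

    with open(path_out, "w") as out:
        out.writelines(diff)
    return len(diff)
```

With the file names used in this thread, the call would be `write_diff('num1.log', 'num2.log', 'num3.log')`. If the two inputs are identical, the output file is simply empty.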
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2004
    Posts
    45
    Rep Power
    11
    I tried your code. It creates an empty file called file3. Doesn't it copy the diffs from file1 and file2 to file3?

    Thanks
  8. #5
  9. Mini me.
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Location
    Cambridge, UK
    Posts
    783
    Rep Power
    13
    If file1 contents are the same as file2 contents then the output will be nothing (also true if they happen to be the same file).

    I'm guessing that you have diff on your platform.

    grim
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2004
    Posts
    45
    Rep Power
    11
    The contents of file1 and file2 are not exactly the same:
    contents of file1=1, 2, 3, 4, 5, 6, 56,
    contents of file2=2, 6, 12, 58, 96, 56,
    contents of file3 should be =2, 6, 56

    Thanks
  12. #7
  13. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Feb 2004
    Location
    London, England
    Posts
    1,585
    Rep Power
    1373
    Originally Posted by pyton
    The contents of file1 and file2 are not exactly the same:
    contents of file1=1, 2, 3, 4, 5, 6, 56,
    contents of file2=2, 6, 12, 58, 96, 56,
    contents of file3 should be =2, 6, 56

    Thanks
    What you want is not the file difference but the file intersection.

    For general file comparison you can use difflib or an external diff program, as grim said. However these are mainly for doing context diffs, which show a line by line difference - for your example data it would display something like

    Code:
    1- 1, 2, 3, 4, 5, 6, 56, 
    1+ 2, 6, 12, 58, 96, 56,
    this shows that a line has been removed and a new one added, which is not what you want.

    From your example I think what you really want to do is not show the difference, but to remove it altogether and show the common data. This is not a normal use for diff, although difflib could be used to do this with some careful coding.

    If the data files are always going to be comma-separated lists of values, then one possibility would be to read the values into sets and use the set intersection method to find the common values. This assumes that the order of the values is not important, since sets are unordered.

    Some questions to think about:

    1) are the files in CSV format? If so then you can use the csv module to read them in.

    2) do you want to compare the files line by line, i.e. only comparing line 1 of file 1 with line 1 of file 2?

    3) is the order of the entries important? What about duplicate values? If neither of these is important then you can use sets.

    4) what is the context of the problem? What is the higher level problem that this is trying to solve? There may be other ways of solving the higher-level problem than doing a diff between files.
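The intersection idea can be sketched as follows. This is only an illustration in Python 3, assuming each line is a comma-separated list and that duplicates can be dropped; the function name `common_values` is invented here. A set is used for the membership test, while the result keeps the order of the first line so the output is predictable.

```python
def common_values(line_a, line_b):
    """Return the values found in both comma-separated lines, in line_a's order."""
    values_a = [v.strip() for v in line_a.split(",") if v.strip()]
    values_b = {v.strip() for v in line_b.split(",") if v.strip()}

    seen = set()
    common = []
    for value in values_a:
        # keep a value once, and only if the second line also contains it
        if value in values_b and value not in seen:
            seen.add(value)
            common.append(value)
    return common


print(", ".join(common_values("1, 2, 3, 4, 5, 6, 56,", "2, 6, 12, 58, 96, 56,")))
# prints: 2, 6, 56
```

If order truly does not matter, `set(a) & set(b)` on the two lists of values gives the same members more directly.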

    Dave - The Developers' Coach
  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2004
    Posts
    45
    Rep Power
    11
    Yes, I got the point. I want to compare the files line by line.

    Thanks
  16. #9
  17. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2004
    Posts
    45
    Rep Power
    11
    Hi
    Thanks for the help and advice

    Here is something I tried, but it displays all the contents of file1 and file2, and it also puts "+" and "-" signs in front of the lines. I don't know what they mean.

    I haven't tried to store it in file3, as the result is not what I want (I want the differences).
    Code:
    import os
    import difflib
    
    file1=('num1.log')
    file2=('num2.log')
    file3=('num3.log')
    first=file(file1).readlines()
    secon=file(file2).readlines()
    diff=difflib.ndiff(first,secon)
    
    for line in diff:
        line=line.strip()
        #total=len(line)
        print line.rstrip()#total
    Thanks for any help
  18. #10
  19. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2004
    Posts
    45
    Rep Power
    11
    I would appreciate some help with the previous query.
    Thanks
  20. #11
  21. Mini me.
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Location
    Cambridge, UK
    Posts
    783
    Rep Power
    13
    I think DevCoach explained this for you

    difflib produces output that programmatically describes the differences between the files (this is actually what I thought you required from the description in your original post). However, you say you actually want an output file that contains, line by line, only what is in file2 but not in file1.
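To make the "+" and "-" markers concrete: `difflib.ndiff` prefixes each output line with `"- "` (only in the first sequence), `"+ "` (only in the second), `"  "` (in both) or `"? "` (a hint line). A small Python 3 sketch, with the hypothetical helper name `lines_only_in_second`, that keeps just the `"+ "` lines:

```python
import difflib


def lines_only_in_second(first, second):
    """Return the lines that ndiff marks with '+ ' (present in second, not in first)."""
    return [line[2:]                      # drop the two-character ndiff prefix
            for line in difflib.ndiff(first, second)
            if line.startswith("+ ")]
```

Note that this returns whole lines. If a line in file2 is a line from file1 with extra text appended, the entire file2 line is returned, not just the appended part.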

    If you answer DevCoach's questions 1, 3 and 4 we may be able to help further.

    It would help if you posted a real example of file1 and file 2 where the difference between them is clear. (2 or 3 lines from each file is enough).



    BTW
    file1=('num1.log')
    file2=('num2.log')
    file3=('num3.log')

    would normally be ...
    file1='num1.log'
    file2='num2.log'
    file3='num3.log'

    grim
  22. #12
  23. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2004
    Posts
    45
    Rep Power
    11
    I want to check line by line.

    Here are the lines for file1 and file2:

    file1
    Consistency checks are internal tests which software engineers have placed system code. The primary function of consistency checks is to ensure the stability and integrity of internal operating system data.

    file2
    Consistency checks are internal tests which software engineers have placed system code. The primary function of consistency checks is to ensure the stability and integrity of internal operating system data. Numerous consistency checks are interlaced throughout

    the answer in file3 should be the line
    Numerous consistency checks are interlaced throughout
  24. #13
  25. Mini me.
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Location
    Cambridge, UK
    Posts
    783
    Rep Power
    13
    Based on the info you have provided here is a basic solution I am sure you can extend to do what you want:
    Code:
    file1 = 'num1.log'
    file2 = 'num2.log'
    file3 = 'num3.log'
    first = file(file1).readlines()
    secon = file(file2).readlines()
    
    for n in range(len(first)): 
        lenf = len(first[n].strip())
        diff = secon[n][lenf: ].strip()
        print "Line %s: " % (n + 1), diff
    Last edited by Grim Archon; July 8th, 2004 at 04:59 AM.
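A slightly more defensive variant of the same idea, sketched in Python 3 and writing the result to the third file. The function names are invented for this example. It assumes, as in the sample data above, that each line of file2 is the matching line of file1 with text appended; when that is not the case it falls back to the whole file2 line. `zip` stops at the end of the shorter file, so extra lines in the longer file are ignored.

```python
def appended_text(first_line, second_line):
    """Return what second_line adds after first_line.

    If first_line is not a prefix of second_line, return all of second_line.
    """
    a = first_line.strip()
    b = second_line.strip()
    if b.startswith(a):
        return b[len(a):].strip()
    return b


def write_appended(path_a, path_b, path_out):
    """Compare two files line by line and write the added text to path_out."""
    with open(path_a) as fa, open(path_b) as fb, open(path_out, "w") as out:
        for a, b in zip(fa, fb):
            extra = appended_text(a, b)
            if extra:                     # skip lines where nothing was added
                out.write(extra + "\n")
```

With the names used in this thread, that would be called as `write_appended('num1.log', 'num2.log', 'num3.log')`.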
  26. #14
  27. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2004
    Posts
    45
    Rep Power
    11
    Thanks for all the help. I shall modify the code according to my needs. Thanks once again.
  28. #15
  29. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    You could always use this generator to loop over one or more objects at the same time.

    Code:
    def group(*objects):
        while True:
            results = []
            for object in objects:
                results.append(object.next())
            yield tuple(results)
    
    for a, b in group(file('file1.txt'), file('file2.txt')):
        if a != b: print b,
    This has its limitations, since it will only iterate to the end of the shortest file, but it could easily be extended if need be. The example does show how easy it is to compare two files line by line.
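One possible way to lift that limitation in Python 3 is `itertools.zip_longest`, which keeps going until the longest input is exhausted and fills the gaps with `None`. The following is a sketch only; the function name `changed_lines` is made up for illustration.

```python
from itertools import zip_longest


def changed_lines(first_lines, second_lines):
    """Return the lines of second_lines that differ from the line
    at the same position in first_lines, including any extra trailing lines."""
    changed = []
    for a, b in zip_longest(first_lines, second_lines):
        # b is None once second_lines has run out; there is nothing to report then
        if b is not None and a != b:
            changed.append(b)
    return changed
```

Both arguments can be any iterables of lines, so open file objects work as well as lists.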

    Have fun,

    Mark.
    Last edited by netytan; July 8th, 2004 at 08:59 AM.
    programming language development: www.netytan.com Hula

