What you want is not the file difference but the file intersection.
Originally Posted by pyton
For general file comparison you can use difflib or an external diff program, as grim said. However, these are mainly for producing context diffs, which show a line-by-line difference. For your example data it would display something like:
    - 1, 2, 3, 4, 5, 6, 56,
    + 2, 6, 12, 58, 96, 56,
This shows that a line has been removed and a new one added, which is not what you want.
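To see this for yourself, here is a small sketch using difflib.unified_diff on the two example lines. The file names are made up and only label the output; each "file" is just a list of lines. The whole line appears once with a "-" prefix (removed) and once with a "+" prefix (added), even though several values are shared.

```python
import difflib

# Each "file" is a list of lines; here each holds just the example line
file1 = ["1, 2, 3, 4, 5, 6, 56,"]
file2 = ["2, 6, 12, 58, 96, 56,"]

# lineterm="" because the input lines carry no trailing newline
diff_lines = list(difflib.unified_diff(file1, file2,
                                       fromfile="file1.txt", tofile="file2.txt",
                                       lineterm=""))
for line in diff_lines:
    print(line)
# --- file1.txt
# +++ file2.txt
# @@ -1 +1 @@
# -1, 2, 3, 4, 5, 6, 56,
# +2, 6, 12, 58, 96, 56,
```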
From your example, I think what you really want is not to show the difference but to remove it altogether and show the common data. This is not a normal use for diff, although difflib could do it with some careful coding.
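One way the "careful coding" might look is a sketch like this, assuming each line is a comma-separated list: split both lines into lists of values and let difflib.SequenceMatcher find the blocks of values they share, instead of diffing whole lines. The function name common_in_order is made up for illustration. Unlike a set, this keeps the values in their original order, but SequenceMatcher looks for long contiguous matches, so in general it is not guaranteed to return the longest possible common subsequence.

```python
from difflib import SequenceMatcher

def common_in_order(line1, line2):
    # Split each comma-separated line into a list of values, dropping
    # the empty string left by the trailing comma
    a = [v.strip() for v in line1.split(",") if v.strip()]
    b = [v.strip() for v in line2.split(",") if v.strip()]
    matcher = SequenceMatcher(None, a, b)
    common = []
    # get_matching_blocks() returns (i, j, size) triples; the final one is
    # a zero-size sentinel, so it adds nothing
    for i, _j, size in matcher.get_matching_blocks():
        common.extend(a[i:i + size])
    return common

print(common_in_order("1, 2, 3, 4, 5, 6, 56,", "2, 6, 12, 58, 96, 56,"))
# ['2', '6', '56']
```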
If the data files are always going to be comma-separated lists of values, then one possibility would be to read the values into sets and use the set intersection method to find the common values. This assumes that the order of the values is not important, since sets are unordered.
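A minimal sketch of the set approach, assuming every line is a plain comma-separated list like your example. The helper name values_from_lines is hypothetical; it accepts any iterable of lines, so you can pass it a list of strings or an open file object.

```python
def values_from_lines(lines):
    """Collect every comma-separated value from an iterable of lines
    (a list of strings or an open file) into a set."""
    values = set()
    for line in lines:
        # strip() removes spaces and the newline; the blank field left
        # by a trailing comma is skipped
        values.update(v.strip() for v in line.split(",") if v.strip())
    return values

values1 = values_from_lines(["1, 2, 3, 4, 5, 6, 56,"])
values2 = values_from_lines(["2, 6, 12, 58, 96, 56,"])

common = values1.intersection(values2)
# Sets have no order, so sort numerically for display
print(sorted(common, key=int))  # ['2', '6', '56']
```

With real files (the names here are made up) that would be something like `with open("file1.txt") as f1, open("file2.txt") as f2: common = values_from_lines(f1) & values_from_lines(f2)`, where `&` is the operator form of intersection.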
Some questions to think about:
1) Are the files in CSV format? If so, you can use the csv module to read them in.
2) Do you want to compare the files line by line, i.e. only compare line 1 of file 1 with line 1 of file 2?
3) Is the order of the entries important? What about duplicate values? If neither of these matters, you can use sets.
4) What is the context of the problem? What higher-level problem is this trying to solve? There may be other ways of solving it than doing a diff between files.
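If the answers to questions 1 and 2 are "yes", a sketch along these lines might work: read each file with csv.reader, then pair the rows up with zip so only line 1 is compared with line 1, and so on. The function names rows and common_per_line are hypothetical, and sorting with key=int assumes every value is an integer.

```python
import csv
import io

def rows(f):
    # skipinitialspace=True drops the space after each comma;
    # empty fields (e.g. from a trailing comma) are removed
    reader = csv.reader(f, skipinitialspace=True)
    return [[field for field in row if field] for row in reader]

def common_per_line(f1, f2):
    # zip pairs line 1 with line 1, line 2 with line 2, and so on;
    # it stops at the end of the shorter file
    return [sorted(set(r1) & set(r2), key=int)
            for r1, r2 in zip(rows(f1), rows(f2))]

# io.StringIO stands in for real files in this example
file1 = io.StringIO("1, 2, 3, 4, 5, 6, 56,\n7, 8,\n")
file2 = io.StringIO("2, 6, 12, 58, 96, 56,\n8, 9,\n")
print(common_per_line(file1, file2))  # [['2', '6', '56'], ['8']]
```

For real files, the csv documentation recommends opening them with `newline=""`, e.g. `open("file1.txt", newline="")`.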
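If order or duplicates do matter (question 3), plain sets will not be enough. One possible sketch uses collections.Counter, whose `&` operator keeps each value the smaller number of times it appears in either list. Walking the first list then preserves its original order. The function name and the sample data are made up for illustration.

```python
from collections import Counter

def common_keep_duplicates(values1, values2):
    # Counter & Counter keeps each value min(count1, count2) times
    shared = Counter(values1) & Counter(values2)
    result = []
    # Walk values1 so the output follows its original order
    for v in values1:
        if shared[v] > 0:  # a missing key counts as 0
            result.append(v)
            shared[v] -= 1
    return result

print(common_keep_duplicates([1, 2, 2, 3, 6, 56], [2, 6, 2, 12, 56, 56]))
# [2, 2, 6, 56]
```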
Dave - The Developers' Coach