#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2004
    Posts
    3
    Rep Power
    0

    How to compare files in Python?


    I need to read files with a .dat extension and compare them. The result should be a single holding file containing all the information from those files, but no piece of information may be written twice.
    E.g.

    file 1
    *****
    User Adrian
    age 34
    money 4e+23
    money2 2e+21

    file2
    *****
    User Momba
    age 30
    money 4e+23
    money2 2e+22
    money3 3e+12

    holdingfile
    *******
    Users: 2
    Average age: 32
    money 8e+23 (sum of money)
    money2 2.2e+22 (sum of money2)
    money3 3e+12 (from file2)

    etc...

    The problem is that when I read the data, which I manage with the open() function, I need to convert it into numbers, because I want to compute statistics (average, etc.).

    f = open('file1.dat')  # and likewise for each of the other files
    lines = f.readlines()

    How can I convert the data I read into lists or dicts?
    Can I use dicts to hold the data, for example with money as the key and its total as the value, and then calculate with them, or is this the wrong way?
    Please help [and excuse my bad English]!
#2
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2004
    Posts
    84
    Rep Power
    11
    I don't know of a built-in function that converts your data into a dictionary, so that's a problem you'll have to solve yourself (it's not too difficult). However, you can convert a string into a list with the split() method, optionally passing the string to split on. Example:

    Code:
    s = 'hello world'
    print(s.split())     # splits on whitespace by default; prints ['hello', 'world']
    print(s.split("w"))  # prints ['hello ', 'orld']
    So from there you can split your data up. Then you can convert the values you will be working with mathematically to numbers. The int() function takes the value to convert as its argument and returns an integer; float() does the same for floating-point values, including scientific notation such as '4e+23', which int() rejects. Both raise a ValueError if you supply a string that cannot be converted, like 'hello'.
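
    As a rough illustration of the split-then-convert idea above, here is a minimal sketch written for Python 3. It assumes every data line has the form "key value", as in the sample files; parse_record is a hypothetical helper name, not something from the standard library.

```python
def parse_record(lines):
    """Turn 'key value' lines into a dict, converting numeric values to float."""
    record = {}
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace only
        if len(parts) != 2:
            continue  # skip blank lines and lines without a value
        key, value = parts
        value = value.strip()  # drop the trailing newline
        try:
            record[key] = float(value)  # float() accepts '34' as well as '4e+23'
        except ValueError:
            record[key] = value  # keep non-numeric values, such as names, as text
    return record


sample = ["User Adrian\n", "age 34\n", "money 4e+23\n", "money2 2e+21\n"]
print(parse_record(sample))
```

    Using float() for every number keeps the sketch short; if whole numbers such as the age must stay integers, int() could be tried first.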
#3
    Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    Why not loop over the file names, then iterate over each line in each file and append the line if it isn't already present? Something like this...

    Code:
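    #!/usr/bin/env python
    
    paths = ('file1.dat', 'file2.dat', 'file3.dat', 'file4.dat')
    lines = []
    
    # Collect each distinct line once, in the order it first appears.
    for path in paths:
        with open(path) as infile:
            for line in infile:
                if line not in lines:
                    lines.append(line)
    
    with open('results.dat', 'w') as outfile:
        outfile.writelines(lines)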
    That should handle the output file, anyway.

    Hope this helps,

    Mark.
    programming language development: www.netytan.com Hula
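
    For a large bundle of files, checking each line against a list gets slow, because every membership test scans the whole list. A possible variant, sketched for Python 3, tracks the lines already seen in a set while keeping first-seen order. merge_unique_lines is a hypothetical name, and adding a missing final newline is an assumption about how the files should be combined.

```python
def merge_unique_lines(paths, out_path):
    """Copy every distinct line from the input files to out_path, in first-seen order."""
    seen = set()   # lines already collected; set membership tests are fast
    unique = []    # distinct lines in the order they first appear
    for path in paths:
        with open(path) as handle:
            for line in handle:
                if not line.endswith("\n"):
                    line += "\n"  # normalise a file that lacks a final newline
                if line not in seen:
                    seen.add(line)
                    unique.append(line)
    with open(out_path, "w") as out:
        out.writelines(unique)
    return unique
```

    Calling merge_unique_lines(('file1.dat', 'file2.dat'), 'results.dat') would write the combined file and also return the collected lines.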

#4
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2004
    Posts
    3
    Rep Power
    0
    Thanks, but the script has to run over a bundle of files with different names, such as:

    ciabdervin.dat
    e1.dat
    qb.dat

    So I have to find a way to search for *.dat files and read them. Still, my main problem is mapping the data into a dict.
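
    One possible way to pick up every .dat file regardless of its name is the standard-library glob module. A minimal sketch for Python 3, assuming all the files sit in a single directory; find_dat_files is a hypothetical helper name.

```python
import glob
import os


def find_dat_files(directory="."):
    """Return the sorted paths of all files ending in .dat in one directory."""
    pattern = os.path.join(directory, "*.dat")  # e.g. ./*.dat
    return sorted(glob.glob(pattern))           # sorted for a predictable order


for path in find_dat_files("."):
    print(path)
```

    The returned list can be used wherever a fixed tuple of file names would otherwise be typed in by hand.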
#5
    Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    Well, are the files in the same directory, or do you need to search your entire computer? Even then, if you know ahead of time which files you want to work with, all you have to do is change the names in 'paths' to point at those files.

    Also, what data do you want to put in the dictionary? I really can't say I understood what you're after from your first post...

    You can use the int() function to convert the numbers in the strings before summing them, or float() for values written in scientific notation such as 4e+23, which int() will not accept.

    Mark.
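
    To make the convert-then-sum step concrete, here is a rough sketch for Python 3 of how per-file dicts might be combined into the holding-file statistics. It assumes each file has already been parsed into a dict of field names to values, that 'age' should be averaged, and that every other numeric field should be summed; summarise and the output keys are hypothetical names.

```python
def summarise(records):
    """Combine per-file dicts: count users, average 'age', sum other numeric fields."""
    totals = {}
    ages = []
    for record in records:
        for key, value in record.items():
            if not isinstance(value, (int, float)):
                continue  # skip text fields such as the user name
            if key == "age":
                ages.append(value)
            else:
                totals[key] = totals.get(key, 0.0) + value
    summary = {"Users": len(records)}
    if ages:
        summary["Average age"] = sum(ages) / len(ages)
    summary.update(totals)
    return summary


records = [
    {"User": "Adrian", "age": 34, "money": 4e+23, "money2": 2e+21},
    {"User": "Momba", "age": 30, "money": 4e+23, "money2": 2e+22, "money3": 3e+12},
]
print(summarise(records))
```

    A field that appears in only one file, such as money3, simply carries its single value through to the summary.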

#6
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2004
    Posts
    3
    Rep Power
    0
    Thanks guys, I solved this problem with your help.
    Sorry, but I sometimes have trouble expressing myself.

    greets
