#1
  1. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2013
    Posts
    3
    Rep Power
    0

    Parse Multiple Files using Python and Write Data to a Single File


    Hi, I am new to Python programming. I am trying to parse multiple files, extract the data from each file, and write it to a single file. The output file should contain only unique rows, keyed on the primary keys, together with their associated data (similar to how a pivot table is created in Excel). I am attaching a sample text file. All the text files have three sections, and each section will be parsed into a separate text file.

    Test Data 123

    Section1
    PrimaryKey PrimaryKey2 PrimaryKey3 TestData1 TestData2 TestData3
    Key1 Data1 Sample1 119 100 0.920336134
    Key1 Data2 Sample2 120 101 0.921666667
    Key1 Data3 Sample3 115 96 0.914782609
    Key2 Data1 Sample1 77 58 0.833246753
    Key2 Data2 Sample2 66 47 0.792121212
    Key3 Data1 Sample1 106 87 0.900754717

    Section2
    PrimaryKey PrimaryKey2 PrimaryKey3 TestData1 TestData2 TestData3 TestData4 TestData5 TestData6
    Key1 Data1 Sample1 119 100 0.856 0.859 0.862 0.865
    Key1 Data2 Sample2 120 101 0.876 0.879 0.882 0.885
    Key1 Data3 Sample3 115 96 0.896 0.899 0.902 0.905
    Key2 Data1 Sample1 77 58 0.916 0.919 0.922 0.925
    Key2 Data2 Sample2 66 47 0.936 0.939 0.942 0.945
    Key3 Data1 Sample1 106 87 0.956 0.959 0.962 0.965
    Key1 Data1 Sample1 116 97 0.976 0.979 0.982 0.985
    Key1 Data2 Sample2 101 82 0.996 0.999 1.002 1.005
    Key1 Data3 Sample3 106 87 1.016 1.019 1.022 1.025
    Key2 Data1 Sample1 61 42 1.036 1.039 1.042 1.045

    Section3
    Column1 Column2
    Test1 DataTest
    Test2 DataTest1
    Test3 DataTest2
    Test4 DataTest3
    Test5 DataTest4
    Test6 DataTest5
    Test7 DataTest6
    Test8 DataTest7

    Thanks
    rk
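A minimal sketch of splitting one such file into its three sections, assuming (from the sample above) that each section starts with a one-word line such as `Section1`, that the next line is the column header, and that blank lines only separate sections. The function name `parse_sections` is hypothetical, not something from the thread.

```python
def parse_sections(lines):
    """Split a file's lines into {section name: rows}, where rows[0] is the header.

    Lines before the first "SectionN" line (e.g. "Test Data 123") are ignored.
    """
    sections = {}
    current = None
    for line in lines:
        fields = line.split()  # whitespace split; blank lines give []
        if not fields:
            continue
        if len(fields) == 1 and fields[0].startswith("Section"):
            current = fields[0]  # start a new section
            sections[current] = []
        elif current is not None:
            sections[current].append(fields)  # header first, then data rows
    return sections


sample = """Test Data 123

Section1
PrimaryKey PrimaryKey2 PrimaryKey3 TestData1 TestData2 TestData3
Key1 Data1 Sample1 119 100 0.920336134
Key2 Data1 Sample1 77 58 0.833246753

Section3
Column1 Column2
Test1 DataTest
"""

for name, rows in parse_sections(sample.splitlines()).items():
    print(name, rows[0], len(rows) - 1)  # section, header, number of data rows
```

With a real file, `parse_sections(open(path))` works the same way, since iterating a file yields its lines.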
  2. #2
  3. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2012
    Location
    39N 104.28W
    Posts
    158
    Rep Power
    3
    You haven't really given us enough information to determine how to choose which input data goes into the output. Nevertheless, here are some ideas.

    Initialize a list for the output: outlist=[]
    Put the names of the input files in a list: infiles=["/blah/blah/blah/file1.txt", "/blah/blah/blah/file2.txt", ...]
    Loop through the input files and split each line:
    Code:
    for f in infiles:
        with open(f) as fid:  # the file is closed automatically afterwards
            for rec in fid:
                data = rec.split()  # split the line on whitespace
    Now key1=data[0], etc., and the data values are data[3:].
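A small sketch of that last step, assuming (as in the sample data) that the first three fields are the primary keys. Grouping them into a tuple lets the three keys act as a single lookup key later on. `split_record` is a hypothetical helper name.

```python
def split_record(rec, n_keys=3):
    """Split a whitespace-separated record into (key tuple, list of values)."""
    data = rec.split()
    return tuple(data[:n_keys]), data[n_keys:]


keys, values = split_record("Key1 Data1 Sample1 119 100 0.920336134")
print(keys)    # ('Key1', 'Data1', 'Sample1')
print(values)  # ['119', '100', '0.920336134']
```

The values stay strings; convert them with `int()` or `float()` only if you need to do arithmetic on them.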
  4. #3
  5. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,837
    Rep Power
    480

    spreadsheet rant


    I agree with rrashkin. Whew! I thought I was the only one who never used an Excel pivot table.

    3 parts to each of multiple files, but you showed only one input example. There's no test, that is, no expected output for this input. And I'm perplexed by the columns labeled "TestData" in the first two sections but the fields named "DataTest" in the third section.

    When I started reading about pivot tables, I found that Excel has functions named SUMIF and COUNTIF. The need for these functions marks the disparity between the general-purpose solution I prefer and the exceedingly specific functionality that Microsoft's marketing research has found useful. I suppose the pivot table is a somewhat general solution to some problem.

    Programming spreadsheets with cell references like QQ$3 instead of the descriptive names we programmers invent. Unreal. The few times I've had to prepare a spreadsheet, I named the cells, or at least the constants.
    [code]Code tags[/code] are essential for python code and Makefiles!
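For comparison, the general-purpose Python counterpart of SUMIF and COUNTIF is just `sum()` over a generator expression with a condition. A sketch over a few Section1 rows, assuming they have already been split into fields:

```python
rows = [
    ["Key1", "Data1", "Sample1", "119", "100", "0.920336134"],
    ["Key1", "Data2", "Sample2", "120", "101", "0.921666667"],
    ["Key2", "Data1", "Sample1", "77", "58", "0.833246753"],
]

# COUNTIF: how many rows have PrimaryKey == "Key1"?
count_key1 = sum(1 for r in rows if r[0] == "Key1")

# SUMIF: total of TestData1 over the same rows.
sum_key1 = sum(int(r[3]) for r in rows if r[0] == "Key1")

print(count_key1, sum_key1)  # 2 239
```

Any condition or column can be swapped in without needing a dedicated function for each case.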
  6. #4
  7. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2013
    Posts
    3
    Rep Power
    0
    Originally Posted by rrashkin
    You haven't really given us enough information to determine how to choose which input data goes into the output. NTL, here are some ideas.
    Sorry, rrashkin, for the vague question. To clarify: each input file has three sections, and the data for each section, gathered from all the input files, should be written to a separate text file. For example, suppose we have 2 input files:

    Input File1:
    Section1
    PKey1 PKey2 PKey3 Data1 Data2 Data3
    Key1 Key2 Key3 80 100 0.90
    Key1 Key2 Key4 85 101 0.89
    Key2 Key3 Key4 100 125 0.89

    Input File2:
    Section1
    PKey1 PKey2 PKey3 Data1 Data2 Data3
    Key1 Key2 Key3 85 110 0.90
    Key1 Key2 Key4 80 151 0.89
    Key2 Key3 Key4 102 135 0.99
    Key3 Key4 Key5 110 167 0.87

    Output file for Section1:
                      File1             File2
    PKey1 PKey2 PKey3 Data1 Data2 Data3 Data1 Data2 Data3
    Key1  Key2  Key3  80    100   0.90  85    110   0.90
    Key1  Key2  Key4  85    101   0.89  80    151   0.89
    Key2  Key3  Key4  100   125   0.89  102   135   0.99
    Key3  Key4  Key5  N/A   N/A   N/A   110   167   0.87

    If a row exists in one file but not in the other, "N/A" should be written in the columns for the file where it is missing. The other 2 sections should be output the same way.

    I hope this clarifies the question.

    Thanks
    rk
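One way to get that layout is a full outer join on the primary keys: build one dictionary per file keyed by the tuple of keys, take the union of keys in first-seen order, and fill "N/A" wherever a file lacks a key. A minimal sketch, assuming each file's section has already been split into rows of 3 keys followed by that file's values (header rows excluded); `merge_files` is a hypothetical name.

```python
def merge_files(tables, n_keys=3):
    """Outer-join several files' rows on their first n_keys fields.

    tables: one list of rows per input file (no header rows).
    Returns rows of: keys, then each file's values or "N/A" placeholders.
    """
    per_file = []  # one {key tuple: values} dict per file
    order = []     # union of all key tuples, in first-seen order
    seen = set()
    widths = []    # number of value columns in each file
    for rows in tables:
        lookup = {}
        for row in rows:
            key = tuple(row[:n_keys])
            lookup[key] = row[n_keys:]  # a repeated key: last row wins
            if key not in seen:
                seen.add(key)
                order.append(key)
        per_file.append(lookup)
        widths.append(len(rows[0]) - n_keys if rows else 0)

    merged = []
    for key in order:
        out = list(key)
        for lookup, width in zip(per_file, widths):
            out.extend(lookup.get(key, ["N/A"] * width))
        merged.append(out)
    return merged


file1 = [r.split() for r in ["Key1 Key2 Key3 80 100 0.90",
                             "Key1 Key2 Key4 85 101 0.89",
                             "Key2 Key3 Key4 100 125 0.89"]]
file2 = [r.split() for r in ["Key1 Key2 Key3 85 110 0.90",
                             "Key1 Key2 Key4 80 151 0.89",
                             "Key2 Key3 Key4 102 135 0.99",
                             "Key3 Key4 Key5 110 167 0.87"]]

for row in merge_files([file1, file2]):
    print(" ".join(row))
```

For the two example files this prints the four rows of the expected output, with `N/A N/A N/A` in the File1 columns of the `Key3 Key4 Key5` row. Note that the Section2 sample contains repeated key combinations; with a plain dictionary only the last of those rows survives, so duplicates would need a different rule.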
  8. #5
  9. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2012
    Location
    39N 104.28W
    Posts
    158
    Rep Power
    3
    Originally Posted by buggsbunny4
    Sorry rrashkin about my vague question. To clarify the question, we will have three sections in each of the input files and all the data from the input files from each section should be written to a separate text file.
    Actually, I thought I gave you a general solution to your problem. You still have a question?
  10. #6
  11. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2013
    Posts
    3
    Rep Power
    0
    Originally Posted by rrashkin
    Actually, I thought I gave you a general solution to your problem. You still have a question?
    Based on what I understood, I think it will write all the data into that data[] dictionary, right? My question is how we look up the data by the primary keys and write it into the columns according to which input file the data came from, as I showed in my previous post.
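For the writing half of that question, assuming the rows have already been matched up by primary key (keys first, then each file's values in file order), labeling the columns by source file is mostly string formatting. A sketch; prefixing each value column with the file's label is an illustrative choice, not the poster's exact layout, and `write_section` is a hypothetical name.

```python
import io


def write_section(out, key_header, value_header, file_labels, merged_rows):
    """Write a tab-separated table whose value columns are tagged by file.

    key_header:   e.g. ["PKey1", "PKey2", "PKey3"]
    value_header: value column names shared by every file
    file_labels:  one label per input file, in the same order as the
                  value groups inside each merged row
    """
    header = list(key_header)
    for label in file_labels:
        header.extend(f"{label}:{name}" for name in value_header)
    out.write("\t".join(header) + "\n")
    for row in merged_rows:
        out.write("\t".join(row) + "\n")


rows = [["Key1", "Key2", "Key3", "80", "100", "0.90", "85", "110", "0.90"],
        ["Key3", "Key4", "Key5", "N/A", "N/A", "N/A", "110", "167", "0.87"]]

buf = io.StringIO()  # stands in for a real output file
write_section(buf, ["PKey1", "PKey2", "PKey3"], ["Data1", "Data2", "Data3"],
              ["File1", "File2"], rows)
print(buf.getvalue())
```

To write a real file instead, pass the handle from `with open("section1_out.txt", "w") as out:` (the filename is only an example).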
  12. #7
  13. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2012
    Location
    39N 104.28W
    Posts
    158
    Rep Power
    3
    In my scheme, "data" is a temporary list holding the data read in. If you know there are always 3 keys and 3 values (1 for each key), you could add them to a dictionary (I would build a new dictionary for each input file):
    Code:
    dct_data = {}  # a new dictionary for each input file
    dct_data[data[0]] = data[3]  # first key -> first value
    dct_data[data[1]] = data[4]  # second key -> second value
    dct_data[data[2]] = data[5]  # third key -> third value
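With the sample data the individual keys repeat (Key1 appears on several rows), so keying on each key separately lets later rows overwrite earlier ones. A variant of the same per-file dictionary idea, sketched under the assumption that the three keys together identify a row: use the tuple of keys as the dictionary key and the remaining fields as the value. `build_lookup` is a hypothetical name.

```python
import io


def build_lookup(fid, n_keys=3):
    """Map (key1, key2, key3) -> list of values for one input file.

    Skips blank lines and assumes the first non-blank line is the
    column header. If a key tuple repeats, the last row wins.
    """
    lookup = {}
    header_seen = False
    for rec in fid:
        data = rec.split()
        if not data:
            continue
        if not header_seen:
            header_seen = True  # skip the column header line
            continue
        lookup[tuple(data[:n_keys])] = data[n_keys:]
    return lookup


file2 = io.StringIO("""PKey1 PKey2 PKey3 Data1 Data2 Data3
Key1 Key2 Key3 85 110 0.90
Key3 Key4 Key5 110 167 0.87
""")
dct_data = build_lookup(file2)
print(dct_data[("Key3", "Key4", "Key5")])             # ['110', '167', '0.87']
print(dct_data.get(("Key2", "Key3", "Key4"), "N/A"))  # N/A
```

With real files, call it as `build_lookup(fid)` inside a `with open(f) as fid:` block; `dict.get` with a default is a simple way to supply the "N/A" for keys a file lacks.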
