Python Programming
 
  #1  
March 9th, 2013, 03:43 PM
buggsbunny4
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2013
Posts: 3
Time spent in forums: 39 m 8 sec
Reputation Power: 0
Parse Multiple Files using Python and Write Data to a Single File

Hi, I am new to Python programming. I am trying to parse multiple files, collect the data from each one, and write it to a single file. The output file should contain one row per unique combination of primary keys, together with its associated data (similar to a pivot table in Excel). A sample text file follows. Every input file has three sections, and each section will be written to a separate text file.

Test Data 123

Section1
PrimaryKey Primary Key2 Primary Key3 TestData1 TestData2 TestData3
Key1 Data1 Sample1 119 100 0.920336134
Key1 Data2 Sample2 120 101 0.921666667
Key1 Data3 Sample3 115 96 0.914782609
Key2 Data1 Sample1 77 58 0.833246753
Key2 Data2 Sample2 66 47 0.792121212
Key3 Data1 Sample1 106 87 0.900754717

Section2
PrimaryKey Primary Key2 Primary Key3 TestData1 TestData2 TestData3 TestData4 TestData5 TestData6
Key1 Data1 Sample1 119 100 0.856 0.859 0.862 0.865
Key1 Data2 Sample2 120 101 0.876 0.879 0.882 0.885
Key1 Data3 Sample3 115 96 0.896 0.899 0.902 0.905
Key2 Data1 Sample1 77 58 0.916 0.919 0.922 0.925
Key2 Data2 Sample2 66 47 0.936 0.939 0.942 0.945
Key3 Data1 Sample1 106 87 0.956 0.959 0.962 0.965
Key1 Data1 Sample1 116 97 0.976 0.979 0.982 0.985
Key1 Data2 Sample2 101 82 0.996 0.999 1.002 1.005
Key1 Data3 Sample3 106 87 1.016 1.019 1.022 1.025
Key2 Data1 Sample1 61 42 1.036 1.039 1.042 1.045

Section3
Column1 Column2
Test1 DataTest
Test2 DataTest1
Test3 DataTest2
Test4 DataTest3
Test5 DataTest4
Test6 DataTest5
Test7 DataTest6
Test8 DataTest7

Thanks
rk

  #2  
March 10th, 2013, 01:02 PM
rrashkin
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: May 2012
Location: 39N 104.28W
Posts: 97
Time spent in forums: 1 Day 15 h 24 m 4 sec
Reputation Power: 2
You haven't really given us enough information to determine how to choose which input data goes into the output. Nonetheless, here are some ideas.

Initialize a list for the output: outlist=[]
Put the names of the input files in a list: infiles=["/blah/blah/blah/file1.txt", "/blah/blah/blah/file2.txt",...]
Loop through the input files and split each line:
Code:
for f in infiles:
    # open each input file; the with block closes it when the loop is done
    with open(f) as fid:
        for rec in fid:
            # split the record on whitespace into a list of fields
            data = rec.split()

Now key1=data[0], etc., and the data is data[3:]
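
As a rough sketch of how the loop above might be extended to the three-section layout of the sample file, the function below groups the rows of one file by section name. It assumes (the thread does not state this) that each section starts with a line beginning with "Section", that the next non-blank line is the column header, and that every other non-blank line is a data row. The name split_sections is made up for illustration.

```python
def split_sections(lines):
    """Group whitespace-split rows by section name.

    Assumption: a section starts at a line beginning with "Section",
    and the first non-blank line after it is the column header.
    """
    sections = {}
    current = None       # name of the section being read, if any
    need_header = False  # True until the header line has been skipped
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0].startswith("Section"):
            current = fields[0]
            sections[current] = []
            need_header = True
        elif current is None:
            continue  # text before the first section, e.g. a title line
        elif need_header:
            need_header = False  # the header row is skipped, not stored
        else:
            sections[current].append(fields)
    return sections


sample = """Test Data 123

Section1
PrimaryKey Primary Key2 Primary Key3 TestData1 TestData2 TestData3
Key1 Data1 Sample1 119 100 0.920336134

Section3
Column1 Column2
Test1 DataTest
"""
parsed = split_sections(sample.splitlines())
print(parsed["Section1"])
print(parsed["Section3"])
```

For a real file, the same function could be called as split_sections(fid) inside the loop, since iterating over an open file yields its lines. A data row whose first field happens to start with "Section" would be misread as a section marker under this assumption.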

  #3  
March 10th, 2013, 02:07 PM
b49P23TIvg
Contributing User
Dev Shed Loyal (3000 - 3499 posts)
 
Join Date: Aug 2011
Posts: 3,458
Time spent in forums: 1 Month 2 Weeks 4 Days 6 h 26 m 43 sec
Reputation Power: 403
spreadsheet rant

I agree with rrashkin. Whew! Thought I was the only one who never used an excel pivot table.

There are 3 parts to each of multiple files, but you showed only one input example. There's no test, meaning an expected output for this input. And I'm perplexed by the columns labeled "TestData" in the first two sections but the fields named "DataTest" in the third section.

When I started reading about pivot tables I found that Excel has functions named sumif and countif. The need for these functions marks the disparity between the general-purpose solutions that I prefer and the exceedingly specific functionality that Microsoft marketing research has discovered to be useful. I suppose the pivot table is a somewhat general solution to some problem.

Programming spreadsheets with variable names like QQ$3 instead of the descriptive names we programmers invent? Unreal. The few times I've had to prepare a spreadsheet I named the cells, at least the constants.
  #4  
March 10th, 2013, 04:33 PM
buggsbunny4

Sorry about my vague question, rrashkin. To clarify: each input file has three sections, and for each section the data from all the input files should be written to its own output text file. For example, suppose we have 2 input files:

Input File1:
Section1
PKey1 PKey2 PKey3 Data1 Data2 Data3
Key1 Key2 Key3 80 100 0.90
Key1 Key2 Key4 85 101 0.89
Key2 Key3 Key4 100 125 0.89

Input File2:
Section1
PKey1 PKey2 PKey3 Data1 Data2 Data3
Key1 Key2 Key3 85 110 0.90
Key1 Key2 Key4 80 151 0.89
Key2 Key3 Key4 102 135 0.99
Key3 Key4 Key5 110 167 0.87

Output file for Section1:
-------------------XXXXX-File1-XXXX---XXXXX-File2-XXXXX
PKey1 PKey2 PKey3 Data1 Data2 Data3 Data1 Data2 Data3
Key1 Key2 Key3 80 100 0.90 85 110 0.90
Key1 Key2 Key4 85 101 0.89 80 151 0.89
Key2 Key3 Key4 100 125 0.89 102 135 0.99
Key3 Key4 Key5 N/A N/A N/A 110 167 0.87

If a key combination exists in one file but not in the other, we need to put "N/A" in the columns of the file where it does not exist. The output for the other 2 sections follows the same pattern.

Hope I clarified the question.

Thanks
rk
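
The output described above amounts to an outer join on the three primary keys. A minimal sketch, assuming the Section1 rows of each file have already been read into a dictionary that maps the tuple of three keys to the list of data values. The names file1, file2 and merge_rows are illustrative only, and the values are copied from the example in this post. It relies on dictionaries keeping insertion order (Python 3.7 or later).

```python
def merge_rows(tables, width=3, missing="N/A"):
    """Outer-join per-file dicts on their key tuples.

    tables: one dict per input file, mapping (PKey1, PKey2, PKey3)
            to that file's list of data values.
    width:  number of data columns each file contributes.
    """
    # Collect every key seen in any file, keeping first-seen order.
    keys = []
    seen = set()
    for table in tables:
        for key in table:
            if key not in seen:
                seen.add(key)
                keys.append(key)

    rows = []
    for key in keys:
        row = list(key)
        for table in tables:
            # Pad with "N/A" when this file has no row for the key.
            row.extend(table.get(key, [missing] * width))
        rows.append(row)
    return rows


file1 = {
    ("Key1", "Key2", "Key3"): ["80", "100", "0.90"],
    ("Key1", "Key2", "Key4"): ["85", "101", "0.89"],
    ("Key2", "Key3", "Key4"): ["100", "125", "0.89"],
}
file2 = {
    ("Key1", "Key2", "Key3"): ["85", "110", "0.90"],
    ("Key1", "Key2", "Key4"): ["80", "151", "0.89"],
    ("Key2", "Key3", "Key4"): ["102", "135", "0.99"],
    ("Key3", "Key4", "Key5"): ["110", "167", "0.87"],
}

merged = merge_rows([file1, file2])
for row in merged:
    print(" ".join(row))
```

The last printed row is the one that exists only in the second file, so its File1 columns are filled with N/A. The same function accepts more than two dictionaries if there are more input files.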

  #5  
March 11th, 2013, 10:50 AM
rrashkin

Actually, I thought I gave you a general solution to your problem. You still have a question?

  #6  
March 11th, 2013, 06:38 PM
buggsbunny4

From what I understood, this will put all the data into that data[] dictionary, right? My question is how we look up the data by its primary keys and write it to the columns for the input file it came from, as I showed in my previous post.

  #7  
March 12th, 2013, 11:09 AM
rrashkin
In my scheme, "data" is a temporary list holding the data read in. If you know there are always 3 keys and 3 values (1 for each key) you could add them to a dictionary (I would build a new dictionary for each input file):
dct_data[data[0]]=data[3]
dct_data[data[1]]=data[4]
dct_data[data[2]]=data[5]
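
For the lookup by primary keys asked about earlier in the thread, one possible variation on the per-file dictionary (a variation, not what the post above prescribes) is to use the tuple of all three keys as a single dictionary key and the remaining fields as its value, so a whole row can be looked up at once. The helper name index_rows is hypothetical.

```python
def index_rows(rows, n_keys=3):
    """Map the first n_keys fields of each row to the remaining fields."""
    dct_data = {}
    for data in rows:
        key = tuple(data[:n_keys])      # e.g. ("Key1", "Key2", "Key3")
        dct_data[key] = data[n_keys:]   # e.g. ["85", "110", "0.90"]
    return dct_data


# Rows as produced by rec.split() on two Section1 lines of Input File2.
rows_file2 = [
    "Key1 Key2 Key3 85 110 0.90".split(),
    "Key3 Key4 Key5 110 167 0.87".split(),
]
dct_file2 = index_rows(rows_file2)

# Look up one row by its three keys; fall back to "N/A" values if absent.
print(dct_file2.get(("Key3", "Key4", "Key5"), ["N/A"] * 3))
print(dct_file2.get(("Key2", "Key3", "Key4"), ["N/A"] * 3))
```

With one such dictionary per input file, a key missing from a file yields the fallback list, which supplies the "N/A" columns in the combined output. If the same key combination occurs twice within one file (as Key1 Data1 Sample1 does in Section2 of the first sample), the later row overwrites the earlier one here; the thread does not say how such duplicates should be handled.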
