Dev Shed Forums
> Programming Languages
> Python Programming
Data processing script
January 7th, 2013, 04:14 PM
Registered User
Join Date: Jan 2013
Posts: 23
Time spent in forums: 4 h 20 m 58 sec
Reputation Power: 0
Data processing script
I am rather new to programming and I am currently trying to write a script to help me process some data. The file with the data is a little lengthy, so I have uploaded it to my Dropbox as a text document here. What I am trying to do is have the script search for the lines "surface 101", "surface 105", etc. and then store the data located on the very next line so that I can eventually graph it. I first made a script that successfully reads one entry, and now I want to expand it so that it will continue reading all of the data. The first script looks like this:
Code:
def reports(output):
    with open(output) as f:
        #r is a variable used to help find the point where the results start.
        #two lines after 'cell 1' is the energy and intensity values
        r = str('surface 101')
        c = 0 #initializes the c variable used in the while loop
        #The following while loop skips each line until it comes across the line defined in r
        while c != r:
            line = f.readline()
            c = line.strip()
        line = f.readline() #reads the energy, intensity and SEM
        tally = []
        SEM = []
        numbers = line.strip().split()
        tally.append(numbers[0])
        SEM.append(numbers[1])
        print(tally)
I have now added a second while loop to hopefully continue running the script until all "surface XX" variables are recorded and it looks like this:
Code:
def reports(output):
    with open(output) as f:
        #r is a variable used to help find the point where the results start.
        count = 1
        x = 101
        r = str('surface 101')
        c = 0 #initializes the c variable used in the while loop
        #The following while loop skips each line until it comes across the line defined in r
        while count < 11:
            while c != r:
                line = f.readline()
                c = line.strip()
            count += 1
            x += 4
        line = f.readline() #reads the energy and SEM
        tally = []
        SEM = []
        numbers = line.strip().split()
        tally.append(numbers[0])
        SEM.append(numbers[1])
        print(tally)
        print(SEM)
I have two questions regarding this second piece of code.
1. The r variable is currently defined as
Code:
r = str('surface 101')
I believe I need to set it so that the "101" becomes whatever x is, but I am not sure how to reference the variable x.
2. Does it look like I am on the right track for what I am trying to accomplish? I hope this all makes sense. I am so new to programming that it is difficult to explain what I am trying to do. Thanks!
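One way to read all the surfaces without nested while loops is to walk the file once with a for loop and rebuild the search string from x after every match. This is a minimal sketch, assuming (as the post describes) that each header reads exactly "surface N" once stripped and that the next line starts with the tally followed by the SEM; the name read_surfaces and the sample text are hypothetical, not taken from the real input file.

```python
import io

def read_surfaces(f, first=101, step=4, n=10):
    """Collect (tally, SEM) strings from the line after each 'surface N' header.

    Hypothetical helper: assumes headers read 'surface 101', 'surface 105', ...
    once stripped, and that the next line starts with the tally and then the SEM.
    """
    tally, sem = [], []
    x = first
    target = 'surface %d' % x          # %d is replaced by the current value of x
    for line in f:
        if line.strip() != target:
            continue                   # skip everything until the current header
        numbers = next(f, '').split()  # the data sit on the very next line
        if len(numbers) < 2:
            break                      # truncated file: stop rather than raise IndexError
        tally.append(numbers[0])
        sem.append(numbers[1])
        if len(tally) == n:
            break                      # all requested surfaces found
        x += step
        target = 'surface %d' % x      # rebuild the search string for the next surface

    return tally, sem

# Made-up sample text, not the real input file.
sample = io.StringIO("header\nsurface 101\n1.0E-04 0.01\nsurface 105\n2.0E-04 0.02\n")
print(read_surfaces(sample, n=2))   # → (['1.0E-04', '2.0E-04'], ['0.01', '0.02'])
```

Because the for loop stops by itself at end of file, a header that never appears ends the search instead of spinning forever the way while c != r does once readline() keeps returning ''.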

January 7th, 2013, 04:44 PM
Registered User
As I continued to work on this, I think I MIGHT have figured out how to reference the x variable inside my r variable. When I run the script, however, I get the following error:
Code:
>>> from report_test import reports
>>> reports('test.txt')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "report_test.py", line 22, in reports
tally.append(numbers[0])
IndexError: list index out of range
This is my new updated code:
Code:
def reports(output):
    with open(output) as f:
        #r is a variable used to help find the point where the results start.
        count = 1
        x = 101
        r = str('surface %d' % x)
        c = 0 #initializes the c variable used in the while loop
        #The following while loop skips each line until it comes across the line defined in r
        while count < 11:
            while c != r:
                line = f.readline()
                c = line.strip()
            line = f.readline() #reads the energy and SEM
            tally = []
            SEM = []
            numbers = line.strip().split()
            tally.append(numbers[0])
            SEM.append(numbers[1])
            count += 1
            x += 4
        print(tally)
        print(SEM)
Thanks!
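The IndexError is consistent with what this version does: r is never rebuilt from the new x, so after the first match c != r is already false, the inner loop is skipped, and each later readline() just returns the next line of the file. As soon as that line is blank (or readline() reaches end of file and returns ''), split() gives an empty list and numbers[0] fails. A small illustration, using a hypothetical helper first_two_fields that returns None instead of raising:

```python
def first_two_fields(line):
    """Split one data line; return None instead of raising on a blank line (illustrative helper)."""
    numbers = line.strip().split()
    if len(numbers) < 2:
        return None                   # a blank line, or '' from readline() at end of file
    return numbers[0], numbers[1]

print(''.split())                         # → []
print(first_two_fields('\n'))             # → None
print(first_two_fields('1.0E-04 0.01'))   # → ('1.0E-04', '0.01')
```

Rebuilding r after x += 4, as the thread does a couple of posts later, is what keeps the inner loop searching for each new header.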

January 7th, 2013, 07:33 PM
Contributing User
Sorry, where's your input file please?
Usually gawk is a better language for this sort of task, but Python will work. Because 'surface %d' % x already produces a string,
r = str('surface %d' % x)
is effectively the same as
r = 'surface %d' % x
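A quick check of that equivalence, with str.format shown as a third way to build the same search string:

```python
x = 101
with_str = str('surface %d' % x)     # str() adds nothing: % already returned a str
plain = 'surface %d' % x             # %d is replaced by the integer x
formatted = 'surface {}'.format(x)   # the same string built with str.format
print(type(plain).__name__, with_str == plain == formatted)   # → str True
```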

January 7th, 2013, 08:36 PM
Registered User
Thanks for your reply. I have no idea what happened to the input file link. Would it have been removed because I don't have enough posts yet? Gawk might be better, but the end goal is to graph all the data, which I believe Python will be better for. I have this section working now with your help, and this is what it looks like:
Code:
def reports(output):
    with open(output) as f:
        count = 1 #this defines the number of tallies in the output file
        x = 101 #this is the first surface number
        r = str('surface %d' % x) #r is a variable used to help find the point where the results start.
        c = 0 #initializes the c variable used in the while loop
        #The following while loop skips each line until it comes across the line defined in r
        tally = []
        SEM = []
        while count < 11:
            while c != r:
                line = f.readline()
                c = line.strip()
            line = f.readline() #reads the energy and SEM
            numbers = line.strip().split()
            tally.append(numbers[0])
            SEM.append(numbers[1])
            count += 1
            x += 4
            r = str('surface %d' %x)
        print(tally)
        print(SEM)
I THINK that with what I have here, building the rest of the data should be fairly straightforward, and I have a good example to build off of for graphing it, but there's a good chance I'll come running back here soon for more help :P Thanks again for your quick help!

January 7th, 2013, 08:51 PM
Contributing User
Post the link with the dots spelled out, like this: post dot your dot link dot cleverly dot com
I usually write data files and then plot them with gnuplot. Many steps, many processes, nice graphs.
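If you go the gnuplot route, Python only has to write plain whitespace-separated columns. A minimal sketch, assuming a hypothetical write_columns helper and made-up sample values; it takes any open file object so it can be tested against an in-memory buffer:

```python
import io

def write_columns(out, depth, flux, sem):
    """Write whitespace-separated columns that gnuplot can plot (hypothetical helper)."""
    out.write('# depth flux sem\n')           # gnuplot skips lines starting with '#'
    for d, fl, s in zip(depth, flux, sem):    # one row per surface
        out.write('%s %s %s\n' % (d, fl, s))

buf = io.StringIO()    # stands in for a real file such as open('flux.dat', 'w')
write_columns(buf, [0, 4], ['1.0E-04', '8.0E-05'], ['0.01', '0.02'])   # made-up values
print(buf.getvalue(), end='')
```

With a real file written this way, gnuplot can plot flux against depth with plot "flux.dat" using 1:2 with lines.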

January 7th, 2013, 08:53 PM
Registered User
OK this is the last issue before I start graphing the data. Here is what I have:
Code:
def reports(output):
    with open(output) as f:
        count = 1 #this defines the number of tallies in the output file
        x = 101 #this is the first surface number
        r = str('surface %d' % x) #r is a variable used to help find the point where the results start.
        d = 0
        c = 0 #initializes the c variable used in the while loop
        p = 0
        y = 0
        #The following while loop skips each line until it comes across the line defined in r
        Flux = []
        SEM = []
        Depth = []
        Change = []
        while count < 11:
            while c != r:
                line = f.readline()
                c = line.strip()
            line = f.readline() #reads the energy and SEM
            numbers = line.strip().split()
            Flux.append(numbers[0])
            SEM.append(numbers[1])
            Depth.append(d)
            p = ((Flux[0]-Flux[%s]) / Flux[0]) * 100 %y
            Change.append(p)
            d += 4
            count += 1
            x += 4
            r = str('surface %d' %x)
        print(SEM)
        print(Depth)
On this section:
Code:
p = ((Flux[0]-Flux[%s]) / Flux[0]) * 100 %y
I am trying to subtract each successive Flux entry from Flux[0], with the index starting at 0 and increasing by 1 on each pass through the loop. What I have listed here does not work.
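Flux[%s] is not valid Python: the index has to be an integer expression such as Flux[y] or Flux[-1]. Another option is to compute every percent change once, after the loop. A minimal sketch, with a hypothetical percent_change helper and made-up values; it calls float() on each entry because the values read from the file are strings:

```python
def percent_change(flux):
    """Percent reduction of every entry relative to the first one.

    The entries are the strings read from the file, so each goes through float().
    """
    first = float(flux[0])
    return [100 * (first - float(value)) / first for value in flux]

flux = ['2.0E-04', '1.5E-04', '1.0E-04']   # made-up sample values
print(['%g' % p for p in percent_change(flux)])   # → ['0', '25', '50']
```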

January 7th, 2013, 08:57 PM
Registered User
Here is the input file:
https://dl [dot] dropbox [dot] com/u/14334980/test.txt

January 7th, 2013, 09:15 PM
Contributing User
Warning! Alchemist converts gold to lead.
Code:
p = ((Flux[0]-Flux[-1]) / Flux[0]) * 100
LIST[-1] # is the object at greatest index.
Flux[y] # would work, if you also update y
Solution with gnuplot and gawk.
Assuming the data file is named /tmp/SURF:
Code:
gnuplot> plot "<gawk 'a{print;a=0}($1==\"surface\")&&(2==NF){a=1}' /tmp/SURF" u ($0):1 w l
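The gawk program in that one-liner keeps a flag a. When a is set, the current line is printed and the flag is cleared (a{print;a=0}); whenever the first field is "surface" and the line has exactly two fields, the flag is set (($1=="surface")&&(2==NF){a=1}). So it prints every line that follows a "surface N" header, and gnuplot's using ($0):1 then plots column 1 of those lines against the point index. Here is the same filter sketched as a Python generator, in case you stay in Python; lines_after_surface is a hypothetical name:

```python
def lines_after_surface(lines):
    """Yield each line that follows a two-field 'surface N' line, like the gawk filter."""
    flag = False
    for line in lines:
        if flag:                                  # a{print; a=0}
            yield line.rstrip('\n')
            flag = False
        fields = line.split()
        if len(fields) == 2 and fields[0] == 'surface':   # ($1=="surface")&&(2==NF){a=1}
            flag = True

sample = ['x\n', 'surface 101\n', '1.0E-04 0.01\n', 'surface 105\n', '2.0E-04 0.02\n']  # made up
print(list(lines_after_surface(sample)))
```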

January 7th, 2013, 09:24 PM
Registered User
This is the output using the -1 method:
Code:
>>> from report_test import reports
>>> reports('o')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "report_test.py", line 29, in reports
p = ((Flux[0]-Flux[-1]) / Flux[0]) * 100
TypeError: unsupported operand type(s) for -: 'str' and 'str'
So it looks like it doesn't like me subtracting a string from a string right? Can I convert these to integers or is there another syntax for subtracting strings?
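The values read from the file are strings in E notation, which int() cannot parse but float() can. A small illustration using two of the flux values that appear later in the thread:

```python
value = '2.25607E-04'         # a flux value as read from the file (a str)
print(float(value))           # → 0.000225607
try:
    int(value)                # int() rejects decimal points and E notation
except ValueError as err:
    print('int() fails:', err)
print(float(value) - float('2.04769E-04'))   # subtraction works once both are floats
```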

January 7th, 2013, 09:33 PM
Contributing User
oops. My programs work better if I test them first.
You'll need float(string)
Code:
p = '%g'%(100*((float(Flux[0])-float(Flux[-1])) / float(Flux[0])))

January 7th, 2013, 09:55 PM
Registered User
Quote: Originally Posted by b49P23TIvg
oops. My programs work better if I test them first.
You'll need float(string)
Code:
p = '%g'%(100*((float(Flux[0])-float(Flux[-1])) / float(Flux[0])))
Thanks it works!
So what does the '%g'% do? I understand adding the floats but not the other one.

January 7th, 2013, 10:13 PM
Contributing User
It converts the number back to a string. See the Python documentation on the new string format (str.format) and the old string format (the % operator).
There must be a better description in the tutorial.
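A few examples of what '%g' does: it formats a float with up to six significant digits and drops trailing zeros, and '{:g}' is the same format spec written for str.format. Either way the result is a str, which is why the Change list ends up holding strings.

```python
change = 100 * (2.0e-4 - 1.5e-4) / 2.0e-4   # made-up values; the float may carry tiny noise
print(repr(change))                # the full float
print('%g' % change)               # old %-style formatting
print('{:g}'.format(change))       # the same spec via str.format
print('%g' % 12.345678)            # → 12.3457
print('%g' % 100.0, '%g' % 1e-05)  # → 100 1e-05
```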
Code:
NB. www.jsoftware.com j session
NB. find best fit y = a exp(b x)
NB. A are the data
A=:0".'2.25607E-04 2.51727E-04 2.04769E-04 1.60019E-04 1.22105E-04 9.10189E-05 6.63603E-05 4.74191E-05 3.32884E-05 2.30198E-05 '
A=: }. A NB. behead the vector to remove the bad point.
[COEF=:(^. %. i.@:# ^/ 0 1"_)A NB. find coefficients of linear fit to log data
_8.17403 _0.301005
fit=: ^@:(COEF&p.) NB. verb fit
# A NB. tally A
9
i. 9 NB. integers
0 1 2 3 4 5 6 7 8
fit i. # A
0.00028188 0.000208612 0.000154388 0.000114259 8.456e_5 6.25807e_5 4.63144e_5 3.4276e_5 2.53668e_5
<.0.5+100*(%~ (- fit@:i.@:#)) A NB. percent error of the fit
_12 _2 4 6 7 6 2 _3 _10
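For readers following along in Python rather than J, the same fit can be sketched with the standard library only. It assumes, as the J session does, an ordinary least-squares straight line through ln(y) against the point index 0, 1, 2, ..., with the first point dropped; fit_exponential is a hypothetical name.

```python
import math

# The J vector A after "}." has dropped the first (bad) point.
A = [2.51727e-04, 2.04769e-04, 1.60019e-04, 1.22105e-04, 9.10189e-05,
     6.63603e-05, 4.74191e-05, 3.32884e-05, 2.30198e-05]

def fit_exponential(ys):
    """Least-squares line through (x, ln y) for x = 0, 1, 2, ...; returns (c0, c1).

    The fitted curve is then y = exp(c0 + c1*x), i.e. a = exp(c0) and b = c1.
    """
    n = len(ys)
    xs = range(n)
    logs = [math.log(y) for y in ys]      # ^. in J is the natural log
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    c1 = sxl / sxx
    c0 = mean_l - c1 * mean_x
    return c0, c1

c0, c1 = fit_exponential(A)
fitted = [math.exp(c0 + c1 * x) for x in range(len(A))]
# Percent error of the fit, rounded like <.0.5+ in J (add 0.5, then floor).
errors = [math.floor(0.5 + 100 * (a - f) / a) for a, f in zip(A, fitted)]
print(c0, c1)      # the J session reports _8.17403 _0.301005
print(errors)      # the J session reports _12 _2 4 6 7 6 2 _3 _10
```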

January 8th, 2013, 05:34 PM
Registered User
Here is my latest trouble. I am trying to get the program to write out a report (a table) of all of my data. Here is what it looks like right now:
Code:
def reports(output):
    with open(output) as f:
        count = 1 #this defines the number of tallies in the output file
        x = 101 #this is the first surface number
        r = str('surface %d' % x) #r is a variable used to help find the point where the results start.
        d = 0 #initializes depth variable
        c = 0 #initializes the c variable used in the while loop
        p = 0 #initializes variable for calculating percent change
        #The following while loop skips each line until it comes across the line defined in r
        Flux = []
        SEM = []
        Depth = []
        Change = []
        while count < 10:
            while c != r:
                line = f.readline()
                c = line.strip()
            line = f.readline() #reads the energy and SEM
            numbers = line.strip().split()
            Flux.append(numbers[0])
            SEM.append(numbers[1])
            Depth.append(d)
            p = '%g'%(100*((float(Flux[0])-float(Flux[-1])) / float(Flux[0]))) #calculates percent change
            Change.append(p)
            d += 4
            count += 1
            x += 4
            r = str('surface %d' %x)
    table = open('Report.txt', 'w')
    table.write('================================================================ \n')
    table.write('MCNPX Simulation For XXXXX \n')
    table.write('================================================================ \n')
    table.write('================================================================ \n')
    table.write('Depth Flux %Reduction SEM \n')
    table.write('================================================================ \n')
    a=0
    while a < 100: #this value is equal to the number of tallies
        table.write(str(Depth[a].ljust(20)) + ' ' +
                    str(Flux[a].ljust(20)) + ' ' +
                    str(Change[a].ljust(20)) + ' ' +
                    str(SEM[a]) + '\n')
        a = a + 1
    print('Report created')
    table.close()
This is what happens when I run it:
Code:
>>> from report_test import reports
>>> reports('o')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "report_test.py", line 49, in reports
float(SEM[a]) + '\n')
AttributeError: 'int' object has no attribute 'ljust'
I have tried adding on the .ljust to the SEM variable as well but the error is exactly the same with that. The section I added that is causing the error is this:
Code:
table = open('Report.txt', 'w')
table.write('================================================================ \n')
table.write('MCNPX Simulation For XXXXX \n')
table.write('================================================================ \n')
table.write('================================================================ \n')
table.write('Depth Flux %Reduction SEM \n')
table.write('================================================================ \n')
a=0
while a < 100: #this value is equal to the number of tallies
    table.write(str(Depth[a].ljust(20)) + ' ' +
                str(Flux[a].ljust(20)) + ' ' +
                str(Change[a].ljust(20)) + ' ' +
                str(SEM[a]) + '\n')
    a = a + 1
print('Report created')
table.close()
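The AttributeError comes from the order of the calls. In str(Depth[a].ljust(20)), .ljust(20) runs on Depth[a] before str() is applied, and the Depth entries are ints (d += 4), not strings; Flux, Change and SEM already hold strings, so only Depth trips it. Converting first and padding second avoids the error:

```python
d = 4                          # Depth entries are ints, not strings
try:
    d.ljust(20)                # what str(Depth[a].ljust(20)) attempts first
except AttributeError as err:
    print(err)                 # → 'int' object has no attribute 'ljust'
print(repr(str(d).ljust(6)))   # convert first, then pad → '4     '
```

Even with that fixed, while a < 100 would run past the nine entries that while count < 10 collects and raise an IndexError, so looping over the lists themselves is safer than a hardcoded count.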

January 8th, 2013, 05:59 PM
Contributing User
Try this program. Mostly I only moved some right parentheses to the left a bit. Your report headings should include dimensions. % is the only one you show.
Code:
def reports(output):
    with open(output) as f:
        count = 1 #this defines the number of tallies in the output file
        x = 101 #this is the first surface number
        r = str('surface %d' % x) #r is a variable used to help find the point where the results start.
        d = 0 #initializes depth variable
        c = 0 #initializes the c variable used in the while loop
        p = 0 #initializes variable for calculating percent change
        #The following while loop skips each line until it comes across the line defined in r
        Flux = []
        SEM = []
        Depth = []
        Change = []
        while count < 10:
            while c != r:
                line = f.readline()
                c = line.strip()
            line = f.readline() #reads the energy and SEM
            numbers = line.strip().split()
            Flux.append(numbers[0])
            SEM.append(numbers[1])
            Depth.append(d)
            p = '%g'%(100*((float(Flux[0])-float(Flux[-1])) / float(Flux[0]))) #calculates percent change
            Change.append(p)
            d += 4
            count += 1
            x += 4
            r = str('surface %d' %x)
    table = open('Report.txt', 'w')
    table.write('================================================================ \n')
    table.write('MCNPX Simulation For XXXXX \n')
    table.write('================================================================ \n')
    table.write('================================================================ \n')
    table.write('Depth Flux %Reduction SEM \n')
    table.write('================================================================ \n')
    for (D,F,C,S,) in zip(Depth,Flux,Change,SEM,): ####### zip is roughly a matrix transposition.
        table.write(str(D).ljust(20) + ' ' +
                    str(F).ljust(20) + ' ' +
                    str(C).ljust(20) + ' ' +
                    str(S) + '\n')
    print('Report created')
    table.close()
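The for loop over zip(...) can be pushed a little further. The sketch below is a hypothetical write_report helper, not code from the thread: it uses str.format width specs such as {:<20}, which left-align in a 20-character field much like ljust(20) but also accept ints, so Depth needs no separate str() call. Taking the file object as a parameter lets a with block close Report.txt even if an exception occurs.

```python
import io

def write_report(out, depth, flux, change, sem):
    """Write a depth/flux/%reduction/SEM table (hypothetical helper)."""
    rule = '=' * 64 + '\n'
    out.write(rule)
    out.write('MCNPX Simulation For XXXXX\n')
    out.write(rule)
    out.write('{:<20} {:<20} {:<20} {}\n'.format('Depth', 'Flux', '%Reduction', 'SEM'))
    out.write(rule)
    # zip pairs the i-th item of every list: one tuple per table row
    for d, f, c, s in zip(depth, flux, change, sem):
        out.write('{:<20} {:<20} {:<20} {}\n'.format(d, f, c, s))

buf = io.StringIO()   # made-up values below, just to show the layout
write_report(buf, [0, 4], ['1.0E-04', '8.0E-05'], ['0', '20'], ['0.01', '0.02'])
print(buf.getvalue())
```

For the real report, the call would be wrapped as with open('Report.txt', 'w') as table: write_report(table, Depth, Flux, Change, SEM).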

January 8th, 2013, 06:45 PM
Registered User
Quote: Originally Posted by b49P23TIvg
Try this program. Mostly I only moved some right parentheses to the left a bit. Your report headings should include dimensions. % is the only one you show.
Well, that works nicely. I really appreciate your help. You're making my life too easy! So did the error have something to do with the variables being strings when it wanted them to be integers? I like your approach of putting the data in a matrix and using a for loop instead of a while loop; it certainly makes the code more robust. I do know I need to put units on the report; I was going to research whether there is another way to go about it. For example, the units of flux are neutrons/(cm^2*s), which looks a little messy. I was going to see if there was a way to output the report as a .pdf file rather than a .txt and somehow use special characters like subscripts and superscripts. I have no idea yet whether that is possible or whether it would involve more coding than I want to do. The next step is getting it to graph, and then I can modify this to create a couple of different types of reports I need with ease.
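A plain-text report can already carry a superscript without going to PDF, because Unicode has characters such as U+00B2 (superscript two) and U+00B7 (middle dot). A minimal sketch, assuming a hypothetical flux_heading helper; a real Report.txt would need to be opened with encoding='utf-8' so those characters are saved the same way on every platform:

```python
import io

# \u00b2 is SUPERSCRIPT TWO and \u00b7 is MIDDLE DOT: neutrons/(cm^2 s) with real symbols.
FLUX_UNITS = 'neutrons/(cm\u00b2\u00b7s)'

def flux_heading(units=FLUX_UNITS):
    """Column heading with units (hypothetical helper)."""
    return 'Flux [%s]' % units

report = io.StringIO()   # stands in for open('Report.txt', 'w', encoding='utf-8')
report.write(flux_heading() + '\n')
```

If the graphs end up being drawn with matplotlib, its mathtext syntax (for example a label like r'$\mathrm{cm}^{-2}\,\mathrm{s}^{-1}$') handles superscripts in axis labels as well.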