1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2013
    Rep Power

    Find/Print Specific lines in a text file


    I am new to programming but am hoping it will be able to speed a few things up for me.

    However I am struggling to work out a (relatively simple) Python3 script that does the following and am wondering if you could help me out.

    I have a list of text files (~900KB each) that are the output of a computational chemistry code. [Essentially the files are the result of two calculations, the second starting with the result of the first].

    At some point in each file (after a varying number of iteration steps) there will be the lines:

    **** Optimisation achieved ****
    Final energy = -348.67740315 eV
    Final Gnorm = 0.00037832

    [N.B. This first optimisation achieved will (should) always be present in these files but it is necessary for me to check this and that it gives the same energy.]

    After some data, the file goes on to have the lines:

    Total number of defects = 1
    Total charge on defect = -4.00
    Defect centre is at 1.0000 0.0000 0.0000 Frac

    The file continues, and at some point later (again after a varying number of iteration steps) there will be the lines:

    **** Optimisation achieved ****
    Final defect energy = 64.41932012
    Final defect Gnorm = 0.00000283

    [N.B. This second optimisation achieved will not always be present. If optimisation is not achieved, a warning note is given and the energies are still printed, but they are not of any interest to me.]

    [N.B. The numbers here are taken from an example file (and are not the only numeric values within the file)]

    I know how to open/read each file within the directory. I also know how to make and write to a new file.

    My problem, however, is that I am unsure how to find the first 'Optimisation achieved' and print the line 'Final energy =...', and then go on to find and print the lines 'Total charge on defect...' and 'Defect centre...'. Finally, I want to find and print the line 'Final defect energy =...', but only if the second optimisation is achieved.

    Thus, I am wondering if you could suggest how I could go about this - please be patient with me, as I said I have only recently started trying to use programming (I have looked at tutorials but don't see how to transfer onto my specific problem) and am not fully up to speed with the syntax.

    Any help would be much appreciated.

  2. #2
  3. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Rep Power
    [edit]I recall you said you're new to programming. The notes at the start of the python program explain a finite state machine and this particular implementation. Please learn them. Thanks, Dave.[/edit]

    I sure hope you're on a unix system. Firstly, gawk is stupendous for this task. Learn gawk. I'll show a program in python because I don't know that you have access to unix. Learn gawk.

    Let's say this is an input file:
    Let's begin  with useful information.  When  on a beat,  it's easy but
    costly to misidentify a lift as "the wind quit".  Especially at night,
    when the sleepy crew must take extra effort to look at the windex atop
    the mast or check the leeward jib telltales from the bow.  If you ease
    the sails or head up appropriately  you'll sail twice as fast as those
    who don't.  The race results aren't posted yet, but there was a period
    of light wind  last night when we significantly  out-performed the two
    boats in view.
    find the first 'Optimisation achieved and print the line 'Final Energy =...'
    **** Optimisation achieved ****
    Final energy = -348.67740315 eV              always display this line.
    Final Gnorm = 0.00037832 no one cares about the norm.
    'Charge on defect...' and 'Defect centre...'
    Total number of defects = 1
    Total charge on defect = -4.00                      display this line.
    Defect centre is at 1.0000 0.0000 0.0000 Frac       display this line.
    'Final defect energy=...'
    **** Optimisation achieved ****
    Final defect energy = 64.41932012         display iff 2nd optimisation.
    Final defect Gnorm = 0.00000283
    Result: We placed second overall,  second based on handicap.  Look for
    us in the spinnaker fleet  at the Buffalo harbor sailing club website:
    Such a  fast design is Damn  Yankee that she finishes  the race before
    the  pre-dawn calm.  Sailing  the entire  race in  good air  makes her
    difficult  to beat.   Wow.  A  few of  the handicap  times  are within
    minutes of us.
    For output you'll need the file name followed by the information you're sure you want.
    [code]
    '''
    This program implements a finite state machine. Works with current python2 and python3. Tested with good data and good file with bad chemistry.
    The variable named state starts with the value 0.
    Each time the program finds useful information it acts based on the
    value of state and the useful information. Then it changes the state.
    In this case end of file resets state to 0, all other interesting
    situations will increment state by 1. state can be reset to 0 by
    rerunning the program---that's my favorite choice because you can
    write a simple program in the shell that processes all files one at a
    time, as well as a simple program in some programming language to
    handle just one file. You haven't assured me that you've got unix to
    work with. This saddens me.

    We'll implement these rules, which for each line of an input are to be
    executed in this order:
        RULE state  EVENT                            ACTIONS                            COMMENT
        0    0      start of file                    print filename, increment state
        1    2      Final energy =                   display the line, increment state
        3    4      Defect centre is at              display the line, increment state
        4    6      Final defect energy =            display the line, increment state  hope I didn't miscount!  I did miscount, the program works but may no longer agree with this description.  Such is life.  Comments lie.
             ANY    Total number of defects          increment state
        last ANY    **** Optimisation achieved ****  increment state
    '''
    import sys
    def indent(ouf, line):
        ouf.write('  ' + line)
    def process(title, inf, ouf=sys.stdout):
        '''
            Handle one file at a time.  Returns True iff the reaction simulation succeeds.
            title prints as a heading to start the report.  (the file\'s name)
            inf is an object that yields lines when iterated.  Usually, open(filename)
            ouf is an object with a write method.  The destination of the report.  Usually, open(output_file,'w')
            Naming the states can be useful.  I didn\'t.  lex and bison are
            also good tools, but overpowered for this simple application.
        '''
        state = 0  # we agreed to start the state at 0 for each new file.
        ouf.write('{} desirable data from chemkin\n'.format(title))  # rule 0
        state += 1
        assert state == 1
        for line in inf:       # examine lines from input until finished.
            stripped_line = line.strip() # remove space characters at the ends of the string just in case this file came from a FORTRAN program.
            if (state == 2) and stripped_line.startswith('Final energy ='):  # rule 1
                indent(ouf, line)
                state += 1
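            elif (state == 4) and stripped_line.startswith('Total charge on defect'):  # not in the rule table; the sample input marks this line for display
                indent(ouf, line)  # state is unchanged, so the 'Defect centre' rule still fires next
            elif (state == 4) and stripped_line.startswith('Defect centre is at'):  # rule 3
                indent(ouf, line)
                state += 1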
            elif (state == 6) and stripped_line.startswith('Final defect energy ='):  # rule 4
                indent(ouf, line)
                state += 1
            elif stripped_line.startswith('Total number of defects'):           # last 2 rules.
                state += 1
            elif stripped_line.startswith('**** Optimisation achieved ****'):
                state += 1
            if state == 7:  # We can finish before reading entire file.
                return True
        return False
    FILENAME = 'chemkin.result'
    with open('py.out', 'w') as ouf:
        with open(FILENAME, 'r') as inf:
            process(FILENAME, inf, ouf)
    [/code]
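The notes in the program mention handling one file per run and looping over the files separately, for example from a shell. If no unix shell is at hand, the loop can also live in Python. The following is only a rough sketch: `run_batch`, its `pattern` default and the idea of collecting the return values in a dict are assumptions made for illustration, not part of the program above. The `process` argument is expected to take `(title, inf, ouf)` like the function shown above, so the state starts fresh for every file.

```python
import glob
import os
import sys


def run_batch(directory, process, pattern='*.result', ouf=sys.stdout):
    """Call process(title, inf, ouf) once for each matching file.

    Returns a dict mapping each file name to whatever process returned,
    which makes it easy to list the files that did not succeed.
    """
    results = {}
    # sorted() keeps the report in a predictable order.
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        name = os.path.basename(path)
        with open(path, 'r') as inf:
            results[name] = process(name, inf, ouf)
    return results
```

With the `process` function above in scope, something like `run_batch('.', process)` would report on every `*.result` file in the current directory; the file extension is a guess and should be changed to match the real output files.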
    Last edited by b49P23TIvg; September 8th, 2013 at 10:33 PM.
    [code]Code tags[/code] are essential for python code and Makefiles!
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2010
    Rep Power
    Once you've opened the file, you can iterate through it one line at a time, and check to see if the line reads "Optimisation achieved", e.g.:
    with open("datafile", 'r') as sourcefile, open("outputfile", 'w') as targetfile:
        for line in sourcefile:
            # strip() removes the trailing newline and any padding so the comparison can match
            if line.strip() == "**** Optimisation achieved ****":
                #write your lines to the new file
                targetfile.write("Final Energy = ")
    Now if you want to do something different the second time the optimisation line is found, you could create a boolean flag recording whether it was the first time or not, and add that to the "if" statement.
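A minimal sketch of the flag idea, assuming the line prefixes quoted in the first post and assuming that the energy line always follows its "Optimisation achieved" banner. The names `BANNER` and `extract` are made up for this example. The flag is set when a banner is seen and cleared at the next energy line, so a "Final defect energy" line that has no banner before it (the failed second optimisation) is skipped.

```python
BANNER = '**** Optimisation achieved ****'


def extract(lines):
    """Return the stripped lines of interest from an iterable of text lines."""
    wanted = []
    achieved = False  # True between a banner and the energy line that follows it
    for line in lines:
        text = line.strip()
        if text == BANNER:
            achieved = True
        elif text.startswith(('Final energy =', 'Final defect energy =')):
            if achieved:
                wanted.append(text)
            achieved = False  # the next energy line needs its own banner
        elif text.startswith(('Total charge on defect', 'Defect centre is at')):
            wanted.append(text)
    return wanted
```

To write the result to a new file, open the source file, pass it to `extract`, and write each returned line followed by `"\n"` to the target file.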

    Edit: NM, his is better
