#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2014
    Posts
    3
    Rep Power
    0

    Output lines inside a certain text segment within a text file


    Hello everybody, I have a text file which is structured as follows:

    source_code_directories {
    c:\Folder_A\Folder_B\Project_Sources\Algorithm\Algorithm_X
    c:\Folder_A\Folder_B\Project_Sources\Algorithm\Algorithm_Y
    c:\Folder_A\Folder_B\Project_Sources\Algorithm\Algorithm_Z
    c:\Folder_A\Folder_B\Project_Sources\ZZ\Configuration\
    c:\Folder_A\Folder_B\Project_Sources\ZZ\Testing
    c:\Folder_A\Folder_B\Project_Sources\ZZ\Test_Outputs
    c:\Folder_A\Folder_B\Configuration\Gener_Sourcess
    source_code_includes {
    c:\Folder_A\Folder_B\Configuration\Gener_Sourcess\X
    c:\Folder_A\Folder_B\Project_Sources\Algorithm\Algorithm_X
    c:\Folder_A\Folder_B\Project_Sources\Algorithm\Algorithm_Y
    c:\Folder_A\Folder_B\Project_Sources\Algorithm\Algorithm_Z
    c:\Folder_A\Folder_B\Project_Sources\Algorithm\Algorithm_F
    c:\Folder_A\Folder_B\Project_Sources\FF\Test_Inputs
    c:\Folder_A\Folder_B\Project_Sources\FF
    c:\Folder_A\Folder_B\Libraries\Includes_A
    c:\Folder_A\Folder_B\Libraries\Includes_B
    }


    I'm writing a script where I can say (for example):
    Give me all lines inside "source_code_includes {" that contain the folder "Algorithm".

    I have written the following script:

    Code:
    def parsesegment(fh):
        # Yields all lines inside "segmentC"
        state = "out"
        for line in fh:
            line = line.strip() # in case there are whitespaces around
            if state == "out":
                if line.startswith("source_code_includes {"):
                    state = "in"
                    break
            elif state == "in":
                if line.startswith("}"):
                    state = "out"
                    break
                if "\Algorithm" in line:
                    yield line
    
    with open('text_file.txt', 'r') as fh, open('file_to_output.txt', 'w') as fo:
        for line in parsesegment(fh):
            fo.write(line)
    The output file is empty. I'm also relatively new to Python. Does anybody have an idea or a better suggestion? I would be very grateful!
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Location
    /dev/null
    Posts
    163
    Rep Power
    18
    The logic in your function is flawed. You break out of the loop the moment you see a line that starts with "source_code_includes {", so the generator ends before it can yield anything.
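
A minimal sketch of one way to restructure the generator, assuming the block is closed by the first line that starts with "}". The name `parse_segment`, its `header` and `needle` parameters, and the `sample` list are illustrative and not from the original post:

```python
def parse_segment(lines, header="source_code_includes {", needle="\\Algorithm"):
    """Yield stripped lines inside the block opened by `header` that contain `needle`."""
    inside = False
    for line in lines:
        line = line.strip()
        if not inside:
            # Keep scanning until the header appears; do not break here.
            if line.startswith(header):
                inside = True
            continue
        if line.startswith("}"):
            break  # the block is closed, nothing more to yield
        if needle in line:
            yield line


# Hypothetical sample that mirrors the layout of the file in the question.
sample = [
    "source_code_directories {",
    r"c:\Folder_A\Folder_B\Project_Sources\Algorithm\Algorithm_X",
    "source_code_includes {",
    r"c:\Folder_A\Folder_B\Project_Sources\Algorithm\Algorithm_Y",
    r"c:\Folder_A\Folder_B\Libraries\Includes_A",
    "}",
]
found = list(parse_segment(sample))
print(found)
```

Because each line is stripped, the trailing newline is gone as well; when writing to the output file you would probably want `fo.write(line + "\n")`.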
    Last edited by noobie1000; February 18th, 2014 at 04:56 AM.
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2014
    Posts
    2
    Rep Power
    0
    I'm new to Python as well, but this might help:

    Code:
    import re

    # Read the whole file, then search each source_code_includes block.
    with open("file", "r") as infile:
        text = infile.read()

    for result in re.findall(r'source_code_includes(.*?)\}', text, re.S):
        for line in result.split('\n'):
            if re.search('Algorithm', line):
                print(line)
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2014
    Posts
    3
    Rep Power
    0
    @Subbeh Thx, this solution works. But what if I would also like to output every line that contains the folder "ZZ"?

    Code:
    if re.search('Algorithm' and 'include', line):
    is not syntactically correct. I know about the keyword "any", but how do I use it in this context?
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2014
    Posts
    2
    Rep Power
    0
    Originally Posted by python_beginner
    @Subbeh Thx, this solution works. But what if I would also like to output every line that contains the folder "ZZ"?

    Code:
    if re.search('Algorithm' and 'include', line):
    is not syntactically correct. I know about the keyword "any", but how do I use it in this context?
    If you want to include the lines which contain ZZ as well (still within 'source_code_includes { ... }'), you can try something like this:

    Code:
    if re.search('(Algorithm|ZZ)', line):
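
For context, here is a rough sketch of how that alternation might sit inside the full loop. The `text` string is a made-up, shortened stand-in for the real file, and the names `wanted` and `matches` are only illustrative:

```python
import re

# Hypothetical, shortened stand-in for the file from the first post.
text = r"""source_code_directories {
c:\Folder_A\Folder_B\Project_Sources\ZZ\Testing
source_code_includes {
c:\Folder_A\Folder_B\Project_Sources\Algorithm\Algorithm_X
c:\Folder_A\Folder_B\Project_Sources\ZZ\Configuration
c:\Folder_A\Folder_B\Libraries\Includes_A
}
"""

# "|" means "either side matches"; compile once and reuse for every line.
wanted = re.compile(r"Algorithm|ZZ")

matches = []
for block in re.findall(r"source_code_includes \{(.*?)\}", text, re.S):
    for line in block.splitlines():
        if wanted.search(line):
            matches.append(line)

print(matches)
```

The ZZ\Testing line is skipped because it appears before "source_code_includes {", so it is never part of the captured block.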
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2014
    Posts
    3
    Rep Power
    0
    @sabesh, thx, it works!
  12. #7
  13. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,713
    Rep Power
    480
    Code:
    if re.search('Algorithm' and 'include', line):
    is syntactically correct, but it only searches for 'include'.

    any is a built-in Python function, not a keyword.

    Look up the `and' and `or' operators to understand why:
    $ python -c "print('Algorithm' and 'include')" # evaluates to
    include


    any((re.search('Algorithm', line), re.search('include', line)))

    any(re.search(word, line) for word in 'Algorithm include'.split())
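
A small runnable sketch of the same idea, assuming `line` holds one line from the file. The sample `line` and the `keywords` list are made up for illustration:

```python
import re

# Hypothetical line taken from the layout shown in the first post.
line = r"c:\Folder_A\Folder_B\Project_Sources\Algorithm\Algorithm_X"

# `and` returns its last operand when the first one is truthy,
# so re.search('Algorithm' and 'include', line) only looks for 'include'.
pattern = 'Algorithm' and 'include'

keywords = ['Algorithm', 'ZZ']

# any(): True if at least one keyword is found in the line.
has_any = any(re.search(word, line) for word in keywords)

# all(): True only if every keyword is found in the line.
has_all = all(re.search(word, line) for word in keywords)

print(pattern, has_any, has_all)
```

If the keywords could contain regex metacharacters, wrapping each one in `re.escape(word)` would keep them literal.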
    [code]Code tags[/code] are essential for python code and Makefiles!
