#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2013
    Posts
    22
    Rep Power
    0

    Need help in file column parsing


    Hello all,

    I have a very large file (input.txt) containing 8 columns of data in the following tab-separated format:

    or30|1 or14|1353 or30 or14 0 0 63.7 98.3
    or30|1 or76|2491 or30 or76 0 0 65.1 98.3
    or30|1 or39|1309 or30 or39 0 0 72 98.3
    or30|1 or64|261 or30 or64 0 0 70.8 98.3
    or30|1 or35|1353 or30 or35 0 0 70.3 98.3
    or30|1 or60|639 or30 or60 0 0 69.2 98.3
    or30|1 or59|1597 or30 or59 0 0 72.2 78.3
    or30|1 or56|995 or30 or56 0 0 69.2 58.3
    or30|1 or52|852 or30 or52 0 0 89.2 34.3

    ......................
    .................

    I want to keep only those lines where the last two columns are >= 70 and >= 90 respectively, and write the result to an output file (output.txt). Based on the above input, the output file should be:

    or30|1 or39|1309 or30 or39 0 0 72 98.3
    or30|1 or64|261 or30 or64 0 0 70.8 98.3
    or30|1 or35|1353 or30 or35 0 0 70.3 98.3

    Any guidance is highly appreciated. Thanks in advance...
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2007
    Location
    Joensuu, Finland
    Posts
    434
    Rep Power
    67
    Untested:

    Code:
    with open('input.txt', 'r') as input_data:
        with open('output.txt', 'w') as output_data:
            for line in input_data:
                # Convert the last two tab-separated fields to numbers
                penult, ult = [float(num) for num in line.split('\t')[-2:]]
                # Keep the line only if both thresholds are met
                if penult >= 70 and ult >= 90:
                    output_data.write(line)
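    If the real file contains blank lines, a header row, or spaces instead of tabs, the loop above would stop with a ValueError. A more defensive sketch could look like the following. The names keep_line and filter_file and the threshold parameters are made up for illustration, and splitting on any whitespace is an assumption that holds only if no field contains a space:

```python
def keep_line(line, min_penult=70.0, min_ult=90.0):
    """Return True if the last two fields of the line meet both thresholds."""
    # Assumption: fields never contain spaces, so split() with no argument
    # handles tab-separated and space-separated input alike.
    fields = line.split()
    if len(fields) < 2:
        return False  # blank or too-short line
    try:
        penult, ult = float(fields[-2]), float(fields[-1])
    except ValueError:
        return False  # header row or other non-numeric line
    return penult >= min_penult and ult >= min_ult


def filter_file(in_path, out_path):
    """Copy the lines that pass keep_line from in_path to out_path."""
    # Reads line by line, so a very large file is never loaded whole.
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if keep_line(line):
                dst.write(line)
```

    Calling filter_file('input.txt', 'output.txt') should then give the same result as the loop above on well-formed input, while silently skipping lines it cannot parse.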
    Last edited by SuperOscar; February 11th, 2013 at 05:12 AM.
    My armada: openSUSE 13.1 (home desktop, home laptop), Crunchbang Linux 11 (work laptop), Trisquel GNU/Linux 6.0.1 (mini laptop), Ubuntu 14.04 LTS (server), Android 4.2.1 (tablet), Windows 7 Ultimate (testbed)
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2013
    Posts
    22
    Rep Power
    0

    thanks


    Works fine... thanks SuperOscar..
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2007
    Location
    Joensuu, Finland
    Posts
    434
    Rep Power
    67
    Originally Posted by utpalmtbi
    Works fine... thanks SuperOscar..
    You’re welcome. I just removed the “buff = []” line from the start of the script; it was from an older idea.
  8. #5
  9. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,851
    Rep Power
    481
    gawk is easier for this task
    Code:
    $ gawk -F"\t" '(70<=$(NF-1))&&(90<=$NF)' /tmp/d.dat
    or30|1	or39|1309	or30	or39	0	0	72	98.3
    or30|1	or64|261	or30	or64	0	0	70.8	98.3
    or30|1	or35|1353	or30	or35	0	0	70.3	98.3
    [code]Code tags[/code] are essential for python code and Makefiles!
