#1
    Extract lines from file


    Hi all,
    I have a huge file (input.txt) with twelve tab-separated columns of values, such as:

    or30|1 or9|2240 47.17 918 459 10 1 908 50 951 4e-130 458
    or40|45 or3|2340 44.57 875 459 9 45 908 3 862 8e-103 367
    or30|1 or35|1353 98.46 909 14 0 1 909 47 955 0.0 1248

    ........
    ..

    From this file, I have to extract the lines whose 11th-column value is less than or equal to 1e-10 and write them to a file (output.txt).

    Note that the values in the 11th column may appear in different formats, such as 0.0, 4, 1e-5, etc.

    Any help? Thanks for your consideration.
#2
    Split


    Code:
    Data = open(filename,'r').readlines
    Output = open('Output.txt','w')
    
    for Line in Data :
      Line2=Line.strip()
      if Line2.split()[10] <= 1e-10 :
        Output.write(Line2+'\n')
    
    Output.close()
    
    Last edited by WynnDeezl; March 20th, 2013 at 01:39 PM.
#3
    Use gawk
    Code:
    $ gawk -F'\t' '$11 <= 1e-10' input.txt > output.txt

    Comments on this post

    • WynnDeezl disagrees: This didn't work as described in the requirement.
    • partoj agrees
    Last edited by b49P23TIvg; March 20th, 2013 at 01:58 PM.
    [code]Code tags[/code] are essential for python code and Makefiles!
#4
    Originally Posted by WynnDeezl
    Code:
    Data = open(filename,'r').readlines
    Output = open('Output.txt','w')
    
    for Line in Data :
      Line2=Line.strip()
      if Line2.split()[10] <= 1e-10 :
        Output.write(Line2+'\n')
    
    Output.close()
    
    Code:
    Traceback (most recent call last):
      File "val.py", line 4, in <module>
        for Line in Data :
    TypeError: 'builtin_function_or_method' object is not iterable
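For anyone hitting the same traceback: `open(filename,'r').readlines` without parentheses binds the method itself instead of calling it, which is exactly what the TypeError complains about. A second problem is that `Line2.split()[10]` is a string, so it has to be converted with `float()` before comparing it with 1e-10 (in Python 3, comparing a str with a float raises TypeError). Below is a minimal corrected sketch of the same approach, assuming the value is the 11th whitespace-separated field; the names `select_lines` and `filter_file` are only illustrative.

```python
def select_lines(lines, threshold=1e-10, column=10):
    """Yield each line whose field at index `column` (0-based, 10 = 11th column) is <= threshold."""
    for line in lines:
        fields = line.split()  # split on any run of tabs or spaces
        if len(fields) <= column:
            continue  # blank or short line: there is no 11th field to test
        # Convert the text to a number; comparing the raw string was the bug above.
        if float(fields[column]) <= threshold:
            yield line


def filter_file(in_path, out_path, threshold=1e-10):
    # Iterating over the file object reads one line at a time,
    # so even a huge input file is never loaded into memory at once.
    with open(in_path) as infile, open(out_path, "w") as outfile:
        outfile.writelines(select_lines(infile, threshold))
```

Usage would be `filter_file('input.txt', 'output.txt')`. Keeping the line selection separate from the file handling makes it easy to try on a few sample lines first.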
#5
    Code:
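    # "with" statement automatically closes both files when finished
    with open('input.txt', 'r') as infile, open('output.txt', 'w') as outfile:
        # iterate line by line so a huge file is never read into memory at once
        for line in infile:
            fields = line.rstrip('\n').split('\t')
            # skip blank or short lines that have no 11th column
            if len(fields) > 10 and float(fields[10].strip()) <= 1e-10:
                outfile.write(line)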

    Comments on this post

    • abhijit.bose agrees
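On the note about mixed formats: Python's built-in `float()` accepts plain decimals, integers and scientific notation alike, so values such as 0.0, 4 and 1e-5 need no special handling once they are converted. The short sketch below (the helper name `passes` is only illustrative) also shows why the conversion matters: compared as strings, values are ordered character by character, not numerically.

```python
def passes(text, threshold=1e-10):
    """Return True if the number written in `text` is <= threshold."""
    return float(text) <= threshold


# float() parses plain decimals, integers and scientific notation alike.
print([passes(t) for t in ["0.0", "4", "1e-5", "4e-130"]])  # [True, False, False, True]

# Compared as strings, the ordering is character by character, so this is wrong:
print("4e-130" <= "1e-10")  # False, because "4" sorts after "1"
print(4e-130 <= 1e-10)      # True, the numeric comparison
```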
#6
    @WynnDeezl I doubt you've ever used gawk, and your program certainly doesn't work.

    Please note that abhijit.bose's sample data contains only 3 tab characters.
    Last edited by b49P23TIvg; March 20th, 2013 at 02:55 PM.
#7
    Gawk


    Originally Posted by b49P23TIvg
    @WynnDeezl I doubt you've ever used gawk, and your program certainly doesn't work.

    Please note that abhijit.bose's sample data contains only 3 tab characters.


    Sorry, I didn't mean to sound rude. But when I tried that on a Linux command line, it basically copied input.txt to output.txt.

    And yes, you are correct: I've never used gawk.

#8
    Whoops!


    Code:
    $ gawk -F\\t '(10<NF)&&($11<=1e-10)' input.txt > output.txt
    Note: my original program was incorrect because it assumed every line had enough fields; furthermore, my specification of the tab character was WRONG.

    This has been another fine example of why programs I don't test aren't likely to work.
#9
    Originally Posted by b49P23TIvg
    Please note that abhijit.bose's sample data contains only 3 tab characters.
    Ah, I assumed that it only used tab as delimiter. What devilish program uses two different separators?
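One way to sidestep the question of which separator the file really uses: `str.split()` with no argument splits on any run of whitespace, tabs and spaces alike, whereas `split('\t')` splits on tabs only. The line below is a hypothetical mix with exactly three tabs, like the sample data described above. The trade-off is that `split()` would also break apart a field that itself contains a space.

```python
# A hypothetical line where only three of the separators are tabs.
line = "or30|1 or9|2240\t47.17 918\t459 10 1 908\t50 951 4e-130 458\n"

tab_fields = line.rstrip("\n").split("\t")  # splits on tabs only
any_fields = line.split()                   # splits on any run of tabs/spaces

print(len(tab_fields))  # 4: most "columns" are still glued together
print(len(any_fields))  # 12
print(any_fields[10])   # 4e-130
```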
#10
    The original poster said tab separated. You got it right.

    When WynnDeezl said my gawk code copied input to output, I knew it had to be incorrect, since the input file example is silly*.

    In gawk, an empty field interpreted as a number is 0, which is less than the tolerance, so every line passed the test. To correct that, I added a check that the line has at least 11 fields (the number of fields in a line is held in the NF variable). Then I tested my program and discovered that I also had to escape the tab character correctly.

    *silly: substitute past tense of a four letter word.
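The behaviour described above can be mimicked in Python to see why the unguarded one-liner let every line through. This is only a simplified emulation following the explanation in this post, not gawk itself: a field past NF is the empty string, and an empty string used as a number is 0. The helper names `awk_field` and `awk_number` are hypothetical.

```python
def awk_field(fields, n):
    """Return $n (1-based) the way awk does: an empty string if n > NF."""
    return fields[n - 1] if 1 <= n <= len(fields) else ""


def awk_number(text):
    """Simplified numeric conversion: empty text counts as 0.
    (Real awk uses the longest numeric prefix; this sketch just falls back to 0.)"""
    try:
        return float(text)
    except ValueError:
        return 0.0


# Split on tabs only, as the -F\\t option requests, on a hypothetical line with 3 tabs.
fields = "or30|1 or9|2240\t47.17 918\t459 10 1 908\t50 951 4e-130 458".split("\t")
nf = len(fields)  # NF is 4 here

unguarded = awk_number(awk_field(fields, 11)) <= 1e-10               # $11<=1e-10
guarded = (10 < nf) and awk_number(awk_field(fields, 11)) <= 1e-10  # (10<NF)&&($11<=1e-10)

print(nf, unguarded, guarded)  # 4 True False
```

The missing 11th field turns into 0, so the unguarded test is true for every short line, which matches the "copied input to output" symptom; the NF check rejects those lines before the comparison is made.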
#11
    thanks


    Looks like the problem is well solved. Thank you all!
#12
    @b49P23TIvg I'm definitely going to try gawk. Thanks!
