Page 1 of 2
    #1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2009
    Posts
    34
    Rep Power
    9

    Challenging Question about grep and awk in UNIX...?


    Suppose I have a file that contains:
    Input:
    2406306 10087720 T,>,G S1,S2,S3
    42406306 10126312 T,>,C S1,S2,S3,S6,S7,S8,S9
    42406306 10363280 G,>,T S1,S2,S3,S10,S11,S12
    42406306 10363297 T,>,C S1,S3
    42406306 1040544 T,>,C S1,S2,S3,S6

    Output:
    2406306 10087720 T,>,G S1,S2,S3
    42406306 10126312 T,>,C S1,S2,S3,S6,S7,S8,S9
    42406306 1040544 T,>,C S1,S2,S3,S6

    What command should I run so that the output keeps only the lines that contain at least S1, S2 and S3, but excludes any line that contains S10, S11 or S12?
    I asked my senior and she advised that grep and awk can do this.
    I have used:
    awk '/S1,.*S2.*S3/' file
    The above command works. Are there any other commands that can generate the desired output?
    Thanks a lot for all your advice.
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2009
    Location
    Charlotte, NC
    Posts
    111
    Rep Power
    12
    Another solution
    Code:
    grep "S1.*S2.*S3" file | grep -v "S1[0-2]"
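One caveat with plain substring patterns: `S1` also matches inside `S13` or `S15`, so a line without a real `S1` token could slip through. The following is a minimal sketch, assuming a grep that supports the `-w` (whole word) option, which GNU and BSD grep do; the function name `filter_whole_words` is made up for illustration. Chaining one grep per token also means the tokens may appear in any order.

```shell
# Keep lines that hold S1, S2 and S3 as whole tokens (in any order),
# then drop lines that hold S10, S11 or S12.
# Assumption: grep supports -w (whole-word matching), as GNU and BSD grep do.
filter_whole_words() {
  grep -w 'S1' "$@" | grep -w 'S2' | grep -w 'S3' | grep -v -w -E 'S1[0-2]'
}

# Usage: filter_whole_words file
```

Commas and spaces count as non-word characters, so `-w` treats each S-number in the comma-separated list as its own word.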
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2009
    Posts
    34
    Rep Power
    9
    Originally Posted by ldapswandog
    Another solution
    Code:
    grep "S1.*S2.*S3" file | grep -v "S1[0-2]"
    Hi, your solution almost works...
    but it still lets through lines that have only S2,S3...
    For my output, I only want lines that contain at least S1, S2 and S3 together on the same line...
    In the meantime, do you have a better suggestion?
    Thanks a lot for your advice...
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2009
    Location
    Charlotte, NC
    Posts
    111
    Rep Power
    12
    Your question is not clear. My command produces the same output you showed in your example. It will only return a line that contains all three elements S1, S2 and S3; it will not return a line that contains only S1 and S2, or S1 and S3, or even S2 and S3. If you want something different, then you need to provide a better example of the output you want to see.
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2009
    Posts
    34
    Rep Power
    9
    Originally Posted by ldapswandog
    Your question is not clear. My command produces the same output you showed in your example. It will only return a line that contains all three elements S1, S2 and S3; it will not return a line that contains only S1 and S2, or S1 and S3, or even S2 and S3. If you want something different, then you need to provide a better example of the output you want to see.
    Hi, I actually have a file with a list of lines similar to the ones in my question (S1,S2,S3,...). When I applied the command you gave me, some lines that contain only S1,S2 also appear in my output. My output should keep only the lines that contain at least S1, S2 and S3, and exclude S10-S12 at the same time. Do you have a better command that can solve this problem?
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2006
    Posts
    2,632
    Rep Power
    1811
    A line is passed forward (output) if:

    it contains ANY ONE of S1, S2 or S3 (so, as an example, a line JUST with S1 in is required)?
    but NOT if the line contains any one of S10, S11 or S12 (so, as an example, the aforementioned line that contains S1 will be omitted if it also contains S11)?

    Basically I am trying to tease out your exact inclusion/exclusion criteria.

    For the sake of potential efficiency roughly what percentage of lines would be included (regardless of if they would also be excluded) and what percentage of lines excluded?
    The moon on the one hand, the dawn on the other:
    The moon is my sister, the dawn is my brother.
    The moon on my left and the dawn on my right.
    My brother, good morning: my sister, good night.
    -- Hilaire Belloc
  12. #7
  13. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2009
    Location
    Charlotte, NC
    Posts
    111
    Rep Power
    12
    Then you want any line that contains S1 OR S2 OR S3 but NOT S10-12

    Code:
    grep -E "S1|S2|S3" file | grep -v "S1[0-2]"
  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2009
    Posts
    34
    Rep Power
    9
    Originally Posted by SimonJM
    A line is passed forward (output) if:

    it contains ANY ONE of S1, S2 or S3 (so, as an example, a line JUST with S1 in is required)?
    but NOT if the line contains any one of S10, S11 or S12 (so, as an example, the aforementioned line that contains S1 will be omitted if it also contains S11)?

    Basically I am trying to tease out your exact inclusion/exclusion criteria.

    For the sake of potential efficiency roughly what percentage of lines would be included (regardless of if they would also be excluded) and what percentage of lines excluded?
    Hi,
    Actually I want my output to keep only the lines that contain S1 and S2 and S3, and at the same time exclude S10, S11 and S12. Do you have any idea how to get my desired output?
    As long as I get the desired output, we can also use several other commands to deal with this problem.
    Thanks a lot for your help.
  16. #9
  17. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2009
    Posts
    34
    Rep Power
    9
    Originally Posted by ldapswandog
    Then you want any line that contains S1 OR S2 OR S3 but NOT S10-12

    Code:
    grep -E "S1|S2|S3" file | grep -v "S1[0-2]"
    Hi,
    Actually I want my output to keep only the lines that contain S1 and S2 and S3, and at the same time exclude S10, S11 and S12. Do you have any idea how to get my desired output?
    As long as I get the desired output, we can also use several other commands to deal with this problem.
    Thanks a lot for your help.
  18. #10
  19. No Profile Picture
    Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2006
    Posts
    2,632
    Rep Power
    1811
    Originally Posted by patrick chia
    Hi,
    Actually I want my output to keep only the lines that contain S1 and S2 and S3, and at the same time exclude S10, S11 and S12. Do you have any idea how to get my desired output?
    As long as I get the desired output, we can also use several other commands to deal with this problem.
    Thanks a lot for your help.
    This is the part where I/we are having problems understanding what your needs are.
    Let's forget about what you want included for the moment and just concentrate on what you wish to have excluded ...
    If a line has JUST S10 in it would you want it excluded?
    So if a line consisted of:
    Code:
    42406306 10363280 G,>,S10
    would you want to exclude it?
    Not knowing how your data is created, we can only guess at what the possible combinations/orders are, so we have to seek clarification. Is it a case of excluding if the line contains S10 AND S11 AND S12, or do you want exclusion on it containing S10 OR S11 OR S12?
    The same question for your inclusion criteria - but obviously on S1, S2 and S3
  20. #11
  21. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2009
    Posts
    34
    Rep Power
    9
    Originally Posted by SimonJM
    This is the part where I/we are having problems understanding what your needs are.
    Let's forget about what you want included for the moment and just concentrate on what you wish to have excluded ...
    If a line has JUST S10 in it would you want it excluded?
    So if a line consisted of:
    Code:
    42406306 10363280 G,>,S10
    would you want to exclude it?
    Not knowing how your data is created, we can only guess at what the possible combinations/orders are, so we have to seek clarification. Is it a case of excluding if the line contains S10 AND S11 AND S12, or do you want exclusion on it containing S10 OR S11 OR S12?
    The same question for your inclusion criteria - but obviously on S1, S2 and S3
    Hi Simon,
    Thanks for the reminder. I agree with what you said. I think I really messed up explaining what I want to exclude.
    Conditions for my output:
    1. Include lines that contain S1 and S2 and S3 at the same time, but
    2. Exclude lines that contain S10 or S11 or S12, or any combination of S10, S11 and S12.

    Thanks a lot for all of your advice.
  22. #12
  23. No Profile Picture
    Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2006
    Posts
    2,632
    Rep Power
    1811
    You could do this in one pass with awk (the script would be a little messy), or you could do it in two passes with awk and/or grep. To do it with grep, we would pass your input file through an inverted (-v) search for each of the strings you do not wish to see (-e S10 -e S11 -e S12; you could also do that with a regexp), and pipe that output into another grep which looks for the string you do wish to see, "S1,S2,S3". The order you'd run them in would depend on which pass cuts down the most output:

    Code:
    grep -v -e S10 -e S11 -e S12 YourInputFile | grep "S1,S2,S3" > YourDesiredOutputFile
    As ldapswandog has said, you could use regexp - man grep for details of options (not all versions of grep handle multiple search criteria with -e directly).
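The one-pass awk route mentioned above might look like the following. This is only a sketch: it assumes tokens are separated by spaces or commas as in the sample lines, and the function name `filter_one_pass` is made up for illustration. Each token is tested on its own, so S1, S2 and S3 do not need to be adjacent or in order.

```shell
# One-pass awk sketch. Each token must be bounded on the left by start-of-line,
# a space or a comma, and on the right by a space, a comma or end-of-line,
# so S13 is never mistaken for S1.
filter_one_pass() {
  awk '
    /(^|[ ,])S1([ ,]|$)/ &&
    /(^|[ ,])S2([ ,]|$)/ &&
    /(^|[ ,])S3([ ,]|$)/ &&
    !/(^|[ ,])S1[0-2]([ ,]|$)/
  ' "$@"
}

# Usage: filter_one_pass YourInputFile > YourDesiredOutputFile
```

A pattern with no action prints the matching line, which is why the awk program needs no explicit print.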
  24. #13
  25. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2009
    Posts
    34
    Rep Power
    9
    Originally Posted by SimonJM
    You could do this in one pass with awk (the script would be a little messy), or you could do it in two passes with awk and/or grep. To do it with grep, we would pass your input file through an inverted (-v) search for each of the strings you do not wish to see (-e S10 -e S11 -e S12; you could also do that with a regexp), and pipe that output into another grep which looks for the string you do wish to see, "S1,S2,S3". The order you'd run them in would depend on which pass cuts down the most output:

    Code:
    grep -v -e S10 -e S11 -e S12 YourInputFile | grep "S1,S2,S3" > YourDesiredOutputFile
    As ldapswandog has said, you could use regexp - man grep for details of options (not all versions of grep handle multiple search criteria with -e directly).


    Thanks a lot, Simon.
    I think I have FINALLY got my desired output.
    Thanks for your help in solving this question.
    Thanks a lot again ^_^
  26. #14
  27. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2009
    Posts
    34
    Rep Power
    9
    Originally Posted by SimonJM
    You could do this in one pass with awk (the script would be a little messy), or you could do it in two passes with awk and/or grep. To do it with grep, we would pass your input file through an inverted (-v) search for each of the strings you do not wish to see (-e S10 -e S11 -e S12; you could also do that with a regexp), and pipe that output into another grep which looks for the string you do wish to see, "S1,S2,S3". The order you'd run them in would depend on which pass cuts down the most output:

    Code:
    grep -v -e S10 -e S11 -e S12 YourInputFile | grep "S1,S2,S3" > YourDesiredOutputFile
    As ldapswandog has said, you could use regexp - man grep for details of options (not all versions of grep handle multiple search criteria with -e directly).
    Hi Simon,
    I tried the command you suggested again. I just found out that the grep command won't keep a line like "S1,S4,S2,S13,S15,S3,S6,S7,S8,S9", even though it also contains S1 and S2 and S3. Do you have any ideas about how to solve this problem? I also want lines like the one I just mentioned to be included. My desired output must contain S1 and S2 and S3, whether or not they are contiguous.
    Do you know how to deal with this problem?
    Thanks a lot for your advice.
  28. #15
  29. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2006
    Posts
    177
    Rep Power
    237
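    Use awk:
    Code:
    awk -F'[ ,]+' '{
      s1 = s2 = s3 = bad = 0               # reset the flags for every line
      for (i = 1; i <= NF; i++) {          # fields are split on spaces and commas
        if ($i == "S1") s1 = 1             # compare whole fields, so S13 is not S1
        else if ($i == "S2") s2 = 1
        else if ($i == "S3") s3 = 1
        else if ($i ~ /^S1[0-2]$/) bad = 1 # S10, S11 or S12 excludes the line
      }
      if (s1 && s2 && s3 && !bad) print    # all three present, none excluded
    }' file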