#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    2
    Rep Power
    0

    Return from leftmost match of string.


    Hi.

    Given a string like:
    "Hallo da. Ich bin Rolf. Ich arbeite hier. Tolle Sache."

    How would I remove everything up to, but not including, "Ich (arbeite|bin)"?

    I tried:
    {gsub(".*?Ich (arbeite|bin)","",$0); print $0}

    which returns "hier. Tolle Sache."

    How do I:
    1) Return the "Ich arbeite/Ich bin" part as well?
    2) Return from the first occurrence of "Ich (arbeite|bin)", not the last? In this case, I'd want to return "Ich bin Rolf. Ich arbeite hier. Tolle Sache."

    Thank you.
  2. #2
  3. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2012
    Location
    spaceBAR Central
    Posts
    229
    Rep Power
    42
    You could use grep. For example:
    Code:
    echo "Hallo da. Ich bin Rolf. Ich arbeite hier. Tolle Sache." | grep -Go '\(Ich.\{1,\}\)\(Ich.\{0,\}\)'
    Ich bin Rolf. Ich arbeite hier. Tolle Sache.
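    A variant of the same idea, sketched with the exact alternation from the question rather than any two occurrences of "Ich". It assumes a grep that supports -o (GNU and BSD grep do; the option is not in POSIX). The variable names line and grep_result are only illustrative.

```shell
# Sample input taken from the question.
line='Hallo da. Ich bin Rolf. Ich arbeite hier. Tolle Sache.'

# -E: extended regex, so (arbeite|bin) needs no backslashes.
# -o: print only the matched text, which starts at the leftmost
#     "Ich bin" or "Ich arbeite" and runs to the end of the line.
grep_result=$(printf '%s\n' "$line" | grep -Eo 'Ich (arbeite|bin).*')

printf '%s\n' "$grep_result"
```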
  4. #3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    2
    Rep Power
    0
    Originally Posted by spacebar208
    You could use grep. For example:
    Code:
    echo "Hallo da. Ich bin Rolf. Ich arbeite hier. Tolle Sache." | grep -Go '\(Ich.\{1,\}\)\(Ich.\{0,\}\)'
    Ich bin Rolf. Ich arbeite hier. Tolle Sache.
    Thanks. Since I wanted this inside an awk script, I couldn't use grep directly. Your comment did, however, make me realize that matching, rather than substituting, was the right approach.

    This seems to work fine now:
    Code:
    awk 'match($0,/Ich (arbeite|bin).*/) {print substr($0,RSTART,RLENGTH)}' txt.txt
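
    A short sketch of why the match() approach behaves differently from the gsub attempt. awk regular expressions have no lazy quantifiers, so ".*?" is not a non-greedy match and the substitution swallows everything up to the last "Ich arbeite". match() instead reports the leftmost match through RSTART and RLENGTH. The variable names line, result and from_start are only illustrative.

```shell
# Sample input taken from the question.
line='Hallo da. Ich bin Rolf. Ich arbeite hier. Tolle Sache.'

# match() sets RSTART (1-based start of the leftmost match) and
# RLENGTH (its length); the trailing .* stretches the match to the
# end of the line, so substr() returns everything from "Ich bin" on.
result=$(printf '%s\n' "$line" |
  awk 'match($0, /Ich (arbeite|bin).*/) { print substr($0, RSTART, RLENGTH) }')

# Equivalent form: match only the marker and let substr() run to the
# end of the string by omitting the length argument.
from_start=$(printf '%s\n' "$line" |
  awk 'match($0, /Ich (arbeite|bin)/) { print substr($0, RSTART) }')

printf '%s\n' "$result"
printf '%s\n' "$from_start"
```

    Both forms print nothing for lines that do not contain the marker, because match() returns 0 and the action is skipped.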
