  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2011
    Posts
    8
    Rep Power
    0

    Question Help solve these expert-level regex puzzles


    Please note that I extracted this data from a post that I put online at StackOverflow. Unfortunately, the question has yet to be answered. I think the programmers here rock and will be able to solve it.

    **Example of input**

    vulture (wing)
    tabulations: one leg; two legs; flying
    father; master; patriarch
    mat (box)
    pedistal; blockade; pilar
    animal belly (oval)
    old style: naval
    jackal's belly; jester slope of hill (arch)
    key; visible; enlightened

    Basically, I'm having trouble with some more complicated regex commands. Most of the code I'm finding that uses regex is very simple, but I could use it in so many places if I could get good with it. Would you look at the kind of stuff I'm trying to do and see if you can convert any of it?

    1. Arrayize the word or words between the braces, "(" and ")".
    DESIRED OUTPUT EXAMPLE: $array = wing, box, oval, arch.

    2. Arrayize the first words following a new line, or the start of the example. Or, four spaces and an alpha character, from the first alpha character to the space+brace " (".
    DESIRED OUTPUT EXAMPLE2: $array2 = vulture, mat, animal belly, slope of hill.

    3. Arrayize words separated by semicolons on lines without colons.
    DESIRED OUTPUT EXAMPLE3: $array3 = $subarray0 = father, master, patriarch; subarray1 = pedistal, blockade, pilar; subarray2 = jackal's belly, jester; subarray3 = key, visible, enlightened.

    4. Arrayize lines separated by semicolons that follow the string "old style: ". If a line with a bracket, "(" appears before the next "old style: "-starting line, add a "null" subarray to the array.
    DESIRED OUTPUT EXAMPLE4: $array4 = $subarray0 = null; $subarray1 = null; $subarray2 = naval; $subarray3 = null.

    5. The same as 3, except the desired lines begin with the string "tabulations: ".
    DESIRED OUTPUT EXAMPLE5: $array5 = subarray0 = one leg, two legs; subarray1 = null; subarray2 = null; subarray3 = null.

    I am trying to figure out how to do this via PHP because PHP is the language I am most comfortable with, and it is the language I would use most in this way.

    -Edit-

    **First solution I was working on**

    Okay, so the first solution I was working on is to solve 3. I tried breaking the lines at the semicolons, and I was then hoping to grab the data, line-by-line and edit it further.

    $input = file_get_contents('explode.txt');
    foreach (explode("\n", $input) as $line) {
        $words = explode(';', $line);
        foreach ($words as $word) {
            echo $word;
        }
    }

    Basically, looking at the output, the data ended up in the same format it was already in, only subtract the semicolons. This wasn't very useful, and I decided to stop.

    **Second solution I am working on**

    This is based around this line of code: `preg_match_all('/\;([^;]+)\}/', $myFile, $matches)`.

    Basically, I create a function that looks for a starting and ending string and try to use recursion tricks to isolate things. I thought it would be easiest to start with question 1 using this method:

    function get_between($startString, $endString, $myFile){
    preg_match_all('/\$startString([^$endString]+)\}/', $myFile, $matches);
    return $matches;
    }
    $myFile = file_get_contents('explode.txt');
    $list = get_between("&nbsp(", ")", $myFile);
    foreach($list as $list){
    echo $list;
    }

    This actually just returned the words `ArrayArray`, and I haven't been able to figure out how to get an appropriate output. I was also told this method would not work with nested blocks, and I'm worried that `$startString` and `$endString` may need to be single characters. I'm currently trying to verify that, as I believe this is the closest solution.
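    A minimal sketch of how this attempt could be repaired, assuming the goal is item 1 (the text between "(" and ")"). The body of `get_between()` below is an assumption, not the code from the post. Three things seem to cause the `ArrayArray` output: variables are not interpolated inside single-quoted strings, `[^$endString]` is a character class rather than "anything up to the end string", and `preg_match_all` fills `$matches` with an array of arrays, so echoing each element prints the word `Array`. With `preg_quote()`, the start and end strings do not need to be single characters.

```php
<?php
// Sketch: collect every substring between $start and $end.
// get_between() is the name from the post; this body is an assumption.
function get_between(string $start, string $end, string $subject): array
{
    // preg_quote() escapes regex metacharacters such as "(" and ")",
    // plus the "/" delimiter passed as the second argument.
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';
    preg_match_all($pattern, $subject, $matches);
    // $matches[0] holds the full matches, $matches[1] the captured text.
    return $matches[1];
}

$input = "vulture (wing)\nmat (box)\nanimal belly (oval)\n";
$list = get_between('(', ')', $input);
foreach ($list as $item) {
    echo $item, "\n";
}
```

    Returning `$matches[1]` rather than `$matches` is what makes the `foreach` print strings instead of `Array`.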

    **The third solution attempt**

    So, I had an initial idea of how I wanted to approach this, and I decided to go about it my own way. Again, I started with question 1 because it seemed easiest. It has the fewest exceptions.

    function find_between($input,$start,$end) {
    if (strpos($input,$start) === false || strpos($input,$end) === false) {
    return false;
    } else {
    $start_position = strpos($input,$start)+strlen($start);
    $end_position = strpos($input,$end);
    return substr($input,$start_position,$end_position-$start_position);
    }
    }

    $myFile = file_get_contents('explode.txt');

    $output = find_between($myFile,'(',')');

    echo $output;

    As far as I can tell, this will work. The issue I'm having is with the recursion. I tried `foreach($output as $output){echo $output;}`, but this gave me an error. It seems obvious to me that it's because I haven't recursed and so haven't arrayized. The reason I stopped along this path is because I was told by several programmers that I was doomed to failure. So, I'm currently back to working on solution 2.
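    For reference, the `strpos()` idea can return every match without recursion by repeating the search from an offset. This is only a sketch: `find_all_between()` is a hypothetical name, and it assumes the start and end markers are never nested. The `foreach` error in the original most likely comes from `find_between()` returning a single string rather than an array.

```php
<?php
// Sketch: the same strpos() approach, repeated with an offset until no
// further start/end pair is found. find_all_between() is a hypothetical name.
function find_all_between(string $input, string $start, string $end): array
{
    $found = [];
    $offset = 0;
    while (($s = strpos($input, $start, $offset)) !== false) {
        $s += strlen($start);              // first character after $start
        $e = strpos($input, $end, $s);     // next $end after that point
        if ($e === false) {
            break;                         // unmatched $start: stop searching
        }
        $found[] = substr($input, $s, $e - $s);
        $offset = $e + strlen($end);       // continue after this pair
    }
    return $found;
}

$input = "vulture (wing)\nmat (box)\nanimal belly (oval)\n";
$output = find_all_between($input, '(', ')');
print_r($output);
```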

    P.S. - John Malcovich says hi.
  2. #2
  3. Turn left at the third duck
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2011
    Location
    Nelson, NZ
    Posts
    112
    Rep Power
    93
    Hi Fishtankalpha!

    It's great that you gave an example of the input.

    Can you please also show the exact output you are seeking? For me, an example is a thousand times easier to understand than instructions (such as "arrayize this" and "get the word or words following the colon and preceding the line break").

    Wishing you a beautiful weekend
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2011
    Posts
    8
    Rep Power
    0

    Yes, dude


    Originally Posted by ragax
    Hi Fishtankalpha!

    It's great that you gave an example of the input.

    Can you please also show the exact output you are seeking? For me, an example is a thousand times easier to understand than instructions (such as "arrayize this" and "get the word or words following the colon and preceding the line break").

    Wishing you a beautiful weekend
    I'm going to go through each of the questions and show the desired output, man. Thanks for getting back to me. I made some progress, working really hard at this stuff. I'm still only maybe 1/4 of the way done?

    Cheers! Happy New Years.


    -Edit-

    Got the output examples posted and revised the statements as to what they were. I'll keep working on the code and post if I figure it out, myself. Thanks for the attention.
  6. #4
  7. Turn left at the third duck
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2011
    Location
    Nelson, NZ
    Posts
    112
    Rep Power
    93
    Hey FishTankAlpha,

    Good work explaining what you need.
    I'll look at the arrays one by one.

    Here's the first array.

    PHP Code:
    $s='
    vulture (wing)
    tabulations: one leg; two legs; flying
    father; master; patriarch
    mat (box)
    pedistal; blockade; pilar
    animal belly (oval)
    old style: naval
    jackal\'s belly; jester slope of hill (arch)
    key; visible; enlightened'
    ;
    preg_match_all(',\(\K[^)]+,', $s, $m, PREG_PATTERN_ORDER);
    echo '<pre>';
    print_r($m[0]);
    echo '</pre>';
    Output:

    Code:
    Array
    (
        [0] => wing
        [1] => box
        [2] => oval
        [3] => arch
    )
  8. #5
  9. Turn left at the third duck
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2011
    Location
    Nelson, NZ
    Posts
    112
    Rep Power
    93
    Hi again!

    Please look at the post just above for the solution to Array 1.

    Sorry, I didn't understand what you wanted in Array 2. You said "first words following a new line". Yet "tabulations" and "old" (among others) were not in your output.

    My feeling is that your requirements have so many IFs that this is not a good task for straight regex. It's more a task for a series of regexes, with some programmatic control.

    The first thing you need to do is split the string into lines. Then apply a series of expressions.
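    As a rough illustration of "split into lines, then apply a series of expressions", here is a sketch aimed at items 4 and 5. `collect_labelled()` is a hypothetical helper, and the grouping rule (every line containing "(" starts a new group, which stays null until a labelled line turns up) is an assumption read from the description of item 4.

```php
<?php
// Sketch: one pass over the lines with plain string checks per line.
// collect_labelled() is a hypothetical helper, not code from the thread.
function collect_labelled(string $text, string $label): array
{
    $groups = [];
    $current = -1;
    foreach (preg_split('/\R/', trim($text)) as $line) {
        if (strpos($line, '(') !== false) {
            $groups[++$current] = null;    // new group, nothing found yet
            continue;
        }
        if ($current >= 0 && strpos($line, $label) === 0) {
            $rest = substr($line, strlen($label));
            // Split on semicolons and drop the whitespace around them.
            $groups[$current] = preg_split('/\s*;\s*/', trim($rest));
        }
    }
    return $groups;
}

$s = implode("\n", [
    'vulture (wing)',
    'tabulations: one leg; two legs; flying',
    'father; master; patriarch',
    'mat (box)',
    'pedistal; blockade; pilar',
    'animal belly (oval)',
    'old style: naval',
    "jackal's belly; jester slope of hill (arch)",
    'key; visible; enlightened',
]);

$array4 = collect_labelled($s, 'old style: ');
$array5 = collect_labelled($s, 'tabulations: ');
var_dump($array4, $array5);
```

    Note that this sketch keeps "flying" in the first subarray of `$array5`, whereas the desired output in the question lists only "one leg, two legs"; which of the two is intended is not clear from the thread.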

    Here's a first pass at array 3. In the output, you will see that one of the arrays is not exactly what you are looking for---again because of these multiple IFs, which will probably require programmatic control and multiple expressions.

    I'm more interested in solving straight regexes, so I will leave you with this as a skeleton to improve on.
    Hopefully it's a few steps in the right direction.

    Wishing you a beautiful day!

    PHP Code:
    $s='vulture (wing)
    tabulations: one leg; two legs; flying
    father; master; patriarch
    mat (box)
    pedistal; blockade; pilar
    animal belly (oval)
    old style: naval
    jackal\'s belly; jester slope of hill (arch)
    key; visible; enlightened'
    ;
    $lines=preg_split(',\r\n,',$s);
    foreach ($lines as $line) {
        if (!preg_match(',:,',$line))
            preg_match_all(',\b(\w+)(?:;|$),', $line, $m, PREG_PATTERN_ORDER);
        echo '<pre>';
        if (!empty($m[0])) print_r($m[1]);
        echo '</pre>';
    }

    Output:
    Code:
    Array
    (
        [0] => father
        [1] => master
        [2] => patriarch
    )
    
    Array
    (
        [0] => pedistal
        [1] => blockade
        [2] => pilar
    )
    
    Array
    (
        [0] => belly
    )
    
    Array
    (
        [0] => key
        [1] => visible
        [2] => enlightened
    )
    Last edited by ragax; January 1st, 2012 at 05:29 PM. Reason: Clean up: filtered empty results from output
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2011
    Posts
    8
    Rep Power
    0

    About array2


    Hi! I actually solved this one. I was looking for any set of words between a string like this, " (", and the beginning of a line. The problem I am having is that the first line of a document doesn't count as a `newline`.

    Here's what I have:

    Code:
    <?php
    $filename = "fakexample.txt";
    $file = fopen($filename, "rb");
    $myFile = fread($file, filesize($filename));
    function get_between($startString, $endString, $myFile, $startSafe, $endSafe){
      // $startString is used as-is because it may be a regex anchor such as ^
      // ($startSafe is currently unused).
      $startStringSafe = $startString;

      // Escape the end string only when $endSafe is 1.
      if($endSafe == 0){
        $endStringSafe = $endString;
      }
      elseif($endSafe == 1){
        $endStringSafe = preg_quote($endString, '/');
      }
      // Non-greedy match of any characters between the start and end strings.
      // The m modifier makes ^ match at the start of every line.
      preg_match_all("/$startStringSafe(.*?)$endStringSafe/m", $myFile, $matches);
      return $matches;
    }
    $list = get_between("^", " (", $myFile, 0, 1);
    foreach($list[1] as $list){
      echo $list."\n";
    }
    ?>
    I like your explanation, though. I actually have been working very hard on array 3, without any success at all. I keep getting errors and "ArrayArray"! OH MAN, Seeing "ArrayArray" really burns me. It's nice to have some of the stuff I need to do to achieve a result explained in a less complex example, like line splitting. Thanks
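    On the first-line point: with the m modifier, ^ matches at the very start of the subject as well as after each line break, so the first line should not need special handling. A minimal sketch, with a sample string made up from the thread's input:

```php
<?php
// Sketch: ^ under the m modifier matches at offset 0 and after every newline.
$text = "vulture (wing)\nfather; master; patriarch\nmat (box)\nanimal belly (oval)";

// Lazily capture from the start of a line up to the first " (" on that line.
// "." does not match a newline, so a capture cannot spill onto the next line.
preg_match_all('/^(.*?) \(/m', $text, $m);
print_r($m[1]);
```

    The first line ("vulture") is picked up like any other, and lines without " (" are skipped.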
  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2011
    Posts
    8
    Rep Power
    0
    If it helps, this is the regex that works in Ruby: `^((?!(:|\()).)*$`. I also got a lot of errors and warnings on line 13 of the code you submitted, and I'm working on figuring out what it all means. When I used the Ruby regex, I got this:

    `Array
    (
    [0] =>
    )
    Array
    (
    [0] =>
    )
    Array
    (
    [0] => d
    )`

    Or: "d".

    So basically, this worked....

    Code:
    <?php
    $filename = "fakexample.txt";
    $file = fopen($filename, "rb");
    $myFile = fread($file, filesize($filename));
    
    function get_lines($string, $myFile){
      if (preg_match_all("/$string/m", $myFile, $matches))
        return $matches[0];
      else return array();
    }
    
    // Match lines that contain neither : nor (
    $string = '^((?!(:|\()).)*$';
    $lines = get_lines($string, $myFile);
    
    foreach($lines as $line){
      echo $line."\n";
    }
    ?>
    The workaround is a bit generic. I just excluded any lines containing colons or open parens. I was trying to use an inclusion method, only. I think that exclusion and inclusion is necessary to bang out the right result set, though.
  14. #8
  15. Turn left at the third duck
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2011
    Location
    Nelson, NZ
    Posts
    112
    Rep Power
    93
    You're welcome.

    ^((?!(:|\()).)*$
    That's a strange expression.
    It basically means "Match every character on this line, as long as it is not a colon or an opening parenthesis."
    If that's what you want to say, the usual (and better) syntax does not use a lookahead:

    Code:
    ^[^:(]*$
    Now you have to decide if that's really what you want to say. Probably not, since PHP is giving you empty arrays.
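    One difference between the two forms may be worth checking: the dot in the lookahead version does not match a newline, while the negated class `[^:(]` does, so under the m modifier the shorter pattern can run across consecutive lines. A small sketch with a made-up three-line sample:

```php
<?php
// Sketch comparing the two patterns on consecutive "clean" lines.
$text = "father; master\nkey; visible\nold style: naval";

// Lookahead version: "." never matches "\n", so each match is one line.
preg_match_all('/^((?!(:|\()).)*$/m', $text, $a);

// Negated class: [^:(] also matches "\n", so one match can span two lines.
preg_match_all('/^[^:(]*$/m', $text, $b);

// Adding \r and \n to the class keeps each match on a single line.
preg_match_all('/^[^:(\r\n]*$/m', $text, $c);

print_r($a[0]);
print_r($b[0]);
print_r($c[0]);
```

    Here the first and third patterns return the two lines separately, while the second returns them as one match with the line break inside it.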

  16. #9
  17. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2011
    Posts
    8
    Rep Power
    0
    Originally Posted by ragax
    You're welcome.



    That's a strange expression.
    It basically means "Match every character on this line, as long as it is not a colon or an opening parenthesis."
    If that's what you want to say, the usual (and better) syntax does not use a lookahead:

    Code:
    ^[^:(]*$
    Now you have to decide if that's really what you want to say. Probably not, since PHP is giving you empty arrays.

    The non-lookahead version also gives this:
    Code:
    father; master; patriarch pedistal; blockade; pilar jackal's belly; jester key; visible; enlightened
    It's kind of the desired output, but then it's really not.... I also don't think it's exactly what I want to say, though, because they aren't split, right? Ideally, array=[0]father, [1]master, ..., [10]enlightened. You can see that splitting at semicolons won't work because "pilar" and "jackal's belly" are not split by semicolons.
  18. #10
  19. Turn left at the third duck
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2011
    Location
    Nelson, NZ
    Posts
    112
    Rep Power
    93
    My last post was only to help you troubleshoot the syntax of the piece of regex that you said was broken in PHP.

    For the splitting and arrays, I have already replied in my post about item 3. The output goes quite far in the direction you wanted. At this stage I wonder if a programmatic solution might take you the rest of the way.

    Wishing you a fun day.
  20. #11
  21. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2011
    Posts
    8
    Rep Power
    0
    Originally Posted by ragax
    My last post was only to help you troubleshoot the syntax of the piece of regex that you said was broken in PHP.

    For the splitting and arrays, I have already replied in my post about item 3. The output goes quite far in the direction you wanted. At this stage I wonder if a programmatic solution might take you the rest of the way.

    Wishing you a fun day.
    Thanks, ragax. I appreciate it. I had to go to the shrine, but I'm still working on it tonight and tomorrow. I'll let you know if I get set 3 done tomorrow.

    Peace (Y)
  22. #12
  23. Turn left at the third duck
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2011
    Location
    Nelson, NZ
    Posts
    112
    Rep Power
    93
    Hey bro,

    I got up early, so I cooked you up a solution to Array 3. Apart from splitting the subject into lines, it's a single-regex solution that builds the array programmatically.

    You can probably use the same idea for the other arrays.
    First the solution. I'll explain about generalizing in the next post.

    The code:
    PHP Code:
    $s='vulture (wing)
    tabulations: one leg; two legs; flying
    father; master; patriarch
    mat (box)
    pedistal; blockade; pilar
    animal belly (oval)
    old style: naval
    jackal\'s belly; jester slope of hill (arch)
    key; visible; enlightened'
    ;
    $lines=preg_split(',\r\n,',$s);
    $i=0;
    foreach ($lines as $line) {
        if (!preg_match(',:,',$line))
            preg_match_all(',(?x) # comment mode
    (?:\s*([^;]+);) # first matches (captured in match 1): everything before the ;
    | # or
    (?<=;)\s*\b(\w+)\b # last match (captured in match 2): just one word after the ; ,',
            $line, $m, PREG_PATTERN_ORDER);
        if (!empty($m[0])) {
            $sz=count($m[0]);
            for ($j=0;$j<$sz;$j++)
                if (!empty($m[1][$j])) $res[$i][$j]=$m[1][$j];
                else $res[$i][$j]=$m[2][$j];
            $i++;
        }
    }
    echo '<pre>';
    print_r($res);
    echo '</pre>';
    The Output:
    Code:
    Array
    (
        [0] => Array
            (
                [0] => father
                [1] => master
                [2] => patriarch
            )
    
        [1] => Array
            (
                [0] => pedistal
                [1] => blockade
                [2] => pilar
            )
    
        [2] => Array
            (
                [0] => jackal's belly
                [1] => jester
            )
    
        [3] => Array
            (
                [0] => key
                [1] => visible
                [2] => enlightened
            )
    
    )
  24. #13
  25. Turn left at the third duck
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2011
    Location
    Nelson, NZ
    Posts
    112
    Rep Power
    93
    Hi again,

    For the solution to Array 3, see the previous post.

    Okay, let's talk about what this does and how to generalize it.
    To match all the possible strings in your array, we're having to use multiple capture groups. (Unlike in my first pass at array 3 yesterday.) As a result, for each matching line, you end up with an array $m of matches that live either in $m[1][$j] or in $m[2][$j]. The programming job after the regex is to build the array of results $res by walking through $m[1] and $m[2].

    What can be generalized about this?
    Look at the regex pattern. You see that there are two options separated by an alternation (|). For your other arrays, if you need more than two options, just add alternations. This will mean that your matches may also live in $m[3][$j], $m[4][$j] etc.

    You will just have to expand the programmatic building of the results array to account for that.
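    One way that expansion might look, as a sketch only: loop over however many capture groups the pattern has and keep the first non-empty one for each match. `first_nonempty_groups()` is a hypothetical helper; it assumes the default PREG_PATTERN_ORDER layout, where `$m[g][j]` is group g of match j, and the pattern is the same two-branch one written with a tilde delimiter.

```php
<?php
// Sketch: build one row per line from whichever capture group matched.
// first_nonempty_groups() is a hypothetical helper, not code from the thread.
function first_nonempty_groups(array $m): array
{
    $row = [];
    $matchCount = count($m[0]);
    $groupCount = count($m) - 1;           // $m[0] holds the full matches
    for ($j = 0; $j < $matchCount; $j++) {
        for ($g = 1; $g <= $groupCount; $g++) {
            if ($m[$g][$j] !== '') {
                $row[] = $m[$g][$j];
                break;                     // one value per match
            }
        }
    }
    return $row;
}

$pattern = '~\s*([^;]+);|(?<=;)\s*\b(\w+)\b~';
$lines = ['father; master; patriarch', "jackal's belly; jester slope of hill (arch)"];
$res = [];
foreach ($lines as $line) {
    if (preg_match_all($pattern, $line, $m)) {
        $res[] = first_nonempty_groups($m);
    }
}
print_r($res);
```

    Adding a third alternation with its own capture group would then need no change to the loop.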

    Please let me know how this works for you.
    And don't hesitate if you have any questions about this solution.

    Wishing you a fun day.
  26. #14
  27. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2011
    Posts
    8
    Rep Power
    0

    Lots of questions...


    Well, of course I've got a lot of questions because I'm very new at regex. I can see where you're going, but I can't see how you got there. Take a look at what I'm reading into this by checking out the comments I added. I know what I'm seeing isn't correct, but I need to know how close I was.

    I've noticed an interesting exception, also. I'll add info about it. I'm actually trying to use this site as a reference: http://www.regular-expressions.info/reference.html. I know I'm way off. How did you learn this stuff? Were you able to isolate and test parts of the code, somehow?

    PHP Code:
    <?php
    $s='vulture (wing)
    tabulations: one leg; two legs; flying
    father; master; patriarch
    mat (box)
    pedistal; blockade; pilar
    animal belly (oval)
    old style: naval
    jackal\'s belly; jester
    slope of hill (arch)
    key; visible; enlightened
    tethering rope (hand)
    tabulations: church
    ores
    old style: jewelry';

    #Set lines to the result of splitting $s at line breaks (using `\r\n` because you're on Windows).
    $lines=preg_split(',\r\n,',$s);

    #Set i equal to 0 for the following loop.
    $i=0;
    foreach ($lines as $line) {

        #If the line doesn't have a ":" in it...
        if (!preg_match(',:,',$line))

            #Why use a comma as a delimiter?  I thought / was the standard delimiter.  (?x) means either 'allow perl syntax' or 'ignore white space'?
            preg_match_all(',(?x) # comment mode

            #The string may start with : followed by one or more whitespace characters and then one or more new line characters semicolons.
            (?:\s*([^;]+);) # first matches (captured in match 1): everything before the ;

            | # or

            #Before the semicolon there is any number of white space characters and a word.  Then there is an alpha-numeric character plus a word boundary.
            (?<=;)\s*\b(\w+)\b # last match (captured in match 2): just one word after the ; ,',
            $line, $m, PREG_PATTERN_ORDER);

        if (!empty($m[0])) {
            $sz=count($m[0]);
            for ($j=0;$j<$sz;$j++)
                if (!empty($m[1][$j])) $res[$i][$j]=$m[1][$j];
                else $res[$i][$j]=$m[2][$j];
            $i++;
        }
    }
    echo '<pre>';
    print_r($res);
    echo '</pre>';
    ?>
  28. #15
  29. Turn left at the third duck
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2011
    Location
    Nelson, NZ
    Posts
    112
    Rep Power
    93
    Hi FishtankAlpha,

    Great to hear from you.

    I'm actually trying to use this site as a reference: http://www.regular-expressions.info/reference.html.
    Jan's site really is a great place to start.

    How did you learn this stuff?
    If you want to know what I know, in my mind this is the best regex tutorial once you know the basics... but I produced it so I'm biased. Making it was a labor of love over a number of months, and it goes quite deep and quite far in some directions. Still adding to it though, wanting to buff up the PHP section.

    Were you able to isolate and test parts of the code, somehow?
    If you're talking about code, I'm on Windows and I use Xampp, which gives you a quick upload/test cycle. But you probably already know this.

    If you're talking about regex, I use RegexBuddy (here's a permanent link to the latest RB demo), also by Jan, whose page you read. You really need this tool if you're going to be doing a lot of regex. Forget the freebies, this is worth the forty bucks. But you have to set it up right, otherwise it looks tiny and confusing. In particular, maximize it, close the history window, and read the RB section on the tutorial above. If I know someone is using RB, that makes it easier as I can tell you what settings to use.
    If you're not solid with the syntax, RB can help you write an expression. But even when you know the syntax, you still want to use RB. I just paste the test string in the subject window, write my expression in the top field, and immediately I can see if what I'm doing works, because all the matches and capture groups are highlighted. (And detailed in the bottom window if you choose List All Matches and Update Automatically).

    Why use a comma as a delimiter? I thought / was the standard delimiter.
    Yes, it's the standard delimiter in textbooks. But of the people I know who are proficient with regex, many use their own delimiters. Why? Well, first, to me the forward slash looks ugly, while the comma is unintrusive. Also the slash is a big pain as soon as you have actual slashes in your pattern, as you need to escape it, and you end up with things like "http:\/\/". The main idea is to use a delimiter you won't need to escape. If there had been commas in the pattern, I might have used a tilde ~. Remember, these delimiters are something PHP wraps around the raw regex. When you develop the expression on paper, screen or RB, you don't use a delimiter.
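    To make the delimiter point concrete, here is a small sketch that matches the same URL prefix twice, once with slash delimiters and once with tildes. The subject string is just the reference URL mentioned earlier in the thread.

```php
<?php
// Sketch: one pattern written with two different delimiters.
$url = 'see http://www.regular-expressions.info/reference.html for details';

// Slash delimiter: every literal slash inside the pattern must be escaped.
preg_match('/http:\/\/[^\s\/]+/', $url, $withSlash);

// Tilde delimiter: the slashes can be written as they are.
preg_match('~http://[^\s/]+~', $url, $withTilde);

echo $withSlash[0], "\n", $withTilde[0], "\n";
```

    Both calls capture the same text; only the amount of escaping differs.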

    Code:
    (?:\s*([^;]+);)
    The string may start with :, followed by one or more whitespace characters, then one or more new line characters semicolons.
    Ah, you are right that you were way off somewhere.

    1. The colon : is part of (?:, which is the syntax to indicate a non-capturing group. There are lots of pieces of regex syntax that start with (? It's confusing, see the linked tutorial at the top of the post for a section dealing just with disambiguating (? syntax.

    2. The whitespace is followed by one or more characters that are NOT semi-colons, i.e. [^;]+ then by a semi-colon. That's a standard clean way to match something without relying on dot-star, which can be a bit "dirty".

    Code:
    (?<=;)\s*\b(\w+)\b
    Before the semicolon, there is any number of white space characters and a word. Then there is an alpha-numeric character plus a word boundary.
    Nope.

    1. Again see the reference about all the bits of (? syntax. The `(?<=;)` is a lookbehind that means "if the character immediately preceding is a semicolon ;". So this side of the OR alternation captures the very last piece of string (the engine fails to match on the left side of the OR because the last piece of string does not end with a semicolon, but it succeeds on the right because a semicolon precedes the string).

    2. You say there is an alpha-numeric character, that's not right. The word is \w+, and it is captured by the parentheses. And we know it's a word because it's surrounded by the \b boundaries. I did this because you only wanted the first word on the last string ("jester").
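    To see the two branches separately, here is a sketch that runs each half of the alternation on its own against one line of the sample input (tilde delimiters are used here; the sub-patterns themselves are unchanged).

```php
<?php
$line = "jackal's belly; jester slope of hill (arch)";

// Left branch only: everything up to a semicolon, captured without the ";".
// The outer (?: ... ) groups without capturing, so group 1 is the inner pair.
preg_match_all('~(?:\s*([^;]+);)~', $line, $left);

// Right branch only: a single word, but only at a position where a ";" sits
// immediately before it. The lookbehind checks for the ";" without making
// it part of the match.
preg_match_all('~(?<=;)\s*\b(\w+)\b~', $line, $right);

print_r($left[1]);
print_r($right[1]);
```

    The left branch yields "jackal's belly" and the right branch yields "jester", which together give the row shown for that line in the Array 3 output.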

    Feel free to ask some more, I'm really happy to help you get this crystal clear with this. Once this clicks, you will be unstoppable.

    Wishing you a fun day.