#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2012
    Posts
    4
    Rep Power
    0

    Getting too many matches with my regex?


    I'm fairly new to PHP and regexes. I am attempting to match the pattern word-word, with the hyphenated part occurring one or more times, in a file. My code:
    Code:
     <?php
    
    $fIn = fopen( 'matchtext.dat', 'r' );
    if( ! $fIn )
         die( "Couldn't open the source file for reading<br/>");
    
    while( $line = fgets( $fIn ) )
         if( preg_match( '|\b([a-zA-Z]+-+)+[a-zA-Z]+|', $line, $matches ) ) {
                echo "<pre>"; print_r( $matches ); echo "</pre>";
         }
    
    ?>
    It is returning an additional array entry, which is the next-to-last word of the match together with its hyphen:
    Array
    (
    [0] => learn-english-today
    [1] => english-
    )
    Array
    (
    [0] => that-than-which-nothing-greater-can-be-though
    [1] => be-
    )
    Array
    (
    [0] => that-than-which-nothing-greater-can-be-thought
    [1] => be-
    )

    Where did I go wrong? I only want the first!
  2. #2
    Lost in code
    Devshed Supreme Being (6500+ posts)

    Join Date
    Dec 2004
    Posts
    8,317
    Rep Power
    7170
    ... $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
    source

    ([a-zA-Z]+-+) is a parenthesized subpattern. However, the parentheses are an essential part of your regex, so you cannot simply remove them. If you only want the first element in the array, then use only the first and ignore the second.
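
    If the extra entry is unwanted altogether, one option is to keep the grouping but make it non-capturing with (?: ... ), so nothing is stored in $matches[1]. This is only a sketch: the sample line below is hypothetical, built around a word from the output in the first post.

    ```php
    <?php
    // (?: ... ) groups the repeated "word-" part without capturing it,
    // so $matches holds only the full match at index 0.
    $pattern = '|\b(?:[a-zA-Z]+-+)+[a-zA-Z]+|';
    $line = 'please learn-english-today if you can'; // hypothetical sample line

    $matches = [];
    if (preg_match($pattern, $line, $matches)) {
        print_r($matches); // a single entry: the whole hyphenated word
    }
    ```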
    PHP FAQ

    Originally Posted by Spad
    Ah USB, the only rectangular connector where you have to make 3 attempts before you get it the right way around
  4. #3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2012
    Posts
    4
    Rep Power
    0
    Originally Posted by E-Oreo
    If you only want the first element in the array then only use the first and just ignore the second.
    Thanks.
    Do you mean the "[a-zA-Z]+|" part? If so, removing it won't work, because the match would then end in a trailing hyphen (that-than-which-nothing-greater-can-be-).

    The second part in this case needs to find a final word after the last hyphen.
  6. #4
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2011
    Posts
    29
    Rep Power
    0
    Hi, jusserfinn,

    The subexpression [a-zA-Z]+-+ is inside a repeated capturing group, so it matches several times. For example, in "learn-english-today", it will capture "learn-", then "english-". Only the last captured string (english-) is kept in $matches[1].
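
    A small sketch of that behaviour, using the pattern and a sample string from the first post. The explode() call at the end is just one assumed way to get at every word; it is not something the original code does.

    ```php
    <?php
    $pattern = '|\b([a-zA-Z]+-+)+[a-zA-Z]+|';

    $matches = [];
    preg_match($pattern, 'learn-english-today', $matches);

    // The group ran twice ("learn-", then "english-"); only the last run is kept.
    echo $matches[0], "\n"; // learn-english-today
    echo $matches[1], "\n"; // english-

    // One way to recover every word: split the full match on hyphens.
    // (A run of several hyphens would produce empty strings here.)
    $words = explode('-', $matches[0]);
    print_r($words);
    ```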

    If you want to capture the first word instead, use:

    Code:
    <?php
    
    if( preg_match( '|\b([a-zA-Z]+)(?:-[a-zA-Z]+)+|', 'learn-english-today', $matches ) ) {
         echo "<pre>"; print_r( $matches ); echo "</pre>";
    }
    
    ?>
  8. #5
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2012
    Posts
    4
    Rep Power
    0
    Ok, I know there is something to be said about looking up your question in the manual (which I did), but loads more should be said about checking it twice!

    PHP.net says:

    Code:
    int preg_match ( string $pattern , string $subject [, array &$matches [, int $flags = 0 [, int $offset = 0 ]]] )
    "If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on."

    So simple, I must have skimmed over it.

    I fixed my code by adding [0] to the $matches in the echo statement:

    Code:
     <?php
    
    $fIn = fopen( 'matchtext.dat', 'r' );
    if( ! $fIn )
         die( "Couldn't open the source file for reading<br/>");
    
    while( $line = fgets( $fIn ) )
         if( preg_match( '|\b([a-zA-Z]+-+)+[a-zA-Z]+|', $line, $matches ) ) {
                echo "<pre>"; print_r( $matches[0] ); echo "</pre>";
         }
    ?>
    Thanks for trying though!
  10. #6
    Lost in code
    Devshed Supreme Being (6500+ posts)

    Join Date
    Dec 2004
    Posts
    8,317
    Rep Power
    7170
    So simple, I must have skimmed over it.
    Twice, actually, since I quoted that exact line and linked you to the manual page in my first post.
  12. #7
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2012
    Posts
    4
    Rep Power
    0
    Originally Posted by E-Oreo
    Twice actually. Since I quoted that exact line and linked you to the manual page in my first post.
    Lol, I didn't even look at the quoted part; I thought you were quoting me. I apologize.
