#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2005
    Location
    Vancouver, WA, USA
    Posts
    396
    Rep Power
    189

    Replace a small piece of a larger match


    Using PHP's regex implementation (preg_match(), etc.)

    I would like to match a larger string, and replace just a small portion of it.

    For example, I have a multi-line string I want to match multiple times from an HTML document:

    Code:
    <span class="onDate">
    <a rel="bookmark" title="4:11 am" href="/excuse-dust/">
    <span class="bl_sep">|</span>
    </span>
    And in each occurrence, I wish to remove:

    Code:
    <span class="bl_sep">|</span>
    But I need to match the larger match first, because the piece I need to remove will likely occur elsewhere as well.
    Thomas Tremain
  2. #2
  3. No Profile Picture
    I think I've come up with my best solution... Make an array of all matches of the larger string. Then copy that array into a second array that contains the changes.

    Then go back and replace the matches, with the modified strings, in the original document.
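
    A rough sketch of that approach, for illustration only. The sample document, the patterns, and the variable names are assumptions, not something taken from the real page, and the second separator is added just to show that occurrences outside the larger match are left alone:

    ```php
    <?php
    // Hypothetical sample document: one "onDate" block plus a stray separator elsewhere.
    $html = '<span class="onDate">' . "\n"
          . '<a rel="bookmark" title="4:11 am" href="/excuse-dust/">' . "\n"
          . '<span class="bl_sep">|</span>' . "\n"
          . '</span>' . "\n"
          . '<p><span class="bl_sep">|</span></p>' . "\n";

    // Step 1: collect every occurrence of the larger block.
    // The "s" modifier lets "." match newlines; ".*?" keeps each match as short as possible.
    preg_match_all('~<span class="onDate">.*?</span>\s*</span>~s', $html, $found);
    $originals = $found[0];

    // Step 2: build a second array holding the same blocks with the separator removed.
    $modified = array_map(function ($block) {
        return preg_replace('~<span class="bl_sep">\|</span>\s*~', '', $block);
    }, $originals);

    // Step 3: put the modified blocks back in place of the originals.
    $result = str_replace($originals, $modified, $html);

    echo $result;
    ```

    preg_replace_callback() can do the same thing in one pass, since the callback receives each larger match and returns its replacement.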
    Thomas Tremain
  4. #3
  5. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,904
    Rep Power
    1045
    Do not try to parse HTML with a regex.

    Regular expressions are great for parsing simple expressions like a date or a telephone number. But they are not even remotely powerful enough to parse a complex language like HTML. What you want is an HTML parser.
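
    As a minimal sketch of the parser route, assuming PHP's bundled DOM extension is available: the markup below is an assumption (the snippet in the first post is an unclosed fragment, so link text, a closing </a>, and a wrapping <div> were added), and the XPath expression is only one way to say "a bl_sep span inside an onDate span".

    ```php
    <?php
    // Hypothetical, well-formed version of the markup from the first post.
    $html = '<div>'
          . '<span class="onDate">'
          . '<a rel="bookmark" title="4:11 am" href="/excuse-dust/">Excuse the dust</a> '
          . '<span class="bl_sep">|</span>'
          . '</span>'
          . '<p><span class="bl_sep">|</span></p>'
          . '</div>';

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Select only the separators that sit inside an "onDate" span.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//span[@class="onDate"]//span[@class="bl_sep"]');

    // Copy the node list into an array before changing the tree, as a precaution.
    foreach (iterator_to_array($nodes) as $node) {
        $node->parentNode->removeChild($node);
    }

    $cleaned = $doc->saveHTML();
    echo $cleaned;
    ```

    The separator inside the <p> survives because the query is tied to the document structure rather than to the exact text between the tags.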
  6. #4
  7. No Profile Picture
    Lost in code
    Devshed Supreme Being (6500+ posts)

    Join Date
    Dec 2004
    Posts
    8,296
    Rep Power
    7170
    With preg_replace you can use parts of the matched string in the replacement. The special tokens $1, $2, $3, etc. can be used in the replacement string and refer to the first, second, third, etc. subgroups in the matched pattern.
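
    Applied to the markup from the first post, that could look roughly like the sketch below. The pattern is an assumption about how regular the real page is, and the extra separator in the sample is there only to show that it is not touched:

    ```php
    <?php
    // Hypothetical sample document: one "onDate" block plus a stray separator elsewhere.
    $html = '<span class="onDate">' . "\n"
          . '<a rel="bookmark" title="4:11 am" href="/excuse-dust/">' . "\n"
          . '<span class="bl_sep">|</span>' . "\n"
          . '</span>' . "\n"
          . '<p><span class="bl_sep">|</span></p>' . "\n";

    // Group 1: everything from the opening onDate span up to the separator.
    // Group 2: the closing tag of the outer span.
    // The separator is matched but not captured, so it is dropped from the output.
    $pattern = '~(<span class="onDate">\s*<a [^>]*>\s*)<span class="bl_sep">\|</span>\s*(</span>)~';

    // '$1$2' rebuilds the match from the two captured pieces only.
    $result = preg_replace($pattern, '$1$2', $html);

    echo $result;
    ```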

    Comments on this post

    • Jacques1 disagrees : How about encouraging people to use the *right* tool rather than "helping" them with the wrong tool?
    PHP FAQ

    Originally Posted by Spad
    Ah USB, the only rectangular connector where you have to make 3 attempts before you get it the right way around
  8. #5
  9. No Profile Picture
    Originally Posted by E-Oreo
    With preg_replace you can use parts of the matched string in the replacement. The special tokens $1, $2, $3, etc. can be used in the replacement string and refer to the first, second, third, etc. subgroups in the matched pattern.
    That's what I was originally looking for, but could not find any such syntax documented.
    Thomas Tremain
  10. #6
  11. No Profile Picture
    It's documented on the preg_replace manual page.

    replacement may contain references of the form \\n or (since PHP 4.0.4) $n, with the latter form being the preferred one. Every such reference will be replaced by the text captured by the n'th parenthesized pattern. n can be from 0 to 99, and \\0 or $0 refers to the text matched by the whole pattern. Opening parentheses are counted from left to right (starting from 1) to obtain the number of the capturing subpattern. To use backslash in replacement, it must be doubled ("\\\\" PHP string).

    When working with a replacement pattern where a backreference is immediately followed by another number (i.e.: placing a literal number immediately after a matched pattern), you cannot use the familiar \\1 notation for your backreference. \\11, for example, would confuse preg_replace() since it does not know whether you want the \\1 backreference followed by a literal 1, or the \\11 backreference followed by nothing. In this case the solution is to use \${1}1. This creates an isolated $1 backreference, leaving the 1 as a literal.
    It's useful to know, although as Jacques1 pointed out, regular expressions are pretty limited when it comes to parsing HTML. If you're just trying to pull some small things out of the page as a whole then they work OK, but if your patterns are at all related to the structure of the HTML document, then they'll start failing you.
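
    A small sketch of the ${1}1 case from the manual text quoted above. The input string and pattern are made up for illustration:

    ```php
    <?php
    // Goal: keep the captured word and put a literal "1" directly after it.
    $input = 'item-7';

    // '${1}1' (single-quoted, so PHP does not interpolate it) means
    // "group 1, then a literal 1".
    $result = preg_replace('~(item)-\d+~', '${1}1', $input);

    // '$11' is read as a reference to group 11, which this pattern does not have,
    // so it does not produce "group 1 followed by 1".
    $ambiguous = preg_replace('~(item)-\d+~', '$11', $input);

    echo $result, "\n"; // item1
    ```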
    Last edited by E-Oreo; June 6th, 2013 at 10:40 AM.
