#1
  1. Confused badger
    Devshed Beginner (1000 - 1499 posts)

    Join Date
    Mar 2009
    Location
    West Yorkshire
    Posts
    1,161
    Rep Power
    492

    REGEX to find specific HTML


    Hello all,
    I have a small problem that I've been trying to figure out all day, so far to no avail.

    I have a string of HTML from which I need to pick out all the A tags (and replace them with some text, but that's something for later on!).

    I have tried

    PHP Code:
    <[A-Za-z0-9_\-='":/\.].*></a> 
    but it selects everything up to the LAST </a> instead of the "next" one (I hope that makes sense!).

    Please, can someone help me find the right regex to use?
    Thanks a million in advance!!
    Last edited by badger_fruit; March 7th, 2013 at 08:05 AM.
    "For if leisure and security were enjoyed by all alike, the great mass of human beings who are normally stupefied by poverty would become literate and would learn to think for themselves; and when once they had done this, they would sooner or later realise that the privileged minority had no function and they would sweep it away"
    - George Orwell, 1984
  2. #2
  3. acray
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2013
    Posts
    22
    Rep Power
    0
    I'm assuming you have single-line ("dot matches new line") mode turned on...

    Looks to be a problem with greedy vs. lazy.

    And I'm not sure what the point is of
    Code:
    [A-Za-z0-9_\-='":/\.].*
    because, apart from excluding a few special characters you wouldn't expect to find in an <a> tag anyway, it's basically just
    Code:
    ..*
    Try
    Code:
    <a[^>]*?></a>
    Of course, that would only match an <a> tag with nothing between the <a> and </a>, so the following might be what you are shooting for:
    Code:
    <a ([^>]*?)>([^<]*?)</a>
    Back reference \1 would collect all the attributes of the tag and \2 would have the display text.
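    To make the greedy vs. lazy difference and the two groups concrete, here is a minimal sketch in PHP. The sample string is made up for illustration and is not the poster's data; in PHP the groups end up in $matches[1] and $matches[2] rather than being written as \1 and \2.

```php
<?php
// Hypothetical sample input, not taken from the thread.
$html = '<a href="/one">First</a> and <a href="/two">Second</a>';

// Greedy: .* runs on to the LAST </a>, so both links collapse into one match.
preg_match_all('~<a [^>]*>.*</a>~', $html, $greedy);

// Lazy quantifiers stop at the NEXT </a>, giving one match per link.
// "~" is used as the delimiter so the "/" in "</a>" needs no escaping.
preg_match_all('~<a ([^>]*?)>([^<]*?)</a>~', $html, $lazy);

print_r($greedy[0]); // one element: the whole sample string
print_r($lazy[1]);   // group 1: the attributes of each tag
print_r($lazy[2]);   // group 2: the display text of each tag
```

    Note that [^<]*? only works here because the link text contains no nested tags.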
  4. #3
  5. Confused badger
    Originally Posted by acray
    Back reference \1 would collect all the attributes of the tag and \2 would have the display text.
    Do what now?
    Okay, sorry, I honestly have no idea about regex. All I know is that the code ...


    PHP Code:
    $text = "<p style=\"text-align: center;\">
        <span style=\"line-height: 1.538em;\">
            <img src=\"http://www.example.com/pics/ln/20130210/100213_2013_grammy_arrivals_9/katy-perry-55th-annual-grammy-awards_3495470.jpg\" alt=\"Katy Perry, Grammys Dress 2013\" title=\"Katy Perry, Grammys Dress 2013\" width=\"500\" height=\"749\" style=\"vertical-align: top; display: block; margin-left: auto; margin-right: auto;\" />
            <a href=\"http://www.example.com/pictures/katy_perry/1-1\">
                <span style=\"font-size: x-small;\">Katy Perry's Grammys Dress Caused Quite A Stir</span>
            </a>
        </span>
    </p>
    <p>It's safe to say Katy Perry stole the entire show at the Grammy awards on Sunday evening (February 10, 2013) in a mint green Gucci dress that featured a rather revealing keyhole cut-out. It quickly caused excitement on Twitter, with fans of the star lauding her daring choice of dress. It wasn't just her followers who noticed either, with various amusing photographs of celebrities in awe of Perry's, ugh, assets, circulating online today.</p><p>Elton John, of all people, was caught out cheekily eyeing up Perry's dress, while television star Ellen DeGeneres made a joke of the elephant in the room, staring intently at Perry's chest as her girlfriend Portia de Rossi looked on. \"I was inspired by Priscilla Presley in the Seventies... Married to Elvis Presley, of course,\" Perry told Ryan Seacrest of her Grammys dress. Seacrest himself joked that he had luckily had plenty of practice at staying focused when at eye level. One man who did manage to keep his eyes on the ball was Perry's boyfriend John Mayer, who was photographed staring straight into Katy's eyes when the pair were snapped taking their seats. The 35-year-old blues guitarist admitted that he's been thinking about marrying Perry, 28, sometime in the future. When asked whether a wedding would be a possibility, Mayer said, \"Of course. I mean, I'm still the kid from Connecticut. That's what you do,\" according to the Daily Mail.</p>
    <p>
        <img src=\"http://www.example.com/pics/mn/20130210/100213_2013_grammy_arrivals_9/katy-perry-55th-annual-grammy-awards_3495445.jpg\" alt=\"Katy Perry, Grammys Dress 2013\" title=\"Katy Perry, Grammys Dress 2013\" width=\"300\" height=\"560\" style=\"vertical-align: top; margin-left: 12px; margin-right: 12px;\" />
        <img src=\"http://www.example.com/pics/mn/20130210/100213_2013_grammy_arrivals_9/katy-perry-55th-annual-grammy-awards_3495475.jpg\" alt=\"Katy Perry, Grammys Dress, 2013\" title=\"Katy Perry, Grammys Dress, 2013\" width=\"297\" height=\"560\" style=\"vertical-align: top;\" /></p><p style=\"text-align: center;\">
        <a href=\"http://www.example.com/pictures/katy_perry/1-1\">
            <span style=\"font-size: x-small;\">Katy Perry Ignored The Grammys' Dress Code Memo,&nbsp;Though She Looked Smoldering In Her Gucci Dress</span>
        </a>
    </p>
    ";

    preg_match_all('/<a [^<>]+>(.*?)/i', $text, $matches_a);
    echo "A: ";
    print_r($matches_a);

    preg_match_all('/<a ([^>]*?)>([^<]*?)</a>/i', $text, $matches_b);
    echo "B: ";
    print_r($matches_b);
    Gives me ...

    A: Array
    (
    [0] => Array
    (
    [0] => <a href="http://www.example.com/pictures/katy_perry/1-1">
    [1] => <a href="http://www.example.com/pictures/katy_perry/1-1">
    )

    [1] => Array
    (
    [0] =>
    [1] =>
    )

    )
    B:
    I think I expected to see something like this:
    EDIT: What I mean is, what I WANT to see is this ...

    A: Array
    (
    [0] => Array
    (
    [0] => <a href="http://www.example.com/pictures/katy_perry/1-1">
    [1] => <a href="http://www.example.com/pictures/katy_perry/1-1">
    )

    [1] => Array
    (
    [0] => <span style="font-size: x-small;">Katy Perry's Grammys Dress Caused Quite A Stir</span>
    [1] => <span style="font-size: x-small;">Katy Perry Ignored The Grammys' Dress Code Memo,&nbsp;Though She Looked Smoldering In Her Gucci Dress</span>
    )

    )
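    For anyone wondering why A captures nothing and B prints nothing at all, a rough sketch follows. It uses a shortened, hypothetical stand-in for $text. As far as I can tell, pattern B as posted does not even compile, because the unescaped "/" in "</a>" ends the pattern early and PHP reports an "Unknown modifier" warning. The third pattern is only one possible fix, not necessarily the best one.

```php
<?php
// Hypothetical, shortened stand-in for the $text in the post above.
$text = "<p>\n  <a href=\"http://www.example.com/pictures/katy_perry/1-1\">\n    <span>Caption</span>\n  </a>\n</p>";

// Pattern A: a trailing lazy (.*?) is satisfied by zero characters,
// so group 1 is always the empty string.
preg_match_all('/<a [^<>]+>(.*?)/i', $text, $a);

// Pattern B with the "/" in "</a>" escaped so that it compiles:
// [^<]*? cannot cross the nested <span>, so nothing matches.
preg_match_all('/<a ([^>]*?)>([^<]*?)<\/a>/i', $text, $b);

// One possible fix: lazy "anything" up to the next </a>. The s flag
// lets "." match the newlines inside the link.
preg_match_all('/<a [^<>]+>(.*?)<\/a>/is', $text, $c);

var_dump($a[1][0]);        // empty string
var_dump(count($b[0]));    // 0 matches
echo trim($c[1][0]), "\n"; // the <span> element inside the link
```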
    Last edited by badger_fruit; March 7th, 2013 at 01:33 PM. Reason: clarification on requirements
  6. #4
  7. Confused badger
    Well, after a full day of searching, testing and banging my head off of walls/floors/tables/chairs/cats/dogs, I *think* I have found a working solution ...

    PHP Code:
    preg_match_all('/<a\s[^>]*href=\"([^\"]*)\"[^>]*>(.*)<\/a>/siU', $text, $matches_a, PREG_PATTERN_ORDER);
    Gives me ...

    A: Array
    (
    [0] => Array
    (
    [0] => <a href="http://www.example.com/pictures/katy_perry/1-1"><span style="font-size: x-small;">Katy Perry's Grammys Dress Caused Quite A Stir</span></a>
    [1] => <a href="http://www.example.com/pictures/katy_perry/1-1"><span style="font-size: x-small;">Katy Perry Ignored The Grammys' Dress Code Memo,&nbsp;Though She Looked Smoldering In Her Gucci Dress</span></a>
    )

    [1] => Array
    (
    [0] => http://www.example.com/pictures/katy_perry/1-1
    [1] => http://www.example.com/pictures/katy_perry/1-1
    )

    [2] => Array
    (
    [0] => <span style="font-size: x-small;">Katy Perry's Grammys Dress Caused Quite A Stir</span>
    [1] => <span style="font-size: x-small;">Katy Perry Ignored The Grammys' Dress Code Memo,&nbsp;Though She Looked Smoldering In Her Gucci Dress</span>
    )

    )
    Wooop woop
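    Since the original goal was to replace each A tag with some text, the same pattern could be handed to preg_replace_callback. This is only a sketch: the input is a shortened, hypothetical stand-in for $text, and the "[text -> URL]" replacement format is made up for illustration.

```php
<?php
// Hypothetical, shortened input; the real $text is in post #3.
$text = '<p><a href="http://www.example.com/pictures/katy_perry/1-1"><span>Caption</span></a></p>';

// Same pattern as above, written without the redundant backslashes.
$pattern = '/<a\s[^>]*href="([^"]*)"[^>]*>(.*)<\/a>/siU';

// $m[0] is the whole tag, $m[1] the href, $m[2] the inner HTML.
$result = preg_replace_callback($pattern, function ($m) {
    // Assumed replacement format: "[link text -> URL]".
    return '[' . strip_tags($m[2]) . ' -> ' . $m[1] . ']';
}, $text);

echo $result, "\n";
```

    If the replacement is a fixed string, plain preg_replace with the same pattern should do.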
  8. #5
  9. acray
    Trying to read a regexp still makes my eyes bleed, not to mention all the extra control stuff when doing it in something like PHP...

    So since you think you've got it working, I only glanced at what you came up with.

    One thing that jumped out at me was the lazy vs greedy thing I mentioned earlier. Unless you used something to change the default operation of * from greedy to lazy, you could run into problems with different data sources.
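    For what it's worth, the U modifier in the posted pattern appears to be exactly that "something": in PHP's PCRE it swaps the default greediness of quantifiers. A small sketch with a made-up sample string:

```php
<?php
// Hypothetical sample input.
$html = '<a href="/one">First</a> <a href="/two">Second</a>';

// Default behaviour: * is greedy, *? is lazy.
preg_match_all('/<a\s[^>]*>(.*)<\/a>/s',  $html, $greedy);
preg_match_all('/<a\s[^>]*>(.*?)<\/a>/s', $html, $lazy);

// With the U (PCRE_UNGREEDY) modifier the defaults are swapped:
// a plain * becomes lazy, and *? would become greedy.
preg_match_all('/<a\s[^>]*>(.*)<\/a>/sU', $html, $ungreedy);

print_r($greedy[1]);   // one capture that runs through to the last </a>
print_r($ungreedy[1]); // one capture per link, same as $lazy[1]
```

    Writing (.*?) explicitly and dropping U gives the same result here and is arguably easier to read later.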

    But take that with a grain of salt, doing anything significant with regexp involves lots of banged heads for me too. If it seems to be working...

    In any case, I hope I was able to provide some direction.
