#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2013
    Posts
    12
    Rep Power
    0

    Regular expression matching


    if( $numb =~ m/545958\s[<i> (.*) </i> ]<font color='green'></font><br><center><p><span class="Estilo15">****************************************</span></p></center>/g) {
    my $result = $1;

    How do I use this in Perl? I can't extract $1. Please help.
  2. #2
  3. kill 9, $$;
    Devshed Supreme Being (6500+ posts)

    Join Date
    Sep 2001
    Location
    Shanghai, An tSín
    Posts
    6,897
    Rep Power
    3886
    Are you sure your regular expression matched your text? Without seeing what the contents of $numb are, it's hard to say what your problem is.

    BTW I've changed the thread title: please try to describe your problem in the title.
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2009
    Posts
    1,932
    Rep Power
    1225
    That regex will not compile; it fails with this fatal error:
    Unmatched [ in regex; marked by <-- HERE in m/545958\s[ <-- HERE <i> (.*) </
    You have several syntax issues, e.g. failure to escape several key characters: the literal [ and ], and the / in </i>, which ends the pattern early.
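
    As a rough sketch of what the escaping could look like: the pattern below uses m{...} as the delimiter so that the slashes in the closing tags need no backslash, and escapes the literal brackets. The contents of $numb are a made-up sample, since the real input was never posted, and the pattern is shortened to the part up to <br>.

    ```perl
    use strict;
    use warnings;

    # Hypothetical input; the real contents of $numb were not shown in the thread.
    my $numb = q{545958 [<i> some text </i> ]<font color='green'></font><br>};

    # m{...} avoids clashing with the "/" in "</i>";
    # \[ and \] match literal square brackets instead of opening a character class.
    my ($result) = $numb =~ m{545958\s\[<i> (.*?) </i> \]<font color='green'></font><br>};

    if ( defined $result ) {
        print "captured: $result\n";
    }
    else {
        print "no match\n";
    }
    ```

    Assigning the match in list context leaves $result undefined when the pattern does not match, which makes a failed match easy to spot.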
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    830
    Rep Power
    496
    In addition, something like:

    Code:
    [<i> (.*) </i> ]
    does not make sense, since square brackets define character classes. Or, if you want to match an actual '[' in the input, you need to escape it.
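
    To illustrate the difference, here is a small sketch with made-up test strings: an unescaped [...] matches a single character out of a set, while \[ and \] match the brackets themselves.

    ```perl
    use strict;
    use warnings;

    # Unescaped: [<i>] is a character class matching ONE of the characters <, i or >.
    my $class_single = 'i'   =~ /^[<i>]$/ ? 1 : 0;    # a lone "i" matches
    my $class_tag    = '<i>' =~ /^[<i>]$/ ? 1 : 0;    # three characters: no match

    # Escaped: \[ and \] match literal square brackets around the tag.
    my $literal = '[<i>]' =~ /^\[<i>\]$/ ? 1 : 0;

    print "class matches 'i': $class_single\n";
    print "class matches '<i>': $class_tag\n";
    print "escaped pattern matches '[<i>]': $literal\n";
    ```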

    I add that something like:

    Code:
    <i> (.*) </i>
    is usually a very bad idea, because .* is greedy: in a string such as "<i> foo </i> bar some other words <i> baz </i>" it captures everything from the first <i> to the last </i>, i.e. "foo </i> bar some other words <i> baz", which could lead to a failure of the overall regex depending on how it is built. BTW, it would also match a string with nothing between the tags, such as "<i>  </i>", because the * quantifier means 0 or more of the preceding item.
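
    A quick way to see this behaviour, using a made-up sample string. The non-greedy form (.*?) is shown only for comparison; it is one possible alternative, not something from the original question.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample containing two <i> elements.
    my $html = '<i> foo </i> bar <i> baz </i>';

    my ($greedy) = $html =~ m{<i> (.*) </i>};     # .* runs on to the LAST </i>
    my ($lazy)   = $html =~ m{<i> (.*?) </i>};    # .*? stops at the FIRST </i>

    print "greedy: $greedy\n";    # prints "greedy: foo </i> bar <i> baz"
    print "lazy:   $lazy\n";      # prints "lazy:   foo"
    ```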

    If you want to make sure to capture the words between these HTML tags, you should at least use something like:

    Code:
    /<i>([^<]+)/
    so that the match will stop at the next opening of a tag.
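
    A minimal sketch of that idea, with capturing parentheses around the character class so the text ends up in a variable. The sample string is made up for illustration.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample with two <i> elements and text in between.
    my $html = '<i>foo</i>bar some other words<i>baz</i>';

    # ([^<]+) captures one or more characters that are not "<";
    # with /g in list context, the match returns every capture.
    my @words = $html =~ m{<i>([^<]+)}g;

    # The + quantifier needs at least one character, so an empty element does not match.
    my $empty_matches = '<i></i>' =~ m{<i>([^<]+)} ? 1 : 0;

    print join( ', ', @words ), "\n";    # prints "foo, baz"
    print "empty element matches: $empty_matches\n";
    ```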

    Finally, using regexes to parse HTML is usually not a good idea; there are several good modules that do it much better. But if you nonetheless insist on using regexes, do it only on very limited HTML strings whose structure you know very well (certainly not on Web pages), and... try to do it right.
