#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2009
    Posts
    41
    Rep Power
    10

    Newbie Please Help


    I thought I was progressing pretty well at understanding regex in Ruby, but something I thought should work is not working. Given this input:

    <title>Philadelphia 76ers vs. New York Knicks

    Code:
    /<title>(\w+\s){1,3}\svs/
    In my mind, this should return:
    <title>Philadelphia 76ers vs

    Obviously it doesn't, or I wouldn't be here. I'm guessing that when you use ranges like {1,3} on metacharacters there is a trick to grouping them together?

    I was running through the tutorial at regular-expressions.info and thought I knew what I was doing, but after 30 minutes I don't want to keep throwing things at it and getting frustrated. I know I'm missing either a trick or something dead easy. Regular expressions have always been a bugaboo for me; they just don't click. I thought they were starting to, and it's vital for my project that I get this, but I just don't see it.

    Any help - even a push in the right direction would be vastly appreciated.

    Thank you
  2. #2
  3. CSS & JS/DOM Adept
    Devshed Supreme Being (6500+ posts)

    Join Date
    Jul 2004
    Location
    USA (verifiably)
    Posts
    20,128
    Rep Power
    4304
    I suspect that it's not working because your code requires two spaces (or other white-space characters) before the "vs".

    Try this, which will require one or more white-space characters (technically, one plus zero or more):
    Code:
    /<title>(\w+\s){1,3}\s*vs/
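    A minimal Ruby sketch of the difference between the two patterns, run against the sample title from the first post. The variable names are only for illustration:

```ruby
title = "<title>Philadelphia 76ers vs. New York Knicks"

# Original pattern: each repetition of (\w+\s) already consumes the space
# after the word, so the extra \s demands a second whitespace character
# before "vs".
original = /<title>(\w+\s){1,3}\svs/

# Suggested pattern: \s* allows zero or more additional whitespace characters.
relaxed = /<title>(\w+\s){1,3}\s*vs/

original_match = title[original]   # nil, only one space precedes "vs"
relaxed_match  = title[relaxed]    # "<title>Philadelphia 76ers vs"

puts original_match.inspect
puts relaxed_match.inspect
```

    Note that this checks the whole match only; what ends up in capture group 1 is a separate question.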
    Spreading knowledge, one newbie at a time.

    Check out my blog. | Learn CSS. | PHP includes | X/HTML Validator | CSS validator | Common CSS Mistakes | Common JS Mistakes

    Remember people spend most of their time on other people's sites (so don't violate web design conventions).
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2009
    Posts
    41
    Rep Power
    10
    Originally Posted by Kravvitz
    I suspect that it's not working because your code requires two spaces (or other white-space characters) before the "vs".

    Try this, which will require one or more white-space characters (technically, one plus zero or more):
    Code:
    /<title>(\w+\s){1,3}\s*vs/
    Sadly, that didn't work. Let's say we drop the vs issue:


    Code:
    /<title>(\w+\s){1,3}/
    Still won't yield the right result - it yields only 76ers.

    One Solution I came up with today at work is

    Code:
    /\w+\svs/
    which does yield 76ers vs, which I can work with

    However this was just the first half of my problem

    Here is the full line of what I'm working with

    Code:
    <title>Philadelphia 76ers vs. New York Knicks - Box Score - January 11, 2012 - ESPN</title>
    What I'd LIKE to isolate is Philadelphia 76ers AND New York Knicks

    If you know the NBA you know where I'm going with this: the city / team names on either side of the vs will vary from file to file, but I need to extract that information to populate a database
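    One possible way to pull out both teams, sketched under the assumption that every title follows the same "TEAM vs. TEAM - Box Score - ..." layout as the line above. The names title_pattern, first_team and second_team are made up for this example:

```ruby
line = "<title>Philadelphia 76ers vs. New York Knicks - Box Score - January 11, 2012 - ESPN</title>"

# Assumed layout: "<title>TEAM vs. TEAM - Box Score - ...".
# (.+?) is lazy, so each capture stops at the first separator that follows it.
title_pattern = /<title>(.+?)\s+vs\.\s+(.+?)\s+-\s/

match = line.match(title_pattern)
first_team, second_team = match.captures if match

puts first_team    # Philadelphia 76ers
puts second_team   # New York Knicks
```

    Because the captures are not limited to \w and \s, the number of words in each team name does not matter; a title that does not fit the assumed layout simply leaves both variables nil.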
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2009
    Posts
    41
    Rep Power
    10

    Solved?


    So after some trial and error and a bit more study, I believe I have a solution (which I've tested by adding additional words between <title> and vs).

    For anyone interested in such a scan - what worked in Ruby was this:

    Code:
    /<title>([\w+\s+]*)vs/
    The combination of parentheses and [] works where the parentheses alone didn't - so () works differently than ([])

    I confirmed that this works by adding additional words and the regex works each time.
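    A small Ruby sketch of what the bracket version actually matches. Inside [...] a + is a literal plus sign rather than a quantifier, so the class means "word character, whitespace, or +", and the * after the brackets does the repeating. The variable names are illustrative only:

```ruby
title = "<title>Philadelphia 76ers vs. New York Knicks"

with_plus    = /<title>([\w+\s+]*)vs/
without_plus = /<title>([\w\s]*)vs/

# Both forms capture the same text from the sample title (note the trailing space).
captured_with    = title[with_plus, 1]      # "Philadelphia 76ers "
captured_without = title[without_plus, 1]   # "Philadelphia 76ers "

# The difference only shows up when the input contains a literal "+".
plus_accepted = "<title>a+b vs"[with_plus, 1]      # "a+b "
plus_rejected = "<title>a+b vs"[without_plus, 1]   # nil

puts captured_with.inspect
puts plus_accepted.inspect
puts plus_rejected.inspect
```

    Ruby may also print a "character class has duplicated range" warning for the first pattern, since + appears twice inside the brackets.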
  8. #5
  9. CSS & JS/DOM Adept
    Devshed Supreme Being (6500+ posts)

    Join Date
    Jul 2004
    Location
    USA (verifiably)
    Posts
    20,128
    Rep Power
    4304
    Oh, right. With "{1,3}" after the capturing group, the group only keeps the text from its last repetition.

    Congrats on finding a solution yourself. In case you're interested in an alternative solution...

    The "?:" at the beginning of the inner pair of parentheses makes it just a plain group instead of a capturing group.
    Code:
    /<title>((?:\w+\s){1,3})\s*vs/
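    A short Ruby sketch comparing what group 1 holds in each version, using the sample title from the first post. The variable names are only for illustration:

```ruby
title = "<title>Philadelphia 76ers vs. New York Knicks"

# A repeated capturing group keeps only its final repetition.
last_only = title[/<title>(\w+\s){1,3}\s*vs/, 1]        # "76ers "

# (?:...) groups without capturing, so the outer parentheses become group 1
# and hold everything the repetition matched.
whole_run = title[/<title>((?:\w+\s){1,3})\s*vs/, 1]    # "Philadelphia 76ers "

puts last_only.inspect
puts whole_run.inspect
puts whole_run.strip   # Philadelphia 76ers
```

    If the pattern was being run through String#scan, which returns the captured groups rather than the whole match when the pattern contains groups, that would explain seeing only 76ers earlier in the thread.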
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2009
    Posts
    41
    Rep Power
    10
    I haven't fully completed the regular-expressions.info tutorial yet, but my understanding of ?: was that it had to do with backreferences?

    In the end, after some thought, the information I'm going to need is what comes right before the vs and right before the - (I need only the team name to look up the team abbreviation in my other table for insertion into another table; it's all part of a parsing system I'm building to download NBA box scores and shot charts), so I'll have to figure those out. Still, I was glad I solved the issue, because regex has always been a problem for me, and I'm glad I toughed it out. I'll look at ?: more deeply so I can learn the difference between a plain group and a capturing group.
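    A rough Ruby sketch of grabbing just the word before vs and the word before the first " - ", assuming the full title line posted earlier in the thread. The names nickname_pattern, first_nickname and second_nickname are hypothetical:

```ruby
line = "<title>Philadelphia 76ers vs. New York Knicks - Box Score - January 11, 2012 - ESPN</title>"

# (\w+) before "vs." grabs the last word of the first team; .*? then skips
# lazily so the second (\w+) is the last word before the first " - ".
nickname_pattern = /(\w+)\s+vs\.\s+.*?(\w+)\s+-\s/

match = line.match(nickname_pattern)
first_nickname, second_nickname = match.captures if match

puts first_nickname    # 76ers
puts second_nickname   # Knicks
```

    This only returns a single word per team, so a two-word nickname such as Trail Blazers would come back as Blazers and the lookup table would need to allow for that.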

    I think what you're saying is that if it's a capturing group, it's looking for the same thing it captured the first time over and over (like Philadelphia, repeatedly?), but I thought that only applied to backreferences?
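    A small Ruby sketch that may help separate the three ideas in this question: repeating a group, a backreference, and a non-capturing group. The strings and variable names are made up for illustration:

```ruby
# Repeating a capturing group repeats the PATTERN, not the text it matched
# earlier, so two different words are fine.
pattern_repeated = "Philadelphia 76ers "[/(\w+\s){2}/]         # "Philadelphia 76ers "

# A backreference (\1) is what demands the same TEXT again.
same_text_again = "Philadelphia Philadelphia "[/(\w+\s)\1/]    # "Philadelphia Philadelphia "
different_text  = "Philadelphia 76ers "[/(\w+\s)\1/]           # nil

# (?:...) only groups; it saves nothing, so the match has no captures at all.
captures = "Philadelphia 76ers ".match(/(?:\w+\s){2}/).captures   # []

puts pattern_repeated.inspect
puts same_text_again.inspect
puts different_text.inspect
puts captures.inspect
```

    So the connection between ?: and backreferences is indirect: a non-capturing group stores no text, which means there is nothing for a backreference to point at.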
