Newbie Please Help
September 17th, 2012, 10:42 PM
Contributing User
|
|
Join Date: Jan 2009
Posts: 41
  
Time spent in forums: 7 h 8 m 57 sec
Reputation Power: 9
Newbie Please Help
I was progressing (I thought) pretty well at understanding regex using Ruby, but something I thought should work is not working. Given this input:
<title>Philadelphia 76ers vs. New York Knicks
Code:
/<title>(\w+\s){1,3}\svs/
In my mind this should return: <title>Philadelphia 76ers vs
Obviously it doesn't, or I wouldn't be here. I'm guessing that when you use ranges on the metacharacters there is a trick to grouping them together?
I was running through the tutorial at regular-expressions.info and thought I knew what I was doing, but after 30 minutes I don't want to keep throwing things at it and getting frustrated. I know I'm missing either a trick or something dead easy. Regular expressions have always been a bugaboo for me; they just don't click. I thought they were starting to, and it's vital for my project that I get this, but I just don't see it.
Any help, even a push in the right direction, would be vastly appreciated.
Thank you

September 18th, 2012, 04:49 PM
CSS & JS/DOM Adept
Join Date: Jul 2004
Location: USA
I suspect that it's not working because your code requires two spaces (or other white-space characters) before the "vs".
Try this, which will require one or more white-space characters (technically, one plus zero or more):
Code:
/<title>(\w+\s){1,3}\s*vs/
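To see the difference between the two patterns, here is a minimal sketch you could paste into irb. The variable names are only for illustration, and the input is the shortened title from the first post.

```ruby
title = "<title>Philadelphia 76ers vs. New York Knicks"

# Original pattern: the group already consumes the space after "76ers",
# so the extra \s has nothing left to match and the whole match fails.
original = title.match(/<title>(\w+\s){1,3}\svs/)

# With \s* the extra white-space is optional, so the overall match succeeds.
relaxed = title.match(/<title>(\w+\s){1,3}\s*vs/)

puts original.inspect   # nil
puts relaxed[0]         # <title>Philadelphia 76ers vs
```

Note that relaxed[0] is the whole match. What ends up in the capturing group is a separate question.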

September 18th, 2012, 07:38 PM
Contributing User
Quote: Originally Posted by Kravvitz
I suspect that it's not working because your code requires two spaces (or other white-space characters) before the "vs".
Try this, which will require one or more white-space characters (technically, one plus zero or more):
Code:
/<title>(\w+\s){1,3}\s*vs/

That didn't work, sadly. Let's say we drop the vs issue:
Code:
/<title>(\w+\s){1,3}/
That still won't yield the right result; it yields only 76ers.
One Solution I came up with today at work is
which does yield 76ers vs which I can work with
However this was just the first half of my problem
Here is the full line of what I'm working with
Code:
<title>Philadelphia 76ers vs. New York Knicks - Box Score - January 11, 2012 - ESPN</title>
What I'd LIKE to isolate is Philadelphia 76ers AND New York Knicks
If you know the NBA, you know where I'm going with this: the city / team names on either side of the vs will vary from file to file, but I need to extract that information to populate a database.
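One possible way to pull out both sides of the "vs." from the full line, as a rough sketch only: it assumes every title follows the same "Team A vs. Team B - Box Score - ..." layout shown above, and the variable names team_a and team_b are made up for the example.

```ruby
line = "<title>Philadelphia 76ers vs. New York Knicks - Box Score - January 11, 2012 - ESPN</title>"

# (.+?) is lazy: the first group stops at the first " vs. ",
# the second group stops at the first " - " that follows it.
m = line.match(/<title>(.+?)\s+vs\.\s+(.+?)\s+-\s/)

# Fall back to nils if a title does not follow the assumed layout.
team_a, team_b = m ? m.captures : [nil, nil]

puts team_a   # Philadelphia 76ers
puts team_b   # New York Knicks
```

Because the groups match any characters rather than a fixed number of words, this should not care how many words each city or team name has, as long as the separators stay the same.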

September 18th, 2012, 08:39 PM
Contributing User
Solved?
So after some trial and error and a bit more study, I believe I have a solution (which I've tested by adding additional words between <title> and vs).
For anyone interested in such a scan, what worked in Ruby was this:
Code:
/<title>([\w+\s+]*)vs/
The parentheses and the [] combination works while the single parentheses didn't - and ([]) works differently than ([])
I confirmed that this works by adding additional words and the regex works each time.
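A side note on why the bracket version behaves differently, as a small sketch with illustrative variable names: inside [] the + is not a quantifier but a literal plus sign, so [\w+\s+] means "a word character, white-space, or +", and the * after the class does the repeating. Ruby may also print a "character class has duplicated range" warning for it, since the + appears twice.

```ruby
title = "<title>Philadelphia 76ers vs. New York Knicks"

# The pattern from the post above, and the same class without the plus signs.
with_plus    = title.match(/<title>([\w+\s+]*)vs/)[1]
without_plus = title.match(/<title>([\w\s]*)vs/)[1]

p with_plus      # "Philadelphia 76ers "
p without_plus   # "Philadelphia 76ers "

# The + inside the class is a literal: the class matches a lone plus sign.
plus_is_literal = "+".match?(/\A[\w+\s+]\z/)
p plus_is_literal   # true
```

For titles without a + in them, the two classes give the same result.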

September 18th, 2012, 09:11 PM
CSS & JS/DOM Adept
Oh, right. With the "{1,3}" placed after the capturing group, the group only keeps what it matched on its last repetition.
Congrats on finding a solution yourself. In case you're interested in an alternative solution...
The "?:" at the beginning of the inner pair of parentheses makes it just a plain group instead of a capturing group.
Code:
/<title>((?:\w+\s){1,3})\s*vs/
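Here is a short sketch of the difference, using the shortened title from the first post. It assumes the earlier attempts were run through String#scan, which the thread hints at but does not state outright. The variable names are only illustrative.

```ruby
title = "<title>Philadelphia 76ers vs. New York Knicks"

# Repeated capturing group: group 1 is overwritten on each repetition,
# so only the last repetition is kept.
last_only = title.match(/<title>(\w+\s){1,3}\s*vs/)[1]

# Non-capturing inner group repeated inside one outer capturing group:
# group 1 now spans all of the repetitions.
all_words = title.match(/<title>((?:\w+\s){1,3})\s*vs/)[1]

# String#scan returns the captures, not the whole match, when the pattern has groups.
scanned = title.scan(/<title>(\w+\s){1,3}\s*vs/)

p last_only   # "76ers "
p all_words   # "Philadelphia 76ers "
p scanned     # [["76ers "]]
```

So the earlier pattern was matching the whole "<title>Philadelphia 76ers vs" all along; it was only the reported capture that was limited to the last word.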

September 19th, 2012, 10:05 AM
Contributing User
I haven't fully completed the regular-expressions.info tutorial yet, but my understanding of the ?: was that it had to do with back references?
In the end, after some thought, the information I'm going to need is the part right before the vs and the part right before the - (I need only the team name to look up the team abbreviation in my other table for insertion into another table; it's all part of a parsing system I'm building to download NBA box scores and shot charts). So I'll have to figure those out, but I was glad I solved the issue, because regex has always been a problem for me. I'm glad I toughed it out. I'll look at that ?: more deeply so I can learn the difference between a plain group and a capturing group.
I think what you're saying is that if it's a capturing group, it's looking for the same thing over and over that it captured the first time (like Philadelphia, repeatedly?), but I thought that only applied to back references?
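For what it's worth, the distinction asked about here can be checked directly. This is a minimal sketch with made-up example strings: a capturing group only stores what it matched, a back reference such as \1 is what demands the same text again, and (?:...) groups without storing anything.

```ruby
# A capturing group stores what it matched; it does not force that text to repeat.
m = "Philadelphia 76ers".match(/(\w+)\s(\w+)/)
p m[1]   # "Philadelphia"
p m[2]   # "76ers"

# A back reference (\1) is what requires the same text a second time.
repeated  = "Knicks Knicks".match?(/(\w+)\s\1/)
different = "Knicks 76ers".match?(/(\w+)\s\1/)
p repeated    # true
p different   # false

# (?:...) groups without capturing, so the next (...) becomes group 1.
plain = "Philadelphia 76ers".match(/(?:\w+)\s(\w+)/)
p plain[1]   # "76ers"
```

A quantifier like {1,3} on a capturing group re-runs the pattern inside it each time, so each repetition can match different text; the group just remembers the most recent one.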