Software Design
 
  #1  
August 10th, 2003, 05:06 PM
balance
regular expression pattern not always matching quite right

this regex match pattern works fine for what i want, apart from in some situations:

Code:
([0-9]+) +([0-9]+) +obj.*/Length +([0-9]+)( +([0-9]) +(R))?



here's some example data it works fine for:

Code:
19 0 obj
<< /S 36 /Filter /FlateDecode /Length 20 0 R >>


from the bracketed groups in the pattern, applied to the data above, i get 19 0 20 0 R. that's good. that's what i want.

here's another example of data that also works fine:

Code:
15 0 obj
<< /Length 1848 /Filter [ /FlateDecode ] >> 


from that i get 15 0 1848 which is again good.

here's a bit of data that my pattern goes wrong with:

Code:
810 0 obj
<< /Mask [ 3 3 ] /Type /XObject /Subtype /Image /Width 16 /Height 16 
/BitsPerComponent 8 /ColorSpace 820 0 R /Filter /FlateDecode /Length 808 0 R 
/ID 809 0 R >> 


it's wrong for me because it matches the 820 0 R that follows /ColorSpace rather than the 808 0 R that follows /Length. i don't know how to change my pattern to make sure it gets only the info from after /Length.

there's basically 2 situations that can occur as the first 2 bits of data show. a number, a number and an R after /Length, or a number after /Length. that's the info i want to extract (as well as the very first two numbers, but that bit's working fine). so there's only a problem when there's a number, number, R sequence that comes before and doesn't follow /Length. it matches it, even though it doesn't follow /Length.

how can i change my pattern to make sure i only get information that follows /Length?
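
One possible tightening, sketched in Python's re module rather than the PCRE 4.0 wrapper (an assumption, as is the use of DOTALL): make the .* lazy so that, when the searched text holds more than one object, each object number is paired with the first /Length that follows it rather than the last one in the text, and allow more than one digit in the generation number. LENGTH_RE, find_lengths and sample are hypothetical names, and sample simply joins two of the snippets above.

```python
import re

# Assumptions: Python's re stands in for the PCRE 4.0 wrapper, and DOTALL is
# enabled so "." can cross the line break after "obj".
# Changes from the posted pattern: ".*" became the lazy ".*?", the outer
# optional group is non-capturing, and the generation number is "[0-9]+".
LENGTH_RE = re.compile(
    r"([0-9]+) +([0-9]+) +obj.*?/Length +([0-9]+)(?: +([0-9]+) +(R))?",
    re.DOTALL,
)

def find_lengths(text):
    """Return one tuple of captured groups per object found in text."""
    return [m.groups() for m in LENGTH_RE.finditer(text)]

# Two of the thread's snippets placed back to back in one buffer.
sample = (
    "810 0 obj\n"
    "<< /Mask [ 3 3 ] /Type /XObject /Subtype /Image /Width 16 /Height 16 \n"
    "/BitsPerComponent 8 /ColorSpace 820 0 R /Filter /FlateDecode /Length 808 0 R \n"
    "/ID 809 0 R >> \n"
    "15 0 obj\n"
    "<< /Length 1848 /Filter [ /FlateDecode ] >> \n"
)

results = find_lengths(sample)
for groups in results:
    print(groups)
```

If an object has no /Length at all, the lazy match would still run on into the next object's dictionary, so this is a starting point rather than a full fix.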

btw i'm using this from a cocoa (os x) wrapper around pcre 4.0

thanks.

Last edited by balance : August 10th, 2003 at 05:15 PM.

  #2  
August 10th, 2003, 05:20 PM
icrf
Sounds like a bug. /Length is in the pattern, so the pattern should require it in order to match at all, and the last three groups should be anchored after that point in the string.

The only thing that comes to mind is the / before Length in the pattern. I don't know what characters need escaping or how exactly some of the nuances work with pcre. I use Perl a lot and am familiar with its regex syntax, but don't know how the interface is used with pcre's various ports.
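
The claim that /Length anchors the later groups can be checked outside the wrapper. The sketch below runs the pattern exactly as posted over the three samples using Python's re module, which is an assumption here: it is not the PCRE 4.0 Cocoa wrapper from the thread, and re.DOTALL is also assumed so that . can cross the line break after obj. The names PATTERN, SAMPLES and captures are illustrative only.

```python
import re

# The pattern exactly as posted. re.DOTALL is an assumption: without it "."
# stops at the newline after "obj" and none of the samples would match.
PATTERN = re.compile(
    r"([0-9]+) +([0-9]+) +obj.*/Length +([0-9]+)( +([0-9]) +(R))?",
    re.DOTALL,
)

# The three data snippets from the first post.
SAMPLES = [
    "19 0 obj\n<< /S 36 /Filter /FlateDecode /Length 20 0 R >>",
    "15 0 obj\n<< /Length 1848 /Filter [ /FlateDecode ] >> ",
    "810 0 obj\n"
    "<< /Mask [ 3 3 ] /Type /XObject /Subtype /Image /Width 16 /Height 16 \n"
    "/BitsPerComponent 8 /ColorSpace 820 0 R /Filter /FlateDecode /Length 808 0 R \n"
    "/ID 809 0 R >> ",
]

def captures(text):
    """Return the captured groups for the first match, or None if no match."""
    m = PATTERN.search(text)
    return m.groups() if m else None

for sample in SAMPLES:
    print(captures(sample))
```

Under these assumptions the third sample yields 810 0 808 0 R, which is consistent with the behaviour expected from the pattern. It says nothing definite about what the wrapper itself does.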

  #3  
August 10th, 2003, 05:42 PM
balance
Quote:
Sounds like a bug. /Length is in the pattern, so the pattern should require it in order to match at all, and the last three groups should be anchored after that point in the string.


you could be right, but because i'm not too good with regex i thought it was more likely me. thing is, the ( +([0-9]) +(R))? part is optional so it seems more detached in a way, if you see what i mean. having said that, it does follow on from /Length, so what i expected was for that part to only kick in once a /Length occurs.

i'll email the author of the implementation of the wrapper i'm using.

Quote:
The only thing that comes to mind is the / before Length in the pattern.

i just tried. no difference. same problem.

Quote:
I use Perl a lot and am familiar with its regex syntax, but don't know how the interface is used with pcre's various ports.


pcre: perl compatible regular expression. there's different implementations in perl maybe?

well thanks for the prompt response - i'll email as i said. see what happens. thanks a lot.
