August 10th, 2003, 04:06 PM
regular expression pattern not always matching quite right
this regex match pattern works fine for what i want, apart from in some situations:
([0-9]+) +([0-9]+) +obj.*/Length +([0-9]+)( +([0-9]) +(R))?
here's some example data it works fine for:
19 0 obj
<< /S 36 /Filter /FlateDecode /Length 20 0 R >>
from the bracketed parts from the pattern from the above data i get 19 0 20 0 R. that's good. that's what i want.
here's another example of data that also works fine:
15 0 obj
<< /Length 1848 /Filter [ /FlateDecode ] >>
from that i get 15 0 1848 which is again good.
here's a bit of data that my pattern goes wrong with:
810 0 obj
<< /Mask [ 3 3 ] /Type /XObject /Subtype /Image /Width 16 /Height 16
/BitsPerComponent 8 /ColorSpace 820 0 R /Filter /FlateDecode /Length 808 0 R
/ID 809 0 R >>
the reason it's wrong for me, is because it matches 820 0 R that follows on from /ColorSpace rather than 808 0 R that follows on from /Length. i don't know how to change my pattern to make sure it gets only the info from after /Length.
there's basically 2 situations that can occur as the first 2 bits of data show. a number, a number and an R after /Length, or a number after /Length. that's the info i want to extract (as well as the very first two numbers, but that bit's working fine). so there's only a problem when there's a number, number, R sequence that comes before and doesn't follow /Length. it matches it, even though it doesn't follow /Length.
how can i change my pattern to make sure i only get information that follows /Length's?
btw i'm using this from a cocoa (os x) wrapper that's built on pcre 4.0 regex
Last edited by balance; August 10th, 2003 at 04:15 PM.
August 10th, 2003, 04:20 PM
Sounds like a bug. /Length is in the pattern, so it should be required in order to match at all, and the last three groups should then be anchored after that point in the string.
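One way to check that expectation outside the wrapper is to run the pattern, exactly as posted, against the three samples in another Perl-style engine. The sketch below uses Python's re module, not PCRE 4.0 or the Cocoa wrapper, so it only shows what the pattern text means in that engine. It also assumes the samples contain the line breaks shown in the post, and that the wrapper has dot-matches-newline switched on (re.DOTALL here; PCRE calls the option PCRE_DOTALL). The helper name captured is made up for this example.

```python
import re

# The pattern exactly as posted in the thread.
PATTERN = r"([0-9]+) +([0-9]+) +obj.*/Length +([0-9]+)( +([0-9]) +(R))?"

# The three samples from the first post, with their line breaks kept.
SAMPLES = [
    "19 0 obj\n<< /S 36 /Filter /FlateDecode /Length 20 0 R >>",
    "15 0 obj\n<< /Length 1848 /Filter [ /FlateDecode ] >>",
    "810 0 obj\n"
    "<< /Mask [ 3 3 ] /Type /XObject /Subtype /Image /Width 16 /Height 16\n"
    "/BitsPerComponent 8 /ColorSpace 820 0 R /Filter /FlateDecode /Length 808 0 R\n"
    "/ID 809 0 R >>",
]


def captured(text, flags=re.DOTALL):
    """Return the six capture groups, or None when the pattern does not match."""
    m = re.search(PATTERN, text, flags)
    return m.groups() if m else None


for sample in SAMPLES:
    print(captured(sample))
```

Under those assumptions Python captures 810, 0, 808, 0, R for the third sample, so the pattern text itself does tie the last groups to /Length. Calling captured(sample, 0), which turns the flag off, finds no match for any of the three samples, because . then stops at the line break after obj. That makes it worth checking what string and options the wrapper is really being given.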
The only thing that comes to mind is the / before Length in the pattern. I don't know what characters need escaping or how exactly some of the nuances work with pcre. I use Perl a lot and am familiar with its regex syntax, but don't know how the interface is used with pcre's various ports.
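Separately from the escaping question, the pattern can be made stricter about which /Length it pairs with which object. If several objects sit in one buffer and dot matches newlines, a greedy .* runs on to the last /Length in the whole buffer. The sketch below is a hypothetical variant, again in Python's re rather than PCRE 4.0: it makes the skip lazy and uses a negative lookahead so it never crosses endobj. The names TIGHTER, BUFFER and extract_lengths are made up for this example, the endobj lines and the "5 0 obj" catalog object are invented test data added around the samples from the post, and the generation number is allowed more than one digit, which is also an assumption.

```python
import re

# Hypothetical tightened variant of the posted pattern.
TIGHTER = re.compile(
    r"([0-9]+) +([0-9]+) +obj"   # object number and generation number
    r"(?:(?!endobj).)*?"         # skip ahead lazily, never past 'endobj'
    r"/Length +([0-9]+)"         # direct length, or referenced object number
    r"(?: +([0-9]+) +(R))?",     # optional generation number and R
    re.DOTALL,
)

# Samples from the post, plus invented 'endobj' lines and one invented
# object without /Length, to show that it is skipped rather than mispaired.
BUFFER = (
    "5 0 obj\n<< /Type /Catalog >>\nendobj\n"
    "19 0 obj\n<< /S 36 /Filter /FlateDecode /Length 20 0 R >>\nendobj\n"
    "15 0 obj\n<< /Length 1848 /Filter [ /FlateDecode ] >>\nendobj\n"
    "810 0 obj\n"
    "<< /Mask [ 3 3 ] /Type /XObject /Subtype /Image /Width 16 /Height 16\n"
    "/BitsPerComponent 8 /ColorSpace 820 0 R /Filter /FlateDecode /Length 808 0 R\n"
    "/ID 809 0 R >>\nendobj\n"
)


def extract_lengths(text):
    """Return one tuple of capture groups per object that has a /Length."""
    return [m.groups() for m in TIGHTER.finditer(text)]


for groups in extract_lengths(BUFFER):
    print(groups)
```

On this buffer the variant reports objects 19, 15 and 810 with their own /Length values and skips object 5. The posted pattern, run over the same buffer with dot matching newlines, pairs object 5 with 808 0 R from the last object.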
August 10th, 2003, 04:42 PM
you could be right but because i'm not too good with regex i thought it was more likely me. thing is the ( +([0-9]) +(R))? part is optional so seems to be more detached in a way, if you see what i mean, but saying that it does follow on from after /Length. that's what i expected - that part to only kick into action once a /Length occurs
i'll email the author of the implementation of the wrapper i'm using.
i just tried escaping the / before Length. no difference. same problem.
pcre: perl compatible regular expression. there's different implementations in perl maybe?
well thanks for the prompt response - i'll email as i said. see what happens. thanks a lot.