#1
    regular expression pattern not always matching quite right


    this regex match pattern works fine for what i want, apart from in some situations:

    Code:
    ([0-9]+) +([0-9]+) +obj.*/Length +([0-9]+)( +([0-9]) +(R))?

    here's some example data it works fine for:

    Code:
    19 0 obj
    << /S 36 /Filter /FlateDecode /Length 20 0 R >>

    applying the pattern to the data above, the bracketed groups give me 19 0 20 0 R. that's good. that's what i want.

    here's another example of data that also works fine:

    Code:
    15 0 obj
    << /Length 1848 /Filter [ /FlateDecode ] >>

    from that i get 15 0 1848, which is again good.
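
    The two working cases above can be reproduced with a minimal sketch. It assumes Python's `re` module as a stand-in for the PCRE 4.0 wrapper, which is not the poster's actual setup, and it assumes a dot-matches-newline option (`re.DOTALL` here), since `obj` and `/Length` sit on different lines. The helper name `extract` is hypothetical.

```python
import re

# The pattern from the post, unchanged. re.DOTALL is an assumption:
# without it, "." cannot cross the newline between "obj" and "/Length".
PATTERN = re.compile(
    r"([0-9]+) +([0-9]+) +obj.*/Length +([0-9]+)( +([0-9]) +(R))?",
    re.DOTALL,
)


def extract(text):
    """Return (object number, generation, length or reference number,
    reference generation, 'R'), or None if the pattern does not match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Group 4 is the outer optional wrapper, so it is skipped here.
    return m.group(1, 2, 3, 5, 6)


indirect = "19 0 obj\n<< /S 36 /Filter /FlateDecode /Length 20 0 R >>"
direct = "15 0 obj\n<< /Length 1848 /Filter [ /FlateDecode ] >>"

print(extract(indirect))  # ('19', '0', '20', '0', 'R')
print(extract(direct))    # ('15', '0', '1848', None, None)
```

    When /Length is followed by a plain number, the optional groups simply come back empty (`None` in Python).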

    here's a bit of data that my pattern goes wrong with:

    Code:
    810 0 obj
    << /Mask [ 3 3 ] /Type /XObject /Subtype /Image /Width 16 /Height 16 
    /BitsPerComponent 8 /ColorSpace 820 0 R /Filter /FlateDecode /Length 808 0 R 
    /ID 809 0 R >>

    this is wrong for me because it matches the 820 0 R that follows /ColorSpace rather than the 808 0 R that follows /Length. i don't know how to change my pattern to make sure it only picks up the info after /Length.

    there are basically 2 situations that can occur, as the first 2 bits of data show: a number, a number and an R after /Length, or just a number after /Length. that's the info i want to extract (as well as the very first two numbers, but that bit's working fine). so there's only a problem when a number, number, R sequence comes before /Length rather than after it. the pattern matches that sequence even though it doesn't follow /Length.

    how can i change my pattern to make sure i only get information that follows /Length?

    btw i'm using this from a cocoa (os x) wrapper built on pcre 4.0.

    thanks.
    Last edited by balance; August 10th, 2003 at 05:15 PM.
#2
    Sounds like a bug. /Length is in the pattern, so the regex should require it in order to match at all, which anchors the last three groups after that point in the string.
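
    That expectation can be checked with a small sketch. It assumes Python's `re` module rather than the PCRE 4.0 wrapper from the question, plus `re.DOTALL` so that `.` can cross line breaks; the helper name `extract` is hypothetical. Under those assumptions the pattern picks up the numbers after /Length on the problem data, so it says nothing about what the wrapper itself does.

```python
import re

# The pattern from the first post, unchanged. re.DOTALL is an assumption,
# needed so that ".*" can span the line breaks in the sample.
PATTERN = re.compile(
    r"([0-9]+) +([0-9]+) +obj.*/Length +([0-9]+)( +([0-9]) +(R))?",
    re.DOTALL,
)


def extract(text):
    """Return groups 1, 2, 3, 5 and 6, or None when there is no match."""
    m = PATTERN.search(text)
    return None if m is None else m.group(1, 2, 3, 5, 6)


# The problem data from the first post, line breaks and trailing spaces kept.
sample = (
    "810 0 obj\n"
    "<< /Mask [ 3 3 ] /Type /XObject /Subtype /Image /Width 16 /Height 16 \n"
    "/BitsPerComponent 8 /ColorSpace 820 0 R /Filter /FlateDecode /Length 808 0 R \n"
    "/ID 809 0 R >>"
)

print(extract(sample))  # ('810', '0', '808', '0', 'R')
```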

    The only thing that comes to mind is the / before Length in the pattern. I don't know what characters need escaping or how exactly some of the nuances work with pcre. I use Perl a lot and am familiar with its regex syntax, but don't know how the interface is used with pcre's various ports.
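
    As a rough check of the escaping question, the sketch below compares an unescaped and an escaped slash. It assumes Python's `re` module, not the PCRE wrapper from the question; in Python a backslash before a non-alphanumeric character such as `/` just makes that character literal, so both forms should behave the same. The function name `length_after` is hypothetical.

```python
import re


def length_after(pattern, text):
    """Return the first captured group of pattern in text, or None."""
    m = re.search(pattern, text)
    return None if m is None else m.group(1)


text = "<< /Filter /FlateDecode /Length 808 0 R >>"

plain = length_after(r"/Length +([0-9]+)", text)     # slash left as-is
escaped = length_after(r"\/Length +([0-9]+)", text)  # slash escaped

print(plain, escaped)  # 808 808
```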
#3
    Quote:
    Sounds like a bug. /Length is in the pattern, so the regex should require it in order to match at all, which anchors the last three groups after that point in the string.

    you could be right, but because i'm not too good with regex i thought it was more likely me. thing is, the ( +([0-9]) +(R))? part is optional, so it seems more detached in a way, if you see what i mean. having said that, it does come after /Length in the pattern. that's what i expected: that part should only kick into action once a /Length has been matched.
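
    A short sketch of that expectation, assuming Python's `re` module with `re.DOTALL` rather than the PCRE 4.0 wrapper (the helper name `match_groups` and both sample strings are made up for illustration). An optional trailing group cannot match on its own: if /Length is missing, the whole pattern fails, even when a number, number, R sequence is present.

```python
import re

# The pattern from the first post, unchanged; re.DOTALL is an assumption.
PATTERN = re.compile(
    r"([0-9]+) +([0-9]+) +obj.*/Length +([0-9]+)( +([0-9]) +(R))?",
    re.DOTALL,
)


def match_groups(text):
    """Return all six groups as a tuple, or None when nothing matches."""
    m = PATTERN.search(text)
    return None if m is None else m.groups()


# Hypothetical data: a reference is present but there is no /Length at all.
no_length = "810 0 obj\n<< /ColorSpace 820 0 R /Filter /FlateDecode >>"
# Hypothetical data: a reference comes before a direct /Length value.
with_length = "810 0 obj\n<< /ColorSpace 820 0 R /Length 808 >>"

print(match_groups(no_length))    # None
print(match_groups(with_length))  # ('810', '0', '808', None, None, None)
```

    In the second case the optional groups stay empty instead of reaching back to 820 0 R, which is the behaviour described above.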

    i'll email the author of the implementation of the wrapper i'm using.

    Quote:
    The only thing that comes to mind is the / before Length in the pattern.

    i just tried escaping it. no difference. same problem.

    Quote:
    I use Perl a lot and am familiar with its regex syntax, but don't know how the interface is used with pcre's various ports.

    pcre stands for perl compatible regular expressions. maybe there are different implementations in perl?

    well, thanks for the prompt response. i'll email the author as i said and see what happens. thanks a lot.
