#1
  1. Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Sep 2007
    Location
    outside Washington DC
    Posts
    2,642
    Rep Power
    3700

    Using regex to parse arguments


    I'm working on parsing a string from an RFC, and I can't get my regex to work. So I've written a small Java program to test. I don't understand the results, so I can't figure out what I'm doing wrong.

    The applicable section deals with a "type=" string.

    The regex that I'm using is:
    Code:
    type=(HOME|WORK|PREF|MSG|CELL)(,(HOME|WORK|PREF|MSG|CELL))*(;type=(HOME|WORK|PREF|MSG|CELL)(,(HOME|WORK|PREF|MSG|CELL))*)*
    The spec allows either a series of type=X parameters separated by semicolons,
    type=X;type=Y;type=Z
    or a single parameter with a comma-separated list of values,
    type=X,Y,Z
    where X, Y and Z are keywords.

    Code:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class TelTypeTest {
        // Keywords allowed as type values
        private static final String teltypesarg = "HOME|WORK|PREF|MSG|CELL";
        // One parameter: type=X optionally followed by ,Y,Z...
        private static final String teltypeseq = "type=(" + teltypesarg + ")(,(" + teltypesarg + "))*";
        // One or more parameters separated by semicolons
        private static final String teltypefull = teltypeseq + "(;" + teltypeseq + ")*";
        static final Pattern teltypesPat = Pattern.compile(teltypefull, Pattern.CASE_INSENSITIVE);

        public static void main(String[] args) {
            String[] tests = {
                "type=CELL,pref:(301) 996-1054",
                "type=INTERNET;type=WORK;type=pref:jiabr@comcast.net",
                "type=CELL,pref,msg:(703) 304-8914",
            };
            System.out.println(teltypefull);
            for (String s : tests) {
                System.out.println(s);
                Matcher m = teltypesPat.matcher(s);
                if (m.find()) {
                    // Print every capture group; groups that did not take part print as null
                    for (int j = 1; j <= m.groupCount(); j++)
                        System.out.println("gc: " + j + " = " + m.group(j));
                }
            }
        }
    }
    It seems to work fine for the "type=X;type=Y" form.
    But the output doesn't show what I'd expect from a greedy match when the keywords come as a series separated by commas, such as:

    Code:
    type=CELL,pref,msg:(703) 304-8914
    gc: 1 = CELL
    gc: 2 = ,msg
    gc: 3 = msg
    gc: 4 = null
    gc: 5 = null
    gc: 6 = null
    gc: 7 = null
    Thanks
    pat
  2. #2
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    May 2007
    Posts
    765
    Rep Power
    929
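    If I'm not mistaken, Java's regex engine is Perl-style, much like PCRE. One limitation of such engines is that a repeated group like
    Code:
    ( pattern )*
    only captures the last iteration it matched. Your regex does match all of type=CELL,pref,msg; it is just that groups 2 and 3 end up holding only the final ,msg.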
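To make the last-iteration behaviour concrete, here is a minimal sketch using the keyword list from the first post. The class and method names (RepeatedGroupDemo, wholeMatch, lastRepeatedCapture) are made up for illustration, and the pattern is a trimmed-down version of the original, not the full one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Same keyword alternation as in the original post.
    static final String KEYWORDS = "HOME|WORK|PREF|MSG|CELL";

    // Group 2 sits inside a repeated group, so each iteration of the '*'
    // overwrites what the previous iteration captured.
    static final Pattern REPEATED = Pattern.compile(
        "type=(" + KEYWORDS + ")(?:,(" + KEYWORDS + "))*",
        Pattern.CASE_INSENSITIVE);

    // The full text matched by the pattern, or null if there is no match.
    static String wholeMatch(String input) {
        Matcher m = REPEATED.matcher(input);
        return m.find() ? m.group() : null;
    }

    // What group 2 holds after matching: only the last repetition.
    static String lastRepeatedCapture(String input) {
        Matcher m = REPEATED.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String s = "type=CELL,pref,msg:(703) 304-8914";
        System.out.println(wholeMatch(s));          // type=CELL,pref,msg
        System.out.println(lastRepeatedCapture(s)); // msg
    }
}
```

So the match itself is greedy and complete; it is only the capture group that forgets the earlier keywords.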

    You could try this instead for a comma-separated list:
    Code:
    ( pattern (?: , pattern )* )
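As a rough sketch of that idea in Java, assuming the same keyword list as the first post: one capturing group wraps the whole list, the inner groups are non-capturing, and the captured text is then split on commas. WholeListCapture and keywords are hypothetical names, not anything from the original program.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeListCapture {
    static final String KEYWORDS = "HOME|WORK|PREF|MSG|CELL";

    // Group 1 wraps the entire comma-separated list. The inner groups are
    // non-capturing, so the repetition has nothing to overwrite.
    static final Pattern TYPE_LIST = Pattern.compile(
        "type=((?:" + KEYWORDS + ")(?:,(?:" + KEYWORDS + "))*)",
        Pattern.CASE_INSENSITIVE);

    // Collects the keywords from every type= list found in the input.
    static List<String> keywords(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TYPE_LIST.matcher(input);
        while (m.find()) {
            result.addAll(Arrays.asList(m.group(1).split(",")));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(keywords("type=CELL,pref,msg:(703) 304-8914")); // [CELL, pref, msg]
    }
}
```

Looping on find() also picks up the semicolon form, since each type=... parameter is matched in turn. Parameters whose value is not in the keyword list are simply skipped.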
    Though if I were doing it, I would use String.split(): first on the semicolon, then on the equals sign, then on the comma.
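A minimal sketch of the split() route might look like the following. It assumes, going by the test strings in the first post, that the first colon separates the parameters from the value; that is my reading of the samples, not something taken from the RFC. SplitTypes and parseTypes are made-up names, and the sketch does not check the values against the keyword list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SplitTypes {
    // Returns the upper-cased type values found before the first ':'.
    static List<String> parseTypes(String line) {
        // Assumption: everything before the first colon is the parameter part.
        int colon = line.indexOf(':');
        String params = colon >= 0 ? line.substring(0, colon) : line;

        List<String> types = new ArrayList<>();
        for (String param : params.split(";")) {          // type=X;type=Y
            String[] nameValue = param.split("=", 2);     // name, value list
            if (nameValue.length == 2 && nameValue[0].trim().equalsIgnoreCase("type")) {
                for (String keyword : nameValue[1].split(",")) {   // X,Y,Z
                    types.add(keyword.trim().toUpperCase(Locale.ROOT));
                }
            }
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(parseTypes("type=CELL,pref,msg:(703) 304-8914"));
        // [CELL, PREF, MSG]
        System.out.println(parseTypes("type=INTERNET;type=WORK;type=pref:jiabr@comcast.net"));
        // [INTERNET, WORK, PREF]
    }
}
```

Unlike the regex, this accepts any value after type=, so a check against the allowed keywords would still have to be added if unknown values should be rejected.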

    Comments on this post

    • fishtoprecords agrees : thanks, I'll try that
