#1
  1. Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Sep 2007
    Location
    outside Washington DC
    Posts
    2,642
    Rep Power
    3700

    Grouping has me stuck


    I'm using Java's regex libraries, and I can't get grouping to work. Or at least I can't get it to work the way I want.

    What I want is to match from the beginning of the string up to, but not including, any number of trailing semicolon characters. I expected that grouping would let the first group be the characters I want.
    But no.

    Here is a code snippet:
    Code:
    static final Pattern pat = Pattern.compile("^(.*?);*$");
    
    private static final String[] list = {
        "abc;",
        "N:Berger;Gary;;;",
        "EMAIL;type=INTERNET;type=pref:halberman@alum.mit.edu"
    };
    
    private void bar(String arg) {
        Matcher m = pat.matcher(arg);
        int count = 0;
        while(m.find()) {
            count++;
            System.out.println("Match number "+count);
            System.out.println("start(): "+m.start());
            System.out.println("end(): "+m.end());
            System.out.println(arg.substring(m.start(), m.end()));
            for (int i = 0; i < m.groupCount(); i++) {
                System.out.println(m.group(i));
            }
        }
    }
    Any pointers greatly appreciated.
  2. #2
    Hang your freedom higher.
    Devshed Novice (500 - 999 posts)

    Join Date
    Jan 2005
    Posts
    659
    Rep Power
    158
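    Hi, there are 2 little things to look at in what you're doing.

    1. Your regular expression ("^(.*?);*$") says: match the smallest string that is followed by zero or more semicolons running to the end of the input. That already leaves everything before the trailing semicolons in group 1.

    If you only want the text before the first semicolon, use "^(.*?);.*$" instead: match the smallest string that is followed by a semicolon and then zero or more of any character.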

    2. When you actually find a match, your group count will be 1 because you only have 1 set of parentheses, so your loop ...

    Code:
        for (int i = 0; i < m.groupCount(); i++) 
        {
            System.out.println(m.group(i));
        }
    ...will never show m.group(1). You need to change it to ...

    Code:
        for (int i = 0; i <= m.groupCount(); i++) 
        {
            System.out.println(m.group(i));
        }
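    Note: group(0) is the entire match rather than a capturing group, so groupCount() does not count it. Here it equals the whole string only because the pattern is anchored with ^ and $.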
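
    To make the loop bound concrete, here is a minimal sketch that collects every group of the first match using the inclusive bound. The class name GroupLoopDemo and the helper allGroups are made up for illustration; only the pattern and the <= comparison come from this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern and the loop bound come from the thread.
class GroupLoopDemo {
    static final Pattern PAT = Pattern.compile("^(.*?);*$");

    // Returns group(0) through group(groupCount()) for the first match,
    // or an empty list when the pattern does not match.
    static List<String> allGroups(String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = PAT.matcher(input);
        if (m.find()) {
            // groupCount() excludes group 0, so the bound has to be inclusive
            // to reach the last capturing group.
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(allGroups("N:Berger;Gary;;;"));
    }
}
```

    Run on "N:Berger;Gary;;;", this should print [N:Berger;Gary;;;, N:Berger;Gary]: first the whole match, then group 1 without the trailing semicolons.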

    Last edited by atlantisstorm; February 21st, 2009 at 11:08 AM.
    "Badges? We ain't got no badges. We don't need no badges! I don't have to show you any stinkin' badges!!"
  4. #3
    Thanks, it was the missing <= that threw me.
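
    For anyone landing here later, a rough sketch comparing group 1 for the two patterns mentioned in this thread, run against the sample strings from the first post. The class name PatternCompareDemo and the helper firstGroup are illustrative names, not part of anyone's original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for comparing the two patterns from the thread.
class PatternCompareDemo {
    // Pattern from the first post: lazy prefix, then trailing semicolons only.
    static final Pattern TRAILING = Pattern.compile("^(.*?);*$");
    // Pattern from the second post: lazy prefix, a semicolon, then anything.
    static final Pattern FIRST_SEMI = Pattern.compile("^(.*?);.*$");

    // Returns group 1 of the first match, or null when there is no match.
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc;",
            "N:Berger;Gary;;;",
            "EMAIL;type=INTERNET;type=pref:halberman@alum.mit.edu"
        };
        for (String s : samples) {
            System.out.println(firstGroup(TRAILING, s) + " | " + firstGroup(FIRST_SEMI, s));
        }
    }
}
```

    The first pattern keeps interior semicolons and drops only the trailing run, so "N:Berger;Gary;;;" gives "N:Berger;Gary". The second stops at the first semicolon and gives "N:Berger", and it does not match at all when the input has no semicolon.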
