#1
  1. Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Sep 2007
    Location
    outside Washington DC
    Posts
    2,642
    Rep Power
    3699

    Groups driving me crazy


    I thought I understood groups in regex, but I'm trying to do something that I expect to be trivial, and I'm not getting it.

    Code:
        Pattern valsPattern = Pattern.compile("^((.*)[\\s,]*)*");
        String vals =  "Fee, Fie, Foe";
        Matcher m = valsPattern.matcher(vals);
        if (m.find()) {
            System.out.println(m.groupCount());
            for (int i = 0; i <= m.groupCount(); i++) {
                System.out.printf("%d %s\n", i, m.group(i));
            }
        } else {
            System.out.println(m.toString());
        }
    I just want to pick out the keywords in the input string, and to recognize but otherwise ignore any commas or whitespace.

    I've tried tons of changes to the Pattern, starting with
    Pattern.compile("^((\\w*)[\\s,]*)*");

    What am I missing?
    Thanks
  2. #2
  3. Did you steal it?
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,068
    Rep Power
    9398
    Normally each call to Matcher.find() returns a single match of the pattern, so you'd only get one keyword out of it. To get all of them, you need a loop around Matcher.find().

    Also, your expression won't really do what you want. Try
    Code:
    (^|,)\s*([^,]+)(?=,|$)
    The keyword is captured in group 2. There may be whitespace after each keyword, so trim the captured text if that matters.
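To make the loop suggestion concrete, here is a minimal sketch that applies the expression above with repeated calls to Matcher.find(). The class name KeywordFinder and the helper keywords are made up for illustration; only the pattern comes from the post. It assumes the keyword is in group 2 and trims it to drop any trailing whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordFinder {
    // Pattern from the post above; group 2 holds the keyword text.
    private static final Pattern KEYWORD =
            Pattern.compile("(^|,)\\s*([^,]+)(?=,|$)");

    // Hypothetical helper: collects one keyword per successful find().
    static List<String> keywords(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = KEYWORD.matcher(input);
        while (m.find()) {
            // [^,]+ can swallow whitespace before the next comma, so trim it.
            result.add(m.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(keywords("Fee, Fie, Foe")); // prints [Fee, Fie, Foe]
    }
}
```

Each pass through the loop resumes where the previous match ended, which is why a single if (m.find()) only ever yields the first keyword.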

    Comments on this post

    • fishtoprecords agrees
  4. #3
  5. kill 9, $$;
    Devshed Supreme Being (6500+ posts)

    Join Date
    Sep 2001
    Location
    Shanghai, An tSín
    Posts
    6,897
    Rep Power
    3886
    That pattern has two capturing groups, so groupCount() will always return 2. This holds even though the second group may match several different things during the course of the entire match: each time it matches, the previously captured string is replaced. A group refers to a set of parentheses in your pattern, not to each occasion on which that set of parentheses matches some text.

    Your second group is empty at the end because you've used .*, which matches an empty string the last time it matches.
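The "one group per pair of parentheses" point can be seen with a small sketch. This is a variation on the original pattern, not the poster's code: it uses \w+ instead of .* and a non-capturing outer group, so there is exactly one capturing group and its final capture is not empty. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // One capturing group, (\w+), repeated inside a non-capturing group.
    private static final Pattern PATTERN =
            Pattern.compile("^(?:(\\w+)[\\s,]*)+");

    // groupCount() depends only on the pattern, not on the input.
    static int groupCount() {
        return PATTERN.matcher("").groupCount();
    }

    // Returns what group 1 holds after the whole match, or null if no match.
    static String lastCapture(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(groupCount());                 // prints 1
        System.out.println(lastCapture("Fee, Fie, Foe")); // prints Foe
    }
}
```

The group matched three times, but only the last capture survives, so a single match cannot hand back all three keywords.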

    With the specific example you've posted, I'd use the String.split method, but I'm assuming this is just an example. Here's a suggestion: explicitly match the pattern as many times as it can, capturing a new group each time. e.g.
    Code:
        Pattern valsPattern = Pattern.compile("(\\w+)");
        String vals =  "Fee, Fie, Foe";
        Matcher m = valsPattern.matcher(vals);
        while (m.find()) {
            System.out.println("Groups: " + m.groupCount());
            for (int i = 0; i <= m.groupCount(); i++) {
                System.out.printf("%d %s\n", i, m.group(i));
            }
        }
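For the String.split alternative mentioned above, a minimal sketch could look like the following. The delimiter expression [\s,]+ is an assumption based on the separators in the original question, and the class and method names are made up. Note that an input starting with a comma would produce an empty first element, and an empty input yields a single empty string.

```java
import java.util.Arrays;

public class SplitKeywords {
    // Splits on any run of whitespace and/or commas.
    // trim() avoids an empty first element when the input has leading whitespace.
    static String[] keywords(String input) {
        return input.trim().split("[\\s,]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(keywords("Fee, Fie, Foe"))); // prints [Fee, Fie, Foe]
    }
}
```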

    Comments on this post

    • fishtoprecords agrees
    • Matt1776 agrees : I was also thinking string split - but could be an example :)
  6. #4
  7. Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Sep 2007
    Location
    outside Washington DC
    Posts
    2,642
    Rep Power
    3699
    Thanks
