#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2013
    Posts
    2
    Rep Power
    0

    Parsing a string with groups defined by {}


    Hi all!

    I really need your help. I've been working on a regex for two days.
    I need to extract the values from a string like "property1 {value1} property2 {value2} ...". I don't know how a property's name and value are separated, how the different properties are separated, or how many properties the string contains (and I don't care).
    I just need to extract {value1}, {value2}, ..., {value n}.

    Here is an example string:
    "name:{andlio}, descriprion:{I'm a {young} man}, age:{28}"

    The extracted property values should be:
    • andlio
    • I'm a {young} man
    • 28


    Here is my regex, but it doesn't work:
    Code:
    ([^{]*\{)(.*)(\}[^}]*)
    Please help
  2. #2
  3. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,959
    Rep Power
    1014
    Hi,

    nested expressions cannot be parsed with regexes, unless you're using the pseudo-regexes of Perl, which support recursion.

    Contrary to popular belief, regexes are not an all-powerful parsing tool. They are in fact the least powerful kind of grammar. They work well for simple expressions like dates or telephone numbers, but they fail completely at nested structures like this.

    You either need to use a "real" parser with a more powerful grammar (this is probably overkill), or you need to do some handiwork in your application: search for the next opening brace in the string and set a counter to -1. In the substring that follows, search for braces, incrementing the counter for each closing brace and decrementing it for each opening brace. When the counter reaches 0, you've found the matching closing brace. Repeat the procedure for the rest of the string.
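
The counting procedure described above could be sketched in Python roughly as follows. This is only an illustration, not code from the thread: the function name `extract_brace_groups` is hypothetical, and it uses a depth counter that goes up on `{` and down on `}`, which is the mirror image of the sign convention in the post but has the same effect.

```python
from typing import List


def extract_brace_groups(text: str) -> List[str]:
    """Return the contents of every top-level {...} group in text.

    Hypothetical helper: nested braces are kept verbatim inside the
    group that contains them.
    """
    groups = []
    depth = 0    # number of currently open braces
    start = -1   # index just after the opening brace of the current group
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == "}":
            if depth == 0:
                raise ValueError(f"unbalanced closing brace at index {i}")
            depth -= 1
            if depth == 0:
                # Back at the top level: the group that began at start ends here.
                groups.append(text[start:i])
    if depth != 0:
        raise ValueError("unbalanced opening brace")
    return groups


sample = "name:{andlio}, descriprion:{I'm a {young} man}, age:{28}"
print(extract_brace_groups(sample))
# prints ['andlio', "I'm a {young} man", '28']
```

Raising `ValueError` on unbalanced input is an assumption made for the sketch; returning the groups found so far would be an equally reasonable choice.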
  4. #3
  5. Transforming Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,294
    Rep Power
    9400
    Perl/PCRE does recursion (a few ways) while .NET does balancing. Java doesn't do either.

    A workaround is to loop, removing every {...} whose contents contain no { or } (which guarantees that you only remove the innermost sets) until no more replacements can be made, then check whether any { or } remains.
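
For reference, that stripping loop might look like this in Python. It is a sketch under assumptions: the name `balanced_by_stripping` is made up for illustration, and Python's `re` module stands in for whichever regex engine is actually in use.

```python
import re

# Matches one innermost group: braces whose contents contain no braces.
INNERMOST = re.compile(r"\{[^{}]*\}")


def balanced_by_stripping(text: str) -> bool:
    """Repeatedly delete innermost {...} groups, then look for leftovers."""
    while True:
        text, replaced = INNERMOST.subn("", text)
        if replaced == 0:
            break
    # Any brace that survives had no partner.
    return "{" not in text and "}" not in text
```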

    But that's a bad solution. Instead use a simple string scanner that counts characters left to right: each { is +1 and each } is -1. The rules are (1) the count never goes below 0 (otherwise there's an unbalanced closing brace) and (2) the final count at the end of the string is 0 (otherwise there's an unbalanced opening brace).
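
A minimal Python sketch of that scanner, assuming the two rules exactly as stated (the function name `braces_balanced` is hypothetical):

```python
def braces_balanced(text: str) -> bool:
    """Check brace balance with a single left-to-right count."""
    count = 0
    for ch in text:
        if ch == "{":
            count += 1
        elif ch == "}":
            count -= 1
            if count < 0:
                # Rule 1 violated: a closing brace with no opener before it.
                return False
    # Rule 2: every opening brace must have been closed by the end.
    return count == 0
```

It makes one pass over the string and keeps only a single integer of state, which is why it is both simpler and faster than repeated regex replacement.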
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2013
    Posts
    2
    Rep Power
    0
    Hi all!

    First of all, many thanks for your answers.
    I tried to develop my own logic, but I think it's very complicated.

    I found a regular expression on the web: @[^{]+{(?:[^{}]|{[^{}]*})*} (blog[dot]stevenlevithan[dot]com/archives/regex-recursion)

    I rewrote it and obtained: \{(?:[^{}]|\{[^{}]*})*}
    I tested it on a website (I can't post the link, sorry) and it seems to work. However, it doesn't work if a nested bracket is not closed.
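
A quick check of the rewritten pattern in Python's `re` module (an assumption, since the thread does not say which engine was used) suggests where it stops working. The pattern only allows for one level of nesting, so deeper or unclosed input yields an inner group instead of the outer one:

```python
import re

# The rewritten pattern from the post, copied verbatim.
ONE_LEVEL = re.compile(r"\{(?:[^{}]|\{[^{}]*})*}")

sample = "name:{andlio}, descriprion:{I'm a {young} man}, age:{28}"
print(ONE_LEVEL.findall(sample))
# prints ['{andlio}', "{I'm a {young} man}", '{28}']

# Two levels of nesting: the outer group is skipped, an inner one is returned.
print(ONE_LEVEL.findall("{a {b {c}}}"))
# prints ['{b {c}}']

# Unclosed outer brace: only the complete inner group matches.
print(ONE_LEVEL.findall("{a {b c}"))
# prints ['{b c}']
```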
  8. #5
  9. Transforming Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,294
    Rep Power
    9400
    I've got bad news: I'm not too keen, and I don't think Jacques is either, on helping you towards a solution that we've both told you isn't a good idea.

    I mean, is there any particular reason you really want to use a regular expression? It's not as if a string scanner is at all difficult: what I described is very simple and fast.
