#1

    Help Deciphering Regex


    I recently came across a piece of unfamiliar code.

    It is using the following regular expression:

    Code:
    Regex.Split(value, @"\n(?=(?:[^""]*""[^""]*"")*(?![^""]*""))");
    It looks to me like it is splitting on the newline character \n, but after that I am totally lost.

    Can someone help me decipher this regex?

    Any help would be greatly appreciated.
#2
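    Removing the C# string syntax (inside a verbatim @"..." string, each "" stands for one literal "), the pattern itself is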
    Code:
    \n(?=(?:[^"]*"[^"]*")*(?![^"]*"))
    At the top level it has the shape
    Code:
    \n(?=...)
    - (?=...) is a positive lookahead assertion. It reads something like "at this point ... must follow, but don't count it towards how far into the string the expression matches".
    - The opposite is (?!...), a negative lookahead assertion: "at this point ... must not follow".
    - (?:...) is a regular set of parentheses, except that it doesn't capture.
    - [^...] is a negated character class: any single character not listed in the set.
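To make the four constructs above concrete, here is a minimal sketch. It is written in Python rather than the C# of the original call, purely as a convenience; to my knowledge these constructs behave the same way in both engines. The sample string and variable names are made up for illustration.

```python
import re

text = 'key=value'  # hypothetical sample input

# (?=...) positive lookahead: "key" must be followed by "=",
# but the "=" is not part of the match.
lookahead = re.match(r'key(?==)', text)
print(lookahead.group(0), lookahead.end())

# (?!...) negative lookahead: "key" must NOT be followed by ":".
not_colon = re.match(r'key(?!:)', text)
print(not_colon is not None)

# (?:...) groups without capturing, so only (\w+) produces a group.
grouped = re.match(r'(?:key)=(\w+)', text)
print(grouped.groups())

# [^"]* matches a run of characters that are not double quotes.
non_quotes = re.match(r'[^"]*', 'abc"def')
print(non_quotes.group(0))
```

The lookahead match ends right after "key" even though the "=" had to be present, which is the "don't count it" behaviour described in the first bullet.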

    So
    Code:
    \n         # newline,
    (?=        # and after it must be
      (?:      #   this uncaptured group consisting of (
        [^"]*  #     any number of non-quotes
        "      #     followed by a quote
        [^"]*  #     followed by any number of non-quotes
        "      #     followed by another quote
      )*       #   ), all repeated zero or more times,
      (?!      #   which is not followed by (
        [^"]*  #     any number of non-quotes
        "      #     and a quote
      )        #   )
    )          # )
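Taken together, the lookahead only succeeds when every quote after the newline can be consumed in pairs with none left over. In other words, an even number of quotes must remain, which (for input with balanced quotes) means the newline is not inside a quoted string. The sketch below tries this out in Python rather than C#; the sample data and the name split_records are assumptions made for illustration, not part of the original code.

```python
import re

# Same pattern as in the thread, without the C# string escaping.
PATTERN = r'\n(?=(?:[^"]*"[^"]*")*(?![^"]*"))'

def split_records(value):
    """Split on newlines that are followed by an even number of quotes."""
    return re.split(PATTERN, value)

# Hypothetical CSV-like input: the second record has a newline inside quotes.
sample = 'id,note\n1,"first line\nsecond line"\n2,plain'

records = split_records(sample)
for record in records:
    print(repr(record))
```

With this input the text splits into three records, and the newline between "first line" and "second line" is left alone because an odd number of quotes follows it.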
    Last edited by requinix; September 3rd, 2013 at 05:45 PM. Reason: fixed as per Laurent
#3
    Code:
    [^"]*"[^"]*"
    Code:
        [^"]  #     not-a-quote
        *"    #     followed by any number of quotes
        [^"]  #     followed by another not-a-quote
        *"    #     followed by any number of quotes
    Hmm, I think this is an error. I would rather say:

    Code:
        [^"]* #     any number of non-quotes
        "     #     followed by a quote
        [^"]* #     followed by any number of non-quotes
        "     #     followed by a quote
    The same applies to the final negative lookahead assertion (?![^"]*"): the position must not be followed by any number of non-quotes and then one quote.
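The difference between the two readings can be checked directly. This is a small sketch in Python (an assumption of convenience; the quantifier binding is the same in .NET), with a made-up sample string. The "misread" pattern is only a hypothetical spelling of the mistaken reading, where the * would attach to the quote instead of the character class.

```python
import re

sample = 'say "hello" now'  # hypothetical sample input

# Correct reading: * belongs to the character class in front of it.
pair = re.match(r'[^"]*"[^"]*"', sample)
print(pair.group(0))

# Mistaken reading, written out as its own pattern:
# one non-quote, any number of quotes, twice.
misread = re.match(r'[^"]"*[^"]"*', sample)
print(misread.group(0))

# The trailing (?![^"]*") succeeds only where no quote follows at all.
no_quote_ahead = re.match(r'(?![^"]*")', 'no quotes here')
quote_ahead = re.match(r'(?![^"]*")', 'one " left')
print(no_quote_ahead is not None, quote_ahead is not None)
```

The correct reading consumes the text up to and including the closing quote, while the misread pattern stops after two characters.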

    Comments on this post

    • requinix agrees
#4
    Wow, I really got that wrong didn't I? Such a silly mistake...

    You are entirely correct.
    Last edited by requinix; September 3rd, 2013 at 05:46 PM.
