Thread: Simple question

    #1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2010
    Posts
    5
    Rep Power
    0

    Simple question


    hi,
    i have these strings:
    STRING:TEXT
    STRING:\:->
    etc

    i want to split them to {STRING,TEXT} and {STRING,:->}

    i tried many options but i was sure this would work:
    [^\\]:

    it removes the G from string and keeps the \.
    i am using Java
    String[] strArr = str.split("[^\\]:");

    BTW i can choose any delimiter i want before the :-> it could be \\ or xx or anything i choose for it to be.

    any solutions?
    Thanks
  2. #2
  3. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2010
    Location
    Norcross, GA
    Posts
    8
    Rep Power
    0
    Try this:

    Code:
    (?<!\\):
    The (?<!...) part is called a zero-width negative lookbehind assertion, which is a fancy term for "check that something is NOT to the left (in this case, a backslash), but don't actually consume it, just look". That's why your G is being removed: [^\\] has to match one real character before the colon, so that character becomes part of the delimiter that split throws away.
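    As a rough illustration of the difference, here is a minimal Java sketch. The class and method names are made up for this example, and it assumes the backslash is the escape character. Note that inside a Java string literal every backslash of the regex has to be written twice.

```java
import java.util.Arrays;

// Hypothetical demo class, not from the original post.
class LookbehindSplit {
    // Regex seen by the engine: (?<!\\):
    // i.e. a colon that is NOT preceded by a backslash.
    static final String UNESCAPED_COLON = "(?<!\\\\):";

    static String[] splitOnUnescapedColon(String input) {
        return input.split(UNESCAPED_COLON);
    }

    public static void main(String[] args) {
        // Lookbehind only looks at the previous character, so nothing is lost.
        System.out.println(Arrays.toString(splitOnUnescapedColon("STRING:TEXT")));   // [STRING, TEXT]
        System.out.println(Arrays.toString(splitOnUnescapedColon("STRING:\\:->")));  // [STRING, \:->]

        // The character class consumes the G, so it disappears with the delimiter.
        System.out.println(Arrays.toString("STRING:TEXT".split("[^\\\\]:")));        // [STRIN, TEXT]
    }
}
```

    The backslash in front of the second colon is still in the result, so it would have to be removed in a separate step.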

    Comments on this post

    • ManiacDan agrees
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2010
    Posts
    5
    Rep Power
    0

    thanks, still not working


    Thanks i did try it and it doesn't work
    i tried a few others and they didn't work either
    String[] strArr = str.split("(?<!\\):");

    is the ! and the ^ the same?
    is the syntax correct for java?

    when i try what you sent me i get (note how it ignores the second \):
    java.util.regex.PatternSyntaxException: Unclosed group near index 7
    (?<!\):
    ^
    at java.util.regex.Pattern.error(Unknown Source)
    at java.util.regex.Pattern.accept(Unknown Source)
    at java.util.regex.Pattern.group0(Unknown Source)
    at java.util.regex.Pattern.sequence(Unknown Source)
    at java.util.regex.Pattern.expr(Unknown Source)
    at java.util.regex.Pattern.compile(Unknown Source)
    at java.util.regex.Pattern.<init>(Unknown Source)
    at java.util.regex.Pattern.compile(Unknown Source)
    at java.lang.String.split(Unknown Source)
    at java.lang.String.split(Unknown Source)
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2010
    Location
    Norcross, GA
    Posts
    8
    Rep Power
    0
    Hm, my gut tells me that this is more of a string error than a regex error. The error message you pasted shows only a single "\", which means the Java string literal "\\" reached the regex engine as one backslash. That single backslash escapes the ")", which explains why the regex engine thinks that the group is not closed.

    It has been many years since I've touched Java, so I'm afraid I can't offer much advice in that area. Try double escaping the "\" by typing in "\\\\" and see what happens. I know, that looks terribly ugly.

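    To answer your other questions, no, "!" and "^" are not really equivalent. Inside a character class, "^" negates the class, but the class still has to match (and consume) one character. In "(?<!...)", the "!" makes the lookbehind negative, and a lookbehind never consumes anything.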
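    To make the two levels of escaping visible, here is a small sketch (the class and helper names are invented for this example). It only checks whether each pattern compiles and prints what the regex engine actually receives.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not from the original thread.
class BackslashLevels {
    // Returns true if the regex engine accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Two backslashes in source -> one backslash in the regex.
        // The engine sees (?<!\): where \) is a literal parenthesis, so the group never closes.
        String twoInSource = "(?<!\\):";

        // Four backslashes in source -> two in the regex -> one literal backslash to match.
        String fourInSource = "(?<!\\\\):";

        System.out.println(twoInSource + " compiles: " + compiles(twoInSource));
        System.out.println(fourInSource + " compiles: " + compiles(fourInSource));
    }
}
```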

    Comments on this post

    • ishnid agrees : Yup. Double the backslashes for a Java String
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2010
    Posts
    5
    Rep Power
    0

    ok, almost there :->


    i removed the \\ . it was too confusing for now.
    i am using x as a delimiter
    "STRING:x:->"

    what i want is {STRING,:->}
    what i am getting is {STRING,x:->}

    with this code:
    String[] strArr = str.split("(?<!x):");
    how do i "capture" the delimiter as well (in this case the 'x')?
    Thanks again for your help
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2010
    Location
    Norcross, GA
    Posts
    8
    Rep Power
    0
    Well, not capturing the "\" was the only tricky part. If you no longer care about it, then the regex to split on ":x" becomes:

    Code:
    :x
    You don't even need regex anymore.
  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2010
    Posts
    5
    Rep Power
    0

    ok, correction


    ok, i didn't make myself clear.
    i need for this string
    "STRING:\:->"
    to get
    {STRING,:->}

    and for STRING:\hello
    to get {STRING,\hello}


    so... i need to remove the \: if it exists, or the : in the rest of the cases. that is why i need regex.
    Thanks again for all your help
  14. #8
  15. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2010
    Location
    Norcross, GA
    Posts
    8
    Rep Power
    0
    The original pattern I gave is pretty much the solution for splitting your input into pairs. After that is done, then you could do a string replace on the second value of each pair to replace all "\:" with ":".

    Or, if the input you're trying to parse always starts with "STRING:", then you probably don't even need regex since the part you want to parse away is static. For example, you could just do a string split (not regex split) with "STRING:".
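    The split-then-replace idea could look roughly like this in Java. This is only a sketch: the class and method names are invented, and it assumes that everything after the first field is a value that may contain "\:".

```java
// Hypothetical helper class, not from the original thread.
class SplitThenUnescape {
    static String[] parse(String input) {
        // Step 1: split on colons that are not preceded by a backslash.
        String[] parts = input.split("(?<!\\\\):");

        // Step 2: in the value part(s), turn every "\:" back into ":".
        // String.replace works on literal text, so only one level of escaping is needed.
        for (int i = 1; i < parts.length; i++) {
            parts[i] = parts[i].replace("\\:", ":");
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", parse("STRING:\\:->")));    // STRING | :->
        System.out.println(String.join(" | ", parse("STRING:\\hello")));  // STRING | \hello
    }
}
```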
  16. #9
  17. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2010
    Posts
    5
    Rep Power
    0

    thanks


    Ok, thanks lonekorean, i will try and see how to resolve this.
    maybe the best thing is to remove the first char after the split if it is '\'. i thought there would be a cleaner solution...but i guess not
    Thanks for all your help
  18. #10
  19. kill 9, $$;
    Devshed Supreme Being (6500+ posts)

    Join Date
    Sep 2001
    Location
    Shanghai, An tSín
    Posts
    6,898
    Rep Power
    3887
    Originally Posted by regex
    Ok, thanks lonekorean, i will try and see how to resolve this.
    maybe the best thing is to remove the first char after the split if it is '\'. i thought there would be a cleaner solution...but i guess not
    Thanks for all your help
    Just the first character?

    Generally, this sort of situation arises when using a backslash to 'escape' another character (in this case to differentiate between the colon character that is a delimiter and the one that's actually part of the data). Are you absolutely guaranteed that the only colons in the data that need escaping are going to be at the very start of the segment?

    If not, then you need to unescape all the colon characters after you've done your split. On first instinct, that might suggest that you just remove all backslash characters. However, there might be backslash characters in the data too, which would themselves be escaped (i.e. represented by \\) to differentiate themselves from the backslashes that are designed to escape things. So basically you need to remove every backslash that is acting as an escape, and keep any backslash that was itself escaped by the one before it.
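    One possible way to do that unescaping in Java is sketched below. It assumes a convention where a backslash escapes whatever single character follows it. The class and method names are invented for the example.

```java
// Hypothetical helper class, not from the original thread.
class Unescape {
    // Regex seen by the engine: (?s)\\(.)
    // A backslash followed by any one character; the character is kept, the backslash is dropped.
    // The scan runs left to right, so in "\\" the first backslash is the escape and the second is data.
    static String unescape(String field) {
        return field.replaceAll("(?s)\\\\(.)", "$1");
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\:->"));      // :->
        System.out.println(unescape("a\\\\b"));     // a\b
        System.out.println(unescape("\\\\\\:"));    // \:
    }
}
```

    This only covers the unescaping step. A split pattern that looks at just one character to the left would still treat a colon after an escaped backslash (\\:) as escaped, so the split itself would also need more care if that case can occur in the data.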
