#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    4
    Rep Power
    0

    Regex "digit-digit"


    I am trying to create a regular expression that will parse digit input such as: digit1 - digit2 OR digit1 - digit2,digit3 - digit4,digit5 - digit6,digit7 - digit8......
    With examples:

    ex1. 1-3
    ex2. 1-3,5-7
    ex3. 1-3,5-7,10-15
    ex4. 1-3,7-10,12-15,19-25
    ...

    Please note that the last character should not be "," and a range should not contain two or more "-", as in 1-5-10!

    If anyone can help me with JavaScript regex, thanks in advance!
  2. #2
  3. Come play with me!
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,743
    Rep Power
    9397
    \d is a digit and (...)* will repeat whatever is inside there as many times as possible, possibly not at all. Also ^ marks the beginning of the string and $ marks the end.

    Hint: the expression will have the number stuff repeated twice.
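
    A minimal sketch of the building blocks mentioned above, each tried on its own in JavaScript. The variable names and sample strings are only illustrative, and this is deliberately not the finished expression.

```javascript
// Illustrative names; each regex isolates one building block from the hint.
const oneDigit = /\d/;       // \d matches a single digit
const repeated = /^(ab)*$/;  // (...)* repeats the group zero or more times
const anchored = /^\d$/;     // ^ and $ pin the match to the whole string

console.log(oneDigit.test("a7b"));    // true: a digit occurs somewhere
console.log(repeated.test(""));       // true: zero repetitions are allowed
console.log(repeated.test("ababab")); // true: three repetitions
console.log(anchored.test("7"));      // true
console.log(anchored.test("a7b"));    // false: extra characters around the digit
```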
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    4
    Rep Power
    0
    Originally Posted by requinix
    \d is a digit and (...)* will repeat whatever is inside there as many times as possible, possibly not at all. Also ^ marks the beginning of the string and $ marks the end.

    Hint: the expression will have the number stuff repeated twice.
    Thank you. I read that in regex tutorials, but I don't know how to write the whole regex string!
    Note that with "..." I mean etc. It is not part of the input.
  6. #4
  7. Come play with me!
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,743
    Rep Power
    9397
    And note that with "..." I mean whatever you want to put in there. It is not part of the regex.

    You should have all the parts you need to construct the expression. Give it a first shot and we'll take it from there.
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    776
    Rep Power
    495
    It is not clear to me whether you know which digits you want to match or whether you want to match any digit and "-" combination.

    I.e., for example #1 (1-3): do you want to match 1, 2, or 3, or do you want to match any digit range? In the first case, I would use a character class such as [1-3] or [123]; in the second case I would use something like \d\-\d.

    Please be more specific.
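
    To make the two readings concrete, here is a small JavaScript sketch. The ^ and $ anchors and the variable names are my own additions for the demonstration, and the dash is written without a backslash because it needs no escape outside a character class.

```javascript
// Reading 1: a character class, i.e. ONE character that is 1, 2, or 3.
const classOneToThree = /^[1-3]$/;
// Reading 2: any digit, a literal dash, any digit.
const digitDashDigit = /^\d-\d$/;

console.log(classOneToThree.test("2"));   // true
console.log(classOneToThree.test("1-3")); // false: three characters, not one
console.log(digitDashDigit.test("1-3"));  // true
console.log(digitDashDigit.test("4-9"));  // true: the digits are arbitrary
```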
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    4
    Rep Power
    0
    Any digit, something like
    X-Y
    X-Y,K-M

    where X,Y,K,M are any possible digit
  12. #7
  13. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    776
    Rep Power
    495
    You could start with something like this:

    Code:
    ((\d-\d,?)|\d,?)+
    which should match (a digit, a dash, a digit and an optional comma) or (a digit and an optional comma), the whole thing at least once or repeated any number of times.

    However, I do not know enough about your data; the above expression may match things that you do not want to match (for example, it will match a single number in your data).

    If you don't want this to happen, and if a single digit (without interval) cannot happen, then it might be better to try this:

    Code:
    (\d-\d,)*(\d-\d)
    which will match (a digit, a dash, a digit and a comma) zero or more times, followed by (a digit, a dash and a digit).

    There may be further refinements (for example, can there be spaces between two intervals?), but it all depends on your real data, and we don't know enough about that.
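
    A quick way to see how the two expressions above differ is to try them in JavaScript. This is only a sketch; the variable names and sample strings are made up for the test, and both patterns are left unanchored as posted.

```javascript
const first = /((\d-\d,?)|\d,?)+/;   // also accepts a lone digit
const second = /(\d-\d,)*(\d-\d)/;   // needs at least one digit-dash-digit

console.log(first.test("7"));         // true: a single digit is enough
console.log(second.test("7"));        // false: no dash, so no interval
console.log(second.test("1-3,5-7"));  // true

// Without anchors the pattern only has to match somewhere in the string.
console.log(second.test("1-5-10"));      // true: "1-5" is found inside
console.log(second.exec("10-15")[0]);    // "0-1": \d is a single digit
```

    The last two lines suggest why multi-digit numbers and whole-string matching need to be decided before the expression can be finalized.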
  14. #8
  15. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,910
    Rep Power
    1045
    Hi,

    yeah, your description is pretty confusing. You're talking about single digits all the time, but your example includes numbers with multiple digits.

    So I guess what you actually want is

    Code:
    /^\d+-\d+(?:,\d+-\d+)*$/
    If you also want to rule out leading zeros (like in 01), that would be

    Code:
    /^(?:[1-9]\d*|0)-(?:[1-9]\d*|0)(?:,(?:[1-9]\d*|0)-(?:[1-9]\d*|0))*$/
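
    For what it's worth, here is a sketch that runs both expressions against the examples from the first post. The names ranges, noLeadingZeros, num, range and composed are illustrative; the composed variant only shows one way to build the longer pattern from a repeated fragment so that it stays readable.

```javascript
const ranges = /^\d+-\d+(?:,\d+-\d+)*$/;
const noLeadingZeros =
  /^(?:[1-9]\d*|0)-(?:[1-9]\d*|0)(?:,(?:[1-9]\d*|0)-(?:[1-9]\d*|0))*$/;

// Same pattern as noLeadingZeros, assembled from a reusable fragment.
const num = "(?:[1-9]\\d*|0)";          // a number without leading zeros
const range = `${num}-${num}`;          // number, dash, number
const composed = new RegExp(`^${range}(?:,${range})*$`);

const examples = ["1-3", "1-3,5-7", "1-3,5-7,10-15", "1-3,7-10,12-15,19-25"];
console.log(examples.every((s) => ranges.test(s)));  // true

console.log(ranges.test("1-3,"));          // false: trailing comma
console.log(ranges.test("1-5-10"));        // false: two dashes in one entry
console.log(ranges.test("01-3"));          // true: leading zero allowed here
console.log(noLeadingZeros.test("01-3"));  // false
console.log(composed.test("0-3,10-15"));   // true
```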
  16. #9
  17. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    776
    Rep Power
    495
    Jacques, the original requirement did not say that the string should contain only these patterns, so I think the string start and end anchors (^ and $) should probably be taken away (unless, of course, the OP clarifies this point differently).

    I did not pay attention to the examples with 2-digit numbers. If this can happen, I would change my suggestion to:

    Code:
    (\d+-\d+,)*(\d+-\d+)
    But again, regex is often a delicate balance between matching everything that needs to be matched and not matching what should not be matched. For this, a very precise description of the data is needed, and we are very far from that.
  18. #10
  19. Devshed Beginner (1000 - 1499 posts)

    Join Date
    Jan 2004
    Location
    New Springfield, OH
    Posts
    1,173
    Rep Power
    1469
    Is this for parsing page ranges? If so, these work.
    Code:
    ^(\s*\d+\s*\-\s*\d+\s*,?|\s*\d+\s*,?)+$          (allows spaces)
    
    ^(\d+-\d+,?|\d+,?)+$                             (does not allow spaces)
    
    ^(\d+|\d+-\d+)(,?=(\d+|\d+-\d+))*$               (a different approach)
    As mentioned, you could remove the ^ and $ if these patterns should match inside other strings. If you're using this for some form of validation, they should be left in so that a match doesn't allow any extra information.
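
    A rough JavaScript check of the three patterns above, with made-up variable names and sample inputs. Two things are worth noticing in the output: the optional comma (,?) also accepts a trailing comma and lone numbers, and in the third pattern ,?= reads as an optional comma followed by a literal "=" (a lookahead would be spelled (?=...)).

```javascript
const withSpaces = /^(\s*\d+\s*\-\s*\d+\s*,?|\s*\d+\s*,?)+$/;
const noSpaces = /^(\d+-\d+,?|\d+,?)+$/;
const third = /^(\d+|\d+-\d+)(,?=(\d+|\d+-\d+))*$/;

console.log(withSpaces.test("1 - 3, 5 - 7")); // true
console.log(noSpaces.test("1-3,5-7"));        // true
console.log(noSpaces.test("1-3 , 5-7"));      // false: spaces not allowed
console.log(noSpaces.test("5"));              // true: lone numbers pass
console.log(noSpaces.test("1-3,"));           // true: trailing comma passes
console.log(noSpaces.test("1-5-10"));         // false

console.log(third.test("1-3,5-7"));           // false: "=" is required
console.log(third.test("1-3=5-7"));           // true
```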
    Last edited by Nilpo; November 9th, 2012 at 10:28 AM.
  20. #11
  21. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,910
    Rep Power
    1045
    Originally Posted by Laurent_R
    Jacques, the original requirement did not say that the string should contain only these patterns, so I think the string start and end anchors (^ and $) should probably be taken away (unless, of course, the OP clarifies this point differently).
    A substring match obviously would make no sense in this case. Since he doesn't extract anything, it would be the same as simply checking for \d+-\d+. The optional stuff after that makes no difference (unless he's specifically looking for index information or something).

    The whole task only makes sense if he wants to check if a complete string matches this pattern.
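
    A small sketch of that point in JavaScript (sample strings and variable names are made up): as a yes/no test, the unanchored list pattern and the bare \d+-\d+ give the same answer on every input, including the inputs that were supposed to be rejected.

```javascript
const unanchoredList = /\d+-\d+(?:,\d+-\d+)*/;
const singleRange = /\d+-\d+/;

const samples = ["1-3", "1-3,5-7", "1-3,", "1-5-10", "abc 4-9 xyz", "no ranges", "7"];

// For each sample, record whether both patterns agree on match / no match.
const agreement = samples.map(
  (s) => unanchoredList.test(s) === singleRange.test(s)
);

for (const s of samples) {
  console.log(JSON.stringify(s), unanchoredList.test(s), singleRange.test(s));
}
console.log(agreement.every(Boolean)); // true: the optional tail changes nothing
```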



    Originally Posted by Laurent_R
    I did not pay attention to the examples with 2-digit numbers. If this can happen, I would change my suggestion to:

    Code:
    (\d+-\d+,)*(\d+-\d+)
    Putting the comma in the first pattern will force the regex parser to backtrack at the last entry. So it's better to make the first part mandatory and put the comma in the last part (which also makes more sense when you read it).

    ------------


    Originally Posted by Nilpo
    Code:
    ^(\s*\d+\s*\-\s*\d+\s*,?|\s*\d+\s*,?)+$          (allows spaces)
    
    ^(\d+-\d+,?|\d+,?)+$                             (does not allow spaces)
    
    ^(\d+|\d+-\d+)(,?=(\d+|\d+-\d+))*$               (a different approach)
    He specifically ruled out lists ending with a comma, so that won't work.
  22. #12
  23. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    776
    Rep Power
    495
    Originally Posted by Jacques1
    A substring match obviously would make no sense in this case. Since he doesn't extract anything, it would be the same as simply checking for \d+-\d+. The optional stuff after that makes no difference (unless he's specifically looking for index information or something).
    We just don't know whether the OP wants to extract something or not. I used capturing parens because I suspected the aim was to capture the ranges, you used non capturing parens because you suspected something different. That's the point: the description of the requirement is far too vague. Therefore, we can only give some tips, but not figure out a complete solution.
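
    On the capturing side, one caveat is worth a quick sketch, assuming JavaScript: a capturing group under a * keeps only its last repetition, so the parentheses do not hand back every range. If extraction were the goal, a global match over a single range is one possible approach; the names m and pairs are illustrative.

```javascript
const input = "1-3,5-7,10-15";

// The repeated group (group 1) retains only its final iteration.
const m = /(\d+-\d+,)*(\d+-\d+)/.exec(input);
console.log(m[0]); // "1-3,5-7,10-15"
console.log(m[1]); // "5-7,"
console.log(m[2]); // "10-15"

// One way to collect every range: match a single range globally.
const pairs = Array.from(
  input.matchAll(/(\d+)-(\d+)/g),
  ([, from, to]) => [Number(from), Number(to)]
);
console.log(pairs); // [[1, 3], [5, 7], [10, 15]]
```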

    Originally Posted by Jacques1
    Putting the comma in the first pattern will force the regex parser to backtrack at the last entry. So it's better to make the first part mandatory and put the comma in the last part (which also makes more sense when you read it).
    No, I do not think there is backtracking in my regex: it just gradually matches the string with the first part of the regex, and when the first part fails, it tries the second part. If the second part matches, there is no backtracking; if it fails, yes, it backtracks, but just once. I have just tried it on a string with 5,000 ranges, the result is immediate.

    To tell the truth, my regex will successfully match the string even if the last range is followed by a comma (but it will not capture the comma), and so will yours if you remove the end-of-line anchor. My assumption was that a comma at the end of the matches is not wrong, and that the requirement was only that the last interval should match even if there is no comma after it. Here again, the requirement is vague.

    If this is to be avoided, then I would have to change my regex to prevent the match when there is a trailing comma. I would then add that the last matched interval must be followed by something other than a comma or by the end of the string:

    Code:
    (\d+-\d+,)*(\d+-\d+)(?:[^,]|$)
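
    Alternation binds more loosely than concatenation, so it matters whether "not a comma or end of string" is wrapped in its own group. The JavaScript sketch below compares an ungrouped and a grouped spelling; the variable names and inputs are made up. It also shows a caveat with multi-digit numbers, where \d+ can give back a digit to satisfy [^,], and for comparison the fully anchored pattern posted earlier in the thread.

```javascript
const ungrouped = /(\d+-\d+,)*(\d+-\d+)[^,]|$/;      // "(whole list)[^,]" OR "$"
const grouped = /(\d+-\d+,)*(\d+-\d+)(?:[^,]|$)/;    // list, then "[^,] or $"
const anchored = /^\d+-\d+(?:,\d+-\d+)*$/;           // whole-string check

console.log(ungrouped.test("no ranges here")); // true: the bare $ branch matches
console.log(ungrouped.test("1-3,"));           // true, for the same reason
console.log(grouped.test("1-3"));              // true
console.log(grouped.test("1-3,"));             // false: trailing comma rejected

// Caveat: \d+ backtracks from "17" to "1", and "7" then satisfies [^,].
console.log(grouped.test("1-3,5-17,"));        // true
console.log(anchored.test("1-3,5-17,"));       // false
```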
  24. #13
  25. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    4
    Rep Power
    0
    It seems that there are a few solutions for my task. Thank you for all your posts and ideas!
