November 7th, 2012, 03:53 PM
I am trying to create a regular expression that will parse numeric input like: digit1 - digit2 OR digit1 - digit2,digit3 - digit4,digit5 - digit6,digit7 - digit8...
Please note that the last character should not be "," and there should not be two or more "-" in one range, like 1-5-10!
November 7th, 2012, 04:55 PM
\d is a digit and (...)* will repeat whatever is inside there as many times as possible, possibly not at all. Also ^ marks the beginning of the string and $ marks the end.
Hint: the expression will have the number stuff repeated twice.
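A minimal sketch of the hint in Python (the thread never names a language, so Python is an assumption), assuming ranges of one or more digits with no spaces around the dash. The `\d+-\d+` part appears once for the first range and again inside the repeated group. `(?:...)` is the non-capturing form of `(...)`, and `re.fullmatch` stands in for `^` and `$`, because in Python `$` also matches just before a trailing newline. The name `is_range_list` is hypothetical.

```python
import re

# One range, then zero or more ",range" groups: the range pattern appears twice.
RANGE_LIST = re.compile(r"\d+-\d+(?:,\d+-\d+)*")

def is_range_list(text: str) -> bool:
    """Return True if the whole string is a comma-separated list of ranges."""
    # fullmatch anchors at both ends, like ^...$ without the trailing-newline quirk
    return RANGE_LIST.fullmatch(text) is not None

for sample in ["1-5", "1-5,7-9,10-12", "1-5,", "1-5-10"]:
    print(sample, is_range_list(sample))
```

The trailing comma and the double dash are both rejected, because nothing in the pattern can consume them.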
November 7th, 2012, 05:03 PM
Thank you. I read about those in regex tutorials, but I don't know how to write the whole regex string!
Originally Posted by requinix
Note that by "..." I mean "etc." It is not part of the input.
November 7th, 2012, 06:05 PM
And note that with "..." I mean whatever you want to put in there. It is not part of the regex.
You should have all the parts you need to construct the expression. Give it a first shot and we'll take it from there.
November 8th, 2012, 01:09 AM
It is not clear to me whether you know which digits you want to match or whether you want to match any digit and "-" combination.
For example, for #1 (1-3): do you want to match 1, 2, or 3, or do you want to match any digit range? In the first case, I would use a character class such as [1-3]; in the second case, I would use something like \d\-\d.
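The two cases, tried with Python's `re` module (an assumed language; the sample strings are illustrative):

```python
import re

# Case 1: a character class matches exactly ONE character from the set 1, 2, 3.
print(re.fullmatch(r"[1-3]", "2") is not None)    # True
print(re.fullmatch(r"[1-3]", "1-3") is not None)  # False: the class is one character

# Case 2: a digit, a literal dash, a digit (escaping "-" is optional outside a class).
print(re.fullmatch(r"\d\-\d", "1-3") is not None)  # True
print(re.fullmatch(r"\d\-\d", "7-9") is not None)  # True
```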
Please be more specific.
November 8th, 2012, 03:50 PM
Any digit, something like
where X, Y, K, M are any possible digits
November 9th, 2012, 03:15 AM
You could start with something like this:
which should match (a digit, a dash, a digit and an optional comma) or (a digit and an optional comma), the whole thing at least once or repeated any number of times.
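The pattern itself is not preserved in this copy of the thread, so the Python sketch below rebuilds it from the description only and may differ from the one actually posted (`LOOSE` is a hypothetical name). Testing the whole string with `fullmatch` shows the weak spots raised in the next paragraph: a lone digit passes, and so does a trailing comma.

```python
import re

# Rebuilt from the description: (digit, dash, digit, optional comma)
# or (digit, optional comma), the whole group one or more times.
LOOSE = re.compile(r"(?:\d-\d,?|\d,?)+")

for sample in ["1-3,5-7", "7", "1-3,", "1-3-5"]:
    print(sample, LOOSE.fullmatch(sample) is not None)
```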
However, I do not know enough about your data: the above expression may match things that you do not want to match (for example, it will match a single number in your data).
If you don't want this to happen, and if a single digit (without an interval) cannot occur, then it might be better to try this:
which will match (a number, a dash, a number and a comma) zero or more times, followed by (a digit, a dash and a digit).
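Again rebuilt from the description rather than copied (`STRICT` is a hypothetical name), a Python sketch of this stricter form. Because the comma now sits inside the repeated group and the final range has none, a lone digit and a trailing comma are both rejected. Since every `\d` stands for a single digit, a two-digit range such as `10-12` fails too.

```python
import re

# Rebuilt from the description: (digit, dash, digit, comma) zero or more
# times, followed by a final (digit, dash, digit) with no comma.
STRICT = re.compile(r"(?:\d-\d,)*\d-\d")

for sample in ["1-3", "1-3,5-7,8-9", "7", "1-3,", "10-12"]:
    print(sample, STRICT.fullmatch(sample) is not None)
```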
There may be further refinements (for example, can there be spaces between two intervals?), but it all depends on your real data, and we don't know enough about that.
November 9th, 2012, 05:29 AM
Yeah, your description is pretty confusing. You're talking about single digits all the time, but your example includes numbers with multiple digits.
So I guess what you actually want is
If you also want to rule out leading zeros (like in 01), that would be
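The two patterns from this post are missing from this copy of the thread. The Python sketch below is one way to express the same two ideas, assuming a number is either a lone `0` or starts with a non-zero digit; `NUM` and `RANGES_NO_LEADING_ZERO` are hypothetical names. Replacing `NUM` with `\d+` gives the plain multi-digit version.

```python
import re

# A number: a lone 0, or a digit 1-9 followed by any digits (no leading zeros).
NUM = r"(?:0|[1-9]\d*)"
# One range, then zero or more ",range" groups; the whole string must match.
RANGES_NO_LEADING_ZERO = re.compile(rf"{NUM}-{NUM}(?:,{NUM}-{NUM})*")

for sample in ["1-5,10-20", "100-200", "0-5", "01-5", "1-05"]:
    print(sample, RANGES_NO_LEADING_ZERO.fullmatch(sample) is not None)
```

Whether a lone `0` should be allowed at all (it would not be a valid page number, for instance) depends on the data.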
November 9th, 2012, 09:55 AM
Jacques, the original requirement did not say that the string should contain only these patterns, so I think the string start and end anchors (^ and $) should probably be taken away (unless, of course, the OP clarifies this point differently).
I did not pay attention to the examples with 2-digit numbers. If this can happen, I would change my suggestion to:
But again, regex is often a delicate balance between matching everything that needs to be matched and not matching what should not be matched. For this, a very precise description of the data is needed, and we are very far from that.
November 9th, 2012, 10:26 AM
Is this for parsing page ranges? If so, these work.
As mentioned, you could remove the ^ and $ if these patterns should match inside other strings. If you're using this for some form of validation, they should be left in so that a match doesn't allow any extra information.
^(\s*\d+\s*\-\s*\d+\s*,?|\s*\d+\s*,?)+$ (allows spaces)
^(\d+-\d+,?|\d+,?)+$ (does not allow spaces)
^(\d+|\d+-\d+)(,(\d+|\d+-\d+))*$ (a different approach)
Last edited by Nilpo; November 9th, 2012 at 10:28 AM.
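A quick Python check of the first two patterns, copied verbatim and run with `re.match` so that their own `^` and `$` anchors apply (the sample strings are illustrative). Both accept a trailing comma such as `1-5,`, which the OP ruled out, and both accept a lone number such as `7`. That suits page ranges but not the OP's stricter format.

```python
import re

# The first two patterns from the post, unchanged.
WITH_SPACES = re.compile(r"^(\s*\d+\s*\-\s*\d+\s*,?|\s*\d+\s*,?)+$")
NO_SPACES = re.compile(r"^(\d+-\d+,?|\d+,?)+$")

for sample in ["1-5,7-9", "1-5,7", "1-5,", "1-5-10"]:
    print(sample, bool(NO_SPACES.match(sample)))

# Only the first pattern tolerates spaces around dashes and commas.
print(bool(WITH_SPACES.match("1 - 5, 7 - 9")), bool(NO_SPACES.match("1 - 5, 7 - 9")))
```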
November 9th, 2012, 10:52 AM
A substring match obviously would make no sense in this case. Since he doesn't extract anything, it would be the same as simply checking for \d+-\d+. The optional stuff after that makes no difference (unless he's specifically looking for index information or something).
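A small Python illustration of this point, using `1-5-10` (which the OP wants rejected) as a sample input. Without anchors, the full pattern and the bare `\d+-\d+` give the same yes/no answer; only a whole-string match rejects it.

```python
import re

text = "1-5-10"  # should be rejected: two dashes in one range

FULL = r"\d+-\d+(?:,\d+-\d+)*"

# Unanchored search succeeds, because the substring "1-5" matches...
found_full = re.search(FULL, text) is not None        # True
# ...and so does the bare \d+-\d+: the optional tail changes nothing.
found_bare = re.search(r"\d+-\d+", text) is not None  # True
# Requiring the whole string to match (like ^...$) rejects it.
whole = re.fullmatch(FULL, text) is not None          # False
print(found_full, found_bare, whole)
```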
Originally Posted by Laurent_R
The whole task only makes sense if he wants to check if a complete string matches this pattern.
Putting the comma in the first pattern will force the regex parser to backtrack at the last entry. So it's better to make the first part mandatory and put the comma in the last part (which also makes more sense when you read it).
Originally Posted by Laurent_R
He specifically ruled out lists ending with a comma, so that won't work.
Originally Posted by Nilpo
November 9th, 2012, 01:00 PM
We just don't know whether the OP wants to extract something or not. I used capturing parens because I suspected the aim was to capture the ranges; you used non-capturing parens because you suspected something different. That's the point: the description of the requirement is far too vague. Therefore, we can only give some tips, not a complete solution.
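If the aim is to capture the ranges, note that in Python's `re` a capturing group inside a repetition keeps only its last iteration. The sketch below (sample input assumed) therefore validates with one pattern and then collects every range with `re.findall`.

```python
import re

text = "1-5,7-9,10-12"

# Validate the whole string; group 1 repeats, group 2 is the final range.
m = re.fullmatch(r"(\d+-\d+,)*(\d+-\d+)", text)
if m is not None:
    # A repeated group only remembers its LAST iteration: "1-5," is lost.
    print(m.groups())  # ('7-9,', '10-12')
    # To get every range, collect the (start, end) pairs separately.
    pairs = [(int(a), int(b)) for a, b in re.findall(r"(\d+)-(\d+)", text)]
    print(pairs)  # [(1, 5), (7, 9), (10, 12)]
```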
Originally Posted by Jacques1
No, I do not think there is backtracking in my regex: it just gradually matches the string with the first part of the regex, and when the first part fails, it tries the second part. If the second part matches, there is no backtracking; if it fails, yes, it backtracks, but just once. I have just tried it on a string with 5,000 ranges, and the result is immediate.
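A rough way to repeat this experiment in Python. The regex used in the post is not shown in this copy of the thread, so the sketch assumes the multi-digit form of "ranges with commas, then a final range without one". The generated input is illustrative, and the measured time will vary by machine.

```python
import re
import time

# Hypothetical input: 5,000 ranges "1-2,3-4,...,9999-10000".
text = ",".join(f"{2 * i + 1}-{2 * i + 2}" for i in range(5000))

# Assumed pattern: (range, comma) repeated, then a final range.
pattern = re.compile(r"(\d+-\d+,)*\d+-\d+")

start = time.perf_counter()
ok = pattern.fullmatch(text) is not None
elapsed = time.perf_counter() - start
print(ok, f"{elapsed * 1000:.2f} ms")
```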
Originally Posted by Jacques1
To tell the truth, my regex will successfully match the string even if the last range is followed by a comma (but it will not capture the comma), and so will yours if you remove the end-of-line anchor. My assumption was that having a comma at the end of the match is not wrong, but that what was required was that it should match the last interval even if there is no comma at the end. Here again, the requirement is vague.
If this is to be avoided, then I would have to change my regex to prevent the match if there is a trailing comma. I would then add that the last matched interval must be followed by something other than a comma or by the end of the string:
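The modified regex is not preserved in this copy of the thread; below is a Python sketch of the idea with a negative lookahead (`PATTERN` and `NAIVE` are hypothetical names). One subtlety: `(?!,)` alone is not enough in an unanchored search, because the engine can backtrack into `\d+` and stop in the middle of a number. `(?![\d,])` blocks that.

```python
import re

# Unanchored; the last range must not be followed by a digit or a comma.
PATTERN = re.compile(r"(?:\d+-\d+,)*\d+-\d+(?![\d,])")
# Looking only for a comma: backtracking into \d+ lets "1-52," match as "1-5".
NAIVE = re.compile(r"(?:\d+-\d+,)*\d+-\d+(?!,)")

for sample in ["1-5,7-9", "pages 1-5,7-9 only", "1-5,7-9,", "1-52,"]:
    m = PATTERN.search(sample)
    print(repr(sample), m.group(0) if m else None)

print(NAIVE.search("1-52,").group(0))  # 1-5
```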
November 9th, 2012, 03:04 PM
It seems that we have several solutions; there are a few that work for my task. Thank you for all your posts and ideas!