#1
    Registered User

    Need a Regex Guru


    I'm working with an XSD that cannot be changed. It has a regex restriction value that determines the validation format. I take this XSD and automagically generate a wrapper class, which is then parsed generically via reflection.

    What I'd like to know is whether it is possible to parse regex strings like the following (these could be anything, since I won't know in advance which string I'll encounter):

    [0-9]{4}-((0[1-9])|(1[012]))-((0[1-9])|([12][0-9])|(3[01]))T(([01][0-9])|(2[0-3])):[0-5][0-9](:[0-5][0-9])?(Z|([\+\-](([01][0-9])|(2[0-3]))(:[0-5][0-9])?))?

    -OR-

    (([01][0-9])|(2[0-3])):[0-5][0-9](:[0-5][0-9])?(Z|([\+\-](([01][0-9])|(2[0-3]))(:[0-5][0-9])?))?

    -OR-

    $|([A-Z][A-Z]?)

    to obtain its constituent parts and build a format string like this: "yyyy-mm-ddThh:mm:ss"?

    Or, better yet, dynamically create a new replacement regex based on the validation regex?

    I know I'm probably talking about a lexer/parser/tokenizer, etc., but just looking at this, it *should* be possible, right?

    Please note that the point is to determine the format strictly from the validation regex string, which cannot change. I'd like to parse it dynamically to create a replacement regex string, or to generate a format string to use with a DateTime object, etc.

    I appreciate your help in advance.
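    One way to try the "parse the regex" idea without writing a full regex parser is a small scanner over the pattern text that recognizes known fragments. This Python sketch is only an illustration under stated assumptions: the fragment table KNOWN_FRAGMENTS and the function name regex_to_format are hypothetical, the table holds only the fragments quoted in this post, and the output uses the "yyyy-mm-ddThh:mm:ss" notation from the question (so "mm" stands for both month and minutes). Any regex syntax it does not recognize makes it give up and return None.

```python
from typing import List, Optional, Tuple

# Hypothetical lookup table: regex fragments copied from the post -> format tokens.
# It only covers the patterns quoted above; a real schema would need more entries.
KNOWN_FRAGMENTS: List[Tuple[str, str]] = [
    (r"(Z|([\+\-](([01][0-9])|(2[0-3]))(:[0-5][0-9])?))?", ""),  # optional zone: dropped
    (r"((0[1-9])|([12][0-9])|(3[01]))", "dd"),                    # day of month
    (r"(([01][0-9])|(2[0-3]))", "hh"),                            # hour 00-23
    (r"((0[1-9])|(1[012]))", "mm"),                               # month 01-12
    (r"(:[0-5][0-9])?", ":ss"),                                   # optional seconds
    (r"[0-5][0-9]", "mm"),                                        # minutes
    (r"[0-9]{4}", "yyyy"),                                        # year
]

# Characters that are regex syntax rather than literal text.
META = set(r"\.^$|?*+()[]{}")


def regex_to_format(pattern: str) -> Optional[str]:
    """Translate a validation pattern into a format string, or None if unknown."""
    # Try longer fragments first so the most specific one wins.
    fragments = sorted(KNOWN_FRAGMENTS, key=lambda f: len(f[0]), reverse=True)
    out = []
    i = 0
    while i < len(pattern):
        for fragment, token in fragments:
            if pattern.startswith(fragment, i):
                out.append(token)
                i += len(fragment)
                break
        else:
            ch = pattern[i]
            if ch in META:
                return None  # unrecognized regex construct: give up
            out.append(ch)  # literal separator such as '-', 'T' or ':'
            i += 1
    return "".join(out)
```

    Applied to the first pattern above this returns "yyyy-mm-ddThh:mm:ss", the second gives "hh:mm:ss", and $|([A-Z][A-Z]?) gives None. The trade-off is brittleness: an equivalent rewrite such as \d instead of [0-9] would no longer be recognized.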
#2
    Contributing User
    I understand what you are talking about, but not really what you want.

    Code:
    [0-9]{4}-((0[1-9])|(1[012]))-((0[1-9])|([12][0-9])|(3[01]))
    is clearly a correct way to match yyyy-mm-dd and to check that the date falls within some valid ranges, but do you need to be so strict?

    Wouldn't something like this:

    Code:
    \d{4}(-\d\d){2}
    be sufficient to match a correctly formatted string that looks like a date?

    (I did not say a valid date: your rather complicated expression will also fail to detect that 2013-02-29 or even 2012-02-31 is not a valid date. So the bottom line is: do you want to validate a date format, in which case my much simpler expression might be sufficient, or do you want to validate a date, in which case a regular expression is probably not what you are looking for?)
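    To make the format-versus-date distinction concrete, here is a small Python check (Python is used only for illustration). It compares the strict pattern, the looser \d{4}(-\d\d){2} pattern, and a real calendar check with datetime.strptime. The helper names are made up for this sketch, and re.fullmatch is used because an XSD pattern has to match the whole value.

```python
import re
from datetime import datetime

# The strict pattern from the XSD and the looser alternative suggested above.
STRICT = r"[0-9]{4}-((0[1-9])|(1[012]))-((0[1-9])|([12][0-9])|(3[01]))"
LOOSE = r"\d{4}(-\d\d){2}"


def looks_like_date(value: str, pattern: str) -> bool:
    """Format check only: does the whole value match the pattern?"""
    return re.fullmatch(pattern, value) is not None


def is_real_date(value: str) -> bool:
    """Calendar check: does this yyyy-mm-dd string name an existing day?"""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


for value in ["2012-12-29", "2013-02-29", "2012-02-31", "2012-13-01"]:
    print(value, looks_like_date(value, STRICT), looks_like_date(value, LOOSE), is_real_date(value))
```

    Both patterns accept 2013-02-29 and 2012-02-31; only the strict one rejects month 13, and only the calendar check rejects the impossible days.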
#3
    Registered User
    Originally Posted by Laurent_R
    I understand what you are talking about, but not really what you want.

    Code:
    [0-9]{4}-((0[1-9])|(1[012]))-((0[1-9])|([12][0-9])|(3[01]))
    is clearly a right way to match yyyy-mm-dd and check that the date is within some valid constraints, but do you need to be so strict?

    Wouldn't something like this:

    Code:
    \d{4}(-\d\d){2}
    be sufficient to match a validly formatted expression looking as a date?

    (I did not say a valid date, but your quite complicated expression will also fail to see that 2012-02-29 or even 2012-02-31 is not a valid date. So, the bottom line, is: do you want to validate a date format, in which case my much simpler expression might be sufficient, or do you want to validate a date, in which case a regular expression is probably not what you are looking for).
    To go into a bit more depth: the regex string above isn't mine. It's part of an overall XSD (xsd:restriction/xsd:pattern) that has to stay untouched, bugs and all, unfortunately ;(

    That is another reason why I would like to be able to dynamically generate a new replacement string based on their validation string. Since my code walks the object via reflection, it cannot obtain enough information dynamically to discern what the format is supposed to be. So my choices are either to somehow figure out the format implied by the validation regex, or to hard-code the format into my dynamic code.
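    An alternative to parsing the regex is to probe it: format a sample date/time with each of a few candidate format strings and keep the first result that the validation pattern accepts. This is a different technique from the lexer/tokenizer route. The Python sketch assumes the XSD patterns are also valid Python regexes (true for the ones quoted in this thread, although XSD's regex dialect is not identical to Python's) and uses strftime codes such as %Y-%m-%d rather than "yyyy-mm-dd". The candidate list, the sample value and the name infer_format are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical candidates in strftime notation, most specific first:
# a pattern with optional seconds accepts both of the first two outputs.
CANDIDATE_FORMATS = [
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%dT%H:%M",
    "%Y-%m-%d",
    "%H:%M:%S",
    "%H:%M",
]

# Arbitrary sample; day > 12 and hour > 12 make swapped or 12-hour fields
# less likely to match by accident.
SAMPLE = datetime(2012, 12, 29, 13, 45, 30)


def infer_format(pattern: str) -> Optional[str]:
    """Return the first candidate whose sample output the pattern accepts, else None."""
    compiled = re.compile(pattern)
    for fmt in CANDIDATE_FORMATS:
        # fullmatch mimics XSD's rule that a pattern must match the whole value.
        if compiled.fullmatch(SAMPLE.strftime(fmt)) is not None:
            return fmt
    return None
```

    For the first pattern in the question this yields "%Y-%m-%dT%H:%M:%S", for the second "%H:%M:%S", and for $|([A-Z][A-Z]?) it yields None. The design choice is to treat the regex as a black box, so it keeps working if the schema authors rewrite a pattern in an equivalent form, at the cost of only recognizing formats on the candidate list.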

    We will be using this automation scheme to download the XSD -> autogenerate the wrapper classes from it -> generate a GUI with the appropriate controls per type -> serialize the object after manipulation -> call a web service with the resulting XML. Working this way means that if they ever make changes, all we have to do is obtain the new XSD; everything else is autogenerated from it, so there is no recoding.

    Unfortunately, I have no control over the regexes contained in the pattern elements; they are what they are ;(
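    Since the patterns live in xs:restriction/xs:pattern elements, the generator can read them straight from the downloaded schema. A minimal Python sketch using the standard xml.etree.ElementTree module: the schema fragment and the type names TimeType and CodeType are made up (only the pattern values come from this thread), and it assumes the patterns sit on named xs:simpleType definitions, so anonymous types nested inside elements are skipped.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

XS = "{http://www.w3.org/2001/XMLSchema}"

# Made-up schema fragment; only the pattern values come from the thread.
SAMPLE_XSD = r"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="TimeType">
    <xs:restriction base="xs:string">
      <xs:pattern value="(([01][0-9])|(2[0-3])):[0-5][0-9](:[0-5][0-9])?(Z|([\+\-](([01][0-9])|(2[0-3]))(:[0-5][0-9])?))?"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="CodeType">
    <xs:restriction base="xs:string">
      <xs:pattern value="$|([A-Z][A-Z]?)"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""


def collect_patterns(xsd_text: str) -> Dict[str, List[str]]:
    """Map each named xs:simpleType to the xs:pattern values of its xs:restriction."""
    root = ET.fromstring(xsd_text)
    patterns: Dict[str, List[str]] = {}
    for simple_type in root.iter(f"{XS}simpleType"):
        name = simple_type.get("name")
        values = [p.get("value") for p in simple_type.findall(f"{XS}restriction/{XS}pattern")]
        if name and values:  # skip anonymous types and types without patterns
            patterns[name] = values
    return patterns


print(collect_patterns(SAMPLE_XSD))
```

    Each pattern string collected this way could then be handed to whatever format-inference step the generator uses, keyed by the type name the reflection code already sees.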
#4
    Contributing User
    Again taking only the first part of your regex, to simplify the discussion:

    Code:
    [0-9]{4}-((0[1-9])|(1[012]))-((0[1-9])|([12][0-9])|(3[01]))
    will match, for example, "2012-12-29", but will not capture the year, because [0-9]{4} is not inside parentheses. "((0[1-9])|(1[012]))" will capture "12" and "((0[1-9])|([12][0-9])|(3[01]))" will capture "29". From there, you can try to figure out what substitutions are possible.
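    The capture behaviour described here can be checked directly. In this Python sketch (the variable names are illustrative), m.groups() shows that the month and day are captured, along with the inner alternatives that did or did not take part, while the year is not. The second half builds a separate, derived regex, leaving the XSD pattern itself unchanged, that wraps the year in its own group so re.sub can rearrange all three fields; the dd/mm/yyyy target layout is just an example.

```python
import re

# First part of the XSD pattern, copied unchanged from the thread.
DATE_PART = r"[0-9]{4}-((0[1-9])|(1[012]))-((0[1-9])|([12][0-9])|(3[01]))"

m = re.fullmatch(DATE_PART, "2012-12-29")
# Groups are numbered by opening parenthesis; unused alternatives are None.
print(m.groups())  # ('12', None, '12', '29', None, '29', None)

# Derived copy for substitution only: wrap the leading year fragment in a group.
YEAR = "[0-9]{4}"
assert DATE_PART.startswith(YEAR)
DERIVED = f"({YEAR})" + DATE_PART[len(YEAR):]

# In DERIVED, group 1 is the year, group 2 the month and group 5 the day.
reordered = re.sub(DERIVED, r"\5/\2/\1", "2012-12-29")
print(reordered)  # 29/12/2012
```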
#5
    Registered User
    Originally Posted by Laurent_R
    Just taking again only the first part of your regex to simplify the discussion:

    Code:
    [0-9]{4}-((0[1-9])|(1[012]))-((0[1-9])|([12][0-9])|(3[01]))
    will match, for example, "2012-12-29", but will not capture the year. "((0[1-9])|(1[012]))" will capture "12" and "((0[1-9])|([12][0-9])|(3[01]))" will capture "29". From there on, you can try to figure out what substitution will be possible.
    I am by no stretch of the imagination well versed in regex, let alone a guru.

    Would you happen to have something to get me started, or a link that deals with something like this?
#6
    Contributing User
    These are useful tutorials on regular expressions in Perl:

    Quick introduction:
    http://perldoc.perl.org/perlrequick.html

    More detailed tutorial:
    http://perldoc.perl.org/perlretut.html

    Very detailed reference:
    http://perldoc.perl.org/perlre.html

    I think they will explain many things to you, even if you are using a programming language other than Perl (most modern regex flavors borrow heavily from Perl's), but you can also look at the tutorials for your own language (yours is not Perl).
