#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2009
    Location
    West Mersea, Essex, UK
    Posts
    8
    Rep Power
    0

    Regex for checking max lines and line lengths for a text block


    Hi,

    I'm looking for a regular expression to return a match against a block of text ONLY if it contains NOT more than a certain number of lines and NO line exceeds a certain number of characters (the maxima will vary).

    It will be used to validate a text area on a web page that (for reasons beyond my control) must then be split into a fixed number of lines of a specified maximum length.

    It sounds to me like a simple problem, but I suspect it is much more challenging than it sounds. Can anyone assist?
  2. #2
  3. No Profile Picture
    User 165270
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2005
    Posts
    497
    Rep Power
    938
    Originally Posted by Shiresmith
    Hi,

    ...

    Can anyone assist?
    Sure, where are you stuck exactly?
  4. #3
    Originally Posted by prometheuzz
    Sure, where are you stuck exactly?
    Apart from my sad lack of knowledge...

    regex is clearly designed to detect patterns, not the absence of them. My requirement is more a check that certain cases are not present.

    I can work out an expression to detect, for example, up to 5 lines of text of up to 30 characters each: (^.{0,30}$){0,5}. But how do I generate a "no match" if there are more than x lines or any line contains more than y characters? Can it actually be done?
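
    A side note on why that attempt never produces a "no match" in a plain search: {0,5} also allows zero repetitions, so the pattern can always succeed with an empty match. Here is a minimal sketch in Python (chosen only for illustration; the sample strings and variable names are made up, and a validator that insists on the whole input matching would behave differently):

```python
import re

# The attempt from the post: up to 5 repetitions of
# "a whole line of at most 30 characters".
attempt = re.compile(r'(^.{0,30}$){0,5}')

too_long = "x" * 40  # a single line of 40 characters, which should be rejected

match = attempt.search(too_long)

# The search still succeeds: the group repeats zero times,
# so the overall match is the empty string at position 0.
print(match is not None)     # True
print(repr(match.group(0)))  # ''

# A short line is matched in full, so success alone says nothing about validity.
print(repr(attempt.search("short").group(0)))  # 'short'
```

    So the check has to be phrased as "the whole input, from start to end, fits the pattern" rather than "the pattern occurs somewhere".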
  6. #4
    Originally Posted by Shiresmith
    Apart from my sad lack of knowledge...
    Nonsense! You were already pretty close yourself.

    Originally Posted by Shiresmith
    regex is clearly designed to detect patterns, not the absence of them. My requirement is more a check that certain cases are not present.
    You've got a good point. But there's a pattern here as well.

    Originally Posted by Shiresmith
    I can work out an expression to detect, for example, up to 5 lines of text of up to 30 characters each: (^.{0,30}$){0,5}. But how do I generate a "no match" if there are more than x lines or any line contains more than y characters? Can it actually be done?
    Okay. By default, ^ and $ match the start and the end of the input string respectively. In multi-line mode they also match at the start and end of each line, but we don't want that in your particular case.
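
    To make the difference between the two anchor modes concrete, here is a small sketch in Python (used only for illustration; the sample text and variable names are made up, and other regex flavors may differ in detail):

```python
import re

text = "first\nsecond"

# Default mode: ^ and $ anchor to the whole string, so a pattern that
# describes a single line cannot match a two-line input.
default_hits = re.findall(r'^[a-z]+$', text)

# Multi-line mode: ^ and $ also match at every line boundary.
multiline_hits = re.findall(r'^[a-z]+$', text, flags=re.MULTILINE)

print(default_hits)    # []
print(multiline_hits)  # ['first', 'second']
```

    One Python-specific detail: in default mode $ also matches just before a single trailing newline at the very end of the string, so \Z is the stricter choice when that matters.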

    Now, let's first describe what a valid line is in your case:

    line := between 0 and 30 characters ({0,30}) other than new line characters ([^\r\n]), followed by either a line break (\r?\n) or the end of the string ($) (this last part is important!). In regex, this would look like:

    Code:
    [^\r\n]{0,30}(\r?\n|$)
    Now you're interested in exactly five of those:

    Code:
    ([^\r\n]{0,30}(\r?\n|$)){5}
    But before those five lines there should be the start of the string, and after them the end of the string:

    Code:
    ^([^\r\n]{0,30}(\r?\n|$)){5}$
    which is the final regex. This final regex will only match a string consisting of 5 lines of text (no more and no less) which in their turn have between 0 and 30 characters in them.

    Hope that helps.

  8. #5
    Originally Posted by prometheuzz

    Code:
    ^([^\r\n]{0,30}(\r?\n|$)){5}$
    which is the final regex. This final regex will only match a string consisting of 5 lines of text (no more and no less) which in their turn have between 0 and 30 characters in them.

    Hope that helps.
    That works perfectly! Special thanks for explaining the logic behind it too!

    One last question (if I may be so bold?). When I use the expression I actually get a match for 0 to 5 lines, which, while being exactly what I really need, sounds different to what you were expecting:

    This final regex will only match a string consisting of 5 lines of text (no more and no less)
    I was expecting to have to change the expression to
    Code:
    ^([^\r\n]{0,30}(\r?\n|$)){0,5}$
    but did not have to. Any ideas why?
  10. #6
    Originally Posted by Shiresmith
    That works perfectly! Special thanks for explaining the logic behind it too!
    You're welcome.

    Originally Posted by Shiresmith
    One last question (if I may be so bold?). When I use the expression I actually get a match for 0 to 5 lines, which, while being exactly what I really need, sounds different to what you were expecting:



    I was expecting to have to change the expression to
    Code:
    ^([^\r\n]{0,30}(\r?\n|$)){0,5}$
    but did not have to. Any ideas why?
    That should not be the case. Could you post the code and actual text you're matching against?
  12. #7
    Originally Posted by prometheuzz
    You're welcome.

    That should not be the case. Could you post the code and actual text you're matching against?
    I'm testing using a regular expression validator within an ASP.NET web page, so I don't really have any code to show you. Could it be that ye olde Microsoft are following slightly different rules to the rest of the world (as they all too often do)?
    Whereas
    1
    2
    3

    5
    6
    correctly doesn't match and neither does
    1
    2
    1234567890123456789012345678901
    4
    5
    yet
    1
    matches as does
    1
    2
    and
    1
    2
    3
    4
    Shall we just silently curse Microsoft and move on?
  14. #8
    Originally Posted by Shiresmith
    ...
    Shall we just silently curse Microsoft and move on?
    Tempting, but the bug lies in my regex! I sometimes get overly-confident in my regex-skills and don't test what I post...
    Here's what happened: when there are fewer than 5 lines, the group [^\r\n]{0,30}(\r?\n|$) can also match zero characters followed by the end of the string, and it keeps doing so until the total of 5 repetitions is reached. We need to move that end-of-the-string anchor outside the {5} group.
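
    The effect described above can be reproduced with a few lines of Python. This is only a sketch: the helper name accepted is made up, and Python stands in for the ASP.NET validator actually used in this thread, which may differ in detail.

```python
import re

# The first version from the thread: the end-of-string alternative
# sits inside the {5} group.
buggy = re.compile(r'^([^\r\n]{0,30}(\r?\n|$)){5}$')


def accepted(text):
    """Return True when the buggy pattern matches the text."""
    return buggy.search(text) is not None


# Inputs with fewer than five lines slip through, because
# "zero characters + end of string" fills the missing repetitions.
for sample in ["1", "1\n2", "1\n2\n3\n4"]:
    print(repr(sample), accepted(sample))  # True for all three

# Too many lines or an over-long line are still rejected.
print(accepted("1\n2\n3\n\n5\n6"))               # False
print(accepted("1\n2\n" + "x" * 31 + "\n4\n5"))  # False
```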

    So, the proper way to do this is:
    1 - match 0 to 30 characters other than new line characters followed by a new line character (^([^\r\n]{0,30}(\r?\n)));
    2 - repeat step one 4 times (a chunk of text containing n lines has n-1 new line characters in it!) ({4});
    3 - consume the rest of the string until the end-of-the-string ([^\r\n]*$).

    Making that the final regex:

    Code:
    ^([^\r\n]{0,30}(\r?\n)){4}[^\r\n]*$
    To be sure, I tested this with PHP, and all works well:

    PHP Code:
    <?php
    $tests = array(
      "1\n2\n3\n\n5\n6",
      "1\n2\n1234567890123456789012345678901\n4\n5",
      "1",
      "1\n2",
      "1\n2\n3\n4",
      "1\n\n\n\n5",
      "1\n2\n\n\n",
      "\n\n\n\n5",
      "1\n2\n123456789012345678901234567890\n4\n5"
    );
    echo "==============================\n";
    foreach ($tests as $test) {
      // Match only when there are exactly 4 line breaks (5 lines) and
      // none of the first four lines exceeds 30 characters.
      if (preg_match('/^([^\r\n]{0,30}(\r?\n)){4}[^\r\n]*$/', $test)) {
        echo "YES!\n";
      } else {
        echo "Nope...\n";
      }
      echo $test . "\n==============================\n";
    }


    You may or may not be able to run this, so here's the output of the snippet above:

    Code:
    ==============================
    Nope...
    1
    2
    3
    
    5
    6
    ==============================
    Nope...
    1
    2
    1234567890123456789012345678901
    4
    5
    ==============================
    Nope...
    1
    ==============================
    Nope...
    1
    2
    ==============================
    Nope...
    1
    2
    3
    4
    ==============================
    YES!
    1
    
    
    
    5
    ==============================
    YES!
    1
    2
    
    
    
    ==============================
    YES!
    
    
    
    
    5
    ==============================
    YES!
    1
    2
    123456789012345678901234567890
    4
    5
    ==============================
    Last edited by prometheuzz; August 11th, 2009 at 06:32 AM.
  16. #9
    Originally Posted by prometheuzz
    Tempting, but the bug lies in my regex! I sometimes get overly-confident in my regex-skills and don't test what I post...
    LOL - as if we haven't ALL been there (overconfident in our code)!

    You are far too good a teacher! I think that if you modify the result one more time from
    Code:
    ^([^\r\n]{0,30}(\r?\n)){4}[^\r\n]*$
    to
    Code:
    ^([^\r\n]{0,30}(\r?\n)){4}[^\r\n]{0,30}$
    then it will do exactly what it says on the tin including when the 5th line contains > 30 characters?

    My most gracious thanks for your help!!

    I would like to award you my entire ration of points for today - but the little drop-down contains only 0 as an option even though I have given no points elsewhere. My humblest apologies!!
  18. #10
    Originally Posted by Shiresmith
    LOL - as if we haven't ALL been there (overconfident in our code)!

    You are far too good a teacher! I think that if you modify the result one more time from
    Code:
    ^([^\r\n]{0,30}(\r?\n)){4}[^\r\n]*$
    to
    Code:
    ^([^\r\n]{0,30}(\r?\n)){4}[^\r\n]{0,30}$
    then it will do exactly what it says on the tin including when the 5th line contains > 30 characters?
    Ah yes, of course, that is correct! Well spotted.
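
    Since the maxima will vary, the corrected pattern can be assembled from the two numbers. Here is a minimal sketch in Python, not something anyone in this thread ran: the function name exact_lines_pattern is made up, and \Z is used in place of $ as an assumption, because in Python $ also matches just before a trailing newline.

```python
import re


def exact_lines_pattern(lines, max_len):
    """Build a regex for exactly `lines` lines of at most `max_len` characters each."""
    if lines < 1:
        raise ValueError("lines must be at least 1")
    line = r'[^\r\n]{0,%d}' % max_len
    # (lines - 1) lines that end in a line break, then one last line without one.
    # Assumption: \Z (absolute end of string) replaces the thread's $.
    return re.compile(r'^(%s(\r?\n)){%d}%s\Z' % (line, lines - 1, line))


five_by_thirty = exact_lines_pattern(5, 30)
print(five_by_thirty.pattern)
print(bool(five_by_thirty.search("1\n2\n3\n4\n5")))            # True
print(bool(five_by_thirty.search("1\n2\n3\n4\n" + "x" * 31)))  # False: fifth line too long
print(bool(five_by_thirty.search("1\n2")))                     # False: only two lines
```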

    Originally Posted by Shiresmith
    My most gracious thanks for your help!!

    I would like to award you my entire ration of points for today - but the little drop-down contains only 0 as an option even though I have given no points elsewhere. My humblest apologies!!
    No problem, your sincere gratitude is worth much more to me!
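
    For anyone who needs the "not more than" behaviour from the original question rather than exactly five lines, one possible variant is a single line followed by up to n-1 "line break + line" pairs. This is a sketch under assumptions, not a pattern confirmed in this thread: the function name at_most_lines_pattern is hypothetical, an empty input is treated as one empty line, and Python with \Z stands in for whichever engine is actually used.

```python
import re


def at_most_lines_pattern(max_lines, max_len):
    """Hypothetical variant: 1 to `max_lines` lines, each at most `max_len` characters."""
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    line = r'[^\r\n]{0,%d}' % max_len
    # One line, then up to (max_lines - 1) further "line break + line" pairs.
    return re.compile(r'^%s(\r?\n%s){0,%d}\Z' % (line, line, max_lines - 1))


up_to_five = at_most_lines_pattern(5, 30)
for sample in ["", "1", "1\n2\n3\n4\n5", "1\n2\n3\n4\n5\n6", "x" * 31]:
    # Prints True for the first three samples and False for the last two.
    print(repr(sample), bool(up_to_five.search(sample)))
```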
