Page 1 of 2 12 Last
    #1
  1. A Change of Season
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2004
    Location
    Nobbies beach, Gold Coast. It's beautiful.
    Posts
    2,575
    Rep Power
    171

    Regex help needed to validate and allow only a-z,0-9 and dash and space


    Hi; I want to allow only 1 single space, but my code allows many.
    PHP Code:
    if ( (preg_match("/^[a-zA-Z][a-zA-Z -]+$/", $_POST["firstname"]) === 0) || strlen($_POST['firstname']) > 20 || empty($_POST['firstname']) )
    {
        $error = TRUE;
        $firstname_error = TRUE;
    }
    How can I change that? Thanks
  2. #2
  3. Come play with me!
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,749
    Rep Power
    9397
    Rather than "a letter and (letters or spaces or hyphens)" write it as "a letter and maybe some (letters or hyphens) and then maybe (a space followed by more letters or hyphens)".
  4. #3
  5. A Change of Season
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2004
    Location
    Nobbies beach, Gold Coast. It's beautiful.
    Posts
    2,575
    Rep Power
    171
    Originally Posted by requinix
    Rather than "a letter and (letters or spaces or hyphens)" write it as "a letter and maybe some (letters or hyphens) and then maybe (a space followed by more letters or hyphens)".
    OK, what's the answer? How can I write it?
  6. #4
  7. Come play with me!
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,749
    Rep Power
    9397
    Code:
    # a letter
    [a-zA-Z]
    # and maybe some (letters or hyphens)
    [a-zA-Z-]*
    # and then maybe (
    (
    # a space
     # <- that's a space there
    # followed by more letters or hyphens
    [a-zA-Z-]+
    # )
    )?
    Code:
    ^[a-zA-Z][a-zA-Z-]*( [a-zA-Z-]+)?$
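    For reference, the assembled pattern could be dropped into the check from the first post roughly like this. It is only a sketch: the helper name isValidFirstName is made up for illustration, and the 20-character limit is carried over from the original code.

```php
<?php
// Hypothetical helper (not from the thread): wraps the pattern above
// together with the 20-character limit from the first post.
function isValidFirstName($name)
{
    if (!is_string($name) || $name === '' || strlen($name) > 20) {
        return false;
    }
    // A letter, optional letters/hyphens, then optionally one space
    // followed by at least one more letter or hyphen.
    return preg_match('/^[a-zA-Z][a-zA-Z-]*( [a-zA-Z-]+)?$/', $name) === 1;
}

var_dump(isValidFirstName('Mary Jane'));   // bool(true)
var_dump(isValidFirstName('Mary  Jane'));  // bool(false): two spaces
var_dump(isValidFirstName('Mary Jane X')); // bool(false): a second space
```

    Comparing against 1 rather than 0 also treats a preg_match() error (a false return) as a failed check.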
  8. #5
  9. A Change of Season
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2004
    Location
    Nobbies beach, Gold Coast. It's beautiful.
    Posts
    2,575
    Rep Power
    171
    Originally Posted by requinix
    Code:
    # a letter
    [a-zA-Z]
    # and maybe some (letters or hyphens)
    [a-zA-Z-]*
    # and then maybe (
    (
    # a space
     # <- that's a space there
    # followed by more letters or hyphens
    [a-zA-Z-]+
    # )
    )?
    Code:
    ^[a-zA-Z][a-zA-Z-]*( [a-zA-Z-]+)?$
    Thank you
    Damn! It looks like its own language. Regex and mod_rewrite are confusing subjects.
  10. #6
  11. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,922
    Rep Power
    1045
    Hi,

    what if my first name is José? Does that mean I'm not allowed to use your website? What about "Jérôme", "Björn" or "Göran"?

    In English-speaking countries, you might not be aware of this, but not every name on this planet is limited to Latin characters. Different countries have different names. Assuming you could cover them all with a simple regex is rather naive.

    I don't think that you can validate human names at all. And why even try? Let's say you set up a name database containing 10 million entries to at least cover common names. That still doesn't make the name I've entered true. Nothing prevents me from telling you I'm "Bill Gates".

    I understand that customers want "pretty data" and that obvious fake names might make them think your code is somehow "wrong". But the regex above is way over the top and discriminates against a lot of people. If you absolutely need to validate the name, use loose rules like "at least one non-space" or something like that.
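    A loose rule such as "at least one non-space" might look something like the sketch below. The function name hasVisibleCharacter is hypothetical, and the u modifier assumes the input is UTF-8.

```php
<?php
// Hypothetical helper: accepts any name that contains at least one
// non-whitespace character. Assumes the input is UTF-8 encoded.
function hasVisibleCharacter($name)
{
    // preg_match() returns false on invalid UTF-8 when the u modifier
    // is used, so malformed input is rejected as well.
    return is_string($name) && preg_match('/\S/u', $name) === 1;
}

var_dump(hasVisibleCharacter('José')); // bool(true)
var_dump(hasVisibleCharacter("  \t")); // bool(false)
```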
    The 6 worst sins of security How to (properly) access a MySQL database with PHP

    Why cant I use certain words like "drop" as part of my Security Question answers?
    There are certain words used by hackers to try to gain access to systems and manipulate data; therefore, the following words are restricted: "select," "delete," "update," "insert," "drop" and "null".
  12. #7
  13. A Change of Season
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2004
    Location
    Nobbies beach, Gold Coast. It's beautiful.
    Posts
    2,575
    Rep Power
    171
    Originally Posted by Jacques1
    Hi,

    what if my first name is José? Does that mean I'm not allowed to use your website? What about "Jérôme", "Björn" or "Göran"?

    In English-speaking countries, you might not be aware of this, but not every name on this planet is limited to Latin characters. Different countries have different names. Assuming you could cover them all with a simple regex is rather naive.

    I don't think that you can validate human names at all. And why even try? Let's say you set up a name database containing 10 million entries to at least cover common names. That still doesn't make the name I've entered true. Nothing prevents me from telling you I'm "Bill Gates".

    I understand that customers want "pretty data" and that obvious fake names might make them think your code is somehow "wrong". But the regex above is way over the top and discriminates against a lot of people. If you absolutely need to validate the name, use loose rules like "at least one non-space" or something like that.
    Would you please write your suggestion of first and last name validation ?
  14. #8
  15. No Profile Picture
    Dazed&Confused
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2002
    Location
    Tempe, AZ
    Posts
    501
    Rep Power
    127
    Originally Posted by zxcvbnm
    Would you please write your suggestion of first and last name validation ?
    Okay... I might have something functional.

    PHP Code:
    <?php
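    // Assumes the script and its input are UTF-8 encoded.
    $names = array(
        "Thomas",       // Legal
        "Mary Jane",    // Legal
        "Mary-Jane",    // Legal
        "T'chang",      // Legal
        "T`chang",      // Legal
        "Jérôme",       // Legal
        "Björn",        // Legal
        "Göran",        // Legal
        "Göran0",       // Illegal
        "Robert_",      // Illegal
        "Robert/",      // Illegal
        "Göran/",       // Illegal
        'Robert$',      // Illegal
        'Göran$',       // Illegal
    );
    foreach ( $names as $name ) {
        // Flag any character that is not a letter, hyphen, whitespace,
        // apostrophe or backtick. The u modifier is needed so that \p{L}
        // treats multi-byte UTF-8 letters as single characters.
        if ( preg_match('/[^\p{L}\-\s\'\`]/u', $name) ) {
            print "$name: Illegal characters<br/>\n";
        }
    }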
    You can add other allowed characters to the regex as needed.
  16. #9
  17. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,692
    Rep Power
    6351
    I have a suggested name validation: Don't do it. Who the hell are you to tell someone if their name is valid or not? Remember there's someone on this planet who used to be named "Prince" and who is now legally named a small picture which appears to be the path neutrinos take in a particle accelerator.

    The only thing you should be doing to names is protecting your own site from SQL injections and XSS. Everything else is not only unnecessary, but insensitive and annoying to people with names you'd consider "weird" but they've had their entire lives.
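    As a rough illustration of those two protections: escape the name whenever it is printed into HTML, and pass it to the database through a prepared statement. The function names, the users table and the name column below are all made up for the example.

```php
<?php
// Escape a name for output in an HTML context (guards against XSS).
function escapeForHtml($name)
{
    return htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
}

// Store a name through a prepared statement (guards against SQL injection).
// The table and column names are hypothetical.
function insertName(PDO $pdo, $name)
{
    $stmt = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');
    return $stmt->execute(array(':name' => $name));
}

echo escapeForHtml("<b>O'Brien</b>"), "\n";
// &lt;b&gt;O&#039;Brien&lt;/b&gt;
```

    Neither step rejects or alters the name itself; the stored value stays exactly what the user typed.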
    HEY! YOU! Read the New User Guide and Forum Rules

    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin

    "The greatest tragedy of this changing society is that people who never knew what it was like before will simply assume that this is the way things are supposed to be." -2600 Magazine, Fall 2002

    Think we're being rude? Maybe you asked a bad question or you're a Help Vampire. Trying to argue intelligently? Please read this.
  18. #10
  19. No Profile Picture
    Dazed&Confused
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2002
    Location
    Tempe, AZ
    Posts
    501
    Rep Power
    127
    Originally Posted by ManiacDan
    I have a suggested name validation: Don't do it. Who the hell are you to tell someone if their name is valid or not? Remember there's someone on this planet who used to be named "Prince" and who is now legally named a small picture which appears to be the path neutrinos take in a particle accelerator.
    And because of that person's decision they can't put their name into any form field on the internet; there's no keyboard symbol for their "name".

    I'm fairly sure the filter above will work fine for the other 7 billion+ people out there.

    I also like filtering names to prevent copy&paste problems, like the special hyphens used in Word. That's led to data lookup problems in systems I've maintained in the past.

    That said, I think asking for a first name/last name is a bit limiting. A more flexible convention might be to ask for their full name and then, separately, ask how they'd like to be referred to. Some cultures have a lot more components to their name than just first/last. Asking for a single middle initial is presumptuous, too--my son has two middle names.
  20. #11
  21. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,692
    Rep Power
    6351
    I'm fairly sure the filter above will work fine for the other 7 billion+ people out there.
    And by that of course you mean the 2 billion people with names based on the Latin character set, right? Nobody from Asia, Russia, or Northern Europe will have names valid enough for your website.

    Working for a payment processing company, I've learned that the looser your name restrictions, the better. We had a client who failed to collect hundreds of thousands of dollars worth of business because a payment processor (which I won't name) was marking their names as invalid and wasn't processing the payments on their cards.

    The "word-style" characters are indeed a problem, but instead of saying "Latin letters only" why not simply replace those 5-6 characters with their ASCII or Unicode equivalent?

    I'll say it once more: It is not up to you, me, or anyone else to tell a human being "your name isn't a name." If you don't want anyone without a European name to use your website, that's your own decision, but there's someone sitting within 20ft of me right now whose name would fail every validation check I've seen on these forums.
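    Replacing the word-processor characters instead of rejecting them could be sketched like this. The function name and the exact list of characters are assumptions for illustration; the byte sequences are the UTF-8 encodings of the code points named in the comments.

```php
<?php
// Hypothetical helper: maps common word-processor punctuation to plain
// ASCII equivalents. Assumes UTF-8 input; extend the map as needed.
function replaceTypographicCharacters($name)
{
    $map = array(
        "\xE2\x80\x98" => "'", // U+2018 left single quotation mark
        "\xE2\x80\x99" => "'", // U+2019 right single quotation mark
        "\xE2\x80\x9C" => '"', // U+201C left double quotation mark
        "\xE2\x80\x9D" => '"', // U+201D right double quotation mark
        "\xE2\x80\x93" => '-', // U+2013 en dash
        "\xE2\x80\x94" => '-', // U+2014 em dash
    );
    return strtr($name, $map);
}

echo replaceTypographicCharacters("Mary\xE2\x80\x93Jane O\xE2\x80\x99Neil"), "\n";
// Mary-Jane O'Neil
```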
    Last edited by ManiacDan; July 9th, 2013 at 01:42 PM.
  22. #12
  23. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,922
    Rep Power
    1045
    The regex above does account for non-Latin characters (the "L" stands for "letter", not "Latin" or something). But of course it's completely useless with regard to validation, since it accepts any nonsense as long as it only contains letters, spaces and apostrophes. According to your regex, "  " (two spaces) or "aaaa" are perfectly valid names. What exactly do you gain from this check?

    What do you gain from validation at all? How is "Peter Pan" better than "123456"? Both are nonsense data.
  24. #13
  25. No Profile Picture
    Dazed&Confused
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2002
    Location
    Tempe, AZ
    Posts
    501
    Rep Power
    127
    Originally Posted by ManiacDan
    And by that of course you mean the 2 billion people with names based on the Latin character set, right? Nobody from Asia, Russia, or Northern Europe will have names valid enough for your website.
    What Jacques said.
    I even tested a decent combination of Northern Europe names, right there in the example code I posted...

    Originally Posted by Jacques1
    What do you gain from validation at all? How is "Peter Pan" better than "123456"? Both are nonsense data.
    1. At the very least it helps prevent fat-fingering by the user, giving them a chance to correct their input if they just slipped and hit "[" when they meant to hit "P".

    2. It'll prevent them from accidentally (or intentionally) putting odd characters in, like tab. (which can happen when copy&pasting)

    3. It'll help filter out em and en dashes, and other special punctuation that word processors like to force on you, that can lead to data entry the user didn't intend.

    #2 and #3 are issues I have personal experience with, that required me to troubleshoot when the UI couldn't find data that users were searching for.
  26. #14
  27. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,922
    Rep Power
    1045
    The solution to #2 and #3 is normalizing the input (like ManiacDan already said), not blocking the user. Imagine a poor user desperately trying to submit the form and getting rejected again and again, just because your application can't handle certain characters. That's obviously stupid. It's not the user's responsibility to prepare the input for your application. That's your job. If your program chokes on tabs (wtf?), it needs to be repaired.

    #1 is pretty much the only argument that's kinda sorta reasonable.
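    For the tab problem in #2, normalizing could be as small as the sketch below. The helper name normalizeWhitespace is invented for the example, and it assumes UTF-8 input.

```php
<?php
// Hypothetical helper: turns tabs, newlines and runs of whitespace into a
// single space and trims both ends. Assumes UTF-8 input.
function normalizeWhitespace($name)
{
    $collapsed = preg_replace('/\s+/u', ' ', $name);
    // preg_replace() returns null on error (for example invalid UTF-8);
    // fall back to the untouched input in that case.
    return trim($collapsed === null ? $name : $collapsed);
}

var_dump(normalizeWhitespace("  Mary\tJane \n")); // string(9) "Mary Jane"
```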
  28. #15
  29. No Profile Picture
    Dazed&Confused
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2002
    Location
    Tempe, AZ
    Posts
    501
    Rep Power
    127
    Originally Posted by Jacques1
    The solution to #2 and #3 is normalizing the input (like ManiacDan already said), not blocking the user.
    I'm not in the habit of changing the data a user puts in. I'll proofread it and let them decide how they want to change it to fit the guidelines, but I'm not so presumptuous that I'm going to decide how it should be changed on my own.

    Comments on this post

    • ManiacDan disagrees : My very first post provided someone's name. You declared it an invalid name. THAT IS THE ENTIRE PROBLEM
    Last edited by dmittner; July 9th, 2013 at 03:54 PM.