#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2013
    Location
    Vienna
    Posts
    2
    Rep Power
    0

    ECMA to PCRE Conversion - How to do?


    Dear forum members,

    I have worked out a regex pattern to check user-entered full names for plausibility. I developed it in an ECMA-based tool (gskinner.com/RegExr/), and there it works perfectly.

    Here it is:

    ^(([^\$%\^*£=~@\d]+){2,30})( ([^\$%\^\.*£=~@\d]+){2,30})+

    My problem now is that it does not work as expected in PCRE-based functions like preg_match in PHP.

    E.g. "Ingmar Erdös" (my name) works fine but "Wolf Dietmar Eibensteiner" is not matched at all.

    The first group should match a word without digits or special characters, from min. 2 to max. 30 chars. It is followed by a space-separated word, also without digits and from min. 2 to max. 30 chars. After the second group I put a plus sign to match as many further words as needed after the first word, but that is exactly what does not work in PHP.

    Does anybody have an idea, or a hint in the right direction, on how to convert this to a correctly matching PCRE (PHP) pattern?

    Or, can you tell me what I'm doing or thinking wrong?


    Thank you in advance and...

    Best regards from Vienna!

    Ingmar (aka AceLine)
  2. #2
  3. Transforming Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,179
    Rep Power
    9398
    You've overcomplicated the expression and possibly introduced some problems to it. Try this:
    Code:
    ^[^$%^£*=~@\d]{2,30}( [^$%^.£*=~@\d]{2,30})+$
    1. (X+){2,30} repeats the group X+ 2 to 30 times, and each X+ can itself match any number of characters, so the 30-character limit is never enforced.
    2. Removed a couple of unneeded sets of parentheses (it's easier to read without them).
    3. You have a ^ anchor, so I assume you want a $ anchor too.
    4. Many special characters lose their meanings in character sets: you don't need to escape $ or *, and ^ only when it's the first character.
    5. Did you intend to put a period in the second character set? It's not in the first.

    With the delimiters that I'm sure you added in the code,
    PHP Code:
    preg_match('/^[^$%^£*=~@\d]{2,30}( [^$%^.£*=~@\d]{2,30})+$/', $name)
    it matches against both names.
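
A minimal sketch of point 1 and of the claim that the simplified pattern matches both names. It uses Python's `re` module as a stand-in for PCRE, which is an assumption: the syntax used here is shared by both engines, but PHP's handling of non-ASCII characters can differ depending on the `u` modifier. The short `[a-z]` patterns and the variable names are illustrative only.

```python
import re

# Nested form: the inner + already matches any number of characters,
# so {2,3} counts repetitions of the group, not characters.
nested = re.compile(r"^([a-z]+){2,3}$")
# Flat form: {2,3} counts characters directly.
flat = re.compile(r"^[a-z]{2,3}$")

long_word = "abcdefgh"  # 8 characters, well over the intended limit of 3

nested_accepts = nested.match(long_word) is not None
flat_accepts = flat.match(long_word) is not None

# The simplified pattern from the post above, copied unchanged.
name_pattern = re.compile(r"^[^$%^£*=~@\d]{2,30}( [^$%^.£*=~@\d]{2,30})+$")
names = ["Ingmar Erdös", "Wolf Dietmar Eibensteiner"]
name_results = {name: name_pattern.match(name) is not None for name in names}

print(nested_accepts, flat_accepts, name_results)
```

The nested form accepts the 8-character word even though its quantifier says `{2,3}`, while the flat form rejects it.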

    But what are you doing with this expression, exactly? Trying to validate a name? For what purposes? If you need something "perfect" then we should have a quick chat about this.
  4. #3
  5. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,959
    Rep Power
    1014
    Hi,

    apart from all the technical issues, this "validation" doesn't work at all. It accepts five spaces or "!! !!" as valid names, but if my first name happens to have only one character, you tell me I can't have that name. And according to your logic (which doesn't work out), nobody is allowed to have a last name with more than 30 characters. C'mon. Do you really wanna reject a person telling them they got a "wrong name"?
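
The objections above can be checked directly. The sketch below runs the simplified pattern from post #2 against a few inputs, again with Python's `re` as an assumed stand-in for PCRE. The sample names "J Smith" and the 31-letter last name are made up for illustration.

```python
import re

# Simplified pattern from post #2, copied unchanged.
name_pattern = re.compile(r"^[^$%^£*=~@\d]{2,30}( [^$%^.£*=~@\d]{2,30})+$")

samples = {
    "five spaces": " " * 5,
    "punctuation only": "!! !!",
    "one-letter first name": "J Smith",         # hypothetical name
    "31-letter last name": "Anna " + "x" * 31,  # hypothetical name
}

# True means the pattern accepts the input as a "valid" name.
accepted = {label: name_pattern.match(text) is not None
            for label, text in samples.items()}

for label, ok in accepted.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

Under these assumptions the two nonsense inputs are accepted and the two plausible names are rejected.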

    You also let the poor regex parser backtrack like hell, because you have it run into traps all the time: You start with a sloppy pattern so the parser matches the whole string until the end. Then suddenly you tell it you want this pattern at least twice, so the parser has to back off, reduce the first match and try again. Then suddenly you tell it you want a space, so the parser has to back off until it finds a space and try again. Then again you have it read everything until the end and then force it to back off to find a specific character -- and so on. Unless you purposely wanna stress your CPU, this is pretty much the worst you can do.

    You need to write explicit patterns. You need to tell the parser exactly what it should match rather than letting it guess.

    For example: You want <something> and then a space? Then don't let the <something> match spaces! Because if it does, it will consume the space when matching <something>, then it realizes it needs a space for the next pattern, and then it needs to go back until it finds this space.
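
One way to apply this advice, as a sketch rather than a recommended validator: add the space to the excluded characters so that each word stops at the space on its own. The `explicit` pattern below is a hypothetical variant of the pattern from post #2, and Python's `re` is again an assumed stand-in for PCRE.

```python
import re

# Pattern from post #2: the character classes still allow spaces, so the
# first word can swallow the whole string before the engine backs off.
loose = re.compile(r"^[^$%^£*=~@\d]{2,30}( [^$%^.£*=~@\d]{2,30})+$")

# Hypothetical explicit variant: a space is excluded from each word,
# so a word ends exactly where the separating space begins.
explicit = re.compile(r"^[^ $%^£*=~@\d]{2,30}( [^ $%^.£*=~@\d]{2,30})+$")

names = ["Ingmar Erdös", "Wolf Dietmar Eibensteiner"]
explicit_results = {name: explicit.match(name) is not None for name in names}

# Five spaces: accepted by the loose pattern, rejected by the explicit one.
spaces = " " * 5
loose_accepts_spaces = loose.match(spaces) is not None
explicit_accepts_spaces = explicit.match(spaces) is not None

print(explicit_results, loose_accepts_spaces, explicit_accepts_spaces)
```

The explicit variant still matches both names from the first post, but no longer treats a run of spaces as a name. It does nothing about inputs such as "!! !!".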

    A good way to understand how regex are processed is to use a parser which outputs the steps. Unfortunately, I only know "RegexBuddy", which is pretty expensive.
    Last edited by Jacques1; July 29th, 2013 at 02:10 AM.
    The 6 worst sins of security • How to (properly) access a MySQL database with PHP

    Why can’t I use certain words like "drop" as part of my Security Question answers?
    There are certain words used by hackers to try to gain access to systems and manipulate data; therefore, the following words are restricted: "select," "delete," "update," "insert," "drop" and "null".
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2013
    Location
    Vienna
    Posts
    2
    Rep Power
    0
    Hi Jacques1,
    Hi requinix,

    Thanks to both of you for your very fast replies!

    @Jacques1: Yes, it seems I forgot to exclude some characters. "!! !!" indeed shouldn't be a possible name. Thanks for that idea.

    @all: You don't seem to understand what I need this pattern for... It's very easy to explain:

    "I want people to enter their full name, not only their first name and not only their last name. I also don't want jokers to input things like "hahaha", "Ki$$ my ***" or "King Georg the 5th"... I don't want it to be too strict; I'd like it to be as natural as possible."

    People's full names are written to the database so that others can search for and possibly find them. I want to have at least 99% real names and no "dead data" in the database.

    @requinix: Yeah, that would be great. How could we start a chat? I'm not familiar with this system yet...

    Best regards to all,

    Ingmar (aka AceLine)
  8. #5
  9. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,959
    Rep Power
    1014
    Some people do not have a first name and a last name. Either for cultural reasons or because they go by their artist's name. You know, not everybody lives in the Western world and uses the name they got from their parents. Does that mean those people are not allowed to use your website?

    "Validation" also doesn't prevent anybody from cheating. If you don't let me call myself "hahaha", well, then I'll call myself "Bill Clinton". How exactly does that help you? Just because a name is syntactically valid (according to your definition) doesn't mean that it's true.

    I think there's a fundamental misunderstanding. First of all, you cannot "validate" human names. If you try, you'll discriminate against a lot of people based on cultural arrogance or naivety. Secondly, "validation" does not prevent cheating. The best it can do is help the user recognize typos. But if the user doesn't want to enter real data, there's nothing you can do about that.

    Make a separate text field for the last name to tell your users what you expect. But don't try to enforce "correct names".
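
A minimal sketch of what non-enforcing handling could look like, assuming two form fields. The function name `normalize_name_fields` and the field names are hypothetical. It only tidies whitespace and reports empty fields as a hint, and never rejects a name as "wrong".

```python
def normalize_name_fields(given_names: str, last_name: str) -> dict:
    """Collapse whitespace and list empty fields, without judging the names."""
    cleaned = {
        "given_names": " ".join(given_names.split()),
        "last_name": " ".join(last_name.split()),
    }
    # An empty field is only a hint for the user (a possible oversight),
    # not grounds for refusing the submission.
    missing = [field for field, value in cleaned.items() if not value]
    return {"fields": cleaned, "missing": missing}


full = normalize_name_fields("  Wolf   Dietmar ", "Eibensteiner")
mononym = normalize_name_fields("Madonna", "")

print(full)
print(mononym)
```

What to do with the `missing` hint (for example, asking the user to confirm) is left open here.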

    Comments on this post

    • Laurent_R agrees
  10. #6
  11. Transforming Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,179
    Rep Power
    9398
    Names are like email addresses: you can make as cool a filter as you want, but if I don't want to use my real one then I'll just make something valid but wrong. Bill Clinton and prezbillyjeff@whitehouse.lol.

    Consider a reporting or moderation system capable of renaming people if they're using something invalid or obviously wrong, because whatever regex you dream up will almost certainly exclude somebody's name. Now realistically you can do things like not allow common symbols, but keep in mind parents these days are doing increasingly weird things with their children's names.

    Comments on this post

    • Laurent_R agrees
