April 15th, 2013, 10:28 AM
Mysql regex alphanumerical with some special chars
Alright, I've had a some nightmares with it already so i bow my stupid head to the almighty hive-mind.
I run a small DB with the codes for spare parts. The codes are alphanumerical and i need some proper way to search for them. They can include special chars like parentheses (), and/or @, #, &, $, <, >.
So the codes could look like:
"A", "Bb", "2C8", "A7(BO)19", "B29H$", "29H(6JI)0#", "29H(6JI)0#<O>", etc
The problem is that all these special chars in codes are optional. And the user should be able to find the code by at least its alphabetical part. So the query is something like:
WHERE column REGEXP '^[a-z0-9\@\#\$\&\<>()]*
AND column LIKE CONCAT('%', '$user_value', '%')
The DB doesnt return me "S(LJ)9" or "09S(LJ)3$" if i seek for "SLJ" or "S(LJ)"
Would appreciate any help.
April 15th, 2013, 11:25 AM
the regexp value is missing the closing quote
April 15th, 2013, 01:20 PM
Thanks, you are right, its missing over here and it should be:
Originally Posted by r937
WHERE column REGEXP '^[a-z0-9\@\#\$\&\<>()]*'
AND column LIKE CONCAT('%', '$user_value', '%');
However its not missing in the code. The query is generated by the script is displayed for test purposes and then i check it in MySQL Workbench. For some things it works, for some it doesn'n.
April 15th, 2013, 07:53 PM
um, just guessing, but try this --
April 16th, 2013, 05:17 AM
Thanks, r937, I really missed this part, but i need also to be able to search for "S(LJ)9" with "SLJ". Any ideas?
Originally Posted by r937
April 16th, 2013, 05:52 AM
on the page in da mysql manual for regexp, one of the users posting comments says you have to put square brackets around any parentheses you want
not sure if that's even valid, but give it a try
April 16th, 2013, 07:03 AM
no, it's neither correct nor necessary.
Originally Posted by r937
The square brackets denote character classes (just like the one that's already there). And since parentheses in character classes are always interpreted as literal characters rather than delimiters, you can use this as a kind of poor man's escaping.
But the parentheses in the example already are in a character class, so no escaping necessary. If you still do it, you add literal brackets to the character class, so now it also matches "[" and "]" (the second occurence of each bracket has no effect).
When you actually need to escape characters within regexes, use a double backslash.
As to the original question:
Your problem is not the regex but the LIKE. The string "S(LJ)9" simply isn't LIKE "SLJ", regardless of how many regexes it has passed earlier.
If you wanna ignore all special characters in the column, remove the special characters from the input string and then make a a regex by placing a [@#$&<>()] character class between each character. Note: Do not try to escape "$" or "&" with single backslashes. That again is neither necessary (depending on the context) nor valid. A regex escape is a double backslash. A single backslashes is an escape character within the MySQL string.
Your example "SLJ" would become the regex
This matches "S(LJ)9", "09S(LJ)3$" and any string consisting of "SLJ" with possible "special characters" before, after or between the characters.
Not pretty, not efficient, but you asked for it.
Last edited by Jacques1; April 16th, 2013 at 07:15 AM.
April 16th, 2013, 08:30 AM
Thanks, guys. I've altered the query so the single slashes are not withing the character class brackets "" anymore. Seems to work the same as it did with slashes.
Thanks for the advice, Jacques1, I will try to seek the way you told me. I will seek for the records WHERE REGEXP '^[a-zA-Z0-9@#$&<>()]*' or '*regexp after each char as you told me*'. I think that will do the trick. That Its not efficient, i agree, but the table itself is only 30K of rows or less, so it will do for now.
April 16th, 2013, 09:13 AM
You don't need the old regex, just use the one I gave you.
I'm not even sure what you're trying to do with your regex, since it's not relevant for the search. Do you wanna check if the codes are valid? That's something for an INSERT and UPDATE trigger (or CHECK constraints in other database systems). You certainly don't validate data everytime somebody does a SELECT.
April 16th, 2013, 09:34 AM
I need to do something that would allow me to find the code from as minimal of the user input as possible. The idea is to create a pattern combined with the user-input to help the search. Am i really missing something?
April 16th, 2013, 09:49 AM
And that's exactly what the regex I gave you does. I don't see why you keep your own regex. What is that supposed to do?
Originally Posted by Rhiozan
Actually, your regex does absolutely nothing except keeping the CPU busy. It matches any string, because you ask for zero or more of certain characters at the beginning of the string. Obviously, this is a tautology: Either a string has those characters (> 0 occurences), or it doesn't have those characters (0 occurences).
April 16th, 2013, 10:47 AM
Thank you, Jacques1. The regex you provided worked.
But how do i restrict the search to the amount of alphabetical chars the user submitted? For instance i have the value "S9" that is stored in the db and that want to find with input of "S". But the problem is, the search returns irrelevant values like S(P19)12, SOIU and so on...
April 16th, 2013, 12:12 PM
Not sure if I understand you correctly. So before and after the search string, there mustn't be any other alphabetic characters?
So "SLJ" should match "4SL5J" or "(SL)J10" but not "SLJC" or "ASLJ"?
April 16th, 2013, 03:11 PM
Seems to be working fine! Thank you!