#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    2
    Rep Power
    0

    Help with substring query


    I'm a complete newb on MySQL so this is probably very basic stuff, but I'm stumped.

    Anyway, in my database is a column for product-codes where each entry starts with 3-6 upper or lowercase letters, may or may not be followed by a separator (".", " ", or "-") and a 5-7 digit number.

    ("asdf1234567" or "sDfa 1234567" or "FDSA-1234567"... you get the picture)

    I want to make a sorted list of all unique entries (asdf, sDfa, FDSA ...), without the digits.

    So, remove the digits, remove duplicates, sort remaining list. Seems like it should be easy to accomplish, but I'm failing miserably.
  2. #2
  3. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,959
    Rep Power
    1014
    Hi,

    this looks like an issue with insufficient normalization. When those letters and digits have a certain meaning, they should actually be in separate fields to allow searching, grouping etc. with normal database functionalities.

    This might be less important in other database systems, but MySQL really sucks at regexes, so it's not a good idea to stuff different information in a single string.
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Location
    Italy
    Posts
    36
    Rep Power
    2
    Originally Posted by WilfTarquin
    I'm a complete newb on MySQL so this is probably very basic stuff, but I'm stumped.

    Anyway, in my database is a column for product-codes where each entry starts with 3-6 upper or lowercase letters, may or may not be followed by a separator (".", " ", or "-") and a 5-7 digit number.

    ("asdf1234567" or "sDfa 1234567" or "FDSA-1234567"... you get the picture)

    I want to make a sorted list of all unique entries (asdf, sDfa, FDSA ...), without the digits.

    So, remove the digits, remove duplicates, sort remaining list. Seems like it should be easy to accomplish, but I'm failing miserably.
    You should change the design of your db. Write a function which gets the string part of the field, so that:
    YOUR_FUNCTION('FDSA-1234567') = 'FDSA'

    Here you don't know how to write a Stored Function, start from here:
    http://dev.mysql.com/doc/refman/5.5/en/create-procedure.html

    You can use the SUBSTRING() function in a loop to read the string char by char:
    http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_substring

    Now, use your function to copy the data in a new table (statement: CREATE TABLE SELECT) and then build a unique index on it to delete duplicates (statement: ALTER TABLE IGNORE ADD UNIQUE).

    At this point, you can read ordered results using a normal SELECT ... ORDER BY field.

    For the future, turn your stored function into a INSERT trigger, so that it only get the letters from NEW.column_name:
    http://dev.mysql.com/doc/refman/5.0/en/create-trigger.html

    BUT...
    if you are confused and you understood nothing from what I've written, then you are not ready to use any relational database
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Location
    Italy
    Posts
    36
    Rep Power
    2
    @Jacques1 I forgot to answer you: REGEXP operator in MySQL returns 1 or 0. It is meant to validate data, not to transform strings.
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    2
    Rep Power
    0
    Originally Posted by Jacques1
    this looks like an issue with insufficient normalization.
    Yes. It is "inherited" data which I'm hoping to clean up.
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2012
    Posts
    95
    Rep Power
    3
    Hi Wilf,

    I can totally relate. Often have to deal with data that I cannot modify.

    Here's something that might work:
    select CASE
    WHEN (substr(product_codes,4,1) regexp '[^a-z]' THEN left(product_codes,3)
    WHEN (substr(product_codes,5,1) regexp '[^a-z]' THEN left(product_codes,4)
    WHEN (substr(product_codes,6,1) regexp '[^a-z]' THEN left(product_codes,5)
    ELSE left(product_codes,6)
    END AS new_product_codes
    from ...
  12. #7
  13. No Profile Picture
    Contributing User
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Mar 2006
    Posts
    2,481
    Rep Power
    1752
    Don't you want/need to check for longest first as a string starting with 5 letters will also return true for the regex of a substring checking for 4 letters.
    The moon on the one hand, the dawn on the other:
    The moon is my sister, the dawn is my brother.
    The moon on my left and the dawn on my right.
    My brother, good morning: my sister, good night.
    -- Hilaire Belloc
  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2012
    Posts
    95
    Rep Power
    3
    Originally Posted by SimonJM
    Don't you want/need to check for longest first as a string starting with 5 letters will also return true for the regex of a substring checking for 4 letters.
    the query checks the reverse: that the 4th is not a letter.
    if the 4th is not a letter, i don't want to check if the 5th is a letter, since it doesn't matter
  16. #9
  17. No Profile Picture
    Contributing User
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Mar 2006
    Posts
    2,481
    Rep Power
    1752
    Ahh, the ^ is the negation not the start anchor!
    The moon on the one hand, the dawn on the other:
    The moon is my sister, the dawn is my brother.
    The moon on my left and the dawn on my right.
    My brother, good morning: my sister, good night.
    -- Hilaire Belloc

IMN logo majestic logo threadwatch logo seochat tools logo