#1
  Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2002
    Posts
    395
    Rep Power
    31

    Dissecting RegExp


    I've got a regexp in a script I'm troubleshooting. To know what I need to keep and what I need to chuck, it will help if I understand what this regexp does.
    Code:
    preg_replace("/[^\da-z.]/i", "", $imgfile_name);
    $imgfile_name holds the name of an uploaded file.

    The goal is to strip from the filename every character that Windows allows but that causes problems when the file is saved on a Unix system. Ultimately, I'd like an elegant regexp that allows only upper- and lowercase letters, digits, and hyphens. No underscores, spaces, #, ?, tildes, brackets, or the other characters I've seen people include.

    Thanks.

    HeadElf
    OfficeElf.com
#2
  Contributing User
    Devshed Newbie (0 - 499 posts)

    I've tested it, and it strips everything except letters (case insensitive), digits, and periods. I tweaked it to also allow - and paired it with a str_replace to strip out any _, and it works.

    HeadElf
#3
  Did you steal it?
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,056
    Rep Power
    9398
    Aww, and here I had a nice breakdown of that expression...

    If you want to permit hyphens but not underscores, use:
    Code:
    /[^\da-z.-]/i
    No str_replace needed - just one call to preg_replace.
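
    For reference, here is a minimal sketch of how that character class reads piece by piece. The function name sanitize_filename and the sample filename are hypothetical, made up for illustration; they are not from the original script.

```php
<?php
// Hypothetical helper; the name is an assumption, not part of the original script.
function sanitize_filename(string $name): string
{
    // [^ ... ]  negated class: match any single character NOT listed inside
    // \d        digits 0-9
    // a-z       letters; the trailing /i flag makes this cover A-Z as well
    // .         a literal period (no escaping needed inside a character class)
    // -         a literal hyphen, because it is the last character in the class
    // Every character matched by the class is replaced with an empty string.
    return preg_replace('/[^\da-z.-]/i', '', $name);
}

echo sanitize_filename('My Photo_#1 (final)~v2.JPG'), "\n"; // MyPhoto1finalv2.JPG
```

    The period stays in the class so the file extension survives. Dropping the . from the class would strip periods too, which is probably not what you want for uploaded images.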
#4
  Contributing User
    Devshed Newbie (0 - 499 posts)

    I had switched to $res=preg_replace("/[^\d\w.-]/i", "", $string); but it still included the underscore in the result. Your bit doesn't. Thanks!

    HeadElf
#5
  Did you steal it?
    Devshed Supreme Being (6500+ posts)

    Originally Posted by HeadElf
    but it still included the underscore in the result
    Just as \d matches digits, \w matches letters (uppercase and lowercase), digits, and underscores, so a negated class that contains \w never strips underscores.
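
    A quick side-by-side comparison of the two patterns from this thread makes the difference visible. The sample filename is an assumption picked for the demonstration.

```php
<?php
// Sample input chosen for illustration only.
$name = 'my_file-01.txt';

// \w already covers letters, digits, and "_", so the negated class
// treats the underscore as allowed and leaves it in place.
$withW = preg_replace('/[^\d\w.-]/i', '', $name);

// Listing the allowed characters explicitly leaves "_" outside the class,
// so it is matched by the negation and removed.
$explicit = preg_replace('/[^\da-z.-]/i', '', $name);

echo $withW, "\n";    // my_file-01.txt
echo $explicit, "\n"; // myfile-01.txt
```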
#6
  Contributing User
    Devshed Newbie (0 - 499 posts)

    Wonderful! Thanks!

    HeadElf
