#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2006
    Posts
    5
    Rep Power
    0

    RegEx: How to remove non-ASCII characters from a string


    Hi,

    I am new to regex. Please help me.

    I use ASP, but I don't think it matters.

    I am parsing large text files, and some of them contain non-standard characters. I want to go through each file and remove anything that cannot be typed on a US keyboard.

    I just want to keep:

    A-Z, a-z, 0-9, .,?'"!@#$%^&*()-_=+;:<>/\|}{[]`~

    Everything else I want to remove (Asian letters, wacky characters, etc.).

    Any help would be appreciated. Thanks in advance.
  2. #2
  3. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    May 2006
    Location
    Kent, England
    Posts
    857
    Rep Power
    575
    Try this

    Code:
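    <%
    	' Replace every match of Pattern in Str with Replacement.
    	Function RegExpReplace(Str, Pattern, Replacement)
    		Dim objRegExp
    		Set objRegExp = New RegExp
    		objRegExp.Pattern = Pattern
    		objRegExp.Global = True	' replace all matches, not just the first
    		RegExpReplace = objRegExp.Replace(Str, Replacement)
    		Set objRegExp = Nothing
    	End Function
    	
    	strTest = "abcd2""$$"
    	' Negated class: matches runs of characters that are NOT on the keep list.
    	' The hyphen is escaped (\-) so it is a literal, not a range operator,
    	' and + is used instead of * so the pattern never matches an empty string.
    	strPattern = "[^A-Za-z0-9 \.,\?'""!@#\$%\^&\*\(\)\-_=\+;:<>\/\\\|\}\{\[\]`~]+"
    	strReplace = ""
    	
    	Response.Write strTest
    	Response.Write "<br>"
    	Response.Write RegExpReplace(strTest, strPattern, strReplace)
    %>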
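
For anyone who wants to experiment with this kind of whitelist outside ASP, here is a minimal sketch in Python (chosen only because it is easy to run; it is not the ASP code from this post). The names `ALLOWED`, `NOT_ALLOWED` and `strip_disallowed` are made up for illustration, and "US keyboard" is assumed to mean ASCII letters, digits, punctuation and the space. It also shows one thing to watch for in a long hand-written character class: an unescaped hyphen between two characters is read as a range.

```python
import re
import string

# Assumption: "US keyboard" = ASCII letters, digits, punctuation, and the space.
ALLOWED = string.ascii_letters + string.digits + string.punctuation + " "

# re.escape() backslash-escapes "-", "]", "^" and "\" so that none of them
# acts as a metacharacter inside the character class.
NOT_ALLOWED = re.compile("[^" + re.escape(ALLOWED) + "]+")


def strip_disallowed(text: str) -> str:
    """Remove every run of characters that is not in ALLOWED."""
    return NOT_ALLOWED.sub("", text)


print(strip_disallowed('abc\u00e9d\u4e2d2"$$'))   # abcd2"$$

# An unescaped hyphen between two characters is a range: ")" to "_" spans
# code points 0x29-0x5F, which takes in "*", "-" and "Z" (but not "a").
print(re.findall(r"[)-_]", "a-Z*"))    # ['-', 'Z', '*']
# Escaped, the hyphen is just a hyphen.
print(re.findall(r"[)\-_]", "a-Z*"))   # ['-']
```

Building the class from a plain string and escaping it mechanically avoids having to decide by hand which characters need a backslash.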

    Comments on this post

    • meishern agrees : good advice. works.
    It turns out there are stupid questions. And I don't know the answers!
    Over 50? Visit the Saga Zone - Social Networking for the Over 50's





    For every action there is an equal and opposite - government program
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2006
    Posts
    5
    Rep Power
    0
    Thank you.

    Someone suggested another way on a different forum. This seems to work as well:


    ' Remove every character outside the printable ASCII range (space, \x20, through ~, \x7E).
    Set objRegExp = New RegExp
    objRegExp.Global = True
    objRegExp.Pattern = "[^\x20-\x7E]"
    EditorialReview = objRegExp.Replace(EditorialReview, "")
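
One caveat when applying a range like this to whole text files: tab, carriage return and line feed all sit below `\x20`, so they are stripped along with the non-ASCII characters. The sketch below, in Python rather than ASP purely so it can be run anywhere, compares the range with a variant that keeps those three characters. The names `PRINTABLE_ONLY` and `KEEP_LINE_BREAKS` and the sample string are made up for illustration.

```python
import re

# Same range as in the post: anything outside space (0x20) through ~ (0x7E).
PRINTABLE_ONLY = re.compile(r"[^\x20-\x7E]")

# Variant that also keeps tab (0x09), line feed (0x0A) and carriage return (0x0D).
KEEP_LINE_BREAKS = re.compile(r"[^\x09\x0A\x0D\x20-\x7E]")

sample = "caf\u00e9\tmenu\r\nprix: 5\u20ac\n"

print(repr(PRINTABLE_ONLY.sub("", sample)))    # 'cafmenuprix: 5'
print(repr(KEEP_LINE_BREAKS.sub("", sample)))  # 'caf\tmenu\r\nprix: 5\n'
```

If the text is processed one line at a time, the first form may be all that is needed. For a whole file read into a single string, the second form is probably the safer choice.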

    Comments on this post

    • ElijaTheGold agrees : Sweet
  6. #4
  7. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    May 2006
    Location
    Kent, England
    Posts
    857
    Rep Power
    575
    I didn't know regexps could match on hex values (which
    is what it looks like it's doing). I'll have to give it a
    try at some point.
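
A quick way to check what a hex range covers is to test every code point against it. This is a small sketch in Python (used only because it is easy to run; the `\xHH` escape is assumed to behave the same way in the VBScript RegExp object, as the pattern in post #3 suggests). The variable names are illustrative.

```python
import re

# \x41 stands for the character with code point 0x41, which is "A".
assert re.fullmatch(r"\x41", "A")

# Collect every code point below 256 that the range [\x20-\x7E] accepts.
printable = re.compile(r"[\x20-\x7E]")
matched = [cp for cp in range(256) if printable.fullmatch(chr(cp))]

print(len(matched), hex(matched[0]), hex(matched[-1]))  # 95 0x20 0x7e
```

The 95 characters are the space plus every letter, digit and punctuation mark in ASCII, which matches the keep list from the first post.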
