Regex Programming
 
  #1  
Old March 18th, 2012, 10:36 PM
FOBioPatel
Other - Need to use REGEX to catch all variations of 4 bad words

I can parse chat in real time in a public space using regex, and I have a handful of folks using racial and homophobic slurs, which need to be blocked.

I have written the following regex patterns, but I need a better-trained eye to review them and optimize them for efficiency and thoroughness.

There are 4 chat patterns specifically that I am trying to catch and block:

Code:
"chatpattern"		"[nN]+[!1iI]+[Gg]+[3ueaUEA]+[rR]+([sS])?+([:punct:])?+"
"chatpattern"		"[fF]+[@a4A]+[gG]+[aAoOuU0]+[tT]+([sS])?+([:punct:])?+"
"chatpattern"		"[fF]+[@a4A]+[gG]+([sS])?+([:punct:])?+"
"chatpattern"		"[Nn]+[!1iI]+[gG]+([sS])?+([:punct:])?+"

  #2  
Old March 18th, 2012, 10:47 PM
FOBioPatel
I have revised my filter as follows:

Code:
		"chatpattern"		"[nN]+[!1iI]+[Gg]+[3ueaUEA]+[rRsS]+"
		"chatpattern"		"[fF]+[@a4A]+[gG]+[aAoOuU0]+[tTsS]+"
		"chatpattern"		"[fF]+[@a4A]+[gG]+([sS])?+"
		"chatpattern"		"[Nn]+[!1iI]+[gG]+([sS])?+"

  #3  
Old March 19th, 2012, 03:27 AM
abareplace
You should check for word boundaries with \b to prevent false positives. The last pattern matches "nig" in "night".
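
To illustrate the false-positive problem, here is a minimal Python sketch. It assumes Python's re module rather than whatever engine the chat filter uses, and the word "bad" is only a hypothetical placeholder for a filtered word, written in the same repeated-letter style as the patterns above.

```python
import re

# Hypothetical placeholder standing in for a filtered word: "bad",
# with repeated letters allowed, in the style of the patterns above.
unanchored = re.compile(r"b+a+d+", re.IGNORECASE)
anchored = re.compile(r"\bb+a+d+\b", re.IGNORECASE)

def is_blocked(pattern, message):
    """Return True if the pattern matches anywhere in the message."""
    return pattern.search(message) is not None

for message in ["that was bad", "BAAAD idea", "nice badge"]:
    # Print the message with the verdict of each pattern.
    print(message, is_blocked(unanchored, message), is_blocked(anchored, message))
```

The unanchored pattern flags "nice badge" because "bad" sits inside "badge"; the anchored one does not. The trade-off is that \b also stops the pattern from matching when the word is glued to other letters, so it is a choice between false positives and missed matches.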

  #4  
Old March 19th, 2012, 02:01 PM
ragax
Hi FOBioPatel,
Welcome to the forum!

In addition to abareplace's comment (you would use \b around some words, or the equivalent syntax for your regex engine), here are some suggestions on your revision.

- It looks like each of your character classes contains the same letter in upper and lower case. Maybe your regex engine has a "case-insensitive" mode?
- You have a + after each character class, meaning "one or more" of the elements in the class. This makes sense to me for repeated letters such as the G, but are you sure you want that everywhere?
- Your last two patterns have (parentheses) which (i) capture the s in Group 1 (unneeded, I assume) and (ii) are not needed for the regex to function.
- Your last two patterns have a "?+" modifier, which I am fairly sure is not what you intend. In engines that support it, the ? makes the s optional and the + makes that quantifier possessive (the engine will not backtrack into it); other engines may reject it or read it differently. Guessing that your intent is to make the s optional, a simple ? would be enough.
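
The suggestions above can be sketched in Python as follows. This assumes Python's re module, which may differ from the chat filter's engine, and uses the hypothetical placeholder word "bad" with a couple of leetspeak variants instead of the real patterns. The verbose form already uses a plain ? because "?+" is not portable across engines.

```python
import re

# Hypothetical placeholder word "bad"; not one of the real patterns.
# Verbose style: both cases spelled out, capturing group around the s.
verbose = re.compile(r"[bB]+[aA@4]+[dD]+([sS])?")

# Compact style: case-insensitive flag, no group, plain optional s.
compact = re.compile(r"b+[a@4]+d+s?", re.IGNORECASE)

samples = ["bad", "BADS", "b@d", "bbb44dd", "bed", "ba"]

# For each sample, record whether each pattern matches the whole string.
results = [
    (s, bool(verbose.fullmatch(s)), bool(compact.fullmatch(s)))
    for s in samples
]

for sample, verbose_hit, compact_hit in results:
    print(sample, verbose_hit, compact_hit)
```

Both patterns give the same verdict on every sample, but the compact one is shorter and defines no capture group.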

By way of example, in case-insensitive mode (if available), your first regex could be simplified to this:
Code:
n[!1i]g+[3uea]rs?

Here, we're not using word boundaries because you'd be happy to match that pattern even when embedded in more characters.
Also, you can drop the s?, because once you're past the r, you know you have a match:
Code:
n[!1i]g+[3uea]r
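
As a quick check of that last point, the following Python sketch compares a pattern with and without a trailing optional s. It assumes Python's re module and a search-anywhere style of matching, and it uses the hypothetical placeholder word "bad" rather than the real pattern.

```python
import re

# Hypothetical placeholder patterns; only the trailing "s?" differs.
with_suffix = re.compile(r"b[a@4]d+s?", re.IGNORECASE)
without_suffix = re.compile(r"b[a@4]d+", re.IGNORECASE)

messages = ["bad", "bads", "so baddd", "B@DS!", "bed", "nothing here"]

# For each message, record whether each pattern finds a match anywhere.
verdicts = [
    (m, bool(with_suffix.search(m)), bool(without_suffix.search(m)))
    for m in messages
]

for message, hit_with, hit_without in verdicts:
    print(message, hit_with, hit_without)
```

The block/allow verdict is identical for both patterns. Only the matched text differs (for "bads", one matches "bads" and the other "bad"), which would matter if the filter masked the matched text instead of blocking the whole message.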


Let us know if you need more help with this.
Wishing you a fun week.