Regex Programming
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
User Name:
Password:
Remember me

The Shed is going Social! Join us on FaceBook and Twitter and chime in on the conversation.

Go Back   Dev Shed ForumsProgramming Languages - MoreRegex Programming

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Rate Thread Display Modes
 
Unread Dev Shed Forums Sponsor:
  #1  
Old November 18th, 2012, 12:15 PM
boxersoft boxersoft is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Aug 2012
Posts: 15 boxersoft User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 4 h 17 m 28 sec
Reputation Power: 0
PHP - Test for repeated items in a delimited list

Given a string of items separated by a delimiter character, how can I test to see if any of those items appears more than once anywhere in the list? For example:

Michael Jackson/Phil Collins/Peter Gabriel/Lionel Ritchie/George/Phil Collins/Michael/Don Henley

... should be seen as having a duplicate (Phil Collins).

Delimiters can be added to the start and end of the lists and/or doubled-up (e.g. //) if that helps.

Thanks.

Reply With Quote
  #2  
Old November 18th, 2012, 12:37 PM
Laurent_R Laurent_R is offline
Contributing User
Dev Shed Novice (500 - 999 posts)
 
Join Date: Jun 2012
Posts: 550 Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level) 
Time spent in forums: 5 Days 3 h 26 m 28 sec
Reputation Power: 406
I probably would not use regexes for that. Split your line, store the elements in an array (or a map, a hash table, a dictionary, whatever data structure is the best in PHP) and check for duplicate entries.

Reply With Quote
  #3  
Old November 18th, 2012, 12:45 PM
boxersoft boxersoft is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Aug 2012
Posts: 15 boxersoft User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 4 h 17 m 28 sec
Reputation Power: 0
Certainly an option - it just felt like something that ought to be achieveable with a single call to preg_match(). If not, then fair enough.

Thanks for the reply.

Reply With Quote
  #4  
Old November 19th, 2012, 06:28 AM
Laurent_R Laurent_R is offline
Contributing User
Dev Shed Novice (500 - 999 posts)
 
Join Date: Jun 2012
Posts: 550 Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)Laurent_R User rank is Lieutenant Colonel (40000 - 50000 Reputation Level) 
Time spent in forums: 5 Days 3 h 26 m 28 sec
Reputation Power: 406
The point is that is is fairly easy to apply a regaex so that you have:
Michael Jackson -> 1st match ($1 or /1)
Phil Collins -> 2nd match
Peter Gabriel -> 3rd match
Lionel Ritchie -> 4th match.

Let's stop here, we know there hasn't been any duplicate so far and one is coming (although the program does not know either, so let see how it can find out).

Now, you get to the first duplicate, Phil Collins. How do you know it is a duplicate? You just can't unless you are prepared to go through every single match so far, and that's where it becomes difficult. Within a regex, you dont really know how many matches you made so far, so you can't really loop on those matches. And even if you did know how to do it, you'd have no other choice that just trying every single match so far. So, for every new word (here, for the sake of the discussion, "Phil Collins" is a single word in my mind), you have to check all the words recorded so far. This is quickly becoming very unefficient.

Whereas if you use a hash table (or whatever data structure, I would use a hash in Perl), checking whether you have already met that word is very fast.

If you know about the "big Oh" notation, we can say that checking for one word in a list is a search in O(n) complexuty (with n the number of words so far). Doing it for each new word has a complexity of O(nē);. Searching a word in a hash is a search in 0(1); searching over for every new word has a complexity of O(n). The difference between O(n) and 0(nē) is very significant, it is actually a huge difference as soon as you start to have more than a few words..

So in brief, regex are impractical for this type of problem (as far as I can say, I may miss the clever idea that does it), but if you found a practical way to use them, they would be extremely likely to be very unefficient compared to the hash table solution.

Reply With Quote
  #5  
Old November 19th, 2012, 06:34 AM
boxersoft boxersoft is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Aug 2012
Posts: 15 boxersoft User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 4 h 17 m 28 sec
Reputation Power: 0
Point taken - I am already proceeding with a hash-based approach.

Thanks very much.

Reply With Quote
Reply

Viewing: Dev Shed ForumsProgramming Languages - MoreRegex Programming > PHP - Test for repeated items in a delimited list

Developer Shed Advertisers and Affiliates



Thread Tools  Search this Thread 
Search this Thread:

Advanced Search
Display Modes  Rate This Thread 
Rate This Thread:


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
View Your Warnings | New Posts | Latest News | Latest Threads | Shoutbox
Forum Jump

Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
  
 


Powered by: vBulletin Version 3.0.5
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.

© 2003-2013 by Developer Shed. All rights reserved. DS Cluster - Follow our Sitemap