Ruby Programming
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
User Name:
Password:
Remember me

The Shed is going Social! Join us on FaceBook and Twitter and chime in on the conversation.

Go Back   Dev Shed ForumsProgramming LanguagesRuby Programming

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Rate Thread Display Modes
 
Unread Dev Shed Forums Sponsor:
  #1  
Old October 12th, 2007, 11:37 PM
tschneit tschneit is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Oct 2007
Posts: 7 tschneit User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 1 h 18 m 33 sec
Reputation Power: 0
Set-Based Search

Hello -
I need to efficiently search a large number of sets to try and find a match, or near match, for a given set.
This is to take place in a web-application. Has anyone got any suggestions on how I might do this while being reasonable fast and efficient? Oh, I'll be doing it in ruby, which is why I posted in here.

All ideas I've come up with so far seem to be leading nowhere.

Thanks for any input!
-tschneit

Reply With Quote
  #2  
Old October 16th, 2007, 12:17 PM
[H4z3] [H4z3] is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jul 2007
Location: /usr/bin/ruby
Posts: 63 [H4z3] User rank is Sergeant Major (2000 - 5000 Reputation Level)[H4z3] User rank is Sergeant Major (2000 - 5000 Reputation Level)[H4z3] User rank is Sergeant Major (2000 - 5000 Reputation Level)[H4z3] User rank is Sergeant Major (2000 - 5000 Reputation Level)[H4z3] User rank is Sergeant Major (2000 - 5000 Reputation Level)[H4z3] User rank is Sergeant Major (2000 - 5000 Reputation Level) 
Time spent in forums: 20 h 32 m 36 sec
Reputation Power: 27
What do you have so far?

Reply With Quote
  #3  
Old October 16th, 2007, 12:29 PM
L7Sqr L7Sqr is offline
Contributing User
Dev Shed Novice (500 - 999 posts)
 
Join Date: Jan 2004
Location: Constant Limbo
Posts: 989 L7Sqr User rank is Major (30000 - 40000 Reputation Level)L7Sqr User rank is Major (30000 - 40000 Reputation Level)L7Sqr User rank is Major (30000 - 40000 Reputation Level)L7Sqr User rank is Major (30000 - 40000 Reputation Level)L7Sqr User rank is Major (30000 - 40000 Reputation Level)L7Sqr User rank is Major (30000 - 40000 Reputation Level)L7Sqr User rank is Major (30000 - 40000 Reputation Level)L7Sqr User rank is Major (30000 - 40000 Reputation Level)L7Sqr User rank is Major (30000 - 40000 Reputation Level)L7Sqr User rank is Major (30000 - 40000 Reputation Level) 
Time spent in forums: 2 Weeks 2 Days 22 h 45 m 6 sec
Reputation Power: 362
Send a message via AIM to L7Sqr
What are these 'sets'? Do you have the option of pushing the search out to a separate thread to offset any delay? Have you investigated what is out there? If so how does your problem differ?
What you ask is rather generic; there are many existing solutions that I'm sure will fit your problem. The above questions are some items that will help us (and you) to better understand and analyze the problem.
__________________
True happiness is not getting what you want, it's wanting what you've already got.

My Blog

Reply With Quote
  #4  
Old October 17th, 2007, 12:09 AM
tschneit tschneit is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Oct 2007
Posts: 7 tschneit User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 1 h 18 m 33 sec
Reputation Power: 0
I think this should cover those questions relatively well... I posted in the PHP forum as well, as this is more of a theory question than anything having to do with a particular language. Follow this link:
http://forums.devshed.com/php-development-5/set-based-search-481432.html


Thanks!

Reply With Quote
  #5  
Old October 17th, 2007, 07:43 AM
L7Sqr L7Sqr is offline
Contributing User
Dev Shed Novice (500 - 999 posts)
 
Join Date: Jan 2004
Location: Constant Limbo
Posts: 989 L7Sqr User rank is Major (30000 - 40000 Reputation Level)L7Sqr User rank is Major (30000 - 40000 Reputation Level)L7Sqr User rank is Major (30000 - 40000 Reputation Level)L7Sqr User rank is Major (30000 - 40000 Reputation Level)L7Sqr User rank is Major (30000 - 40000 Reputation Level)L7Sqr User rank is Major (30000 - 40000 Reputation Level)L7Sqr User rank is Major (30000 - 40000 Reputation Level)L7Sqr User rank is Major (30000 - 40000 Reputation Level)L7Sqr User rank is Major (30000 - 40000 Reputation Level)L7Sqr User rank is Major (30000 - 40000 Reputation Level) 
Time spent in forums: 2 Weeks 2 Days 22 h 45 m 6 sec
Reputation Power: 362
Send a message via AIM to L7Sqr
You speak of a problem closely related to the 'knapsack problem' (google for details) which is known to be NP-Hard.
Based on your replies in the other thread, I would suggest a Trie. A Trie will allow you (if you sort your sets) to 'tag' node that represent the end of a set. Once you have this, you can do a search from that point to see what comes close.
For instance, if you have a node in the tree that represents the end of a set (at depth n) you can look at all nodes that are at depth n + 1 and link to the end node mentioned above. That means they all contain only one more item than the set you have. You can also do the same from the parent node of your end point (depth n - 1) and you will have all sets that have an equal number of elements different by only one from your set. You can see that teh search expands exponentially, but with a good sort (*) you can find close matches fast when they exist.
Realize that this depends on the elements of your set as well. Some things (such as strings and numbers) lend themselves to this rather nicely, but objects in general may not fit the 'ordered' model.

(*) By sort I mean that the elements will have to be ordered by some standard (meaningful) method in your Trie. If you order elements in an inconvenient way you could end up with the search needing to return to the root of the Trie before following the closest match.

Reply With Quote
  #6  
Old October 17th, 2007, 12:11 PM
Scorpions4ever's Avatar
Scorpions4ever Scorpions4ever is offline
Banned ;)
Dev Shed God 9th Plane (9000 - 9499 posts)
 
Join Date: Nov 2001
Location: Woodland Hills, Los Angeles County, California, USA
Posts: 9,390 Scorpions4ever User rank is General 46th Grade (Above 100000 Reputation Level)Scorpions4ever User rank is General 46th Grade (Above 100000 Reputation Level)Scorpions4ever User rank is General 46th Grade (Above 100000 Reputation Level)Scorpions4ever User rank is General 46th Grade (Above 100000 Reputation Level)Scorpions4ever User rank is General 46th Grade (Above 100000 Reputation Level)Scorpions4ever User rank is General 46th Grade (Above 100000 Reputation Level)Scorpions4ever User rank is General 46th Grade (Above 100000 Reputation Level)Scorpions4ever User rank is General 46th Grade (Above 100000 Reputation Level)Scorpions4ever User rank is General 46th Grade (Above 100000 Reputation Level)Scorpions4ever User rank is General 46th Grade (Above 100000 Reputation Level)Scorpions4ever User rank is General 46th Grade (Above 100000 Reputation Level)Scorpions4ever User rank is General 46th Grade (Above 100000 Reputation Level)Scorpions4ever User rank is General 46th Grade (Above 100000 Reputation Level)Scorpions4ever User rank is General 46th Grade (Above 100000 Reputation Level)Scorpions4ever User rank is General 46th Grade (Above 100000 Reputation Level)Scorpions4ever User rank is General 46th Grade (Above 100000 Reputation Level) 
Time spent in forums: 1 Month 4 Weeks 1 Day 22 h 32 m 40 sec
Reputation Power: 4080
Another way would be to reorganize your data somewhat. Let's say you have the following sets (going by the example on the other thread):
A = {1, 2, 3}
B = {4, 3, 7}
C = {2, 8, 7}
D = {1, 2, 4}
E = {6, 2, 3}

And you want to find a set containing (1, 2, 3) (ideally, this would be A), but will also settle for 2 out of 3 matches (D and E). You could reorganize your data as a hash of arrays, such as:
1 => [A, D]
2 => [A, C, D, E],
3 => [A, B, E],
4 => [B, D]
6 => [E]
7 => [B, C]
8 => [C]

Then, when searching, you could find all the sets that 1, 2 and 3 belong to, by quickly looking up the hash. You could also increment counters and figure out which set has the maximum number of hits. After scanning like this, you'd end up with
Hits for A = 3
Hits for B = 1
Hits for C = 1
Hits for D = 2
Hits for E = 2

Hence you can pick A, D or E

By the way, you aren't trying to write some kind of search engine by any chance, are you?
__________________
Up the Irons
What Would Jimi Do? Smash amps. Burn guitar. Take the groupies home.
"Death Before Dishonour, my Friends!!" - Bruce D ickinson, Iron Maiden Aug 20, 2005 @ OzzFest
Down with Sharon Osbourne

Reply With Quote
  #7  
Old October 17th, 2007, 12:55 PM
tschneit tschneit is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Oct 2007
Posts: 7 tschneit User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 1 h 18 m 33 sec
Reputation Power: 0
I like the trie idea. It seems like it could work, once I brush myself up on the operation I'll know a bit better.

I like the idea of hashing the sets to values. After picking all of the possible sets, an intersection could be performed to find the ones that use the most ingredients.

Still, I am worried about performance. I am writing in ruby (and rails), and I dont know if there is a way to keep a model loaded up in the background, on which I might run these searches. More pondering must be done

And yes, I am building some type of search engine, and I know I shouldn't reinvent the wheel. I've looked at what I could find for search implementations and I dont know if I can bend them to my will.

Thanks for your input folks!

Reply With Quote
  #8  
Old October 18th, 2007, 01:03 PM
rycamor rycamor is offline
Gödelian monster
Dev Shed Regular (2000 - 2499 posts)
 
Join Date: Jul 1999
Location: Central Florida, USA
Posts: 2,306 rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level) 
Time spent in forums: 3 Days 6 h 42 m 51 sec
Reputation Power: 60
Far be it from me to brutishly intervene, but if performance is a consideration, have you considered using a real set-based search tool, such as an SQL DBMS, perhaps?

If you want performance, Ruby is almost certainly not the language for such a thing. It excels at making things easy to manipulate, especially for pulling together an informal application, but it was never intended for processing large volumes of data.

And if you still want to deal with array/set notation, my favorite DBMS happens to have an array meta-type, and a very nice complement of array operators and functions (operators like "overlap" and "is contained by" allow you to treat the arrays like unordered sets). You can also create custom datatypes and operators, although you might just be best off sticking with normal relations and queries. And PostgreSQL is sure to be much faster at searching your data than anything except a custom C application.

Now, if the whole point of this is an intellectual exercise, or research project or somesuch, then that's another story, and I applaud your effort. But if you need something practical and production-ready, then you might want to consider using tools that are made for the purpose. Ruby happens to have a Postgres module, of course.
__________________
The real n-tier system:

FreeBSD -> PostgreSQL -> [any_language] -> Apache -> Mozilla/XUL

Amazon wishlist -- rycamor (at) gmail.com

Reply With Quote
Reply

Viewing: Dev Shed ForumsProgramming LanguagesRuby Programming > Set-Based Search

Developer Shed Advertisers and Affiliates



Thread Tools  Search this Thread 
Search this Thread:

Advanced Search
Display Modes  Rate This Thread 
Rate This Thread:


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
View Your Warnings | New Posts | Latest News | Latest Threads | Shoutbox
Forum Jump

Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
  
 


Powered by: vBulletin Version 3.0.5
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.

© 2003-2013 by Developer Shed. All rights reserved. DS Cluster - Follow our Sitemap