Software Design
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
User Name:
Password:
Remember me
Go Back   Dev Shed ForumsProgramming Languages - MoreSoftware Design

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Rate Thread Display Modes
 
Unread Dev Shed Forums Sponsor:
  #1  
Old February 28th, 2003, 06:31 AM
Petskull's Avatar
Petskull Petskull is offline
Member
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jun 2002
Posts: 12 Petskull User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Send a message via ICQ to Petskull
Searching for a Search Algorithm

I'm trying to implement a search function into a flat file database system I'm writing in C. I need to find a search algorithm to perform this search.

I was thinking that the obvious way was to search for the string in each record, one-by-one, and return the records that matched. However, this looks a bit inefficient and 'brute force' to me.

I was also thinking that an alternative could be to search for the first couple of letters of my 'search words' and search for the rest from the list of records that matched *that*.

Reply With Quote
  #2  
Old March 1st, 2003, 07:27 PM
PostHuman006's Avatar
PostHuman006 PostHuman006 is offline
God is just a statistic
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2003
Location: Anywhere and everywhere
Posts: 25 PostHuman006 User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Aphabetize?

Perhaps alphabetizing and keeping a separate index file that keeps track of where sections start/end? That way if you're seaking for a "S" record it doesn't have to search the entire file. (I wouldn't suggest breaking it down simply by first letter though because there are quite a few more words that start with S than X, Y, or Z. Perhaps grouping letters might help there but I digress...)

Unfortunately by doing this, you incur some overhead in insertions to the database. The question is which is the operation that will be used most often. If the searching is the most mission critical portion of your code, optimizing the search at the detriment of insertions and other ops, is not only a good idea, it's a must.

Another option is to apply a hash to search terms (word by word)as well as the text in the record (again word by word.) A few bit shifts and a bunch of integer comparisons are _NEVER_ going to hit you nearly as hard as any sort of string comparison. A good string hashing function is the "Dragon Book Hash":

Code:
int hash(char *s){ 
    char* ss = s; 
    unsigned int h = 0, g; 
    for (; *ss != '\0"; ss++) 
    { 
        h = (h << 4) + *ss; 
        if (g = h & 0xf0000000) 
        { 
            h ^= g >> 24; 
            h ^= g; 
        } 
    } 
    return h % table_size; 
} 


Generally it's used to hash strings for symbol tables in compilers, but in your case this might help. Granted you'll have to store the hashed values of the records somewhere incurring overhead in storage. Overhead in insertions too.

This would seem the best. It's going to knock down on the size of the search space really quickly. One thing to guard against is that this function doesn't produce unique integers for every input string, but its going to return fewer "hits" to look over on the second pass than re-searching things that match the first two letters.

*shrugs* That's just my $0.02 though...

Reply With Quote
  #3  
Old March 7th, 2003, 02:50 AM
operator smooth operator smooth is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2003
Posts: 61 operator smooth User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 6
Just want to note that what you call "The dragon book hash" is actually a modified "multiply by 31" hashing algo. The multiplication

h = h*31 + *ss;

has been optimized to:

h = (h << 5) - h + *ss;

Which optimized further becomes:

h = (h << 5) + *ss;
if (g = h & 0xf0000000)
{
h ^= g >> 24;
h ^= g;
}

I have made some experiments, and they have shown a better distribution using shift by 5 instead of shift by 4.

Reply With Quote
  #4  
Old March 7th, 2003, 02:56 AM
operator smooth operator smooth is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2003
Posts: 61 operator smooth User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 6
The actual way, you can make your search "thingy", is to read a file into a data structure, which uses a hash table for every record field, you need to search by.

If your search program can load the file at startup, it is not a problem, as the overhead of indexing will only occur at startup.

Once loaded, the data can be manipulated inside your data structure and re-written to file when your program needs to save them.

Reply With Quote
  #5  
Old March 7th, 2003, 08:54 AM
PostHuman006's Avatar
PostHuman006 PostHuman006 is offline
God is just a statistic
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2003
Location: Anywhere and everywhere
Posts: 25 PostHuman006 User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Touche

Good choice....

I don't know why I didn't think of that before. Some sort of Symbol Table is definately the answer.

I have a good optimized one laying around somewhere if you're interested.

Reply With Quote
Reply

Viewing: Dev Shed ForumsProgramming Languages - MoreSoftware Design > Searching for a Search Algorithm


Thread Tools  Search this Thread 
Search this Thread:

Advanced Search
Display Modes  Rate This Thread 
Rate This Thread:


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
View Your Warnings | New Posts | Latest News | Latest Threads | Shoutbox
Forum Jump


Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
  
 





© 2003-2008 by Developer Shed. All rights reserved. DS Cluster 1 hosted by Hostway
Stay green...Green IT