Perhaps alphabetize the records and keep a separate index file that tracks where each section starts and ends? That way, if you're seeking an "S" record, the search doesn't have to scan the entire file. (I wouldn't suggest breaking it down simply by first letter, though, because quite a few more words start with S than with X, Y, or Z. Grouping letters might help there, but I digress...)
Unfortunately, doing this incurs some overhead on insertions into the database. The question is which operation will be used most often. If searching is the most mission-critical portion of your code, optimizing the search to the detriment of insertions and other ops is not only a good idea, it's a must.
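To make the index idea a bit more concrete, here's a rough sketch in C. Everything in it is made up for illustration: the index_entry struct, find_section, and the byte offsets in example_index don't come from any real file format. It assumes the index has already been loaded into memory and is sorted by first_key. Note that the busy "s" range is split in two so the sections stay roughly even.

```c
#include <stddef.h>
#include <string.h>

/* One entry of the (hypothetical) index: records whose keys are >= first_key
   and < the next entry's first_key live in bytes [start, end) of the data file. */
struct index_entry {
    const char *first_key;  /* smallest key prefix stored in this section */
    long start;             /* byte offset where the section begins */
    long end;               /* byte offset one past the end of the section */
};

/* Made-up example index; the offsets are placeholders, not real data. */
static const struct index_entry example_index[] = {
    { "a",  0,     4096  },
    { "h",  4096,  9000  },
    { "s",  9000,  15000 },
    { "sm", 15000, 20000 },
    { "t",  20000, 26000 },
};

/* Binary search for the last section whose first_key <= key.
   Returns NULL when key sorts before every section. */
const struct index_entry *find_section(const struct index_entry *idx,
                                       size_t n, const char *key)
{
    const struct index_entry *found = NULL;
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(idx[mid].first_key, key) <= 0) {
            found = &idx[mid];  /* candidate; a later section may also fit */
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return found;
}
```

Once find_section hands back an entry, you'd fseek to its start offset and scan only up to its end offset. A lookup for "snake", for example, lands in the "sm" section instead of walking the whole file. The catch from above still applies: every insertion that shifts records around means the offsets have to be updated.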
Another option is to apply a hash to the search terms (word by word) as well as to the text in the record (again, word by word). A few bit shifts and a bunch of integer comparisons are _NEVER_ going to hit you nearly as hard as any sort of string comparison. A good string hashing function is the "Dragon Book Hash":
Code:
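/* table_size (the number of hash buckets) must be defined elsewhere. */
unsigned int hash(const char *s){
    const char *ss = s;
    unsigned int h = 0, g;
    for (; *ss != '\0'; ss++)
    {
        h = (h << 4) + *ss;               /* shift in the next character */
        if ((g = h & 0xf0000000) != 0)    /* anything in the top 4 bits? */
        {
            h ^= g >> 24;                 /* fold those bits back down */
            h ^= g;                       /* then clear them */
        }
    }
    return h % table_size;
}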
Generally it's used to hash strings for symbol tables in compilers, but it might help in your case too. Granted, you'll have to store the hashed values of the records somewhere, which incurs storage overhead, and insertions get slower as well.
This approach seems the best to me. It's going to cut down the size of the search space really quickly. One thing to guard against: the function doesn't produce a unique integer for every input string, so a hash match still has to be confirmed with a real string comparison. Even so, it's going to return fewer "hits" to look over on the second pass than re-searching everything that matches the first two letters.
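Here's roughly how the two-pass check could look in C. Treat it as a sketch, not a drop-in: word_hash and record_contains are names I made up, the record is assumed to be already split into words with their hashes stored at insertion time, and word_hash returns the full hash value instead of reducing it modulo a table size (my choice for this example, since fewer distinct values means more collisions).

```c
#include <stddef.h>
#include <string.h>

/* Same shift-and-fold hash as above, but returning the full value rather
   than h % table_size (an assumption made for this sketch). */
unsigned int word_hash(const char *s)
{
    unsigned int h = 0, g;

    for (; *s != '\0'; s++) {
        h = (h << 4) + (unsigned char)*s;
        if ((g = h & 0xf0000000u) != 0) {
            h ^= g >> 24;
            h ^= g;
        }
    }
    return h;
}

/* words[i] is the i-th word of a record and hashes[i] is word_hash(words[i]),
   computed once when the record was inserted.
   Returns 1 if term occurs in the record, 0 otherwise. */
int record_contains(const char *const *words, const unsigned int *hashes,
                    size_t n, const char *term)
{
    unsigned int target = word_hash(term);  /* hash the search term once */
    size_t i;

    for (i = 0; i < n; i++) {
        /* First pass: cheap integer comparison.
           Second pass: strcmp only on hash matches, to weed out collisions. */
        if (hashes[i] == target && strcmp(words[i], term) == 0)
            return 1;
    }
    return 0;
}
```

The strcmp only ever runs on words whose hash already matched, so most of the work stays in integer comparisons.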
*shrugs* That's just my $0.02 though...