March 5th, 2003, 10:46 PM

Hash function in C
Hi, I am attempting to make a hash table using ansi C. I've never made one before, but I think I have the general idea of how I will implement it.
All I want to know is what a really good Hash function would be to map strings to integers. I need to take a string (character array) of about 320 characters, and convert that into a number from 199. Does anyone have a fancy formula that could do this?
Another way I was thinking was to just map to _any_ number and then recursively make it smaller (divide by 2 or something better) until it is between 1 and 99.
It is really important that this function will produce a variety of numbers within 199. I don't want a lot of duplicate mapped values, but some is always okay as I have a way around that.
Also, if anyone has any other tips on how to implement a hash table in ansi C, I would love to hear it. Any sample code (even a whole implementation) would also be helpful. But mostly I just need help with the actually mapping of values.
Thanks a lot for any help,
Kevin
March 6th, 2003, 12:15 AM

You could use any of the following hash functions:
http://www.cs.yorku.ca/~oz/hash.html
>> Another way I was thinking was to just map to _any_ number and then recursively make it smaller (divide by 2 or something better) until it is between 1 and 99.
Or you could do a modulo division by 99 and use the remainder + 1 as your index. (i.e.)
index = (hash_val % 99) + 1;
Hope this helps!
March 6th, 2003, 07:21 AM

When you use modulus, it is much quicker to use a number that is a power of 2 (i.e. 2^n, n = integer), because you can simply use the & bitwise operator.
Let's say you use a hash function that needs a value of 0..127. Then if you have a number x that is 0..some large number, you can get it to 0..127 by doing this:
hashnumber = x % 128; // modulus 2^7
the faster way is:
hashnumber = x & 127; // 127 is one less than 128 (2^7)
remember that the modulus must be equal to 2^n, n = integer. Forgot to mention: % is actually doing a division  the slowest function you can ever ask your CPU to do. It is sloooooooow. Test it out, and you'll be amazed by the speed difference. Also, starting at 0 instead of 1 makes accessing easier, since you needn't add 1 everytime.
March 6th, 2003, 09:15 AM

Hey, thanks for all the help! I managed to find a pretty good hash function after looking for quite awhile. I'm not exactly sure how it all works together, but hey, it works great! Here's the code incase anyone is interested. (I'm not sure how well this board posts code, hopefully the indentation works.) My HASHSIZE will probably be a lot smaller than 997 (still needs to be prime though) since I will have many many hashes used.
/* The hash algorithm used in the UNIX ELF format for object files.
The input is a pointer to a string to be hashed */
#define HASHSIZE 997
unsigned long ElfHash( const unsigned char *name)
{
unsigned long h = 0, g;
static int M = HASHSIZE;
while (*name)
{
h = ( h << 4 ) + *name++;
if (g = h & 0xF0000000L) {h ^= g >> 24;}
h &= ~g;
}
return h % M;
}
It seems like this should be really quick since it's all logical operations and addition. Anyone have something more efficient? I'm pretty happy with this. I tested it with a bunch of input files and didn't get many duplicate maps at all. So I think I'll use it unless someone else can think of something better. Thanks again for the replies.
Kevin
March 6th, 2003, 09:19 AM

Oh, and I forgot to try out your suggestion for modulus. I will do that since I will be doing a _lot_ of hashes and division does take a long time. I remember in assembly language it took something like 40 clock cycles on the motorola 68hc11, when most other operations were less than 10 by far. Thanks for the tip!
Kevin
March 6th, 2003, 11:18 AM

kev123,
Modern compilers may actually convert a modulus to an and bitwise operator. But, don't bet on it. Do the test like I said, and you'll notice that division is like you said (40+ clock cycles) where a bitwise and is only 1. Quite the speed increase, huh? :)
Make HASHSIZE in your code some number that is 2^n, n=integer. Make it 1024, in your case, and this should work.
Then this line:
return h % M;
can change into:
return h & (M1);
Of course, it would be nice that M1 is already a constant, so it needs not decrease it by 1 each time. Since M is a const int and not a define, this may not be the case (I am not sure, and you should find out if you wish to do it this way).
March 6th, 2003, 12:25 PM

You might want to note that the & operation instead of % works only if
(a) Your denominator number is a multiple of 2
(b) Your numbers are positive.
For example 26 & 8 gives you 0, whereas 26 % 8 returns 2
So this technique will work provided the above two conditions are always true.
>> Modern compilers may actually convert a modulus to an and bitwise operator.
We had a discussion about this around a year ago (Quick algorithm to determine odd/even numbers) and gcc with optimization option O2 does this.
March 6th, 2003, 01:54 PM

I had already mentioned in my first post above that it works only with numbers that are powers of 2. But, good point about the negative numbers. My explanation was under the assumption of positive numbes, and I should have mentioned that.