### Thread: Hash function in C

1. No Profile Picture
Junior Member
Devshed Newbie (0 - 499 posts)

Join Date
Mar 2003
Posts
5
Rep Power
0

#### Hash function in C

Hi, I am attempting to make a hash table using ansi C. I've never made one before, but I think I have the general idea of how I will implement it.

All I want to know is what a really good Hash function would be to map strings to integers. I need to take a string (character array) of about 3-20 characters, and convert that into a number from 1-99. Does anyone have a fancy formula that could do this?

Another way I was thinking was to just map to _any_ number and then recursively make it smaller (divide by 2 or something better) until it is between 1 and 99.

It is really important that this function will produce a variety of numbers within 1-99. I don't want a lot of duplicate mapped values, but some is always okay as I have a way around that.

Also, if anyone has any other tips on how to implement a hash table in ansi C, I would love to hear it. Any sample code (even a whole implementation) would also be helpful. But mostly I just need help with the actually mapping of values.

Thanks a lot for any help,

Kevin
2. You could use any of the following hash functions:
http://www.cs.yorku.ca/~oz/hash.html

>> Another way I was thinking was to just map to _any_ number and then recursively make it smaller (divide by 2 or something better) until it is between 1 and 99.
Or you could do a modulo division by 99 and use the remainder + 1 as your index. (i.e.)
index = (hash_val % 99) + 1;

Hope this helps!
3. When you use modulus, it is much quicker to use a number that is a power of 2 (i.e. 2^n, n = integer), because you can simply use the & bitwise operator.

Let's say you use a hash function that needs a value of 0..127. Then if you have a number x that is 0..some large number, you can get it to 0..127 by doing this:

hashnumber = x % 128; // modulus 2^7

the faster way is:

hashnumber = x & 127; // 127 is one less than 128 (2^7)

remember that the modulus must be equal to 2^n, n = integer. Forgot to mention: % is actually doing a division - the slowest function you can ever ask your CPU to do. It is sloooooooow. Test it out, and you'll be amazed by the speed difference. Also, starting at 0 instead of 1 makes accessing easier, since you needn't add 1 everytime.
4. No Profile Picture
Junior Member
Devshed Newbie (0 - 499 posts)

Join Date
Mar 2003
Posts
5
Rep Power
0
Hey, thanks for all the help! I managed to find a pretty good hash function after looking for quite awhile. I'm not exactly sure how it all works together, but hey, it works great! Here's the code incase anyone is interested. (I'm not sure how well this board posts code, hopefully the indentation works.) My HASHSIZE will probably be a lot smaller than 997 (still needs to be prime though) since I will have many many hashes used.

/* The hash algorithm used in the UNIX ELF format for object files.
The input is a pointer to a string to be hashed */

#define HASHSIZE 997

unsigned long ElfHash( const unsigned char *name)
{
unsigned long h = 0, g;

static int M = HASHSIZE;

while (*name)
{
h = ( h << 4 ) + *name++;

if (g = h & 0xF0000000L) {h ^= g >> 24;}

h &= ~g;
}

return h % M;
}

It seems like this should be really quick since it's all logical operations and addition. Anyone have something more efficient? I'm pretty happy with this. I tested it with a bunch of input files and didn't get many duplicate maps at all. So I think I'll use it unless someone else can think of something better. Thanks again for the replies.

Kevin
5. No Profile Picture
Junior Member
Devshed Newbie (0 - 499 posts)

Join Date
Mar 2003
Posts
5
Rep Power
0
Oh, and I forgot to try out your suggestion for modulus. I will do that since I will be doing a _lot_ of hashes and division does take a long time. I remember in assembly language it took something like 40 clock cycles on the motorola 68hc11, when most other operations were less than 10 by far. Thanks for the tip!

Kevin
6. kev123,

Modern compilers may actually convert a modulus to an and bitwise operator. But, don't bet on it. Do the test like I said, and you'll notice that division is like you said (40+ clock cycles) where a bitwise and is only 1. Quite the speed increase, huh? :)

Make HASHSIZE in your code some number that is 2^n, n=integer. Make it 1024, in your case, and this should work.

Then this line:
return h % M;
can change into:
return h & (M-1);

Of course, it would be nice that M-1 is already a constant, so it needs not decrease it by 1 each time. Since M is a const int and not a define, this may not be the case (I am not sure, and you should find out if you wish to do it this way).
7. You might want to note that the & operation instead of % works only if
(a) Your denominator number is a multiple of 2