The Shed is going Social! Join us on FaceBook and Twitter and chime in on the conversation.
|
 |
|
Dev Shed Forums
> Programming Languages
> C Programming
|
Hash function in C
Discuss Hash function in C in the C Programming forum on Dev Shed. Hash function in C C programming forum discussing all C derivatives, including C#, C++, Object-C, and even plain old vanilla C. These languages are low level languages, and used on projects such as device drivers, compilers, and even whole computer operating systems.
|
|
 |
|
|
|
|

Dev Shed Forums Sponsor:
|
|
|

March 5th, 2003, 10:46 PM
|
|
Junior Member
|
|
Join Date: Mar 2003
Posts: 5
Time spent in forums: < 1 sec
Reputation Power: 0
|
|
|
Hash function in C
Hi, I am attempting to make a hash table using ansi C. I've never made one before, but I think I have the general idea of how I will implement it.
All I want to know is what a really good Hash function would be to map strings to integers. I need to take a string (character array) of about 3-20 characters, and convert that into a number from 1-99. Does anyone have a fancy formula that could do this?
Another way I was thinking was to just map to _any_ number and then recursively make it smaller (divide by 2 or something better) until it is between 1 and 99.
It is really important that this function will produce a variety of numbers within 1-99. I don't want a lot of duplicate mapped values, but some is always okay as I have a way around that.
Also, if anyone has any other tips on how to implement a hash table in ansi C, I would love to hear it. Any sample code (even a whole implementation) would also be helpful. But mostly I just need help with the actually mapping of values.
Thanks a lot for any help,
Kevin
|

March 6th, 2003, 12:15 AM
|
 |
Banned ;)
|
|
Join Date: Nov 2001
Location: Woodland Hills, Los Angeles County, California, USA
|
|
You could use any of the following hash functions:
http://www.cs.yorku.ca/~oz/hash.html
>> Another way I was thinking was to just map to _any_ number and then recursively make it smaller (divide by 2 or something better) until it is between 1 and 99.
Or you could do a modulo division by 99 and use the remainder + 1 as your index. (i.e.)
index = (hash_val % 99) + 1;
Hope this helps!
|

March 6th, 2003, 07:21 AM
|
 |
jasondoucette.com
|
|
Join Date: Feb 2003
Location: Canada
Posts: 378

Time spent in forums: 7 h 23 m 8 sec
Reputation Power: 11
|
|
|
When you use modulus, it is much quicker to use a number that is a power of 2 (i.e. 2^n, n = integer), because you can simply use the & bitwise operator.
Let's say you use a hash function that needs a value of 0..127. Then if you have a number x that is 0..some large number, you can get it to 0..127 by doing this:
hashnumber = x % 128; // modulus 2^7
the faster way is:
hashnumber = x & 127; // 127 is one less than 128 (2^7)
remember that the modulus must be equal to 2^n, n = integer. Forgot to mention: % is actually doing a division - the slowest function you can ever ask your CPU to do. It is sloooooooow. Test it out, and you'll be amazed by the speed difference. Also, starting at 0 instead of 1 makes accessing easier, since you needn't add 1 everytime.
|

March 6th, 2003, 09:15 AM
|
|
Junior Member
|
|
Join Date: Mar 2003
Posts: 5
Time spent in forums: < 1 sec
Reputation Power: 0
|
|
|
Hey, thanks for all the help! I managed to find a pretty good hash function after looking for quite awhile. I'm not exactly sure how it all works together, but hey, it works great! Here's the code incase anyone is interested. (I'm not sure how well this board posts code, hopefully the indentation works.) My HASHSIZE will probably be a lot smaller than 997 (still needs to be prime though) since I will have many many hashes used.
/* The hash algorithm used in the UNIX ELF format for object files.
The input is a pointer to a string to be hashed */
#define HASHSIZE 997
unsigned long ElfHash( const unsigned char *name)
{
unsigned long h = 0, g;
static int M = HASHSIZE;
while (*name)
{
h = ( h << 4 ) + *name++;
if (g = h & 0xF0000000L) {h ^= g >> 24;}
h &= ~g;
}
return h % M;
}
It seems like this should be really quick since it's all logical operations and addition. Anyone have something more efficient? I'm pretty happy with this. I tested it with a bunch of input files and didn't get many duplicate maps at all. So I think I'll use it unless someone else can think of something better. Thanks again for the replies.
Kevin
|

March 6th, 2003, 09:19 AM
|
|
Junior Member
|
|
Join Date: Mar 2003
Posts: 5
Time spent in forums: < 1 sec
Reputation Power: 0
|
|
|
Oh, and I forgot to try out your suggestion for modulus. I will do that since I will be doing a _lot_ of hashes and division does take a long time. I remember in assembly language it took something like 40 clock cycles on the motorola 68hc11, when most other operations were less than 10 by far. Thanks for the tip!
Kevin
|

March 6th, 2003, 11:18 AM
|
 |
jasondoucette.com
|
|
Join Date: Feb 2003
Location: Canada
Posts: 378

Time spent in forums: 7 h 23 m 8 sec
Reputation Power: 11
|
|
kev123,
Modern compilers may actually convert a modulus to an and bitwise operator. But, don't bet on it. Do the test like I said, and you'll notice that division is like you said (40+ clock cycles) where a bitwise and is only 1. Quite the speed increase, huh?
Make HASHSIZE in your code some number that is 2^n, n=integer. Make it 1024, in your case, and this should work.
Then this line:
return h % M;
can change into:
return h & (M-1);
Of course, it would be nice that M-1 is already a constant, so it needs not decrease it by 1 each time. Since M is a const int and not a define, this may not be the case (I am not sure, and you should find out if you wish to do it this way).
|

March 6th, 2003, 12:25 PM
|
 |
Banned ;)
|
|
Join Date: Nov 2001
Location: Woodland Hills, Los Angeles County, California, USA
|
|
You might want to note that the & operation instead of % works only if
(a) Your denominator number is a multiple of 2
(b) Your numbers are positive.
For example -26 & 8 gives you 0, whereas -26 % 8 returns -2
So this technique will work provided the above two conditions are always true.
>> Modern compilers may actually convert a modulus to an and bitwise operator.
We had a discussion about this around a year ago ( http://forums.devshed.com/showthrea...&threadid=29843) and gcc with optimization option -O2 does this.
|

March 6th, 2003, 01:54 PM
|
 |
jasondoucette.com
|
|
Join Date: Feb 2003
Location: Canada
Posts: 378

Time spent in forums: 7 h 23 m 8 sec
Reputation Power: 11
|
|
|
I had already mentioned in my first post above that it works only with numbers that are powers of 2. But, good point about the negative numbers. My explanation was under the assumption of positive numbes, and I should have mentioned that.
|
Developer Shed Advertisers and Affiliates
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Rate This Thread |
Linear Mode
|
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|