C Programming
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
User Name:
Password:
Remember me
Go Back   Dev Shed ForumsProgramming LanguagesC Programming

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Rate Thread Display Modes
 
Unread Dev Shed Forums Sponsor:
1200+ fellow developers rate and compare features of the top IDEs, like Visual Studio, Eclipse, RAD, Delphi and others, across 13 categories. Enjoy this FREE Download of the IDE User Satisfaction Study by Evans Data Corporation. Download Now!
  #1  
Old March 5th, 2003, 10:46 PM
kev123 kev123 is offline
Junior Member
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2003
Posts: 5 kev123 User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Hash function in C

Hi, I am attempting to make a hash table using ansi C. I've never made one before, but I think I have the general idea of how I will implement it.

All I want to know is what a really good Hash function would be to map strings to integers. I need to take a string (character array) of about 3-20 characters, and convert that into a number from 1-99. Does anyone have a fancy formula that could do this?

Another way I was thinking was to just map to _any_ number and then recursively make it smaller (divide by 2 or something better) until it is between 1 and 99.

It is really important that this function will produce a variety of numbers within 1-99. I don't want a lot of duplicate mapped values, but some is always okay as I have a way around that.

Also, if anyone has any other tips on how to implement a hash table in ansi C, I would love to hear it. Any sample code (even a whole implementation) would also be helpful. But mostly I just need help with the actually mapping of values.

Thanks a lot for any help,

Kevin

Reply With Quote
  #2  
Old March 6th, 2003, 12:15 AM
Scorpions4ever's Avatar
Scorpions4ever Scorpions4ever is offline
Banned ;)
Dev Shed God 5th Plane (7000 - 7499 posts)
 
Join Date: Nov 2001
Location: Glendale, Los Angeles County, California, USA
Posts: 7,432 Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level) 
Time spent in forums: 4 Weeks 1 Day 22 h 29 m 51 sec
Reputation Power: 784
You could use any of the following hash functions:
http://www.cs.yorku.ca/~oz/hash.html

>> Another way I was thinking was to just map to _any_ number and then recursively make it smaller (divide by 2 or something better) until it is between 1 and 99.
Or you could do a modulo division by 99 and use the remainder + 1 as your index. (i.e.)
index = (hash_val % 99) + 1;

Hope this helps!

Reply With Quote
  #3  
Old March 6th, 2003, 07:21 AM
Jason Doucette's Avatar
Jason Doucette Jason Doucette is offline
jasondoucette.com
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Feb 2003
Location: Canada
Posts: 378 Jason Doucette User rank is Private First Class (20 - 50 Reputation Level)Jason Doucette User rank is Private First Class (20 - 50 Reputation Level) 
Time spent in forums: 7 h 23 m 8 sec
Reputation Power: 6
When you use modulus, it is much quicker to use a number that is a power of 2 (i.e. 2^n, n = integer), because you can simply use the & bitwise operator.

Let's say you use a hash function that needs a value of 0..127. Then if you have a number x that is 0..some large number, you can get it to 0..127 by doing this:

hashnumber = x % 128; // modulus 2^7

the faster way is:

hashnumber = x & 127; // 127 is one less than 128 (2^7)

remember that the modulus must be equal to 2^n, n = integer. Forgot to mention: % is actually doing a division - the slowest function you can ever ask your CPU to do. It is sloooooooow. Test it out, and you'll be amazed by the speed difference. Also, starting at 0 instead of 1 makes accessing easier, since you needn't add 1 everytime.

Reply With Quote
  #4  
Old March 6th, 2003, 09:15 AM
kev123 kev123 is offline
Junior Member
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2003
Posts: 5 kev123 User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Hey, thanks for all the help! I managed to find a pretty good hash function after looking for quite awhile. I'm not exactly sure how it all works together, but hey, it works great! Here's the code incase anyone is interested. (I'm not sure how well this board posts code, hopefully the indentation works.) My HASHSIZE will probably be a lot smaller than 997 (still needs to be prime though) since I will have many many hashes used.

/* The hash algorithm used in the UNIX ELF format for object files.
The input is a pointer to a string to be hashed */

#define HASHSIZE 997

unsigned long ElfHash( const unsigned char *name)
{
unsigned long h = 0, g;

static int M = HASHSIZE;

while (*name)
{
h = ( h << 4 ) + *name++;

if (g = h & 0xF0000000L) {h ^= g >> 24;}

h &= ~g;
}

return h % M;
}


It seems like this should be really quick since it's all logical operations and addition. Anyone have something more efficient? I'm pretty happy with this. I tested it with a bunch of input files and didn't get many duplicate maps at all. So I think I'll use it unless someone else can think of something better. Thanks again for the replies.

Kevin

Reply With Quote
  #5  
Old March 6th, 2003, 09:19 AM
kev123 kev123 is offline
Junior Member
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Mar 2003
Posts: 5 kev123 User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Oh, and I forgot to try out your suggestion for modulus. I will do that since I will be doing a _lot_ of hashes and division does take a long time. I remember in assembly language it took something like 40 clock cycles on the motorola 68hc11, when most other operations were less than 10 by far. Thanks for the tip!

Kevin

Reply With Quote
  #6  
Old March 6th, 2003, 11:18 AM
Jason Doucette's Avatar
Jason Doucette Jason Doucette is offline
jasondoucette.com
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Feb 2003
Location: Canada
Posts: 378 Jason Doucette User rank is Private First Class (20 - 50 Reputation Level)Jason Doucette User rank is Private First Class (20 - 50 Reputation Level) 
Time spent in forums: 7 h 23 m 8 sec
Reputation Power: 6
kev123,

Modern compilers may actually convert a modulus to an and bitwise operator. But, don't bet on it. Do the test like I said, and you'll notice that division is like you said (40+ clock cycles) where a bitwise and is only 1. Quite the speed increase, huh?

Make HASHSIZE in your code some number that is 2^n, n=integer. Make it 1024, in your case, and this should work.

Then this line:
return h % M;
can change into:
return h & (M-1);

Of course, it would be nice that M-1 is already a constant, so it needs not decrease it by 1 each time. Since M is a const int and not a define, this may not be the case (I am not sure, and you should find out if you wish to do it this way).

Reply With Quote
  #7  
Old March 6th, 2003, 12:25 PM
Scorpions4ever's Avatar
Scorpions4ever Scorpions4ever is offline
Banned ;)
Dev Shed God 5th Plane (7000 - 7499 posts)
 
Join Date: Nov 2001
Location: Glendale, Los Angeles County, California, USA
Posts: 7,432 Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level) 
Time spent in forums: 4 Weeks 1 Day 22 h 29 m 51 sec
Reputation Power: 784
You might want to note that the & operation instead of % works only if
(a) Your denominator number is a multiple of 2
(b) Your numbers are positive.

For example -26 & 8 gives you 0, whereas -26 % 8 returns -2

So this technique will work provided the above two conditions are always true.

>> Modern compilers may actually convert a modulus to an and bitwise operator.
We had a discussion about this around a year ago (http://forums.devshed.com/showthrea...&threadid=29843) and gcc with optimization option -O2 does this.

Reply With Quote
  #8  
Old March 6th, 2003, 01:54 PM
Jason Doucette's Avatar
Jason Doucette Jason Doucette is offline
jasondoucette.com
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Feb 2003
Location: Canada
Posts: 378 Jason Doucette User rank is Private First Class (20 - 50 Reputation Level)Jason Doucette User rank is Private First Class (20 - 50 Reputation Level) 
Time spent in forums: 7 h 23 m 8 sec
Reputation Power: 6
I had already mentioned in my first post above that it works only with numbers that are powers of 2. But, good point about the negative numbers. My explanation was under the assumption of positive numbes, and I should have mentioned that.

Reply With Quote
Reply

Viewing: Dev Shed ForumsProgramming LanguagesC Programming > Hash function in C


Thread Tools  Search this Thread 
Search this Thread:

Advanced Search
Display Modes  Rate This Thread 
Rate This Thread:


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
View Your Warnings | New Posts | Latest News | Latest Threads | Shoutbox
Forum Jump


Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
  
 





© 2003-2008 by Developer Shed. All rights reserved. DS Cluster 3 hosted by Hostway