#1
  1. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2003
    Posts
    5
    Rep Power
    0

    Hash function in C


    Hi, I am attempting to make a hash table using ansi C. I've never made one before, but I think I have the general idea of how I will implement it.

    All I want to know is what a really good Hash function would be to map strings to integers. I need to take a string (character array) of about 3-20 characters, and convert that into a number from 1-99. Does anyone have a fancy formula that could do this?

    Another way I was thinking was to just map to _any_ number and then recursively make it smaller (divide by 2 or something better) until it is between 1 and 99.

    It is really important that this function will produce a variety of numbers within 1-99. I don't want a lot of duplicate mapped values, but some is always okay as I have a way around that.

    Also, if anyone has any other tips on how to implement a hash table in ansi C, I would love to hear it. Any sample code (even a whole implementation) would also be helpful. But mostly I just need help with the actually mapping of values.

    Thanks a lot for any help,

    Kevin
  2. #2
  3. Banned ;)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    Woodland Hills, Los Angeles County, California, USA
    Posts
    9,593
    Rep Power
    4207
    You could use any of the following hash functions:
    http://www.cs.yorku.ca/~oz/hash.html

    >> Another way I was thinking was to just map to _any_ number and then recursively make it smaller (divide by 2 or something better) until it is between 1 and 99.
    Or you could do a modulo division by 99 and use the remainder + 1 as your index. (i.e.)
    index = (hash_val % 99) + 1;

    Hope this helps!
  4. #3
  5. jasondoucette.com
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2003
    Location
    Canada
    Posts
    378
    Rep Power
    12
    When you use modulus, it is much quicker to use a number that is a power of 2 (i.e. 2^n, n = integer), because you can simply use the & bitwise operator.

    Let's say you use a hash function that needs a value of 0..127. Then if you have a number x that is 0..some large number, you can get it to 0..127 by doing this:

    hashnumber = x % 128; // modulus 2^7

    the faster way is:

    hashnumber = x & 127; // 127 is one less than 128 (2^7)

    remember that the modulus must be equal to 2^n, n = integer. Forgot to mention: % is actually doing a division - the slowest function you can ever ask your CPU to do. It is sloooooooow. Test it out, and you'll be amazed by the speed difference. Also, starting at 0 instead of 1 makes accessing easier, since you needn't add 1 everytime.
  6. #4
  7. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2003
    Posts
    5
    Rep Power
    0
    Hey, thanks for all the help! I managed to find a pretty good hash function after looking for quite awhile. I'm not exactly sure how it all works together, but hey, it works great! Here's the code incase anyone is interested. (I'm not sure how well this board posts code, hopefully the indentation works.) My HASHSIZE will probably be a lot smaller than 997 (still needs to be prime though) since I will have many many hashes used.

    /* The hash algorithm used in the UNIX ELF format for object files.
    The input is a pointer to a string to be hashed */

    #define HASHSIZE 997

    unsigned long ElfHash( const unsigned char *name)
    {
    unsigned long h = 0, g;

    static int M = HASHSIZE;

    while (*name)
    {
    h = ( h << 4 ) + *name++;

    if (g = h & 0xF0000000L) {h ^= g >> 24;}

    h &= ~g;
    }

    return h % M;
    }


    It seems like this should be really quick since it's all logical operations and addition. Anyone have something more efficient? I'm pretty happy with this. I tested it with a bunch of input files and didn't get many duplicate maps at all. So I think I'll use it unless someone else can think of something better. Thanks again for the replies.

    Kevin
  8. #5
  9. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2003
    Posts
    5
    Rep Power
    0
    Oh, and I forgot to try out your suggestion for modulus. I will do that since I will be doing a _lot_ of hashes and division does take a long time. I remember in assembly language it took something like 40 clock cycles on the motorola 68hc11, when most other operations were less than 10 by far. Thanks for the tip!

    Kevin
  10. #6
  11. jasondoucette.com
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2003
    Location
    Canada
    Posts
    378
    Rep Power
    12
    kev123,

    Modern compilers may actually convert a modulus to an and bitwise operator. But, don't bet on it. Do the test like I said, and you'll notice that division is like you said (40+ clock cycles) where a bitwise and is only 1. Quite the speed increase, huh? :)

    Make HASHSIZE in your code some number that is 2^n, n=integer. Make it 1024, in your case, and this should work.

    Then this line:
    return h % M;
    can change into:
    return h & (M-1);

    Of course, it would be nice that M-1 is already a constant, so it needs not decrease it by 1 each time. Since M is a const int and not a define, this may not be the case (I am not sure, and you should find out if you wish to do it this way).
  12. #7
  13. Banned ;)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    Woodland Hills, Los Angeles County, California, USA
    Posts
    9,593
    Rep Power
    4207
    You might want to note that the & operation instead of % works only if
    (a) Your denominator number is a multiple of 2
    (b) Your numbers are positive.

    For example -26 & 8 gives you 0, whereas -26 % 8 returns -2

    So this technique will work provided the above two conditions are always true.

    >> Modern compilers may actually convert a modulus to an and bitwise operator.
    We had a discussion about this around a year ago (Quick algorithm to determine odd/even numbers) and gcc with optimization option -O2 does this.
  14. #8
  15. jasondoucette.com
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2003
    Location
    Canada
    Posts
    378
    Rep Power
    12
    I had already mentioned in my first post above that it works only with numbers that are powers of 2. But, good point about the negative numbers. My explanation was under the assumption of positive numbes, and I should have mentioned that.

IMN logo majestic logo threadwatch logo seochat tools logo