#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    27
    Rep Power
    0

    Hashing table for strings.


    I'm working on some hashing implementation for strings. But I'm having some issues:
    EDIT: Just got the implementation working, but it's exceeding the limits of time or memory used.

    I'm using double hashing for collisions, the hash functions are:


    Code:
    unsigned long djb2(unsigned char *str) {
      unsigned long hash = 5381;
      int c;
      
      while ((c = *str++))
        hash = ((hash << 5) + hash) + c; // hash * 33 + c
      
      return hash;
    }
    This was given, but I had to create a second hash function to determine what would be the next avaliable element in the array.

    This is how an insertion looks like:
    Code:
        //m is the table size
        position = hash1(string) % m
        while table[position]{
            position = (hash1(string) + i * hash2(string)) % m
            i++;
        } //at this point, we found an empty position( = 0)
        table[position] = string (using strcpy);
    So whenever I search for a key, I use a similar loop, looking for the positions the key should be until I find an 0 or the key itself.

    How should the hash2 function look like? I was thinking about using the same as djb2 but using "<<3" insetead of "<<5" would it be ok?

    I can easily see problems when the table has like 1500 keys and 1600 slots, where an insertion could take a lot of time.

    Any way I can solve this?
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    May 2004
    Posts
    3,417
    Rep Power
    887
    When calloc succeeds at giving you a buffer, all the memory therein (every single byte) will be zeroed.

    You're going to run into lots of problems with matrix of characters approach. Probably better to have an array of pointers to strings.

    Beware of scanf buffer overruns! The way you are using it, if someone enters a string that is longer than 40 bytes, the additional characters will overwrite your stack and your next less will be how to use the debugger.
    I no longer wish to be associated with this site.
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    May 2004
    Posts
    3,417
    Rep Power
    887
    And an array of char is not a "string". It's an array of char or a buffer. If you get in the habit of calling buffers strings, you'll get confused at some point and get another lesson in the debugger.
    I no longer wish to be associated with this site.
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    27
    Rep Power
    0
    Originally Posted by jwdonahue
    When calloc succeeds at giving you a buffer, all the memory therein (every single byte) will be zeroed.

    You're going to run into lots of problems with matrix of characters approach. Probably better to have an array of pointers to strings.

    Beware of scanf buffer overruns! The way you are using it, if someone enters a string that is longer than 40 bytes, the additional characters will overwrite your stack and your next less will be how to use the debugger.
    Sorry, I was editing the OP while you responded.
    I ended up using the matrix and it seems to work, the issue I'm having now, is related to efficiency. It's taking too much time when the table is almost filled.

IMN logo majestic logo threadwatch logo seochat tools logo