Thread: hash table size

    #1
  1. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2003
    Location
    india
    Posts
    8
    Rep Power
    0

    hash table size


    hi,
    i have large number of elements (each element is a structure)
    each element is having a key value which is an integer 1,2,3,4,5.........maximum.
    The maximum may be 10000, 100000 or even higher..

    i want to have a hash table for these number of items.
    i want to know what will be the best or optimal hash table size for this problem?

    Say if the maximum is 100000 and the hash table size is 1000 ,
    then i can store the elements in linked lists using hash key index.
    the hash key index is (element key value % 1000) then
    the elements will be equally distributed in the hash table i.,e.,
    each hash key index has equal number of nodes (here 100)

    i 'm not sure about the optimality of this size.
    if the size is 10000 then there will be only 10 nodes in each list.

    can anyone tel me about the optimal size of the hash table?

    thank you,
  2. #2
  3. I'm Baaaaaaack!
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jul 2003
    Location
    Maryland
    Posts
    5,538
    Rep Power
    243
    If you are using C++ look up 'map'.

    My blog, The Fount of Useless Information http://sol-biotech.com/wordpress/
    Free code: http://sol-biotech.com/code/.
    Secure Programming: http://sol-biotech.com/code/SecProgFAQ.html.
    Performance Programming: http://sol-biotech.com/code/PerformanceProgramming.html.
    LinkedIn Profile: http://www.linkedin.com/in/keithoxenrider

    It is not that old programmers are any smarter or code better, it is just that they have made the same stupid mistake so many times that it is second nature to fix it.
    --Me, I just made it up

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
    --George Bernard Shaw
  4. #3
  5. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2003
    Location
    india
    Posts
    8
    Rep Power
    0

    Smile hash table size


    no i'm using C language only..

    like i want to replace my large linked list into a hash table .
    the maximum elements may be 10000, 100000 or even higher..

    i will be using all the numbers 1,2,3...up to a maximum .(this number is a key value in my node)

    so i need to know wat wil be the optimum hash table size ?

    thank you,
  6. #4
  7. I'm Baaaaaaack!
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jul 2003
    Location
    Maryland
    Posts
    5,538
    Rep Power
    243
    If you are using sequential whole numbers as your key, why not simply allocate memory and then index into it? If you know that object 15 is located at array address 14 (remember that C starts at zero) then you can simply index into your array. Do something like this:

    Code:
    struct MyStruct ms = (MyStruct *) malloc(10000 * sizeof(MyStruct));
    /* error checking deleted */
    
    struct MyStruct *tmpStr = &ms[14];
    
    tmpStr->element1 = "whatever";
    The hash table is used for rapid access of information, you already seem to me to have the golden standard for one-step lookup.

    If you insist on the hash table, then you need to do research on the underlying theory behind them, try googling on "hash", "C", and "Theory".

    My blog, The Fount of Useless Information http://sol-biotech.com/wordpress/
    Free code: http://sol-biotech.com/code/.
    Secure Programming: http://sol-biotech.com/code/SecProgFAQ.html.
    Performance Programming: http://sol-biotech.com/code/PerformanceProgramming.html.
    LinkedIn Profile: http://www.linkedin.com/in/keithoxenrider

    It is not that old programmers are any smarter or code better, it is just that they have made the same stupid mistake so many times that it is second nature to fix it.
    --Me, I just made it up

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
    --George Bernard Shaw
  8. #5
  9. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2003
    Location
    india
    Posts
    8
    Rep Power
    0

    hash table size


    hi,

    u r right!
    i can use simple arrays of structures.

    one thing here i dont know the maximum number of jobs(structures) in advance. so for this i have to allocate memory
    to each job structure when a job is actually coming( this is based on random times).

    and more ever i'm deleting the job which is no longer needed , by freeing that pointer.

    if i use 10000 nodes in linked list or an array of 10000 structures,
    what would be the optimum choice in between these?
    ( in memory and execution time wise to search a key job).

    explain this case clearly.. as i know little about the performance issues in the code...

    thank you.
  10. #6
  11. I'm Baaaaaaack!
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jul 2003
    Location
    Maryland
    Posts
    5,538
    Rep Power
    243
    Why do you need to dealocate the memory of each node? Simply reuse the memory or wait until the program exits. The code I gave you is for dynamic memory, just substitute a variable for the value '10000' (but be sure to check to see if the memory was allocated properly first, check your C documentation). As for memory utilization, you need to know what the size of your structure is (that is what the sizeof operator is for). Simply take that value (a simple one line program will tell you) and multiply it by the number of elements in your array. If that number is close to the amount of true RAM (do NOT count virtual memory) then you are risking paging (which will slow your program down by as much as 1000 times). However, as is likely on today's machines, you will be using just a tiny fraction of the available RAM and it won't matter.

    Without knowing more about your program I can only speculate, but I would suggest that you simply reuse your existing memory for each job as it comes in. If you only have one job at a time, then why do you need any linked list or array? If you know they only come in a few at a time (how long does it take to run a job?), just allocate enough memory to handle those jobs, plus say 20% (be sure to add error checking code for the case where too many jobs do come in). Linked lists are typically poor choices for compute bound programs, you can't get good localization of memory and wind up with a poor cache hit ratio (if this doesn't mean anything to you, don't worry, if you have a need for speed you will learn it soon).

    As for searching, there are lots of algorithms available but in C you either use third party libraries or roll your own. In C++ you get lots of great stuff in the STL that is highly optimized and all free. I did a roll my own once and indexed some 27 million values in memory (only 256 MB at that day, had to use lots of compression tricks) and the program ran so fast it was actually I/O bound.

    Unless you want to detail your requirments to me, my suggestion is to just allocate as much memory as you think you will need and use simple array indexing to do the 'lookups'. You will find nothing faster and most of the other tricks will also use lots of memory (i.e., the 'wasted' memory you allocate that doesn't get used by your program could just as easily be 'wasted' inside the hash structures).

    My blog, The Fount of Useless Information http://sol-biotech.com/wordpress/
    Free code: http://sol-biotech.com/code/.
    Secure Programming: http://sol-biotech.com/code/SecProgFAQ.html.
    Performance Programming: http://sol-biotech.com/code/PerformanceProgramming.html.
    LinkedIn Profile: http://www.linkedin.com/in/keithoxenrider

    It is not that old programmers are any smarter or code better, it is just that they have made the same stupid mistake so many times that it is second nature to fix it.
    --Me, I just made it up

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
    --George Bernard Shaw
  12. #7
  13. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2003
    Location
    india
    Posts
    8
    Rep Power
    0

    re-using memory


    hi,

    i did n't get you regarding re-usage of the memory?

    some details in my program:

    a job (structure) comes randomly say at times t1,t2,t3..

    so at time t1 , i'm allocating memory using malloc to the job and appending at the end of the job linked lsit( or wil be the first one if the list is empty).

    before t2 occurs.
    i am searching a key job , to know the information of the job( one of the jobs which is in the list already). i mean at time t1 there may be any number of jobs. this i can know by a varaible A(as whenever a job comes this varaiable A is incremented)

    in my program i/m having lot of search calls..

    Instead of using linked lists, i will try with simple dynamic arrays as u said. ( in this there is no deletion is required)

    thank you,
  14. #8
  15. I'm Baaaaaaack!
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jul 2003
    Location
    Maryland
    Posts
    5,538
    Rep Power
    243
    Once a job is done, reuse that memory for a different incomming job. If you never have more than a few dozen jobs at any given instance, a simple scroll through an array of job IDs will be fast. Allocating and deallocating memory is an expensive process, it is best to minimize the number of times you have to do so in code that is performance oriented.

    My blog, The Fount of Useless Information http://sol-biotech.com/wordpress/
    Free code: http://sol-biotech.com/code/.
    Secure Programming: http://sol-biotech.com/code/SecProgFAQ.html.
    Performance Programming: http://sol-biotech.com/code/PerformanceProgramming.html.
    LinkedIn Profile: http://www.linkedin.com/in/keithoxenrider

    It is not that old programmers are any smarter or code better, it is just that they have made the same stupid mistake so many times that it is second nature to fix it.
    --Me, I just made it up

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
    --George Bernard Shaw
  16. #9
  17. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2003
    Location
    india
    Posts
    8
    Rep Power
    0

    dynamic arrays of pointers


    hi,

    i wuld like to know how to use dynamic memory allocation ,

    as i said earlier the job(structure) comes randomly in the program,
    so i 've to allocate memory to each one.

    and i have to store them in an array of pointers?

    is there any limit for the size ? as if i use dynamically array of pointers to structure? and is deletion is required?

    as the array size may go upto 10000000..or even higher?

    thank you,
  18. #10
  19. I'm Baaaaaaack!
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jul 2003
    Location
    Maryland
    Posts
    5,538
    Rep Power
    243
    This is dynamic memory allocation with error and bounds checking:

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    /* you can have other structs in your structs */
    struct MyStruct{
        int element1;
        char element2[10];
        float element3; 
    };
    
    
    int main(){
        struct MyStruct *ms, *tmpStr;
        int size = 1000; /*this would be filled by your code */
        int memsize, whichStruct = 14;
    
        fprintf(stderr, "sizeof(struct MyStruct): %d bytes\n", sizeof(struct MyStruct));
        memsize = size * sizeof(struct MyStruct); /* gets the total number of bytes needed */
        fprintf(stderr, "size (%d) * sizeof(struct MyStruct) = %d bytes\n\n", size, memsize);
    
        memsize = size * sizeof(struct MyStruct); /* gets the total number of bytes needed */
        ms = (struct MyStruct *) malloc(memsize);
        /* if you want to be sure the memory is clean (some OSs will do it for you)
           use calloc(1, memsize) which will set all values to binary zero (NULL).
           you can also clear the memory with memset */
        if (ms ==NULL){
            fprintf(stderr, "Unable to allocate %d bytes on the heap for pointer 'ms'\n", memsize);
            exit(1);
        }
    
        if (whichStruct < 0 || whichStruct >= size){
            fprintf(stderr, "whichStruct (%d) is out of bounds!\n", whichStruct);
            exit(1);
        }
    
        tmpStr = &ms[whichStruct]; /* point at a 'random' location in array */
    
        tmpStr->element1 = 1;
        strcpy(tmpStr->element2, "bob");
        tmpStr->element3 = 1.901f;
    
        printf("before:\n");
        printf("\telement1: %d\n", tmpStr->element1);
        printf("\telement2: %s\n", tmpStr->element2);
        printf("\telement3: %f\n", tmpStr->element3);
    
        /* this is what I mean by reusing the same memory */
        tmpStr->element1 = 2;
        strcpy(tmpStr->element2, "fred");
        tmpStr->element3 = 9.021f;
    
        printf("after:\n");
        printf("\telement1: %d\n", tmpStr->element1);
        printf("\telement2: %s\n", tmpStr->element2);
        printf("\telement3: %f\n", tmpStr->element3);
    
        free(ms); /* deallocate memory (exiting the program will also do this, but it is bad form to rely on that) */
    
        return 0;
    }
    Unless you have to preserve your job structures for some reason for the duration of the program (if it is just for auditing, write the data to a file before you are finished), you should not have to keep allocating memory. However, if you know you will not have more than, say, 10,000 jobs before the program exits, you can save yourself a lot of headache (presuming you have the memory) by just allocating memory for the lot and not reusing it. If this is a prototype program, I would not worry about memory efficiency now, get the rest of it working flawlessly and then come back and attempt to iron out the details.

    My blog, The Fount of Useless Information http://sol-biotech.com/wordpress/
    Free code: http://sol-biotech.com/code/.
    Secure Programming: http://sol-biotech.com/code/SecProgFAQ.html.
    Performance Programming: http://sol-biotech.com/code/PerformanceProgramming.html.
    LinkedIn Profile: http://www.linkedin.com/in/keithoxenrider

    It is not that old programmers are any smarter or code better, it is just that they have made the same stupid mistake so many times that it is second nature to fix it.
    --Me, I just made it up

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
    --George Bernard Shaw

IMN logo majestic logo threadwatch logo seochat tools logo