#1
    Registered User, Devshed Newbie (0 - 499 posts)
    Join Date: Dec 2010, Posts: 22, Rep Power: 0

    Allocating constant space/memory for big hash.


    Hi all,
    I'm not sure I will phrase my question correctly, but I'll try.
    I am building a big hash table (~10 GB), and it takes hours to build.
    I have to run a program that uses this hash very often.

    Is there any way to keep this hash resident in memory and just use a pointer/reference to those memory blocks?
    Or is there any other solution?
    I thought of a background script that runs all the time, but that seems clumsy and wasteful.

    Thanks!
    Pap
#2
    Contributing User, Devshed Novice (500 - 999 posts)
    Join Date: Jun 2012, Posts: 832, Rep Power: 496
    Hmm, not sure, but you could try serializing your data: store the built hash on disk once, so that instead of rebuilding it each time you just load it into memory. That should be faster. The Storable and JSON modules might give you some ideas.
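    In Perl, Storable's store/retrieve pair does this build-once, load-later round trip. As a rough illustration of the same pattern, here is a minimal Python sketch that uses the standard pickle module as a stand-in (an assumption; it is not the Perl module named above). build_hash is a hypothetical placeholder for the slow build step, and the file name is arbitrary.

```python
import os
import pickle
import tempfile


def build_hash():
    """Hypothetical stand-in for the slow, hours-long build step."""
    return {f"key{i}": i * i for i in range(1000)}


def save_hash(table, path):
    # Write the finished dict to disk once, in pickle's binary format.
    with open(path, "wb") as f:
        pickle.dump(table, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_hash(path):
    # Later runs read the saved dict back instead of rebuilding it.
    # Note: this still reads the whole structure into memory.
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "big_hash.pickle")
    save_hash(build_hash(), path)
    table = load_hash(path)
    print(table["key12"])  # 144
```

    For a ~10 GB hash this trades hours of building for the time it takes to read the file back, and the whole table still has to fit in memory; whether that is fast enough depends on the data.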
#3
    !~ /m$/, Devshed Specialist (4000 - 4499 posts)
    Join Date: May 2004, Location: Reno, NV, Posts: 4,259, Rep Power: 1810
    Sounds like a job for a database.
#4
    Registered User, Devshed Newbie (0 - 499 posts)
    Join Date: Dec 2010, Posts: 22, Rep Power: 0
    Originally Posted by keath: "Sounds like a job for a database."

    Can you be more specific?
    Thanks
#5
    !~ /m$/, Devshed Specialist (4000 - 4499 posts)
    Join Date: May 2004, Location: Reno, NV, Posts: 4,259, Rep Power: 1810
    A hash is an indexed data structure. A database is an indexed file.

    Your complaint was that it takes a long time to build this large hash structure each time you want to run a program. The solution is to build the structure once, and have it saved to disk so it is ready to go anytime you need to use it.
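    The simplest form of "a hash saved to disk" is a DBM file; Perl exposes these as tied hashes through modules such as DB_File. A minimal Python sketch of the same idea, using the standard dbm module as a swapped-in analogue: one step writes the key/value pairs to a file, and later runs open that file and fetch only the keys they need, without loading the whole table into memory. The file name and sample data are hypothetical.

```python
import dbm
import os
import tempfile


def build_db(path, pairs):
    # 'n' always creates a new, empty database file (overwriting any old one).
    with dbm.open(path, "n") as db:
        for key, value in pairs:
            # dbm stores bytes; str keys and values are encoded automatically.
            db[key] = value


def lookup(path, key):
    # 'r' opens the existing file read-only; only the requested entry is read.
    with dbm.open(path, "r") as db:
        value = db.get(key)
        return None if value is None else value.decode("utf-8")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "big_hash")
    build_db(path, [("apple", "red"), ("banana", "yellow")])
    print(lookup(path, "banana"))  # yellow
    print(lookup(path, "cherry"))  # None
```

    A program that needs only a few keys per run pays for those lookups rather than for loading all 10 GB.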

    A hash has a unique key with a value assigned to it. A database works the same way: you need a unique identifier, and then all the attributes you want to assign to that data point.

    Code:
    CREATE TABLE article (
      article_id char(50) primary key,
      article_text text NOT NULL
    );
    It would be easy to map your hash directly to a table that way: one script creates and fills the database, and the others do the lookups.

    Code:
    SELECT article_text FROM article WHERE article_id = 'Mets lose again';
    (A headline is not a good choice for an article key; the key needs to be unique.)
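    The primary key is what enforces that uniqueness: the database refuses a second row with the same key. A small Python sketch using the standard sqlite3 module and the article table from above (SQLite is one of the databases listed further down; the rows are made up) shows the duplicate being rejected, and the lookup done with a ? placeholder instead of a quoted literal.

```python
import sqlite3


def make_db():
    # In-memory database for the demo; pass a file name to keep it on disk.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE article ("
        " article_id char(50) PRIMARY KEY,"
        " article_text text NOT NULL)"
    )
    return conn


def add_article(conn, article_id, text):
    # Returns False instead of raising when the key is already taken.
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO article (article_id, article_text) VALUES (?, ?)",
                (article_id, text),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def get_article(conn, article_id):
    row = conn.execute(
        "SELECT article_text FROM article WHERE article_id = ?", (article_id,)
    ).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    conn = make_db()
    print(add_article(conn, "Mets lose again", "first story"))   # True
    print(add_article(conn, "Mets lose again", "second story"))  # False
    print(get_article(conn, "Mets lose again"))                  # first story
```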

    The point is that your hash already guarantees unique keys, so you can just loop through it and write each pair to the database table without worrying about conflicts.
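    Copying an existing hash into the table is then a single loop. A minimal sketch in Python, where a dict stands in for the Perl hash and sqlite3 for the database (both assumptions; the article schema is the one above, and hash_to_sqlite is a hypothetical helper name). executemany does the loop, and wrapping it in one transaction avoids a commit per row, which in SQLite is usually much slower.

```python
import os
import sqlite3
import tempfile


def hash_to_sqlite(table, db_path):
    """Write every key/value pair of a dict into the article table."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS article ("
            " article_id char(50) PRIMARY KEY,"
            " article_text text NOT NULL)"
        )
        # Dict keys are already unique, so no primary-key conflicts are
        # expected (assuming the table starts out empty).
        with conn:  # one transaction for the whole load
            conn.executemany(
                "INSERT INTO article (article_id, article_text) VALUES (?, ?)",
                table.items(),
            )
    finally:
        conn.close()


def lookup(db_path, article_id):
    # A later script opens the file and fetches just the row it needs.
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT article_text FROM article WHERE article_id = ?",
            (article_id,),
        ).fetchone()
        return None if row is None else row[0]
    finally:
        conn.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "articles.db")
    hash_to_sqlite({"a1": "first text", "a2": "second text"}, path)
    print(lookup(path, "a2"))  # second text
```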

    You haven't shared any data, but you could also rewrite the script to read through the data one line at a time and write each row into the table as it goes, so it never consumes that much memory on the machine.
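    Assuming, purely for illustration, that the raw data is a text file with one tab-separated key/value pair per line (the thread never shows the real format), the load can read a line, insert it, and move on, so memory use stays small however big the input is. rows_from_file and load_stream are hypothetical names; the schema is the article table above.

```python
import os
import sqlite3
import tempfile


def rows_from_file(path):
    # Yields one (key, value) pair per line; the file is never held in memory.
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            key, value = line.split("\t", 1)
            yield key, value


def load_stream(src_path, db_path):
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS article ("
            " article_id char(50) PRIMARY KEY,"
            " article_text text NOT NULL)"
        )
        # Single transaction; executemany pulls rows from the generator one
        # at a time. Unlike a hash, a repeated key raises IntegrityError here.
        with conn:
            conn.executemany(
                "INSERT INTO article (article_id, article_text) VALUES (?, ?)",
                rows_from_file(src_path),
            )
        return conn.execute("SELECT COUNT(*) FROM article").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    src = os.path.join(tmp, "data.tsv")
    with open(src, "w", encoding="utf-8") as f:
        f.write("a1\tfirst text\na2\tsecond text\n")
    print(load_stream(src, os.path.join(tmp, "articles.db")))  # 2
```

    If the input can repeat a key and the later value should win, as it would in a hash, SQLite's INSERT OR REPLACE is one way to get that behavior.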

    There are several good open source databases to choose from:

    SQLite
    PostgreSQL
    MariaDB

    I actually prefer the first two. MariaDB is a community fork of MySQL, which is very popular but lets you choose between different storage "engines". Some of those engines (MyISAM, for example, which does not enforce foreign keys) do not enforce the constraints one would expect from a database, so I'm not a fan.
