June 11th, 2013, 12:55 PM
Allocating constant space/memory for big hash.
I am not sure if I will define my question correctly, but I will try.
I am building a big hash table (~10 GB).
It takes hours to do this.
I have to run a program that uses this hash very often.
Is there any solution that keeps this hash constant in memory, so the program can just use a pointer/reference to those blocks of memory?
Or any other solution?
I thought about a background script that runs all the time, but that is not good and is wasteful.
June 11th, 2013, 01:55 PM
Hmm, not sure, but you could possibly try to serialize your data. That is, store the built hash on disk, so that you don't have to rebuild it each time, but just load it into memory. It should probably be faster. The Storable and JSON modules might give you some ideas.
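A minimal sketch of that build-once, load-later idea. It is written in Python only because the thread never shows the original code; in Perl, Storable's store and retrieve play the same role. The build_table function, the key names and the cache file name are all placeholders for illustration, not anything from the original program.

```python
import os
import pickle
import tempfile

def build_table():
    """Stand-in for the slow build step (hypothetical data)."""
    return {f"key{i}": i * i for i in range(1000)}

def load_or_build(cache_path):
    """Load the table from cache_path if it exists, otherwise build and save it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    table = build_table()
    with open(cache_path, "wb") as fh:
        pickle.dump(table, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return table

if __name__ == "__main__":
    # Hypothetical cache location; any writable path would do.
    path = os.path.join(tempfile.gettempdir(), "big_table.pickle")
    table = load_or_build(path)
    print(table["key12"])
```

Keep in mind that loading still reads the whole structure back into memory, so this saves the build time but not the load time or the RAM.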
June 11th, 2013, 02:45 PM
Sounds like a job for a database.
June 16th, 2013, 05:20 AM
Originally Posted by keath
Sounds like a job for a database.
Can you be more specific?
June 16th, 2013, 08:30 AM
A hash is an indexed data structure. A database is an indexed file.
Your complaint was that it takes a long time to build this large hash structure each time you want to run a program. The solution is to build the structure once, and have it saved to disk so it is ready to go anytime you need to use it.
A hash has a unique key with a value assigned to it. It would work exactly the same in a database: you need a unique identifier, and then all the attributes you want to assign to that data point.
It would be easy to map your hash directly to a table in that way. One script used to create the database, and others to do the lookup.
CREATE TABLE article (
    article_id char(50) PRIMARY KEY,
    article_text text NOT NULL
);
(Not a good choice for an article key. Needs to be unique.)
SELECT article_text FROM article where article_id = 'Mets lose again';
The point is that you already have guaranteed unique keys in your hash, so you can just loop through it and write them to the database table without much concern of conflict.
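As a rough sketch of the two-script idea (one writer, many readers), here is the same article table in Python with the built-in sqlite3 module. SQLite is my assumption for the example, chosen because it needs no server; the function names save_table and lookup are made up for illustration.

```python
import sqlite3

def save_table(conn, table):
    """Write every key/value pair of a dict into the article table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS article ("
        " article_id TEXT PRIMARY KEY,"
        " article_text TEXT NOT NULL)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO article (article_id, article_text) VALUES (?, ?)",
            table.items(),
        )

def lookup(conn, article_id):
    """Return the text stored under article_id, or None if it is absent."""
    row = conn.execute(
        "SELECT article_text FROM article WHERE article_id = ?",
        (article_id,),
    ).fetchone()
    return row[0] if row else None

if __name__ == "__main__":
    # ":memory:" keeps the demo self-contained; use a file path to persist between runs.
    conn = sqlite3.connect(":memory:")
    save_table(conn, {"Mets lose again": "Full text of the article..."})
    print(lookup(conn, "Mets lose again"))
```

Because the primary key is indexed, each lookup reads only the pages it needs from disk instead of loading the whole data set first.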
You haven't shared any data, but you could also rewrite the script to read through the data one line at a time and write each row into the table as you go, so you will never be consuming that much memory on the machine.
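Something along these lines would do the line-at-a-time load. Since no data was shared, the tab-separated "key<TAB>value" input format is purely an assumption, as are the function name load_lines and the batch size; SQLite via Python's sqlite3 module is again just a convenient stand-in.

```python
import io
import sqlite3

def load_lines(conn, lines, batch_size=10000):
    """Insert tab-separated 'key<TAB>value' lines in batches; return rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS article ("
        " article_id TEXT PRIMARY KEY,"
        " article_text TEXT NOT NULL)"
    )
    sql = "INSERT INTO article (article_id, article_text) VALUES (?, ?)"
    batch, total = [], 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, value = line.split("\t", 1)
        batch.append((key, value))
        if len(batch) >= batch_size:
            with conn:  # one transaction per batch
                conn.executemany(sql, batch)
            total += len(batch)
            batch = []
    if batch:  # write whatever is left after the loop
        with conn:
            conn.executemany(sql, batch)
        total += len(batch)
    return total

if __name__ == "__main__":
    # A small in-memory sample stands in for the real input file.
    sample = io.StringIO("k1\tfirst value\nk2\tsecond value\n")
    conn = sqlite3.connect(":memory:")
    print(load_lines(conn, sample))
```

Only one batch of rows is held in memory at a time, and an open file handle can be passed in place of the sample, since it also yields one line per iteration.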
There are several good open source databases to choose from:
I actually prefer the first two. MariaDB is a fork of MySQL, which is very popular but has different storage "engines" you need to choose from. Some of the engines do not enforce the constraints one would want in a database, so I'm not a fan.