October 2nd, 2013, 08:17 AM
Best Data Structure for a Common Problem -- storing large strings/packets
I am storing a queue of large data elements (call them 'clobs' or packets, each of them will be about 50-60 bytes) keyed by timestamp (guaranteed to be unique), and wanted to know the best way to do this (or if it's a feasible thing to do) in C/C++.
Basically, I have a bunch of packets (can think of them as all being 50-60 byte char arrays; all ) coming in, with a guaranteed unique key of timestamp. I want to store them in FIFO order (order of increasing timestamp), and want two kinds of operations to be really fast:
1. Retrieve the packets with the lowest 3 timestamps (so valueofMinKey(), 3 times), and remove them from the queue (basically a priority queue)
2. Occasionally, we will jump into a packet by some *alternate* key (say, first 5 bits of the packet, which is also guaranteed to be unique for a subset of the elements) and change the rest of its data, *without* modifying the timestamp (basically I need a secondary key that maps into the packets, allowing me to change the rest of its data at will without modifying its primary key).
What's the best way of doing this in C++? Should I keep 2 tables?.. or perhaps store a map from long ints (timestamps) to char*'s, or string*'s but at the same time keep another fixed array of 32 string*'s pointing to the same objects which I can modify ever so often? Is there a cleaner, faster design to maintain a priority queue which has a secondary (albeit small, 5 to 6-bit key, which can therefore have a maximum of only 32 or 64 different values)?
October 2nd, 2013, 12:49 PM
I would suggest a simple std::list to begin with, and get something working.
Before you start inventing exotic data structures, you'll need
- a finished program
- some real world test data
- and a profiler.
Only then can you work out where the real hot spots are (and not some random guess - these are almost always way off base).
Having found a hotspot, you can make careful small changes to improve things. Since you have a working program and some test data, you can quickly determine
- whether your change still works
- whether it is an improvement
If it's a simple change with a big win, you'll probably want to keep it.
If it's a complicated change (think on-going maintenance) for little benefit, you might want to discard it.