#1
  1. No Profile Picture
    matthewdoucette.com
    Devshed Novice (500 - 999 posts)

    Join Date
    May 2002
    Posts
    635
    Rep Power
    13

    on-disc binary tree?


    (in regards to my previous thread on storing millions of IPs to disc:
    limitations of a HD? (using numerous directories & files to split-up/sort databases))

    Can I create a binary tree that is stored on disc? Then only modify the section(s) of the file to modify the binary tree as each node is added? I always thought you had to read in the entire file and re-write the entire file to disk whenver a change is made, but this is not the case, is it?
    Matthew Doucette / Xona.com
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Oct 2000
    Location
    Back in the real world.
    Posts
    5,966
    Rep Power
    190
    IMHO
    you would probably want to convert all pointers in your structs to relative pointers (= "-5 records from here" or "+2 records from here") and write it to disk like that.
    on reading the tree you would back-convert.

    but for sure, there is no way to insert data in the middle of a file, so it would need a re-organization then-and-when ("defrag" ;) ) or some care on loading the file.

    never tried this though.... but for my recent project, 3D worlds, i'll need binary trees too in the near future. This will be my first approach until i get my a** up on reading about "BSP trees" that from my knowledge right now do take care of exactly this problem...

    ... just 2ect for my thoughts on this ...
  4. #3
  5. No Profile Picture
    matthewdoucette.com
    Devshed Novice (500 - 999 posts)

    Join Date
    May 2002
    Posts
    635
    Rep Power
    13
    Originally posted by M.Hirsch
    IMHOthere is no way to insert data in the middle of a file, so it would need a re-organization then-and-when ("defrag" ;) ) or some care on loading the file.
    I am only storing IPs, so there will only be nodes added and nodes modified (increment counters on each IP), no nodes deleted. This removes the need for defrag. This tree will also be completely reset each day, even though it may grow to be 1 or 2 million nodes in size.
    Matthew Doucette / Xona.com
  6. #4
  7. Banned ;)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    Woodland Hills, Los Angeles County, California, USA
    Posts
    9,616
    Rep Power
    4247
    You could also allocate a file that is large enough to hold all the possible entries. Then all you need to do is compute the offset for an entry and increment the counter. This may take up a lot more space (32 MB), but if you have plenty of hard disk space, this isn't such a bad plan. Besides, you don't need to implement a binary tree at all with this method. All you need to do is compute the offset for a given IP address and fseek(3) to that area of the file. It's quick, simple and gets the job done!
  8. #5
  9. No Profile Picture
    matthewdoucette.com
    Devshed Novice (500 - 999 posts)

    Join Date
    May 2002
    Posts
    635
    Rep Power
    13
    Good idea, but it does not allow me to store anything but an increment counter. I may be storing additional information with each IP, like the referrer perhaps.

    Interesting idea though. Assuming I wanted to store only an increment, and only increment up to 255, it would take 256^4 * 1 byte / 1024 / 1024 / 1024 = 4 GB discspace, that is quite a lot.

    There are probably IP ranges that do not exist however. Anyone know what they are?
    Matthew Doucette / Xona.com
  10. #6
  11. No Profile Picture
    matthewdoucette.com
    Devshed Novice (500 - 999 posts)

    Join Date
    May 2002
    Posts
    635
    Rep Power
    13
    Another idea:

    A 256 branch, 4-Level, b-tree. The IPs would be "stored" in the location of their data. Each IP is four nodes of one byte each. The left nodes are the only nodes which store the IP's data.
    Last edited by Doucette; February 20th, 2003 at 02:40 PM.
    Matthew Doucette / Xona.com
  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2003
    Location
    Right Coast
    Posts
    25
    Rep Power
    0
    You should be able to use fixed-length records (esp with IP addresses) to make a random access file?

    J :)
  14. #8
  15. No Profile Picture
    matthewdoucette.com
    Devshed Novice (500 - 999 posts)

    Join Date
    May 2002
    Posts
    635
    Rep Power
    13
    Originally posted by ahuimanu
    You should be able to use fixed-length records (esp with IP addresses) to make a random access file?

    J :)
    This is exactly what I plan on doing. I just have never coded anything in any language where I do not write out an entire file. Is there anything special in coding a program that modifies only the middle of a file, for example?
    Matthew Doucette / Xona.com
  16. #9
  17. Banned ;)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    Woodland Hills, Los Angeles County, California, USA
    Posts
    9,616
    Rep Power
    4247
    Nothing to it really. Just open the file in read-write mode, seek to the offset and write your data. Here's some sample code for you to peruse:
    Code:
    FILE *fp;
    if ((fp = fopen("test.txt", "r+w")) == NULL) {
        fprintf(stderr, "Could not open test.txt\n");
        return(1);
    } 
    
    /* Compute where we want to put data */
    offset = ComputeOffset(); 
    
    /* Seek to offset bytes from beginning of file */
    fseek(fp, offset, SEEK_SET); 
    
    /* Write to the file */
    fputs(data, fp);
    
    fclose(fp);
    The only thing you need to worry about is the size of the file. Like you said, it's going to be huge :)

IMN logo majestic logo threadwatch logo seochat tools logo