#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2010
    Posts
    2
    Rep Power
    0

    Defragmentation Algorithm


    Hi,
    I have written a C program to simulate the microcomputer (CP/M) file system. The program itself completes the following functions:

    i. Initialise Disk create and initialise disk control structures
    ii. List files in the directory; the directory is initially empty
    iii. Display the free bitmap blocks

    Basically I am looking to add another function which allows the user to defragment the created file system. I am not looking for the source code itself, but some assistance in understanding the logic of the required algorithm.

    It would be most appreciated if someone/anyone could possibly describe or maybe explain in very rough pseudo-code the underlying logic required for the defragmentation of the created file system.
  2. #2
  3. No Profile Picture
    I haz teh codez!
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Dec 2003
    Posts
    2,547
    Rep Power
    2337
    See here as well. Thanks for giving it a few hours before posting elsewhere.
    I ♥ ManiacDan & requinix

    This is a sig, and not necessarily a comment on the OP:
    Please don't be a help vampire!
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2010
    Posts
    2
    Rep Power
    0
    Yes, it's called finding other means of assistance. You're welcome.
  6. #4
  7. No Profile Picture
    I haz teh codez!
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Dec 2003
    Posts
    2,547
    Rep Power
    2337
    Yes, it's so very rare to find someone doing things the right way as you did.

    You have a pretty complex question, so it's not hard to understand the need to garner extra assistance. Personally I haven't the slightest idea how to help you, but hopefully someone else will. I posted a link to the other thread so that someone doesn't spit the same links back at you.
    I ♥ ManiacDan & requinix

    This is a sig, and not necessarily a comment on the OP:
    Please don't be a help vampire!
  8. #5
  9. Banned ;)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    Woodland Hills, Los Angeles County, California, USA
    Posts
    9,607
    Rep Power
    4247
    I've always looked at defragging as a type of sorting algorithm.
    Up the Irons
    What Would Jimi Do? Smash amps. Burn guitar. Take the groupies home.
    "Death Before Dishonour, my Friends!!" - Bruce D ickinson, Iron Maiden Aug 20, 2005 @ OzzFest
    Down with Sharon Osbourne

    "I wouldn't hire a butcher to fix my car. I also wouldn't hire a marketing firm to build my website." - Nilpo
  10. #6
  11. Banned ;)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    Woodland Hills, Los Angeles County, California, USA
    Posts
    9,607
    Rep Power
    4247
    I think I have a good idea of how to do this. I'll bang out some simulation code when I get home.
    Up the Irons
    What Would Jimi Do? Smash amps. Burn guitar. Take the groupies home.
    "Death Before Dishonour, my Friends!!" - Bruce D ickinson, Iron Maiden Aug 20, 2005 @ OzzFest
    Down with Sharon Osbourne

    "I wouldn't hire a butcher to fix my car. I also wouldn't hire a marketing firm to build my website." - Nilpo
  12. #7
  13. Banned ;)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    Woodland Hills, Los Angeles County, California, USA
    Posts
    9,607
    Rep Power
    4247
    Ok, here's what I've come up with so far. I'm not really familiar with the CP/M filesystem, but I'll assume that it is similar to DOS 1.0's file system (which was originally written as a CP/M clone, so it should be fairly close to it.)

    In DOS 1.0, there was no concept of subdirectories, so there was a section of the disk called the "root directory" and another one called the FAT (file allocation table). Actually there are two copies of the FAT, but that's not important here.

    The way it worked is that in the root directory, there would be fixed-size entries for filenames, date, filesize etc. and the starting cluster number of the file on the disk. For instance:
    FOO.TXT 10
    BAR.TXT 18

    Here it says that FOO.TXT's first cluster of data starts at cluster #10 on the disk. Now we go to the FAT. The FAT is just a simple set of entries, sort of like an array. Since we know that FOO.TXT starts at cluster 10 from the root directory entry, we read the first cluster. Then we go to the FAT and look at element #10 in there and read the value. Let's say the value there is 12. That means the next part of the file is in cluster 12. Then we go to the FAT element #12 and read the value there. Let's say it is hex FF, i.e. 255 in decimal. Then we know that there is no more data to be read. Basically, DOS's filesystem reserves a few magic values to mean special things in the FAT. If I recall correctly, hex FF is end of the chain, hex FE means that this cluster is currently free and not used by any file, FD meant that this cluster was marked bad during formatting and shouldn't be used to save data (I may be off on the values).

    We will use similar logic in our simulated defragger, except we don't really need to work with such small cluster #s. We could assume that we have a lot more clusters in our FS. So clearly, we need to build a couple of structures in memory to hold this information.
    Code:
    struct {
        std::string filename;
        size_t       first_cluster;
    } root_dir_entry_t
    We need to create an array or a vector of root_dir_entry_t to hold all the entries of the root directory. Then we need another struct to hold FAT entries. For ease of manipulation, we'll keep a couple of extra elements.
    Code:
    struct {
        size_t next_cluster;
        size_t prev_cluster;
        int     prev_cluster_type;
    }
    Here, prev_cluster contains the index of the prev. cluster entry and prev_cluster_type indicates if the prev. entry is on the FAT or on the root directory.

    Once we've got an array of these set up, the solution becomes very easy.
    First, we work our way backwards through the FAT and find a cluster not currently in use. This is our work cluster for holding temp data.
    We start with the first root dir entry and try to put its first cluster into the first cluster on the FS. We do the move as follows:
    Code:
    Check if the dest. cluster is already in use.
    If dest. cluster is not in use:
         move source cluster to dest. cluster
         update FAT or root dir entry to point to the dest. cluster #.
    If dest. cluster is in use:
        move data from dest cluster to temp cluster and update FAT entries
        move source cluster data to dest cluster and update FAT entries
        move temp cluster data to source cluster and update FAT entries
    
    Now increment the dest. cluster # and continue.
    Will bang out some actual code to simulate this in a little while.
    Up the Irons
    What Would Jimi Do? Smash amps. Burn guitar. Take the groupies home.
    "Death Before Dishonour, my Friends!!" - Bruce D ickinson, Iron Maiden Aug 20, 2005 @ OzzFest
    Down with Sharon Osbourne

    "I wouldn't hire a butcher to fix my car. I also wouldn't hire a marketing firm to build my website." - Nilpo
  14. #8
  15. Banned ;)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    Woodland Hills, Los Angeles County, California, USA
    Posts
    9,607
    Rep Power
    4247
    Ok I've got this sucker working. Wrote my version in C++. Output looks like this:
    Code:
    mb@motorhead:~/cpp/defragger$ ./defragger 
    Before Defrag
    Size of FAT table: 128
    ============================================
    file1.txt 10 13 17 12
    ============================================
    file2.txt 11 15
    
    Starting defrag of file: file1.txt
    Entry is unused 10 => 9
    Copy 10 data to 9
    Entry is unused 13 => 10
    Copy 13 data to 10
    Need to swap data around 17 => 11
    Copy 11 data to 128
    Copy 17 data to 11
    Copy 128 data to 17
    No need to copy 12
    Done defragging file: file1.txt
    
    Starting defrag of file: file2.txt
    Entry is unused 17 => 13
    Copy 17 data to 13
    Entry is unused 15 => 14
    Copy 15 data to 14
    Done defragging file: file2.txt
    
    After Defrag
    Size of FAT table: 128
    ============================================
    file1.txt 9 10 11 12
    ============================================
    file2.txt 13 14
    Basically, before it defrags, you can see what clusters it says file1.txt and file2.txt use. Then I told it to defrag the file system using cluster 9 as the starting cluster to begin from. As you can see, it moved everything correctly and at the end of the run, it prints the cluster layout again and they are in order starting from cluster 9.

    It is basically a sort of linked list problem. I haven't checked for a couple of error conditions in my code (e.g. what happens if you need to use a new work cluster since you've moving the source data to the cluster that happens to be the work cluster. This is just a couple more lines of code though).
    Up the Irons
    What Would Jimi Do? Smash amps. Burn guitar. Take the groupies home.
    "Death Before Dishonour, my Friends!!" - Bruce D ickinson, Iron Maiden Aug 20, 2005 @ OzzFest
    Down with Sharon Osbourne

    "I wouldn't hire a butcher to fix my car. I also wouldn't hire a marketing firm to build my website." - Nilpo
  16. #9
  17. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2011
    Posts
    1
    Rep Power
    0
    Can you post the c++ code ? I need it to complete something for college.

    Thank you! If you dont want to share here on forum , please be that good and send me on the private.

    Otherwise i will use your pseudocode, if you dont mind.

    Comments on this post

    • ptr2void disagrees : Do your own damn homework!
    • clifford disagrees : This was not even your thread you cheeky git. Too lazy to do the work and to as a reasonable question!

IMN logo majestic logo threadwatch logo seochat tools logo