#1
  1. pogremar
    Devshed Novice (500 - 999 posts)

    Join Date
    Jul 2003
    Location
    At Work
    Posts
    958
    Rep Power
    13

    arrays or classes?


    Greetings all,

    I'm currently working on a command-line app (for Windows) that searches recursively from the current directory down and gathers information on files. Currently, the only information is the file location. It's really simple, as I'm not a C expert. Ultimately the purpose of the program is to search the directories, see which files are HTML and PHP, open them, extract some info such as keywords and description, and put it in a file.

    Again, at this point the only info I'm interested in gathering is the address of the file, eg. c:\test\foo.html, since I figure once I'm able to do that I can expand on it.

    I've been able to do it, but in a way that I know is not very efficient. Basically, the program searches recursively, and if it finds an html file it writes the location of that file to another file called files_list.txt. Obviously this is not good, because if we have 1000 html files the program has to open files_list.txt, write to it, and close it 1000 times.
    It works, but it's ugly.

    What I figure I could do is make a multi-dimensional array and add the locations to it. When the program finishes, the array could be written to the file just once. This would cut work time significantly. Since in Windows XP the MAX_PATH constant (the maximum length of a file location) is 260 chars, if I wanted to make an array that could hold 100,000 file locations I would do something like:
    char locations[100000][260];
    This is bad because the program would need about 26 megs (100,000 x 260 bytes) to hold 100,000 file locations. And imagine if I wanted to store other info such as the description. It is unlikely that I'm gonna have 100,000 pages, but I would like the program to be able to handle it.

    Another solution would be to grow the array only when the program finds a file, if I could do something like this (I don't know if it's possible; I looked for it and couldn't find any information):

    // create multi-dimensional array with one entry

    char locations[1][260];

    // if the program finds a file then do something like
    array_push("c:\\test\\foo.html", locations);

    This would expand the array by one and put "c:\test\foo.html" at that location.

    I program in php and things like this are possible but I don't think they are possible in C.

    My question is, does anyone have something equivalent to this? I don't want to create fixed-size arrays.

    thanks
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2003
    Posts
    56
    Rep Power
    12
    Instead of closing the file, pass in its file handle.

    Your function header would look something like this:
    recursiveFind(char* path, ofstream resultsFile);
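
Since the original program is C rather than C++, the same idea can be sketched with a `FILE *` that is opened once by the caller and handed down through the recursion. This is only a sketch under assumptions: the `Node` type is a hypothetical in-memory stand-in for a directory tree (a real Windows version would walk directories with the Win32 search functions instead), and `recursive_find` is an illustrative name, not something from the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical in-memory stand-in for a directory tree. */
typedef struct Node {
    const char *path;             /* full path of this file or directory */
    const struct Node *children;  /* NULL for a plain file */
    size_t child_count;
} Node;

/* Walk the tree and write every file path to the already-open stream.
   The stream is never opened or closed here; the caller does that once.
   Returns the number of paths written. */
size_t recursive_find(const Node *node, FILE *results)
{
    assert(node != NULL && results != NULL);

    if (node->children == NULL) {
        fprintf(results, "%s\n", node->path);
        return 1;
    }

    size_t written = 0;
    for (size_t i = 0; i < node->child_count; i++)
        written += recursive_find(&node->children[i], results);
    return written;
}
```

The caller would `fopen` files_list.txt once, call `recursive_find` on the root, and `fclose` once at the end, so 1000 files cost one open and one close instead of 1000 of each.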
  4. #3
  5. pogremar
    Devshed Novice (500 - 999 posts)

    Join Date
    Jul 2003
    Location
    At Work
    Posts
    958
    Rep Power
    13
    OK, but what about my main question (the last one)?
  6. #4
  7. No Profile Picture
    .
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2002
    Posts
    296
    Rep Power
    12
    I program in php and things like this are possible but I don't think they are possible in C.
    it's perfectly possible, you've just got to do a bit more work i think. i don't know the ins and outs of them, but dynamic arrays are what you're after.

    http://www.eskimo.com/~scs/C-faq/q6.16.html

    http://www.hermetic.ch/cfunlib/arrays/arrays.htm

    and extract some info such as keywords and description
    just for your information for later (this doesn't help with what you're after right now) i'd really recommend regular expressions (or regex) for this. looks like alien algebra, but fantastic for *extracting* text and information from text and information.
    Last edited by balance; July 3rd, 2003 at 06:28 PM.
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2003
    Posts
    56
    Rep Power
    12
    Oh ya, you need to pass ifstream and ofstream handles by reference also.

    recursiveFind(char* path, ofstream &resultsFile);

    And you can allocate arrays on the fly, but in your case you'd have to create a new array and copy the old array into it, then delete the old array. There is no means to expand an array.

    A linked list would work perfectly for this type of thing though.
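
In plain C, the "create a new array, copy, delete the old one" step is what `realloc` does for you, which gets fairly close to the `array_push` asked about in the first post. A minimal sketch, assuming fixed 260-byte rows as in the original question; `Locations`, `locations_push`, `locations_free` and `PATH_LEN` are made-up names for illustration only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PATH_LEN 260  /* MAX_PATH value quoted in the thread */

typedef struct {
    char (*rows)[PATH_LEN];  /* contiguous block of fixed-width rows */
    size_t count;            /* rows in use */
    size_t capacity;         /* rows allocated */
} Locations;

/* Append a copy of path, growing the block when it is full.
   Returns 0 on success, -1 if the path is too long or memory runs out. */
int locations_push(Locations *loc, const char *path)
{
    assert(loc != NULL && path != NULL);

    if (strlen(path) >= PATH_LEN)
        return -1;

    if (loc->count == loc->capacity) {
        /* Doubling keeps the number of reallocations small. */
        size_t new_cap = loc->capacity ? loc->capacity * 2 : 16;
        char (*grown)[PATH_LEN] = realloc(loc->rows, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;  /* the old block is still valid */
        loc->rows = grown;
        loc->capacity = new_cap;
    }

    strcpy(loc->rows[loc->count], path);
    loc->count++;
    return 0;
}

void locations_free(Locations *loc)
{
    free(loc->rows);
    loc->rows = NULL;
    loc->count = loc->capacity = 0;
}
```

Start from `Locations loc = {0};`, push each path as it is found, write `loc.rows[0]` through `loc.rows[loc.count - 1]` to the file at the end, then call `locations_free`. Memory use follows the number of files actually found rather than a worst-case guess.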
  10. #6
  11. No Profile Picture
    .
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2002
    Posts
    296
    Rep Power
    12
    you'd have to create a new array and copy the old array into it, then delete the old array
    or as an alternative to that, create one large static array of pointers to char arrays - from what you mention: char *locations[100000]. bearing in mind 32 bit machines have 4 byte pointers, that's about 400kb (100,000 x 4 bytes). not so bad as a full 100,000 x 260 array (about 26mb). then dynamically allocate char array[260];'s as and when needed and stick their pointers into the base array. i don't know. just a suggestion.

    or work on getting char *locations[100000] dynamic as well maybe?

    oh yeah, also when allocating dynamic arrays like this, you don't have to stick to a fixed maximum size. if the path happens to be only 20 characters long, then that's how large you allocate an array for (plus one byte for the terminating '\0').
    Last edited by balance; July 3rd, 2003 at 08:04 PM.
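
The pointer-table suggestion above might look roughly like this in C. It is a sketch, not a tested part of anyone's program: `MAX_LOCATIONS`, `add_location` and `free_locations` are names invented here, and the 100,000 limit is just the figure used in the thread. Note that on a 64-bit build each pointer is 8 bytes rather than 4, so the table itself is about 800 KB.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LOCATIONS 100000

/* File-scope array: zero-initialised and not placed on the stack. */
static char *locations[MAX_LOCATIONS];
static size_t location_count = 0;

/* Store a copy of path sized to fit exactly (length + 1 for '\0').
   Returns 0 on success, -1 if the table is full or memory runs out. */
int add_location(const char *path)
{
    assert(path != NULL);

    if (location_count == MAX_LOCATIONS)
        return -1;

    size_t len = strlen(path) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;

    memcpy(copy, path, len);
    locations[location_count++] = copy;
    return 0;
}

/* Release every stored string and reset the table. */
void free_locations(void)
{
    for (size_t i = 0; i < location_count; i++) {
        free(locations[i]);
        locations[i] = NULL;
    }
    location_count = 0;
}
```

The trade-off against a fully dynamic array is that the table still has a hard upper limit, but each path costs only its own length instead of a full 260 bytes.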
  12. #7
  13. pogremar
    Devshed Novice (500 - 999 posts)

    Join Date
    Jul 2003
    Location
    At Work
    Posts
    958
    Rep Power
    13
    Thanks a lot, guys. I'm sure what I'm looking for is here. I just have to do some experimentation.
  14. #8
  15. Left due to despotic ad-min
    Devshed Beginner (1000 - 1499 posts)

    Join Date
    Jun 2003
    Posts
    1,044
    Rep Power
    14
    Another choice is to use a linked list.

    The following is pseudo-C code to illustrate the idea.

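    #include <stdlib.h>
    #include <string.h>

    /* finished, found_an_entry and found_filename are placeholders
       for whatever your directory search provides. */

    typedef struct MyFile
    {
        char filename[260];      /* MAX_PATH, so any Windows path fits */
        struct MyFile *next;
    } MyFile;

    MyFile *GetNextEntry(void)
    {
        /* search through directory to find filename you wish to record */

        if (found_an_entry)
        {
            MyFile *retval = malloc(sizeof(MyFile));
            if (retval == NULL) return NULL;    /* out of memory */
            strncpy(retval->filename, found_filename, sizeof retval->filename - 1);
            retval->filename[sizeof retval->filename - 1] = '\0';
            retval->next = NULL;
            return retval;
        }
        else
        {
            return NULL;
        }
    }


    int main(void)
    {
        MyFile *list = NULL, *last = NULL;

        while (!finished)
        {
            MyFile *temp = GetNextEntry();
            if (temp == NULL) continue;         /* nothing found on this pass */
            if (list == NULL)
            {
                list = last = temp;             /* first node starts the list */
            }
            else
            {
                last->next = temp;              /* append after the current tail */
                last = temp;
            }
        }

        /* And then to write out the list .... */

        MyFile *temp = list;
        while (temp != NULL)
        {
            MyFile *next = temp->next;

            /* output temp->filename to your file */

            free(temp);                         /* release each node once written */
            temp = next;
        }

        return 0;
    }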


    The advantage of this is that you only allocate as much memory as you need, and you don't need to traverse your directory tree to work out how much you need.
  16. #9
  17. pogremar
    Devshed Novice (500 - 999 posts)

    Join Date
    Jul 2003
    Location
    At Work
    Posts
    958
    Rep Power
    13
    I had heard of linked lists before, but since I didn't have a use for them at the time I didn't learn them. As I was working out this problem, something in my head told me that linked lists would probably work for this, even though I wasn't sure what they were. It just sounded like it would be a good option.
    I'm gonna look up some articles on linked lists.

    thanks
  18. #10
  19. not a fan of fascism (n00b)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Feb 2003
    Location
    ct
    Posts
    2,756
    Rep Power
    95
    i think you should look up sparse tables. i have never used them, but i remember in my CS course we talked about them being a good choice when you possibly had a huge amount of data but may have a small amount also, which seems to be your situation.
  20. #11
  21. Left due to despotic ad-min
    Devshed Beginner (1000 - 1499 posts)

    Join Date
    Jun 2003
    Posts
    1,044
    Rep Power
    14
    Originally posted by infamous41md
    i think you should look up sparse tables. i have never used them, but i remember in my CS course we talked about them being a good choice when you possibly had a huge amount of data but may have a small amount also, which seems to be your situation.
    I'm not sure sparse tables would be applicable in this case. Sparse tables are more for cases where there are big arrays with lots of repeated values. For example, if you have several million data values, and a large number of them will be zero, then a sparse table may be the tool for you .....
