#1
  1. pogremar
    Devshed Novice (500 - 999 posts)

    Join Date
    Jul 2003
    Location
    At Work
    Posts
    958
    Rep Power
    13

    Reading a whole file at a time


    Is there a way to read a whole text file into a variable all at once?

    keywords
    C, vc++ 6
    Some day I'll create a smart quote to put here.
  2. #2
  3. ASP.Net MVP
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Aug 2003
    Location
    WI
    Posts
    4,378
    Rep Power
    1511
    yes, but it's complicated and not really what you're thinking
    (changing the default buffersize to something larger than the file, and having the variable be one huge character array also larger than the file, reading x number of bytes, and then flushing the buffer. Even then, you'd need to have a pretty good idea of the file size in advance, it'd need to be fairly small, and you'd have to treat your variable just like you would a file anyway)
    Primary Forum: .Net Development
    Holy cow, I'm now an ASP.Net MVP!

    [Moving to ASP.Net] | [.Net Dos and Don't for VB6 Programmers]

    http://twitter.com/jcoehoorn
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Beginner (1000 - 1499 posts)

    Join Date
    Feb 2001
    Posts
    1,481
    Rep Power
    15
    yes, but it's complicated...

    As far as I can tell, what f'lar said isn't correct. In C++ you can use the string function getline() to read in multiple lines of data by specifying a delimiter as the third parameter. If you specify a delimiter that won't be found in your file (a character like '=' only works if the file never contains it; '\0' is a safer bet for a text file), then getline() will read in the whole file. The limitation on the size of a string variable declared like this:

    string data;

    is data.max_size() which on my computer is

    4294967293

    and you'd have to treat your variable just like you would a file anyway

    A string variable can be searched much more easily than a text file. Try it and see if it will work for you.
    Last edited by 7stud; August 26th, 2003 at 05:02 PM.
  6. #4
  7. Contributing User

    Join Date
    Aug 2003
    Location
    UK
    Posts
    5,114
    Rep Power
    1803
    1) Get the file size using GetFileSize(): http://msdn.microsoft.com/library/de...etfilesize.asp

    2) Allocate that number of bytes (use malloc() or new).

    3) Read that many bytes into the buffer.

    Since I have suggested using the Win32 GetFileSize() function, it makes sense to use the Win32 API for the other file handling too (you will already have obtained a file handle for this function). Here is a reference for Win32 File I/O: http://msdn.microsoft.com/library/de..._functions.asp

    Unfortunately, there is no function in the ANSI C or C++ libraries for directly obtaining the file size.

    Note that there is a limit on the size of file you can read like this. XP can allocate about 1.8 GB in a single block (although that will resort to using virtual memory on most systems!). Other versions of Windows may have different limits, and the limit may depend on virtual memory settings.

    Clifford
    Last edited by clifford; August 26th, 2003 at 05:07 PM.
  8. #5
  9. Contributing User

    Join Date
    Aug 2003
    Location
    UK
    Posts
    5,114
    Rep Power
    1803
    Originally posted by 7stud
    The limitation on the size of a string variable declared like this:

    string data;

    is data.max_size() which on my computer is

    4294967293
    That's (2^32)-3, just short of 4 GB, which is the maximum file size on FAT32 (NTFS allows far larger files). Not much of a limitation in most applications! Reading a file that size into memory would spill into virtual memory (i.e. the page file on disk), so there would be no benefit.

    Clifford
  10. #6
  11. I'm Baaaaaaack!
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jul 2003
    Location
    Maryland
    Posts
    5,538
    Rep Power
    244
    You can also memory map the file; that is probably much safer in any case, as you rely directly on the OS to manage which pages are in memory. If you try to hoover in a file that is much bigger than your RAM you will start paging your *** off and it may take an hour to load the entire thing. Then, as soon as you start to access the first part, you will start paging again.

    If you want to try, this is ANSI C and works for me:

    [edit] I forgot to open the file in binary mode, it only works on text files without it.[/edit]

    Code:
    #include <stdio.h>
    #include <stdlib.h>

    int main(void){
        FILE *fin;
        unsigned char * data;
        long fileLen;
        size_t size, dataPtr;
        int intChr;
        char fileName[] = "your.file";

        /* binary mode, so the size from ftell() matches the bytes we read */
        if ((fin = fopen(fileName, "rb")) == NULL){
            fprintf(stderr, "Can't open file %s\n", fileName);
            exit(1);
        }
        if (fseek(fin, 0, SEEK_END)){
            fprintf(stderr, "failure in fseek, to end of file\n");
            fclose(fin);
            exit(1);
        }
        /* ftell() returns a long, or -1L on failure */
        if ((fileLen = ftell(fin)) < 0){
            fprintf(stderr, "failure in ftell\n");
            fclose(fin);
            exit(1);
        }
        size = (size_t) fileLen;
        rewind(fin);
        fprintf(stderr, "File size is %lu\n", (unsigned long) size);

        /* allocate memory for file; the extra byte keeps the request non-zero for an empty file */
        if ((data = (unsigned char *) calloc(size + 1, sizeof(unsigned char))) == NULL){
            fprintf(stderr, "Failure in calloc\n");
            fclose(fin);
            exit(1);
        }

        /* read data from file into array, this could be done more efficiently */
        dataPtr = 0;
        /* stop at size so a file that grew after ftell() cannot overrun the array */
        while (dataPtr < size && (intChr = fgetc(fin)) != EOF){
            data[dataPtr++] = (unsigned char) intChr;
        }
        printf("Read %lu bytes into array\n", (unsigned long) dataPtr);

        /* do something interesting with the file */

        free(data);
        fclose(fin);
        return 0;
    }
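The comment in the code above notes that the byte-by-byte loop could be done more efficiently. A minimal sketch of one way to do that, assuming fin, data and size have been set up as above: ask fread() for the whole block and loop only in case of a short read. The name read_block is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: fill data[0..size-1] from fin using fread()
   instead of one fgetc() call per byte. Returns the number of bytes
   actually read, which is less than size on EOF or a read error. */
size_t read_block(FILE *fin, unsigned char *data, size_t size)
{
    size_t total = 0;

    assert(fin != NULL && data != NULL);

    while (total < size) {
        size_t got = fread(data + total, 1, size - total, fin);
        if (got == 0) {
            break;              /* EOF or error; caller can check ferror(fin) */
        }
        total += got;
    }
    return total;
}
```

In the program above, the while loop would then shrink to something like dataPtr = read_block(fin, data, size);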
    Last edited by mitakeet; August 26th, 2003 at 06:26 PM.

  12. #7
  13. Contributing User

    Join Date
    Aug 2003
    Location
    UK
    Posts
    5,114
    Rep Power
    1803
    Mitakeet: Neat solution to getting file size in ANSI C. I'd be tempted to wrap that up in a function for re-use. Good point about opening file in binary mode.
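As a rough sketch of what such a reusable function might look like (the name file_size and the choice to restore the read position are assumptions, not something from the code above):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper around the fseek()/ftell() trick.
   fp should have been opened in binary mode. Returns the size in
   bytes, or -1L on failure. The original read position is restored. */
long file_size(FILE *fp)
{
    long start, size;

    assert(fp != NULL);

    if ((start = ftell(fp)) < 0) return -1L;        /* remember where we were */
    if (fseek(fp, 0L, SEEK_END) != 0) return -1L;   /* jump to the end */
    size = ftell(fp);                               /* offset of the end = size */
    if (fseek(fp, start, SEEK_SET) != 0) return -1L;
    return size;
}
```

Because ftell() returns a long, this only works for files up to LONG_MAX bytes, which is about 2 GB where long is 32 bits.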

    Memory mapped files, on the other hand, would be the more flexible method and may be more efficient in some cases (large files, frequent updates), but they rely on OS services, so they are less portable (which is not a problem if you only ever code for Windows).

    Details of memory mapped files here: http://msdn.microsoft.com/library/de...n_manamemo.asp
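The link above covers the Win32 API. For comparison, here is a minimal sketch of the same idea on a POSIX system such as Linux, using mmap(). This swaps in the POSIX calls purely as an illustration; it is not the Win32 route described in the link, and map_file and unmap_file are made-up names.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only.
   Returns the mapping and stores its length in *len, or returns NULL
   on failure (including an empty file, which cannot be mapped). */
const char *map_file(const char *path, size_t *len)
{
    struct stat sb;
    void *p;
    int fd;

    assert(path != NULL && len != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;
    if (fstat(fd, &sb) != 0 || sb.st_size <= 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (p == MAP_FAILED) return NULL;

    *len = (size_t)sb.st_size;
    return (const char *)p;
}

/* Release a mapping obtained from map_file(). */
void unmap_file(const char *data, size_t len)
{
    munmap((void *)data, len);
}
```

Note that the mapped bytes are not guaranteed to be NUL-terminated, so they should be handled with an explicit length rather than passed to functions that expect a C string.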

    Kubicon: Take your choice. If your files are reasonably small and there is no need to update the file often, I'd go with Mitakeet's ANSI C solution. If the file is large, or if you will be constantly updating the file, then the memory mapped solution may be simpler and more efficient.

    Clifford
  14. #8
  15. pogremar
    Devshed Novice (500 - 999 posts)

    Join Date
    Jul 2003
    Location
    At Work
    Posts
    958
    Rep Power
    13
    WOW! Thanks a lot, guys, for the many responses. I will try them all.

    FYI's:

    1) I'm trying to make a parser (a spider, actually) that's able to search for regular expressions in HTML files (mine). The program would search for the <meta description> and <meta keyword> tags (and possibly others) and put their contents in a file so that I could then put the content in a database, for search engine purposes. Right now, this is totally for skill learning and fun, but hopefully it will blossom into something I could use. Also, I'm a web designer and it would be nice to have a tool like that at my disposal. I have sort of accomplished it. I'm using the PCRE regular expression library. The only problem I have is that the <meta> tag has to be on one line, because I'm using the line read function. This is no good because the HTML specification says that tags and their contents can have white space and newlines in them before you close the tag, so if you had something like this:

    <meta name=keywords
    content="some goes here keyword blah">

    then my program would not work properly. So I figure, if I could just read the whole file into a variable, I could apply the regular expressions to the whole file instead of having a loop that iterates through each line until the end is reached, applying the regular expression matching on each iteration.

    I like to be critiqued, especially when I'm doing something totally moronic. If any of you guys think that the way I'm about to do my "regular expression matching/value extraction", now that I know how to read a whole file, is bad, please don't hesitate to call me on it.


    2) I will try to code the program as close to ANSI C as possible so that I'm able to build it on Linux as well.

    3) since the program is not real time-- it will run scheduled at sometime during off peak hours-- it's ok if it's a little slow, as long as it works properly.

    4) If anyone wants any of my (possibly crappy) source code, just let me know and I'll be happy to give it to you.
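For point 1, here is a minimal sketch of matching a tag that spans several lines once the whole file sits in one NUL-terminated buffer. It uses the POSIX <regex.h> calls rather than the PCRE library mentioned above, purely to keep the example self-contained. The pattern and the name find_keywords are assumptions for illustration, and a regular expression is only a rough way to pick apart HTML.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the content="..." value of the first
   <meta name=keywords ...> tag found in html into out.
   Returns 0 on success, -1 if nothing matched or out is too small. */
int find_keywords(const char *html, char *out, size_t outsz)
{
    /* Without REG_NEWLINE, [^>]* also matches newlines, so the tag
       may be spread over several lines of the buffer. */
    static const char pattern[] =
        "<meta[^>]*name=\"?keywords\"?[^>]*content=\"([^\"]*)\"";
    regex_t re;
    regmatch_t m[2];
    size_t n;
    int rc = -1;

    assert(html != NULL && out != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0) {
        return -1;
    }
    if (regexec(&re, html, 2, m, 0) == 0) {
        n = (size_t)(m[1].rm_eo - m[1].rm_so);   /* length of the captured value */
        if (n < outsz) {
            memcpy(out, html + m[1].rm_so, n);
            out[n] = '\0';
            rc = 0;
        }
    }
    regfree(&re);
    return rc;
}
```

With the whole file in memory there is no per-line loop: one regexec() call sees the tag regardless of where the line breaks fall.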

    thanks
  16. #9
  17. Contributing User

    Join Date
    Aug 2003
    Location
    UK
    Posts
    5,114
    Rep Power
    1803
    HTML files are typically small, I see no problem with this.

    Clifford
  18. #10
  19. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2002
    Location
    Flint, MI
    Posts
    328
    Rep Power
    12
    You folks are all taking the long way around. This bit of code is supported on at least all UNIX and most Windows compilers:

    Code:
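    #include <sys/stat.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>

    /* Returns a NUL-terminated copy of the file, or NULL on error.
       The caller must free() the result. */
    char* snarf_file(const char* filename) {

        struct stat sb;
        char* buffer;
        FILE* handle;

        if (stat(filename, &sb)) {
            perror("snarf_file");
            return NULL;
        }

        /* one extra zeroed byte so the result is always NUL-terminated */
        buffer = (char*)calloc(1, (size_t)sb.st_size + 1);
        if (buffer == NULL) {
            perror("snarf_file");
            return NULL;
        }

        /* text mode: on Windows fread() may return fewer than st_size
           bytes; calloc() has already zeroed the remainder */
        handle = fopen(filename, "r");
        if (handle == NULL) {
            perror("snarf_file");
            free(buffer);
            return NULL;
        }
        fread((void*)buffer, 1, (size_t)sb.st_size, handle);
        fclose(handle);

        return buffer;

    }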
    Clay Dowling
    Lazarus Notes
    Articles and commentary on web development
    http://www.lazarusid.com/notes/
