size of file
Discuss size of file in the C Programming forum on Dev Shed. size of file C programming forum discussing all C derivatives, including C#, C++, Object-C, and even plain old vanilla C. These languages are low level languages, and used on projects such as device drivers, compilers, and even whole computer operating systems.
December 4th, 2004, 06:46 PM
|
|
Registered User
|
|
Join Date: Nov 2004
Posts: 21
Time spent in forums: < 1 sec
Reputation Power: 0
|
|
|
size of file
how do you get the number of entries in a file in c, not c++? this is what my file looks like:
Doe
John
vw
corrado
1993
4500.00
Smith
Jack
vw
thing
1974
6500.00
I want it to count the number of entries. With this example there are 12. i want to get this number as a variable in my program
|

December 4th, 2004, 06:58 PM
|
|
Contributing User
|
|
Join Date: Jun 2003
Posts: 705
Time spent in forums: 7 m 27 sec
Reputation Power: 11
|
|
|
Actually, in my view, there are two entries, perhaps 14 or 13 lines, 12 lines of non-whitespace text.
This is a data format, in a text stream, and the grouping suggests two entries of 6 lines.
For C, I'd suggest fopen in the text mode.
Then, using fgets, loop through each line of text until the file is exhausted.
As you load each line, you'll need to check each buffer for a "non-whitespace" character (or, perhaps more specifically, note when any character fails a whitespace test). Whitespace includes tabs, line feeds, carriage returns, spaces - stuff that doesn't print.
Each line which has a non-whitespace character causes you to increment a counter - which ends up being the variable you wanted.
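A minimal sketch of the loop described above, assuming every line fits in a 256-byte buffer. The names has_text and count_text_lines are made up for illustration; only fgets and isspace come from the standard library.

```c
#include <ctype.h>
#include <stdio.h>

/* Returns 1 if the string holds at least one non-whitespace character. */
static int has_text(const char *s)
{
    for (; *s != '\0'; ++s) {
        if (!isspace((unsigned char)*s))
            return 1;
    }
    return 0;
}

/* Counts the lines in an open stream that contain non-whitespace text.
   Assumption: no line is longer than the buffer. A longer line would be
   read in pieces and could be counted more than once. */
long count_text_lines(FILE *fp)
{
    char buf[256];
    long count = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (has_text(buf))
            ++count;
    }
    return count;
}
```

For the sample file in the first post this should come out as 12; dividing by the 6 lines per group gives the two entries.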
|

December 4th, 2004, 07:15 PM
|
|
Registered User
|
|
Join Date: Nov 2004
Posts: 21
Time spent in forums: < 1 sec
Reputation Power: 0
|
|
|
it would be fine if i could get the two entries. if i had to test whitespace how would i do that?
|

December 4th, 2004, 07:15 PM
|
 |
Lord of Dorkness
|
|
Join Date: Jan 2004
Location: Central New York. Texan via Arizona, out of his element!
|
|
|
I'll climb up on a personal-hangup stump and disagree about text mode. "Text mode" is actually a Windows thing and doesn't really relate to textual encoding of the bytes at all.
On the way out to the file it adds a '\r' for each '\n'. At the end of file it adds an 'EOF' byte, 0x1A, ctrl-Z. On input it strips the '\r's and detects and strips FROM INPUT the ctrl-Z as an eof. If one appends to the file, the append is BEYOND the ctrl-Z and subsequent reads in text mode will not penetrate past that point. Because of these more or less useless machinations (except in the context of line-wrapping mechanisms conditioned specifically for them), the ability to use positioning operations such as seek and tell is difficult and must be conducted with abnormally stringent requirements.
Unix and its variants do not have a text mode. The language (I believe) requires that they not barf if they encounter an argument such as "rb", even though, for them, it is exactly equivalent to "r". Unfortunately, "text mode" is the Windows default. One must specifically call for binary or set the global variable _fmode for binary operation.
__________________
Functionality rules and clarity matters; if you can work a little elegance in there, you're stylin'.
If you can't spell "u", "ur", and "ne1", why would I hire you? 300 baud modem? Forget I mentioned it.
DaWei on Pointers Politically Incorrect.
|

December 4th, 2004, 07:21 PM
|
|
Registered User
|
|
Join Date: Nov 2004
Posts: 21
Time spent in forums: < 1 sec
Reputation Power: 0
|
|
|
how do i go about doing this then? in a not so complicated way
|

December 4th, 2004, 08:09 PM
|
 |
Lord of Dorkness
|
|
Join Date: Jan 2004
Location: Central New York. Texan via Arizona, out of his element!
|
|
|
I'd do it precisely as Jason suggests, but I'd open the file in binary mode. I'd use fgets with the standard delimiter, a newline. I'd consider every fetch from fgets to be one field except for the 'empty' line; that, I'd consider to be a record separator. I'd perform some validation on input. If I didn't get 6 fields per record, I'd do something: flag an error and store a partial record, or perhaps toss the record entirely and try to resync. It's really one of the simplest formats you're likely to come up against. There are examples in the post, Commonly Asked Questions, regarding how to open files and read and write from them.
I'd define a structure that had a member for each field and store the input in that. I'd have a read function that read and stuffed an entire record; it would beach and complain if things didn't come out right.
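A rough sketch of that structure and read function. The field names (last, first, make, model, year, price) are guesses from the sample data, and car_record, read_field and read_record are hypothetical names. It treats only truly empty lines as separators and does not validate the numeric fields, so it is a starting point rather than a finished reader.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FIELD_LEN 64

/* Field names are assumptions based on the sample file. */
struct car_record {
    char last[FIELD_LEN];
    char first[FIELD_LEN];
    char make[FIELD_LEN];
    char model[FIELD_LEN];
    int year;
    double price;
};

/* Reads one line into dst, dropping any trailing '\r' or '\n'.
   Returns 1 on success, 0 at end of file or on a read error. */
static int read_field(FILE *fp, char *dst, size_t size)
{
    if (fgets(dst, (int)size, fp) == NULL)
        return 0;
    dst[strcspn(dst, "\r\n")] = '\0';
    return 1;
}

/* Reads one six-line record, skipping empty separator lines first.
   Returns 1 for a complete record, 0 at a clean end of file,
   -1 if the file ends in the middle of a record. */
int read_record(FILE *fp, struct car_record *rec)
{
    char num[FIELD_LEN];

    do {
        if (!read_field(fp, rec->last, sizeof rec->last))
            return 0;
    } while (rec->last[0] == '\0');

    if (!read_field(fp, rec->first, sizeof rec->first) ||
        !read_field(fp, rec->make, sizeof rec->make) ||
        !read_field(fp, rec->model, sizeof rec->model))
        return -1;

    if (!read_field(fp, num, sizeof num))
        return -1;
    rec->year = (int)strtol(num, NULL, 10);   /* unchecked conversion */

    if (!read_field(fp, num, sizeof num))
        return -1;
    rec->price = strtod(num, NULL);           /* unchecked conversion */

    return 1;
}
```

Calling read_record in a loop until it stops returning 1, and bumping a counter each time, gives the number of entries directly. Because the stray '\r' is stripped by hand, the same code should behave the same whether the file was opened with "r" or "rb".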
|

December 4th, 2004, 08:18 PM
|
 |
Contributing User
|
|
Join Date: Jun 2003
Location: Baltimore, MD
Posts: 229
Time spent in forums: 2 h 13 m 58 sec
Reputation Power: 11
|
|
Quote: | Unix and its variants do not have a text mode. |
???
what do you mean?
fscanf lets you input formatted text (formatted binary), and fread lets you input raw binary (without the need to format).
from the Linux Programmer’s Manual
1/
NAME: scanf, fscanf, sscanf, vscanf, vsscanf, vfscanf - input format conversion
int scanf(const char *format, ...);
DESCRIPTION: The scanf family of functions scans input according to a format as described below.../snip/
2/
NAME: fread, fwrite - binary stream input/output
DESCRIPTION: The function fread reads nmemb elements of data, each size bytes long, from the stream pointed to by stream, storing them at the location given by ptr..../snip/
text is a format for binary data (like ASCII). actually, in electrical engineering we call it character coding (binary data with alpha numeric text as content). standard formats for text are ASCII, EBCDIC, Baudot and Hollerith.
|

December 4th, 2004, 08:31 PM
|
|
Contributing User
|
|
Join Date: Jun 2003
Posts: 705
Time spent in forums: 7 m 27 sec
Reputation Power: 11
|
|
|
DaWei_M:
In particular I must agree that text mode doesn't deal with text encoding, as in Unicode or MBCS. In the days it was introduced, though, the fopen/fread/fgets set included the text mode, and Windows hadn't yet been written. Text was usually good ol' American ASCII - self centered as we were (and are).
I'm curious about the notion of the end of file marker being a problem. It's been at least a few years, but when I did work on System 5 & 7 (AT&T 3B2's), and AIX (IBM's version of Unix 7), I don't recall having any problems with the text mode. This would be somewhere between 198* - something (before 85), and '92. Indeed, when one opened files in AIX without specifying the "binary" mode - either with open or fopen - binary data would end up corrupted.
phatchump:
It appears to me you haven't started one line of code. DaWei_M, myself and others are loath to provide code for someone who hasn't tried. I'll point you to where you should look in the documentation, but an English description is as much of a primer as I'm interested in launching (I have development work, I do this as a curiosity).
Ok, look into open and fopen - they work differently. With fopen you can use fgets, which will stop reading data when it reaches the end of line (be it carriage return/linefeed - whatever the OS is ok with).
With each line, you loop through every character, testing each one. You have your pick of choices;
You could use "isspace" - which should return false for a non-whitespace character, indicating "something" is there.
If you see any non-whitespace in the buffer, it's a text line, so count it.
You could also check the value against your own table for whitespace, something like "If it's a tab, or a space, or a carriage return, or a linefeed, then it's whitespace".
If you want to read the data as groups, then you work on the premise that each group appears exactly in sequence, line for line. Figure out what each line means - first line is a telephone, second is an address - or whatever it is, and copy the buffer you read into a variable of that meaning.
This does get tricky because fgets is expected to include the linefeed (and/or carriage return, if I recall correctly).
You have to strip those out. In C, you have to do this (by many possible means, mind you), one character at a time.
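One of those many possible means, as a small sketch. The function name chomp is borrowed from other languages and is not part of the C library; it simply walks back from the end of the buffer, one character at a time.

```c
#include <string.h>

/* Removes a trailing "\n", "\r\n", or "\r" from line, in place.
   Returns the new length of the string. */
size_t chomp(char *line)
{
    size_t len = strlen(line);

    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        line[--len] = '\0';
    return len;
}
```

Call it on the buffer right after each successful fgets, before copying the text into the variable it belongs to.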
Now, you asked for a not-so-complicated way.
Sorry, it's almost never uncomplicated.
However, it IS a bit simpler in C++ than this.
|

December 4th, 2004, 08:43 PM
|
 |
Lord of Dorkness
|
|
Join Date: Jan 2004
Location: Central New York. Texan via Arizona, out of his element!
|
|
|
I'm speaking of "opening the file in text mode", which is pretty much a MS thangy these days. That's not to be confused with text encoding, as Arpia is doing. I go back that far, Jason, even farther. Different systems used to do things differently as regards text and so forth; as a matter of fact VMS distinguished between text and binary modes, but the current documentation regarding the modes for the CRTL is quite clear. MS is equally clear regarding their text mode and the problems it can cause. The information I spouted is straight from the documentation and is flagged as MS specific. The 'lack of a text mode' I speak of is *nix specific. If I conveyed the impression that I was speaking for every OS in the world, I was remiss. My statements will be germane, however, for almost every environment encountered here.
|

December 4th, 2004, 08:53 PM
|
 |
Contributing User
|
|
Join Date: Jun 2003
Location: Baltimore, MD
Posts: 229
Time spent in forums: 2 h 13 m 58 sec
Reputation Power: 11
|
|
what an interesting subject
i guess that it all starts with the function call (system call) that opens the file. to my understanding, the program calls the OS with a system call and the OS gives back a pointer to the beginning of the file. so whether it is binary or formatted-binary doesn't matter (i may need to be straightened out on this). then your program reads the contents of the file from memory. if you expect text as content, then you would use a function like fscanf. where would opening the file in text-mode fit into this?
|

December 4th, 2004, 09:36 PM
|
 |
Lord of Dorkness
|
|
Join Date: Jan 2004
Location: Central New York. Texan via Arizona, out of his element!
|
|
When one opens a file (in Windows) in text mode, the encoding of the bits as they are transferred to the file is totally unmodified. You may write 0x41. How you interpret that is entirely up to you. Perhaps you mean it to be an 'A' and perhaps you mean it to be the value, decimal 65, and perhaps you mean it to be some other encoded entity, perhaps EBCDIC (which has many variants). The meaning of text mode, in Windows, as implied by default with fopen (..., "r") (or "rt"), is strictly as I outlined in my post above. On a *nix system binary is the only mode; any "b" is just tossed.
If one wants an fscanf ("....%d...") to succeed rather than stop at a matching failure with the offending input left unread, then the data encountered had better be whitespace or something convertible to an integer. One of the benefits of fgets and other mechanisms like it is that it attempts no numerical conversions; any pattern occurring on the medium is acceptable. One can then perform conversions with sscanf (or perhaps one of the atoi family), but must still be wary of errors if the data presented aren't amenable to the specified numerical conversion.
The function call that opens the file is not a "system call"; it's a call of a standard library function. The call will be implemented eventually with a "system call" that is specific to the hardware and operating system. That is something transparent to the programmer unless he/she cares to trace through all the underlying code that is being executed. One doesn't "open" files on differing media in the same way (at the hardware level), typically. There are differing drivers for the various flavors of IDE, RAID, and so forth. If the average programmer had to deal with all that these days, he/she would be gray. Perhaps that's why I am. No, it was my second ex- that did that, wasn't it? Or, maybe my teenagers.
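To make the "read with fgets, convert afterwards" idea concrete, here is a hedged sketch that swaps in strtol for the sscanf/atoi family mentioned above, because strtol reports where conversion stopped. The name read_int_line is hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads one line from fp and converts it to an int.
   Returns 1 on success, 0 if the line is not a valid int,
   -1 at end of file or on a read error. *out is only written on success. */
int read_int_line(FILE *fp, int *out)
{
    char buf[64];
    char *end;
    long val;

    if (fgets(buf, sizeof buf, fp) == NULL)
        return -1;

    errno = 0;
    val = strtol(buf, &end, 10);
    if (end == buf || errno == ERANGE || val < INT_MIN || val > INT_MAX)
        return 0;               /* no digits at all, or out of range */

    /* Allow only trailing whitespace (including "\r\n") after the number. */
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        ++end;
    if (*end != '\0')
        return 0;               /* trailing junk such as "1993abc" */

    *out = (int)val;
    return 1;
}
```

A bad line costs exactly one line of input: the next call simply moves on to the following line, which is harder to arrange when fscanf stops partway through.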
|

December 4th, 2004, 10:12 PM
|
 |
Contributing User
|
|
Join Date: Jun 2003
Location: Baltimore, MD
Posts: 229
Time spent in forums: 2 h 13 m 58 sec
Reputation Power: 11
|
|
|
ok. i had no luck finding anything about binary or text files in linux.
i don't know much windows (they have so many strange names, that's why i am not interested in it). open() is a pure system call that returns a file descriptor (int). it either opens or creates a regular file (gosh, i wish i had my minix book!). i haven't found anything in linux that talks about opening a file in text-mode or binary mode. i have just looked at all the flags (12) for the system call open() and i see none that refers to binary or text mode.
BTW: in my class for C programming i had to use system calls to write programs. so, sometimes they are used.
greetings
|

December 5th, 2004, 02:10 AM
|
 |
Contributing User
|
|
Join Date: Jan 2004
Location: near St. Louis Illinois
|
|
Quote: | Originally Posted by arpia ok. i had no luck finding anything about binary or text files in linux. |
you're making this way too difficult! there isn't much to it, all you need is fopen(filename,"r"); run man fopen and you will probably find the description of the open flags, such as "r" in my example. By default most operating systems and compilers will open the file in text mode. fgets() will read one line of text, which terminates with one of the following, depending on the operating system:
Code:
MAC: '\r'
*NIX: '\n'
Windows: "\r\n"
Note that other operating systems may terminate text lines with some other character(s). If you use fgets() you don't have to worry about that.
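One caveat worth illustrating: fgets only hides the terminator difference for files that use the platform's own convention. A Windows-style file read on a *nix box still hands you the '\r'. The sketch below, with hypothetical names eol_style and detect_eol, reports which of the three terminators a stream uses, assuming it was opened in binary mode so no translation happens first.

```c
#include <stdio.h>

enum eol_style { EOL_NONE, EOL_LF, EOL_CRLF, EOL_CR };

/* Reports the first line terminator found in a stream.
   Assumption: the stream was opened in binary mode ("rb"). */
enum eol_style detect_eol(FILE *fp)
{
    int c;

    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            return EOL_LF;
        if (c == '\r') {
            /* "\r\n" is the Windows pair; a lone '\r' is the old Mac style. */
            c = getc(fp);
            return (c == '\n') ? EOL_CRLF : EOL_CR;
        }
    }
    return EOL_NONE;            /* no terminator anywhere in the stream */
}
```

The function consumes input as it scans, so rewind the stream before reading the data for real.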
|

December 5th, 2004, 06:30 AM
|
 |
Lord of Dorkness
|
|
Join Date: Jan 2004
Location: Central New York. Texan via Arizona, out of his element!
|
|
|
You actually searched your Linux docs after being told that Linux doesn't distinguish between the two modes? Your time would have been better spent playing with baby's mama. Or Googling for an 'fopen' discussion in a wider context of multiple operating systems and language standardization.
|

December 5th, 2004, 12:20 PM
|
 |
Contributing User
|
|
Join Date: Jun 2003
Location: Baltimore, MD
Posts: 229
Time spent in forums: 2 h 13 m 58 sec
Reputation Power: 11
|
|
yes. i am stubborn
i am not happy and i need to understand why there is a text-mode or binary-mode. i need to know if opening a file in text-mode or binary-mode is called that way because there is a difference between regular files or whether it is just a way for fopen() to identify a file that contains text. i've read 'man fopen' and they do mention 'b.' however, the open() sys call has nothing in this respect, and the mystery is somewhere in the implementation of fopen().
Quote:
Linux Programmer’s Manual: fopen()
The mode string can also include the letter ‘‘b’’ either as a last character or as a character between the characters in any of the two character strings described above. This is strictly for compatibility with ANSI X3.159-1989 (‘‘ANSI C’’) and has no effect; the ‘‘b’’ is ignored on all POSIX conforming systems, including Linux. (Other systems may treat text files and binary files differently, and adding the ‘‘b’’ may be a good idea if you do I/O to a binary file and expect that your program may be ported to non-Unix environments.) |
so the mystery is this:
Quote: | By default most operating systems and compilers will open the file in text mode. |
what does it mean to open a file in text-mode?
Quote: | fgets() will read one line of text (which terminates with one of the following, depending on the operating system: Note that other operating systems may terminate text lines with some other character(s) |
so is the mystery that the function expects text and that is why it is called text-mode?
Quote: | You actually searched your Linux docs after being told that Linux doesn't distinguish between the two modes? |
that's the way i am. i need to know the why
|