#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2008
    Posts
    4

    Open and write vs fopen and fwrite in a DOS->Linux port


    Hi,

    I'm porting a large C application from DOS to Linux (a task I've never done before), and generally it's going very well. I got it to compile, run, and interface with some hardware. However, I've run into one problem, and I don't know enough about DOS to really understand what's going on.

    So, the code periodically writes to files using the functions open and write. The Linux version compiles and runs with these function names (and writes files containing some semblance of the correct information). The data written includes various structures and arrays, laid out in a very particular way across three different files.

    Anyway, as I understand it, open and write (along with read) are system-specific, so the DOS versions of those calls write differently from the Linux versions. Ideally, the original program would have used fopen and fwrite, and things would port nicely. Regrettably, that is not the case.

    Anyhow, the complex binary files the application creates are read in by some ancient, cryptic Fortran application (that I'll ideally eventually replace with something newer) and turned into ascii files. The problem is that when I import the files generated by the Linux version of the application, there are all sorts of what look like extra zeros and other garbage; presumably that's a difference in file formatting.

    So, either I need to figure out a way to get fwrite and fread to emulate the original DOS formatting, or figure out what the difference is, so I can write my own application to read in the data (I don't really understand the differences between DOS write, Linux write, and fwrite). Perhaps a bit of each would be best.

    Any ideas?
  2. #2
  3. Contributed User
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jun 2005
    Posts
    4,417
    It's not that write() works any differently on Linux, or that fwrite() would save you from the mess.

    The short answer is the code is broken as designed. Getting out of the problem is no easy task.

    Take this simple example
    Code:
    struct foo {
      int a;
    };
    If you write() this on DOS, you'd probably end up with a 2-byte file.
    Try the same thing on Linux, and it's likely to be 4 bytes.

    And it doesn't stop there.
    Code:
    struct foo {
      short b;
      int a;
    };
    Might be 4 bytes on DOS, but 8 bytes on Linux, with a pair of \0\0 in between the values (or something completely random).
    This is the padding and alignment problem. Each compiler is free to pad a struct to maximise the efficiency of access to each member of the structure.
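    To see this padding on a particular compiler, `offsetof` from `<stddef.h>` reports where each member actually lands. A minimal sketch; `show_foo_layout` is a hypothetical helper name. With GCC on x86 Linux it typically reports a size of 8 with `a` at offset 4 (two padding bytes after `b`), but the exact numbers are the compiler's choice, not something the C standard fixes.

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>

/* The second struct from the example above. */
struct foo {
    short b;
    int   a;
};

/* Print where the compiler placed each member and how many padding
   bytes it inserted between b and a. */
void show_foo_layout(void)
{
    size_t pad = offsetof(struct foo, a)
               - (offsetof(struct foo, b) + sizeof(short));

    printf("sizeof(struct foo)   = %zu\n", sizeof(struct foo));
    printf("offsetof(b)          = %zu\n", offsetof(struct foo, b));
    printf("offsetof(a)          = %zu\n", offsetof(struct foo, a));
    printf("padding between b, a = %zu\n", pad);
}
```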

    As a final coup de grâce, binary portability between machines with a different endianness is just that little bit harder to overcome.
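    One way to check the byte order of the machine a program runs on is to store a known 16-bit value and inspect its first byte in memory. A minimal sketch; `host_is_little_endian` is a hypothetical name. x86 machines (DOS or Linux) are little-endian, storing the least significant byte first.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if this machine stores the least significant byte first
   (little-endian, as x86 does), 0 otherwise. */
int host_is_little_endian(void)
{
    uint16_t probe = 0x0102;
    unsigned char bytes[sizeof probe];

    memcpy(bytes, &probe, sizeof probe);  /* view the value as raw bytes */
    return bytes[0] == 0x02;              /* low byte first => little-endian */
}
```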

    The long answer is that you'll need to replace (on Linux) every read() / write() of a structure with some code which reads/writes every byte in turn, in the correct sequence.
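    A minimal sketch of that approach, assuming the DOS record was a 16-bit `short` followed by a 16-bit `int`, stored least significant byte first as x86 does. The names `struct record`, `record_encode`, `record_decode`, `record_write` and `RECORD_DISK_SIZE` are hypothetical; each real struct would need its own encoder listing its fields in the DOS order. Because the bytes are assembled with shifts, the output does not depend on this compiler's `int` size, its padding, or the host's byte order.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical in-memory record; its layout is whatever the compiler picks. */
struct record {
    short b;
    int   a;  /* 16 bits in the DOS original, so keep values within int16_t */
};

#define RECORD_DISK_SIZE 4  /* 2 bytes for b + 2 bytes for a, no padding */

/* Store the low 16 bits of v as two bytes, least significant first. */
static void put_le16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)(v >> 8);
}

static uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Produce the exact byte sequence the 16-bit DOS program would have written. */
void record_encode(const struct record *r, unsigned char out[RECORD_DISK_SIZE])
{
    put_le16(out + 0, (uint16_t)r->b);
    put_le16(out + 2, (uint16_t)r->a);
}

/* Reverse of record_encode; the int16_t cast restores negative values
   (implementation-defined in ISO C, two's complement on GCC). */
void record_decode(struct record *r, const unsigned char in[RECORD_DISK_SIZE])
{
    r->b = (int16_t)get_le16(in + 0);
    r->a = (int16_t)get_le16(in + 2);
}

/* Returns 1 on success, 0 on a short write. */
int record_write(FILE *fp, const struct record *r)
{
    unsigned char buf[RECORD_DISK_SIZE];

    record_encode(r, buf);
    return fwrite(buf, 1, sizeof buf, fp) == sizeof buf;
}
```

    Reading works the same way in reverse: `fread` exactly `RECORD_DISK_SIZE` bytes into a buffer, then call `record_decode`.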

    > (that I'll ideally eventually replace with something newer) and turned into ascii files.
    Assuming you're not going back to DOS when you've finished this port, I'd suggest you do this now rather than later.
    It will save a lot of messing about with trying to read DOS binary files on Linux, which from the sound of things would have a limited shelf life anyway.
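    If the binary-to-ascii converter is going to be replaced anyway, one option is to have the C program write text directly, which sidesteps sizes, padding and byte order entirely. A minimal sketch, assuming a hypothetical `struct sample` written as one space-separated line per record; the real field list and format would follow whatever the Fortran converter currently produces.

```c
#include <stdio.h>

/* Hypothetical record; the real program's structs will differ. */
struct sample {
    short  channel;
    long   count;
    double value;
};

/* Write one record as a line of text. Returns 1 on success. */
int sample_write_text(FILE *fp, const struct sample *s)
{
    return fprintf(fp, "%d %ld %.6f\n", s->channel, s->count, s->value) > 0;
}

/* Read back one line written by sample_write_text. Returns 1 on success. */
int sample_read_text(FILE *fp, struct sample *s)
{
    int channel;

    if (fscanf(fp, "%d %ld %lf", &channel, &s->count, &s->value) != 3)
        return 0;
    s->channel = (short)channel;
    return 1;
}
```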
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper
  4. #3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2008
    Posts
    4
    I've been looking through the code, and you are absolutely right. The writer of the original code assumed ints were 2 bytes rather than 4 bytes in his calculations of struct sizes, etc.

    I think I'm going to begin by replacing ints with shorts in a new version of the struct definitions, and see what happens from there.

    As for endianness, I was worried about that too. However, I'm using the same hardware, only now it's running Linux, so hopefully that won't change. Am I right here?

    So, I'll post the results of this in a bit, although I suspect you hit the nail on the head.
  6. #4
  7. Contributing User

    Join Date
    Aug 2003
    Location
    UK
    Posts
    5,117
    I would suggest using the C99 <stdint.h> types such as int16_t rather than short, int etc.
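    For example, the two-member struct from earlier in the thread could be declared with exact-width types. A sketch, assuming both members were 16 bits under the DOS compiler: `int16_t` is exactly 16 bits wherever it exists, whereas `short` is only guaranteed to be at least 16 bits. This fixes member sizes, but not padding.

```c
#include <stdint.h>

/* DOS sizes spelled out explicitly. */
struct foo {
    int16_t b;  /* was short */
    int16_t a;  /* was int, which was 16 bits under the DOS compiler */
};
```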

    Also, this does not overcome the member packing/padding applied by different compilers (or by the same compiler with different options).

    I assume that you are using GCC, and in that case you can specify alignment and packing with data type attributes so that it matches the scheme of your DOS compiler. It ain't pretty, but it might get you going quicker than serialisation which would be the robust and portable method, and should be done at some point regardless.
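    A minimal sketch of the GCC route, using a hypothetical `struct dos_header` and assuming the DOS compiler laid struct members out with no padding (byte alignment). `__attribute__((packed))` is a GCC extension; the `_Static_assert` (C11, also accepted by GCC) makes the build fail if the size ever drifts from the expected DOS size. If the DOS compiler used word alignment instead, explicit filler members would be needed to reproduce its gaps.

```c
#include <stdint.h>

/* Hypothetical on-disk header; members sit back to back with no padding. */
struct dos_header {
    int16_t id;
    int32_t timestamp;  /* a DOS 'long' was 32 bits */
    int16_t flags;
} __attribute__((packed));

/* 2 + 4 + 2 bytes: fail at compile time if the layout is not 8 bytes. */
_Static_assert(sizeof(struct dos_header) == 8,
               "dos_header must match the 8-byte DOS layout");
```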

    Clifford
  8. #5
  9. Contributed User
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jun 2005
    Posts
    4,417
    Yeah, there's no endian issue if you're just sticking to some version of the x86 architecture.

    Careful use of short, and the packed attribute (for gcc) should allow you to declare structs which look the same on both machines.

    In any event, you should only use packed structures as transfer agents. Squeezing out padding necessarily compromises speed of access. So you don't really want to propagate usage of those structures through your entire program.
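    One way to keep the packed struct confined to the file boundary is to declare a packed on-disk struct alongside a normal in-memory struct and convert between them right after reading and right before writing. A sketch using GCC's `packed` attribute; the names `foo_disk`, `foo_from_disk` and `foo_to_disk` are hypothetical.

```c
#include <stdint.h>

/* On-disk layout: packed, used only when reading or writing the file. */
struct foo_disk {
    int16_t b;
    int16_t a;
} __attribute__((packed));

/* In-memory layout: natural sizes and alignment, used everywhere else. */
struct foo {
    short b;
    int   a;
};

/* Convert right after reading a record from the file. */
void foo_from_disk(struct foo *dst, const struct foo_disk *src)
{
    dst->b = src->b;
    dst->a = src->a;
}

/* Convert right before writing; the caller must keep a within 16-bit range. */
void foo_to_disk(struct foo_disk *dst, const struct foo *src)
{
    dst->b = (int16_t)src->b;
    dst->a = (int16_t)src->a;
}
```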

    Consider calling a 'check' function which is just a series of
    if ( sizeof(struct foo) != 4 ) { /* print an error message */ }
    type things, to validate the size of all your structs before the program proper starts.
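    A sketch of such a check function, using a macro so that each struct needs only one line. The structs and expected sizes are hypothetical placeholders; the real values would come from `sizeof` under the original DOS compiler.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical structs standing in for the program's on-disk records. */
struct foo { int16_t b; int16_t a; };
struct bar { int16_t id; int16_t filler; int32_t value; };

/* Compare each struct's size with its size under DOS.
   Returns the number of mismatches found. */
int check_struct_sizes(void)
{
    int errors = 0;

#define CHECK_SIZE(type, expected)                                  \
    do {                                                            \
        if (sizeof(type) != (expected)) {                           \
            fprintf(stderr, "sizeof(%s) is %zu, expected %d\n",     \
                    #type, sizeof(type), (expected));               \
            errors++;                                               \
        }                                                           \
    } while (0)

    CHECK_SIZE(struct foo, 4);
    CHECK_SIZE(struct bar, 8);
#undef CHECK_SIZE

    return errors;
}
```

    Calling it at the top of `main()` and exiting when it returns non-zero catches a layout change before any file is written. With a C11 compiler, `_Static_assert(sizeof(struct foo) == 4, "...")` performs the same check at compile time instead.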
  10. #6
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2008
    Posts
    4
    So, it seems this fix worked. Luckily, the original developer went to the trouble of putting filler variables into the structs, and with those kept in place, the structs should be packed deterministically.

    Anyway, the application is now writing in the proper format. Much thanks for your help.
