#1
    Making of C Compiler


    I want to know how exactly C code gets executed. I want to know what happens when we give the command to compile a C program.
#2
    It's not really an answer to your question, but you might like to read A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux.
#3
    C code is converted to assembler, which is then assembled into binary instructions (often position independent) in an object file. The linker then resolves any calls to other object (binary) files and libraries and produces the executable. Nothing runs until you launch the program: the loader (together with the dynamic linker, if shared libraries are involved) puts the executable image into RAM at a specific location and then launches the startup routine.

    This isn't really a C/C++ question. The compiler converts the human readable text into a binary representation specific to the OS/hardware the program is expected to run on.

    I have a couple of examples of code written directly in binary instructions if you are curious:

    http://sol-biotech.com/code/SelfModifyingCPUID/
    http://sol-biotech.com/code/CPE//

#4
    Originally Posted by sagarkamble
    I want to know how exactly C code gets executed. I want to know what happens when we give the command to compile a C program.
    1. Preprocess (insert #included files, replace macros by their definition, etc.)
    2. Compile to assembly code
    3. Assemble to machine code
    4. Link

    It's helpful to view the result of each stage. I'm going to use gcc for my examples because that's what I'm familiar with. It's most instructive to do this with a small "hello world"-type program. Suppose the source file you want to compile is called program.c.

    1. Preprocess:
    Code:
    gcc -E program.c > program.i
    Now, you can peruse program.i and see all the header files spliced right in and all your macros expanded. If you've never read stdio.h, you might be surprised by how big the preprocessed file is.

    2. Compile:
    Code:
    gcc -S program.i
    This creates an assembly source file called program.s, which you can view in a text editor. This part really benefits from using a very simple "hello world" program; you might be surprised how short the assembly source is, especially compared to how long the preprocessed file was! With a little effort, you can probably even understand the assembly code.

    3. Assemble:
    Code:
    gcc -c program.s
    You now have an object file, program.o. This is one small step away from being an executable program. The main difference has to do with any external functions or variables (for example, printf) you use in your source code: at this stage of the process, GCC has not tried to track those down for you, so those remain unresolved references in the object file. The object file is a binary file and thus not human readable, but there are utilities for extracting information from object files. A pretty useful one is nm, which can show all the external references in an object file:
    Code:
    nm -g program.o
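    You should see all the non-static functions in your code listed, as well as any standard library functions your code calls, like printf or scanf. The library functions are marked U (undefined) because nothing in program.o defines them.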

    4. Link:
    Code:
    gcc program.o -o program
    Not much to do now but run the program! (Of course, the compiled program is not human-readable. You can still get information from it in various ways, but I don't think it's relevant to your question any more.)
    Last edited by Lux Perpetua; August 21st, 2012 at 03:42 AM. Reason: Fixing a typo
#5
    You're asking two different questions there, so we're a bit confused as to what you want.

    I'll assume that you're asking about the C build process:

    When the build involves multiple source files, each source file is compiled separately, such that the compiler starts each compilation with absolutely no knowledge of what it had found in any other source file. That is why you place type definitions, macros, extern variables, and function prototypes in a header file that's associated with a source file so that the other source files can #include it to let them know what's in that other source file.

    With each source file, the compiler first runs the preprocessor, which executes directives that start with #, such as #define, #include, and #ifdef. The preprocessor inserts the files named by the #include directives, expands macros (which are defined by #define), interprets conditional compilation directives by including or excluding the indicated code, etc. Basically, the preprocessor creates the final compilable form of the source file. With many compilers, you can tell the compiler to write that final compilable form to an output file; how to do that differs from one compiler to another.

    Then the compiler does its thing: parsing the source code, building symbol tables, translating the source code to assembly (or to an intermediate form that will then be converted to assembly), and converting the assembly to object code, which is mostly but not quite machine code. That object code goes into an object file (e.g., .obj in Microsoft, .o in Linux), which also marks where the object code accesses external resources (these are called unresolved symbols) and contains tables for the linker to use in resolving those unresolved symbols. The actual tables and file format depend on the compiler, etc.

    When all the source files have been compiled, the linker is invoked to generate the executable. The linker takes all the object files and all the referenced libraries (archives of object files designed for reuse, .LIB in Microsoft and .a in Linux; the Standard C Library is an example, though you could create your own libraries) and links them all together in the executable, generating location tables in the process. Then it uses those tables to go through each object file and replace the "unresolved symbol" markers with the actual address of each symbol, AKA "resolving the addresses".

    Each step depends on what can be known at that time (these times being known as "compile-time", "link-time", and "run-time"); it is absolutely and vitally necessary to know which "time" you are in. At compile-time, compiling a source file depends on header files to tell it what should exist in other source files or in libraries being linked in, so the object code contains markers, AKA "place holders", for address information to be inserted later. At link-time, linking handles all that, but it still does not know exactly where in memory the program will be loaded, so the linker has no idea of the exact memory location of each variable and function, information which is absolutely necessary for the code to actually execute.

    For that reason, all addresses in the executable are resolved relative to a common starting address and the location of all addresses is marked either in the object code or in a relocation table in the executable. Then when you execute the program, you do so with a loader which obtains a block of memory for the program and then performs relocation of all the addresses. That creates in the memory a memory image which is executable as-is; note that in embedded programming, the end-result is a memory image that can be loaded into some kind of PROM (programmable read-only memory).

    If you can get your hands on it, read The MS-DOS Encyclopedia (Microsoft Press, 1988 -- decades out-of-print by now, obviously). It not only explains the process excellently (it's where I learned what the loader does), but also provides and explains the file formats. Of course, that's the only reason for you to read it and those formats are obsolete. If you can find a similar description of the OS and compiler that you're using, then get that description and read it.
#6
    Sorry for a somewhat late reply, but if you are interested in executable formats and how they are used, you would do well to check out Linkers and Loaders by John Levine. While it is now somewhat dated, most of the information is still relevant, and an early version of the book is available for reading online on that page.
#7
    Originally Posted by sagarkamble
    I want to know how exactly C code gets executed.
    C code is not executed; it is compiled into a machine code executable. It is usually the operating system that is responsible for loading and starting execution of the code. Embedded systems or bootstrap code (where there is no OS) may be started by other mechanisms, but that is probably not what you are asking about here?

    Originally Posted by sagarkamble
    I want to know what happens when we give the command to compile a C program.
    Compilation of C comprises a number of stages, primarily:

    • Pre-processing - any line beginning with # is a preprocessor directive. The preprocessor outputs C code with all the #include'd code inserted, all the #define macro instances replaced, and any #if... conditional code included or removed as directed.
    • Compilation - the compiler proper generates "object" code. Some compilers generate assembler and then have an assembler pass to generate machine code; others generate machine code directly. The object code output by the compiler does not include code relating to external references to library code or code in separately compiled object code - the object file contains unresolved links to such code.
    • Linking - the linker is responsible for assembling separate modules and library code into a single executable, and resolving all unresolved references with references to the linked code.


    The body of your post does not contain any clear and specific questions, and your title seems to be asking something else altogether; that suggests that you want to build a compiler?

    Compilers can be complex things. First of all, they are required to generate assembler or machine code, so to create a compiler you must be familiar with the target instruction set and architecture. Moreover, to create a linker, you need to know how the OS loads an executable and the format of the executable file to support loading. Luckily C is a rather small and simple language for the most part, but still not insignificant. Modern compilers perform many complex optimisations requiring deep analysis of code flow and instruction execution.

    One way to start studying a simple compiler implementation is perhaps to look at the source and documentation for Tiny C Compiler.
