#1
March 13th, 2010, 09:25 PM
 jmpeer
Registered User

Join Date: Mar 2010
Posts: 7
Time spent in forums: 4 h 54 m 9 sec
Reputation Power: 0
[Assembly] Developing an Assembler in Theory

I'm just curious how an assembler converts its source code into machine code.

I'd assume it compares the character values of the source code to a table from which it'd generate its binary instruction counterparts... and write them to a file, of course.

Can anyone explain this process and show me any references if they exist?

#2
March 13th, 2010, 11:43 PM
 Lux Perpetua
Contributing User

Join Date: Feb 2004
Location: San Francisco Bay
Posts: 1,939
Time spent in forums: 1 Month 1 Week 3 h 27 m 29 sec
Reputation Power: 1312
Writing an assembler for a RISC architecture is relatively straightforward, since converting assembly language to machine code is a pretty direct translation. (Assembling for a CISC architecture is in theory fundamentally similar, just more complicated due to, well, the complexity of the instruction set.) However, it's going to seem really tedious if you have to think about it on the level of "compares the character values of the source code to a table." To make it seem less tedious, you should break it down into multiple steps: first the source file is parsed into its logical structure, and then the structure is translated into ones and zeroes.

That probably isn't clear, so let me give some pseudo-C++.
Code:
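```
#include <cstdint>
#include <vector>

enum instr_type {
    ADD, SUB, MOV, LDR, STR, // ...
};

// One operand, simplified here to a register number or an immediate value.
struct operand {
    bool is_register;
    std::int32_t value;
};

struct instruction {
    instr_type type;
    std::vector<operand> ops;  // array of operands
    // etc. (whatever is logically needed to specify a machine instruction)
};

// Returns pointer to byte after last written byte
void *assemble_instruction(const instruction & i, void *dest) {
    std::uint8_t *out = static_cast<std::uint8_t *>(dest);
    switch (i.type) {
    // ... (one case per instruction type, each writing bytes and advancing out)
    default:
        break;
    }
    return out;
}
```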
That struct instruction is the key element of the logical structure of an assembly program. (It's not the only element: there's also the global data and variables, and the division of the program into sections. For simplicity, I'll ignore those, but you will have to deal with them if you want to write an assembler.) The main process would then roughly be to parse the source code into a sequence of struct instruction values and then to call assemble_instruction repeatedly to translate that sequence into machine code.

I haven't yet told you how to parse the source file into a sequence of those structs. That itself is usually divided into two steps: first perform lexical analysis to produce a sequence of tokens, and then interpret the tokens into logical units. Roughly speaking, tokens (the output of lexical analysis) are to computer language as words are to natural language. Continuing the analogy, the logical units (the output of parsing) are to computer language what phrases, clauses, sentences, and paragraphs are to natural language. On the other hand, the machine code (the output of assembling) has no counterpart in natural language; the closest thing would be a brain's internal representation of information.
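The two steps described above (lexical analysis, then parsing) can be sketched roughly as follows. This is only a minimal illustration, assuming a made-up line syntax of the form `label: MNEMONIC op, op ; comment`. The names token_kind, token, tokenize, parsed_line, and parse_line are hypothetical and do not come from any real assembler.
Code:
```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for a toy assembly syntax.
enum class token_kind { identifier, number, comma, colon };

struct token {
    token_kind kind;
    std::string text;
};

// Lexical analysis: split one source line into tokens.
// Assumption: ';' starts a comment; unrecognized characters are skipped.
std::vector<token> tokenize(const std::string &line) {
    std::vector<token> tokens;
    std::size_t pos = 0;
    while (pos < line.size()) {
        unsigned char c = static_cast<unsigned char>(line[pos]);
        if (c == ';') break;
        if (std::isspace(c)) { ++pos; continue; }
        if (c == ',') { tokens.push_back({token_kind::comma, ","}); ++pos; continue; }
        if (c == ':') { tokens.push_back({token_kind::colon, ":"}); ++pos; continue; }
        if (std::isalnum(c) || c == '_') {
            std::size_t start = pos;
            while (pos < line.size() &&
                   (std::isalnum(static_cast<unsigned char>(line[pos])) || line[pos] == '_'))
                ++pos;
            std::string text = line.substr(start, pos - start);
            bool is_number = std::isdigit(static_cast<unsigned char>(text[0])) != 0;
            tokens.push_back({is_number ? token_kind::number : token_kind::identifier, text});
            continue;
        }
        ++pos;  // skip anything unrecognized
    }
    return tokens;
}

// Parsing: group the tokens into one logical unit.
struct parsed_line {
    std::string label;                  // empty if the line defines no label
    std::string mnemonic;               // empty if the line has no instruction
    std::vector<std::string> operands;  // operand texts, commas removed
};

parsed_line parse_line(const std::vector<token> &tokens) {
    parsed_line result;
    std::size_t i = 0;
    // An identifier followed by ':' at the start of the line is a label.
    if (tokens.size() >= 2 && tokens[0].kind == token_kind::identifier &&
        tokens[1].kind == token_kind::colon) {
        result.label = tokens[0].text;
        i = 2;
    }
    if (i < tokens.size() && tokens[i].kind == token_kind::identifier)
        result.mnemonic = tokens[i++].text;
    for (; i < tokens.size(); ++i)
        if (tokens[i].kind != token_kind::comma)
            result.operands.push_back(tokens[i].text);
    return result;
}
```
For example, tokenize("loop: ADD r1, r2, 42 ; comment") yields eight tokens, and parse_line turns them into the label loop, the mnemonic ADD, and the operands r1, r2, and 42. A real assembler would go one step further and convert the mnemonic and operand texts into something like the struct instruction shown earlier.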

There are a few other things that make the whole process more complicated than the above might suggest. One complication in the machine-code-generation step is labels: to assemble an instruction that refers to a label (jumps being the canonical example), you have to know what address that label points to. A common technique to handle this is to make two passes: first go through and figure out where all the labels point and put that information in a symbol table, and then go through it again to do the assembling normally, now that you know what the labels refer to.
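A rough sketch of the two-pass idea follows. Everything specific in it is an assumption made for illustration: every instruction is taken to be 4 bytes long, there are only two mnemonics (NOP and JMP), and the encoding (opcode in the top byte, absolute target address in the low 24 bits) is invented rather than taken from a real architecture. The names source_line, build_symbol_table, and assemble are hypothetical.
Code:
```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// A toy source line, already parsed.
struct source_line {
    std::string label;     // label defined on this line, or empty
    std::string mnemonic;  // "NOP" or "JMP" in this sketch, or empty
    std::string target;    // label a JMP refers to, or empty
};

// Assumption: fixed-size instructions, so addresses are easy to count.
constexpr std::uint32_t instruction_size = 4;

// Pass 1: walk the program and record the address of every label.
std::map<std::string, std::uint32_t>
build_symbol_table(const std::vector<source_line> &program) {
    std::map<std::string, std::uint32_t> symbols;
    std::uint32_t address = 0;
    for (const source_line &line : program) {
        if (!line.label.empty()) symbols[line.label] = address;
        if (!line.mnemonic.empty()) address += instruction_size;
    }
    return symbols;
}

// Pass 2: assemble normally, looking up labels in the symbol table.
// Invented encoding: opcode in the top byte (0 = NOP, 1 = JMP),
// absolute target address in the low 24 bits.
std::vector<std::uint32_t> assemble(const std::vector<source_line> &program) {
    const std::map<std::string, std::uint32_t> symbols = build_symbol_table(program);
    std::vector<std::uint32_t> code;
    for (const source_line &line : program) {
        if (line.mnemonic.empty()) continue;  // label-only line
        if (line.mnemonic == "JMP") {
            auto it = symbols.find(line.target);
            if (it == symbols.end())
                throw std::runtime_error("undefined label: " + line.target);
            code.push_back((1u << 24) | (it->second & 0xFFFFFFu));
        } else if (line.mnemonic == "NOP") {
            code.push_back(0u);
        } else {
            throw std::runtime_error("unknown mnemonic: " + line.mnemonic);
        }
    }
    return code;
}
```
With the four-line program "start: NOP", "JMP end", "NOP", "end: JMP start", pass 1 records start at address 0 and end at address 12, so the forward reference in "JMP end" can be resolved in pass 2 and the output words are 0x00000000, 0x0100000C, 0x00000000, 0x01000000.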

Of course, after you've completed all that, you still need to produce a valid object file, whose format depends on the operating system. I don't know much about this part, but it's probably safe to say that if you've done everything up to this point, you can probably finish it now, relying heavily on documentation.

#3
March 14th, 2010, 01:35 AM
 jmpeer
Registered User

Join Date: Mar 2010
Posts: 7
Time spent in forums: 4 h 54 m 9 sec
Reputation Power: 0
Parsing and tokenizing this won't be too much of a hassle for me. I'm more concerned with how to produce the form of a binary executable, since this language is not too hard to translate.

Things like the addressing are what I have to worry about, but what you recommended is simple enough: collect the labels and look them up whenever needed.

But my concern lies in the addressing itself. Is it relative to the beginning of the program? Or perhaps assemblers pass the difference between the addresses of 'jmp' and the label, so the execute-instruction register pointer thingy can be changed relative to the program's perspective. Maybe my reading will clue me in on this tomorrow.

But yeah, thanks for the reply Lux. It was pretty quick and informative. I've sort of confirmed what I wanted to know before, and now I'm going to wander into the more specific procedures.

#4
March 14th, 2010, 11:51 AM
 fishtoprecords
Contributing User

Join Date: Sep 2007
Location: outside Washington DC
Posts: 2,642
Time spent in forums: 3 Weeks 4 Days 23 h 21 m 56 sec
Reputation Power: 3699
Standard practice is for a compiler to generate its output together with a symbol table, which then serves as input to a link program, sometimes called a link loader.

The compiler generates binary that corresponds to something like

and the linker fills in the proper address for where the "age" variable is located in memory.

This happens for most addresses, things like function entry points, arguments, constants, etc.
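The idea of leaving an address blank for the linker to fill in can be sketched as follows. This is a simplified illustration only, not the layout of any real object file format: the relocation and object_code structs and the resolve_relocations function are hypothetical names, and a real linker also has to merge sections and handle several relocation types.
Code:
```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical relocation record: "the word at index `offset`
// must hold the final address of `symbol`".
struct relocation {
    std::size_t offset;
    std::string symbol;
};

// Simplified compiler/assembler output: code with placeholder words,
// plus the list of places that still need an address.
struct object_code {
    std::vector<std::uint32_t> words;
    std::vector<relocation> relocations;
};

// The linker's job in miniature: once final addresses are known,
// patch each placeholder with the address of the symbol it refers to.
void resolve_relocations(object_code &object,
                         const std::map<std::string, std::uint32_t> &addresses) {
    for (const relocation &r : object.relocations) {
        auto it = addresses.find(r.symbol);
        if (it == addresses.end())
            throw std::runtime_error("unresolved symbol: " + r.symbol);
        object.words.at(r.offset) = it->second;  // at() throws on a bad offset
    }
}
```
For instance, if the word at index 1 is a placeholder for the "age" variable and the linker decides that "age" lives at the (made-up) address 0x2000, resolve_relocations overwrites that word with 0x2000 and leaves the rest of the code untouched.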

Any good Computer Science book on compilers will have tons of detail on how this is done.
