Other Programming Languages
 
Posted by Scorpions4ever on December 16th, 2005, 06:07 PM

How does an interpreter/compiler work?

Everything you were afraid to ask about how a compiler/interpreter works.

Let's start by discussing a very simple form of a compiler/interpreter:

Source File ---> Scanner ---> Lexer ---> Parser ---> Interpreter/Code Generator

So what do all the terms here mean? We'll work through them as we go:
Source File: This is the program that is read by the compiler or interpreter. This is the text that needs to be compiled or interpreted.

Scanner: This is the first module in a compiler or interpreter. Its job is to read the source file one character at a time. It can also keep track of which line number and character is currently being read. A typical scanner can be instructed to move backwards and forwards through the source file. Why do we need to move backwards? We will see why in just a little bit when we examine the lexer. For now, assume that each time the scanner is called, it returns the next character in the file.
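To make the scanner a bit more concrete, here is a minimal sketch in Python. The class name and the method names (next_char, back_up, position) are made up for this example, and a real scanner would normally read from a file through a buffer instead of holding the whole text in memory.

```python
class Scanner:
    """Hands out the source text one character at a time (illustrative sketch)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0  # index of the next character to hand out

    def next_char(self):
        """Return the next character, or '' once the end of the file is reached."""
        if self.pos >= len(self.text):
            return ""
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def back_up(self):
        """Move back one character so that it will be returned again."""
        if self.pos > 0:
            self.pos -= 1

    def position(self):
        """Line and column (both 1-based) of the next character to be read."""
        consumed = self.text[:self.pos]
        line = consumed.count("\n") + 1
        col = self.pos - (consumed.rfind("\n") + 1) + 1
        return line, col
```

The position is computed on demand from the read index, which keeps back_up trivial. A scanner that counts lines as it goes would also have to undo that count when it backs up over a newline.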

Lexer: This module serves to break up the source file into chunks (called tokens). It calls the scanner to get characters one at a time and organizes them into tokens and token types. For instance, if the source file read something like this:
Code:
cx = cy + 324;
print "value of cx is ", cx;

a lexer would perhaps break it like this:
Code:
cx  --> Identifier (variable)
=  --> Symbol (assignment operator)
cy  --> Identifier (variable)
+ --> Symbol (addition operator)
324 --> Numeric constant (integer)
; --> Symbol (end of statement)
print --> Identifier (keyword)
"value of cx is " --> String constant
, --> Symbol (string concatenation operator)
cx --> Identifier (variable)
; --> Symbol (end of statement)

Thus, the lexer calls the scanner to get one character at a time, groups the characters together, and identifies them as tokens for the language parser (which is the next stage). It also identifies the type of each token (variable vs. keyword, assignment operator vs. addition operator vs. string concatenation operator, etc.). Occasionally, though, the lexer has to tell the scanner to back up. Consider a language with operators that may be more than one character long (! vs. !=, < vs. <=, + vs. ++, etc.). Assume that the lexer has asked the scanner for a character and the scanner has returned '<'. The lexer needs to determine whether the operator is < or <=, so it asks the scanner for another character. If the next character is '=', it changes the token to '<=' and passes it to the parser. If not, it tells the scanner to back up one character and hold it in the buffer for the next token, while it passes the '<' to the parser.
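The back-up step is easier to see in code. The following is only a rough sketch in Python, not how any particular compiler does it: the token type names, the KEYWORDS and TWO_CHAR sets, and the tokenize function are all assumptions made for this example, and the tiny Scanner class is included only so the snippet runs on its own.

```python
KEYWORDS = {"print"}                         # assumed keywords of the toy language
TWO_CHAR = {"!=", "<=", ">=", "==", "++"}    # assumed two-character operators

class Scanner:
    def __init__(self, text):
        self.text, self.pos = text, 0
    def next_char(self):
        ch = self.text[self.pos:self.pos + 1]   # '' at end of file
        self.pos += len(ch)
        return ch
    def back_up(self):
        self.pos -= 1

def read_while(scanner, first, accept):
    """Extend `first` with following characters for as long as accept(ch) holds."""
    text = first
    ch = scanner.next_char()
    while ch and accept(ch):
        text += ch
        ch = scanner.next_char()
    if ch:                       # read one character too many: give it back
        scanner.back_up()
    return text

def tokenize(source):
    scanner, tokens = Scanner(source), []
    while True:
        ch = scanner.next_char()
        if ch == "":
            return tokens
        if ch.isspace():
            continue
        if ch.isalpha() or ch == "_":
            word = read_while(scanner, ch, lambda c: c.isalnum() or c == "_")
            tokens.append(("keyword" if word in KEYWORDS else "identifier", word))
        elif ch.isdigit():
            tokens.append(("number", read_while(scanner, ch, str.isdigit)))
        elif ch == '"':
            body, ch = "", scanner.next_char()
            while ch and ch != '"':
                body += ch
                ch = scanner.next_char()
            if not ch:
                raise SyntaxError("unterminated string constant")
            tokens.append(("string", body))
        else:
            nxt = scanner.next_char()        # look one character ahead
            if ch + nxt in TWO_CHAR:
                tokens.append(("symbol", ch + nxt))
            else:
                if nxt:
                    scanner.back_up()        # not part of this operator
                tokens.append(("symbol", ch))
```

Calling tokenize('cx = cy + 324;') returns a list of (type, text) pairs in source order. Note that identifiers and numbers need the same back-up trick as operators: the lexer only knows a name has ended once it has read the character after it.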

Parser: This is the part of the compiler that really understands the syntax of the language. It calls the lexer to get tokens and processes the tokens per the syntax of the language. For instance, taking the example from the lexer above, the hypothetical interaction between the lexer and parser could go like this:
Code:
Parser: Give me the next token
Lexer: Next token is "cx" which is a variable.
Parser: Ok, I have "cx" as a declared integer variable. Give me next token
Lexer: Next token is "=", the assignment operator.
Parser: Ok, the program wants me to assign something to "cx". Next token 
     Lexer: The next token is "cy" which is a variable.
     Parser: Ok, I know "cy" is an integer variable. Next token please
     Lexer: The next token is '+', which is an addition operator.
     Parser: Ok, so I need to add something to the value in "cy". Next token please.
         Lexer: The next token is "324", which is an integer.
         Parser: Ok, both "cy" and "324" are integers, so I can add them. Next token please:
         Lexer: The next token is ";" which is end of statement.
     Parser: Ok, I will evaluate "cy + 324" and get the answer
Parser: I'll take the answer from "cy + 324" and assign it to "cx"

In the above, the indenting shows a subprocess that the parser enters, to evaluate "cy + 324". This gives you a decent idea about how the parser operates. Also note that the parser is checking types and syntax rules (for instance, it checked whether cy and 324 were both integer types before adding them). If the parser gets a token that it was not expecting, it will stop processing and complain to the user about an error. The Scanner holds the current line number and character, so the Parser can inform the user approximately where the error occurred.
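The conversation above maps quite naturally onto a small recursive-descent parser. Here is a sketch in Python under some loud assumptions: tokens arrive as (type, text) pairs, the only statement is "variable = operand + operand ... ;", every variable has been declared as an integer beforehand, and the result is a tree of nested tuples. The class and method names are invented for this example.

```python
class Parser:
    def __init__(self, tokens, declared):
        self.tokens = tokens
        self.pos = 0
        self.declared = declared   # variable name -> type, e.g. {"cx": "int"}

    def peek(self):
        """Look at the next token without consuming it."""
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("eof", "")

    def next_token(self):
        """'Give me the next token.'"""
        tok = self.peek()
        if tok[0] != "eof":
            self.pos += 1
        return tok

    def expect(self, kind, text=None):
        """Consume a token and complain if it is not the one the syntax requires."""
        tok = self.next_token()
        if tok[0] != kind or (text is not None and tok[1] != text):
            raise SyntaxError(f"unexpected token {tok[1]!r}")
        return tok

    def parse_assignment(self):
        _, target = self.expect("identifier")
        if target not in self.declared:
            raise NameError(f"{target!r} is not declared")
        self.expect("symbol", "=")
        value = self.parse_expression()      # the indented sub-process
        self.expect("symbol", ";")
        return ("assign", target, value)

    def parse_expression(self):
        left = self.parse_operand()
        while self.peek() == ("symbol", "+"):
            self.next_token()
            left = ("+", left, self.parse_operand())
        return left

    def parse_operand(self):
        kind, text = self.next_token()
        if kind == "number":
            return ("int", int(text))
        if kind == "identifier":
            if self.declared.get(text) != "int":      # the type check
                raise TypeError(f"{text!r} is not a declared integer variable")
            return ("var", text)
        raise SyntaxError(f"unexpected token {text!r}")
```

Each parse_ method handles one rule of the syntax, and the call from parse_assignment into parse_expression is the "subprocess" shown by the indenting. An unexpected token raises an error right away, which is the point where a real parser would ask the scanner for the line number.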

Interpreter/Code Generator: This is the part that actually takes the action specified by a program statement. In some cases, this is part of the parser itself (especially in interpreters): the parser interprets each statement and takes action directly. In other cases, the parser converts the statements into byte-code (an intermediate language). In the case of a compiler, the byte-code is then handed to the Code Generator to be converted into machine code instructions. If you want a compiler for a different CPU or architecture, in principle all you have to do is plug in a new code generator unit that translates the byte-code into machine code for the new CPU.
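The difference between taking action directly and emitting byte-code can be sketched as follows, again in Python. Everything here is an assumption made for illustration: the statement is represented as nested tuples, and the byte-code instruction names (PUSH, LOAD, ADD, STORE) belong to an invented stack machine, not to any real one. Translating that byte-code into actual machine code is left out.

```python
def interpret(node, variables):
    """Interpreter style: walk the tree and carry out each action immediately."""
    kind = node[0]
    if kind == "int":
        return node[1]
    if kind == "var":
        return variables[node[1]]
    if kind == "+":
        return interpret(node[1], variables) + interpret(node[2], variables)
    if kind == "assign":
        variables[node[1]] = interpret(node[2], variables)
        return variables[node[1]]
    raise ValueError(f"unknown node type {kind!r}")

def to_bytecode(node):
    """Compiler style: emit instructions for a stack machine instead of acting."""
    kind = node[0]
    if kind == "int":
        return [("PUSH", node[1])]
    if kind == "var":
        return [("LOAD", node[1])]
    if kind == "+":
        return to_bytecode(node[1]) + to_bytecode(node[2]) + [("ADD", None)]
    if kind == "assign":
        return to_bytecode(node[2]) + [("STORE", node[1])]
    raise ValueError(f"unknown node type {kind!r}")

def run_bytecode(code, variables):
    """Stand-in for the target machine: executes the invented instructions."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(variables[arg])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "STORE":
            variables[arg] = stack.pop()
        else:
            raise ValueError(f"unknown instruction {op!r}")

# "cx = cy + 324;" as a tree
statement = ("assign", "cx", ("+", ("var", "cy"), ("int", 324)))
```

Both routes end with the same value in cx. The byte-code list is the piece a code generator would consume, so supporting another CPU would mean writing another translator for that list while the scanner, lexer, and parser stay as they are.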

This is about the simplest form for an interpreter or compiler. In the next few threads, we will look in some more detail at the interaction between the Parser and Code Generator.

Feedback about this post will be greatly appreciated.
Comments on this post
SimonGreenhill agrees: informative
medialint agrees!
netytan agrees: Awesum Scorpi
jafet agrees: Good article
JhonnyO@gmail.c agrees!
crownjewel82 agrees: My compiler construction professor would be proud.
LinuxPenguin agrees: Amazing

Last edited by Scorpions4ever : December 16th, 2005 at 06:52 PM.

© 2003-2008 by Developer Shed. All rights reserved.