How do people create new programming languages?
I'm a novice programmer, having only briefly played with some Java, some C++ and OpenGL.
Something that's always confused me is how people create new programming languages.
1) What language do they write it in?
2) Does the language have to be one that is in between a high level language and machine code?
3) What are the stages/elements of creating a programming language? (From my basic research I've realised that all programming languages need either a compiler or an interpreter, but I don't really understand the difference between the two.)
As I said, I'm a complete novice but I'm really intrigued by programming and really wanna learn more. I'm like a really crap tennis player who wants to know everything about tennis!
I'm going to attempt to summarize a few years' worth of college courses into the space of this post. Any factual errors are unintentional and (hopefully) inconsequential. But it will give you a good idea of where computers started and how they've progressed.
If you want to get into the history of computers I'd look at things such as the Jacquard loom and Babbage's difference engine. These two inventions probably had the greatest early influence on early modern computing. Usage of punch cards (Jacquard) and the mechanical 'calculation' (Babbage) provided a great foundation.
So let's start with how the computer works. At the heart of every computer is the transistor (and before that, the vacuum tube). The big thing behind a transistor is that it can send electricity in two different ways, depending on its state*. This allows for the creation of logical flows of electricity. With this we can create all sorts of wonderful things**: NAND gates, AND gates, half adders, multiplexers etc.
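To make the "building blocks" idea a bit more concrete, here is a minimal sketch in Python (my choice of language for illustration, not something the hardware cares about). It assumes a NAND gate is the only primitive we have and builds the other gates and a half adder out of it. All the function names are made up for this example.

```python
def nand(a: int, b: int) -> int:
    """Return 0 only when both inputs are 1 (the NAND truth table)."""
    return 0 if (a and b) else 1


# Every other gate below is wired up purely from NAND gates.
def not_(a: int) -> int:
    return nand(a, a)


def and_(a: int, b: int) -> int:
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))


def xor(a: int, b: int) -> int:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def half_adder(a: int, b: int) -> tuple:
    """Add two single bits; return (sum_bit, carry_bit)."""
    return xor(a, b), and_(a, b)


# Print the full truth table of the half adder.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, half_adder(a, b))
```

Real hardware does this with wires and voltages rather than function calls, but the layering is the same: simple gates first, then slightly smarter circuits built from them.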
Now that we have these electronic building blocks in place, there are essentially two types of signals that come into them: signals that tell the electronics what to do, and actual data that the electronics compute with. So the command to add might say: take the data from register 1, add it to the data in register 2, and store the result in register 3. What this command does is set the computer up in a state so that registers 1 and 2 behave as inputs to the adder, and register 3 stores the result. The same would be true for subtraction, multiplication etc. There are also commands to jump to a certain line, read information from memory etc.
These binary commands are the machine code necessary to 'set' the CPU.
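As a rough illustration of a binary command "setting" the CPU, here is a small Python sketch. The 16-bit instruction layout and the opcode numbers are invented for this example and do not match any real processor.

```python
# Hypothetical 16-bit instruction format (invented for this example):
#   bits 15-12: opcode, bits 11-8: source register A,
#   bits 7-4: source register B, bits 3-0: destination register
OP_ADD, OP_SUB = 0b0001, 0b0010


def execute(instruction: int, registers: list) -> None:
    """Decode one instruction word and apply it to the register file."""
    opcode = (instruction >> 12) & 0xF
    src_a = (instruction >> 8) & 0xF
    src_b = (instruction >> 4) & 0xF
    dest = instruction & 0xF
    if opcode == OP_ADD:
        registers[dest] = registers[src_a] + registers[src_b]
    elif opcode == OP_SUB:
        registers[dest] = registers[src_a] - registers[src_b]
    else:
        raise ValueError(f"unknown opcode {opcode:04b}")


registers = [0, 5, 7, 0]                    # r0..r3
execute(0b0001_0001_0010_0011, registers)   # "add r1 and r2, store in r3"
print(registers[3])                         # prints 12
```

The point is that the instruction is just a number: some of its bits pick the operation and the rest pick which registers are wired in as inputs and output.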
Up to now I haven't really addressed your questions, but I've laid some of the framework to understand what's going on. (You said you wanted to learn everything =P)
So now we have a computer that runs on machine code, nothing more, nothing less. That makes it a very hard machine to use, so assembly is almost always one of the first languages that's created. To use assembly we need to create an assembler. An assembler is essentially a compiler that turns assembly language into machine code. Since the first assembler has to be coded in binary, it's a good idea to keep it simple, and as a result assembly languages are 1 to 1 with the machine code commands. So now we have something that can turn assembly language into machine code.
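Here is a minimal sketch of what "1 to 1" means in practice, written in Python for readability (a real first assembler would not have that luxury). The mnemonics, the `ADD r1, r2, r3` syntax and the 16-bit layout (4 bits of opcode followed by three 4-bit register numbers) are all assumptions made up for this example.

```python
# Hypothetical opcode table: each mnemonic maps to exactly one opcode.
OPCODES = {"ADD": 0b0001, "SUB": 0b0010}


def assemble_line(line: str) -> int:
    """Translate e.g. 'ADD r1, r2, r3' into one 16-bit instruction word."""
    mnemonic, operands = line.split(maxsplit=1)
    src_a, src_b, dest = (
        int(reg.strip().lstrip("rR")) for reg in operands.split(",")
    )
    return (OPCODES[mnemonic.upper()] << 12) | (src_a << 8) | (src_b << 4) | dest


def assemble(source: str) -> list:
    """Assemble a whole program: one instruction word per non-empty line."""
    return [assemble_line(line) for line in source.splitlines() if line.strip()]


program = assemble("ADD r1, r2, r3\nSUB r3, r1, r0")
print([f"{word:016b}" for word in program])
# prints ['0001000100100011', '0010001100010000']
```

Notice there is no cleverness here at all: each line of text becomes exactly one machine word, which is why an assembler is the easiest translator to write first.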
So now we have two levels of 'languages'
0 - Machine code: This is code that the CPU understands. It's in binary, and not very user friendly.
1 - Assembly language: This uses some 'English-like' terms, but is still relatively clunky, and 1 to 1 command-wise with the machine code.
So let's add a third,
2 - High level language.
A high level language is something that more closely resembles English, such as C. We have loops and data structures and other useful things. To use C we have to write a compiler. A compiler takes code written in the C language and creates object code (similar to assembly language). Then another program turns that object code into machine language. Nowadays these two steps are usually combined into one for efficiency's sake. Now we have your first definition: a compiler turns a high level language into object (or machine) code***.
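To give a feel for what a compiler does, here is a deliberately tiny sketch in Python. It only handles arithmetic expressions, it borrows Python's own `ast` module to do the parsing (a real C compiler has its own parser), and the `PUSH`/`LOAD`/`ADD` instruction names are invented for an imaginary stack machine.

```python
import ast

# Map Python's operator node types to our made-up instruction names.
OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def compile_expr(source: str) -> list:
    """Translate an arithmetic expression into stack-machine instructions."""

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return [f"PUSH {node.value}"]
        if isinstance(node, ast.Name):
            return [f"LOAD {node.id}"]
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            # Operands first, then the operation that combines them.
            return emit(node.left) + emit(node.right) + [OPS[type(node.op)]]
        raise SyntaxError("unsupported construct")

    return emit(ast.parse(source, mode="eval").body)


print(compile_expr("a + 2 * b"))
# prints ['LOAD a', 'PUSH 2', 'LOAD b', 'MUL', 'ADD']
```

The key thing to notice is that nothing gets calculated here. The compiler only translates; running the result is somebody else's job.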
Now that we have our first high level language (C) it probably seems foolish and painful to work with the assembly language again unless we have to. So now we can write new languages and compilers in C, or any other language we've now made****.
So now let's tear into an interpreter. An interpreter is a program that reads and executes a program on its own. Instead of turning high level code into machine code, it reads the high level code (usually a line at a time) and executes it.
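A bare-bones sketch of that read-and-execute loop, again in Python. The three-command toy language (`set`, `add`, `print`) is made up for this example.

```python
def interpret(source: str) -> list:
    """Run a toy language one line at a time; return everything it printed."""
    variables, output = {}, []
    for line in source.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        command, args = parts[0], parts[1:]
        if command == "set":        # set x 5
            variables[args[0]] = int(args[1])
        elif command == "add":      # add x 3
            variables[args[0]] += int(args[1])
        elif command == "print":    # print x
            output.append(variables[args[0]])
        else:
            raise SyntaxError(f"unknown command: {command}")
    return output


print(interpret("set x 5\nadd x 3\nprint x"))  # prints [8]
```

No machine code is ever produced for the toy program. The interpreter itself is the thing the CPU is running, and it acts out each line as it reads it.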
Let's take Java for example. Java is often described as an interpreted language: the Java source is first compiled to bytecode, and then a program someone has written (in, say, C or C++), the Java Virtual Machine, reads that bytecode and executes it. So there's another layer between the computer and the code.
There's a lot of stuff this response kind of glosses over: how to build a CPU, the considerations that go into making a language, how the same language can run on different CPUs, the advantages of interpreters over compilers etc. But hopefully it gives a good amount of background and information that you can read up on and research more on your own.
*This is a vast oversimplification of the transistor; if you're interested, look more into it.
**These are some of the basic components of electronics/computers.
***This kind of trivializes the difference between object and machine code. I'm not going into it here.
****This does not go into the process of 'making' a language, the considerations etc.
MBirchmeier - that's a great post. In a very positive way it's a lot more than I expected the reply to be, considering the time it must've taken to write. (I'm a slow poster, constantly re-writing and editing my sentences; it's the same when I program too - write, rewrite, rewrite....).
It's given me a fuzzy but, at the very least, directed view of the overall process. I'm gonna try and fill in some blanks and go from there. Hopefully I'll have more specific and relevant questions to ask than the last ones!
Not a problem...
Originally Posted by tReZ
These are the types of questions that help remove the magic from the computer. And once the magic has turned to understanding real cool stuff can start happening.
I covered this a while ago in the Other Programming Languages forum.
I studied this in a course called Compiler Design; it contains step by step information on developing a programming language. Nice information given by MBirchmeier.
Humans create a computer chip, which they know (or at least hope) will do certain things when given certain inputs (basically of 0s and 1s).
If you can make the chip capable of reading a file containing sequences of 0s and 1s, then you can write a program for it. Manually insert some 0s and 1s into the file, tell the chip to execute, and your program will execute.
That's the basic premise.
The problem, as noted, is that humans aren't very good at writing complex programs in 0s and 1s. We want to write stuff in English. What we need is something to translate, or interpret, our English instructions into binary instructions that the chip can follow.
So any higher level language basically consists of a bunch of symbols and letters and whatnot that, when structured a certain way, can be interpreted and translated into 0s and 1s by a program we've written (initially and arduously in 0s and 1s).
So the first compiler is written in 0s and 1s, and takes an input file with letters and numbers etc, and converts them into machine-readable 0s and 1s (a binary program).
A second level compiler would (simplistically) take English terms and more general language and more complex expressions, and turn them into language that the first level compiler can understand, which would then turn that into a binary program.
And so you can make higher and higher level compilers/languages.
An interpreter does what a compiler does, but 'on-the-go', rather than all in one step. In a compiler, you feed it the entire source code, and it then spits out a binary program which can be run. In an interpreter, you feed it the source code, which it reads [and converts into binary language] line-by-line, executing as it reads/goes.
Because compilers are able to get 'the big picture' of the entire program before they convert everything into the simpler language (binary program), whereas an interpreter only looks at the current situation, compilers are usually able to make 'optimisations' in the code they produce, and a compiled program will usually be more efficient than the same program interpreted.
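One simple example of such an 'optimisation' is constant folding: if the compiler can see that part of an expression only involves constants, it works out the answer once at compile time instead of leaving the work for every run. Below is a minimal Python sketch. The tuple representation `(operator, left, right)` for expressions is just an assumption for this example, with strings standing in for variables.

```python
def fold(expr):
    """Replace operations on two constants with their precomputed result."""
    if not isinstance(expr, tuple):
        return expr  # a plain number or a variable name: nothing to do
    op, left, right = expr[0], fold(expr[1]), fold(expr[2])
    if isinstance(left, int) and isinstance(right, int):
        return {"+": left + right, "*": left * right}[op]
    return (op, left, right)


# x + 60 * 60  becomes  x + 3600
print(fold(("+", "x", ("*", 60, 60))))  # prints ('+', 'x', 3600)
```

Real compilers do many more and much smarter transformations than this, but they all rely on the same thing: being able to look at the program as a whole before any of it runs.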
The advantage to interpreters is that you are able to 'debug' your high-level code extremely easily with them, owing to their line-by-line nature.
That's the overall gist and likely contains many errors, but if you somehow didn't manage to get MBirchmeier's post perhaps this will sort any general misunderstandings out.
Last edited by _ivo_; May 28th, 2008 at 10:58 PM.
I also was intrigued so I googled it. What he was saying up there is basically the same thing.
@tReZ: Are you trying to create a language?
September 14th, 2008, 08:35 PM
Just a question out of curiosity: what are the steps involved in creating a language?
I'm fascinated whenever I read in books about people creating languages.
Can any of you provide info on this?
September 15th, 2008, 01:24 AM
First, carefully read the link provided by Scorpions4Ever prior to your post:
Originally Posted by sunilbsrv2k
This clears a few things up. If you're creating a language with the intention of compiling the code into machine code, then typically the compiler converts the source code into assembly (normally through the techniques explained in the post in that link I just gave). In almost all cases the compiler software uses an existing assembler program to convert the assembly into actual machine code. There's not usually a good reason to go and create your own assembler instead of using an existing one, unless you want to do it for your personal learning experience.
This is just keeping things in simple terms, without going into the serious details. I'm sure you'd like to know the rest of the details (they can be found on this site in multiple places as well as other sites), however with what I and the others posted, this is really a nice starting point. This gives you a nice, simple idea of how languages are created.
Interpreted languages are just where one program (called the "interpreter") reads the source code and executes it itself. This usually means software written in the language is distributed as source code, so it is effectively open source. However, most interpreted languages nowadays have software that can convert the source code into byte-code (similar to what Java has), whether created by a 3rd party or by the creator(s) of the language.
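Python is one language where you can watch this happen from inside the language itself, using the built-in `compile` and `exec` functions. A small sketch (the variable names and the `"<example>"` file label are arbitrary):

```python
source = "total = 6 * 7"

# Step 1: source text is compiled into a code object that holds byte-code.
code = compile(source, "<example>", "exec")

# Step 2: the interpreter runs that byte-code.
namespace = {}
exec(code, namespace)

print(type(code.co_code))   # the raw byte-code is a bytes object
print(namespace["total"])   # prints 42
```

So even a language that feels "interpreted" often has a small compile step hidden inside it. The byte-code is just aimed at the interpreter program rather than at the real CPU.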
November 21st, 2008, 12:43 PM
Well, first of all, thanks for explaining all the above stuff in such a beautiful manner (I try to do it myself many times but fail; guess it's hard to explain hard concepts in an easy manner). Anyway, I'm a 2nd year Computer Science student and I'm actually interested in creating my own programming language (for fun? Absolutely not, I'm serious, although I don't aim to be the Dennis Ritchie of the new millennium!). And a HLL at that! I know I have a long way to go and am trying my best, but a few suggestions would really help me:
Originally Posted by MBirchmeier
First of all, I'm quite familiar with C and C++. Plus I have done a 2 semester course on Data Structures.
But now that I've decided to make my own language, I'm learning assembly language.
But after reading your post I feel compiler design is a necessity in order to create your own language.
I have no problem doing this course in advance if it is going to help me achieve my aim. So my question is: "Is it a good idea for me to go for compiler design after learning assembly language, or are there more prerequisites for that course?"
As I said before, all I know is C, C++ and data structures. Please tell me...
And of course, if there are any other suggestions you want to give me that will help my goal, then please do!
November 21st, 2008, 03:55 PM
Hello and welcome to devshed.
Originally Posted by Shirou008
A few suggestions, first when posting please try to use better English. Your post is hard to understand, especially for those whose native language is not English.
As for which individual courses to take, talking to someone at the university might be your best bet. Because these are courses and you're not just studying independently, chances are there is a recommended order.
If you're looking to make your own language you don't need to make a compiler. Rather, you can make an interpreted language, which could run within a C program, but understanding the basics of compilers would give you a fuller understanding of what's going on.
Hope this helps...
November 21st, 2008, 10:45 PM
Right sir... Next time I'll be careful not to use any stupid English here. Sorry.
Originally Posted by MBirchmeier
Sir, can you shed more light on the creation of an interpreted language, as you have just advised me? And what do I need to know in order to do that (prerequisites)?
November 24th, 2008, 12:38 PM
Google YACC (Yet Another Compiler Compiler) and try using it. If you have detailed questions, start a new thread. I do not know the history of what you know and what you don't.
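For a feel of the problem YACC solves, here is a hand-written parser for a tiny arithmetic grammar, sketched in Python. Note this is recursive descent, a different technique from the table-driven parsers YACC generates, and it is only meant to show how grammar rules (written in the comments) turn into code. The function names are made up, and error handling for malformed input is left out.

```python
import re


def tokenize(text: str) -> list:
    """Split the input into numbers, operators and parentheses."""
    return re.findall(r"\d+|[-+*()]", text)


def parse_expr(tokens: list) -> int:
    # expr := term (('+' | '-') term)*
    value = parse_term(tokens)
    while tokens and tokens[0] in ("+", "-"):
        op = tokens.pop(0)
        right = parse_term(tokens)
        value = value + right if op == "+" else value - right
    return value


def parse_term(tokens: list) -> int:
    # term := factor ('*' factor)*
    value = parse_factor(tokens)
    while tokens and tokens[0] == "*":
        tokens.pop(0)
        value *= parse_factor(tokens)
    return value


def parse_factor(tokens: list) -> int:
    # factor := NUMBER | '(' expr ')'
    token = tokens.pop(0)
    if token == "(":
        value = parse_expr(tokens)
        tokens.pop(0)  # discard the closing ')'
        return value
    return int(token)


print(parse_expr(tokenize("2 + 3 * (4 - 1)")))  # prints 11
```

With a parser generator you would write only the grammar rules and let the tool produce the parsing code, which gets more attractive as the language grows.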
Originally Posted by Shirou008