February 27th, 2007, 10:44 PM
[Assembly] Historical Notes on CISC vs. RISC
I originally wrote this for some of the other students in an assembly language class I took late last year. While it is not about programming per se, it may help clarify some of the puzzling aspects of assembly language for some members.
It is somewhat ironic that our professor (and many others who use the Britton textbook and the SPIM simulator) has found it easier to teach assembly programming in a RISC architecture such as MIPS than in the far more complex x86 PC system. You see, it is the CISC designs which are generally associated with extensive assembly language programming, while the RISC design were explicitly intended to make compiling code from a high-level language easier, with the expectation that assembly language would be rarely used except for in the operating system.
To understand why how this reversal came about, you need to know a bit of history. When stored-program computers based on the Von Neumann architecture were first being developed in the late 1940s and early 1950s, the size and complexity of the hardware (which was much more diverse than today, using such disparate technologies as mercury delay lines, CRT-phosphor memories, magnetic drums, wire and tape recorders, ferromagnetic cores, and the ever-present vacuum tubes) meant that the hard ware capacities were extremely limited; it was not uncommon for even a large machine to have only one or two registers and a main memory of a few thousand words (which varied in size from machine to machine). The instruction sets were equally constrained, and often had special-purpose instructions for connecting to the peripherals, taking up the already small instruction space.
For example, the 18-bit Lincoln Labs TX-0 (one of the first transistor-based systems designed) had a grand total of four primary instructions (though one of these was a kind of escape code that told it to use the following word as an instruction from a different group of special-purpose operations) and two registers, X and Y. Many of these designs had what would today be considered poorly developed instruction sets, full of special cases, lacking useful instructions (see footnote), and including instructions that were of little use.
Generally speaking, these early designs had only a single special-purpose register, called the accumulator, on which it could perform most of the common operations such as addition or subtraction; the other operand, if any, usually was either in main memory, or in another special-purpose register called the source, and the result would remain in the accumulator (hence the name).
As the designers began to develop more effective architectural principles based on the success and failures of earlier efforts, hardware became smaller and less expensive, which in turn allowed for greater freedom in the CPU design, so the complexity of the systems grew; by 1966, the PDP-6, a typical design of the time, had sixteen 36-bit general-purpose registers, plus a stack register and frame register. The instruction sets became more complex as well; since a significant percentage of programming even into the 1970s was done in assembly language, it was thought that the instruction sets should be 'rich', that is, they should have a special instruction for nearly every common operation the programmer would want to use.
This culminated with the 32-bit VAX-11 series of minicomputers and mainframes, which had 256 primary instructions, including ones for polynomial evaluation, trigonometric functions, and CRC-checking. Many of these instructions had multiple addressing modes, so that they could operate on registers, on main memory, or some combination thereof; while this increased the convenience of assembly programming, it also meant that there were several special cases the programmer had to be aware of, and it complicated the CPU design substantially.
To provide these instructions, designers began to use 'microcode', which was a set of what could be called firmware-encoded macro instructions that would be handled by the CPU itself; the complex instructions would get broken down into simpler instructions internally, which the programmers wouldn't need to be aware of, an executed as if they were a fixed part of the hardware. Such instructions would often take several system clock cycles to run, and in most processors, there was no pipelining in the modern sense; the system would have to wait until the each instruction was finished before beginning processing the next one.
About the time the VAX was being designed, four other things were happening that would change this attitude. First, high-level languages were becoming the primary method of programming, meaning that the baroque instruction sets of the then-current CPUs was becoming unnecessary.
Second, assembly language programmers were noting that the majority of programs, whether in assembly or in a compiled language, only a handful of instructions were being used: few programs needed a CRC instruction, and hardly any programmers were familiar enough with the entire instruction set to know that the instruction existed and how to use it. A side aspect of this is that the more complex instruction sets were increasingly difficult to learn and to teach.
Third, computer hardware design had advanced to the point where graduate courses in CPU design were being offered; such courses naturally enough stuck to very simple and regular designs, which they found actually outperformed comparable complex systems; they especially noted that the operations that worked directly on memory were generally slower than the process of loading the values into registers, performing the operation, and then swapping them back into memory was.
Finally, the advent of single chip microprocessors meant that it would soon be possible to mass-produce inexpensive computers for a fraction of the cost of minis and mainframes. Early on these were limited by the capabilities of the hardware in much the same ways the first generation of CPUs were (and some of the design mistake of the first generation were repeated as well), but the technology soon grew beyond anyone's expectations, and in some ways, beyond the abilities of the designers to predict what they would later need to support.
This last part had some major repercussions, especially some of the design decisions as microprocessors became more powerful. Chief amongst these was the decision by Intel to try and extend the 8-bit 8080 design into a new 16-bit design, the 8086; they bent over backwards to make the CPU as familiar to older programmers as possible, with the result that the design ended up with some unnecessary complications and limitations, especially in how it addresses larger sections of memory. Also, because the designers had given the system only a small number of registers, most of the operations has several complicate addressing modes to avoid running out of registers.
The fact that this processor was selected for the IBM PC, which would soon become the dominant platform, meant that these weaknesses were of critical importance to millions of users. This was further exacerbated when they extended the design still further with the 80286, which now needed a separate 'protected mode' to access its full abilities while retaining 'normal mode' for backwards compatibility. Some of the design flaws were resolved in the next design, the 80386, but at the cost of exponentially increasing complexity both of the chip itself and of assembly programming for it.
Meanwhile, other chip designers were going in other directions. One, Motorola, started over with a classic 'big' design, the 68000, which resembled a scaled-down version of the VAX in many ways, with 16 general-purpose registers and a complicated instruction set. This would become the CPU for several successful workstations, as well as the original Apple Macintosh line. Later design extensions would complicate this, though not to the extent that the 80x86 design would be.
Still, it was growing clear that the complex instruction sets were growing counter-productive. Thus, many chip designers decided to go in the opposite direction: minimal instruction sets, no microcoding, load/store architectures with few if any operations working on memory directly, large register sets which could be used to avoid accessing main memory whenever possible, and an emphasis on supporting high-level languages rather than assembly programing. This new idea, which was called RISC (reduced instruction set computer), would be the basis several new CPU designs, including the MIPS, the SPARC, the ARM, the PowerPC, and the Alpha. Of these, all but the Alpha remain in use today for certain specialized areas of use, though for the most part the domination by the Windows-x86 system has forced them out of the market for home and business systems.
1) In principle, only a single operation, 'subtract two memory values and branch if the result is negative' (or several variants on this) is sufficient for to allow a Random Access Machine to perform all Turing-computable calculations). There are even (simulated) machines which are designed on this principle, such as the OISC. In practice, of course, such a system would be both tedious and wasteful, especially for the more commonly used operations such as integer arithmetic.
Comments on this post
Last edited by Schol-R-LEA; February 27th, 2007 at 10:50 PM.