Lecture 2: Berkeley RISC
Penghui Zhang, Guanming Wang, Hang Zhang

1. What Is RISC?

1.1 The RISC Idea
• The RISC idea developed from the realization that the vast majority of programs did not use the vast majority of a processor's instructions.
• Include only those instructions that are really used.
• Use the space that had been taken up by the removed circuitry for other circuits that speed the system up instead.

1.2 How RISC Achieves Its Goal
• Add many more registers: small bits of memory that hold temporary values and can be accessed at negligible cost.
• As a result, the speed of the processor is more closely defined by its clock speed.

1.3 Comparison Between RISC and CISC

2. RISC I

2.1 RISC I Design Goals
• High-level language programming
• A cost-effective system in both hardware and software
• Simple instructions, each one word (32 bits) long
• "Cost" of each statement type

2.2 RISC I Architecture
• 31 instructions in a few similar formats, all 32 bits long
• Execution time
• Instructions between registers and memory

2.3 Microarchitecture of RISC I
Instruction execution pattern:
1. Read two registers.
2. Perform an operation on them.
3. Store the result.
Figure: Data path of the RISC I chip

2.4 Design Environment of RISC I
• UNIX environment on a VAX-11/780
• Regular parts of the chip
• Control section

3. RISC II

3.1 Background
RISC II microprocessor:
• Meets the requirements identified by the code analysis.
• The majority of the chip is occupied by the data unit, unlike conventional microprocessors, which were dominated by control logic.
• The majority of the data unit consists of a huge register file: 138 registers.
• RISC work at Berkeley had turned from the Gold design to the new Blue design.
• The savings due to the new design were tremendous.
• Gold contained 78 registers in 6 windows.
• Blue contained 138 registers: 8 windows of 16 registers each, plus another 10 globals.
• The final Blue design, fabbed as RISC II, implemented all of the RISC instruction set with only 39,000 transistors.
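The three-step execution pattern in 2.3 (read two registers, operate, store the result) and the rule in 2.2 that only certain instructions move data between registers and memory can be sketched as a tiny interpreter. This is a minimal illustration under stated assumptions, not the RISC I instruction set: the opcode names (`ADD`, `SUB`, `AND`, `OR`, `LOAD`, `STORE`), the `(op, rd, rs1, src2)` tuple format, the `Machine` class and the dictionary-based memory are all hypothetical.

```python
from dataclasses import dataclass, field

MASK32 = 0xFFFFFFFF  # registers and memory words are 32 bits wide


@dataclass
class Machine:
    regs: list = field(default_factory=lambda: [0] * 32)  # 32 visible registers
    mem: dict = field(default_factory=dict)               # hypothetical address -> 32-bit word


# Hypothetical register-to-register operations.
ALU_OPS = {
    "ADD": lambda x, y: (x + y) & MASK32,
    "SUB": lambda x, y: (x - y) & MASK32,
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
}


def step(m: Machine, instr: tuple) -> None:
    """Execute one hypothetical 3-address instruction (op, rd, rs1, src2)."""
    op, rd, rs1, src2 = instr
    if op in ALU_OPS:
        # 1. read two registers, 2. operate on them, 3. store the result in rd
        m.regs[rd] = ALU_OPS[op](m.regs[rs1], m.regs[src2])
    elif op == "LOAD":
        # Only loads move data from memory into a register: rd <- mem[rs1 + offset]
        m.regs[rd] = m.mem.get((m.regs[rs1] + src2) & MASK32, 0)
    elif op == "STORE":
        # Only stores move data from a register to memory: mem[rs1 + offset] <- rd
        m.mem[(m.regs[rs1] + src2) & MASK32] = m.regs[rd]
    else:
        raise ValueError(f"unknown opcode {op!r}")


# mem[108] = mem[100] + mem[104], written load-store style (r0 holds 0 here).
m = Machine(mem={100: 7, 104: 5})
for instr in [("LOAD", 1, 0, 100), ("LOAD", 2, 0, 104), ("ADD", 3, 1, 2), ("STORE", 3, 0, 108)]:
    step(m, instr)
print(m.mem[108])  # 12
```

Even a one-line addition of two memory values takes four instructions in this style; the payoff is that every arithmetic instruction follows the same simple register-to-register pattern.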
Figure: The RISC II register file

3.2 Differences from RISC I
• The key difference was simpler cache circuitry that eliminated one line per bit.
• The other major change was the inclusion of an "instruction-format expander".
• RISC II proved much more successful in silicon, and in testing it outperformed almost all minicomputers on almost all tasks.

3.3 Architecture of RISC II
• It is an evolution of the RISC I design.
• Reading is accomplished by selectively discharging one of the two precharged bit-line buses.
• RISC II was designed around a two-bus, two-port register cell.
• The RISC II architecture used a three-stage pipeline.
Figure: Data path of RISC II

3.4 Implementation
Three machine cycles:
1. Instruction fetch and decode.
2. Register read, operate, and temporary latching of the result.
3. Write the result back into the register file.
These three cycles are overlapped, so a new instruction begins every machine cycle, except for load and store instructions.

4. Architectural Inheritance in the ARM

4.1 Features Used
• A load-store architecture
• Fixed-length 32-bit instructions
• 3-address instruction formats

4.2 Features Rejected

4.2.1 Register windows
• The register banks on the Berkeley RISC processors incorporated a large number of registers, 32 of which were visible at any time.
• Procedure entry and exit instructions moved the visible "window" to give each procedure access to new registers.
• The principal problem with register windows is the large chip area occupied by the large number of registers.
• This feature was therefore rejected on cost grounds.
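The overlap of the three machine cycles in 3.4 can be visualized with a small timing sketch. It assumes that an ordinary instruction lets the next one start one cycle later and that a load or store delays the next start by one extra cycle (`memory_penalty`, a hypothetical parameter); it shows only the issue rate and does not model what happens inside the extra memory cycle. The stage labels `F/D`, `R/O` and `W` are shorthand invented here for the three cycles.

```python
# Hypothetical timing model of the three overlapped RISC II machine cycles.
STAGES = ("F/D", "R/O", "W")  # fetch+decode | read, operate, latch | write back
MEMORY_OPS = {"LOAD", "STORE"}


def start_cycles(program, memory_penalty=1):
    """Cycle in which each instruction starts fetch/decode."""
    starts, cycle = [], 0
    for op in program:
        starts.append(cycle)
        # Ordinary instructions let the next one start one cycle later.
        # Assumption: a load or store delays the next start by memory_penalty extra cycles.
        cycle += 1 + (memory_penalty if op in MEMORY_OPS else 0)
    return starts


def diagram(program, memory_penalty=1):
    """Text chart: one row per instruction, one 3-character column per cycle."""
    rows = []
    for op, start in zip(program, start_cycles(program, memory_penalty)):
        cells = ["   "] * start + [f"{stage:<3}" for stage in STAGES]
        rows.append(f"{op:<6}" + " ".join(cells).rstrip())
    return "\n".join(rows)


program = ["ADD", "LOAD", "SUB", "ADD"]
print(start_cycles(program))  # [0, 1, 3, 4]
print(diagram(program))
```

With only register-to-register instructions the start cycles are 0, 1, 2, 3, ..., which is the "new instruction every machine cycle" behaviour; the gap after `LOAD` is the exception for memory instructions.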
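The register-window mechanism in 4.2.1 (32 visible registers, with procedure entry and exit moving the window) and the RISC II figures in 3.1 (8 windows of 16 registers plus 10 globals, 138 in total) fit together as follows. This sketch assumes a visible layout of r0-r9 global, r10-r15 LOW (outgoing parameters), r16-r25 LOCAL and r26-r31 HIGH (incoming parameters), and assumes that a call decrements the current window pointer `cwp`; the function names `physical_register`, `call` and `ret` are hypothetical, and window overflow spilling to memory is not modelled.

```python
# Hypothetical model of RISC II-style overlapping register windows.
# Assumed visible layout: r0-r9 GLOBAL, r10-r15 LOW (outgoing parameters),
# r16-r25 LOCAL, r26-r31 HIGH (incoming parameters).
NUM_GLOBALS = 10                               # shared by every procedure
NUM_WINDOWS = 8
WINDOW_STRIDE = 16                             # 10 locals + 6 registers shared with a neighbour
WINDOWED_REGS = NUM_WINDOWS * WINDOW_STRIDE    # 128
PHYSICAL_REGS = NUM_GLOBALS + WINDOWED_REGS    # 138
VISIBLE_REGS = 32


def physical_register(cwp: int, r: int) -> int:
    """Map visible register r (0..31) in window cwp (0..7) to a physical index (0..137)."""
    if not 0 <= r < VISIBLE_REGS:
        raise ValueError("only r0..r31 are visible")
    if r < NUM_GLOBALS:
        return r  # globals are the same physical registers in every window
    offset = r - NUM_GLOBALS  # 0..21 inside the visible window
    # Windows form a circular buffer; adjacent windows overlap by 22 - 16 = 6 registers.
    return NUM_GLOBALS + (cwp * WINDOW_STRIDE + offset) % WINDOWED_REGS


def call(cwp: int) -> int:
    """Procedure entry: move to the callee's window (overflow spilling not modelled)."""
    return (cwp - 1) % NUM_WINDOWS


def ret(cwp: int) -> int:
    """Procedure exit: move back to the caller's window."""
    return (cwp + 1) % NUM_WINDOWS


caller = 3
callee = call(caller)
# Parameter passing without copying: the caller's LOW r10 is the callee's HIGH r26.
print(physical_register(caller, 10), physical_register(callee, 26))  # 58 58
print(PHYSICAL_REGS)  # 138
```

Each window adds only 16 new physical registers because 6 of its 22 windowed registers are shared with the neighbouring window; that sharing is what lets parameters pass between procedures without memory traffic, and the 138 registers it requires are the chip area cost cited above.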
4.2.2 Delayed branches
• Branches cause pipeline problems because they interrupt the smooth flow of instructions.
• Most RISC processors ameliorate the problem by using delayed branches, where the branch takes effect after the following instruction has executed.
• The original ARM did not use delayed branches because they made exception handling more complex.
• In the long run this turned out to be a good decision, since it simplifies re-implementing the architecture with a different pipeline.

4.2.3 Single-cycle execution of all instructions
• Although the ARM executes most data processing instructions in a single clock cycle, many other instructions take multiple clock cycles.
• Single-cycle operation of all instructions is only possible with separate data and instruction memories, which were considered too expensive for the intended ARM application areas.
• Instead, the ARM was designed to use the minimum number of cycles required for memory access.
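The difference between a delayed branch, as described in 4.2.2, and an ordinary branch can be shown with a small interpreter that tracks a current index `pc` and a next index `npc`. The `(label, target)` program format and the function `run` are hypothetical; the `pc`/`npc` pair is a common way to model delay slots, not a description of any specific Berkeley RISC hardware.

```python
def run(program, delayed=True, max_steps=100):
    """Return the labels of executed instructions, in order.

    program: list of (label, target) pairs; target is None for an ordinary
    instruction and an instruction index for an unconditional branch.
    """
    trace = []
    pc, npc = 0, 1  # current and next instruction index
    for _ in range(max_steps):  # guard against branch loops
        if not 0 <= pc < len(program):
            break
        label, target = program[pc]
        trace.append(label)
        if target is None:
            pc, npc = npc, npc + 1
        elif delayed:
            # Delayed branch: the following instruction (the delay slot) runs first.
            pc, npc = npc, target
        else:
            # Ordinary branch: control transfers immediately.
            pc, npc = target, target + 1
    return trace


program = [("A", None), ("BR", 4), ("B", None), ("C", None), ("D", None)]
print(run(program, delayed=True))   # ['A', 'BR', 'B', 'D']
print(run(program, delayed=False))  # ['A', 'BR', 'D']
```

With the delayed branch, `B` executes even though it sits after the branch. In this model, an exception taken between `BR` and `B` would have to save both `pc` and `npc` to resume correctly, which illustrates the extra exception-handling complexity the ARM designers avoided.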