Session F1G

Incorporating Simulation and Implementation into Teaching Computer Organization and Architecture

Bo Hatfield¹, Mike Rieker², Lan Jin³

Abstract - This paper describes the motivation, the realization, and the experience of incorporating simulation and hardware implementation into teaching computer organization and architecture to computer science students. It demonstrates that learning by doing has helped students to truly understand how a computer is constructed and how it really works in practice. Correlated with textbook material, a set of simulation and implementation projects was created on the basis of the work that students had done in previous homework and laboratory activities. Students can thus use these designs as building blocks for completing more complex projects at a later time. The projects cover a wide range of topics from simple adders up to ALUs and CPUs. These processors operate in a virtual manner on certain short assembly-language programs. Specifically, this paper shares the experience of using simulation tools (Altera® Quartus II) and reconfigurable hardware prototyping platforms (Altera® UP2 development boards).

Index Terms – Teaching Methods, Computer Organization, Computer Architecture, Simulation, Hardware Prototyping, Altera FPGA Development Kit.

INTRODUCTION AND MOTIVATION

Traditionally, teaching computer organization and architecture to computer science students can be inefficient if the teaching focus is solely on textbook materials. Students often have to rely on their imaginations to understand the underlying hardware-related concepts. In a college/university level teaching environment, it is still not feasible to build a laboratory that can provide various computer systems and architectures for teaching computer organization and architecture, yet keeping computer science education up-to-date requires keeping pace with the rapid evolution of the computing industry.
Searching for a more efficient way of teaching computer organization and architecture is an ongoing task. One common solution is to provide an active learning environment by taking advantage of advanced software technology [2]. In other words, by simulating key elements of a processor, students can obtain a better understanding of its internal operation. Simulation can also help students study and appreciate some of the design trade-offs and performance issues. Although using such a software simulation tool in teaching is very popular, it still has its limitations. We agree with the school of thought that holds that building an original processor from scratch is the only way to ensure that the acquired knowledge persists longer and the interface between hardware and software becomes absolutely clear. However, this effect will only be useful and feasible if a processor of a reasonable size is used.

Many educators have shared their experiences of using similar approaches for teaching computer organization and architecture [6]. While we feel that in some areas our experiences are similar to the previously reported ones, we gained many special experiences that are directly related to the teaching model and the software and hardware tools we used.

In this paper, we introduce and discuss the building of a simple MIPS-like RISC computer in teaching computer organization and architecture. We were motivated to introduce the students to a RISC-based architecture, yet keep the processor simple enough for the students to build from scratch. For example, the instruction set contains no more than sixteen instructions to minimize the implementation complexity. A more detailed description of the proposed simple computer is given in our textbook, which we use in our teaching [1].
It is worth noting that working on a non-MIPS project in the lab portion of a class can provide an in-depth understanding of processor design; however, it can become overwhelming in a one-semester undergraduate computer organization and architecture course. A non-MIPS project can be used in a graduate-level computer architecture course.

This paper is structured as follows. We first briefly introduce the proposed computer. Then we share our thoughts on the simulation tools and the hardware prototyping platforms used. We then discuss the design issues and the implementation results, which are organized according to the logical modules in our project sequence. Student learning experiences and usage of the software and hardware tools are then reported. Finally, we present our conclusions and future work.

¹ Bo Hatfield, Salem State College, bo.hatfield@salemstate.edu
² Mike Rieker, Salem State College, wmrieker@nii.net
³ Lan Jin, California State University, Fresno, lanj@csufresno.edu

0-7803-9077-6/05/$20.00 © 2005 IEEE October 19 – 22, 2005, Indianapolis, IN 35th ASEE/IEEE Frontiers in Education Conference F1G-18

A SIMPLE RISC COMPUTER

Many high-performance processors have been developed based on the RISC (Reduced Instruction Set Computer) design philosophy. Some of the key concepts of the instruction-set architecture of a RISC machine include:

• Only operations and addressing modes that are frequently used are included in the instruction set for direct implementation by hardware.
• Basic arithmetic-logic operations use a three-address register-register instruction format.
• The instruction format has fixed length and regular fields, so as to make instruction decoding easier and faster.
• Only load/store instructions access memory for data.

The common features of RISC computers described above have enabled the formulation of the specification of a typical educational RISC machine.
One example of such a machine is the DLX computer derived from the MIPS architecture, with a subset of the MIPS64 instruction set and a 32-bit word length [5]. Our goal is to design a processor that will introduce the RISC features to the students, but is simple enough for the students to complete the hardware implementation within a limited time frame, e.g., one semester. Without loss of generality, we propose a computer that is like the MIPS, but has a smaller and much simpler instruction set than the MIPS [1]. Simplification is necessary for easier implementation. The simplification consists of decreasing word length, instruction length, size of the instruction set, and memory capacity. The above-mentioned RISC design features remain unchanged. Table I lists all of the operations defined for this computer.

TABLE I
THE INSTRUCTION SET OF A SIMPLE MIPS-LIKE RISC COMPUTER

OPcode  Mnemonic  Operation
0000    ADD       Rd ← (Rs1)+(Rs2)
0001    ADDI      Rd ← (Rs1)+#imm
0010    SUB       Rd ← (Rs1)-(Rs2)
0011    MUL       Rd, Rd+1 ← (Rs1)*(Rs2)
0100    AND       Rd ← (Rs1) ∧ (Rs2)
0101    OR        Rd ← (Rs1) ∨ (Rs2)
0110    XOR       Rd ← (Rs1) ⊕ (Rs2)
0111    LSH       Rd ← (Rs1)<<imm if imm < 0
                  Rd ← (Rs1)>>imm if imm > 0
1000    LW        Rd ← (Mem[(Rs)+imm])
1001    SW        Mem[(Rs)+imm] ← (Rd)
1010    SLT       if (Rs1) < (Rs2), Rd ← 1, else Rd ← 0
1011    SGT       if (Rs1) > (Rs2), Rd ← 1, else Rd ← 0
1100    BEG       if (Rs1) = 0, PC ← (PC)+imm
1101    BNE       if (Rs1) ≠ 0, PC ← (PC)+imm
1110    JR        PC ← (Rs1)
1111    JAL       Rd ← (PC)+1; PC ← (Rs1)

The specification summary of our simple computer is as follows:

• Word, instruction, and address length: 16 bits.
• General-purpose register set: 16x16 bits, (R0)=0.
• Memory capacity: 64K words, word-addressable.
• Instruction format: 4-bit OPcode and 3 address fields of 4 bits each.
• Load-store format: OPcode, Rs, Rd, imm (used as a displacement in displacement addressing mode).
• ALU format: OPcode, Rs1, Rs2, Rd.
• ALUimm format: OPcode, Rs1, imm, Rd.
• Branch format: OPcode, Rs1, 8-bit imm (sign-extended in PC-relative addressing mode).
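The instruction formats listed above can be illustrated with a short Python sketch that packs and unpacks a 16-bit instruction word. This is only an illustration under stated assumptions, not the authors' implementation: the helper names (`encode`, `decode`, `sign_extend`) are hypothetical, the fields are assumed to be packed from the most significant bits downward in the order in which each format lists them, and the 4-bit imm fields are assumed to be two's-complement values. The mnemonics and opcodes are copied from Table I.

```python
# Opcodes and mnemonics as listed in Table I.
OPCODES = {
    "ADD": 0b0000, "ADDI": 0b0001, "SUB": 0b0010, "MUL": 0b0011,
    "AND": 0b0100, "OR": 0b0101, "XOR": 0b0110, "LSH": 0b0111,
    "LW": 0b1000, "SW": 0b1001, "SLT": 0b1010, "SGT": 0b1011,
    "BEG": 0b1100, "BNE": 0b1101, "JR": 0b1110, "JAL": 0b1111,
}
MNEMONICS = {code: name for name, code in OPCODES.items()}

BRANCH = {"BEG", "BNE"}      # OPcode, Rs1, 8-bit imm
LOAD_STORE = {"LW", "SW"}    # OPcode, Rs, Rd, imm
ALU_IMM = {"ADDI", "LSH"}    # OPcode, Rs1, imm, Rd


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def encode(mnemonic, a, b, c=0):
    """Pack the fields that follow the opcode, in the order the format lists them.

    ASSUMPTION: fields are laid out from the most significant bits downward.
    For branches, a is Rs1 and b is the 8-bit displacement (c is ignored).
    """
    op = OPCODES[mnemonic]
    if mnemonic in BRANCH:
        return (op << 12) | ((a & 0xF) << 8) | (b & 0xFF)
    return (op << 12) | ((a & 0xF) << 8) | ((b & 0xF) << 4) | (c & 0xF)


def decode(word):
    """Split a 16-bit instruction word into named fields."""
    name = MNEMONICS[(word >> 12) & 0xF]
    f1, f2, f3 = (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF
    if name in BRANCH:
        return {"op": name, "rs1": f1, "imm": sign_extend(word & 0xFF, 8)}
    if name in LOAD_STORE:
        # ASSUMPTION: the 4-bit displacement is signed.
        return {"op": name, "rs": f1, "rd": f2, "imm": sign_extend(f3, 4)}
    if name in ALU_IMM:
        # ASSUMPTION: the 4-bit immediate is signed (LSH relies on its sign).
        return {"op": name, "rs1": f1, "imm": sign_extend(f2, 4), "rd": f3}
    return {"op": name, "rs1": f1, "rs2": f2, "rd": f3}
```

Under these assumptions, `decode(encode("BNE", 4, -2))` yields the fields `op="BNE"`, `rs1=4`, `imm=-2`, showing how the 8-bit branch displacement is sign-extended before it is added to the PC.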
The MUL instruction multiplies two 16-bit operands (Rs1 and Rs2) to obtain a 32-bit product in two consecutive registers Rd and Rd+1, where Rd must be an even-numbered register so that Rd+1 can be obtained by simply XORing the LSB of the register number of Rd. The LSH instruction uses the ALUimm format with imm as the shift amount. It logically shifts operand (Rs1) left or right, depending on the sign of imm as minus or plus, respectively. SLT and SGT are two compare instructions that use the ALU format. They set the flag ‘1’ or ‘0’ in Rd rather than in a condition code, which is typical for a RISC-style machine. JAL and JR use the ALU format with some fields not used. In addition to their standard usage, they are also used for the special cases of jump to subroutine and return from subroutine, respectively.

SOFTWARE TOOLS AND HARDWARE PLATFORMS

Our main objective in searching for an appropriate tool was to design, develop, simulate and prototype the proposed simple computer system using the recent advances in Computer-Aided Design (CAD) tools for logic synthesis and simulation. With respect to a hardware platform, we are aware of the fact that Field Programmable Gate Arrays (FPGAs) have grown in both size and performance as the cost has gone down. Moreover, the number of gates on an FPGA chip has grown so large that many complex applications can now be implemented with them. CAD tools have evolved from a simple hardware description language and elementary functional block diagrams to sophisticated hierarchical design layout and high-level hardware design languages such as Verilog and VHDL with pre-made blocks of code called “cores”, which can be customized for the desired function. In addition, ease of modification and enhancement are inherent in the process of design using an FPGA. These are the motivations for taking advantage of FPGAs in a system for the hardware implementation of our teaching projects.
In searching for an appropriate platform and associated tools, we settled on using Altera’s University Program Education Kit, which incorporates 70,000 gates on an FPGA device. The major factor that helped us to make our final decision was Altera’s dedicated support for education programs. The package provides all of the necessary tools for creating and implementing digital logic designs [4]. Major features include the Quartus II development software, a 240-pin FPGA device, and a parallel port download cable. The software supports schematic capture and text-based HDL (Hardware Description Language) design entry [3] [4]. It also provides design programming, compilation, and verification support for all devices on the UP2 board (EPM7128S and EPF10K70 devices). The UP2 Education Board is a stand-alone experimental board based on a FLEX10K device that offers 70,000 gates [4]. When used with the Quartus II software, the board provides a superior platform for prototyping intermediate to advanced digital design projects.

The feedback we received from the students concerning the use of this package was very positive. Ease of use of the software and availability of on-line support documentation are among the primary merits mentioned in the student feedback.

PROJECTS

In order to avoid having students be overwhelmed by the complexity of a complete computer system design, we took the approach of modularization. We first implemented the entire system (from software simulation to hardware implementation) as a top-level system linked to a collection of sub-circuits. These sub-circuits directly represent logical modules in the computer, namely the ALU (Arithmetic Logic Unit) module, the datapath module, and the CU (Control Unit) module. Each module can be removed from the system and replaced by a student’s design.
In this way, a student gets the opportunity to see how his/her design fits into the entire system and gains early experience in integration testing at the system level. Figure 1 shows the complete top-level system design using Altera’s Quartus II CAD tool. This schematic diagram of the design clearly shows the modularization characteristics of the system. In the following subsections, we briefly discuss the three most important modules.

I. The ALU Module

After completing a few tutorials on how to use Altera’s Quartus II design software and a preliminary project on a 16-bit adder design and implementation, students were ready to start the project on the design and implementation of the first logical module, the ALU.

As a first step, students can design an ALU module based on a standard 16-bit adder of carry look-ahead structure. Since the ALU usually serves as the center of the CPU at the register-transfer level, it should provide the convenience of transferring either operand OP1 or operand OP2 directly to its output. Actually, to satisfy the limited functionality of the operation repertoire of the instruction set, we need to include no more than 8 (1 transfer, 3 arithmetic, and 4 logic) operations in the function table of the ALU as shown in Table II.

TABLE II
FUNCTION TABLE OF THE ALU

f2 f1 f0  Function
000       OP1
001       OP2 - OP1
010       OP1 - OP2
011       OP1 + OP2
100       OP1 OR OP2
101       OP1 AND OP2
110       OP1 XOR OP2
111       OP1 XNOR OP2

The simplest way of designing this ALU is to use a set of 16 multiplexers for collecting data from (1) OP1 and OP2, (2) a simple 16-bit adder for the 3 arithmetic operations, and (3) a set of XOR-, AND-, and OR-gates for logic operations. A circuit diagram for the ALU module can be found in [1].

We subdivided the ALU design project into two parts: (1) logic design and (2) simulation and hardware implementation. The work involved in the first part was given as a homework assignment, serving the role of pre-lab preparation.
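As an illustration of the function table in Table II, the behavior of the ALU can be modeled in a few lines of Python. This is a behavioral sketch only, not the circuit from [1]: the names `alu`, `add16`, and `invert16` are hypothetical, and the two subtractions are assumed to reuse the single adder by adding the bitwise complement of the subtrahend with a carry-in of 1 (two's-complement subtraction). The final dictionary lookup plays the role of the output multiplexers selecting one candidate result.

```python
MASK16 = 0xFFFF  # results are truncated to the 16-bit word length


def add16(a, b, carry_in=0):
    """Model a 16-bit adder; the carry out of bit 15 is discarded."""
    return (a + b + carry_in) & MASK16


def invert16(x):
    """Bitwise complement within 16 bits."""
    return ~x & MASK16


def alu(f, op1, op2):
    """Return the 16-bit ALU output for function code f = (f2 f1 f0)."""
    op1 &= MASK16
    op2 &= MASK16
    # Every candidate result is computed and one is selected by f,
    # mirroring a multiplexer that chooses among the functional units.
    candidates = {
        0b000: op1,                           # transfer OP1
        0b001: add16(op2, invert16(op1), 1),  # OP2 - OP1 (assumed adder reuse)
        0b010: add16(op1, invert16(op2), 1),  # OP1 - OP2 (assumed adder reuse)
        0b011: add16(op1, op2),               # OP1 + OP2
        0b100: op1 | op2,                     # OR
        0b101: op1 & op2,                     # AND
        0b110: op1 ^ op2,                     # XOR
        0b111: invert16(op1 ^ op2),           # XNOR
    }
    return candidates[f]
```

For example, `alu(0b010, 5, 7)` returns `0xFFFE`, the 16-bit two's-complement encoding of -2. A model of this kind can serve as a reference when checking the waveforms produced by a module-level simulation.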
Students received complete functional descriptions of the ALU module, including a detailed description of its interface with the rest of the system. In this way, the complete design could be easily brought into the whole system. The second part of the project was conducted as a lab project, in which students took the designs they had completed in the first part and conducted a module-level full-function simulation in the Quartus II environment. Then, the top-level design files, excluding the pre-designed ALU module, were given to the students so that they could integrate their own ALU module into the system. After the integration, students conducted a system-level simulation to make sure that the modules worked together as a complete system. Finally, the integrated system was programmed into the FPGA device before the actual physical test was conducted.

II. The Datapath Module

After designing the ALU, the students design the second module of the CPU, the general-purpose register set. First, they should determine the interface between the register set and the other components of the CPU. Since any arithmetic-logic operation in the instruction set must be executed in just one cycle, the general-purpose register set must have two read ports and one write port that can be active at the same time. Also, the general-purpose register set must receive two source-register numbers and one destination-register number to select its source operands and destination operand, respectively. An auxiliary register, A, is used in the execution of instructions that accumulate the sum of partial products (MUL) and perform shift operations (LSH).

III. Microprogrammed Control Unit Module

Given that the CPU is composed of two parts, the datapath and the control unit, the next project should be the design of the microprogrammed control unit. The design of a microprogrammed control unit consists of the microinstruction format and the microprogram flowchart.
It is important to note that the designs of the CPU datapath, the microinstruction format and the microprogram flowchart are closely interrelated. In particular, during the CPU datapath design, the microinstruction format should have already been taken into consideration, except for certain fields. These fields are those used for the sequencing control of the microprogram itself, such as the loop counter, a branch condition, and the next microinstruction.

FIGURE 1
THE TOP-LEVEL VIEW OF THE IMPLEMENTATION IN THE QUARTUS II DESIGN ENVIRONMENT

REPORT AND ANALYSIS OF STUDENT EXPERIENCES

Over a period of 20 years, the authors have taught computer organization and architecture at both the undergraduate level and the graduate level. Over the years, the authors have gained first-hand teaching experience and observed student learning experiences while trying out different textbooks and designing various lab experiments. Students reported some problems in using the labs associated with those texts. The two most critical issues reported by the students were the unmanageable size of the projects and the lack of hardware implementation. This experience led the authors to co-author and publish a textbook [1] in which a simple RISC processor is proposed and studied. A set of labs was developed to accompany the teaching of the processor. One of the most important motivations for this work was to ensure that, for a student taking a one-semester undergraduate computer organization and architecture course with little or no logic design experience, the processor and the associated labs are feasible to study, design and implement. The authors’ teaching experience shows that, considering the complexity of any commercial processor, this processor is more appropriate for teaching and learning.
During the period between 1992 and 2005 at California State University, Fresno and Salem State College, a total of 800 students took the computer organization and architecture course and completed the associated lab assignments. These two institutions have typical undergraduate computer science programs in which either (1) no prior logic design experience is required or (2) passing a one-semester logic design course is required as the prerequisite for a computer organization and architecture course. The course is offered as a junior/senior undergraduate course. Table III shows the lab sequence that was used by both institutions. The only difference is that in (1) all six labs were conducted, while in (2), the first three labs were conducted in the earlier logic design course and the last three were used in the computer organization and architecture course. Note that each lab consists of three parts:

Part I. Design and/or Analysis
Part II. Implementation using simulation software
Part III. Test

As stated in the first section, another key motivation was not only to accomplish a fully functional simulation of the simple RISC processor, but also to achieve a physical implementation of it. The authors worked hard to find an appropriate tool for achieving this goal. After discovering the University Program supported by Altera, the authors decided to use Altera’s Quartus II CAD software tool and the FPGA device on the UP2/UP3 board. After two semesters of using Altera’s CAD tools, the student experiences and feedback were very positive.

Students were surveyed regarding their use of the lab materials and the Altera design/implementation tools. The survey results coincide with the faculty’s findings from working with the lab materials during teaching and lab supervision. These can be summarized as follows.
• Altera has a very good university program.
• Altera’s tool support is very good.
• Quartus II is a very good tool to use.
• The tutorials are easy to follow and understand.
• The free Quartus II Web Edition makes it easy for students to work on their own computers.

TABLE III
LABORATORY DESIGN

Lab and Description (Parts I, II, and III)

Lab 1. Getting Started & Basic Logic Gates
• Use the clock devices in the simulation software to study waveform diagrams to simulate various timing sequences: binary, reverse binary, Gray code, etc.
• Study static and dynamic behavior of basic logic gates.
• Use Boolean algebra, especially DeMorgan’s Theorem, to study equivalent gates and equivalent circuits.

Lab 2. Combinational Logic Circuits
• Study various full adder circuits: XOR-based, XNOR-based, all-NAND, all-NOR, etc. Create the waveform diagram of a full-adder circuit to study the phenomena of gate delay and hazards.
• Study the ripple-carry parallel adder, including performing add/subtract operations to implement two’s complement or sign-magnitude algorithms.
• Implement combinational circuits using multiplexers, e.g., code conversion (binary code to Gray code or vice versa), seven-segment displays, etc.

Lab 3. Sequential Logic Circuits
• Study basic flip-flop circuits and/or edge-triggered flip-flop circuits, including static and dynamic behaviors.
• Design and simulate counters and shift registers as well as study various shift operations. Implement and test multi-functional counters and shift register circuits.
• Study a bit-serial adder to understand the general model of sequential circuits and the timing control of bit-serial add/subtract operations.

Lab 4. ALU
• Use simulation software to create sub-circuits. The circuits of SN74181 and SN74182 industrial chips are good examples of creating the building blocks.
• Test the circuits of a 16-bit carry lookahead adder in two different structures: serial-parallel vs. fully parallel. Compare their carry-propagation delays.
• Use specified methods to design, implement and test the circuits of a customized ALU according to a given function table.

Lab 5. CPU Datapath
• Construct buses on the tri-state devices in bus-driver chips.
• Design and implement a bus-based CPU datapath, and create a memory module using simulation software.
• Design a timing sequence of control words to test the created CPU datapath for executing a few bus communication operations.

Lab 6. CPU Control
• Design, construct, and test a general-purpose register-based CPU.
• Design and implement a hardwired or a microprogrammed control unit.
• Test the CPU.

We quote some of the student feedback verbatim as follows.

• “The simulation and hardware implementation tool helped me in learning Computer Organization and Architecture”
• “Using the UP2 board helps to see the creation comes to ‘life’”.
• “The tools are very effective in bringing the ideas in the textbook alive”
• “Quartus II is a good program with nice interface.”
• “I though it (Quartus II CAD tool) was pretty slick in the way it handles all the interconnects, very easy. The only thing I got a bit confused about was when starting a mega-function (built-in), but after it was shown in class, even that was not bad at all.”
• “UP2 board was very good. Being able to assign pins, doing timing diagrams. Everything seemed to work as theoretically thought of.”
• “UP2 board helped me in learning. It was nice to see the actual physical implementation of our designs. It is more than just theory this way, and makes the use of the design tool more interesting.”
• “I really thought that Quartus II was useful for computer organization and architecture. It was easy to learn. It has many tools to streamline the building process, and very understandable output. I like to use Quartus II to build circuit!”
• “Quartus II helped me to further my understanding in computer organization and architecture by helping me visualize a working circuit while actually building it. With its available tools and customizability, my learning increased greatly.”
• “The UP2 board is a great learning addition.”
• “Also, the ability to download this software (i.e., Quartus II web edition) and use it at home with ease helped greatly.”
• “The tool is free and available for home use. Tutorials are excellent.”

CONCLUSION AND FUTURE PLAN

The proposed computer system makes it possible to utilize each logical module in the design of a sequence of simulation and implementation projects for teaching computer organization and architecture. A prototype of the system has been implemented in an FPGA device on an Altera UP2 board. Its top-level diagram clearly shows the modularization of the system. Students were given an opportunity to perform hands-on design, simulation, implementation, and testing of an individual module while developing a complete understanding of the entire system. Moreover, each module can be redesigned following different requirements or guidelines as long as it maintains the same interface with the system (e.g., our students designed and implemented a different microprogrammed control unit module that achieved better system-level performance). The student feedback relating to the labs and to the usage of Altera’s CAD tools was very positive.

In the near future, our plan is to introduce Altera tools into teaching graduate-level courses. A typical project would be the design, implementation, and test of a pipelined processor.
In the meantime, developing an assembler for the processor greatly supports the teaching. Students can write assembly language programs that can be loaded onto the processor and executed directly by the hardware.

ACKNOWLEDGMENT

The authors would like to acknowledge the initial support of Altera, in particular its University Program.

REFERENCES

[1] Jin, L. and Hatfield, B. J., Computer Organization: Principles, Analysis & Design, 1st Edition, 2004.
[2] Yurcik, W., Wolffe, G. S., Holliday, M. A., “A Survey of Simulators Used in Computer Organization/Architecture Courses”, Proceedings of the 2001 Summer Computer Simulation Conference (SCSC), July 2001.
[3] Hamblen, J. O., Furman, M. D., Holliday, M. A., Rapid Prototyping of Digital Systems, 2nd Edition, 2001.
[4] Altera University Program UP2 Education Kit User Guide, Altera Corporation, December 2004. http://www.altera.com.
[5] Patterson, D. A. and Hennessy, J. L., Computer Architecture: A Quantitative Approach, 3rd Edition, 2003.
[6] Calazans, N. L. V., Moraes, F. G., Marcon, C. A. M., “Teaching Computer Organization and Architecture with Hands-on Experience”, Proceedings of the 32nd ASEE/IEEE Frontiers in Education Conference, November 2002.