Manish Kulkarni
Department of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849
mmk0002@auburn.edu
4/28/2008, Computer Architecture & Design (6200) Class Presentation

Overview
o What is CISC and why learn about it?
o History
o Architecture
o Typical x86 design
o Characteristics & Addressing modes
o CISC Vs RISC
o Example Programs
o The Performance Equation
o FAQs
o Recent Developments & Future Scope
o Resources
o Questions

What is CISC?
Definition: Pronounced "sisk" and standing for Complex Instruction Set Computer, CISC is a microprocessor architecture that aims to accomplish complex operations with single instructions. It favors the richness of the instruction set (typically as many as 200 unique instructions) over the speed with which individual instructions are executed.

Why should I know about CISC?
o Today's computers still use processors based on CISC designs.
o It has been a prominent architecture since 1978.
o Most emerging processor designs combine features of CISC and RISC to create better designs.

History

| Generation | First introduced | Prominent consumer CPU brands | Linear / physical address space | Notable (new) features |
|---|---|---|---|---|
| 1 (IA-16) | 1978 | Intel 8086, Intel 8088 | 16-bit / 20-bit (segmented) | first x86 microprocessors |
| 2 | 1982 | Intel 80186, Intel 80188, NEC V20 | see above | hardware for fast address calculations, fast mul/div etc. |
| 2 | 1982 | Intel 80286 | 16-bit (30-bit virtual) / 24-bit (segmented) | MMU, for protected mode and a larger address space |
| 3 (IA-32) | 1985 | Intel386, AMD Am386 | 32-bit (46-bit virtual) / 32-bit | 32-bit instruction set, MMU with paging |
| 4 | 1989 | Intel486 | see above | RISC-like pipelining, integrated FPU, on-chip cache |
| 5 | 1993 | Pentium, Pentium MMX | see above | superscalar, 64-bit databus, faster FPU, MMX |
| 5/6 | 1996 | Cyrix 6x86, Cyrix MII | see above | register renaming, speculative execution |
| 6 | 1995 | Pentium Pro, AMD K5 | see above / 36-bit physical (PAE) | μ-op translation, PAE (not K5), integrated L2 cache (not K5) |
| 6 | 1997 | AMD K6/-2/3, Pentium II/III | see above | L3-cache support, 3DNow!, SSE |
| 7 | 1999 | Athlon, Athlon XP | see above | superscalar FPU, wide design (up to three x86 instr./clock) |
| 7 | 2000 | Pentium 4 | see above | deeply pipelined, high frequency, SSE2, hyperthreading |
| 6/7-M | 2003 | Pentium M | see above | optimized for low power |
| 8 (x86-64) | 2003 | Athlon 64 | 64-bit / 40-bit physical in first impl. | x86-64 instruction set, on-die memory controller |
| 8 | 2004 | Prescott | see above | very deeply pipelined, very high frequency, SSE3 |
| 9 | 2006 | Intel Core, Intel Core 2 | see above (some are 32-bit only) | low power, multi-core, lower clock frequency |
| 10 | 2007-2008 | AMD Phenom | see above | monolithic quad-core, 128-bit FPUs, SSE4a, HyperTransport 3, native memory controller, on-die L3 cache |

Architecture
[Figure: A typical x86 architecture]
[Figure: Intel 8086 architecture, the first member of the x86 family]

Characteristics
o CISC designs are mostly von Neumann architectures (there are a few exceptions)
o The same bus serves program memory, data memory, I/O, registers, etc.
o Generally microcoded, with variable-length instructions
o Segmentation is possible with segment registers such as DS and ES plus an offset, which can be common to all segments
o Many powerful instructions are supported, making the assembly language programmer's job much easier
o Physical memory extension is possible

Addressing modes
o Register Addressing Mode
o Memory Addressing Modes
o Displacement Only Addressing Mode
o Register Indirect Addressing Modes
o Indexed Addressing Modes
o Based Indexed Addressing Modes
o Based Indexed Plus Displacement Addressing

CISC Vs RISC Example Program
[Figure: main memory, general-purpose registers, and ALU]

Consider the following multiplication task:
Operands: M[2:3] = operand 1 (15), M[5:2] = operand 2 (20)
Task: Multiplication
Result: M[2:3] <= result

The CISC Approach
Instruction: MULT 2:3, 5:2
Operations:
1. Loads the two operands into separate registers
2. Multiplies the operands in the execution unit
3. Stores the product in a temporary register
4. Stores the value back to memory location 2:3

• MULT is what is known as a "complex instruction."
• It operates directly on the computer's memory banks.
• It does not require the programmer to explicitly call any loading or storing functions.
• It closely resembles a command in a higher-level language, e.g. the C statement "a = a * b."

The RISC Approach
Instructions:
  LW A, 2:3
  LW B, 5:2
  MULT A, B
  SW 2:3, A
Operations:
1. Load operand 1 into register A
2. Load operand 2 into register B
3. Multiply the operands in the execution unit and store the result in A
4. Store the value of A back to memory location 2:3

• The instructions in this set are known as "reduced instructions."
• They cannot operate directly on the computer's memory banks.
• The programmer must explicitly call the loading and storing functions.
• RISC processors only use simple instructions that can be executed within one clock cycle.

| CISC | RISC |
|---|---|
| Primary goal is to complete a task in as few lines of assembly as possible | Primary goal is to speed up individual instructions |
| Emphasis on hardware | Emphasis on software |
| Includes multi-clock complex instructions | Single-clock, reduced instructions only |
| Memory-to-memory: "LOAD" and "STORE" incorporated in instructions | Register-to-register: "LOAD" and "STORE" are independent instructions |
| Small code sizes | Large code sizes |
| High cycles per second | Low cycles per second |
| Variable-length instructions | Equal-length instructions, which make pipelining easier |

The Performance Equation
The following equation is commonly used for expressing a computer's performance ability:

$$\frac{\text{time}}{\text{program}} = \frac{\text{time}}{\text{cycle}} \times \underbrace{\frac{\text{cycles}}{\text{instruction}}}_{(1)} \times \underbrace{\frac{\text{instructions}}{\text{program}}}_{(2)}$$

The CISC approach
• minimizes the number of instructions per program (2)
• at the cost of more cycles per instruction (1)
RISC does the opposite
• reduces the cycles per instruction (1)
• at the cost of more instructions per program (2)

FAQs
Which one is faster?
It is commonly accepted that RISC ISAs should make computers faster, mainly because pipelining lets a RISC processor complete more instructions in a given amount of time.

So why isn't my computer a RISC?
• CISC ISAs were implemented in the first personal computers.
• With more people buying computers, CISC ISAs became more prominent.
• Software (especially operating systems) was developed and "translated" so that personal computers speaking x86 would be able to interact with their users.
• Because so much software was written for computers "speaking" x86, people continued to buy those computers.
• If we tried to switch to another ISA, we would not have all of the software choices we have now.
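
To make the performance equation and the "Which one is faster?" question concrete, here is a minimal sketch that applies the equation to the multiplication example from the CISC and RISC approach slides. The cycle counts and the 1 ns cycle time are hypothetical values chosen only to show how factors (1) and (2) trade off; they are not measurements of any real processor, and the `Program` and `execution_time_ns` names are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Program:
    name: str
    instructions: list      # (mnemonic, cycles) pairs
    cycle_time_ns: float    # time per cycle, in nanoseconds


def execution_time_ns(program: Program) -> float:
    """time/program = time/cycle * cycles/instruction * instructions/program."""
    instruction_count = len(program.instructions)                     # factor (2)
    total_cycles = sum(cycles for _, cycles in program.instructions)
    cycles_per_instruction = total_cycles / instruction_count         # factor (1)
    return program.cycle_time_ns * cycles_per_instruction * instruction_count


# Assumption: the single complex MULT takes 6 cycles (illustrative only).
cisc = Program("CISC", [("MULT 2:3, 5:2", 6)], cycle_time_ns=1.0)

# Assumption: every reduced instruction takes exactly 1 cycle (illustrative only).
risc = Program(
    "RISC",
    [("LW A, 2:3", 1), ("LW B, 5:2", 1), ("MULT A, B", 1), ("SW 2:3, A", 1)],
    cycle_time_ns=1.0,
)

for program in (cisc, risc):
    print(program.name, len(program.instructions), "instruction(s),",
          execution_time_ns(program), "ns")
```

With these assumed numbers the CISC program has fewer instructions but more cycles per instruction, and the RISC program has the reverse. Changing the assumed cycle counts or cycle time changes which one finishes first, which is why the equation, not the instruction count alone, decides the comparison.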

So why would someone want to develop another ISA?
• x86 (and CISC in general) makes poor use of the faster hardware we have now.
• People have been trying to make x86 faster for at least 20 years, and by now most of the ways to speed it up significantly have already been found.

Why don't we just switch to RISC?
• Although they are not used in your desktop PC, RISC ISAs are implemented in many servers, workstations, and embedded systems.
• Designers have been trying to make RISC faster for a long time as well, and they have already found many of the ways to speed it up significantly.

Where are we running into problems speeding up RISC and CISC?
We are running into problems in two areas:
1. Branching: decisions and predictions consume a good amount of processing time
2. Memory access to fetch instructions and data

So what are we going to do?

Recent Developments & Future Scope
o The terms RISC and CISC have become less meaningful with the continued evolution of both CISC and RISC designs and implementations.
o Modern x86 processors decode and split more complex instructions into a series of smaller internal "micro-operations," which can then be executed in a pipelined (parallel) fashion, achieving high performance on a much larger subset of instructions.
o Attempts have been made to combine features of both RISC and CISC to develop a new approach.
o Intel has teamed up with Hewlett-Packard to design a new type of ISA. They are calling it IA-64 (Intel Architecture 64).

IA-64
What is IA-64?
• IA-64 is a new instruction set architecture.
• IA-64 seeks to address branch delays and memory latency.
What main principles is IA-64 designed around?
• IA-64 seeks to exploit instruction-level parallelism to the highest degree.
• Intel and HP call their method of exploiting this parallelism in IA-64 EPIC (Explicitly Parallel Instruction Computing).
• EPIC exposes parallelism by having the compiler find which instructions can be executed in parallel and "explicitly" package them for the CPU.

How does IA-64 help with branch delays?
• IA-64 uses predication to reduce the consequences of branch delays.
• The compiler can append a predicate to any instruction it chooses. It appends predicates to instructions that depend on the outcome of a branch in order to reduce branch penalties.

How does IA-64 deal with memory latency issues?
• Memory latency occurs because CPU processing speed is significantly faster than the speed of fetching data from memory.
• IA-64 introduces speculative loading as a new way to eliminate some memory latency problems.

IA-64 Realities:
• "A study in ISCA '95 by S. Mahlke et al. demonstrated that predication removed over 50% of the branches and 40% of the mis-predicted branches from several popular benchmark programs." ( http://www.hp.com/esy/technology/ia_64/products/isapress.html )
• IA-64 is not natively compatible with the Intel x86 and HP PA-RISC architectures, so the additional compatibility logic will take a lot of die space.
• Presently, the compilers are in an experimental phase and IA-64 has no OS support.
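
The predication idea from the branch-delay slide can be sketched as a toy model. This is an assumption-laden illustration in Python, not IA-64 assembly or the actual hardware mechanism: the predicate names `p1` and `p2`, the register names, and the `run_predicated` helper are all hypothetical. It shows only the core idea that both arms of a branch are issued as straight-line code, each instruction carries a predicate, and only instructions whose predicate is true commit their result.

```python
def run_predicated(instructions, registers, predicates):
    """Issue every instruction in order; commit a result only if its predicate is true.

    The Python `if` below stands in for the hardware nullifying an instruction
    whose predicate is false. The modeled program itself contains no branch.
    """
    for predicate, destination, operation in instructions:
        if predicates[predicate]:
            registers[destination] = operation(registers)
    return registers


def abs_diff(a, b):
    """Compute |a - b| using two predicated instructions instead of an if/else branch."""
    registers = {"a": a, "b": b, "r": 0}
    # A single compare sets a pair of complementary predicates.
    predicates = {"p1": a > b, "p2": not (a > b)}
    program = [
        ("p1", "r", lambda regs: regs["a"] - regs["b"]),  # former "then" arm
        ("p2", "r", lambda regs: regs["b"] - regs["a"]),  # former "else" arm
    ]
    return run_predicated(program, registers, predicates)["r"]


print(abs_diff(7, 3))
print(abs_diff(3, 7))
```

Because neither arm is reached by a jump, there is no branch to mispredict. The cost is that both arms occupy issue slots even though only one commits.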

Resources
o http://www.pctechguide.com/glossary/WordFind.php?wordInput=CISC
o http://www.cs.umd.edu/class/fall2001/cmsc411/projects/IA64/
o http://cse.stanford.edu/class/sophomore-college/projects00/risc/risccisc/index.html
o http://en.wikipedia.org/wiki/Complex_instruction_set_computer
o http://en.wikipedia.org/wiki/X86
o http://arstechnica.com/cpu/4q99/risc-cisc/rvc-6.html

Questions?