ECE 5367 4436 Introduction to Computer Architecture and Design
Ji Chen
Section: T TH 1:00PM – 2:30PM
Prerequisites: ECE 4436

Instructor: Ji Chen
Email: jchen18@uh.edu
Tel: (713)-743-4423
Office: W328
Office Hour: T TH 2:30-3:30 or by appointment
TA: None

Course Contents
1. Introduction, basic computer organization
2. Instruction formats, instruction sets and their design
3. ALU design: Adders, subtracters, logic operations
4. Multiplication, division, floating point arithmetic
5. Datapath design
6. Control design: Hardwired control, microprogrammed control
7. Pipelining
8. Memory systems
9. I/O

Web: http://www.egr.uh.edu/courses/ece/ECE5367/

Grading
HW/Quiz/Lab   10 %
Project       15 %
Exam 1        25 %
Exam 2        25 %
Exam 3        25 %

Academic Honesty Statement

Computer Organization and Design: The Hardware/Software Interface by David A. Patterson, John L. Hennessy, 3rd edition
Required NOT REQUIRED

Homework/quiz: There will be several graded homework/lab assignments. Homework turned in late will be accepted only under extraordinary circumstances.
Labs: Laboratory assignments may be worked in teams of two (2); however, there should be no collaboration between teams. Lab assignments turned in late will be penalized 25 points for each calendar day. Both students in a team will receive the same grade for the project.
Projects: Teams of four (4): describe computer architecture of a modern technology.
Exams: Two mid-term exams and one final exam.
A missed exam will result in a grade of zero. Let me know immediately if you have any situation.
Final Exam - TBD
Grading: Your final grade will be computed as follows:
HW/Quiz/Lab   10 %
Project       15 %
Exam 1        25 %
Exam 2        25 %
Exam 3        25 %

• Since 1946 all computers have had 5 components
  – Processor: Control, Datapath
  – Memory
  – Input
  – Output

• TI SuperSPARC™ TMS390Z50 in Sun SPARCstation20
[Figure: block diagram. Labels: MBus Module; SuperSPARC; Floating-point Unit; Integer Unit; Inst Cache; Ref MMU; Data Cache; Store Buffer; Bus Interface; L2 $; CC; Message Bus (MBus); L64852 MBus control; M-S Adapter; DRAM Controller; SBus; SBus DMA; SBus Cards; SCSI; Ethernet; STDIO (serial, kbd, mouse, audio, RTC, Floppy)]

Computer Architecture
[Figure: levels of abstraction. Labels: Application; Operating System; Compiler; Firmware; Instr. Set Proc.; I/O system; Instruction Set Architecture; Datapath & Control; Digital Design; Circuit Design; Layout]
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation

Forces on Computer Architecture
[Figure: Computer Architecture at the center, surrounded by Technology, Programming Languages, Applications, Operating Systems, Cleverness, History]

Mixed-Signal

Where are We Going??
[Figure: course roadmap. Arithmetic: Booth-encoded multiplier datapath (32-bit Multiplicand register, 34-bit ALU with Sub/Add, HI and LO registers of 16x2 bits each, Booth Encoder, Control Logic). Single/multicycle Datapaths. Pipelining: overlapped IFetch, Dcd, Exec, Mem, WB stages. Memory Systems and I/O: plot of Performance versus Time, 1980 to 2000, labeled "Moore's Law"; µProc (CPU) 60%/yr. (2X/1.5yr); DRAM 9%/yr. (2X/10 yrs); Processor-Memory Performance Gap: (grows 50% / year)]
ECE 5367 Spring 08

• Purchasing perspective
  – Given a collection of machines, which has the
    • Best performance?
    • Least cost?
    • Best performance / cost?
• Design perspective
  – Faced with design options, which has the
    • Best performance improvement?
    • Least cost?
    • Best performance / cost?
• Both require
  – basis for comparison
  – metric for evaluation
• Our goal: understand cost & performance implications of architectural choices

Two Notions of “Performance”
Plane        DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747   6.5 hours     610 mph    470          286,700
Concorde     3 hours       1350 mph   132          178,200

Which has higher performance?
• Time to do the task (Execution Time)
  – execution time, response time, latency
• Tasks per day, hour, week, sec, ns, ... (Performance)
  – throughput, bandwidth
Response time and throughput are often in opposition

Definitions
• Performance is in units of things-per-second
  – bigger is better
• If we are primarily concerned with response time
  – $\text{performance}(x) = \dfrac{1}{\text{execution\_time}(x)}$
• "X is n times faster than Y" means
  $$n = \frac{\text{Performance}(X)}{\text{Performance}(Y)}$$

Example
• Time of Concorde vs. Boeing 747?
  • Concorde is 1350 mph / 610 mph = 2.2 times faster = 6.5 hours / 3 hours
• Throughput of Concorde vs. Boeing 747?
  • Concorde is 178,200 pmph / 286,700 pmph = 0.62 “times faster”
  • Boeing is 286,700 pmph / 178,200 pmph = 1.60 “times faster”
• Boeing is 1.6 times (“60%”) faster in terms of throughput
• Concorde is 2.2 times (“120%”) faster in terms of flying time
We will focus primarily on execution time for a single job
Lots of instructions in a program => Instruction throughput important!
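
The Concorde vs. Boeing 747 comparison can be checked with a few lines of Python. This is only a sketch: the numbers come from the table above, while the function names (times_faster, latency_performance, throughput_pmph) are made up for illustration and are not part of the course material.

```python
# Figures from the slide table: trip time (hours), speed (mph), passengers.
planes = {
    "Boeing 747": {"hours": 6.5, "mph": 610, "passengers": 470},
    "Concorde": {"hours": 3.0, "mph": 1350, "passengers": 132},
}


def times_faster(perf_x, perf_y):
    """n such that "X is n times faster than Y": Performance(X) / Performance(Y)."""
    return perf_x / perf_y


def latency_performance(plane):
    """Response-time view: performance = 1 / execution_time."""
    return 1.0 / plane["hours"]


def throughput_pmph(plane):
    """Throughput view: passengers times miles per hour."""
    return plane["passengers"] * plane["mph"]


concorde = planes["Concorde"]
boeing = planes["Boeing 747"]

# Concorde wins on flying time, Boeing wins on throughput.
speedup_time = times_faster(latency_performance(concorde), latency_performance(boeing))
speedup_throughput = times_faster(throughput_pmph(boeing), throughput_pmph(concorde))

print(f"Concorde vs. 747, flying time: {speedup_time:.1f} times faster")
print(f"747 vs. Concorde, throughput:  {speedup_throughput:.1f} times faster")
```

The same ratio function serves both notions of performance; only the metric passed in changes, which is why the two planes can each come out "faster".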
CPU Performance
$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

Amdahl's Law
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then
$$\text{ExTime}(\text{with } E) = \left((1 - F) + \frac{F}{S}\right) \times \text{ExTime}(\text{without } E)$$
$$\text{Speedup}(\text{with } E) = \frac{1}{(1 - F) + \dfrac{F}{S}}$$

Base Machine (Typical Mix)
Op       Freq   Cycles   CPI(i)   % Time
ALU      50%    1        .5       23%
Load     20%    5        1.0      45%
Store    10%    3        .3       14%
Branch   20%    2        .4       18%
Total                    2.2

• How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
• How does this compare with using branch prediction to save a cycle off the branch time?
• What if two ALU instructions could be executed at once?
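
One way to work the three questions above is to recompute the CPI for each variant of the base machine. The Python below is a sketch under stated assumptions, not an official solution: instruction count and clock cycle time are assumed unchanged (so speedup is the ratio of CPIs), and "two ALU instructions at once" is modeled as an effective 0.5 cycles per ALU instruction, assuming every ALU instruction pairs perfectly. The helper names (cpi, with_cycles, amdahl_speedup) are hypothetical.

```python
# Base machine from the slide: op -> (frequency, cycles per instruction).
base_mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}


def cpi(mix):
    """Average CPI: sum over ops of frequency * cycles."""
    return sum(freq * cycles for freq, cycles in mix.values())


def with_cycles(mix, op, cycles):
    """Copy of the mix with one op's cycle count replaced."""
    changed = dict(mix)
    changed[op] = (mix[op][0], cycles)
    return changed


def amdahl_speedup(fraction, factor):
    """Speedup(with E) = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


base_cpi = cpi(base_mix)

# Assumption: same instruction count and clock, so speedup = old CPI / new CPI.
better_cache = base_cpi / cpi(with_cycles(base_mix, "Load", 2))
branch_pred = base_cpi / cpi(with_cycles(base_mix, "Branch", 1))
# Assumption: perfect pairing, i.e. an effective 0.5 cycles per ALU instruction.
dual_alu = base_cpi / cpi(with_cycles(base_mix, "ALU", 0.5))

# Cross-check with Amdahl's Law: loads take F = 1.0 / 2.2 of the time
# and are sped up by S = 5 / 2.
load_fraction = (0.20 * 5) / base_cpi
amdahl_cache = amdahl_speedup(load_fraction, 5 / 2)

print(f"base CPI:          {base_cpi:.2f}")
print(f"better data cache: {better_cache:.3f}")
print(f"branch prediction: {branch_pred:.3f}")
print(f"dual ALU issue:    {dual_alu:.3f}")
print(f"Amdahl cross-check for the cache: {amdahl_cache:.3f}")
```

Under these assumptions the data cache gives a speedup of 2.2 / 1.6 = 1.375, branch prediction 2.2 / 2.0 = 1.1, and dual ALU issue 2.2 / 1.95, roughly 1.13. The Amdahl form agrees with the CPI ratio for the cache case because loads account for 1.0 / 2.2 of the execution time.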