Computer Architecture
ECE 4801
Berk Sunar
Erkay Savas

Outline
- Brief overview
- How is a computer program executed?
- Computer organization
- Roadmap for this class

Things You Learn in this Course
- How computers work; the basic foundation
- How to analyze their performance (and how not to)
- Key technologies determining the performance of modern processors
  - Datapath design
  - Pipelining
  - Cache systems
  - Memory hierarchy
  - I/O
  - Multiprocessors

Instruction Set Architecture
- Important abstraction
- Interface between hardware and low-level software
- Or: the features available to programmers (the instruction set architecture, ISA)
  - e.g. does the processor have a multiply instruction?
  - Instruction encoding
  - Data representation
  - I/O mechanism
  - Addressing mechanism
- Modern instruction set architectures: 80x86/Pentium/K6, PowerPC, DEC Alpha, MIPS, SPARC, HP, ARM

Computer Organization
- Computer organization is how features are implemented in hardware
- Transparent to programmers
- Different implementations are possible for the same architecture (affects performance/price)
- Determines how memory, CPU, peripherals, and buses are interconnected and how control signals are routed
- Has a huge impact on performance
- Performance of an organization is usually application dependent (e.g. I/O intensive, computation intensive, memory bound)

How to Program a Computer?
- A simple but universal interface
- Machine code (binary images)
- Assembly language
  - Uses mnemonics that map directly to the ISA, e.g. addw, lb, jmp
  - More readable than machine language
  - Error prone but excellent for low-level optimization
- High-level languages
  - e.g. C/C++, Pascal, Fortran, Java, C#
  - Much easier to use and program
  - Promote code portability
  - Not as efficient as custom assembly

Processing a C Program

High-level language program (in C):

    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

C compiler

Assembly language program (for MIPS):

    swap:
        muli $2, $5, 4
        add  $2, $4, $2
        lw   $15, 0($2)
        lw   $16, 4($2)
        sw   $16, 0($2)
        sw   $15, 4($2)
        jr   $31

Assembler

Binary machine language program (for MIPS):

    00000000101000010000000000011000
    00000000100011100001100000100001
    10001100011000100000000000000000
    10001100111100100000000000000100
    10101100111100100000000000000000
    10101100011000100000000000000100
    00000011111000000000000000001000

Functions of a Computer
- Data processing, e.g. sorting the entries of a spreadsheet
- Data storage, e.g. personal files, applications, movies, music
- Data movement, e.g. playing a music file, displaying a picture
- Control (applies to all examples above)

Five Classic Components
- Computer
  - Processor
    - Datapath
    - Control
  - Memory
  - Input
  - Output

System Interconnection
- Bridges

Inside the Processor Chip
- Instruction cache
- Data cache
- Control
- Bus
- Branch prediction
- Integer datapath
- Floating-point datapath

An Actual View
[Figure: 22 nm Intel Core CPU. Source: Intel Corp.]
[Figure: 6-core CPU with L3 caches. Source: https://computing.llnl.gov/tutorials/parallel_comp/]

Memory
- Nonvolatile
  - ROM
  - Hard disk, floppy disk, magnetic tape, CD-ROM, USB memory
  - Flash memory
- Volatile
  - DRAM, usually used for main memory
  - SRAM, mainly used for on-chip memory such as registers and caches
  - DRAM is much cheaper than SRAM
  - SRAM is much faster than DRAM
- How about solid state drives?
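
Looking back at the "Processing a C Program" slide, the following sketch may help relate the C source to the MIPS listing. It restates swap and then rewrites it with the address arithmetic spelled out the way the assembly does it: scale k by 4, add the base address, then load and store at offsets 0 and 4. The names swap_by_address and swap_demo are hypothetical helpers introduced only for this illustration, and the code assumes 4-byte words (hence int32_t); it is not what a compiler necessarily emits.

```c
#include <assert.h>
#include <stdint.h>

/* The routine from the slide. */
void swap(int v[], int k)
{
    int temp = v[k];
    v[k] = v[k + 1];
    v[k + 1] = temp;
}

/* Hypothetical helper: the same swap with explicit byte addressing,
 * mirroring the MIPS listing. Assumes 4-byte words. */
void swap_by_address(int32_t v[], int k)
{
    unsigned char *base = (unsigned char *)v; /* $4: base address of v     */
    int offset = k * 4;                       /* muli $2, $5, 4            */
    int32_t *p = (int32_t *)(base + offset);  /* add  $2, $4, $2           */
    int32_t t15 = p[0];                       /* lw   $15, 0($2)           */
    int32_t t16 = p[1];                       /* lw   $16, 4($2)           */
    p[0] = t16;                               /* sw   $16, 0($2)           */
    p[1] = t15;                               /* sw   $15, 4($2)           */
}

/* Hypothetical self-check: both versions must produce the same array.
 * Returns the element now stored at index 1. */
int swap_demo(void)
{
    int a[] = {10, 20, 30};
    int32_t b[] = {10, 20, 30};
    swap(a, 1);
    swap_by_address(b, 1);
    for (int i = 0; i < 3; i++)
        assert(a[i] == b[i]);
    return a[1];
}
```

Calling swap_demo() from a main function exercises both versions on the same three-element array; the byte offsets 0 and 4 in the loads and stores correspond to v[k] and v[k+1].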
DRAM and Processor Characteristics
[Figure: DRAM and processor characteristics]

Solutions to Memory Problems
- Increase the number of bits retrieved at one time
  - Make DRAM "wider" rather than "deeper"
- Change the DRAM interface
  - Cache
- Reduce the frequency of memory access
  - More complex cache and cache on chip
- Increase interconnection bandwidth
  - High-speed buses
  - Hierarchy of buses

Computer Networks
- Essential aspect of computer systems
  - Communication
  - Resource sharing
  - Remote access
- Ethernet is the most popular LAN
  - Range is limited to 1 kilometer
  - 10/100 Mbit/s
- Wide Area Networks (WAN)
  - Cross continents and form the backbone of the Internet

Roadmap
- Performance issues
- Instruction set of MIPS
- Arithmetic and ALU
- Constructing a processor to execute our instructions (datapath design)
- Pipelining
- Memory hierarchy: caches and virtual memory
- I/O
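
As a rough illustration of the "wider rather than deeper" point under Solutions to Memory Problems, the sketch below counts how many memory accesses are needed to fetch a block of a given size when each access returns a given number of bits. The function name accesses_needed and any block sizes or widths passed to it are assumptions chosen for illustration, not figures from the slides, and the model ignores latency, burst transfers, and interleaving.

```c
#include <assert.h>

/* Hypothetical helper: number of accesses needed to fetch block_bytes
 * bytes when every access returns width_bits bits. A partial final
 * access still counts as one access, so the division rounds up. */
unsigned accesses_needed(unsigned block_bytes, unsigned width_bits)
{
    assert(width_bits > 0);
    unsigned block_bits = block_bytes * 8u;
    return (block_bits + width_bits - 1u) / width_bits;
}
```

Under this simple model, fetching a 64-byte block takes 64 accesses at a width of 8 bits but only 8 accesses at a width of 64 bits, which is the sense in which a wider memory retrieves more bits at one time.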