Stuff to know!
- Introductions
- Instruction Set Architecture (ISA)

Today's topics
- Architecture overview
- Machine instructions
- Instruction Execution Cycle
- CISC machines
- RISC machines
- Parallelism
  - Instruction-level
  - Processor-level
- Internal representation
- Microprograms
- Limits of representation
- External representation
- Binary, octal, decimal, hexadecimal number systems

Terms
- CPU: Central Processing Unit
- ALU: Arithmetic/Logic Unit
- Memory: storage for data and programs (separate from the CPU)
- Register: fast temporary storage inside the CPU
- Bus: parallel "wires" for transferring a set of electrical signals simultaneously
  - Internal: transfers signals among CPU components
  - Control: carries signals for memory and I/O operations
  - Address: links to specific memory locations
  - Data: carries data between the CPU and memory
- Microprogram: sequence of micro-instructions required to execute a machine instruction
- Cache: temporary storage for faster access (note: caching takes place at many levels in a computer system)

Registers
- General/Temporary: fast local memory inside the CPU (one type of cache)
- Control: dictates the current state of the machine
- Status: indicates error conditions
- IR: Instruction Register (holds the current instruction)
- IP: Instruction Pointer (holds the memory address of the next instruction) [often called the Program Counter (PC)]
- MAR: Memory Address Register (holds the address of the memory location currently referenced)
- MDR: Memory Data Register (holds data being sent to or retrieved from the memory address in the MAR)

Machine instructions
- Each computer architecture provides a set of machine-level instructions: its Instruction Set Architecture (ISA).
- The ISA is specific to one particular architecture.
- Like everything inside a computer, machine instructions are implemented electrically.
- Micro-instructions set the switches in the control register.

Hypothetical CISC* machine
- Shows hardware components.
- Does not show the digital logic level or microprograms.
- Shows how machine-level instructions can be stored and executed.
- Illustrates a finite-state machine.
*CISC: Complex Instruction Set Computer

Von Neumann architecture / Instruction execution cycle
Real computers:
- Use the "stored program" concept (von Neumann architecture): the program is stored in memory and is executed under the control of the operating system.
- Operate using an Instruction Execution Cycle.

Instruction Execution Cycle
1. Fetch the next instruction (at the address in IP) into IR.
2. Increment IP to point to the next instruction.
3. Decode the instruction in IR.
4. If the instruction requires memory access:
   A. Determine the memory address.
   B. Fetch the operand from memory into a CPU register, or send the operand from a CPU register to memory.
5. Execute the micro-program for the instruction.
6. Go to step 1.
Note: default execution is sequential.

Example CISC instruction
ADD R1, mem1    ; (add the contents of memory location mem1 to register R1)

Example ADD microprogram (each micro-instruction executes in one clock cycle)
1. Copy the contents of R1 to ALU Operand_1.
2. Move the address of mem1 to the MAR.
3. Signal a memory fetch (gets the contents of the memory address currently in the MAR into the MDR).
4. Copy the contents of the MDR into ALU Operand_2.
5. Signal ALU addition.
6. Check the Status Register.
7. Copy the contents of ALU Result to R1.
(A small code sketch of this fetch-decode-execute loop appears at the end of this section.)

Improving CISC
CISC speed (and convenience) is increased by:
- more efficient microprograms
- more powerful ISA-level instructions
- cache memory
- more registers
- wider buses
- making it (physically) smaller
- more processors
- floating-point instructions
- etc.
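To make the instruction execution cycle and the ADD microprogram above concrete, here is a minimal C sketch of the fetch-decode-execute loop for a made-up machine. The opcodes, the single register R1, and the tiny memory layout are hypothetical illustrations for this sketch only; they are not any real ISA and not the course's hypothetical CISC machine.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical machine: each instruction is two words, [opcode, operand address]. */
enum { OP_HALT = 0, OP_ADD = 1 };   /* made-up opcodes for this sketch */

int main(void) {
    uint16_t mem[16] = {
        OP_ADD, 8,           /* ADD R1, mem[8]            */
        OP_ADD, 9,           /* ADD R1, mem[9]            */
        OP_HALT, 0,
        0, 0,
        40, 2                /* data at addresses 8 and 9 */
    };
    uint16_t R1 = 0;         /* general register                      */
    uint16_t IP = 0;         /* instruction pointer (program counter) */
    uint16_t IR, MAR, MDR;   /* instruction, memory address, memory data registers */

    for (;;) {
        IR = mem[IP];                      /* 1. fetch instruction at IP into IR */
        uint16_t operand = mem[IP + 1];
        IP += 2;                           /* 2. increment IP to next instruction */
        switch (IR) {                      /* 3. decode */
        case OP_ADD:
            MAR = operand;                 /* 4A. determine memory address        */
            MDR = mem[MAR];                /* 4B. fetch operand into MDR          */
            R1 = (uint16_t)(R1 + MDR);     /* 5. "microprogram": ALU add, result to R1 */
            break;
        case OP_HALT:
            printf("R1 = %u\n", (unsigned)R1);   /* expected output: R1 = 42 */
            return 0;
        }                                  /* 6. go back to step 1 */
    }
}
```

Each pass through the loop mirrors steps 1 through 6 above; a real CPU does the same work in hardware and microcode rather than in a C switch statement.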
Clock cycles
- So how "slow" is this? It isn't slow; execution happens at near light-speed.
- Clock cycle length determines CPU speed (mostly).

Limitations of CISC
- Improving a specific architecture requires instructions to remain backward compatible.
- So … how about a different architecture?

RISC machines
- Reduced Instruction Set Computer
- Much smaller set of instructions at the ISA level.
- Instructions are like CISC micro-instructions.
- RISC assembly-level programs look much longer (more instructions) than CISC assembly-level programs, but they execute faster. Why?

RISC design principles
- Instructions are executed directly by hardware (no microprograms).
- Maximize the rate of fetching instructions:
  - instructions are easy to decode
  - instruction cache
  - fetching operands, etc.
- Only LOAD and STORE instructions reference memory.
- Plenty of registers.

More speed improvement
- Minimize memory and I/O accesses:
  - cache
  - separate I/O unit (buffers/processing)
  - separate network communication unit (NIC)
  - etc.
- Parallel processing

Parallelism (overview)
- Instruction-level parallelism: pipeline, cache
- Processor-level parallelism:
  - multiprocessor (multiple CPUs, common memory)
  - multicomputer (multiple CPUs, each with its own memory)

Pipelining
- Pipelining equations! For k execution stages, n instructions require k + (n - 1) clock cycles; without pipelining they require n * k.
- Example: with a 6-stage pipeline (S1..S6) and 2 instructions (I-1, I-2), pipelined execution finishes in 6 + (2 - 1) = 7 cycles instead of 12.
[Figure: timing diagrams of instructions I-1 and I-2 moving through stages S1-S6, comparing "no pipelining" (12 cycles) with a 6-stage pipeline (7 cycles).]

Instruction caching
- Hardware provides an area for multiple instructions in the CPU.
- Reduces the number of memory accesses.
- Instructions are available for immediate execution.
- Might cause problems with decision, repetition, and procedure structures in programs.

Multiprocessor (shared memory)
Multicomputer (distributed memory)

Comparisons
- Cache and pipelining: implemented in hardware.
- Multiprocessor: difficult to build; relatively easy to program.
- Multicomputer: easy to build (given networking technology); extremely difficult to program.

Other types of parallelism
- Hybrid systems
- Scalable architectures: add more processors (nodes) without having to re-invent the system
- Simulated parallelism
- Superscalar

Applications of parallelism
- Multi-user systems
- Networks
- Internet
- Speeding up single processes: the chess example, other AI applications

Questions?
Parallelism … more later

Internal representation (tomorrow!)
- Data
- Instructions
- Addresses

Internal representation
- Just like everything else in a computer, the representation of numbers is implemented electrically: switches set to off or on, with open (transparent) / closed (opaque) gates.
- There are two states for each gate, and the binary number system uses two digits (0 and 1).
- To simplify discussion, we use the standard shorthand to transcribe the computer representation: off is written as the digit 0, on is written as the digit 1.

External representation
- Use the binary number system to represent numeric values electrically.
- Switches (gates) are grouped into bytes, words, etc., to represent the digits of a binary number.
- Note: the number of gates in a group depends on the computer architecture and the type of data represented. E.g., for Intel-based architectures: byte = 8 bits, word = 2 bytes (16 bits); integers use 2, 4, 8, or 10 bytes.

Binary number system
- Has 2 digits: 0 and 1 (binary digit = "bit").
- Has places and place values determined by powers of 2.
- (In theory) can uniquely represent any integer value.
- A binary representation is just another way of writing a number that we are accustomed to seeing in decimal form.
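As a small illustration that a binary representation is just another way of writing a familiar number, here is a minimal C sketch that prints the 16-bit transcription of an unsigned value. The helper name print_bits16 is an illustrative choice, and the value 178 is used only because it matches the conversion example worked out in the next section.

```c
#include <stdint.h>
#include <stdio.h>

/* Print the 16-bit binary transcription of an unsigned value,
   most significant bit (bit 15) first. */
static void print_bits16(uint16_t x) {
    for (int bit = 15; bit >= 0; bit--)
        putchar(((x >> bit) & 1) ? '1' : '0');
    putchar('\n');
}

int main(void) {
    print_bits16(178);   /* prints 0000000010110010 */
    return 0;
}
```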
(In practice, inside the computer) the representation is finite: representations with too many digits get chopped.

Internal representation
- Place values (right-to-left) are 2^0, 2^1, 2^2, 2^3, 2^4, etc.
- Bits are numbered (right-to-left) starting at 0.
- The place value depends on the number of "bits" defined for the type.
- Example: a 16-bit integer, with bits numbered 15 down to 0, might have its pattern of "on" and "off" gates transcribed by a human as 0000000010110010.
- To convert to its familiar decimal representation, just add up the place values of the places that are "on".

Converting binary to decimal

  Place:        2^15  2^14  2^13  2^12  2^11  2^10  2^9  2^8  2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
  Place value: 32768 16384  8192  4096  2048  1024  512  256  128   64   32   16    8    4    2    1
  Bit:             0     0     0     0     0     0    0    0    1    0    1    1    0    0    1    0

  In decimal form: 128 + 32 + 16 + 2 = 178

- How many different codes (integers) can be represented using 16 bits?
- What is the largest (unsigned) integer that can be represented using 16 bits?
- What is the largest (unsigned) integer that can be represented using 32 bits?
- Prove that for an n-bit representation, the number of codes is 2^n, the largest unsigned integer is 2^n - 1, and the largest signed integer is 2^(n-1) - 1.

Converting decimal to binary
- Method 1: removing the largest powers of 2
- Method 2: successive division by 2

Converting decimal to binary: example 157

Method 1: removing the largest powers of 2
  157 - 128 = 29
   29 -  16 = 13
   13 -   8 =  5
    5 -   4 =  1
    1 -   1 =  0
  Result: 10011101

Method 2: successive division by 2
  157 / 2 = 78 R 1
   78 / 2 = 39 R 0
   39 / 2 = 19 R 1
   19 / 2 =  9 R 1
    9 / 2 =  4 R 1
    4 / 2 =  2 R 0
    2 / 2 =  1 R 0
    1 / 2 =  0 R 1
  Reading the remainders from bottom to top: 10011101

Numeric representation
- We will show (later) exactly how an electrical operation can be performed on two electrical numeric representations to give an electrical result that is consistent with the rules of arithmetic.
- … but not quite consistent … Since the number of gates in each group (byte, word, etc.) is finite, computers can represent numbers with finite precision only.
- Example: suppose that unsigned integer data is represented using 16 gates. Then the largest integer that can be represented is 65535. What happens if we add 1?
- If necessary, representations are truncated; overflow/underflow can occur, and the Status Register will be set.

Representing negative integers
- Must specify the size! Specify n, the number of bits (8, 16, 32, etc.).
- There are 2^n possible "codes".
- Separate the "codes" so that half of them represent negative numbers. (Note that exactly half of the codes have 1 in the "leftmost" bit.)

Binary form of negative numbers
- There are several methods, each with disadvantages. We will focus on twos-complement form.
- For a negative number x: specify the number of bits; start with the binary representation of |x|; change every bit to its opposite; then add 1 to the result.

Example: -13 in 16-bit twos-complement
- |-13| = 13 = 0000000000001101
- the ones-complement is 1111111111110010
- add 1 to get 1111111111110011 = -13
- Note that -(-13) should give 13. Try it.

Hexadecimal representation?
- Convert binary to hex in the usual way: -13 = 1111111111110011 = FFF3h = 0xfff3
- Note: for a byte, word, etc., if the first hex digit is greater than or equal to 8, the value is negative.

Convert negative binary to decimal?
- Find the twos complement, convert, and prepend a minus sign.

Signed numbers using 4-bit twos-complement form

  -8  1000        0  0000
  -7  1001        1  0001
  -6  1010        2  0010
  -5  1011        3  0011
  -4  1100        4  0100
  -3  1101        5  0101
  -2  1110        6  0110
  -1  1111        7  0111

Notice that all of the negative numbers have 1 in the leftmost bit, and all of the non-negative numbers have 0 in the leftmost bit. For this reason, the leftmost bit is called the sign bit.
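The twos-complement recipe above (flip every bit, then add 1) is easy to check in code. Here is a minimal C sketch that reproduces the -13 example and verifies that negating twice returns 13. The helper names are illustrative; the cast to int16_t simply asks for the same 16-bit code to be interpreted as a signed value.

```c
#include <stdint.h>
#include <stdio.h>

/* Two's complement of a 16-bit pattern: flip every bit, then add 1. */
static uint16_t twos_complement(uint16_t x) {
    return (uint16_t)(~x + 1);
}

/* Print the 16 bits of a pattern, bit 15 first. */
static void print_bits16(uint16_t x) {
    for (int bit = 15; bit >= 0; bit--)
        putchar(((x >> bit) & 1) ? '1' : '0');
    putchar('\n');
}

int main(void) {
    uint16_t pos13 = 13;
    uint16_t neg13 = twos_complement(pos13);

    print_bits16(neg13);                        /* 1111111111110011         */
    printf("hex: 0x%04X\n", (unsigned)neg13);   /* 0xFFF3                   */
    printf("as signed: %d\n", (int16_t)neg13);  /* -13                      */
    print_bits16(twos_complement(neg13));       /* back to 0000000000001101 */
    return 0;
}
```

The same double-negation check applied to the "weird" 8-bit code 10000000 would not get back to a positive number, which is the point of the next slide.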
Note: nobody actually uses 4-bit representations (a "nibble"), but there is not enough room to show the 8-bit representations here. You can extend this table to 8-bit, 16-bit, etc.

n-bit twos-complement form
- The 2^n possible codes give:
  - zero
  - 2^(n-1) - 1 positive numbers
  - 2^(n-1) negative numbers
- Note: zero is its own complement.
- Note: there is one "weird" number:
  - 01111111 + 1 = 10000000, i.e. 127 + 1 = -128 (inconsistent with the rules of arithmetic), since 127 is the largest signed number that can be represented in 8 bits.
  - This means that -(-128) cannot be represented with 8 bits; i.e., the 2's-complement of 10000000 is 10000000.

Signed or unsigned?
- A 16-bit representation could be used for signed or unsigned numbers.
- The 16-bit signed range is -32768 .. +32767.
- The 16-bit unsigned range is 0 .. 65535.
- Both forms use the same 2^16 codes.
- Example: 1010101010101010 unsigned is 43690 decimal; 1010101010101010 signed is -21846.
- The programmer must tell the computer which form is being used.

Other representations
- Every integer number has a unique representation in each base (2 or greater).
- Hexadecimal is commonly used for easily converting binary to a more manageable form.
- Example, 16-bit binary to hexadecimal:

    Binary:       0001 0111 1011 1101
    Hexadecimal:     1    7    B    D

  Write it as 0x17BD or 17BDh.

Questions?

Read Irvine Chapter 17.1
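To close out, a minimal C sketch that checks the signed-versus-unsigned example (1010101010101010 = 0xAAAA) and the hexadecimal shorthand 0x17BD from above. The casts only choose which interpretation of the same 16-bit code is printed; note that converting 0xAAAA to int16_t is technically implementation-defined in C, though on ordinary twos-complement machines it yields -21846.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint16_t pattern = 0xAAAA;                      /* 1010101010101010 */

    /* Same 16-bit code, two interpretations: */
    printf("unsigned: %u\n", (unsigned)pattern);    /* 43690                    */
    printf("signed:   %d\n", (int16_t)pattern);     /* -21846 on typical hardware */

    /* Hexadecimal is a compact transcription of the same bits:
       0001 0111 1011 1101  ->  0x17BD */
    uint16_t x = 0x17BD;
    printf("0x%04X = %u decimal\n", (unsigned)x, (unsigned)x);   /* 0x17BD = 6077 */
    return 0;
}
```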