N15 Microprocessors

von Neumann (Princeton) architecture
    instructions and data reside in same memory space (same address and data busses)
    cannot fetch a new instruction and read/write data at the same time (bottleneck)

Harvard architecture
    separate memory for instructions and data
    data stored in RAM, program often stored in EEPROM with external serial load

Microprocessor history
    Apollo guidance computer = 12,300 transistors (4,100 chips, one 3-input NOR gate per chip)

    Chip                 Instructions  Data  Addr  Transistors  Year  Notes
    Intel 4004           46            4b    12b   2,300        1971
    Intel 8008           48            8b    14b   3,500        1972
    Intel 8080           256           8b    16b   4,500        1974  several support chips with different voltages
    Intel 8085           256           8b    16b   6,500        1977  single +5V supply
    Intel 8086                         16b   20b   29,000       1978
    Intel 8087                                                  1980  first floating point coprocessor
    Intel 8088                         8b    20b   29,000       1979  IBM-PC
    Intel 80186                        16b   20b   55,000       1982
    Intel 80286                        16b   24b   134,000      1983
    Intel 80386                        32b   32b   275,000      1985
    Intel 80486                        32b   32b   1,180,235    1989
    Intel Pentium                      64b   32b   3,100,000
    Intel Pentium Pro                  64b   32b   5,500,000
    Intel Pentium II                   64b   32b   7,500,000
    Intel Pentium III                              9,500,000
    Intel Pentium IV                               42,000,000
    MOS Technology 6502  256           8b    16b   3,510        1975  Apple II
    Zilog Z80            256           8b    16b   8,500        1976  8085 clone
    Motorola 6800        72            8b    16b   4,100        1974
    Motorola 68000                     16b   24b   68,000       1979  Macintosh, UNIX
    Motorola 68020                     32b   32b   200,000      1984  UNIX

Main functions of Central Processing Unit (CPU)
    Arithmetic logic unit (ALU)
    Registers for temporary storage
        Typically size of external data bus
            Accumulator or Working (input and output of ALU)
            Temporary (second input to ALU)
            Flag status register (Carry, Zero, Negative, Interrupt Disable, etc.)
            Instruction buffer
            Control buffer
            Data buffer
            General purpose
        Typically size of external address bus
            Address buffer
            Program counter (PC)
            Stack pointer (SP)
    Instruction decode lookup table (LUT)
    Bus drivers

[Figure: 6502 CPU]

Classic one byte instruction (8b internal data, 8b external data, 16b external address)
    move contents of PCL to address buffer low
    move contents of PCH to address buffer high
    set MEMR
    fetch instruction byte from memory into data buffer
    move contents of data buffer into instruction buffer
    decode instruction to set internal control lines
    increment PC by one
    typical functions
        a) arithmetic operations - result in W
            single operand on W - zero, NOT, increment/decrement, two's complement, set/clear bit, rotate/shift
            two operand on W and TEMP - add, AND, OR, XOR
        b) internal move from register source to register destination

Classic two byte instruction (8b internal data, 8b external data, 16b external address)
    same as one byte instruction
    decode instruction to set internal control lines
    increment PC by one
    fetch a second byte into data buffer (usually contains a constant value)
    move contents of data buffer into W (or another register)
    increment PC again
    typical functions
        load constant into register destination

Classic three byte instruction (8b internal data, 8b external data, 16b external address)
    same as one byte instruction
    decode instruction to set internal control lines
    increment PC by one
    fetch a second byte into data buffer (usually contains low byte of an address)
    move contents of data buffer into W (or another register)
    increment PC again
    fetch a third byte into data buffer (usually contains high byte of an address)
    move contents of data buffer into TEMP (or another register)
    increment PC again
    move W into address buffer low
    move TEMP into address buffer high
    if read from memory
        set MEMR
        fetch byte from memory location into data buffer
        move data buffer into W
    if write to memory
        move contents of a register into data buffer
        set MEMW
        write data buffer into memory location
    typical functions
        a) read variable from memory
        b) write variable to memory
        c) conditional jump - JZ, JNZ, JC, JNC, JPOS, JNEG
            move W and TEMP into PCL and PCH instead of address buffer if test is true
        d) unconditional jump - JMP
        e) jump to subroutine - JSR (see below)

Arduino program compiled into AVR instructions

    int a, b, c; // 16b signed
    a = 4;
    b = 6;
    c = a + b;

first pass of compiler assigns memory locations for data

    hex location  data
    0x0100        lsB(a)
    0x0101        msB(a)
    0x0102        lsB(b)
    0x0103        msB(b)
    0x0104        lsB(c)
    0x0105        msB(c)

second pass of compiler creates instructions (hex opcode)

    hex opcode   command  args         comment
                                       a = 4;
    84 e0        ldi      r24, 0x04    Load Immediate - load lsB(a) into register 24 (r24)
    90 e0        ldi      r25, 0x00    Load Immediate - load msB(a) into register 25 (r25)
    90 93 01 01  sts      0x0101, r25  Store Direct to Data Space - store msB(a) to memory
    80 93 00 01  sts      0x0100, r24  Store Direct to Data Space - store lsB(a) to memory
                                       b = 6;
    86 e0        ldi      r24, 0x06    Load Immediate - load lsB(b) into r24
    90 e0        ldi      r25, 0x00    Load Immediate - load msB(b) into r25
    90 93 03 01  sts      0x0103, r25  Store Direct to Data Space - store msB(b) to mem
    80 93 02 01  sts      0x0102, r24  Store Direct to Data Space - store lsB(b) to mem
                                       c = a + b;
    80 91 02 01  lds      r24, 0x0102  Load Direct from Data Space - load lsB(b) from mem into r24
    90 91 03 01  lds      r25, 0x0103  Load Direct from Data Space - load msB(b) from mem into r25
    20 91 00 01  lds      r18, 0x0100  Load Direct from Data Space - load lsB(a) from mem into r18
    30 91 01 01  lds      r19, 0x0101  Load Direct from Data Space - load msB(a) from mem into r19
    82 0f        add      r24, r18     Add without Carry - add lsB(b) plus lsB(a), result in r24
    93 1f        adc      r25, r19     Add with Carry - add msB(b) plus msB(a), result in r25
    90 93 05 01  sts      0x0105, r25  Store Direct to Data Space - store msB(c) to mem
    80 93 04 01  sts      0x0104, r24  Store Direct to Data Space - store lsB(c) to mem

11 bytes of text (excluding spaces and semicolons), 6 bytes of data, 52 bytes of opcode instructions

Historic evolution of CPU
    More registers
    More arithmetic functions
    More instructions in instruction set
    Bigger instruction decoder LUT
    Slower performance per instruction (due to propagation delay)

Reduced instruction set computer (RISC)
    Fewer registers
    Fewer instructions
    Smaller decoder LUT
    Faster performance per instruction
    PIC and Atmel are RISC with modified Harvard architecture

Call subroutine
    1) push contents of PC onto stack and increment SP
    2) load address of instructions for subroutine into PC
    3) execute subroutine

Return from subroutine
    4) pop old value of PC from stack and decrement SP
    5) load value from stack into PC

Maximum size of stack limits number of levels of subroutine nesting

Interrupt
    hardware generated subroutine - use stack to remember old PC
    need special interrupt handler instructions
    need list of addresses for different handler routines (interrupt vector)
    mask off other interrupts