Digital Design: An Embedded Systems Approach Using Verilog
Chapter 7: Processor Basics

Portions of this work are from the book, Digital Design: An Embedded Systems Approach Using Verilog, by Peter J. Ashenden, published by Morgan Kaufmann Publishers, Copyright 2007 Elsevier Inc. All rights reserved.

Embedded Computers
- A computer as part of a digital system
  - Performs processing to implement or control the system’s function
- Components
  - Processor core
  - Instruction and data memory
  - Input, output, and input/output controllers
    - For interacting with the physical world
  - Accelerators
    - High-performance circuits for specialized functions
  - Interconnecting buses

Memory Organization
- Von Neumann architecture
  - Single memory for instructions and data
- Harvard architecture
  - Separate instruction and data memories
  - Most common in embedded systems
[Figure: CPU connected to instruction memory, data memory, accelerator, input controller, output controller, I/O controller, …]

Bus Organization
- Single bus for low-cost, low-performance systems
- Multiple buses for higher performance
[Figure: CPU with separate connections to instruction memory, data memory, accelerator, input controller, output controller, and I/O controller]

Microprocessors
- Single-chip processor in a package
- External connections to memory and I/O buses
- Most commonly seen in general-purpose computers
  - E.g., Intel Pentium family, PowerPC, …

Microcontrollers
- Single chip combining
  - Processor
  - A small amount of instruction/data memory
  - I/O controllers
- Microcontroller families
  - Same processor, varying memory and I/O
- 8-bit microcontrollers
  - Operate on 8-bit data
  - Low cost, low performance
- 16-bit and 32-bit microcontrollers
  - Higher performance

Processor Cores
- Processor as a component in an FPGA or ASIC
- In FPGA, can be a fixed-function block
  - E.g., PowerPC cores in some Xilinx FPGAs
- Or can be a soft core
  - Implemented using programmable resources
  - E.g., Xilinx MicroBlaze, Altera Nios-II
- In ASIC, provided as an IP block
  - E.g., ARM, PowerPC, MIPS, Tensilica cores
  - Can be customized for an application

Digital Signal Processors
- DSPs are processors optimized for signal processing operations
  - E.g., audio, video, sensor data; wireless communication
- Often combined with a conventional core for processing other data
  - Heterogeneous multiprocessor

Instruction Sets
- A processor executes a program
  - A sequence of instructions, each performing a small step of a computation
- Instruction set: the repertoire of available instructions
  - Different processor types have different instruction sets
- High-level languages: more abstract
  - E.g., C, C++, Ada, Java
  - Translated to processor instructions by a compiler

Instruction Execution
- Instructions are encoded in binary
  - Stored in the instruction memory
- A processor executes a program by repeatedly
  - Fetching the next instruction
  - Decoding it to work out what to do
  - Executing the operation
- Program counter (PC)
  - Register in the processor holding the address of the next instruction

Data and Endian-ness
- Instructions operate on data from the data memory
- Byte: 8-bit data
  - Data memory is usually byte addressed
- 16-bit, 32-bit, 64-bit words of data

    Data          Address   Little endian      Big endian
    8-bit data    (one byte at its own address)
    16-bit data   m         least sig. byte    most sig. byte
                  m+1       most sig. byte     least sig. byte
    32-bit data   n         least sig. byte    most sig. byte
                  n+1
                  n+2
                  n+3       most sig. byte     least sig. byte

The Gumnut Core
- A small 8-bit soft core
  - Can be used in FPGA designs
  - Resources available on the companion web site
- Instruction set illustrates features typical of 8-bit cores and processors in general
- Programs written in assembly language
  - Each processor instruction written explicitly
  - Translated to binary representation by an assembler

Gumnut Storage
[Figure]

Arithmetic Instructions
- Operate on register data and put result in a register
  - add, addc, sub, subc
  - Can have immediate value operand
- Condition codes
  - Z: 1 if result is zero, 0 if result is non-zero
  - C: carry out of add/addc, borrow out of sub/subc
  - addc and subc include C bit in operation

Arithmetic Instructions
- Examples

    add r3, r4, r1
    add r5, r1, 2
    sub r4, r4, 1

- Evaluate 2x + 1; x in r3, result in r4

    add r4, r3, r3   ; double x
    add r4, r4, 1    ; then add 1

Logical Instructions
- Operate on register data and put result in a register
  - and, or, xor, mask (and not)
  - Operate bitwise on 8-bit operands
  - Can have immediate value operand
- Condition codes
  - Z: 1 if result is zero, 0 if result is non-zero
  - C: always 0

Logical Instructions
- Examples

    and r3, r4, r5
    or  r1, r1, 0x80   ; set r1(7)
    xor r5, r5, 0xFF   ; invert r5

- Set Z if least-significant 4 bits of r2 are 0101

    and r1, r2, 0x0F   ; clear high bits
    sub r0, r1, 0x05   ; compare with 0101

Shift Instructions
- Logical shift/rotate register data and put result in a register
  - shl, shr, rol, ror
  - Count specified as a literal operand
- Condition codes
  - Z: 1 if result is zero, 0 if result is non-zero
  - C: the value of the last bit shifted/rotated past the end of the byte
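The shift and rotate behaviour described above can be sketched in Verilog. This is a minimal illustration, not the Gumnut source: the module name, port names, and the 2-bit operation codes are hypothetical, and setting C to 0 when the count is 0 is an assumption, since the slide does not say what happens in that case.

```verilog
// Sketch of an 8-bit shift/rotate unit with Z and C outputs.
// Names and the op encoding are assumptions made for illustration.
module shift_unit_sketch (
    input  wire [7:0] rs,      // source register value
    input  wire [2:0] count,   // literal shift/rotate count, 0 to 7
    input  wire [1:0] op,      // assumed: 00 = shl, 01 = shr, 10 = rol, 11 = ror
    output reg  [7:0] result,
    output wire       Z,       // 1 if result is zero
    output reg        C        // last bit shifted/rotated past the end of the byte
);
    reg [8:0] wide;            // one extra bit to catch the last bit shifted out

    always @* begin
        // Defaults avoid inferred latches
        wide   = 9'b0;
        result = rs;
        C      = 1'b0;
        case (op)
            2'b00: begin       // shl: extra bit on the left catches the carry
                wide   = {1'b0, rs} << count;
                result = wide[7:0];
                C      = wide[8];
            end
            2'b01: begin       // shr: extra bit on the right catches the carry
                wide   = {rs, 1'b0} >> count;
                result = wide[8:1];
                C      = wide[0];
            end
            2'b10: begin       // rol: the last bit rotated out of bit 7 lands in bit 0
                result = (rs << count) | (rs >> (8 - count));
                C      = (count == 3'd0) ? 1'b0 : result[0];  // count 0 case is assumed
            end
            2'b11: begin       // ror: the last bit rotated out of bit 0 lands in bit 7
                result = (rs >> count) | (rs << (8 - count));
                C      = (count == 3'd0) ? 1'b0 : result[7];  // count 0 case is assumed
            end
        endcase
    end

    assign Z = (result == 8'h00);
endmodule

// Small testbench exercising the sketch
module shift_unit_sketch_tb;
    reg  [7:0] rs;
    reg  [2:0] count;
    reg  [1:0] op;
    wire [7:0] result;
    wire       Z, C;

    shift_unit_sketch dut (.rs(rs), .count(count), .op(op),
                           .result(result), .Z(Z), .C(C));

    initial begin
        rs = 8'd5; count = 3'd3; op = 2'b00;    // shift left by 3: 5 * 8 = 40
        #1 if (result !== 8'd40) $display("unexpected shl result %0d", result);
        $display("shl: result=%0d Z=%b C=%b", result, Z, C);
        rs = 8'h81; count = 3'd1; op = 2'b11;   // rotate right by one bit
        #1 $display("ror: result=%h Z=%b C=%b", result, Z, C);
        $finish;
    end
endmodule
```

Widening the operand by one bit is one simple way to capture the last bit shifted out; a real core might instead use a multiplexer-based barrel shifter.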
Shift Instructions
- Examples

    shl r4, r1, 3
    ror r2, r2, 4

- Multiply r4 by 8, ignoring overflow

    shl r4, r4, 3

- Multiply r4 by 10, ignoring overflow

    shl r1, r4, 1    ; multiply by 2
    shl r4, r4, 3    ; multiply by 8
    add r4, r4, r1

Memory Instructions
- Transfer data between registers and data memory
- Load register from memory

    ldm r1, (r2)+5

- Store from register to memory

    stm r1, (r4)-2

- Compute address by adding an offset to a base register value
- Use r0 if base address is 0; these two forms are equivalent

    ldm r3, 23
    ldm r3, (r0)+23

- Condition codes not affected

Memory Instructions
- Increment a 16-bit integer in memory
  - Little-endian: address of lsb in r2, msb in next location

    ldm  r1, (r2)
    add  r1, r1, 1     ; increment lsb
    stm  r1, (r2)
    ldm  r1, (r2)+1
    addc r1, r1, 0     ; increment msb
    stm  r1, (r2)+1    ; with carry

Input/Output Instructions
- I/O controllers have registers that govern their operation
  - Each has an address, like data memory
  - Gumnut has separate data and I/O address spaces
- Input from I/O register; these two forms are equivalent

    inp r3, 157
    inp r3, (r0)+157

- Output to I/O register; these two forms are equivalent

    out r3, (r7)
    out r3, (r7)+0

- Condition codes not affected
- Further examples in Chapter 8

Branch Instructions
- Programs can evaluate conditions and take alternate courses of action
- Condition codes (Z, C) represent outcomes of arithmetic/logical/shift instructions
- Branch instructions examine Z or C
  - bz, bnz, bc, bnc
- Add a displacement to PC if condition is true
  - Specifies how many instructions forward or backward to skip
  - Counting from instruction after branch

Branch Example
- Elapsed seconds in location 100
- Increment, wrapping to 0 after 59

    ldm r1, 100
    add r1, r1, 1
    sub r0, r1, 60     ; Z set if r1 = 60
    bnz +1             ; skip to store if Z is 0
    add r1, r0, 0
    stm r1, 100

Jump Instruction
- Unconditionally skips forward or backward to specified address
  - Changes the PC to the address
- Example: if r1 = 0, clear data location 100 to 0; otherwise clear location 200 to 0
  - Assume instructions start at address 10

    10: sub r0, r1, 0
    11: bnz +2
    12: stm r0, 100
    13: jmp 15
    14: stm r0, 200
    15: ...

Subroutines
- A sequence of instructions that perform some operation
- Can call them from different parts of a program using a jsb instruction
- Subroutine returns with a ret instruction

Subroutine Example
- Subroutine to increment second count
  - Address of count in r2

    ldm r1, (r2)
    add r1, r1, 1
    sub r0, r1, 60
    bnz +1
    add r1, r0, 0
    stm r1, (r2)
    ret

- Call to increment locations 100 and 102

    add r2, r0, 100
    jsb 20
    add r2, r0, 102
    jsb 20

Return Address Stack
- The jsb saves the return address for use by the ret
- But what if the subroutine includes a jsb?
- Gumnut core includes an 8-entry pushdown stack of return addresses
[Figure: return address stack after successive nested calls. Each jsb pushes a new return address on top: return addr for first call; then return addr for second call above it; then return addr for third call above both]

Miscellaneous Instructions
- Instructions supporting interrupts (see Chapter 8)
  - reti: Return from interrupt
  - enai: Enable interrupts
  - disi: Disable interrupts
  - wait: Wait for an interrupt
  - stby: Stand by in low power mode until an interrupt occurs

The Gumnut Assembler
- Gasm: translates assembly programs
  - Generates memory images for program text (binary-coded instructions) and data
  - See documentation on web site
- Write a program as a text file
  - Instructions
  - Directives
  - Comments
  - Use symbolic labels

Example Program

    ; Program to determine greater of value_1 and value_2

                text
                org   0x000              ; start here on reset
                jmp   main

    ; Data memory layout

                data
    value_1:    byte  10
    value_2:    byte  20
    result:     bss   1

    ; Main program

                text
                org   0x010
    main:       ldm   r1, value_1        ; load values
                ldm   r2, value_2
                sub   r0, r1, r2         ; compare values
                bc    value_2_greater
                stm   r1, result         ; value_1 is greater
                jmp   finish
    value_2_greater:
                stm   r2, result         ; value_2 is greater
    finish:     jmp   finish             ; idle loop

Gumnut Instruction Encoding
- Instructions are a form of information
  - Can be encoded in binary
- Gumnut encoding
  - 18 bits per instruction
  - Divided into fields representing different aspects of the instruction
    - Opcodes and function codes
    - Register numbers
    - Addresses

Gumnut Instruction Encoding

    Format                    Fields, most significant bit first (width in bits)
    Arith/Logical Register    1110 (4) | rd (3) | rs (3) | rs2 (3) | unused (2) | fn (3)
    Arith/Logical Immediate   0 (1) | fn (3) | rd (3) | rs (3) | immed (8)
    Shift                     110 (3) | unused (1) | rd (3) | rs (3) | count (3) | unused (3) | fn (2)
    Memory, I/O               10 (2) | fn (2) | rd (3) | rs (3) | offset (8)
    Branch                    111110 (6) | fn (2) | unused (2) | disp (8)
    Jump                      11110 (5) | fn (1) | addr (12)
    Miscellaneous             1111110 (7) | fn (3) | unused (8)

Encoding Examples
- Encoding for addc r3, r5, 24
  - Arithmetic immediate, fn = 001

    0 | fn  | rd  | rs  | immed
    0 | 001 | 011 | 101 | 00011000

  - Binary 00 0101 1101 0001 1000 = 05D18 in hexadecimal
- Instruction encoded by 3ECFC
  - Binary 11 1110 1100 1111 1100

    111110 | fn | unused | disp
    111110 | 11 | 00     | 11111100

  - Branch format with fn = 11 and disp = -4: bnc -4

Other Instruction Sets
- 8-bit cores and microcontrollers
  - Xilinx PicoBlaze: like Gumnut
  - 8051, and numerous like it
    - Originated as 8-bit microprocessors
    - Instructions encoded as one or more bytes
    - Instruction set is more complex and irregular
    - Complex instruction set computer (CISC), cf. reduced instruction set computer (RISC)
- 16-, 32- and 64-bit cores
  - Mostly RISC
  - E.g., PowerPC, ARM, MIPS, Tensilica, …

Instruction and Data Memory
- In embedded systems
  - Instruction memory is usually ROM, flash, SRAM, or a combination
  - Data memory is usually SRAM
    - DRAM if large capacity needed
- Processor/memory interfacing
  - Gluing the signals together

Example: Gumnut Memory
[Figure: gumnut core connected to an instruction ROM and a data SRAM. Core signals: clk_i, rst_i, inst_cyc_o, inst_stb_o, inst_ack_i, inst_adr_o, inst_dat_i, data_cyc_o, data_stb_o, data_we_o, data_ack_i, data_adr_o, data_dat_o, data_dat_i. Memory ports: clk, en, we, adr, dat_i, dat_o. The ack inputs are driven from flip-flops (D, Q) clocked by clk_i]

Example: Gumnut Memory

    always @(posedge clk) // Instruction memory
      if (inst_cyc_o && inst_stb_o) begin
        inst_dat_i <= inst_ROM[inst_adr_o[10:0]];
        inst_ack_i <= 1'b1;
      end
      else
        inst_ack_i <= 1'b0;

Example: Gumnut Memory

    always @(posedge clk) // Data memory
      if (data_cyc_o && data_stb_o)
        if (data_we_o) begin
          data_RAM[data_adr_o] <= data_dat_o;
          data_dat_i <= data_dat_o;
          data_ack_i <= 1'b1;
        end
        else begin
          data_dat_i <= data_RAM[data_adr_o];
          data_ack_i <= 1'b1;
        end
      else
        data_ack_i <= 1'b0;

Example: Microcontroller Memory
[Figure: 8051 connected to an SRAM through an address latch. 8051 labels: P2, P0, ALE, PSEN, WR, RD. Latch labels: D, Q, LE. SRAM labels: A(16), A(15..8), A(7..0), D, WE, OE, CE]

32-bit Memory
- Four bytes per memory word
  - Little-endian: lsb at least address
  - Big-endian: msb at least address
- Partial-word read
  - Read all bytes, processor selects those needed
- Partial-word write
  - Use byte-enable signals
[Figure: byte addresses within 32-bit words; the word at address 0 holds bytes 0 to 3, the word at 4 holds bytes 4 to 7, the word at 8 holds bytes 8 to 11]

Example: MicroBlaze Memory
[Figure: MicroBlaze connected to four SSRAMs, one per byte lane (0:7, 8:15, 16:23, 24:31). Processor labels: Addr, Data_Write, Data_Read, AS, Write_Strobe, Read_Strobe, Byte_Enable(0) to Byte_Enable(3), Ready, Clk. Each SSRAM has A, D_in, D_out, en, wr, clk; address label 2:16; Ready is tied to +V]

Cache Memory
- For high-performance processors
  - Memory access time is several clock cycles
  - Performance bottleneck
- Cache memory
  - Small fast memory attached to a processor
  - Stores most frequently accessed items, plus adjacent items
  - Locality: those items are most likely to be accessed again soon

Cache Memory
- Memory contents divided into fixed-sized blocks (lines)
  - Cache copies whole lines from memory
- When processor accesses an item
  - If item is in cache: hit (fast access)
    - Occurs most of the time
  - If item is not in cache: miss
    - Line containing item is copied from memory
    - Slower, but less frequent
    - May need to replace a line already in cache

Fast Main Memory Access
- Optimize memory for line access by cache
- Wide memory
  - Read a line in one access
- Burst transfers
  - Send starting address, then read successive locations
- Pipelining
  - Overlapping stages of memory access
  - E.g., address transfer, memory operation, data transfer
- Double data rate (DDR), Quad data rate (QDR)
  - Transfer on both rising and falling clock edges

Summary
- Embedded computer
  - Processor, memory, I/O controllers, buses
- Microprocessors, microcontrollers, and processor cores
  - Soft-core processors for ASIC/FPGA
- Processor instruction sets
  - Binary encoding for instructions
  - Assembly language programs
- Memory interfacing
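
As a closing exercise on the binary encoding covered in this chapter, the following simulation-only sketch packs the arithmetic immediate format (1-bit 0, 3-bit fn, 3-bit rd, 3-bit rs, 8-bit immed) and unpacks the branch format (6-bit 111110, 2-bit fn, 2 unused bits, 8-bit disp). The module and function names are hypothetical and are not part of the Gumnut sources.

```verilog
// Sketch: checking the two encoding examples by simulation.
// Names here are made up for illustration.
module encoding_sketch;

    // Arithmetic/logical immediate: 0 | fn(3) | rd(3) | rs(3) | immed(8)
    function [17:0] enc_arith_immed;
        input [2:0] fn;
        input [2:0] rd;
        input [2:0] rs;
        input [7:0] immed;
        begin
            enc_arith_immed = {1'b0, fn, rd, rs, immed};
        end
    endfunction

    reg        [17:0] instr;
    reg signed [7:0]  disp;   // displacement is a signed 8-bit value

    initial begin
        // addc r3, r5, 24 with fn = 001, as in the encoding example
        instr = enc_arith_immed(3'b001, 3'd3, 3'd5, 8'd24);
        $display("addc r3, r5, 24 encodes as %h", instr);
        if (instr !== 18'h05D18)
            $display("does not match the 05D18 value from the example");

        // Bit pattern from the branch example: 111110 | fn | unused | disp
        instr = 18'b11_1110_1100_1111_1100;
        disp  = instr[7:0];
        $display("is_branch=%b fn=%b disp=%0d",
                 (instr[17:12] == 6'b111110), instr[11:10], disp);
        $finish;
    end
endmodule
```

Declaring disp as a signed 8-bit register lets the simulator print the displacement as a two's-complement number, which is how a backward branch such as bnc -4 is represented.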