Digital Design: An Embedded Systems Approach Using Verilog
Chapter 7
Processor Basics
Portions of this work are from the book, Digital Design: An Embedded
Systems Approach Using Verilog, by Peter J. Ashenden, published by Morgan
Kaufmann Publishers, Copyright 2007 Elsevier Inc. All rights reserved.
Verilog
Embedded Computers

- A computer as part of a digital system
  - Performs processing to implement or control the system’s function
- Components
  - Processor core
  - Instruction and data memory
  - Input, output, and input/output controllers
    - For interacting with the physical world
  - Accelerators
    - High-performance circuit for specialized functions
  - Interconnecting buses
Digital Design — Chapter 7 — Processor Basics
Memory Organization

- Von Neumann architecture
  - Single memory for instructions and data
- Harvard architecture
  - Separate instruction and data memories
  - Most common in embedded systems

[Figure: block diagram showing the CPU, instruction memory, data memory, accelerator, input controller, output controller, I/O controller, …]
Bus Organization

- Single bus for low-cost low-performance systems
- Multiple buses for higher performance

[Figure: bus organization connecting the CPU, instruction memory, data memory, accelerator, input controller, output controller, and I/O controller]
Microprocessors

- Single-chip processor in a package
- External connections to memory and I/O buses
- Most commonly seen in general purpose computers
  - E.g., Intel Pentium family, PowerPC, …
Microcontrollers

- Single chip combining
  - Processor
  - A small amount of instruction/data memory
  - I/O controllers
- Microcontroller families
  - Same processor, varying memory and I/O
- 8-bit microcontrollers
  - Operate on 8-bit data
  - Low cost, low performance
- 16-bit and 32-bit microcontrollers
  - Higher performance
Processor Cores

- Processor as a component in an FPGA or ASIC
- In FPGA, can be a fixed-function block
  - E.g., PowerPC cores in some Xilinx FPGAs
- Or can be a soft core
  - Implemented using programmable resources
  - E.g., Xilinx MicroBlaze, Altera Nios-II
- In ASIC, provided as an IP block
  - E.g., ARM, PowerPC, MIPS, Tensilica cores
  - Can be customized for an application
Digital Signal Processors

- DSPs are processors optimized for signal processing operations
  - E.g., audio, video, sensor data; wireless communication
- Often combined with a conventional core for processing other data
  - Heterogeneous multiprocessor
Instruction Sets

- A processor executes a program
  - A sequence of instructions, each performing a small step of a computation
- Instruction set: the repertoire of available instructions
  - Different processor types have different instruction sets
- High-level languages: more abstract
  - E.g., C, C++, Ada, Java
  - Translated to processor instructions by a compiler
Instruction Execution

- Instructions are encoded in binary
  - Stored in the instruction memory
- A processor executes a program by repeatedly
  - Fetching the next instruction
  - Decoding it to work out what to do
  - Executing the operation
- Program counter (PC)
  - Register in the processor holding the address of the next instruction
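The fetch, decode, execute cycle can be sketched as a small state machine. This is a hypothetical illustration, not the Gumnut control unit: the module name `fde_sequencer`, the three-phase encoding, and the 12-bit PC width are assumptions made for the example, and a real core does actual work in each phase.

```verilog
// Hypothetical three-phase sequencer (illustration only).
module fde_sequencer (
  input  wire        clk,
  input  wire        rst,
  output reg  [1:0]  phase,   // 0 = fetch, 1 = decode, 2 = execute
  output reg  [11:0] pc       // address of the next instruction (width assumed)
);
  localparam FETCH = 2'd0, DECODE = 2'd1, EXECUTE = 2'd2;

  always @(posedge clk)
    if (rst) begin
      phase <= FETCH;
      pc    <= 12'd0;
    end
    else
      case (phase)
        FETCH: begin
          pc    <= pc + 12'd1;   // instruction read; PC now points past it
          phase <= DECODE;
        end
        DECODE:  phase <= EXECUTE;
        default: phase <= FETCH; // after execute, fetch the next instruction
      endcase
endmodule

// Self-checking testbench: two full rounds should advance the PC by two.
module fde_sequencer_tb;
  reg         clk = 1'b0, rst = 1'b1;
  wire [1:0]  phase;
  wire [11:0] pc;

  fde_sequencer dut (.clk(clk), .rst(rst), .phase(phase), .pc(pc));

  always #5 clk = ~clk;

  initial begin
    @(posedge clk); #1 rst = 1'b0;   // hold reset across the first clock edge
    repeat (6) @(posedge clk);       // fetch, decode, execute, twice
    #1;
    if (pc === 12'd2 && phase === 2'd0) $display("PASS");
    else $display("FAIL: pc=%0d phase=%0d", pc, phase);
    $finish;
  end
endmodule
```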
Data and Endian-ness

- Instructions operate on data from the data memory
- Byte: 8-bit data
  - Data memory is usually byte addressed
- 16-bit, 32-bit, 64-bit words of data

    Address   Little endian                    Big endian
    0         8-bit data                       8-bit data
    m         16-bit data, least sig. byte     16-bit data, most sig. byte
    m+1       16-bit data, most sig. byte      16-bit data, least sig. byte
    n         32-bit data, least sig. byte     32-bit data, most sig. byte
    n+1       32-bit data                      32-bit data
    n+2       32-bit data                      32-bit data
    n+3       32-bit data, most sig. byte      32-bit data, least sig. byte
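The two byte orders can be illustrated by assembling a 32-bit word from four consecutive byte locations. This is a minimal sketch; the array `mem` and the byte values are made up for the example.

```verilog
// Assemble one 32-bit word from byte locations n .. n+3 in both byte orders.
module endian_tb;
  reg [7:0]  mem [0:3];   // mem[0] is address n, mem[3] is address n+3
  reg [31:0] little, big;

  initial begin
    mem[0] = 8'h78; mem[1] = 8'h56; mem[2] = 8'h34; mem[3] = 8'h12;

    // Little endian: least significant byte at the lowest address
    little = {mem[3], mem[2], mem[1], mem[0]};
    // Big endian: most significant byte at the lowest address
    big    = {mem[0], mem[1], mem[2], mem[3]};

    if (little === 32'h12345678 && big === 32'h78563412) $display("PASS");
    else $display("FAIL: little=%h big=%h", little, big);
    $finish;
  end
endmodule
```

The same four bytes give different word values, which is why both sides of a memory interface must agree on the byte order.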
The Gumnut Core

- A small 8-bit soft core
  - Can be used in FPGA designs
  - Instruction set illustrates features typical of 8-bit cores and processors in general
- Programs written in assembly language
  - Each processor instruction written explicitly
  - Translated to binary representation by an assembler
- Resources available on companion web site
Gumnut Storage
Arithmetic Instructions

- Operate on register data and put result in a register
  - add, addc, sub, subc
  - Can have immediate value operand
- Condition codes
  - Z: 1 if result is zero, 0 if result is non-zero
  - C: carry out of add/addc, borrow out of sub/subc
  - addc and subc include C bit in operation
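The arithmetic operations and their condition codes can be modelled with a 9-bit sum, where the extra bit is the carry or borrow. This is a sketch, not the Gumnut ALU: the module name `arith_unit` and the two-bit `fn` numbering are assumptions for illustration.

```verilog
// Sketch of add/addc/sub/subc with Z and C flags (fn numbering assumed).
module arith_unit (
  input  wire [1:0] fn,      // 0 = add, 1 = addc, 2 = sub, 3 = subc
  input  wire [7:0] a, b,
  input  wire       c_in,    // current C flag, used only by addc and subc
  output reg  [7:0] result,
  output reg        c,       // carry out of add/addc, borrow out of sub/subc
  output wire       z
);
  always @*
    case (fn)
      2'd0:    {c, result} = {1'b0, a} + {1'b0, b};
      2'd1:    {c, result} = {1'b0, a} + {1'b0, b} + {8'b0, c_in};
      2'd2:    {c, result} = {1'b0, a} - {1'b0, b};
      default: {c, result} = {1'b0, a} - {1'b0, b} - {8'b0, c_in};
    endcase

  assign z = (result == 8'h00);
endmodule

module arith_unit_tb;
  reg  [1:0] fn;
  reg  [7:0] a, b;
  reg        c_in;
  wire [7:0] result;
  wire       z, c;
  integer    errors = 0;

  arith_unit dut (.fn(fn), .a(a), .b(b), .c_in(c_in),
                  .result(result), .c(c), .z(z));

  task check(input [7:0] exp_r, input exp_z, input exp_c);
    begin
      #1;
      if (result !== exp_r || z !== exp_z || c !== exp_c) begin
        errors = errors + 1;
        $display("FAIL: fn=%0d a=%0d b=%0d -> %0d z=%b c=%b", fn, a, b, result, z, c);
      end
    end
  endtask

  initial begin
    fn = 2'd0; a = 8'd200; b = 8'd100; c_in = 1'b0; check(8'd44,  1'b0, 1'b1); // 300 wraps
    fn = 2'd1; a = 8'd255; b = 8'd0;   c_in = 1'b1; check(8'd0,   1'b1, 1'b1); // carry in
    fn = 2'd2; a = 8'd5;   b = 8'd5;   c_in = 1'b0; check(8'd0,   1'b1, 1'b0); // equal
    fn = 2'd2; a = 8'd3;   b = 8'd5;   c_in = 1'b0; check(8'd254, 1'b0, 1'b1); // borrow
    fn = 2'd3; a = 8'd5;   b = 8'd4;   c_in = 1'b1; check(8'd0,   1'b1, 1'b0); // borrow in
    if (errors == 0) $display("PASS");
    $finish;
  end
endmodule
```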
Arithmetic Instructions

- Examples

    add   r3, r4, r1
    add   r5, r1, 2
    sub   r4, r4, 1

- Evaluate 2x + 1; x in r3, result in r4

    add   r4, r3, r3    ; double x
    add   r4, r4, 1     ; then add 1
Logical Instructions

- Operate on register data and put result in a register
  - and, or, xor, mask (and not)
  - Operate bitwise on 8-bit operands
  - Can have immediate value operand
- Condition codes
  - Z: 1 if result is zero, 0 if result is non-zero
  - C: always 0
Logical Instructions

- Examples

    and   r3, r4, r5
    or    r1, r1, 0x80    ; set r1(7)
    xor   r5, r5, 0xFF    ; invert r5

- Set Z if least-significant 4 bits of r2 are 0101

    and   r1, r2, 0x0F    ; clear high bits
    sub   r0, r1, 0x05    ; compare with 0101
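The logical operations and the examples above can be checked with a small bitwise unit. This is a sketch under assumptions: the module name `logic_unit` and the `fn` numbering are illustrative, and `mask` is modelled as `a & ~b` following the "and not" description.

```verilog
// Sketch of and/or/xor/mask with Z and C flags (fn numbering assumed).
module logic_unit (
  input  wire [1:0] fn,      // 0 = and, 1 = or, 2 = xor, 3 = mask
  input  wire [7:0] a, b,
  output reg  [7:0] result,
  output wire       z, c
);
  always @*
    case (fn)
      2'd0:    result = a & b;
      2'd1:    result = a | b;
      2'd2:    result = a ^ b;
      default: result = a & ~b;   // mask: clear the bits of a that are set in b
    endcase

  assign z = (result == 8'h00);
  assign c = 1'b0;                // logical instructions always clear C
endmodule

module logic_unit_tb;
  reg  [1:0] fn;
  reg  [7:0] a, b;
  wire [7:0] result;
  wire       z, c;
  integer    errors = 0;

  logic_unit dut (.fn(fn), .a(a), .b(b), .result(result), .z(z), .c(c));

  task check(input [7:0] expected);
    begin
      #1;
      if (result !== expected || c !== 1'b0 || z !== (expected == 8'h00)) begin
        errors = errors + 1;
        $display("FAIL: fn=%0d a=%h b=%h result=%h", fn, a, b, result);
      end
    end
  endtask

  initial begin
    fn = 2'd1; a = 8'h01; b = 8'h80; check(8'h81);   // or  ..., 0x80 sets bit 7
    fn = 2'd2; a = 8'h3C; b = 8'hFF; check(8'hC3);   // xor ..., 0xFF inverts
    fn = 2'd3; a = 8'hFF; b = 8'h0F; check(8'hF0);   // mask clears the low bits
    fn = 2'd0; a = 8'hA5; b = 8'h0F; check(8'h05);   // and ..., 0x0F clears high bits
    // sub r0, r1, 0x05 would now give zero, so Z would be set
    if ((result - 8'h05) !== 8'h00) errors = errors + 1;
    if (errors == 0) $display("PASS");
    $finish;
  end
endmodule
```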
Shift Instructions

- Logical shift/rotate register data and put result in a register
  - shl, shr, rol, ror
  - Count specified as a literal operand
- Condition codes
  - Z: 1 if result is zero, 0 if result is non-zero
  - C: the value of the last bit shifted/rotated past the end of the byte
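The C flag rule for shifts and rotates can be sketched by widening the operand by one bit so that the last bit to leave the byte is captured. The module name `shift_unit`, the `fn` numbering, and the choice of C = 0 for a count of zero are assumptions for illustration, not documented Gumnut behaviour.

```verilog
// Sketch of shl/shr/rol/ror on a byte (fn numbering and count = 0 case assumed).
module shift_unit (
  input  wire [1:0] fn,      // 0 = shl, 1 = shr, 2 = rol, 3 = ror
  input  wire [7:0] a,
  input  wire [2:0] count,
  output reg  [7:0] result,
  output reg        c,       // last bit shifted or rotated past the end
  output wire       z
);
  wire       nonzero = (count != 3'd0);
  wire [7:0] rol_v   = (a << count) | (a >> (4'd8 - count));
  wire [7:0] ror_v   = (a >> count) | (a << (4'd8 - count));

  always @*
    case (fn)
      2'd0:    {c, result} = {1'b0, a} << count;   // bit 8 catches the last bit out
      2'd1:    {result, c} = {a, 1'b0} >> count;   // extra low bit catches the last bit out
      2'd2:    begin result = rol_v; c = nonzero & rol_v[0]; end
      default: begin result = ror_v; c = nonzero & ror_v[7]; end
    endcase

  assign z = (result == 8'h00);
endmodule

module shift_unit_tb;
  reg  [1:0] fn;
  reg  [7:0] a;
  reg  [2:0] count;
  wire [7:0] result;
  wire       c, z;
  integer    errors = 0;

  shift_unit dut (.fn(fn), .a(a), .count(count), .result(result), .c(c), .z(z));

  task check(input [7:0] exp_r, input exp_c);
    begin
      #1;
      if (result !== exp_r || c !== exp_c) begin
        errors = errors + 1;
        $display("FAIL: fn=%0d a=%h count=%0d -> %h c=%b", fn, a, count, result, c);
      end
    end
  endtask

  initial begin
    fn = 2'd0; a = 8'h81; count = 3'd1; check(8'h02, 1'b1);   // shl
    fn = 2'd1; a = 8'h03; count = 3'd2; check(8'h00, 1'b1);   // shr
    fn = 2'd2; a = 8'h81; count = 3'd1; check(8'h03, 1'b1);   // rol
    fn = 2'd3; a = 8'h01; count = 3'd4; check(8'h10, 1'b0);   // ror
    fn = 2'd3; a = 8'h01; count = 3'd1; check(8'h80, 1'b1);   // ror
    if (errors == 0) $display("PASS");
    $finish;
  end
endmodule
```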
Shift Instructions

- Examples

    shl   r4, r1, 3
    ror   r2, r2, 4

- Multiply r4 by 8, ignoring overflow

    shl   r4, r4, 3

- Multiply r4 by 10, ignoring overflow

    shl   r1, r4, 1     ; multiply by 2
    shl   r4, r4, 3     ; multiply by 8
    add   r4, r4, r1
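The multiply-by-10 sequence relies on 10x = 2x + 8x. A short exhaustive check over all byte values, with overflow ignored by truncating to 8 bits, might look like the following; the testbench name and loop are illustrative only.

```verilog
// Check that shl/shl/add computes 10*x modulo 256 for every byte value.
module times10_tb;
  reg [7:0] r1, r4;
  integer   x, errors;

  initial begin
    errors = 0;
    for (x = 0; x < 256; x = x + 1) begin
      r4 = x[7:0];
      r1 = r4 << 1;        // shl r1, r4, 1   ; multiply by 2
      r4 = r4 << 3;        // shl r4, r4, 3   ; multiply by 8
      r4 = r4 + r1;        // add r4, r4, r1
      if (r4 !== ((x * 10) & 255)) errors = errors + 1;
    end
    if (errors == 0) $display("PASS");
    else $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```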
Memory Instructions

- Transfer data between registers and data memory
  - Compute address by adding an offset to a base register value
- Load register from memory

    ldm   r1, (r2)+5

- Store from register to memory

    stm   r1, (r4)-2

- Use r0 if base address is 0

    ldm   r3, 23    is equivalent to    ldm   r3, (r0)+23

- Condition codes not affected
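The address calculation can be sketched as an 8-bit addition of the base register value and the offset. The function name `eff_addr`, the 8-bit address width, and the two's-complement treatment of negative offsets are assumptions for this illustration.

```verilog
// Effective address = base register value + offset, wrapping at 8 bits (assumed width).
module eff_addr_tb;
  function [7:0] eff_addr(input [7:0] base, input [7:0] offset);
    begin
      eff_addr = base + offset;
    end
  endfunction

  integer errors;

  initial begin
    errors = 0;
    if (eff_addr(8'd10, 8'd5)  !== 8'd15) errors = errors + 1;  // (r2)+5 with r2 = 10
    if (eff_addr(8'd10, 8'hFE) !== 8'd8)  errors = errors + 1;  // (r4)-2 with r4 = 10; 0xFE is -2
    if (eff_addr(8'd0,  8'd23) !== 8'd23) errors = errors + 1;  // (r0)+23 with a zero base
    if (errors == 0) $display("PASS");
    else $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```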
Memory Instructions

- Increment a 16-bit integer in memory
  - Little-endian: address of lsb in r2, msb in next location

    ldm   r1, (r2)      ; increment lsb
    add   r1, r1, 1
    stm   r1, (r2)
    ldm   r1, (r2)+1    ; increment msb
    addc  r1, r1, 0     ; with carry
    stm   r1, (r2)+1
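The sequence works because ldm and stm leave the condition codes alone, so the carry from the first add is still available to addc. A behavioural sketch of the same steps follows; the two-entry array `mem` and the task names are made up for the example.

```verilog
// Byte-wise increment of a little-endian 16-bit value, mirroring the six instructions.
module inc16_tb;
  reg [7:0] mem [0:1];   // mem[0] holds the lsb, mem[1] the msb
  reg [7:0] r1;
  reg       c;           // carry flag
  integer   errors;

  task inc16;
    begin
      r1 = mem[0];                         // ldm  r1, (r2)
      {c, r1} = {1'b0, r1} + 9'd1;         // add  r1, r1, 1
      mem[0] = r1;                         // stm  r1, (r2)
      r1 = mem[1];                         // ldm  r1, (r2)+1
      {c, r1} = {1'b0, r1} + {8'b0, c};    // addc r1, r1, 0
      mem[1] = r1;                         // stm  r1, (r2)+1
    end
  endtask

  task check(input [15:0] start, input [15:0] expected);
    begin
      mem[0] = start[7:0];
      mem[1] = start[15:8];
      inc16;
      if ({mem[1], mem[0]} !== expected) begin
        errors = errors + 1;
        $display("FAIL: %h -> %h", start, {mem[1], mem[0]});
      end
    end
  endtask

  initial begin
    errors = 0;
    check(16'h1234, 16'h1235);   // no carry between bytes
    check(16'h00FF, 16'h0100);   // carry propagates into the msb
    check(16'hFFFF, 16'h0000);   // wraps around
    if (errors == 0) $display("PASS");
    $finish;
  end
endmodule
```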
Input/Output Instructions

- I/O controllers have registers that govern their operation
  - Each has an address, like data memory
  - Gumnut has separate data and I/O address spaces
- Input from I/O register

    inp   r3, 157     is equivalent to    inp   r3, (r0)+157

- Output to I/O register

    out   r3, (r7)    is equivalent to    out   r3, (r7)+0

- Condition codes not affected
- Further examples in Chapter 8
Branch Instructions

- Programs can evaluate conditions and take alternate courses of action
  - Condition codes (Z, C) represent outcomes of arithmetic/logical/shift instructions
- Branch instructions examine Z or C
  - bz, bnz, bc, bnc
  - Add a displacement to PC if condition is true
  - Specifies how many instructions forward or backward to skip
    - Counting from instruction after branch
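The next-PC rule can be sketched as follows: the displacement is sign-extended and added to the address of the instruction after the branch when the tested condition holds. The module name `branch_unit`, the 12-bit PC width, and the `fn` numbering are assumptions for this illustration.

```verilog
// Sketch of next-PC selection for bz/bnz/bc/bnc (fn numbering and PC width assumed).
module branch_unit (
  input  wire [1:0]  fn,      // 0 = bz, 1 = bnz, 2 = bc, 3 = bnc
  input  wire        z, c,
  input  wire [11:0] pc,      // address of the branch instruction itself
  input  wire [7:0]  disp,    // signed displacement in instructions
  output wire [11:0] pc_next
);
  reg taken;
  always @*
    case (fn)
      2'd0:    taken = z;
      2'd1:    taken = ~z;
      2'd2:    taken = c;
      default: taken = ~c;
    endcase

  wire [11:0] pc_plus_1 = pc + 12'd1;   // counting starts at the instruction after the branch
  assign pc_next = taken ? pc_plus_1 + {{4{disp[7]}}, disp} : pc_plus_1;
endmodule

module branch_unit_tb;
  reg  [1:0]  fn;
  reg         z, c;
  reg  [11:0] pc;
  reg  [7:0]  disp;
  wire [11:0] pc_next;
  integer     errors = 0;

  branch_unit dut (.fn(fn), .z(z), .c(c), .pc(pc), .disp(disp), .pc_next(pc_next));

  task check(input [11:0] expected);
    begin
      #1;
      if (pc_next !== expected) begin
        errors = errors + 1;
        $display("FAIL: pc=%0d fn=%0d -> %0d", pc, fn, pc_next);
      end
    end
  endtask

  initial begin
    c = 1'b0;
    fn = 2'd1; z = 1'b0; pc = 12'd11; disp = 8'd2;  check(12'd14);  // bnz +2 taken
    fn = 2'd1; z = 1'b1; pc = 12'd11; disp = 8'd2;  check(12'd12);  // bnz +2 not taken
    fn = 2'd3; z = 1'b0; pc = 12'd20; disp = 8'hFC; check(12'd17);  // bnc -4 taken
    if (errors == 0) $display("PASS");
    $finish;
  end
endmodule
```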
Branch Example

- Elapsed seconds in location 100
  - Increment, wrapping to 0 after 59

    ldm   r1, 100
    add   r1, r1, 1
    sub   r0, r1, 60    ; Z set if r1 = 60
    bnz   +1            ; Skip to store if Z is 0
    add   r1, r0, 0
    stm   r1, 100
Jump Instruction

- Unconditionally skips forward or backward to specified address
  - Changes the PC to the address
- Example: if r1 = 0, clear data location 100 to 0; otherwise clear location 200 to 0
  - Assume instructions start at address 10

    10: sub   r0, r1, 0
    11: bnz   +2
    12: stm   r0, 100
    13: jmp   15
    14: stm   r0, 200
    15: ...
Subroutines

- A sequence of instructions that performs some operation
  - Can be called from different parts of a program using a jsb instruction
  - Subroutine returns with a ret instruction
Subroutine Example

- Subroutine to increment second count
  - Address of count in r2

    ldm   r1, (r2)
    add   r1, r1, 1
    sub   r0, r1, 60
    bnz   +1
    add   r1, r0, 0
    stm   r1, (r2)
    ret

- Call to increment locations 100 and 102

    add   r2, r0, 100
    jsb   20
    add   r2, r0, 102
    jsb   20
Return Address Stack

- The jsb saves the return address for use by the ret
  - But what if the subroutine includes a jsb?
  - Gumnut core includes an 8-entry push-down stack of return addresses

[Figure: contents of the return address stack as calls nest, with the return address for the first call at the bottom and the return addresses for the second and third calls pushed above it]
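A push-down stack of return addresses can be sketched as a small register array with a stack pointer. This is a hypothetical model, not the Gumnut implementation: the module name `return_stack`, the port names, and the 12-bit address width are assumptions, and overflow beyond eight nested calls simply wraps.

```verilog
// Sketch of an 8-entry return address stack (names and widths assumed).
module return_stack (
  input  wire        clk,
  input  wire        push,       // asserted for a jsb
  input  wire        pop,        // asserted for a ret
  input  wire [11:0] ret_addr,   // address of the instruction after the jsb
  output wire [11:0] top         // return address the next ret would use
);
  reg [11:0] stack [0:7];
  reg [2:0]  sp = 3'd0;          // index of the next free entry

  assign top = stack[sp - 3'd1];

  always @(posedge clk)
    if (push) begin
      stack[sp] <= ret_addr;
      sp        <= sp + 3'd1;
    end
    else if (pop)
      sp <= sp - 3'd1;
endmodule

module return_stack_tb;
  reg         clk = 1'b0, push = 1'b0, pop = 1'b0;
  reg  [11:0] ret_addr = 12'd0;
  wire [11:0] top;
  integer     errors = 0;

  return_stack dut (.clk(clk), .push(push), .pop(pop), .ret_addr(ret_addr), .top(top));

  always #5 clk = ~clk;

  initial begin
    push = 1'b1; ret_addr = 12'd21;      // first call
    @(posedge clk); #1;
    ret_addr = 12'd35;                   // nested second call
    @(posedge clk); #1;
    push = 1'b0;
    if (top !== 12'd35) errors = errors + 1;   // innermost return address is on top
    pop = 1'b1;                          // ret from the nested subroutine
    @(posedge clk); #1;
    pop = 1'b0;
    if (top !== 12'd21) errors = errors + 1;   // first call's return address is exposed
    if (errors == 0) $display("PASS");
    else $display("FAIL: top=%0d", top);
    $finish;
  end
endmodule
```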
Miscellaneous Instructions

- Instructions supporting interrupts
  - See Chapter 8

    reti   Return from interrupt
    enai   Enable interrupts
    disi   Disable interrupts
    wait   Wait for an interrupt
    stby   Stand by in low power mode until an interrupt occurs
The Gumnut Assembler

- Gasm: translates assembly programs
  - Generates memory images for program text (binary-coded instructions) and data
  - See documentation on web site
- Write a program as a text file
  - Instructions
  - Directives
  - Comments
  - Use symbolic labels
Example Program

    ; Program to determine greater of value_1 and value_2

                text
                org   0x000             ; start here on reset
                jmp   main

    ; Data memory layout

                data
    value_1:    byte  10
    value_2:    byte  20
    result:     bss   1

    ; Main program

                text
                org   0x010
    main:       ldm   r1, value_1       ; load values
                ldm   r2, value_2
                sub   r0, r1, r2        ; compare values
                bc    value_2_greater
                stm   r1, result        ; value_1 is greater
                jmp   finish
    value_2_greater:
                stm   r2, result        ; value_2 is greater
    finish:     jmp   finish            ; idle loop
Gumnut Instruction Encoding

- Instructions are a form of information
  - Can be encoded in binary
- Gumnut encoding
  - 18 bits per instruction
  - Divided into fields representing different aspects of the instruction
    - Opcodes and function codes
    - Register numbers
    - Addresses
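Gumnut Instruction Encoding

Fields are listed from the most significant bit, with widths in bits in parentheses.

    Format                     Fields
    Arith/Logical Register     1 1 1 0 (4) | rd (3) | rs (3) | rs2 (3) | unused (2) | fn (3)
    Arith/Logical Immediate    0 (1) | fn (3) | rd (3) | rs (3) | immed (8)
    Shift                      1 1 0 (3) | unused (1) | rd (3) | rs (3) | count (3) | unused (3) | fn (2)
    Memory, I/O                1 0 (2) | fn (2) | rd (3) | rs (3) | offset (8)
    Branch                     1 1 1 1 1 0 (6) | fn (2) | unused (2) | disp (8)
    Jump                       1 1 1 1 0 (5) | fn (1) | addr (12)
    Miscellaneous              1 1 1 1 1 1 0 (7) | fn (3) | unused (8)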
Encoding Examples

- Encoding for addc r3, r5, 24
  - Arithmetic immediate, fn = 001

    Width:   1    3      3      3      8
    Field:   0    fn     rd     rs     immed
    Bits:    0    001    011    101    00011000

  - Hexadecimal: 05D18
- Instruction encoded by 3ECFC
  - Branch

    Width:   6         2     2     8
    Field:   111110    fn    -     disp
    Bits:    111110    11    00    11111100

  - bnc -4
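Both examples can be reproduced by concatenating or slicing the 18-bit instruction word. This sketch uses only the field layouts shown for the arithmetic immediate and branch formats; the testbench name is illustrative.

```verilog
// Encode addc r3, r5, 24 and pick apart a branch instruction word.
module encoding_tb;
  reg [17:0] instr;
  integer    errors;

  initial begin
    errors = 0;

    // Arithmetic immediate: 0 | fn | rd | rs | immed, with fn = 001 for addc
    instr = {1'b0, 3'b001, 3'd3, 3'd5, 8'd24};
    if (instr !== 18'h05D18) errors = errors + 1;

    // Branch: 111110 | fn | 2 unused bits | disp
    instr = 18'h3ECFC;
    if (instr[17:12] !== 6'b111110) errors = errors + 1;  // branch opcode bits
    if (instr[11:10] !== 2'b11)     errors = errors + 1;  // fn field of this example (bnc)
    if (instr[7:0]   !== 8'hFC)     errors = errors + 1;  // 0xFC is -4 in two's complement

    if (errors == 0) $display("PASS");
    else $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```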
Other Instruction Sets

- 8-bit cores and microcontrollers
  - Xilinx PicoBlaze: like Gumnut
  - 8051, and numerous like it
    - Originated as 8-bit microprocessors
    - Instructions encoded as one or more bytes
    - Instruction set is more complex and irregular
    - Complex instruction set computer (CISC)
    - Cf. Reduced instruction set computer (RISC)
- 16-, 32- and 64-bit cores
  - Mostly RISC
  - E.g., PowerPC, ARM, MIPS, Tensilica, …
Instruction and Data Memory

- In embedded systems
  - Instruction memory is usually ROM, flash, SRAM, or a combination
  - Data memory is usually SRAM
    - DRAM if large capacity needed
- Processor/memory interfacing
  - Gluing the signals together
Example: Gumnut Memory

[Figure: the gumnut core connected to an instruction ROM and a data SRAM. Common signals: clk_i, rst_i. Instruction-side signals: inst_cyc_o, inst_stb_o, inst_ack_i, inst_adr_o, inst_dat_i. Data-side signals: data_cyc_o, data_stb_o, data_we_o, data_ack_i, data_adr_o, data_dat_o, data_dat_i. Memory ports: clk, en, we, adr, dat_i, dat_o.]
Example: Gumnut Memory

    always @(posedge clk) // Instruction memory
      if (inst_cyc_o && inst_stb_o) begin
        inst_dat_i <= inst_ROM[inst_adr_o[10:0]];  // registered read of the addressed word
        inst_ack_i <= 1'b1;                        // acknowledge the request
      end
      else
        inst_ack_i <= 1'b0;                        // no request pending
Example: Gumnut Memory

    always @(posedge clk) // Data memory
      if (data_cyc_o && data_stb_o)
        if (data_we_o) begin                       // write access
          data_RAM[data_adr_o] <= data_dat_o;
          data_dat_i           <= data_dat_o;
          data_ack_i           <= 1'b1;
        end
        else begin                                 // read access
          data_dat_i <= data_RAM[data_adr_o];
          data_ack_i <= 1'b1;
        end
      else
        data_ack_i <= 1'b0;                        // no request pending
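Example: Microcontroller Memory

[Figure: an 8051 connected to an external SRAM through an address latch. 8051 signals: P2, P0, ALE, PSEN, WR, RD. Latch signals: D, Q, LE. SRAM signals: A(16), A(15..8), A(7..0), D, WE, OE, CE.]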
32-bit Memory

- Four bytes per memory word
  - Little-endian: lsb at lowest address
  - Big-endian: msb at lowest address

[Figure: memory words of four bytes each, with byte addresses 0 to 3, 4 to 7, and 8 to 11]

- Partial-word read
  - Read all bytes, processor selects those needed
- Partial-word write
  - Use byte-enable signals
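Partial-word writes with byte enables can be sketched as a word-wide memory in which each enable bit gates one byte lane. The module name `word_mem`, the 16-word depth, and the mapping of `byte_en[0]` to bits 7:0 are assumptions for illustration; a real interface such as MicroBlaze's may number its lanes differently.

```verilog
// Sketch of a 32-bit memory with per-byte write enables (lane mapping assumed).
module word_mem (
  input  wire        clk,
  input  wire        we,
  input  wire [3:0]  byte_en,    // one enable per byte lane
  input  wire [3:0]  word_adr,
  input  wire [31:0] dat_in,
  output reg  [31:0] dat_out
);
  reg [31:0] mem [0:15];

  always @(posedge clk) begin
    if (we) begin
      if (byte_en[0]) mem[word_adr][7:0]   <= dat_in[7:0];
      if (byte_en[1]) mem[word_adr][15:8]  <= dat_in[15:8];
      if (byte_en[2]) mem[word_adr][23:16] <= dat_in[23:16];
      if (byte_en[3]) mem[word_adr][31:24] <= dat_in[31:24];
    end
    dat_out <= mem[word_adr];    // reads return all four bytes
  end
endmodule

module word_mem_tb;
  reg         clk = 1'b0, we = 1'b0;
  reg  [3:0]  byte_en = 4'b0000, word_adr = 4'd0;
  reg  [31:0] dat_in = 32'd0;
  wire [31:0] dat_out;

  word_mem dut (.clk(clk), .we(we), .byte_en(byte_en), .word_adr(word_adr),
                .dat_in(dat_in), .dat_out(dat_out));

  always #5 clk = ~clk;

  initial begin
    we = 1'b1; word_adr = 4'd2;
    byte_en = 4'b1111; dat_in = 32'h11223344;   // full-word write
    @(posedge clk); #1;
    byte_en = 4'b0100; dat_in = 32'hAABBCCDD;   // overwrite only bits 23:16
    @(posedge clk); #1;
    we = 1'b0;                                  // read the word back
    @(posedge clk); #1;
    if (dat_out === 32'h11BB3344) $display("PASS");
    else $display("FAIL: dat_out=%h", dat_out);
    $finish;
  end
endmodule
```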
Example: MicroBlaze Memory

[Figure: a MicroBlaze memory interface connected to four byte-wide SSRAMs, one per byte lane (bits 0:7, 8:15, 16:23, and 24:31 of Data_Write and Data_Read). Interface signals: Addr (bits 2:16), AS, Data_Write, Data_Read, Write_Strobe, Read_Strobe, Byte_Enable(0) to Byte_Enable(3), Ready, Clk. Each SSRAM has ports A, D_in, D_out, en, wr, and clk.]
Cache Memory

- For high-performance processors
  - Memory access time is several clock cycles
  - Performance bottleneck
- Cache memory
  - Small fast memory attached to a processor
  - Stores most frequently accessed items, plus adjacent items
  - Locality: those items are most likely to be accessed again soon
Cache Memory

- Memory contents divided into fixed-size blocks (lines)
  - Cache copies whole lines from memory
- When processor accesses an item
  - If item is in cache: hit - fast access
    - Occurs most of the time
  - If item is not in cache: miss
    - Line containing item is copied from memory
    - Slower, but less frequent
    - May need to replace a line already in cache
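Hit and miss detection can be sketched with a tag store. The organization here is an assumption chosen for brevity, not something the slides specify: a direct-mapped cache with four lines of four bytes, an 8-bit address split into tag, index, and byte offset, and only the tags modelled (no data array). The module name `dm_cache_tags` is hypothetical.

```verilog
// Tag store of a hypothetical direct-mapped cache: 4 lines of 4 bytes, 8-bit addresses.
module dm_cache_tags (
  input  wire       clk,
  input  wire       access,   // processor is accessing adr
  input  wire [7:0] adr,
  output wire       hit
);
  reg  [3:0] tag_mem [0:3];
  reg  [3:0] valid = 4'b0000;

  wire [1:0] index = adr[3:2];   // selects the cache line
  wire [3:0] tag   = adr[7:4];   // identifies which memory line is held there

  assign hit = valid[index] && (tag_mem[index] == tag);

  always @(posedge clk)
    if (access && !hit) begin    // miss: the line is fetched, replacing any old one
      tag_mem[index] <= tag;
      valid[index]   <= 1'b1;
    end
endmodule

module dm_cache_tags_tb;
  reg        clk = 1'b0, access = 1'b1;
  reg  [7:0] adr = 8'h00;
  wire       hit;
  integer    errors = 0;

  dm_cache_tags dut (.clk(clk), .access(access), .adr(adr), .hit(hit));

  always #5 clk = ~clk;

  task probe(input [7:0] a, input exp_hit);
    begin
      adr = a;
      #1;
      if (hit !== exp_hit) begin
        errors = errors + 1;
        $display("FAIL: adr=%h hit=%b", a, hit);
      end
      @(posedge clk); #1;        // let a miss fill the line
    end
  endtask

  initial begin
    probe(8'h10, 1'b0);   // first access to the line: miss
    probe(8'h11, 1'b1);   // adjacent item in the same line: hit
    probe(8'h50, 1'b0);   // same index, different tag: miss, replaces the line
    probe(8'h10, 1'b0);   // the original line was replaced: miss again
    if (errors == 0) $display("PASS");
    $finish;
  end
endmodule
```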
Fast Main Memory Access

- Optimize memory for line access by cache
- Wide memory
  - Read a line in one access
- Burst transfers
  - Send starting address, then read successive locations
- Pipelining
  - Overlapping stages of memory access
  - E.g., address transfer, memory operation, data transfer
- Double data rate (DDR), Quad data rate (QDR)
  - Transfer on both rising and falling clock edges
Summary

- Embedded computer
  - Processor, memory, I/O controllers, buses
- Microprocessors, microcontrollers, and processor cores
  - Soft-core processors for ASIC/FPGA
- Processor instruction sets
  - Binary encoding for instructions
  - Assembly language programs
- Memory interfacing