Microcontrollers in FPGAs
Tomas Södergård
University of Vaasa
Contents
Finite state machine
Design of instructions
Architecture
Register file
Hardware aspects of MCUs
Comparison of microcontrollers
– Picoblaze, Nios II and Atmega328P
Conclusions
Finite state machine
Moore Machine
– Output only dependent on current state (Pedroni 2004: 159)
Mealy Machine
– Output dependent on current state and external input.
Synchronisation (Zwolinski 2000: 82)
– Clock
– Reset
Programmable state machine
– General purpose FSM (Meyer-Baese 2007: 537, Chu 2008: 324-326)
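The difference between the two machine types can be sketched in Python. The example is an illustrative assumption and is not taken from the cited sources: both machines detect a rising edge (a 0 followed by a 1) on a one-bit input, and one loop iteration stands for one clock cycle. The state names LOW, EDGE and HIGH are hypothetical.

```python
def moore_edge_detector(inputs):
    """Moore machine: the output is a function of the current state only."""
    next_state = {
        ("LOW", 0): "LOW", ("LOW", 1): "EDGE",
        ("EDGE", 0): "LOW", ("EDGE", 1): "HIGH",
        ("HIGH", 0): "LOW", ("HIGH", 1): "HIGH",
    }
    output = {"LOW": 0, "EDGE": 1, "HIGH": 0}
    state = "LOW"  # state after reset
    outputs = []
    for x in inputs:
        outputs.append(output[state])    # output depends on the state alone
        state = next_state[(state, x)]   # state changes on the clock edge
    return outputs


def mealy_edge_detector(inputs):
    """Mealy machine: the output depends on the current state and the input."""
    state = "LOW"  # state after reset
    outputs = []
    for x in inputs:
        outputs.append(1 if state == "LOW" and x == 1 else 0)
        state = "HIGH" if x == 1 else "LOW"
    return outputs


stimulus = [0, 1, 1, 0, 1]
print(moore_edge_detector(stimulus))  # [0, 0, 1, 0, 0]
print(mealy_edge_detector(stimulus))  # [0, 1, 0, 0, 1]
```

In this sketch the Mealy machine needs one state fewer and reacts in the same cycle as the input change, while the Moore output appears one clock cycle later.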
Programmable state machine
[Figure: block diagram of a programmable state machine with control unit, program memory, data memory and ALU]
Instructions
Operations
ALU operations
- Add
- Mul
- Not
Data move
- Move
- Push
- Pop
Branch
- Compare
- Jump
- Loop
Addressing modes
“Addressing modes describe how the operands for an operation are located.” (Meyer-Baese 2007: 544)
Implied addressing (Meyer-Baese 2007: 544-545)
– Location is implicitly defined
– No operands in the instruction
Immediate addressing (Meyer-Baese 2007: 546)
– One operand in the instruction
– The operand is a constant
Addressing modes
Register addressing (Meyer-Baese 2007: 546–547)
– Data is fetched from fast CPU registers
– Used for ALU operations in most RISC machines
Memory addressing (Meyer-Baese 2007: 547–549)
– Direct addressing
  - Additional register needed due to instruction size
  - In base addressing the additional register contains a constant that is added to the constant in the instruction.
  - In page addressing the additional register contains the most significant bits of the address. Full address is obtained by concatenation.
– Indirect addressing
  - The additional register contains the full address
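The three memory addressing variants differ only in how the effective address is formed. A minimal Python sketch follows; the function names and the 8-bit offset field are assumptions made for illustration and do not come from Meyer-Baese.

```python
def base_address(base_register, offset):
    """Base addressing: register constant plus the constant in the instruction."""
    return base_register + offset


def page_address(page_register, offset, offset_bits=8):
    """Page addressing: the register supplies the most significant bits,
    the instruction supplies the least significant bits (concatenation)."""
    low_bits = offset & ((1 << offset_bits) - 1)
    return (page_register << offset_bits) | low_bits


def indirect_address(pointer_register):
    """Indirect addressing: the register already holds the full address."""
    return pointer_register


print(hex(base_address(0x1200, 0x34)))  # 0x1234
print(hex(page_address(0x12, 0x34)))    # 0x1234
print(hex(indirect_address(0x1234)))    # 0x1234
```

All three calls reach the same memory location; what changes is how many address bits have to fit into the instruction word.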
Data flow
An instruction contains at least one (the first) of the following:
– Operation code
– Operands
– Result location
Parameters affecting the instruction size
– Number of operations
– Number of operands
– Memory size
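How the three parameters drive the instruction size can be estimated with a rough Python sketch. It assumes a fixed-width encoding with one opcode field and one equally sized field per operand; real instruction sets also reserve bits for constants and addressing modes, so the numbers are only indicative. The figures of 32 operations and 16 addressable locations are hypothetical.

```python
def field_bits(n_values):
    """Bits needed to give each of n_values items its own code."""
    return max(1, (n_values - 1).bit_length())


def instruction_bits(n_operations, n_operands, n_locations):
    """Opcode field plus one address field per operand."""
    return field_bits(n_operations) + n_operands * field_bits(n_locations)


# Hypothetical machine: 32 operations, 16 addressable locations.
for n_operands in range(4):
    print(n_operands, "address:", instruction_bits(32, n_operands, 16), "bits")
```

Under these assumptions the word grows from 5 bits for a zero address machine to 17 bits for a three address machine.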
Zero address CPU – Stack machine
No operands in the instruction
All operations are performed on the two top elements of the stack
Code example:
Push #5
Push #3
Add
Pop Reg1
(Meyer-Baese 2007: 552-553)
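The code example can be traced with a small Python interpreter. This is a minimal sketch that supports only the three operations used above; the tuple encoding of the instructions is an assumption made for illustration.

```python
def run_stack_machine(program):
    """Execute (operation, operand...) tuples on a stack machine."""
    stack = []
    registers = {}
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            # No operands: add works on the two top elements of the stack.
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "pop":
            registers[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown operation: {op}")
    return registers, stack


program = [("push", 5), ("push", 3), ("add",), ("pop", "Reg1")]
registers, stack = run_stack_machine(program)
print(registers)  # {'Reg1': 8}
```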
One address CPU – Accumulator machine
One operand in the instruction
The second operand is the value of the accumulator
The destination is the accumulator
Code example
Load #5
Add #3
Store Reg1
(Meyer-Baese 2007: 553-554)
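The same calculation on an accumulator machine can be sketched as follows. The interpreter is an illustrative assumption that supports only the three operations of the code example; the operand of each instruction is either a constant or a register name.

```python
def run_accumulator_machine(program):
    """Execute (operation, operand) pairs; the accumulator is the implicit
    second operand and the implicit destination."""
    acc = 0
    registers = {}
    for op, operand in program:
        if op == "load":
            acc = operand
        elif op == "add":
            acc = acc + operand
        elif op == "store":
            registers[operand] = acc
        else:
            raise ValueError(f"unknown operation: {op}")
    return registers, acc


program = [("load", 5), ("add", 3), ("store", "Reg1")]
registers, acc = run_accumulator_machine(program)
print(registers)  # {'Reg1': 8}
```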
Two address CPU
The instruction contains two operands
The destination of the result is the location of the first operand
Code examples
Move Reg1, #5
Add Reg1, #3

Move Reg2, #5
Move Reg1, #3
Add Reg1, Reg2
(Meyer-Baese 2007: 555)
Three address CPU
The instruction contains three addresses
Destination and sources can be specified separately
Code examples
Move Reg2, #5
Move Reg3, #3
Add Reg1, Reg2, Reg3

Add Reg1, #5, #3
(Meyer-Baese 2007: 555-556)
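The two and three address forms can be compared in one Python sketch. It is an illustration under assumptions, not code from the cited source: integers stand for immediate operands (#5), strings stand for register names, and an Add with a single source is treated as the two address form in which the destination doubles as the first source.

```python
def value_of(operand, registers):
    """Immediate operands are integers, register operands are names."""
    return operand if isinstance(operand, int) else registers[operand]


def run_register_machine(program):
    """Execute (operation, destination, source...) tuples."""
    registers = {}
    for op, dest, *sources in program:
        if op == "move":
            registers[dest] = value_of(sources[0], registers)
        elif op == "add":
            if len(sources) == 1:
                # Two address form: the destination is also the first source.
                sources = [dest, sources[0]]
            registers[dest] = (value_of(sources[0], registers)
                               + value_of(sources[1], registers))
        else:
            raise ValueError(f"unknown operation: {op}")
    return registers


two_address = [("move", "Reg1", 5), ("add", "Reg1", 3)]
three_address = [("add", "Reg1", 5, 3)]
print(run_register_machine(two_address))    # {'Reg1': 8}
print(run_register_machine(three_address))  # {'Reg1': 8}
```

The three address form reaches the same result with one instruction instead of two, at the cost of a wider instruction word.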
Architecture
Von Neumann Architecture
– Shared data and program memory = One bus
Harvard Architecture
– Separate data and program memory = Two buses
Super Harvard Architecture
– Separate X and Y data memories and separate program memory = Three buses
– Fast cache registers for immediate results
(Meyer-Baese 2007: 558)
[Figure: block diagrams of the three architectures: CPU with one shared data and program memory; CPU with separate program and data memories; CPU with program, data X and data Y memories]
Register file
Two dimensional bit array
Has a mechanism for storing data to the register file
Has a mechanism for reading data from the register file
Consumes many logic elements (LEs) in an FPGA
– The register file in the example discussed on the following pages is of size 8x16 (16 registers of 8 bits) and consumes 211 LEs (Meyer-Baese 2007: 560)
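VHDL register file example
Entity declaration (Meyer-Baese 2007: 560)
library ieee;
use ieee.std_logic_1164.all;

entity reg_file is
  generic (W : integer := 7;    -- data width minus one (8-bit words)
           N : integer := 15);  -- highest register index (16 registers)
  port (clk, reg_ena : in  std_logic;                      -- clock and write enable
        data         : in  std_logic_vector(W downto 0);   -- data to be written
        rd, rs, rt   : in  integer range 0 to 15;          -- destination and source indices
        s, t         : out std_logic_vector(W downto 0));  -- the two read ports
end reg_file;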
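VHDL register file example
Architecture: type declarations and write process (Meyer-Baese 2007: 560)
architecture fpga of reg_file is
  subtype bitw is std_logic_vector(W downto 0);
  type SLV_NxW is array (0 to N) of bitw;
  signal r : SLV_NxW;  -- the register array
begin
  -- Write port: store data in register rd on the rising clock edge.
  -- Register 0 is never written; reg_ena gates the write.
  Mux: process
  begin
    wait until clk = '1';
    if reg_ena = '1' and rd > 0 then
      r(rd) <= data;
    end if;
  end process Mux;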
VHDL register file example
Architecture: Demux for outputs (Meyer-Baese 2007: 560)
  -- Read ports: index 0 always reads as zero.
  Demux: process (r, rs, rt)
  begin
    if rs > 0 then
      s <= r(rs);
    else
      s <= (others => '0');
    end if;
    if rt > 0 then
      t <= r(rt);
    else
      t <= (others => '0');
    end if;
  end process Demux;
end fpga;
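The behaviour of the VHDL example can be checked against a small Python model. This is a sketch under assumptions: the class and method names are hypothetical, one call to clock() stands for one rising clock edge, and reg_ena is taken to be a write enable, as its name suggests.

```python
class RegisterFile:
    """Behavioural model: one write port (rd) and two read ports (rs, rt)."""

    def __init__(self, width=8, count=16):
        self.mask = (1 << width) - 1   # keep values within the word width
        self.r = [0] * count

    def clock(self, reg_ena, rd, data):
        """Rising clock edge: write data to register rd if enabled.
        Register 0 is never written."""
        if reg_ena and rd > 0:
            self.r[rd] = data & self.mask

    def read(self, rs, rt):
        """Combinational read: index 0 always reads as zero."""
        s = self.r[rs] if rs > 0 else 0
        t = self.r[rt] if rt > 0 else 0
        return s, t


rf = RegisterFile()
rf.clock(reg_ena=True, rd=3, data=0xAB)
rf.clock(reg_ena=True, rd=0, data=0xFF)   # ignored: register 0 is read-only zero
rf.clock(reg_ena=False, rd=4, data=0x01)  # ignored: write not enabled
print(rf.read(3, 0))  # (171, 0)
```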
FSM vs PSM
(Chu 2008: 324)
FSM | PSM
Special purpose | General purpose
State register | Program counter (PC)
Generates certain output based on simple logic | Generates outputs based on encoding and decoding
Next state can be specified freely | Next state is normally an incrementation of the PC. Exceptions are branch instructions.
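The role of the program counter as the state register of a PSM can be sketched in Python. The three-instruction machine below is hypothetical: load, sub and jnz (jump if the accumulator is not zero) are names chosen for illustration, not instructions of any processor discussed here.

```python
def run_psm(program, max_steps=1000):
    """Run (operation, argument) pairs and record the visited PC values."""
    pc = 0
    acc = 0
    trace = []
    for _ in range(max_steps):
        if pc >= len(program):
            break
        trace.append(pc)
        op, arg = program[pc]
        pc += 1                 # normal case: next state is PC + 1
        if op == "load":
            acc = arg
        elif op == "sub":
            acc -= arg
        elif op == "jnz":
            if acc != 0:
                pc = arg        # branch: the exception to incrementing
        else:
            raise ValueError(f"unknown operation: {op}")
    return acc, trace


program = [("load", 2), ("sub", 1), ("jnz", 1)]
acc, trace = run_psm(program)
print(trace)  # [0, 1, 2, 1, 2]
```

The trace shows the PC counting upwards except where the branch sends it back to address 1.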
Structural aspects for FPGAs
Harvard Architecture better for FPGA MCUs
– Reason: Memory size more limited (and slower)
Data flow (Meyer-Baese 2007: 556-557)
– A more complex instruction implies:
  - Easier assembly programming
  - More complicated C compiler development
  - Longer instruction
  - Fewer instructions needed
  - Lower speed
  - Larger constant in immediate addressing
Comparison of instructions
Parameter | Picoblaze | Nios II | Atmega328P
Architecture | Harvard | Harvard | Harvard
Register file | 16 x 8 bit | 32 x 32 bit | 32 x 8 bit
Clk/instr. | 2 | 1 | 1-2
Instr. count | 57 | 256 | 131
Data mem. | 64 B | ? | 2 kB
Instr. width | 18 bit | 32 bit | ?
LE count | ~200 | >700 | -
Data flow | 2 address | 3 address | 2 address
(Chu 2008: 323, 326-327, 329, 332-337, Altera Nios II/e, Altera Nios II/f, Altera 2011: 3, 11-12, Atmega328P: 1, 8, Moshovos 2007)
Recently developed MCU
Article published in September 2011 by Martin Schoeberl.
Properties:
– Name = Leros
– 16 bit microcontroller
– Accumulator machine/one address CPU
– 200 LEs
– 2 stage pipeline = fetch and decode
– 2 clock cycles/instruction
– Portable = Successfully tested in Altera and Xilinx devices
– Assembler available
Conclusions – Useful technology?
Area optimisation
– Algorithms like FFT may consume fewer resources, but become slower as a result. (Meyer-Baese 2007: 537)
– Main purpose of FPGA technology is processing speed?
Reuse of code
– Controller and datapath partitioning (Zwolinski 2000: 160)
– General vs special purpose state machine (Chu 2008: 324)
Complexity
– Moves some of the complexity of VHDL (or Verilog) to the compiler
Conclusions – Useful technology?
Speed
– No parallelism anymore
– Backwards development?
Especially useful when:
– Part of a larger circuit
– Multi controller systems that perform simpler tasks
Sources
Atmega328P. 8-bit Microcontroller with 4/8/16/32K Bytes In-System Programmable Flash. [online] [cited 17.11.2011]. Available from Internet: URL: http://www.atmel.com/dyn/resources/prod_documents/doc8271.pdf
AVR assembly. Beginner’s introduction to AVR assembler. [online] [cited 17.11.2011]. Available from Internet: URL: http://www.avr-asm-tutorial.net/avr_en/beginner/index.html
Altera Nios II (2011). Processor Architecture. [online] [cited 18.11.2011]. URL: http://www.altera.com/literature/hb/nios2/n2cpu_nii51002.pdf
Altera Nios II/e Core. Economy. [online] [cited 18.11.2011]. URL: http://www.altera.com/devices/processor/nios2/cores/economy/ni2-economycore.html
Altera Nios II/f Core. Fast for Performance Critical Applications. [online] [cited 18.11.2011]. URL: http://www.altera.com/devices/processor/nios2/cores/fast/ni2fast-core.html
Chu, Pong P. (2008). FPGA Prototyping by VHDL Examples. Ohio: Wiley.
Meyer-Baese, U. (2007). Digital Signal Processing with Field Programmable Gate Arrays. 3. Edition. Heidelberg: Springer.
Moshovos, Andreas (2007). Using Assembly Language to Write Programs. [online] [cited 18.11.2011]. Available from Internet: URL: http://www.eecg.toronto.edu/~moshovos/ECE243-2009/lec5%20%20Intro%20to%20Assembly.htm
Schoeberl, Martin (2011). Leros: A Tiny Microcontroller for FPGAs. Field Programmable Logic and Applications (FPL), 2011 International Conference. 10–14.
Zwolinski, Mark (2000). Digital System Design with VHDL. 2. Edition. Essex: Pearson Education Limited.