Microcontrollers in FPGAs
Tomas Södergård
University of Vaasa

Contents
– Finite state machine
– Design of instructions
– Architecture
– Register file
– Hardware aspects of MCUs
– Comparison of microcontrollers: Picoblaze, Nios II and Atmega328P
– Conclusions

Finite state machine
– Moore machine
   – Output only dependent on current state (Pedroni 2004: 159)
– Mealy machine
   – Output dependent on current state and external input
– Synchronisation (Zwolinski 2000: 82)
   – Clock
   – Reset
– Programmable state machine
   – General purpose FSM (Meyer-Baese 2007: 537, Chu 2008: 324-326)

Programmable state machine
[Figure: programmable state machine built from a control unit, program memory, data memory and an ALU]
Instructions and operations:
– ALU operations: Add, Mul, Not
– Data move: Move, Push, Pop
– Branch: Compare, Jump, Loop

Addressing modes
“Addressing modes describe how the operands for an operation are located.” (Meyer-Baese 2007: 544)
– Implied addressing (Meyer-Baese 2007: 544-545)
   – Location is implicitly defined
   – No operands in the instruction
– Immediate addressing (Meyer-Baese 2007: 546)
   – One operand in the instruction
   – The operand is a constant

Addressing modes (cont.)
– Register addressing (Meyer-Baese 2007: 546–547)
   – Data is fetched from fast CPU registers
   – Used for ALU operations in most RISC machines
– Memory addressing (Meyer-Baese 2007: 547–549)
   – Direct addressing
   – An additional register is needed because the instruction is too short to hold a full memory address
      – In base addressing, the additional register contains a constant that is added to the constant in the instruction.
      – In page addressing, the additional register contains the most significant bits of the address; the full address is obtained by concatenation.
      – In indirect addressing, the additional register contains the full address.

Data flow
An instruction always contains the first of the following items and may also contain the others:
– Operation code
– Operands
– Result location
Parameters affecting the instruction size:
– Number of operations
– Number of operands
– Memory size

Zero-address CPU – Stack machine (Meyer-Baese 2007: 552-553)
– No operands in the arithmetic instructions
– All operations are performed on the two top elements of the stack
Code example:
    Push #5
    Push #3
    Add
    Pop  Reg1

One-address CPU – Accumulator machine (Meyer-Baese 2007: 553-554)
– One operand in the instruction
– The second operand is the value of the accumulator
– The destination is the accumulator
Code example:
    Load  #5
    Add   #3
    Store Reg1

Two-address CPU (Meyer-Baese 2007: 555)
– The instruction contains two operands
– The result is written to the location of the first operand
Code examples:
    Move Reg1, #5
    Add  Reg1, #3
or
    Move Reg2, #5
    Move Reg1, #3
    Add  Reg1, Reg2

Three-address CPU (Meyer-Baese 2007: 555-556)
– The instruction contains three addresses
– Destination and sources can be specified separately
Code examples:
    Move Reg2, #5
    Move Reg3, #3
    Add  Reg1, Reg2, Reg3
or
    Add  Reg1, #5, #3

Architecture (Meyer-Baese 2007: 558)
– Von Neumann architecture
   – Shared data and program memory = one bus
– Harvard architecture
   – Separate data and program memory = two buses
– Super Harvard architecture
   – Separate X and Y data memories and separate program memory = three buses
   – Fast cache registers for immediate results
[Figure: block diagrams connecting the CPU to program memory and data memory (data X and data Y for Super Harvard)]

Register file
– Two-dimensional bit array
– Has a mechanism for writing data to the register file
– Has a mechanism for reading data from the register file
– Consumes many logic elements (LEs) in an FPGA
   – The register file in the example below has 16 registers of 8 bits (8 x 16) and consumes 211 LEs (Meyer-Baese 2007: 560)

VHDL register file example – Entity declaration (Meyer-Baese 2007: 560)
    library ieee;
    use ieee.std_logic_1164.all;

    entity reg_file is
      generic (W : integer := 7;    -- word width - 1 (8-bit registers)
               N : integer := 15);  -- number of registers - 1 (16 registers)
      port (clk, reg_ena : in  std_logic;                     -- clock and write enable
            data         : in  std_logic_vector(W downto 0);  -- data to write
            rd, rs, rt   : in  integer range 0 to N;          -- write index and two read indices
            s, t         : out std_logic_vector(W downto 0)); -- two read ports
    end entity reg_file;

VHDL register file example – Architecture: type declarations and write process (Meyer-Baese 2007: 560)
    architecture fpga of reg_file is
      subtype bitw is std_logic_vector(W downto 0);
      type SLV_NxW is array (0 to N) of bitw;
      signal r : SLV_NxW;
    begin
      Mux: process  -- write port; the registers are inferred here
      begin
        wait until clk = '1';
        -- register 0 is never written: reads of index 0 return zero
        if reg_ena = '1' and rd > 0 then
          r(rd) <= data;
        end if;
      end process Mux;

VHDL register file example – Architecture: demux for outputs (Meyer-Baese 2007: 560)
      Demux: process (r, rs, rt)  -- two combinational read ports
      begin
        if rs > 0 then s <= r(rs); else s <= (others => '0'); end if;
        if rt > 0 then t <= r(rt); else t <= (others => '0'); end if;
      end process Demux;
    end architecture fpga;

FSM vs PSM (Chu 2008: 324)
FSM                                        | PSM
Special purpose                            | General purpose
State register                             | Program counter (PC)
Generates outputs based on simple logic    | Generates outputs by encoding and decoding instructions
Next state can be specified freely         | Next state is normally the incremented PC; branch instructions are the exception

Structural aspects for FPGAs
– Harvard architecture is better suited to FPGA MCUs
   – Reason: memory is more limited in size (and slower)
– Data flow (Meyer-Baese 2007: 556-557)
   – A more complex instruction implies:
      – Easier assembly programming
      – More complicated C compiler development
      – Longer instructions
      – Fewer instructions needed
      – Lower speed
      – Larger constants possible in immediate addressing

Comparison of microcontrollers
Parameter            | Picoblaze   | Nios II     | Atmega328P
Architecture         | Harvard     | Harvard     | Harvard
Register file        | 16 x 8 bit  | 32 x 32 bit | 32 x 8 bit
Clock cycles/instr.  | 2           | 1           | 1-2
Instruction count    | 57          | 256         | 131
Data memory          | 64 B        | ?           | 2 kB
Instruction width    | 18 bit      | 32 bit      | ?
LE count             | ~200        | >700        | -
Data flow            | 2-address   | 3-address   | 2-address
(Chu 2008: 323, 326-327, 329, 332-337; Altera Nios II/e; Altera Nios II/f; Altera 2011: 3, 11-12; Atmega328P: 1, 8; Moshovos 2007)

Recently developed MCU
Article published in September 2011 by Martin Schoeberl.
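Leros follows the one-address (accumulator) data flow described earlier, and, like any programmable state machine, its next state is normally the incremented program counter. As a rough illustration of both ideas, and not of Leros's actual design, the following VHDL sketch runs the earlier one-address example (Load #5, Add #3, Store Reg1) on a tiny hypothetical accumulator machine. The entity name acc_psm, the 12-bit instruction format, the opcodes and the single-cycle execution are assumptions made only for this example; program and data memories are kept separate, as in a Harvard architecture.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical one-address (accumulator) programmable state machine
entity acc_psm is
  port (clk, reset : in  std_logic;
        acc_out    : out std_logic_vector(7 downto 0);   -- accumulator value
        reg1_out   : out std_logic_vector(7 downto 0));  -- data memory word 1 ("Reg1")
end entity acc_psm;

architecture sketch of acc_psm is
  -- Assumed 12-bit instruction word: 4-bit opcode & 8-bit operand
  subtype instr_t is std_logic_vector(11 downto 0);
  subtype op_t    is std_logic_vector(3 downto 0);
  constant OP_LOADI : op_t := "0000";  -- acc := constant
  constant OP_ADDI  : op_t := "0001";  -- acc := acc + constant
  constant OP_STORE : op_t := "0010";  -- mem(operand) := acc
  constant OP_JUMP  : op_t := "0011";  -- pc := operand
  type rom_t is array (0 to 3) of instr_t;
  -- Program memory: Load #5; Add #3; Store Reg1; Jump to itself (halt)
  constant PROGRAM : rom_t := (OP_LOADI & x"05",
                               OP_ADDI  & x"03",
                               OP_STORE & x"01",
                               OP_JUMP  & x"03");
  type ram_t is array (0 to 3) of unsigned(7 downto 0);
  signal ram : ram_t := (others => (others => '0'));  -- separate data memory
  signal pc  : unsigned(1 downto 0) := (others => '0');
  signal acc : unsigned(7 downto 0) := (others => '0');
begin
  process (clk)
    variable ir : instr_t;
    variable op : op_t;
  begin
    if rising_edge(clk) then
      if reset = '1' then
        pc  <= (others => '0');
        acc <= (others => '0');
      else
        ir := PROGRAM(to_integer(pc));  -- fetch
        op := ir(11 downto 8);          -- decode
        pc <= pc + 1;                   -- default next state: PC + 1
        case op is
          when OP_LOADI => acc <= unsigned(ir(7 downto 0));
          when OP_ADDI  => acc <= acc + unsigned(ir(7 downto 0));
          when OP_STORE => ram(to_integer(unsigned(ir(1 downto 0)))) <= acc;
          when OP_JUMP  => pc <= unsigned(ir(1 downto 0));  -- branch overrides PC + 1
          when others   => null;
        end case;
      end if;
    end if;
  end process;

  acc_out  <= std_logic_vector(acc);
  reg1_out <= std_logic_vector(ram(1));
end architecture sketch;
```

Because every instruction completes in one clock cycle in this sketch, the value 8 (5 + 3) reaches Reg1 on the third rising edge after reset. The jump is written as a later assignment to pc in the same process, so it overrides the default increment, mirroring "next state is normally PC + 1; branches are the exception". A real core such as Leros adds pipelining, which this sketch deliberately leaves out for readability.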
Properties:
– Name: Leros
– 16-bit microcontroller
– Accumulator machine (one-address CPU)
– 200 LEs
– 2-stage pipeline: fetch and decode
– 2 clock cycles per instruction
– Portable: successfully tested in Altera and Xilinx devices
– Assembler available

Conclusions – Useful technology?
– Area optimisation
   – Algorithms like FFT may consume fewer resources, but will therefore become slower (Meyer-Baese 2007: 537)
   – Is the main purpose of FPGA technology processing speed?
– Reuse of code
   – Controller and datapath partitioning (Zwolinski 2000: 160)
   – General vs special purpose state machine (Chu 2008: 324)
– Complexity
   – Moves some of the complexity of VHDL (or Verilog) to the compiler

Conclusions – Useful technology? (cont.)
– Speed
   – Instructions execute sequentially, so the parallelism of the FPGA is lost
   – Backwards development?
– Especially useful when:
   – Part of a larger circuit
   – Multi-controller systems that perform simpler tasks

Sources
Atmega328P. 8-bit Microcontroller with 4/8/16/32K Bytes In-System Programmable Flash [online]. [cited 17.11.2011]. Available from Internet: URL: http://www.atmel.com/dyn/resources/prod_documents/doc8271.pdf
AVR assembly. Beginner’s introduction to AVR assembler [online]. [cited 17.11.2011]. Available from Internet: URL: http://www.avr-asm-tutorial.net/avr_en/beginner/index.html
Altera Nios II (2011). Processor Architecture [online]. [cited 18.11.2011]. Available from Internet: URL: http://www.altera.com/literature/hb/nios2/n2cpu_nii51002.pdf
Altera Nios II/e Core. Economy [online]. [cited 18.11.2011]. Available from Internet: URL: http://www.altera.com/devices/processor/nios2/cores/economy/ni2-economycore.html
Altera Nios II/f Core. Fast for Performance Critical Applications [online]. [cited 18.11.2011]. Available from Internet: URL: http://www.altera.com/devices/processor/nios2/cores/fast/ni2fast-core.html
Chu, Pong P. (2008). FPGA Prototyping by VHDL Examples. Ohio: Wiley.
Meyer-Baese, U. (2007). Digital Signal Processing with Field Programmable Gate Arrays. 3rd edition. Heidelberg: Springer.
Moshovos, Andreas (2007). Using Assembly Language to Write Programs [online]. [cited 18.11.2011]. Available from Internet: URL: http://www.eecg.toronto.edu/~moshovos/ECE243-2009/lec5%20%20Intro%20to%20Assembly.htm
Schoeberl, Martin (2011). Leros: A Tiny Microcontroller for FPGAs. Field Programmable Logic and Applications (FPL), 2011 International Conference. 10–14.
Zwolinski, Mark (2000). Digital System Design with VHDL. 2nd edition. Essex: Pearson Education Limited.