Peter Yiannacouras
April 13, 2006
You have all used FPGAs (ECE241)
Adders
7-segment decoders
Etc.
We are putting whole microprocessors on them
We call these soft processors
Hard Processor
Made of transistors
Costs millions to make
Faster
Smaller
Less Power
Soft Processor
Written in HDL (Verilog)
Programmed onto chip
FPGAs are a common platform for digital systems
[Figure: an FPGA system containing a UART, custom logic, a soft processor, a memory interface, and Ethernet]
Performs coordination and even computation
Better processors => less hardware to design
We aim to improve soft processors by customizing them
Soft processors have worse area, speed, and power
But they are flexible: use that flexibility to counteract the overheads
HOW???
Customize the processor’s architecture, e.g. Intel vs. AMD, or Motorola 68360 vs. 68010
HOW????
1. Understand tradeoffs in soft processors
E.g. a hardware multiplier is big but can perform multiplies fast
2. Customize it to the application
E.g. bubble sort doesn’t use multiplies, therefore remove the hardware multiplier and save on area
We developed SPREE, software to help us do both
(Soft Processor Rapid Exploration Environment)
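The bubble sort example can be made concrete with a small check. This is only a sketch, not part of SPREE: it assumes the application is available as a list of MIPS I instruction mnemonics, and the function name usesHardwareMultiply is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not a SPREE function): returns true if any
// instruction in the program is a MIPS I multiply (mult or multu).
// If it returns false, the hardware multiplier is a candidate for removal.
bool usesHardwareMultiply(const std::vector<std::string>& mnemonics) {
    return std::any_of(mnemonics.begin(), mnemonics.end(),
                       [](const std::string& m) {
                           return m == "mult" || m == "multu";
                       });
}
```

A program whose mnemonics are only loads, stores, compares, and branches (as in a typical bubble sort) makes this return false, which is the cue to drop the multiplier from the datapath.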
SPREE
■ Input: Processor description (ISA and datapath)
■ SPREE System
1. Verify ISA against datapath
2. Datapath Instantiation
3. Control Generation
■ Output: Synthesizable Verilog
Input: Instruction Set Architecture (ISA) Description
■ Graph of Generic Operations (GENOPs)
■ Edges indicate flow of data
■ Example, MIPS ADD (add rd, rs, rt): FETCH -> RFREAD, RFREAD -> ADD -> RFWRITE
■ ISA currently fixed (subset of MIPS I)
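The GENOP graph for ADD can be sketched as a small data structure. The struct and function names are hypothetical, and the exact edges are an assumption read off the slide (FETCH feeds both RFREADs, both feed ADD, ADD feeds RFWRITE). SPREE's real representation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of an instruction as a graph of generic operations.
struct GenopGraph {
    std::vector<std::string> ops;                              // node i performs ops[i]
    std::vector<std::pair<std::size_t, std::size_t>> edges;    // (producer, consumer)
};

// MIPS "add rd, rs, rt"; the edge list is an assumption based on the slide.
GenopGraph makeAddGraph() {
    GenopGraph g;
    g.ops = {"FETCH", "RFREAD", "RFREAD", "ADD", "RFWRITE"};
    g.edges = {{0, 1}, {0, 2}, {1, 3}, {2, 3}, {3, 4}};
    return g;
}

// Kahn's algorithm: returns an order in which every producer precedes its
// consumers, or an empty vector if the graph contains a cycle.
std::vector<std::size_t> dataflowOrder(const GenopGraph& g) {
    std::vector<int> indegree(g.ops.size(), 0);
    for (const auto& e : g.edges) ++indegree[e.second];

    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < g.ops.size(); ++i)
        if (indegree[i] == 0) ready.push(i);

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        const std::size_t n = ready.front();
        ready.pop();
        order.push_back(n);
        for (const auto& e : g.edges)
            if (e.first == n && --indegree[e.second] == 0) ready.push(e.second);
    }
    if (order.size() != g.ops.size()) order.clear();
    return order;
}
```

Following the edges from FETCH to RFWRITE gives the order in which data flows through the instruction, which is the information a generator needs when mapping operations onto datapath components.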
Input: Datapath Description
■ Interconnection of hand-coded components
■ Allows efficient synthesis
■ Described using C++
[Figure: example datapath built from Ifetch, Reg file, ALU, Mul, Shifter, Data Mem, and Write Back components]
SPREE Component Library
Select by name
Names looked up in library
Stored in cpugen/rtl_lib
RTLComponent *ifetch=new RTLComponent("ifetch");
RTLComponent *reg_file=new RTLComponent("reg_file");
Ifetch ports: rd, rs, rt, offset
Regfile ports: dst, a_reg, a_data, b_reg, b_data, writedata
ALU ports: opA, opB, result
proc.addConnection(ifetch,"rs",reg_file,"a_reg");
proc.addConnection(ifetch,"rt",reg_file,"b_reg");
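To show what a connection call has to check, here is a toy stand-in. MockComponent and MockProcessor are hypothetical and are not the SPREE classes: the real RTLComponent looks its ports up by name in cpugen/rtl_lib, whereas this sketch lists them by hand. Only the port names and the addConnection argument order come from the slide.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a library component: a name plus its port names.
struct MockComponent {
    std::string name;
    std::set<std::string> ports;
};

struct Connection {
    std::string srcComp, srcPort, dstComp, dstPort;
};

// Hypothetical stand-in for the processor object that collects connections.
class MockProcessor {
public:
    // Records a wire from src.srcPort to dst.dstPort, rejecting unknown ports.
    void addConnection(const MockComponent& src, const std::string& srcPort,
                       const MockComponent& dst, const std::string& dstPort) {
        if (src.ports.count(srcPort) == 0)
            throw std::invalid_argument(src.name + " has no port " + srcPort);
        if (dst.ports.count(dstPort) == 0)
            throw std::invalid_argument(dst.name + " has no port " + dstPort);
        connections_.push_back({src.name, srcPort, dst.name, dstPort});
    }
    const std::vector<Connection>& connections() const { return connections_; }

private:
    std::vector<Connection> connections_;
};

// Rebuilds the two connections shown on the slide using the stand-ins.
MockProcessor buildSlideExample() {
    MockComponent ifetch{"ifetch", {"rd", "rs", "rt", "offset"}};
    MockComponent reg_file{"reg_file",
                           {"dst", "a_reg", "a_data", "b_reg", "b_data", "writedata"}};
    MockProcessor proc;
    proc.addConnection(ifetch, "rs", reg_file, "a_reg");
    proc.addConnection(ifetch, "rt", reg_file, "b_reg");
    return proc;
}
```

Checking port names when the connection is made means a typo in the description fails immediately, rather than producing Verilog that does not synthesize.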
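SPREE System + Backend
(Soft Processor Rapid Exploration Environment)
[Figure: Processor Description -> SPREE generator (spegen) -> Verilog. The benchmarks run on the Mint MIPS simulator (simulator/run) and on the Verilog in the Modelsim Verilog simulator (spebenchmark), and the two traces are compared. The Verilog is also synthesized by the Quartus II CAD software (specadflow).]
Measurements from Quartus II: 1. Area, 2. Clock Frequency, 3. Power
Measurement from Modelsim: 4. Cycle Count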
Walking through an Example (see README.txt)
Choose a pre-built processor
cpugen/src/arch lists all the processors
Let’s choose pipe3_serialshift
3-stage pipeline with serial shifter
Generate, benchmark, synthesize
% spegen pipe3_serialshift        ← Generates Verilog
% spebenchmark pipe3_serialshift  ← Runs benchmarks
% specadflow pipe3_serialshift    ← Synthesizes processor
% specompare pipe3_serialshift    ← Displays results
Input: Processor description
Syntax: spegen <processor name>
Output:
A folder named after the processor, containing:
Hand-coded Verilog modules
system.v: generated hookup and control
OUT.cpugen: stages per instruction, hazard window/branch penalty
test_bench.v: test bench for Modelsim simulation
Run programs on the processor
Measure time taken till completion
Verify functionality
Can do this without knowing anything about the benchmarks themselves
Input: Processor implementation
Syntax: spebenchmark <processor>
Output: (ideally)
Cycle counts of all benchmarks
******* Benchmarking pipe3_serialshift ********
Simulating bubble_sort ... Success! Cycle count=2994
Simulating crc ... Success! Cycle count=112750
Simulating des ... Success! Cycle count=5129
Simulating fft ... Success! Cycle count=5077
Simulating fir ... Success! Cycle count=1214
...
Traces: /tmp/modelsim_trace.txt
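If you want the cycle counts in a program rather than on screen, the output above is easy to parse. This sketch assumes the log has been captured into a string and that successful lines have exactly the form shown above. parseCycleCounts is a hypothetical helper, not part of SPREE.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: maps benchmark name -> cycle count for every line of
// the form "Simulating <name> ... Success! Cycle count=<n>".
// Lines that report an error are skipped.
std::map<std::string, long> parseCycleCounts(const std::string& log) {
    static const std::regex lineRe(
        R"(Simulating (\S+) \.\.\. Success! Cycle count=(\d+))");
    std::map<std::string, long> counts;
    std::istringstream in(log);
    std::string line;
    while (std::getline(in, line)) {
        std::smatch m;
        if (std::regex_search(line, m, lineRe))
            counts[m[1].str()] = std::stol(m[2].str());
    }
    return counts;
}
```

Skipping the error lines is deliberate: a cycle count from a run whose trace did not match says nothing about a working processor.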
[Figure: C source benchmarks -> Compiler (gcc - MIPS) -> Binary executable. The binary runs on the Mint MIPS simulator (simulator/run), whose trace is stored in applications/<benchmark name>/mint, and on the Verilog in the Modelsim Verilog simulator (spebenchmark), which produces the cycle count and the traces /tmp/modelsim_trace.txt and /tmp/modelsim_store_trace.txt. spebenchmark compares the traces.]
Choose the path to your compiler (prebuilt)
Default: /jayar/b/b0/yiannac/spe/compiler
GCC 3.3.3, software division
Another: /jayar/b/b0/yiannac/spe/compiler-softmul
GCC 3.3.3, software division and software multiplication
% specompiler /jayar/b/b0/yiannac/spe/compiler-softmul
specompiler will:
1. Compile all benchmarks (and store binaries)
2. Simulate all benchmarks (and store traces)
After this point, you can just run spebenchmark
Shows discrepancy between MINT and Modelsim
******* Benchmarking pipe3_serialshift ********
Simulating bubble_sort ... Error: Trace does not match, Cycle count=381
Discrepancy found at 6800000 ps
Modelsim: PC=04000064 | IR=24090001 | 05: 00000000
Mint: PC=040000b8 | IR=8c47004c | 07: 00000064
Clues to where the error occurred: each trace line shows the PC, the instruction (IR), the destination register, and the value being written
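The trace comparison itself amounts to finding the first entry where the two simulators disagree. The sketch below assumes each trace line has already been parsed into the four fields shown above. TraceEntry and firstDiscrepancy are hypothetical names, not the spebenchmark implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical parsed form of one trace line: "PC=... | IR=... | dst: value".
struct TraceEntry {
    std::uint32_t pc;     // program counter
    std::uint32_t ir;     // instruction word
    int dst;              // destination register number
    std::uint32_t value;  // value being written
};

bool sameEntry(const TraceEntry& a, const TraceEntry& b) {
    return a.pc == b.pc && a.ir == b.ir && a.dst == b.dst && a.value == b.value;
}

// Returns the index of the first entry that differs, or -1 if the traces
// are identical. If one trace is a strict prefix of the other, the index of
// the first missing entry is returned.
long firstDiscrepancy(const std::vector<TraceEntry>& reference,
                      const std::vector<TraceEntry>& candidate) {
    const std::size_t common = std::min(reference.size(), candidate.size());
    for (std::size_t i = 0; i < common; ++i)
        if (!sameEntry(reference[i], candidate[i])) return static_cast<long>(i);
    if (reference.size() != candidate.size()) return static_cast<long>(common);
    return -1;
}
```

Everything before the returned index executed correctly, so the bug is in how the processor handled the instruction at that index or the ones just before it.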
Can see any signal within the processor
% sim_gui bubble_sort pipe3_serialshift
LEARN IT!!!
Quartus Simulator is vastly inferior, and even unusable for our purposes
What is it?
The stimulus and monitor for your circuit
SPREE automatically generates it
And hence it works right away
Handcoding your own processor means you have to interface with the test bench
Once you have the testbench you can use spebenchmark
Manual Interfacing with the Testbench
Need only 6 wires
To track writes to register file and data mem
[Figure: test_bench.v connected to your soft processor through the wires regfile_we, regfile_dst, regfile_data, datamem_we, datamem_addr, datamem_data]
Input: Processor implementation
Syntax: specadflow <processor name>
Performs a “seed sweep”
Average several runs since results are noisy
Run several instances of quartus
Across several machines in parallel
Output:
Synthesis results (hidden)
Summary output
Started Tue 6:27PM, Waiting for processes:
10.0.0.61 10.0.0.57 10.0.0.56 10.0.0.55 10.0.0.54 10.0.0.51
Finished Tue 6:33PM
1081 Area (LEs or ALUTs)
75.7812 Clock Frequency (MHz)
0.99822 Estimated Energy/cycle dissipated (nJ/cycle)
... Waiting on eda writer
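The synthesis summary combines with the cycle counts from spebenchmark to give wall-clock time and energy. The sketch below shows the arithmetic only. The function names are hypothetical, and it assumes the cycle count and the synthesis numbers describe the same processor. The mean helper illustrates the averaging idea behind a seed sweep and is not the specadflow code.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Average of several runs, as a seed sweep does to smooth out noisy results.
double mean(const std::vector<double>& samples) {
    assert(!samples.empty());
    return std::accumulate(samples.begin(), samples.end(), 0.0) /
           static_cast<double>(samples.size());
}

// Cycles divided by MHz (cycles per microsecond) gives microseconds.
double runtimeMicroseconds(long cycles, double clockMHz) {
    assert(clockMHz > 0.0);
    return static_cast<double>(cycles) / clockMHz;
}

// Cycles times nJ/cycle gives total energy in nanojoules.
double energyNanojoules(long cycles, double nanojoulesPerCycle) {
    return static_cast<double>(cycles) * nanojoulesPerCycle;
}
```

For example, if the 2994-cycle bubble_sort run is paired with the 75.7812 MHz and 0.99822 nJ/cycle figures above, the result is roughly 39.5 microseconds and about 2989 nJ.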
Technical support, ask me
Copy and unpack the SPREE tarball:
/jayar/b/b0/yiannac/spree.tar.gz
Build all the SPREE software
% cd spree
% make
Follow instructions in INSTALL.txt
If there are any errors, email me
spree
  applications: Benchmarks (C source)
  compiler: binutils, gcc, newlib
  cpugen: the cpu generator + processor descriptions
  modelsim: Verilog simulator
  simulator: MIPS simulator
  quartus: synthesis
Choose the cluster you’re using
aenao: high performance, limited access
eecg: any eecg-connected machine
% specluster eecg
OR
% specluster aenao
Edit quartus/machines.txt
Put a list of 11 or so good eecg machines