SPREE Tutorial

Peter Yiannacouras

April 13, 2006

Processors on FPGAs

■ You have all used FPGAs (ECE241) to build:
  ■ Adders
  ■ 7-segment decoders
  ■ Etc.
■ We are putting whole microprocessors on them
■ We call these soft processors

Hard Versus Soft Processors

■ Hard processors
  ■ Made of transistors
  ■ Cost millions to make
  ■ Faster, smaller, and use less power
■ Soft processors
  ■ Written in an HDL, e.g., Verilog
  ■ Programmed onto the chip

Processors and FPGA Systems

■ FPGAs are a common platform for digital systems

[Figure: an FPGA system in which a soft processor sits alongside custom logic, a UART, a memory interface, and Ethernet]

■ The soft processor performs coordination and even computation
■ Better processors => less hardware to design

We aim to improve soft processors by customizing them.

Our Research Problem

■ Soft processors have worse:
  ■ Area
  ■ Speed
  ■ Power use
■ But they are flexible
  ■ Can we use that flexibility to counteract these drawbacks? How?
■ Customize the processor's architecture, e.g., Intel vs. AMD, or Motorola 68360 vs. 68010
  ■ How?

Research Goals

1. Understand tradeoffs in soft processors
   E.g., a hardware multiplier is big but performs multiplies quickly
2. Customize the processor to the application
   E.g., bubble sort doesn't use multiplies, so remove the hardware multiplier and save area

We developed SPREE, software that helps us do both.
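To make the bubble sort example concrete: deciding whether the multiplier can be removed amounts to checking whether the program's binary contains any multiply instruction. A minimal sketch, assuming the program text is already available as a vector of 32-bit MIPS I instruction words (loading them is left out). It relies on the MIPS I encoding, in which MULT and MULTU are R-type instructions (opcode 0) with function codes 0x18 and 0x19. The name `uses_multiply` is hypothetical, not part of SPREE.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of SPREE): returns true if any instruction
// word in the program is a MIPS I MULT or MULTU.
bool uses_multiply(const std::vector<std::uint32_t>& text) {
    for (std::uint32_t word : text) {
        std::uint32_t opcode = word >> 26;    // bits 31..26
        std::uint32_t funct  = word & 0x3Fu;  // bits 5..0
        // R-type instructions have opcode 0; MULT = 0x18, MULTU = 0x19
        if (opcode == 0 && (funct == 0x18 || funct == 0x19)) {
            return true;
        }
    }
    return false;
}
```

A program for which this returns false never issues a hardware multiply, so a processor built for it could omit the multiplier.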

SPREE System
(Soft Processor Rapid Exploration Environment)

■ Input: processor description (ISA and datapath)
■ SPREE system:
  1. Verify the ISA against the datapath
  2. Instantiate the datapath
  3. Generate the control logic
■ Output: synthesizable Verilog
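The first step, verifying the ISA against the datapath, can be pictured as a coverage check. The sketch below is a simplified, hypothetical model, not SPREE's implementation: each instruction is reduced to the set of GENOPs it needs, the datapath to the set of GENOPs its components can perform, and an instruction fails if any of its GENOPs is missing. A real check presumably also has to follow the data-flow edges between operations, which this ignores. `GenopSet`, `unsupported_instructions`, and the `MUL` GENOP name used in testing are assumptions.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model: an instruction is the set of GENOPs it
// needs, and the datapath is the set of GENOPs its components can perform.
using GenopSet = std::set<std::string>;

// Returns the names of instructions that need at least one GENOP the
// datapath does not provide (an empty result means the check passed).
std::vector<std::string> unsupported_instructions(
        const std::map<std::string, GenopSet>& isa,
        const GenopSet& datapath_ops) {
    std::vector<std::string> bad;
    for (const auto& [name, needed] : isa) {
        for (const std::string& op : needed) {
            if (datapath_ops.count(op) == 0) {
                bad.push_back(name);
                break;  // one missing GENOP is enough to reject it
            }
        }
    }
    return bad;
}
```

Removing a component from the datapath shrinks `datapath_ops`, and any instruction that depended on it shows up in the result.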

Input: Instruction Set Architecture (ISA) Description

■ Graph of Generic Operations (GENOPs)
■ Edges indicate the flow of data

MIPS ADD – add rd, rs, rt

         FETCH
        /     \
   RFREAD     RFREAD
        \     /
          ADD
           |
        RFWRITE

■ The ISA is currently fixed (a subset of MIPS I)
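The ADD graph above can be held as a small directed graph. This sketch assumes a hypothetical `GenopGraph` type (SPREE's own C++ classes for GENOP graphs are not shown in this tutorial) and uses Kahn's topological sort to list the GENOPs in an order that respects every data-flow edge, i.e., an order in which each operation's inputs are produced before it runs.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation of one instruction's GENOP graph:
// nodes are generic operations, edges carry data from producer to consumer.
struct GenopGraph {
    std::vector<std::string> ops;                            // node labels
    std::vector<std::pair<std::size_t, std::size_t>> edges;  // (from, to)
};

// Kahn's algorithm: returns the GENOPs in an order that respects every
// data-flow edge (each producer appears before its consumers).
std::vector<std::string> dataflow_order(const GenopGraph& g) {
    std::vector<int> indegree(g.ops.size(), 0);
    std::vector<std::vector<std::size_t>> succ(g.ops.size());
    for (auto [from, to] : g.edges) {
        succ[from].push_back(to);
        ++indegree[to];
    }
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < g.ops.size(); ++i)
        if (indegree[i] == 0) ready.push(i);
    std::vector<std::string> order;
    while (!ready.empty()) {
        std::size_t n = ready.front();
        ready.pop();
        order.push_back(g.ops[n]);
        for (std::size_t s : succ[n])
            if (--indegree[s] == 0) ready.push(s);
    }
    return order;  // shorter than ops.size() only if the graph has a cycle
}
```

For the ADD graph, with ops {FETCH, RFREAD, RFREAD, ADD, RFWRITE} and edges 0→1, 0→2, 1→3, 2→3, 3→4, `dataflow_order` returns FETCH, RFREAD, RFREAD, ADD, RFWRITE.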

Input: Datapath Description

■ Interconnection of hand-coded components
  ■ Allows efficient synthesis
■ Described using C++

[Figure: a datapath assembled from the SPREE Component Library: Ifetch, Reg file, ALU, Mul, Shifter, Data Mem, and Write Back]

Component Selection

■ Select components by name
  ■ Names are looked up in the library
  ■ The library is stored in cpugen/rtl_lib

RTLComponent *ifetch = new RTLComponent("ifetch");      // instruction fetch unit
RTLComponent *reg_file = new RTLComponent("reg_file");  // register file
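Selecting a component by name is essentially a dictionary lookup. A minimal sketch, assuming the library maps each component name to a hand-coded Verilog file under cpugen/rtl_lib; `ComponentLibrary` and the file names are hypothetical, not SPREE's actual RTLComponent machinery.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a component library: maps a component name to
// the hand-coded Verilog file that implements it. File names are
// illustrative, not the real contents of cpugen/rtl_lib.
class ComponentLibrary {
public:
    void add(const std::string& name, const std::string& verilog_file) {
        files_[name] = verilog_file;
    }
    // Looks a component up by name; throws if the library lacks it.
    const std::string& lookup(const std::string& name) const {
        auto it = files_.find(name);
        if (it == files_.end())
            throw std::out_of_range("no component named " + name);
        return it->second;
    }
private:
    std::map<std::string, std::string> files_;
};
```

For example, after `lib.add("ifetch", "cpugen/rtl_lib/ifetch.v")`, `lib.lookup("ifetch")` returns that path, while looking up a name that was never added throws `std::out_of_range`.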

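Datapath Wiring Example

[Figure: Ifetch (ports rd, rs, rt, offset), Regfile (ports dst, a_reg, a_data, b_reg, b_data, writedata), and ALU (ports opA, opB, result)]

proc.addConnection(ifetch, "rs", reg_file, "a_reg");  // rs field selects register-file read port A
proc.addConnection(ifetch, "rt", reg_file, "b_reg");  // rt field selects read port B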
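Each wiring call connects a named port of one component to a named port of another. The sketch below mimics the `addConnection(src, "port", dst, "port")` call shape from the example with hypothetical stand-in types (`Component`, `Connection`, `Datapath`); it records each wire and rejects port names a component does not have. Whether SPREE performs the same check is not stated in this tutorial.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins (not SPREE's classes) that mimic the wiring style
// proc.addConnection(src, "port", dst, "port").
struct Component {
    std::string name;
    std::set<std::string> ports;
};

struct Connection {
    const Component* src; std::string src_port;
    const Component* dst; std::string dst_port;
};

class Datapath {
public:
    // Records a wire after checking that both ports exist.
    void addConnection(const Component* src, const std::string& sport,
                       const Component* dst, const std::string& dport) {
        if (!src->ports.count(sport) || !dst->ports.count(dport))
            throw std::invalid_argument("unknown port");
        wires_.push_back({src, sport, dst, dport});
    }
    const std::vector<Connection>& wires() const { return wires_; }
private:
    std::vector<Connection> wires_;
};
```

Checking port names at connection time catches typos such as a misspelled `"a_reg"` before any Verilog is generated.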

SPREE System + Backend
(Soft Processor Rapid Exploration Environment)

■ Processor description → SPREE generator (spegen) → Verilog
■ Benchmarks are run on both the Mint MIPS simulator (simulator/run) and the Modelsim Verilog simulator (spebenchmark), and their traces are compared
■ Quartus II CAD software (specadflow) synthesizes the Verilog
■ Results:
  1. Area (Quartus II)
  2. Clock frequency (Quartus II)
  3. Power (Quartus II)
  4. Cycle count (Modelsim)

Walking through an Example (see README.txt)

■ Choose a pre-built processor
  ■ cpugen/src/arch lists all the processors
■ Let’s choose pipe3_serialshift
  ■ A 3-stage pipeline with a serial shifter

Using SPREE on a Processor

■ Generate, benchmark, synthesize

% spegen pipe3_serialshift         Generates Verilog
% spebenchmark pipe3_serialshift   Runs benchmarks
% specadflow pipe3_serialshift     Synthesizes the processor
% specompare pipe3_serialshift     Displays results

spegen – Generating Processors

■ Input: processor description
■ Syntax: spegen <processor name>
■ Output: a folder named after the processor, containing:
  ■ Hand-coded Verilog modules
  ■ system.v: generated hookup and control
  ■ OUT.cpugen: stages per instruction, hazard window/branch penalty
  ■ test_bench.v: test bench for Modelsim simulation

Benchmarking

■ Run programs on the processor
  ■ Measure the time taken until completion
  ■ Verify functionality
■ You can do this without knowing anything about the benchmarks themselves

spebenchmark – Benchmarking

■ Input: processor implementation
■ Syntax: spebenchmark <processor name>
■ Output (ideally): cycle counts of all benchmarks

******* Benchmarking pipe3_serialshift ********
Simulating bubble_sort ... Success! Cycle count=2994
Simulating crc ... Success! Cycle count=112750
Simulating des ... Success! Cycle count=5129
Simulating fft ... Success! Cycle count=5077
Simulating fir ... Success! Cycle count=1214
...

■ Traces: /tmp/modelsim_trace.txt
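Because every benchmark line has the same shape, the summary is easy to post-process, e.g., to tabulate cycle counts across processors. A minimal sketch that parses the line format shown in the sample output above with `std::regex`; `BenchResult` and `parse_result` are hypothetical names, and the pattern assumes the output looks exactly like the sample (including the failure form "Error: ..., Cycle count=N").

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// One line of the benchmark summary, e.g.
// "Simulating crc ... Success! Cycle count=112750"
struct BenchResult {
    std::string benchmark;
    bool passed;
    long cycles;
};

// Hypothetical parser for the sample line format; returns nothing for lines
// that do not match (banners, "...", and so on).
std::optional<BenchResult> parse_result(const std::string& line) {
    static const std::regex pat(
        R"(Simulating (\S+) \.\.\. (Success|Error).*Cycle count=(\d+))");
    std::smatch m;
    if (!std::regex_search(line, m, pat)) return std::nullopt;
    return BenchResult{m[1].str(), m[2].str() == "Success",
                       std::stol(m[3].str())};
}
```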

Benchmarking – under the hood

■ C source benchmarks → compiler (gcc - MIPS) → binary executable
■ The executable is run on:
  ■ The Mint MIPS simulator (simulator/run), which produces a trace in applications/<benchmark name>/mint
  ■ The Verilog processor in the Modelsim Verilog simulator (spebenchmark), which produces a trace and the cycle count in /tmp/modelsim_trace.txt and /tmp/modelsim_store_trace.txt
■ spebenchmark compares the two traces
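The "compare traces" step can be thought of as checking that the processor and Mint perform the same sequence of writes. A minimal sketch, assuming each trace has been read into a list of lines, one per logged write, and treating lines as opaque strings because the exact file format is not given here; `first_mismatch` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Returns the index of the first line where the two traces differ,
// the length of the shorter trace if one ends early, or nothing if the
// traces are identical.
std::optional<std::size_t> first_mismatch(const std::vector<std::string>& golden,
                                          const std::vector<std::string>& actual) {
    std::size_t n = std::min(golden.size(), actual.size());
    for (std::size_t i = 0; i < n; ++i)
        if (golden[i] != actual[i]) return i;
    if (golden.size() != actual.size()) return n;  // one trace ended early
    return std::nullopt;                           // traces agree
}
```

The first differing write is usually the most useful clue, since later mismatches tend to be knock-on effects of the first one.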

specompiler – Set up the compiler

■ Choose the path to your (prebuilt) compiler
  ■ Default: /jayar/b/b0/yiannac/spe/compiler
    GCC 3.3.3, software division
  ■ Another: /jayar/b/b0/yiannac/spe/compiler-softmul
    GCC 3.3.3, software division and software multiplication

% specompiler /jayar/b/b0/yiannac/spe/compiler-softmul

■ specompiler will:
  1. Compile all benchmarks (and store the binaries)
  2. Simulate all benchmarks (and store the traces)
■ After this point, you can just run spebenchmark
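The compiler-softmul option makes the compiler implement multiplication in software instead of with multiply instructions, which is what lets a processor drop its hardware multiplier. The routine this prebuilt compiler actually calls is not shown in this tutorial; the sketch below is the classic shift-and-add technique such routines commonly use, assuming 32-bit unsigned operands and keeping the low 32 bits of the product.

```cpp
#include <cassert>
#include <cstdint>

// Shift-and-add multiplication: one conditional add per set bit of b.
// Produces the low 32 bits of a*b, the same result as C's unsigned
// multiply, using only shifts, adds and bit tests.
std::uint32_t soft_mul(std::uint32_t a, std::uint32_t b) {
    std::uint32_t product = 0;
    while (b != 0) {
        if (b & 1u) product += a;  // add the shifted multiplicand
        a <<= 1;
        b >>= 1;
    }
    return product;
}
```

The loop runs once per bit of `b` up to its highest set bit, which is why software multiplication costs many more cycles than a hardware multiplier but needs no extra area.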

spebenchmark – failure

■ Shows the discrepancy between Mint and Modelsim

******* Benchmarking pipe3_serialshift ********
Simulating bubble_sort ... Error: Trace does not match, Cycle count=381
Discrepancy found at 6800000 ps
Modelsim: PC=04000064 | IR=24090001 | 05: 00000000
Mint: PC=040000b8 | IR=8c47004c | 07: 00000064

■ Clues to where the error occurred: the PC, the instruction register (IR), and the destination register and the value being written to it
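The IR values in the failure report can be decoded using the standard MIPS I encoding (opcode in bits 31..26, rs in 25..21, rt in 20..16, rd in 15..11 and funct in 5..0 for R-type, and a 16-bit immediate in 15..0 for I-type). A minimal sketch; `MipsFields` and `decode` are illustrative names, not part of SPREE.

```cpp
#include <cassert>
#include <cstdint>

// Field split of a 32-bit MIPS I instruction word (standard encoding).
struct MipsFields {
    std::uint32_t opcode;  // bits 31..26
    std::uint32_t rs;      // bits 25..21
    std::uint32_t rt;      // bits 20..16
    std::uint32_t rd;      // bits 15..11 (R-type only)
    std::uint32_t funct;   // bits 5..0   (R-type only)
    std::uint16_t imm;     // bits 15..0  (I-type only)
};

MipsFields decode(std::uint32_t ir) {
    return MipsFields{
        (ir >> 26) & 0x3Fu, (ir >> 21) & 0x1Fu, (ir >> 16) & 0x1Fu,
        (ir >> 11) & 0x1Fu, ir & 0x3Fu,
        static_cast<std::uint16_t>(ir & 0xFFFFu)};
}
```

Applied to the report above, Modelsim's IR 24090001 decodes to opcode 0x09 (ADDIU) with rs=0, rt=9, imm=1, i.e., addiu $9, $0, 1, and Mint's IR 8c47004c decodes to opcode 0x23 (LW) with rs=2, rt=7, imm=0x4c, i.e., lw $7, 76($2), whose destination matches the "07:" that Mint reports.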

spebenchmark – waveforms

■ You can see any signal within the processor

% sim_gui bubble_sort pipe3_serialshift

Modelsim

■ LEARN IT!
■ The Quartus simulator is vastly inferior, even unusable for our purposes

The Testbench (test_bench.v)

■ What is it?
  ■ The stimulus and monitor for your circuit
■ SPREE generates it automatically
  ■ Hence it works right away
■ Hand-coding your own processor means
  ■ You have to interface with the testbench yourself
  ■ Once you have done so, you can use spebenchmark

Manual Interfacing with the Testbench

■ Only 6 wires are needed, to track writes to the register file and data memory
  ■ Register file: regfile_we, regfile_dst, regfile_data
  ■ Data memory: datamem_we, datamem_addr, datamem_data

[Figure: your soft processor driving these six wires into test_bench.v]
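Conceptually, the testbench samples these six wires every clock cycle and logs a line whenever a write-enable is high, one log for register writes and one for stores (presumably what ends up in /tmp/modelsim_trace.txt and /tmp/modelsim_store_trace.txt). The sketch below models that in C++ rather than Verilog; `CycleSample` and `log_cycle` are hypothetical, the register line imitates the "05: 00000000" style of the failure report, and the store line format is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Values of the six monitored wires in one clock cycle.
struct CycleSample {
    bool regfile_we;
    std::uint32_t regfile_dst, regfile_data;
    bool datamem_we;
    std::uint32_t datamem_addr, datamem_data;
};

// Hypothetical logger: appends one line per register write and one per
// store, in hexadecimal.
void log_cycle(const CycleSample& s, std::vector<std::string>& reg_trace,
               std::vector<std::string>& store_trace) {
    char buf[40];
    if (s.regfile_we) {
        std::snprintf(buf, sizeof buf, "%02u: %08x",
                      static_cast<unsigned>(s.regfile_dst),
                      static_cast<unsigned>(s.regfile_data));
        reg_trace.push_back(buf);
    }
    if (s.datamem_we) {
        std::snprintf(buf, sizeof buf, "%08x: %08x",
                      static_cast<unsigned>(s.datamem_addr),
                      static_cast<unsigned>(s.datamem_data));
        store_trace.push_back(buf);
    }
}
```

Only writes are logged, which is why six wires suffice: any functional bug eventually shows up as a wrong value written to a register or to memory.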

specadflow – Synthesis

■ Input: processor implementation
■ Syntax: specadflow <processor name>
■ Performs a “seed sweep”
  ■ Averages several runs, since results are noisy
  ■ Runs several instances of Quartus across several machines in parallel
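Quartus results vary with the placement seed, so a single run can be misleading. A minimal sketch of the averaging step, assuming a plain arithmetic mean of the clock frequency over the seeds; `SeedRun` and `mean_fmax` are hypothetical, and specadflow may combine runs differently.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Result of one Quartus run with a particular placement seed.
struct SeedRun {
    int seed;
    double fmax_mhz;
};

// Averages fmax over all seeds of a sweep (requires at least one run).
double mean_fmax(const std::vector<SeedRun>& runs) {
    assert(!runs.empty());
    double sum = std::accumulate(runs.begin(), runs.end(), 0.0,
        [](double acc, const SeedRun& r) { return acc + r.fmax_mhz; });
    return sum / runs.size();
}
```

The same averaging applies to area and energy; only the field being summed changes.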

specadflow Output

■ Output:
  ■ Synthesis results (hidden)
  ■ Summary output:

Started Tue 6:27PM, Waiting for processes:
10.0.0.61 10.0.0.57 10.0.0.56 10.0.0.55 10.0.0.54 10.0.0.51
Finished Tue 6:33PM
1081       Area (LEs or ALUTs)
75.7812    Clock Frequency (MHz)
0.99822    Estimated Energy/cycle dissipated (nJ/cycle)
... Waiting on eda writer
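The cycle counts from spebenchmark and the clock frequency and energy from specadflow combine into wall-clock time and average power: time = cycles / f, and power = (energy per cycle) × f. With the units used in the output (MHz and nJ/cycle), the results come out directly in microseconds and milliwatts. The function names are illustrative.

```cpp
#include <cassert>

// Wall-clock run time in microseconds: cycles / f, with f in MHz
// (a frequency in MHz is a number of cycles per microsecond).
double runtime_us(double cycles, double fmax_mhz) {
    return cycles / fmax_mhz;
}

// Average power in milliwatts: energy per cycle times cycles per second.
// nJ/cycle * MHz = 1e-9 J * 1e6 /s = 1e-3 W, so the product is in mW.
double power_mw(double nj_per_cycle, double fmax_mhz) {
    return nj_per_cycle * fmax_mhz;
}
```

Assuming the 2994-cycle bubble_sort run and this synthesis summary describe the same processor, bubble_sort would take about 39.5 µs at 75.78 MHz, and the processor would dissipate about 75.6 mW.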

Any Questions?

■ For technical support, ask me

EXTRAS

Setup/Install

■ Copy and unpack the SPREE tarball:
  /jayar/b/b0/yiannac/spree.tar.gz
■ Build all the SPREE software:

% cd spree
% make

■ Follow the instructions in INSTALL.txt
■ If there are any errors, email me

SPREE Directory Structure

spree/
  applications/   benchmarks (C source)
  compiler/       binutils, gcc, newlib
  cpugen/         the CPU generator + processor descriptions
  modelsim/       Verilog simulator
  simulator/      MIPS simulator
  quartus/        synthesis

Setup cluster

■ Choose the cluster you’re using:
  ■ aenao – high performance, limited access
  ■ eecg – any eecg-connected machine

% specluster eecg

OR

% specluster aenao

■ Edit quartus/machines.txt
  ■ Put in a list of 11 or so good eecg machines
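Spreading a seed sweep over the machines listed in quartus/machines.txt can be pictured as a round-robin assignment. This sketch assumes the file holds one hostname per line (the format is not specified here) and skips empty lines; `read_machines` and `assign_seeds` are hypothetical names, not the actual specadflow logic.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Reads hostnames, one per line, skipping empty lines. The one-per-line
// format is an assumption about quartus/machines.txt.
std::vector<std::string> read_machines(std::istream& in) {
    std::vector<std::string> hosts;
    std::string line;
    while (std::getline(in, line))
        if (!line.empty()) hosts.push_back(line);
    return hosts;
}

// Deals seeds out to machines round-robin: seed i goes to host i % N.
// Returns (host, seed) pairs; empty if there are no hosts.
std::vector<std::pair<std::string, int>> assign_seeds(
        const std::vector<std::string>& hosts, int num_seeds) {
    std::vector<std::pair<std::string, int>> jobs;
    for (int seed = 0; seed < num_seeds && !hosts.empty(); ++seed)
        jobs.emplace_back(hosts[static_cast<std::size_t>(seed) % hosts.size()], seed);
    return jobs;
}
```

With more machines than seeds, every run gets its own machine, which is one reason to list 11 or so.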
