Liquid Architecture Microarchitecture Optimization for Embedded Systems

advertisement
Liquid Architecture
Microarchitecture Optimization
for Embedded Systems
D. Schuehler, B. Brodie, R. Chamberlain, R. Cytron,
S. Friedman, J. Fritts, P. Jones, P. Krishnamurthy,
J. Lockwood, S. Padmanabhan, and H. Zhang
Dept. of Computer Science and Engineering
Washington University in St. Louis
Supported by NSF ITR-0313203
Liquid Architecture
• Configurable architecture that can adapt
to needs of particular application
• E.g., within an FPGA
– Soft-core processors
• E.g., as an embedded processor
– Tensilica supports configuration at fab time
– Stretch support configuration at run time
• Today’s discussion is on performance
analysis and configuration choice
Block Diagram
Event Bus
Statistics
Module
LEON
SPARCcompatible
processor
External
Memory
I-Cache D-Cache
AHB
LED UART
Memory
Controller
`
Network
Interface
`
Control
Packet
Processor
FPX
FPGA
APB
Adapter
Layered Internet Protocol Wrappers
Boot
Rom
Microarchitecture Configurability
• Instruction set
• Memory subsystem
– Cache size (I and D)
– Associativity
– Cache line size
• Co-processor(s)
• Instruction pipeline
• Full HDL source is available
Design Flow
Internet
Write and compile
embedded SPARC
application with GCC
Identify
configuration for
candidate
architecture
Reconfigure FPX
hardware via Internet and
upload system software.
Execute program
on FPX Platform
and measure runtime performance
Method
Time /
Cycles
Cycle-accurate profiling
.text
main
addQuery
findMatch
computeKey
computeBase
computeStep
fillQuery
Rnd
• Choose methods to profile from the
user interface
Method
Address
Range
.text
main
Lo
addQuery
0x4000027C
0x400003EF
Hi
findMatch
computeKey
computeBase
computeStep
fillQuery
Rnd
Event Bus
Method
PC
Statistics Module
.text
0x4000035A
main
Lo
addQuery
0x4000027C
0x400003EF
Hi
findMatch
computeKey
computeBase
computeStep
fillQuery
Rnd
CLK
Event Bus
Function
PC
.text
Statistics Module
Lo
main
0x4000027C
≤
0x4000035A
≤
addQuery
findMatch
computeKey
computeBase
computeStep
fillQuery
Rnd
INCR
Counter
0x400003EF
Hi
CLK
Event Bus
Function
PC
.text
Statistics Module
Lo
main
0x4000027C
≤
0x4000035A
≤
0x400003EF
Hi
addQuery
INCR
findMatch
Counter
computeKey
computeBase
Lo
0x400005D8
≤
0x4000035A
≤
computeStep
fillQuery
Rnd
INCR
Counter
0x4000061F
Hi
CLK
Event Bus
PC
Statistics Module
Lo
0x4000027C
≤
0x4000035A
INCR
≤
0x400003EF
Hi
Counter
To User
Lo
0x400005D8
≤
0x4000035A
INCR
≤
Counter
0x4000061F
Hi
CLK
Where is time spent?
100%
90%
% of total runtime
80%
Rest
coreLoop
findMatch
70%
60%
BLASTN
biosequence
search
application
50%
40%
30%
20%
10%
0%
128K
32K
Size of hash table (Bytes)
Function
Time /
Cycles
Cache Hits / Misses
Read
Write
.text
main
addQuery
findMatch
computeKey
computeBase
computeStep
fillQuery
Rnd
Expand to
measure cache
hits/misses
Measure Several Configurations
Impact of D-cache Configuration
100
Total
findMatch
coreLoop
98
hit rate (%)
96
BLASTN
biosequence
search
application
94
92
90
88
86
128K, 1Kx1
128K,
32Kx1
128K,
16Kx2
32K, 1Kx1 32K, 32Kx1 32K, 16Kx2
Size of hash table, D-cache configuration
Impact of I-cache Configuration
35
Run time (secs)
30
1KB I-Cache
4KB I-Cache
25
BLASTN
biosequence
search
application
20
15
10
5
0
128K
32K
BLASTN hash table sizes (Bytes)
Function
.text
main
addQuery
findMatch
computeKey
computeBase
computeStep
fillQuery
Rnd
Time /
Cycles
Cache Hits / Misses
Read
Write
Pipeline
Stalls
Branch
Predict
Time for Single Run
100000
80000
10000
Time (sec)
1800
1000
100
10
1
SimpleScalar 3.0
LEON
Almost 2
orders of
magnitude
faster than
simulation
Implications of Slow Simulation
• Focus has historically been on measuring
the performance of a single thread of a
single application
• Real apps are often executed in a
multitasking environment
– Impacts cache behavior
– Ignores OS (system call) performance
• Liquid architecture system enables direct
measurement, including OS
OS Boot Sequence
Summary
• Run-time reconfigurable processors will be
available sooner rather than later
• Determining desired configuration is a
difficult design task
– Large search space
– Depends on accurate performance data
• Liquid architecture system enables direct
measurement of performance properties
Current and Future Work
• Evaluation of several arch. design ideas
• Automated search of the design space
• Characterizing performance analysis
methods
– Analytic models
– Simulation models
– Direct execution models
• Usable as is for evaluating soft-core procs
• Like to extend to higher-speed procs
Download