Chapter 7, part 3: Hardware/Software Co-Design
High Performance Embedded Computing
Wayne Wolf
© 2007 Elsevier
Topics





Multi-objective optimization.
Co-synthesis for control.
Co-synthesis for caches.
Co-synthesis for reconfigurable platforms.
Hardware/software co-simulation.
© 2006 Elsevier
Multi-objective optimization


Operations research provides notions for
optimization functions with multiple
objectives.
Pareto optimality: a solution is optimal when no
objective can be improved without making some
other objective worse.
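
A minimal sketch of Pareto filtering, assuming every objective is a cost to be minimized. The names dominates, pareto_front, and the sample design points are illustrative and do not come from the slides.

```python
from typing import List, Tuple

# One cost per objective, e.g. (price, power). Lower is better (assumption).
Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    better = any(x < y for x, y in zip(a, b))
    return no_worse and better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical design points: (cost, power).
designs = [(10.0, 5.0), (8.0, 7.0), (12.0, 4.0), (11.0, 6.0), (9.0, 9.0)]
front = pareto_front(designs)
print(front)  # [(10.0, 5.0), (8.0, 7.0), (12.0, 4.0)]
```

Here (11, 6) drops out because (10, 5) is better on both objectives, and (9, 9) drops out because (8, 7) is.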
GOPS



Feasibility factor
computed from
throughput factors.
Upper-bound
throughput for RMS:
Upper-bound
throughput for EDF:
Upper bound feasibility

Upper-bound feasibility
tests:
Lower bound feasibility test

Lower bound:
Feasibility factor


Feasibility factor λ_P:
Use feasibility factor to prune the search space and
as an optimization objective.
Genetic algorithms

Modeled as:



Genes = strings of symbols.
Mutations = changes to strings.
Types of moves:



Reproduction makes a copy of a string.
Mutation changes a string.
Crossover interchanges parts of two strings.
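
The three moves can be sketched on plain strings. This is an illustration only: the symbol set ALPHABET and the single-point form of crossover are assumptions, not details given on the slide.

```python
import random
from typing import Tuple

ALPHABET = "ABCD"  # hypothetical symbol set, e.g. one letter per PE type


def reproduce(s: str) -> str:
    """Reproduction: the offspring is a copy of the parent string."""
    return str(s)


def mutate(s: str, rng: random.Random) -> str:
    """Mutation: replace one randomly chosen symbol with a different one."""
    i = rng.randrange(len(s))
    choices = [c for c in ALPHABET if c != s[i]]
    return s[:i] + rng.choice(choices) + s[i + 1:]


def crossover(a: str, b: str, point: int) -> Tuple[str, str]:
    """Single-point crossover: swap the tails of two strings at `point`."""
    return a[:point] + b[point:], b[:point] + a[point:]


rng = random.Random(0)  # fixed seed so the run is repeatable
parent = "AABB"
child = mutate(parent, rng)
print(reproduce(parent), child, crossover("AAAA", "BBBB", 1))
```

Because Python strings are immutable, each move returns a new string and leaves its parents unchanged.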
MOGAC


Technology tables characterize hardware
components.
Genetic model:




Processing element allocation string lists all PEs
and types.
Task allocation string shows assignment of tasks
to PEs.
Link allocation string maps communication to links.
IC allocation string maps tasks to chips.
MOGAC optimization procedure





Forms initial solution.
Repeats
evolve/evaluate cycle.
Evaluation determines
noninferior solutions.
Some noninferior
solutions may not
survive evolution.
Clusters solutions to
reduce run time.
[Dic98] © 1998 IEEE
MOGAC constraints



nis(x): noninferior solutions in x.
dom(a,b) = 1 if a is not dominated by b.
Cluster rank:
Energy-aware task scheduling


Yang et al. schedule multiprocessors for
energy.
Combine design-time and runtime:


At design time, scheduler evaluates
scheduling/allocation choices; optimizes with
genetic algorithms; generates table.
At run time, heuristics use the table to choose
best scheduling/allocation pattern.
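
A rough sketch of the design-time/run-time split. The table contents, the Option fields, and the selection rule (lowest energy that meets the time budget) are assumptions made for illustration; they are not Yang et al.'s actual heuristics.

```python
from typing import Dict, List, NamedTuple, Optional


class Option(NamedTuple):
    name: str         # scheduling/allocation pattern (hypothetical labels)
    time_ms: float    # execution time under this pattern
    energy_mj: float  # energy consumed under this pattern


# Built at design time, e.g. from the noninferior points found by a GA.
DESIGN_TIME_TABLE: Dict[str, List[Option]] = {
    "decode": [
        Option("1 PE, low voltage", 40.0, 12.0),
        Option("2 PEs, low voltage", 25.0, 18.0),
        Option("2 PEs, high voltage", 15.0, 30.0),
    ],
}


def choose(task: str, budget_ms: float) -> Optional[Option]:
    """Run-time step: cheapest-energy pattern that fits the time budget."""
    feasible = [o for o in DESIGN_TIME_TABLE[task] if o.time_ms <= budget_ms]
    if not feasible:
        return None  # no stored pattern meets the budget
    return min(feasible, key=lambda o: o.energy_mj)


print(choose("decode", 30.0))  # picks "2 PEs, low voltage"
```

The expensive search happens once, offline; the run-time step is only a table scan.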
Co-synthesis for wireless


Wireless systems are bandwidth and energy
limited.
COWLS uses parallel recombinative
simulated annealing.


Ranked by communication time, computation
time, utilization.
Scheduling influences both power
consumption and timing.

Slack determines idle time.
Control and I/O synthesis




Control finite-state machine (CFSM) model
describes control-dominated systems.
Event-driven model.
Finite, non-zero, unbounded reaction times.
Implementations:



Hardware is logic guarded by latches.
Software is synthesized from an s-graph that models the
control flow graph.
Can be used as an intermediate representation for
Esterel, etc.
Modal process model

Chou et al. use modal models:


I/O behavior depends on current mode and on
inputs.
Abstract control types define control
operations with known properties.
Interface synthesis

Chou et al. represent I/O as control flow
graphs.


Generate tasks, allocate I/O ports, split wide-word
operations, use memory-mapped I/O where
necessary, generate I/O sequencer.
Daveau et al. synthesize communication by
allocating operations to units in a library.

Communication unit must provide required
services, use the right protocol, and run at the
required data rate.
Cache modeling for co-synthesis


Cache state affects
task execution time.
Li and Wolf used a two-state model for
processes in cache:



One time if in cache.
Another time if not in
cache.
This model is more
abstract than cache line
model.
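
A small sketch of how a two-state model feeds schedule evaluation. The per-process times, the cache capacity counted in whole processes, and the LRU eviction rule are all assumptions made for this example; the slide specifies only the two execution times.

```python
from collections import OrderedDict
from typing import Dict, List, Tuple

# process -> (time if in cache, time if not in cache); hypothetical numbers.
TIMES: Dict[str, Tuple[int, int]] = {
    "P1": (10, 25),
    "P2": (8, 20),
    "P3": (5, 12),
}


def total_time(order: List[str], capacity: int) -> int:
    """Sum execution times for a run order under the two-state model.

    The cache holds at most `capacity` processes and evicts the least
    recently run one (LRU is an assumption of this sketch).
    """
    cache: "OrderedDict[str, None]" = OrderedDict()
    total = 0
    for p in order:
        in_cache_time, not_in_cache_time = TIMES[p]
        if p in cache:
            total += in_cache_time
            cache.move_to_end(p)  # mark as most recently run
        else:
            total += not_in_cache_time
            cache[p] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently run
    return total


order = ["P1", "P2", "P1", "P3", "P2"]
print(total_time(order, capacity=2), total_time(order, capacity=3))  # 87 75
```

The same run order costs 87 time units with room for two processes and 75 with room for three, which is the kind of difference a cache-aware scheduler can exploit.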
[Li99] © 1999 IEEE
Co-synthesis with caches

System cost:

Hierarchical scheduling
algorithm:



Schedule tasks (each made up of one or more
processes) over the hyperperiod.
Refine schedule by moving
processes within a task.
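
For reference, the hyperperiod is the least common multiple of the task periods, so one hyperperiod covers every distinct alignment of task releases. A short sketch with made-up integer periods:

```python
from functools import reduce
from math import gcd
from typing import List


def hyperperiod(periods: List[int]) -> int:
    """Least common multiple of all task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods, 1)


def release_times(period: int, hp: int) -> List[int]:
    """Release times of one periodic task within a single hyperperiod."""
    return list(range(0, hp, period))


periods = [4, 6, 10]      # hypothetical task periods in ms
hp = hyperperiod(periods)
print(hp)                 # 60
print(release_times(20, hp))  # [0, 20, 40]
```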
Dynamic urgency models
how process uses cache:
Wuytack et al.

Methodology for dynamic memory management:
1. Define application using abstract data types.
2. Refine ADTs into concrete data structures.
3. Virtual memory divided among several memory
managers.
4. Split virtual memory segments into groups to
parallelize data accesses.
5. Order background memory accesses to optimize
bandwidth.
6. Allocate physical memories.
Co-synthesis for reconfigurable systems


FPGA fabric can hold
different accelerators at
different times.
Combinations of
accelerators may be
limited.


Must take floorplan into
account.
Schedule must take
reconfiguration time,
energy into account.
CORDS

CORDS uses evolutionary algorithms similar
to MOGAC.



Adds reconfiguration delay to costs based on
current schedule state.
Dynamic priority of task depends on slack +
reconfiguration delay.
Increases dynamic priority of tasks with low
reconfiguration time to group together several
reconfigurations and save energy.
Nimble




Performs fine-grained
partitioning for instruction-level parallelism.
Platform described in
architecture description
language.
Program represented as
control flow graph.
Selects interesting parts of
loops by analyzing control
dependence graph.
[Li00] © 2000 IEEE
Hardware/software co-simulation



Must connect models with
different models of
computation, different time
scales.
Simulation backplane
manages communication.
Becker et al. used PLI in
Verilog-XL to add C code
that communicates with
software models, and UNIX
networking to connect the
hardware simulator.
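
A toy sketch of the backplane idea: two models with different time steps, kept roughly in step by always advancing whichever is furthest behind, with timestamped messages routed between them. The Model and Backplane classes are hypothetical and do not reflect Becker et al.'s PLI mechanism or any commercial tool.

```python
from typing import Dict, List, Tuple

Message = Tuple[int, str, str]  # (timestamp_ns, destination, payload)


class Model:
    """A simulator with its own time step that sends one message per step."""

    def __init__(self, name: str, step_ns: int, peer: str) -> None:
        self.name, self.step_ns, self.peer = name, step_ns, peer
        self.now_ns = 0
        self.pending: List[Tuple[int, str]] = []   # not yet due
        self.received: List[Tuple[int, str]] = []  # (local time, payload)

    def advance(self) -> List[Message]:
        self.now_ns += self.step_ns
        # Consume only messages whose timestamp has been reached locally.
        due = [m for m in self.pending if m[0] <= self.now_ns]
        self.pending = [m for m in self.pending if m[0] > self.now_ns]
        self.received.extend((self.now_ns, payload) for _, payload in due)
        return [(self.now_ns, self.peer, f"{self.name}@{self.now_ns}")]


class Backplane:
    """Advances the model that is furthest behind and routes its messages."""

    def __init__(self, models: List[Model]) -> None:
        self.models: Dict[str, Model] = {m.name: m for m in models}

    def run(self, until_ns: int) -> None:
        while True:
            model = min(self.models.values(), key=lambda m: m.now_ns)
            if model.now_ns >= until_ns:
                return
            for stamp, dest, payload in model.advance():
                self.models[dest].pending.append((stamp, payload))


hw = Model("hw", step_ns=10, peer="sw")  # fine-grained hardware model
sw = Model("sw", step_ns=25, peer="hw")  # coarser software model
Backplane([hw, sw]).run(until_ns=50)
print(hw.received)  # [(30, 'sw@25'), (50, 'sw@50')]
```

In this scheme a message is never seen before its timestamp, but the coarser model may see it late, at its next step. Tighter coupling would trade simulation speed for accuracy.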
Mentor Graphics Seamless




Hardware modules described using standard
HDLs.
Software can be loaded as C or binary.
Bus interface module connects hardware
models to processor instruction set simulator.
Coherent memory server manages shared
memory.