Computers as Components 2 nd ed.

advertisement
Introduction
What are embedded computing systems?
Challenges in embedded computing
system design.
Design methodologies.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Definition
Embedded computing system: any device
that includes a programmable computer
but is not itself a general-purpose
computer.
Take advantage of application
characteristics to optimize the design:
don’t need all the general-purpose bells and
whistles.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Embedding a computer
CPU
embedded
computer
© 2008 Wayne Wolf
output
analog
input
analog
mem
Overheads for Computers as
Components
Examples
Cell phone.
Printer.
Automobile: engine, brakes, dash, etc.
Airplane: engine, flight controls,
nav/comm.
Digital television.
Household appliances.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Early history
Late 1940’s: MIT Whirlwind computer was
designed for real-time operations.
Originally designed to control an aircraft
simulator.
First microprocessor was Intel 4004 in
early 1970’s.
HP-35 calculator used several chips to
implement a microprocessor in 1972.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Early history, cont’d.
Automobiles used microprocessor-based
engine controllers starting in 1970’s.
Control fuel/air mixture, engine timing, etc.
Multiple modes of operation: warm-up,
cruise, hill climbing, etc.
Provides lower emissions, better fuel
efficiency.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Microprocessor varieties
Microcontroller: includes I/O devices, onboard memory.
Digital signal processor (DSP):
microprocessor optimized for digital signal
processing.
Typical embedded word sizes: 8-bit, 16bit, 32-bit.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Application examples
Simple control: front panel of microwave
oven, etc.
Canon EOS 3 has three microprocessors.
32-bit RISC CPU runs autofocus and eye
control systems.
Digital TV: programmable CPUs +
hardwired logic for video/audio decode,
menus, etc.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Automotive embedded
systems
Today’s high-end automobile may have
100 microprocessors:
4-bit microcontroller checks seat belt;
microcontrollers run dashboard devices;
16/32-bit microprocessor controls engine.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
BMW 850i brake and
stability control system
Anti-lock brake system (ABS): pumps
brakes to reduce skidding.
Automatic stability control (ASC+T):
controls engine to improve stability.
ABS and ASC+T communicate.
ABS was introduced first---needed to
interface to existing ABS module.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
BMW 850i, cont’d.
sensor
sensor
brake
brake
ABS
hydraulic
pump
brake
brake
sensor
sensor
© 2008 Wayne Wolf
Overheads for Computers as
Components
Characteristics of
embedded systems
Sophisticated functionality.
Real-time operation.
Low manufacturing cost.
Low power.
Designed to tight deadlines by small
teams.
© 2008 Wayne Wolf
Overheads for Computers as
Components
Functional complexity
Often have to run sophisticated
algorithms or multiple algorithms.
Cell phone, laser printer.
Often provide sophisticated user
interfaces.
© 2008 Wayne Wolf
Overheads for Computers as
Components
Real-time operation
Must finish operations by deadlines.
Hard real time: missing deadline causes
failure.
Soft real time: missing deadline results in
degraded performance.
Many systems are multi-rate: must handle
operations at widely varying rates.
© 2008 Wayne Wolf
Overheads for Computers as
Components
Non-functional
requirements
Many embedded systems are massmarket items that must have low
manufacturing costs.
Limited memory, microprocessor power, etc.
Power consumption is critical in batterypowered devices.
Excessive power consumption increases
system cost even in wall-powered devices.
© 2008 Wayne Wolf
Overheads for Computers as
Components
Design teams
Often designed by a small team of
designers.
Often must meet tight deadlines.
6 month market window is common.
Can’t miss back-to-school window for
calculator.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Why use microprocessors?
Alternatives: field-programmable gate
arrays (FPGAs), custom logic, etc.
Microprocessors are often very efficient:
can use same logic to perform many
different functions.
Microprocessors simplify the design of
families of products.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
The performance paradox
Microprocessors use much more logic to
implement a function than does custom
logic.
But microprocessors are often at least as
fast:
heavily pipelined;
large design teams;
aggressive VLSI technology.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Power
Custom logic uses less power, but CPUs have
advantages:
Modern microprocessors offer features to
help control power consumption.
Software design techniques can help reduce
power consumption.
Heterogeneous systems: some custom logic for
well-defined functions, CPUs+software for
everything else.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Platforms
Embedded computing platform: hardware
architecture + associated software.
Many platforms are multiprocessors.
Examples:
Single-chip multiprocessors for cell phone
baseband.
Automotive network + processors.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
The physics of software
Computing is a physical act.
Software doesn’t do anything without
hardware.
Executing software consumes energy,
requires time.
To understand the dynamics of software
(time, energy), we need to characterize
the platform on which the software runs.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
What does “performance”
mean?
In general-purpose computing,
performance often means average-case,
may not be well-defined.
In real-time systems, performance means
meeting deadlines.
Missing the deadline by even a little is bad.
Finishing ahead of the deadline may not
help.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Characterizing
performance
We need to analyze the system at several
levels of abstraction to understand
performance:
CPU.
Platform.
Program.
Task.
Multiprocessor.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Challenges in embedded
system design
How much hardware do we need?
How big is the CPU? Memory?
How do we meet our deadlines?
Faster hardware or cleverer software?
How do we minimize power?
Turn off unnecessary logic? Reduce memory
accesses?
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Challenges, etc.
Does it really work?
Is the specification correct?
Does the implementation meet the spec?
How do we test for real-time characteristics?
How do we test on real data?
How do we work on the system?
Observability, controllability?
What is our development platform?
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Design methodologies
A procedure for designing a system.
Understanding your methodology helps
you ensure you didn’t skip anything.
Compilers, software engineering tools,
computer-aided design (CAD) tools, etc.,
can be used to:
help automate methodology steps;
keep track of the methodology itself.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Design goals
Performance.
Overall speed, deadlines.
Functionality and user interface.
Manufacturing cost.
Power consumption.
Other requirements (physical size, etc.)
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Levels of abstraction
requirements
specification
architecture
component
design
system
integration
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Top-down vs. bottom-up
Top-down design:
start from most abstract description;
work to most detailed.
Bottom-up design:
work from small components to big system.
Real design uses both techniques.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Stepwise refinement
At each level of abstraction, we must:
analyze the design to determine
characteristics of the current state of the
design;
refine the design to add detail.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Requirements
Plain language description of what the
user wants and expects to get.
May be developed in several ways:
talking directly to customers;
talking to marketing representatives;
providing prototypes to users for comment.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Functional vs. nonfunctional requirements
Functional requirements:
output as a function of input.
Non-functional requirements:
time required to compute output;
size, weight, etc.;
power consumption;
reliability;
etc.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Our requirements form
name
purpose
inputs
outputs
functions
performance
manufacturing cost
power
physical size/weight
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Example: GPS moving map
requirements
Moving map
obtains position
from GPS, paints
map from local
database.
Scotch Road
I-78
lat: 40 13 lon: 32 19
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
GPS moving map needs
Functionality: For automotive use. Show major
roads and landmarks.
User interface: At least 400 x 600 pixel screen.
Three buttons max. Pop-up menu.
Performance: Map should scroll smoothly. No
more than 1 sec power-up. Lock onto GPS
within 15 seconds.
Cost: $120 street price = approx. $30 cost of
goods sold.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
GPS moving map needs,
cont’d.
Physical size/weight: Should fit in hand.
Power consumption: Should run for 8
hours on four AA batteries.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
GPS moving map
requirements form
name
purpose
inputs
outputs
functions
performance
manufacturing cost
power
physical size/weight
© 2008 Wayne Wolf
GPS moving map
consumer-grade
moving map for driving
power button, two
control buttons
back-lit LCD 400 X 600
5-receiver GPS; three
resolutions; displays
current lat/lon
updates screen within
0.25 sec of movement
$100 cost-of-goodssold
100 mW
no more than 2: X 6:,
12 oz.
Overheads for Computers as
Components, 2nd ed.
Specification
A more precise description of the system:
should not imply a particular architecture;
provides input to the architecture design
process.
May include functional and non-functional
elements.
May be executable or may be in
mathematical form for proofs.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
GPS specification
Should include:
What is received from GPS;
map data;
user interface;
operations required to satisfy user requests;
background operations needed to keep the
system running.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Architecture design
What major components go satisfying the
specification?
Hardware components:
CPUs, peripherals, etc.
Software components:
major programs and their operations.
Must take into account functional and
non-functional specifications.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
GPS moving map block
diagram
GPS
receiver
search
engine
database
© 2008 Wayne Wolf
renderer
user
interface
Overheads for Computers as
Components, 2nd ed.
display
GPS moving map hardware
architecture
display
frame
buffer
CPU
GPS
receiver
memory
© 2008 Wayne Wolf
panel I/O
Overheads for Computers as
Components, 2nd ed.
GPS moving map software
architecture
position
© 2008 Wayne Wolf
database
search
renderer
user
interface
timer
Overheads for Computers as
Components, 2nd ed.
pixels
Designing hardware and
software components
Must spend time architecting the system
before you start coding.
Some components are ready-made, some
can be modified from existing designs,
others must be designed from scratch.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
System integration
Put together the components.
Many bugs appear only at this stage.
Have a plan for integrating components to
uncover bugs quickly, test as much
functionality as early as possible.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Summary
Embedded computers are all around us.
Many systems have complex embedded
hardware and software.
Embedded systems pose many design
challenges: design time, deadlines, power,
etc.
Design methodologies help us manage
the design process.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Introduction
Object-oriented design.
Unified Modeling Language (UML).
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
System modeling
Need languages to describe systems:
useful across several levels of abstraction;
understandable within and between
organizations.
Block diagrams are a start, but don’t
cover everything.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Object-oriented design
Object-oriented (OO) design: A
generalization of object-oriented
programming.
Object = state + methods.
State provides each object with its own
identity.
Methods provide an abstract interface to the
object.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Objects and classes
Class: object type.
Class defines the object’s state elements
but state values may change over time.
Class defines the methods used to
interact with all objects of that type.
Each object has its own state.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
OO design principles
Some objects will closely correspond to
real-world objects.
Some objects may be useful only for
description or implementation.
Objects provide interfaces to read/write
state, hiding the object’s implementation
from the rest of the system.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
UML
Developed by Booch et al.
Goals:
object-oriented;
visual;
useful at many levels of abstraction;
usable for all aspects of design.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
UML object
object name
class name
d1: Display
pixels is a
2-D array
pixels: array[] of pixels
elements
menu_items
comment
attributes
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
UML class
Display
class name
pixels
elements
menu_items
mouse_click()
draw_box
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
operations
The class interface
The operations provide the abstract
interface between the class’s
implementation and other classes.
Operations may have arguments, return
values.
An operation can examine and/or modify
the object’s state.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Choose your interface
properly
If the interface is too small/specialized:
object is hard to use for even one application;
even harder to reuse.
If the interface is too large:
class becomes too cumbersome for designers to
understand;
implementation may be too slow;
spec and implementation are probably buggy.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Relationships between
objects and classes
Association: objects communicate but one
does not own the other.
Aggregation: a complex object is made of
several smaller objects.
Composition: aggregation in which owner
does not allow access to its components.
Generalization: define one class in terms
of another.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Class derivation
May want to define one class in terms of
another.
Derived class inherits attributes, operations
of base class.
Derived_class
UML
generalization
Base_class
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Class derivation example
Display
base
class
pixels
elements
menu_items
derived class
BW_display
© 2008 Wayne Wolf
pixel()
set_pixel()
mouse_click()
draw_box
Color_map_display
Overheads for Computers as
Components, 2nd ed.
Multiple inheritance
base classes
Speaker
Display
Multimedia_display
derived class
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Links and associations
Link: describes relationships between
objects.
Association: describes relationship
between classes.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Link example
Link defines the contains relationship:
message
msg = msg1
length = 1102
message set
count = 2
message
msg = msg2
length = 2114
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Association example
# containing message sets
# contained messages
message
0..*
msg: ADPCM_stream
length : integer
© 2008 Wayne Wolf
1
contains
Overheads for Computers as
Components, 2nd ed.
message set
count : integer
Stereotypes
Stereotype: recurring combination of
elements in an object or class.
Example:
<<foo>>
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Behavioral description
Several ways to describe behavior:
internal view;
external view.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
State machines
transition
a
state
© 2008 Wayne Wolf
b
state name
Overheads for Computers as
Components, 2nd ed.
Event-driven state
machines
Behavioral descriptions are written as
event-driven state machines.
Machine changes state when receiving an
input.
An event may come from inside or outside
of the system.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Types of events
Signal: asynchronous event.
Call: synchronized communication.
Timer: activated by time.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Signal event
<<signal>>
mouse_click
leftorright: button
x, y: position
a
mouse_click(x,y,button)
b
declaration
event description
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Call event
draw_box(10,5,3,2,blue)
c
© 2008 Wayne Wolf
d
Overheads for Computers as
Components, 2nd ed.
Timer event
tm(time-value)
e
© 2008 Wayne Wolf
f
Overheads for Computers as
Components, 2nd ed.
Example state machine
start
input/output
mouse_click(x,y,button)/
find_region(region)
region
found
region = drawing/
find_object(objid)
got menu
item
call_menu(I)
called
menu item
highlight(objid)
found
object
© 2008 Wayne Wolf
region = menu/
which_menu(i)
object
highlighted
Overheads for Computers as
Components, 2nd ed.
finish
Sequence diagram
Shows sequence of operations over time.
Relates behaviors of multiple objects.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Sequence diagram
example
m: Mouse
d1: Display
mouse_click(x,y,button)
which_menu(x,y,i)
time
call_menu(i)
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
u: Menu
Summary
Object-oriented design helps us organize
a design.
UML is a transportable system design
language.
Provides structural and behavioral description
primitives.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Introduction
Example: model train controller.
© 2000 Morgan
Kaufman
Overheads for Computers as
Components 2nd ed.
Purposes of example
Follow a design through several levels of
abstraction.
Gain experience with UML.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Model train setup
rcvr
motor
power
supply
console
ECC
© 2008 Wayne Wolf
command
address
Overheads for Computers as
Components 2nd ed.
header
Requirements
Console can control 8 trains on 1 track.
Throttle has at least 63 levels.
Inertia control adjusts responsiveness
with at least 8 levels.
Emergency stop button.
Error detection scheme on messages.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Requirements form
name
purpose
inputs
model train controller
control speed of <= 8 model trains
throttle, inertia, emergency stop,
train #
outputs
train control signals
functions
set engine speed w. inertia;
emergency stop
performance
can update train speed at least 10
times/sec
manufacturing cost $50
power
wall powered
physical
console comfortable for 2 hands; < 2
size/weight
lbs.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Digital Command Control
DCC created by model railroad hobbyists,
picked up by industry.
Defines way in which model trains,
controllers communicate.
Leaves many system design aspects open,
allowing competition.
This is a simple example of a big trend:
Cell phones, digital TV rely on standards.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
DCC documents
Standard S-9.1, DCC Electrical Standard.
Defines how bits are encoded on the rails.
Standard S-9.2, DCC Communication
Standard.
Defines packet format and semantics.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
DCC electrical standard
Voltage moves
around the power
supply voltage; adds
no DC component.
1 is 58 ms, 0 is at
least 100 ms.
logic 1
time
58 ms
© 2008 Wayne Wolf
logic 0
Overheads for Computers as
Components 2nd ed.
>= 100 ms
DCC communication
standard
Basic packet format: PSA(sD)+E.
P: preamble = 1111111111.
S: packet start bit = 0.
A: address data byte.
s: data byte start bit.
D: data byte (data payload).
E: packet end bit = 1.
© 2000 Morgan
Kaufman
Overheads for Computers as
Components
DCC packet types
Baseline packet: minimum packet that
must be accepted by all DCC
implementations.
Address data byte gives receiver address.
Instruction data byte gives basic instruction.
Error correction data byte gives ECC.
© 2008 Wayne Wolf
Overheads for Computers as
Components, 2nd ed.
Conceptual specification
Before we create a detailed specification,
we will make an initial, simplified
specification.
Gives us practice in specification and UML.
Good idea in general to identify potential
problems before investing too much effort in
detail.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Basic system commands
command name
parameters
set-speed
speed
(positive/negative)
inertia-value (nonnegative)
none
set-inertia
estop
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Typical control sequence
:console
set-inertia
set-speed
set-speed
estop
set-speed
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
:train_rcvr
Message classes
command
set-speed
set-inertia
value: integer
value: unsignedinteger
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
estop
Roles of message classes
Implemented message classes derived
from message class.
Attributes and operations will be filled in for
detailed specification.
Implemented message classes specify
message type by their class.
May have to add type as parameter to data
structure in implementation.
© 2008 Wayne Wolf
Overheads for Computers as
Components
Subsystem collaboration
diagram
Shows relationship between console and
receiver (ignores role of track):
1..n: command
:console
© 2008 Wayne Wolf
:receiver
Overheads for Computers as
Components 2nd ed.
System structure modeling
Some classes define non-computer
components.
Denote by *name.
Choose important systems at this point to
show basic relationships.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Major subsystem roles
Console:
read state of front panel;
format messages;
transmit messages.
Train:
receive message;
interpret message;
control the train.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Console system classes
1
1
panel
console
1 1
formatter
1 1
receiver*
© 2008 Wayne Wolf
1
1
transmitter
1 1
sender*
Overheads for Computers as
Components 2nd ed.
Console class roles
panel: describes analog knobs and
interface hardware.
formatter: turns knob settings into bit
streams.
transmitter: sends data on track.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Train system classes
train set
1
receiver
1 1
detector*
© 2008 Wayne Wolf
1
1 1..t
train
1
1 1
controller
Overheads for Computers as
Components 2nd ed.
1 motor
interface
1 1
pulser*
Train class roles
receiver: digitizes signal from track.
controller: interprets received commands
and makes control decisions.
motor interface: generates signals
required by motor.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Detailed specification
We can now fill in the details of the
conceptual specification:
more classes;
behaviors.
Sketching out the spec first helps us
understand the basic relationships in the
system.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Train speed control
Motor controlled by pulse width
modulation:
duty
cycle
© 2008 Wayne Wolf
+
V
-
Overheads for Computers as
Components 2nd ed.
Console physical object
classes
knobs*
train-knob: integer
speed-knob: integer
inertia-knob: unsignedinteger
emergency-stop: boolean
© 2008 Wayne Wolf
pulser*
pulse-width: unsignedinteger
direction: boolean
sender*
detector*
send-bit()
read-bit() : integer
Overheads for Computers as
Components 2nd ed.
Panel and motor interface
classes
panel
motor-interface
train-number() : integer
speed() : integer
inertia() : integer
estop() : boolean
new-settings()
© 2008 Wayne Wolf
speed: integer
Overheads for Computers as
Components 2nd ed.
Class descriptions
panel class defines the controls.
new-settings() behavior reads the controls.
motor-interface class defines the motor
speed held as state.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Transmitter and receiver
classes
transmitter
receiver
send-speed(adrs: integer,
speed: integer)
send-inertia(adrs: integer,
val: integer)
set-estop(adrs: integer)
© 2008 Wayne Wolf
current: command
new: boolean
read-cmd()
new-cmd() : boolean
rcv-type(msg-type:
command)
rcv-speed(val: integer)
rcv-inertia(val:integer)
Overheads for Computers as
Components 2nd ed.
Class descriptions
transmitter class has one behavior for
each type of message sent.
receiver function provides methods to:
detect a new message;
determine its type;
read its parameters (estop has no
parameters).
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Formatter class
formatter
current-train: integer
current-speed[ntrains]: integer
current-inertia[ntrains]:
unsigned-integer
current-estop[ntrains]: boolean
send-command()
panel-active() : boolean
operate()
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Formatter class
description
Formatter class holds state for each train,
setting for current train.
The operate() operation performs the
basic formatting task.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Control input cases
Use a soft panel to show current panel
settings for each train.
Changing train number:
must change soft panel settings to reflect
current train’s speed, etc.
Controlling throttle/inertia/estop:
read panel, check for changes, perform
command.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
change in change in speed/
train number inertia/estop
Control input sequence
diagram
:knobs
:panel
:formatter
:transmitter
change in
read panel
panel-active
control
panel settings
settings
send-command
read panel
send-speed,
send-inertia.
panel settings
send-estop
read panel
change in
panel settings
train
number
new-settings
set-knobs
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Formatter operate
behavior
update-panel()
panel-active()
new train number
idle
other
© 2008 Wayne Wolf
send-command()
Overheads for Computers as
Components 2nd ed.
Panel-active behavior
T
panel*:read-train()
F
T
panel*:read-speed()
current-train = train-knob
update-screen
changed = true
current-speed = throttle
changed = true
F
...
© 2008 Wayne Wolf
...
Overheads for Computers as
Components 2nd ed.
Controller class
controller
current-train: integer
current-speed[ntrains]: integer
current-direction[ntrains]: boolean
current-inertia[ntrains]:
unsigned-integer
operate()
issue-command()
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Setting the speed
Don’t want to change speed
instantaneously.
Controller should change speed gradually
by sending several commands.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Sequence diagram for setspeed command
:receiver
:controller
new-cmd
cmd-type
rcv-speed
:motor-interface
set-speed
set-pulse
set-pulse
set-pulse
set-pulse
set-pulse
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
:pulser*
Controller operate
behavior
wait for a
command
from receiver
receive-command()
issue-command()
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Refined command classes
command
type: 3-bits
address: 3-bits
parity: 1-bit
set-speed
set-inertia
estop
type=010
value: 7-bits
type=001
value: 3-bits
type=000
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Summary
Separate specification and programming.
Small mistakes are easier to fix in the spec.
Big mistakes in programming cost a lot of
time.
You can’t completely separate
specification and architecture.
Make a few tasteful assumptions.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Instruction sets
Computer architecture taxonomy.
Assembly language.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
von Neumann architecture
Memory holds data, instructions.
Central processing unit (CPU) fetches
instructions from memory.
Separate CPU and memory distinguishes
programmable computer.
CPU registers help out: program counter
(PC), instruction register (IR), generalpurpose registers, etc.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
CPU + memory
address
memory
200
PC
data
CPU
200
ADD r5,r1,r3
© 2008 Wayne Wolf
ADD IR
r5,r1,r3
Overheads for Computers as
Components 2nd ed.
Harvard architecture
address
data memory
data
address
program memory
© 2008 Wayne Wolf
data
Overheads for Computers as
Components 2nd ed.
PC
CPU
von Neumann vs. Harvard
Harvard can’t use self-modifying code.
Harvard allows two simultaneous memory
fetches.
Most DSPs use Harvard architecture for
streaming data:
greater memory bandwidth;
more predictable bandwidth.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
RISC vs. CISC
Complex instruction set computer (CISC):
many addressing modes;
many operations.
Reduced instruction set computer (RISC):
load/store;
pipelinable instructions.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Instruction set
characteristics
Fixed vs. variable length.
Addressing modes.
Number of operands.
Types of operands.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Programming model
Programming model: registers visible to
the programmer.
Some registers are not visible (IR).
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Multiple implementations
Successful architectures have several
implementations:
varying clock speeds;
different bus widths;
different cache sizes;
etc.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Assembly language
One-to-one with instructions (more or
less).
Basic features:
One instruction per line.
Labels provide names for addresses (usually
in first column).
Instructions often start in later columns.
Columns run to end of line.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM assembly language
example
label1 ADR
LDR
ADR
LDR
SUB
© 2008 Wayne Wolf
r4,c
r0,[r4] ; a comment
r4,d
r1,[r4]
r0,r0,r1 ; comment
Overheads for Computers as
Components 2nd ed.
Pseudo-ops
Some assembler directives don’t
correspond directly to instructions:
Define current address.
Reserve storage.
Constants.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM instruction set
ARM
ARM
ARM
ARM
ARM
ARM
versions.
assembly language.
programming model.
memory organization.
data operations.
flow of control.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM versions
ARM architecture has been extended over
several versions.
We will concentrate on ARM7.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM assembly language
Fairly standard assembly language:
label
© 2008 Wayne Wolf
LDR r0,[r8] ; a comment
ADD r4,r0,r1
Overheads for Computers as
Components 2nd ed.
ARM programming model
r0
r1
r2
r3
r4
r5
r6
r7
© 2008 Wayne Wolf
r8
r9
r10
r11
r12
r13
r14
r15 (PC)
Overheads for Computers as
Components 2nd ed.
0
31
CPSR
NZCV
Endianness
Relationship between bit and byte/word
ordering defines endianness:
bit 31
bit 0
byte 3 byte 2 byte 1 byte 0
bit 0
byte 0 byte 1 byte 2 byte 3
little-endian
© 2008 Wayne Wolf
bit 31
big-endian
Overheads for Computers as
Components 2nd ed.
ARM data types
Word is 32 bits long.
Word can be divided into four 8-bit bytes.
ARM addresses cam be 32 bits long.
Address refers to byte.
Address 4 starts at byte 4.
Can be configured at power-up as either
little- or bit-endian mode.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM status bits
Every arithmetic, logical, or shifting
operation sets CPSR bits:
N (negative), Z (zero), C (carry), V
(overflow).
Examples:
-1 + 1 = 0: NZCV = 0110.
231-1+1 = -231: NZCV = 0101.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM data instructions
Basic format:
ADD r0,r1,r2
Computes r1+r2, stores in r0.
Immediate operand:
ADD r0,r1,#2
Computes r1+2, stores in r0.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM data instructions
ADD, ADC : add (w.
carry)
SUB, SBC : subtract
(w. carry)
RSB, RSC : reverse
subtract (w. carry)
MUL, MLA : multiply
(and accumulate)
© 2008 Wayne Wolf
AND, ORR, EOR
BIC : bit clear
LSL, LSR : logical shift
left/right
ASL, ASR : arithmetic
shift left/right
ROR : rotate right
RRX : rotate right
extended with C
Overheads for Computers as
Components 2nd ed.
Data operation varieties
Logical shift:
fills with zeroes.
Arithmetic shift:
fills with ones.
RRX performs 33-bit rotate, including C
bit from CPSR above sign bit.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM comparison
instructions
CMP : compare
CMN : negated compare
TST : bit-wise test
TEQ : bit-wise negated test
These instructions set only the NZCV bits
of CPSR.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM move instructions
MOV, MVN : move (negated)
MOV r0, r1 ; sets r0 to r1
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM load/store
instructions
LDR, LDRH, LDRB : load (half-word, byte)
STR, STRH, STRB : store (half-word,
byte)
Addressing modes:
register indirect : LDR r0,[r1]
with second register : LDR r0,[r1,-r2]
with constant : LDR r0,[r1,#4]
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM ADR pseudo-op
Cannot refer to an address directly in an
instruction.
Generate value by performing arithmetic
on PC.
ADR pseudo-op generates instruction
required to calculate address:
ADR r1,FOO
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Example: C assignments
C:
x = (a + b) - c;
Assembler:
ADR
LDR
ADR
LDR
ADD
ADR
LDR
v
r4,a
r0,[r4]
r4,b
r1,[r4]
r3,r0,r1
r4,c
r2[r4]
;
;
;
;
;
;
;
get address for a
get value of a
get address for b, reusing r4
get value of b
compute a+b
get address for c
get value of c
Overheads for Computers as
Components 2nd ed.
C assignment, cont’d.
SUB r3,r3,r2
ADR r4,x
STR r3[r4]
© 2008 Wayne Wolf
; complete computation of x
; get address for x
; store value of x
Overheads for Computers as
Components 2nd ed.
Example: C assignment
C:
y = a*(b+c);
Assembler:
ADR
LDR
ADR
LDR
ADD
ADR
LDR
r4,b ; get address for b
r0,[r4] ; get value of b
r4,c ; get address for c
r1,[r4] ; get value of c
r2,r0,r1 ; compute partial result
r4,a ; get address for a
r0,[r4] ; get value of a
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C assignment, cont’d.
MUL r2,r2,r0 ; compute final value for y
ADR r4,y ; get address for y
STR r2,[r4] ; store y
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Example: C assignment
C:
z = (a << 2) |
(b & 15);
Assembler:
ADR
LDR
MOV
ADR
LDR
AND
ORR
r4,a ; get address for a
r0,[r4] ; get value of a
r0,r0,LSL 2 ; perform shift
r4,b ; get address for b
r1,[r4] ; get value of b
r1,r1,#15 ; perform AND
r1,r0,r1 ; perform OR
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C assignment, cont’d.
ADR r4,z ; get address for z
STR r1,[r4] ; store value for z
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Additional addressing
modes
Base-plus-offset addressing:
LDR r0,[r1,#16]
Loads from location r1+16
Auto-indexing increments base register:
LDR r0,[r1,#16]!
Post-indexing fetches, then does offset:
LDR r0,[r1],#16
Loads r0 from r1, then adds 16 to r1.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM flow of control
All operations can be performed
conditionally, testing CPSR:
EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE,
LT, GT, LE
Branch operation:
B #100
Can be performed conditionally.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Example: if statement
C:
if (a > b) { x = 5; y = c + d; } else x = c - d;
Assembler:
; compute and test condition
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b
LDR r1,[r4] ; get value for b
CMP r0,r1 ; compare a < b
BGE fblock ; if a >= b, branch to false block
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
If statement, cont’d.
; true block
MOV r0,#5 ; generate value for x
ADR r4,x ; get address for x
STR r0,[r4] ; store x
ADR r4,c ; get address for c
LDR r0,[r4] ; get value of c
ADR r4,d ; get address for d
LDR r1,[r4] ; get value of d
ADD r0,r0,r1 ; compute y
ADR r4,y ; get address for y
STR r0,[r4] ; store y
B after ; branch around false block
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
If statement, cont’d.
; false block
fblock ADR r4,c ; get address for c
LDR r0,[r4] ; get value of c
ADR r4,d ; get address for d
LDR r1,[r4] ; get value for d
SUB r0,r0,r1 ; compute a-b
ADR r4,x ; get address for x
STR r0,[r4] ; store value of x
after ...
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Example: Conditional
instruction implementation
; true block
MOVLT r0,#5 ; generate value
ADRLT r4,x ; get address for
STRLT r0,[r4] ; store x
ADRLT r4,c ; get address for
LDRLT r0,[r4] ; get value of
ADRLT r4,d ; get address for
LDRLT r1,[r4] ; get value of
ADDLT r0,r0,r1 ; compute y
ADRLT r4,y ; get address for
STRLT r0,[r4] ; store y
© 2008 Wayne Wolf
for x
x
c
c
d
d
y
Overheads for Computers as
Components 2nd ed.
Example: switch
statement
C:
switch (test) { case 0: … break; case 1: … }
Assembler:
ADR r2,test ; get address for test
LDR r0,[r2] ; load value for test
ADR r1,switchtab ; load address for switch table
LDR r1,[r1,r0,LSL #2] ; index switch table
switchtab DCD case0
DCD case1
...
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Example: FIR filter
C:
for (i=0, f=0; i<N; i++)
f = f + c[i]*x[i];
Assembler
; loop initiation code
MOV r0,#0 ; use r0 for I
MOV r8,#0 ; use separate index for arrays
ADR r2,N ; get address for N
LDR r1,[r2] ; get value of N
MOV r2,#0 ; use r2 for f
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
FIR filter, cont’.d
ADR r3,c ; load r3 with base of c
ADR r5,x ; load r5 with base of x
; loop body
loop LDR r4,[r3,r8] ; get c[i]
LDR r6,[r5,r8] ; get x[i]
MUL r4,r4,r6 ; compute c[i]*x[i]
ADD r2,r2,r4 ; add into running sum
ADD r8,r8,#4 ; add one word offset to array index
ADD r0,r0,#1 ; add 1 to i
CMP r0,r1 ; exit?
BLT loop ; if i < N, continue
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM subroutine linkage
Branch and link instruction:
BL foo
Copies current PC to r14.
To return from subroutine:
MOV r15,r14
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Nested subroutine calls
Nesting/recursion requires coding
convention:
f1
LDR r0,[r13] ; load arg into r0 from stack
; call f2()
STR r13!,[r14] ; store f1’s return adrs
STR r13!,[r0] ; store arg to f2 on stack
BL f2 ; branch and link to f2
; return from f1()
SUB r13,#4 ; pop f2’s arg off stack
LDR r13!,r15 ; restore register and return
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Summary
Load/store architecture
Most instructions are RISCy, operate in
single cycle.
Some multi-register operations take longer.
All instructions can be executed
conditionally.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
TI C55x instruction set
C55x
C55x
C55x
C55x
C55x
programming model.
assembly language.
memory organization.
data operations.
flow of control.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
TI C55x overview
Accumulator architecture:
acc = operand op acc.
Very useful in loops for DSP.
C55x assembly language:
Label:
MPY *AR0, *CDP+, AC0
MOV #1, T0
C55x algebraic assembly language:
AC1 = AR0 * coef(*CDP)
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Intrinsic functions
Compiler support for assembly language.
Intrinsic function maps directly onto an
instruction.
Example:
int_sadd(arg1,arg2)
Performs saturation arithmetic addition.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x data types
Data types:
Word: 16 bits.
Longword: 32 bits.
Instructions are byte-addressable.
Some instructions operate on register bits.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x registers
Terminology:
Register: any type of register.
Accumulator: acc = operand op ac.
Most registers are memory-mapped.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x program counter and
control flow registers
PC is program counter.
XPC is program counter extension.
RETA is subroutine return address.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x accumulators and
status registers
Four 40-bit accumulators: AC0, AC1, AC2,
and AC3.
Low-order bits 0-15 are AC0L, etc.
High-order bits 16-31 are AC0H, etc.
Guard bits 32-39 are AC0G, etc.
ST0, ST1, PMST, ST0_55, ST1_55,
ST2_55, ST3_55 provide arithmetic/bit
manipulation flags, etc.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x stack pointers
SP keeps track of user stack.
SSP holds system stack pointer.
SPH is extended data page pointer for
both SP and SSP.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x auxiliary registers
and circular buffer pointers
AR0-AR7 are auxiliary instructions.
CDP points to coefficients for polynomial
evaluation instructions. CDPH is main data
page pointer.
BK47 is used for circular buffer operations
along with AR4-7.
BK03 addresses circular buffers.
BKC is size register for CDP.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x block repeat
registers
BRC0 counts block repeat instructions.
RSA0L and REA0L keep track of start and
end points of blocks.
BRC1 and BRS1 are used to repeat blocks
of instructions.
RSA0 and RSA1 are the start address
registers for block repeats.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x addressing mode and
interrupt registers
DP and DPH set base address for data
access.
PDP determines base address for I/O.
IER0 and IER1 are interrupt mask
registers.
IFR0 and IFR1 keep track of currently
pending registers.
DBIER0 and DBIER1 are for debugging.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x memory map
24-bit address space,
16 MB of memory.
Data, program, I/O all
mapped to same
physical memory.
Addressability:
memory
mappedpage
registers
main data
0
main data page 1
main data page 2
…
Program space
main data page 127
address is 24 bits.
Data space is 23 bits.
I/O address is 16
bits.
Overheads for Computers as
© 2008 Wayne Wolf
Components 2nd ed.
C55x addressing modes
Three addressing modes:
Absolute addressing supplies an address in
an instruction.
Direct addressing supplies an offset.
Indirect addressing uses a register as a
pointer.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x direct addressing
DP addressing accesses data pages:
ADP = DPH[22:15](DP+Doffset)
SP addressing accesses stack values:
ASP = SPH[22:15](SP+Soffset)
Register-bit direcdt addressing accesses
bits in registers.
PDP addressing accesses data pages:
APDP = PDP[15:6]PDPoffset
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x indirect addressing
AR indirect addressing uses auxiliary register to
point to data.
Dual AR indirect addressing allows two
simultaneous accesses.
CDP indirect addressing uses CDP to access
coefficients.
Coefficient indirect addressing is similar to CDP
indirect but for instructions with 3 memory
operands per cycle.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x stack operations
Two stacks: data and system.
Three different stack configurations:
Dual 16-bit stack with fast return has
independent data and system stacks.
Dual 16-bit stack with slow return,
independent data and system stacks but
RETA and CFCT are not used for slow returns.
32-bit stack with slow return, SP and SSP are
both modified by the same amount.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x data operations
MOV moves data between registers and
memory:
MOV src, dst
Varieties of ADDs:
ADD src,dst
ADD dual(LMEM),ACx,ACy
Multiplication:
MPY src,dst
MAC AC,TX,ACy
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x flow of control
Unconditional branch:
B ACx
B label
Conditional branch:
BCC label, cond
Loops:
Single-instruction repeat
Block repeat
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Efficient loops
General rules:
Don’t use function calls.
Keep loop body small to enable local repeat
(only forward branches).
Use unsigned integer for loop counter.
Use <= to test loop counter.
Make use of compiler---global optimization,
software pipelining.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Single-instruction repeat loop
example
STM #4000h,AR2
; load pointer to source
STM #100h,AR3
; load pointer to destination
RPT #(1024-1)
MVDD *AR2+,*AR3+
; move
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x subroutines
Unconditional subroutine call:
CALL target
Conditional subroutine call:
CALLCC adrs,cond
Two types of return:
Fast return gives return address and loop
context in registers.
Slow return puts return address/loop on
Overheads for Computers as
stack.
© 2008 Wayne Wolf
Components 2nd ed.
C55x interrupts
Handled using subroutine mechanism.
Four step handling process:
Receive interrupt.
Acknowledge interrupt.
Prepare for ISR by finishing current
instruction, retrieving interrupt vector.
Processing the interrupt service routine.
32 possible interrupt vectors, 27 priorities.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
CPUs
Input and output.
Supervisor mode, exceptions, traps.
Co-processors.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
I/O devices
CPU
status
reg
data
reg
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
mechanism
Usually includes some non-digital
component.
Typical digital interface to CPU:
Application: 8251 UART
Universal asynchronous receiver
transmitter (UART) : provides serial
communication.
8251 functions are integrated into
standard PC interface chip.
Allows many communication parameters
to be programmed.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Serial communication
Characters are transmitted separately:
no
char
start
bit 0
bit 1
...
bit n-1 stop
time
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Serial communication
parameters
Baud (bit) rate.
Number of bits per character.
Parity/no parity.
Even/odd parity.
Length of stop bit (1, 1.5, 2 bits).
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
8251 CPU interface
CPU
status
(8 bit)
8251
data
(8 bit)
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
xmit/
rcv
serial
port
Programming I/O
Two types of instructions can support I/O:
special-purpose I/O instructions;
memory-mapped load/store instructions.
Intel x86 provides in, out instructions.
Most other CPUs use memory-mapped
I/O.
I/O instructions do not preclude memorymapped I/O.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM memory-mapped I/O
Define location for device:
DEV1 EQU 0x1000
Read/write code:
LDR
LDR
LDR
STR
r1,#DEV1 ; set up device adrs
r0,[r1] ; read DEV1
r0,#8 ; set up value to write
r0,[r1] ; write value to device
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Peek and poke
Traditional HLL interfaces:
int peek(char *location) {
return *location; }
void poke(char *location, char
newval) {
(*location) = newval; }
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Busy/wait output
Simplest way to program device.
Use instructions to test when device is ready.
current_char = mystring;
while (*current_char != ‘\0’) {
poke(OUT_CHAR,*current_char);
while (peek(OUT_STATUS) != 0);
current_char++;
}
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Simultaneous busy/wait
input and output
while (TRUE) {
/* read */
while (peek(IN_STATUS) == 0);
achar = (char)peek(IN_DATA);
/* write */
poke(OUT_DATA,achar);
poke(OUT_STATUS,1);
while (peek(OUT_STATUS) != 0);
}
Overheads for Computers as
© 2008 Wayne Wolf
Components 2nd ed.
Interrupt I/O
Busy/wait is very inefficient.
CPU can’t do other work while testing device.
Hard to do simultaneous I/O.
Interrupts allow a device to change the
flow of control in the CPU.
Causes subroutine call to handle device.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Interrupt interface
intr ack
data/address
© 2008 Wayne Wolf
status
reg
data
reg
Overheads for Computers as
Components 2nd ed.
mechanism
CPU
PC
IR
intr request
Interrupt behavior
Based on subroutine call mechanism.
Interrupt forces next instruction to be a
subroutine call to a predetermined
location.
Return address is saved to resume executing
foreground program.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Interrupt physical
interface
CPU and device are connected by CPU
bus.
CPU and device handshake:
device asserts interrupt request;
CPU asserts interrupt acknowledge when it
can handle the interrupt.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Example: character I/O
handlers
void input_handler() {
achar = peek(IN_DATA);
gotchar = TRUE;
poke(IN_STATUS,0);
}
void output_handler() {
}
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Example: interrupt-driven
main program
main() {
while (TRUE) {
if (gotchar) {
poke(OUT_DATA,achar);
poke(OUT_STATUS,1);
gotchar = FALSE;
}
}
}
Overheads for Computers as
© 2008 Wayne Wolf
Components 2nd ed.
Example: interrupt I/O with
buffers
Queue for characters:
a
head tail
© 2008 Wayne Wolf
tail
Overheads for Computers as
Components 2nd ed.
Buffer-based input handler
void input_handler() {
char achar;
if (full_buffer()) error = 1;
else { achar = peek(IN_DATA);
add_char(achar); }
poke(IN_STATUS,0);
if (nchars == 1)
{ poke(OUT_DATA,remove_char();
poke(OUT_STATUS,1); }
}
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
I/O sequence diagram
:foreground
:input
:output
:queue
empty
a
empty
b
bc
c
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Debugging interrupt code
What if you forget to change registers?
Foreground program can exhibit mysterious
bugs.
Bugs will be hard to repeat---depend on
interrupt timing.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Priorities and vectors
Two mechanisms allow us to make
interrupts more specific:
Priorities determine what interrupt gets CPU
first.
Vectors determine what code is called for
each type of interrupt.
Mechanisms are orthogonal: most CPUs
provide both.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Prioritized interrupts
device 1
device 2
interrupt
acknowledge
L1 L2 .. Ln
CPU
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
device n
Interrupt prioritization
Masking: interrupt with priority lower than
current priority is not recognized until
pending interrupt is complete.
Non-maskable interrupt (NMI): highestpriority, never masked.
Often used for power-down.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Example: Prioritized I/O
:interrupts
:foreground
:A
B
C
A
A,B
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
:B
:C
Interrupt vectors
Allow different devices to be handled by
different code.
Interrupt vector table:
Interrupt
vector
table head
handler 0
handler 1
handler 2
handler 3
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Interrupt vector
acquisition
:CPU
:device
receive
request
receive
ack
receive
vector
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Generic interrupt
mechanism
continue
execution
N
N
ignore
intr?
Y
intr priority >
current
priority?
Y
ack
Y
bus error
Y
timeout?
N
vector?
Y
call table[vector]
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Assume priority selection is
handled before this
point.
Interrupt sequence
CPU acknowledges request.
Device sends vector.
CPU calls handler.
Software processes request.
CPU restores state to foreground
program.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Sources of interrupt
overhead
Handler execution time.
Interrupt mechanism overhead.
Register save/restore.
Pipeline-related penalties.
Cache-related penalties.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM interrupts
ARM7 supports two types of interrupts:
Fast interrupt requests (FIQs).
Interrupt requests (IRQs).
Interrupt table starts at location 0.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM interrupt procedure
CPU actions:
Save PC. Copy CPSR to SPSR.
Force bits in CPSR to record interrupt.
Force PC to vector.
Handler responsibilities:
Restore proper PC.
Restore CPSR from SPSR.
Clear interrupt disable flags.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM interrupt latency
Worst-case latency to respond to interrupt
is 27 cycles:
Two cycles to synchronize external request.
Up to 20 cycles to complete current
instruction.
Three cycles for data abort.
Two cycles to enter interrupt handling state.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x interrupts
 Latency is between 7 and 13 cycles.
 Maskable interrupt sequence:
Interrupt flag register is set.
Interrupt enable register is checked.
Interrupt mask register is checked.
Interrupt flag register is cleared.
Appropriate registers are saved.
INTM set to 1, DBGM set to 1, EALLOW set to 0.
Branch to ISR.
 Two styles of return: fast and slow.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Supervisor mode
May want to provide protective barriers
between programs.
Avoid memory corruption.
Need supervisor mode to manage the
various programs.
C55x does not have a supervisor mode.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM supervisor mode
Use SWI instruction to enter supervisor
mode, similar to subroutine:
SWI CODE_1
Sets PC to 0x08.
Argument to SWI is passed to supervisor
mode code.
Saves CPSR in SPSR.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Exception
Exception: internally detected error.
Exceptions are synchronous with
instructions but unpredictable.
Build exception mechanism on top of
interrupt mechanism.
Exceptions are usually prioritized and
vectorized.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Trap
Trap (software interrupt): an exception
generated by an instruction.
Call supervisor mode.
ARM uses SWI instruction for traps.
SHARC offers three levels of software
interrupts.
Called by setting bits in IRPTL register.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Co-processor
Co-processor: added function unit that is
called by instruction.
Floating-point units are often structured as
co-processors.
ARM allows up to 16 designer-selected coprocessors.
Floating-point co-processor uses units 1, 2.
C55x uses co-processors as well.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x image/video hardware
extensions
Available in 5509 and 5510.
Equivalent C-callable functions for other
devices.
Available extensions:
DCT/IDCT.
Pixel interpolation
Motion estimation.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
DCT/IDCT
2-D DCT/IDCT is
computed from
two 1-D
DCT/IDCT.
Put data in
different banks to
maximize
throughput.
© 2008 Wayne Wolf
block
Column DCT
interim
Overheads for Computers as
Components 2nd ed.
Row
DCT
DCT
C55 DCT/IDCT coprocessor
extensions
Load, compute, transfer to accumulators:
ACy=copr(k8,ACx,Xmem,Ymem)
Compute, transfer, mem write:
ACy=copr(k8,ACx,ACy), Lmem=ACz
Special:
ACy=copr(k8,ACx,ACy)
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Software pipelined
load/compute/store for DCT
Iteration i-1
Iteration i
Iteration i+1
Dual_load
Dual_load
Dual_load
4 empty
4 empty
4 empty
3
Dual_load
3
Dual_load
3
Dual_load
8
compute
8
compute
8
compute
empty
empty
empty
4
Long_store
© 2008 Wayne Wolf
4
4
Long_store
Long_store
Overheads for Computers
as
Components 2nd ed.
op_i(0), load_i+1(0,1)
op_i(1), store_i-1(0,1)
op_i(2), store_i-1(2,3)
op_i(2), store_i-1(4,5)
op_i(2), store_i-1(6,7)
op_i(2), load_i+1(2,3)
…
C55 motion estimation
Search strategy:
Full vs. non-full.
Accuracy:
Full-pixel vs. half-pixel.
Number of returned motion vectors:
1 (one 16x16) vs. 4 (four 8x8).
Algorithms:
3-step algorithm (distance 4,2,1).
4-step algorithm (distance 8,4,2,1).
4-step with half-pixel refinement.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Four-step motion estimation
breakdown
d = {8,4,2,1};
for (i=0; i<4; i++) {
compute 3 upper differences
for d[i];
compute 3 middle differences
for d[i];
compute 3 lower differences
for d[i];
compute minimum value;
move to next d;
}
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
X
C55 motion estimation
accelerator
Includes 3 16-bit pixel data paths, 3 16-bit
absolute differences (ADs).
Basic operation:
[ACx,ACy] = copr(k8,ACx,ACy,Xmem,Ymem,Coeff)
K8 = control bits (enable AD units, etc.)
ACx, ACy = accumulated absolute differences
Xmem, Ymem = pointers to odd, even lines of the
search window
Pointer to two adjacent pixels from reference window
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55 pixel interpolation
Given four pixels A, B, C, D, interpolate
three half-pixels:
A
C
© 2008 Wayne Wolf
U
B
M
R
D
Overheads for Computers as
Components 2nd ed.
Pixel interpolation
coprocessor operations
Load pixels and compute:
ACy=copr(k8,AC,Lmem)
Load pixels, compute, and store:
ACy=copr(k8,AACx,Lmem) || Lmem=ACz
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
CPUs
Caches.
Memory management.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Caches and CPUs
CPU
data
© 2008 Wayne Wolf
data
cache
controller
address
cache
address
data
Overheads for Computers as
Components 2nd ed.
main
memory
Cache operation
Many main memory locations are mapped
onto one cache entry.
May have caches for:
instructions;
data;
data + instructions (unified).
Memory access time is no longer
deterministic.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Terms
Cache hit: required location is in cache.
Cache miss: required location is not in
cache.
Working set: set of locations used by
program in a time interval.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Types of misses
Compulsory (cold): location has never
been accessed.
Capacity: working set is too large.
Conflict: multiple locations in working set
map to same cache entry.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Memory system
performance
h = cache hit rate.
tcache = cache access time, tmain = main
memory access time.
Average memory access time:
tav = htcache + (1-h)tmain
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Multiple levels of cache
CPU
© 2008 Wayne Wolf
L1 cache
Overheads for Computers as
Components 2nd ed.
L2 cache
Multi-level cache access
time
h1 = cache hit rate.
h2 = hit rate on L2.
Average memory access time:
tav = h1tL1 + (h2-h1)tL2 + (1- h2-h1)tmain
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Replacement policies
Replacement policy: strategy for choosing
which cache entry to throw out to make
room for a new memory location.
Two popular strategies:
Random.
Least-recently used (LRU).
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Cache organizations
Fully-associative: any memory location
can be stored anywhere in the cache
(almost never implemented).
Direct-mapped: each memory location
maps onto exactly one cache entry.
N-way set-associative: each memory
location can go into one of n sets.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Cache performance
benefits
Keep frequently-accessed locations in fast
cache.
Cache retrieves more than one word at a
time.
Sequential accesses are faster after first
access.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Direct-mapped cache
1
valid
0xabcd
tag
byte byte byte ...
data
cache block
tag
index
offset
=
hit
© 2008 Wayne Wolf
value
byte
Overheads for Computers as
Components 2nd ed.
Write operations
Write-through: immediately copy write to
main memory.
Write-back: write to main memory only
when location is removed from cache.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Direct-mapped cache
locations
Many locations map onto the same cache
block.
Conflict misses are easy to generate:
Array a[] uses locations 0, 1, 2, …
Array b[] uses locations 1024, 1025, 1026, …
Operation a[i] + b[i] generates conflict
misses.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Set-associative cache
A set of direct-mapped caches:
Set 1
Set 2
hit
© 2008 Wayne Wolf
...
data
Overheads for Computers as
Components 2nd ed.
Set n
Example: direct-mapped
vs. set-associative
address
000
001
010
011
100
101
110
111
© 2008 Wayne Wolf
data
0101
1111
0000
0110
1000
0001
1010
0100
Overheads for Computers as
Components 2nd ed.
Direct-mapped cache
behavior
After 001 access:
block
00
01
10
11
© 2008 Wayne Wolf
tag
0
-
data
1111
-
After 010 access:
block
00
01
10
11
Overheads for Computers as
Components 2nd ed.
tag
0
0
-
data
1111
0000
-
Direct-mapped cache
behavior, cont’d.
After 011 access:
block
00
01
10
11
© 2008 Wayne Wolf
tag
0
0
0
data
1111
0000
0110
After 100 access:
block
00
01
10
11
Overheads for Computers as
Components 2nd ed.
tag
1
0
0
0
data
1000
1111
0000
0110
Direct-mapped cache
behavior, cont’d.
After 101 access:
block
00
01
10
11
© 2008 Wayne Wolf
tag
1
1
0
0
data
1000
0001
0000
0110
After 111 access:
block
00
01
10
11
Overheads for Computers as
Components 2nd ed.
tag
1
1
0
1
data
1000
0001
0000
0100
2-way set-associtive cache
behavior
Final state of cache (twice as big as
direct-mapped):
set blk 0 tag
00 1
01 0
10 0
11 0
© 2008 Wayne Wolf
blk 0 data
1000
1111
0000
0110
blk 1 tag
1
1
Overheads for Computers as
Components 2nd ed.
blk 1 data
0001
0100
2-way set-associative
cache behavior
Final state of cache (same size as directmapped):
set blk 0 tag
0 01
1 10
© 2008 Wayne Wolf
blk 0 data
0000
0111
blk 1 tag
10
11
Overheads for Computers as
Components 2nd ed.
blk 1 data
1000
0100
Example caches
StrongARM:
16 Kbyte, 32-way, 32-byte block instruction
cache.
16 Kbyte, 32-way, 32-byte block data cache
(write-back).
C55x:
Various models have 16KB, 24KB cache.
Can be used as scratch pad memory.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Scratch pad memories
Alternative to cache:
Software determines what is stored in
scratch pad.
Provides predictable behavior at the cost
of software control.
C55x cache can be configured as scratch
pad.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Memory management units
Memory management unit (MMU)
translates addresses:
logical
address
CPU
© 2008 Wayne Wolf
memory
management
unit
physical
address
Overheads for Computers as
Components 2nd ed.
main
memory
Memory management
tasks
Allows programs to move in physical
memory during execution.
Allows virtual memory:
memory images kept in secondary storage;
images returned to main memory on demand
during execution.
Page fault: request for location not
resident in memory.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Address translation
Requires some sort of register/table to
allow arbitrary mappings of logical to
physical addresses.
Two basic schemes:
segmented;
paged.
Segmentation and paging can be
combined (x86).
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Segments and pages
page 1
page 2
segment 1
memory
segment 2
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Segment address
translation
segment base address
logical address
+
segment lower bound
segment upper bound
range
check
physical address
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
range
error
Page address translation
page
offset
page i base
concatenate
page
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
offset
Page table organizations
page
descriptor
page descriptor
flat
© 2008 Wayne Wolf
tree
Overheads for Computers as
Components 2nd ed.
Caching address
translations
Large translation tables require main
memory access.
TLB: cache for address translation.
Typically small.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM memory management
Memory region types:
section: 1 Mbyte block;
large page: 64 kbytes;
small page: 4 kbytes.
An address is marked as section-mapped
or page-mapped.
Two-level translation scheme.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM address translation
Translation table
base register
descriptor
1st level table
1st index 2nd index
offset
concatenate
concatenate
descriptor
2nd level table
© 2008 Wayne Wolf
physical address
Overheads for Computers as
Components 2nd ed.
CPUs
CPU performance
CPU power consumption.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Elements of CPU
performance
Cycle time.
CPU pipeline.
Memory system.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Pipelining
Several instructions are executed
simultaneously at different stages of
completion.
Various conditions can cause pipeline
bubbles that reduce utilization:
branches;
memory system delays;
etc.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Performance measures
Latency: time it takes for an instruction to
get through the pipeline.
Throughput: number of instructions
executed per time period.
Pipelining increases throughput without
reducing latency.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM7 pipeline
ARM 7 has 3-stage pipe:
fetch instruction from memory;
decode opcode and operands;
execute.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM pipeline execution
fetch
sub r2,r3,r6
execute
fetch
decode
execute
fetch
decode
cmp r2,#3
1
© 2008 Wayne Wolf
add r0,r1,#5
decode
2
3
Overheads for Computers as
Components 2nd ed.
time
execute
Pipeline stalls
If every step cannot be completed in the
same amount of time, pipeline stalls.
Bubbles introduced by stall increase
latency, reduce throughput.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM multi-cycle LDMIA
instruction
ldmia
fetch decodeex ld r2ex ld r3
r0,{r2,r3}
sub
r2,r3,r6
cmp
r2,#3
fetch
decode ex sub
fetch decodeex cmp
time
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Control stalls
Branches often introduce stalls (branch
penalty).
Stall time may depend on whether branch is
taken.
May have to squash instructions that
already started executing.
Don’t know what to fetch until condition is
evaluated.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM pipelined branch
bne foo
sub
r2,r3,r6
foo add
r0,r1,r2
fetch decode ex bne ex bne ex bne
fetch decode
fetch decode ex add
time
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Delayed branch
To increase pipeline efficiency, delayed
branch mechanism requires n instructions
after branch always executed whether
branch is executed or not.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Example: ARM execution
time
Determine execution time of FIR filter:
for (i=0; i<N; i++)
f = f + c[i]*x[i];
Only branch in loop test may take more
than one cycle.
BLT loop takes 1 cycle best case, 3 worst
case.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
FIR filter ARM code
; loop initiation code
MOV r0,#0 ; use r0 for i, set to 0
MOV r8,#0 ; use a separate index for arrays
ADR r2,N
;
get address for N
LDR r1,[r2] ; get value of N
MOV r2,#0 ; use r2 for f, set to 0
ADR r3,c ; load r3 with address of base of c
ADR r5,x ; load r5 with address of base of x
© 2008 Wayne Wolf
; loop body
loop LDR r4,[r3,r8] ; get value of c[i]
LDR r6,[r5,r8] ; get value of x[i]
MUL r4,r4,r6 ; compute c[i]*x[i]
ADD r2,r2,r4 ; add into running sum
; update loop counter and array index
ADD r8,r8,#4 ; add one to array index
ADD r0,r0,#1 ; add 1 to i
; test for exit
CMP r0,r1
BLT loop
if i < N, continue loop
loopend
...
Overheads for Computers as
Components 2nd ed.
;
FIR filter performance by
block
Block
Variable
# instructions
# cycles
Initialization
tinit
7
7
Body
tbody
4
4
Update
tupdate
2
2
Test
ttest
2
[2,4]
tloop = tinit+ N(tbody + tupdate) + (N-1) ttest,worst + ttest,best
Loop test succeeds is worst case
Loop test fails is best case
© 2000 Morgan
Kaufman
Overheads for Computers as
Components
C55x pipeline
C55x has 7-stage pipe:
fetch;
decode;
address: computes data/branch addresses;
access 1: reads data;
access 2: finishes data read;
Read stage: puts operands on internal
busses;
execute.
Overheads for Computers as
© 2008 Wayne Wolf
Components 2nd ed.
C55x organization
B
C,
D busses
D bus
16
3 data read address busses
24
program address bus
24
3 data read busses
program
read bus
32
Instruction
unit
Dual
Dual-multiply
operand
Instruction
Data read
Single
Writes
operand
read
coefficient
fetch
from memory
2 data write busses
2 data write address busses
© 2008 Wayne Wolf
Program
flow
unit
16
24
Overheads for Computers as
Components 2nd ed.
Address
unit
Data
unit
C55x pipeline hazards
Processor structure:
Three computation units.
14 operators.
Can perform two operations per
instruction.
Some combinations of operators are not
legal.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x hazards
A-unit ALU/A-unit ALU.
A-unit swap/A-unit swap.
D-unit ALU,shifter,MAC/D-unit ALU,shifter,MAC
D-unit shifter/D-unit shift, store
D-unit shift, store/D-unit shift, store
D-unit swap/D-unit swap
P-unit control/P-unit control
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Memory system
performance
Caches introduce indeterminacy in
execution time.
Depends on order of execution.
Cache miss penalty: added time due to a
cache miss.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Types of cache misses
Compulsory miss: location has not been
referenced before.
Conflict miss: two locations are fighting
for the same block.
Capacity miss: working set is too large.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
CPU power consumption
Most modern CPUs are designed with
power consumption in mind to some
degree.
Power vs. energy:
heat depends on power consumption;
battery life depends on energy consumption.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
CMOS power consumption
Voltage drops: power consumption
proportional to V2.
Toggling: more activity means more
power.
Leakage: basic circuit characteristics; can
be eliminated by disconnecting power.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
CPU power-saving
strategies
Reduce power supply voltage.
Run at lower clock frequency.
Disable function units with control signals
when not in use.
Disconnect parts from power supply when
not in use.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C55x low power features
Parallel execution units---longer idle shutdown
times.
Multiple data widths:
16-bit ALU vs. 40-bit ALU.
Instruction caches minimizes main memory
accesses.
Power management:
Function unit idle detection.
Memory idle detection.
User-configurable IDLE domains allow programmer
control of what hardware is shut down.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Power management styles
Static power management: does not
depend on CPU activity.
Example: user-activated power-down mode.
Dynamic power management: based on
CPU activity.
Example: disabling off function units.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Application: PowerPC 603
energy features
Provides doze, nap, sleep modes.
Dynamic power management features:
Uses static logic.
Can shut down unused execution units.
Cache organized into subarrays to minimize
amount of active circuitry.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
PowerPC 603 activity
Percentage of time units are idle for SPEC
integer/floating-point:
unit
D cache
I cache
load/store
fixed-point
floating-point
system register
© 2008 Wayne Wolf
Specint92
29%
29%
35%
38%
99%
89%
Overheads for Computers as
Components 2nd ed.
Specfp92
28%
17%
17%
76%
30%
97%
Power-down costs
Going into a power-down mode costs:
time;
energy.
Must determine if going into mode is
worthwhile.
Can model CPU power states with power
state machine.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Application: StrongARM
SA-1100 power saving
Processor takes two supplies:
VDD is main 3.3V supply.
VDDX is 1.5V.
Three power modes:
Run: normal operation.
Idle: stops CPU clock, with logic still powered.
Sleep: shuts off most of chip activity; 3 steps, each
about 30 ms; wakeup takes > 10 ms.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
SA-1100 power state
machine
Prun = 400 mW
run
10 ms
160 ms
90 ms
10 ms
idle
Pidle = 50 mW
© 2008 Wayne Wolf
90 ms
sleep
Psleep = 0.16 mW
Overheads for Computers as
Components 2nd ed.
CPUs
Example: data compressor.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Goals
Compress data transmitted over serial
line.
Receives byte-size input symbols.
Produces output symbols packed into bytes.
Will build software module only here.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Collaboration diagram for
compressor
:input
1..m: packed
1..n: input
output
symbols
symbols
:data compressor
:output
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Huffman coding
Early statistical text compression algorithm.
Select non-uniform size codes.
Use shorter codes for more common symbols.
Use longer codes for less common symbols.
To allow decoding, codes must have unique
prefixes.
No code can be a prefix of a longer valid code.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Huffman example
character
a
b
c
d
e
f
v
P
.45
.24
.11
.08
.07
.05
P=1
P=.55
P=.31
P=.19
P=.12
Overheads for Computers as
Components 2nd ed.
Example Huffman code
Read code from root to leaves:
a
1
b
01
c
0000
d
0001
e
0010
f
0011
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Huffman coder
requirements table
name
data compression module
purpose
functions
code module for Huffman
compression
encoding table, uncoded
byte-size inputs
packed compression output
symbols
Huffman coding
performance
fast
manufacturing cost
N/A
power
N/A
physical size/weight
N/A
inputs
outputs
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Building a specification
Collaboration diagram shows only steadystate input/output.
A real system must:
Accept an encoding table.
Allow a system reset that flushes the
compression buffer.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
data-compressor class
data-compressor
buffer: data-buffer
table: symbol-table
current-bit: integer
encode(): boolean,
data-buffer
flush()
new-symbol-table()
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
data-compressor behaviors
encode: Takes one-byte input, generates
packed encoded symbols and a Boolean
indicating whether the buffer is full.
new-symbol-table: installs new symbol
table in object, throws away old table.
flush: returns current state of buffer,
including number of valid bits in buffer.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Auxiliary classes
data-buffer
symbol-table
databuf[databuflen] :
character
len : integer
symbols[nsymbols] :
data-buffer
len : integer
insert()
length() : integer
value() : symbol
load()
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Auxiliary class roles
data-buffer holds both packed and
unpacked symbols.
Longest Huffman code for 8-bit inputs is 256
bits.
symbol-table indexes encoded verison of
each symbol.
load() puts data in a new symbol table.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Class relationships
1
data-compressor
1
1
data-buffer
© 2008 Wayne Wolf
1
symbol-table
Overheads for Computers as
Components 2nd ed.
Encode behavior
input symbol
T
encode
return true
buffer filled?
F
© 2008 Wayne Wolf
create new buffer
add to buffers
add to buffer
Overheads for Computers as
Components 2nd ed.
return false
Insert behavior
input
symbol
T
pack into
this buffer
fills buffer?
F
© 2008 Wayne Wolf
pack bottom bits
into this buffer,
top bits into
overflow buffer
Overheads for Computers as
Components 2nd ed.
update
length
Program design
In an object-oriented language, we can
reflect the UML specification in the code
more directly.
In a non-object-oriented language, we
must either:
add code to provide object-oriented features;
diverge from the specification structure.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C++ classes
Class data_buffer {
char databuf[databuflen];
int len;
int length_in_chars() { return len/bitsperbyte; }
public:
void insert(data_buffer,data_buffer&);
int length() { return len; }
int length_in_bytes() { return (int)ceil(len/8.0); }
int initialize();
...
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C++ classes, cont’d.
class data_compressor {
data_buffer buffer;
int current_bit;
symbol_table table;
public:
boolean encode(char,data_buffer&);
void new_symbol_table(symbol_table);
int flush(data_buffer&);
data_compressor();
~data_compressor();
}
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C code
struct data_compressor_struct {
data_buffer buffer;
int current_bit;
sym_table table;
}
typedef struct data_compressor_struct data_compressor,
*data_compressor_ptr;
boolean data_compressor_encode(data_compressor_ptr
mycmptrs, char isymbol, data_buffer *fullbuf) ...
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Testing
Test by encoding, then decoding:
symbol table
input symbols
encoder
decoder
compare
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
result
Code inspection tests
Look at the code for potential problems:
Can we run past end of symbol table?
What happens when the next symbol does
not fill the buffer? Does fill it?
Do very long encoded symbols work
properly? Very short symbols?
Does flush() work properly?
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Bus-Based Computer
Systems
Busses.
Memory devices.
I/O devices:
serial links
timers and counters
keyboards
displays
analog I/O
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
The CPU bus
Bus allows CPU, memory, devices to
communicate.
Shared communication medium.
A bus is:
A set of wires.
A communications protocol.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Bus protocols
Bus protocol determines how devices
communicate.
Devices on the bus go through sequences
of states.
Protocols are specified by state machines,
one state machine per actor in the protocol.
May contain asynchronous logic behavior.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Four-cycle handshake
device 1
enq
device 1
device 2
ack
device 2
1
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
2
3
4
time
Four-cycle handshake,
cont’d.
1.
2.
3.
4.
Device
Device
Device
Device
© 2008 Wayne Wolf
1
2
2
1
raises enq.
responds with ack.
lowers ack once it has finished.
lowers enq.
Overheads for Computers as
Components 2nd ed.
Microprocessor busses
 Clock provides
synchronization.
 R/W is true when
reading (R/W’ is false
when reading).
 Address is a-bit bundle
of address lines.
 Data is n-bit bundle of
data lines.
 Data ready signals
when n-bit data is
ready.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Timing diagrams
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Bus read
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
State diagrams for bus
read
Get
data
Send
data
Done
See
ack
Ack
Adrs
Wait
CPU
© 2008 Wayne Wolf
Release
ack
Adrs
Wait
start
Overheads for Computers as
Components 2nd ed.
device
Bus wait state
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Bus burst read
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Bus multiplexing
data enable
device
data
CPU
adrs
adrs
Adrs enable
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
DMA
Direct memory access
(DMA) performs data
transfers without
executing
instructions.
CPU sets up transfer.
DMA engine fetches,
writes.
DMA controller is a
separate unit.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Bus mastership
By default, CPU is bus master and initiates
transfers.
DMA must become bus master to perform
its work.
CPU can’t use bus while DMA operates.
Bus mastership protocol:
Bus request.
Bus grant.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
DMA operation
 CPU sets DMA registers
for start address, length.
 DMA status register
controls the unit.
 Once DMA is bus master,
it transfers automatically.
May run continuously until
complete.
May use every nth bus
cycle.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Bus transfer sequence
diagram
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
System bus configurations
Multiple busses allow
parallelism:
CPU
A bridge connects
two busses.
© 2008 Wayne Wolf
bridge
Slow devices on one
bus.
memory
Fast devices on
separate bus.
slow device
slow device
high-speed
device
Overheads for Computers as
Components 2nd ed.
Bridge state diagram
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM AMBA bus
 Two varieties:
AHB is high-performance.
APB is lower-speed, lower
cost.
 AHB supports pipelining,
burst transfers, split
transactions, multiple bus
masters.
 All devices are slaves on
APB.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Memory components
Several different
types of memory:
DRAM.
SRAM.
Flash.
Each type of memory
comes in varying:
Capacities.
Widths.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Random-access memory
Dynamic RAM is dense, requires refresh.
Synchronous DRAM is dominant type.
SDRAM uses clock to improve performance,
pipeline memory accesses.
Static RAM is faster, less dense, consumes
more power.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
SDRAM operation
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Read-only memory
ROM may be programmed at factory.
Flash is dominant form of fieldprogrammable ROM.
Electrically erasable, must be block erased.
Random access, but write/erase is much
slower than read.
NOR flash is more flexible.
NAND flash is more dense.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Flash memory
Non-volatile memory.
Flash can be programmed in-circuit.
Random access for read.
To write:
Erase a block to 1.
Write bits to 0.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Flash writing
Write is much slower than read.
1.6 ms write, 70 ns read.
Blocks are large (approx. 1 Mb).
Writing causes wear that eventually
destroys the device.
Modern lifetime approx. 1 million writes.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Types of flash
NOR:
Word-accessible read.
Erase by blocks.
NAND:
Read by pages (512-4K bytes).
Erase by blocks.
NAND is cheaper, has faster erase,
sequential access times.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Timers and counters
Very similar:
a timer is incremented by a periodic signal;
a counter is incremented by an
asynchronous, occasional signal.
Rollover causes interrupt.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Watchdog timer
Watchdog timer is periodically reset by
system timer.
If watchdog is not reset, it generates an
interrupt to reset the host.
interrupt
host CPU
© 2008 Wayne Wolf
reset
watchdog
timer
Overheads for Computers as
Components 2nd ed.
Switch debouncing
A switch must be debounced to multiple
contacts caused by eliminate mechanical
bouncing:
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Encoded keyboard
An array of switches is read by an
encoder.
N-key rollover remembers multiple key
depressions.
row
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
LED
Must use resistor to limit current:
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
7-segment LCD display
May use parallel or multiplexed input.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Types of high-resolution
display
Liquid crystal display (LCD) is dominant
form.
Plasma, OLED, etc.
Frame buffer holds current display
contents.
Written by processor.
Read by video.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Touchscreen
Includes input and output device.
Input device is a two-dimensional
voltmeter:
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Touchscreen position
sensing
ADC
voltage
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Digital-to-analog
conversion
Use resistor tree:
R
bn
bn-1
bn-2
Vout
2R
4R
8R
bn-3
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Flash A/D conversion
N-bit result requires 2n comparators:
Vin
encoder
...
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Dual-slope conversion
Use counter to time required to
charge/discharge capacitor.
Charging, then discharging eliminates
non-linearities.
Vin
© 2008 Wayne Wolf
timer
Overheads for Computers as
Components 2nd ed.
Sample-and-hold
Samples data:
Vin
© 2008 Wayne Wolf
converter
Overheads for Computers as
Components 2nd ed.
Bus-Based Computer
Systems
Designing with microprocessors.
Development and debugging.
System-level performance analysis.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
System architectures
Architectures and components:
software;
hardware.
Some software is very hardwaredependent.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Hardware platform
architecture
Contains several elements:
CPU;
bus;
memory;
I/O devices: networking, sensors,
actuators, etc.
How big/fast much each one be?
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Software architecture
Functional description must be broken into
pieces:
division among people;
conceptual organization;
performance;
testability;
maintenance.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Hardware and software
architectures
Hardware and software are intimately
related:
software doesn’t run without hardware;
how much hardware you need is
determined by the software requirements:
speed;
memory.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Evaluation boards
Designed by CPU manufacturer or others.
Includes CPU, memory, some I/O devices.
May include prototyping section.
CPU manufacturer often gives out
evaluation board netlist---can be used as
starting point for your custom board
design.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Adding logic to a board
Programmable logic devices (PLDs)
provide low/medium density logic.
Field-programmable gate arrays (FPGAs)
provide more logic and multi-level logic.
Application-specific integrated circuits
(ASICs) are manufactured for a single
purpose.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
The PC as a platform
Advantages:
cheap and easy to get;
rich and familiar software environment.
Disadvantages:
requires a lot of hardware resources;
not well-adapted to real-time.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Typical PC hardware
platform
CPU
memory
CPU bus
intr
ctrl
DMA
controller
bus
interface
device
high-speed bus
timers
bus
interface
low-speed bus
device
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Typical busses
PCI: standard for high-speed interfacing
33 or 66 MHz.
PCI Express.
USB (Universal Serial Bus), Firewire (IEEE
1394): relatively low-cost serial interface
with high speed.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Software elements
IBM PC uses BIOS (Basic I/O System) to
implement low-level functions:
boot-up;
minimal device drivers.
BIOS has become a generic term for the
lowest-level system software.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Example: StrongARM
StrongARM system includes:
CPU chip (3.686 MHz clock)
system control module (32.768 kHz clock).
•
•
•
•
•
•
© 2008 Wayne Wolf
Real-time clock;
operating system timer
general-purpose I/O;
interrupt controller;
power manager controller;
reset controller.
Overheads for Computers as
Components 2nd ed.
Debugging embedded
systems
Challenges:
target system may be hard to observe;
target may be hard to control;
may be hard to generate realistic inputs;
setup sequence may be complex.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Host/target design
Use a host system to prepare software for
target system:
target
system
host system
© 2008 Wayne Wolf
serial line
Overheads for Computers as
Components 2nd ed.
Host-based tools
Cross compiler:
compiles code on host for target system.
Cross debugger:
displays target state, allows target system to
be controlled.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Software debuggers
A monitor program residing on the target
provides basic debugger functions.
Debugger should have a minimal footprint
in memory.
User program must be careful not to
destroy debugger program, but , should
be able to recover from some damage
caused by user code.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Breakpoints
A breakpoint allows the user to stop
execution, examine system state, and
change state.
Replace the breakpointed instruction with
a subroutine call to the monitor program.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ARM breakpoints
0x400
0x404
0x408
0x40c
MUL r4,r6,r6
ADD r2,r2,r4
ADD r0,r0,#1
B loop
uninstrumented code
© 2008 Wayne Wolf
0x400
0x404
0x408
0x40c
MUL r4,r6,r6
ADD r2,r2,r4
ADD r0,r0,#1
BL bkpoint
code with breakpoint
Overheads for Computers as
Components 2nd ed.
Breakpoint handler actions
Save registers.
Allow user to examine machine.
Before returning, restore system state.
Safest way to execute the instruction is to
replace it and execute in place.
Put another breakpoint after the replaced
breakpoint to allow restoring the original
breakpoint.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
In-circuit emulators
A microprocessor in-circuit emulator is a
specially-instrumented microprocessor.
Allows you to stop execution, examine
CPU state, modify registers.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Logic analyzers
A logic analyzer is an array of low-grade
oscilloscopes:
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Logic analyzer
architecture
sample
memory
UUT
system clock
microprocessor
vector
address
controller
clock
gen
state or
timing mode
© 2008 Wayne Wolf
keypad
Overheads for Computers as
Components 2nd ed.
display
Boundary scan
Simplifies testing of
multiple chips on a
board.
Registers on pins can
be configured as a
scan chain.
Used for debuggers,
in-circuit emulators.
© 2008 Wayne Wolf
Overheads for Computers as
Components
How to exercise code
Run on host system.
Run on target system.
Run in instruction-level simulator.
Run on cycle-accurate simulator.
Run in hardware/software co-simulation
environment.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Debugging real-time code
Bugs in drivers can cause nondeterministic behavior in the foreground
problem.
Bugs may be timing-dependent.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
System-level performance
analysis
Performance depends
on all the elements of
the system:
CPU.
Cache.
Bus.
Main memory.
I/O device.
© 2008 Wayne Wolf
memory
CPU
cache
Overheads for Computers as
Components 2nd ed.
Bandwidth as performance
Bandwidth applies to several components:
Memory.
Bus.
CPU fetches.
Different parts of the system run at
different clock rates.
Different components may have different
widths (bus, memory).
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Bandwidth and data
transfers
Video frame: 320 x 240 x 3 = 230,400
bytes.
Transfer in 1/30 sec.
Transfer 1 byte/msec, 0.23 sec per frame.
Too slow.
Increase bandwidth:
Increase bus width.
Increase bus clock rate.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Bus bandwidth
T: # bus cycles.
P: time/bus cycle.
Total time for
transfer:
O1
D
O2
W
t = TP.
D: data payload
length.
O1 + O2 = overhead
O.
© 2008 Wayne Wolf
Tbasic(N) = (D+O)N/W
Overheads for Computers as
Components 2nd ed.
Bus burst transfer
bandwidth
T: # bus cycles.
P: time/bus cycle.
Total time for
transfer:
1
2
B
…
O
W
t = TP.
D: data payload
length.
O1 + O2 = overhead
O.
© 2008 Wayne Wolf
Tburst(N) = (BD+O)N/(BW)
Overheads for Computers as
Components 2nd ed.
Memory aspect ratios
16 M
64 M
8M
1
© 2008 Wayne Wolf
4
Overheads for Computers as
Components 2nd ed.
8
Memory access times
Memory component access times comes
from chip data sheet.
Page modes allow faster access for
successive transfers on same page.
If data doesn’t fit naturally into physical
words:
A = [(E/w)mod W]+1
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Bus performance
bottlenecks
Transfer 320 x 240
video frame @ 30
frames/sec = 612,000
bytes/sec.
Is performance
bottleneck bus or
memory?
© 2008 Wayne Wolf
memory
Overheads for Computers as
Components 2nd ed.
CPU
Bus performance
bottlenecks, cont’d.
Bus: assume 1 MHz bus, D=1, O=3:
Tbasic = (1+3)612,000/2 = 1,224,000 cycles
= 1.224 sec.
Memory: try burst mode B=4, width
w=0.5.
Tmem = (4*1+4)612,000/(4*0.5) = 2,448,000
cycles = 0.2448 sec.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Performance spreadsheet
bus
clock period
W
D
O
N
T_basic
t
© 2000 Morgan
Kaufman
1.00E-06
2
1
3
612000
1224000
1.22E+00
memory
clock period
W
D
O
B
N
1.00E-08
0.5
1
4
4
612000
T_mem
t
2448000
2.45E-02
Overheads for Computers as
Components
Parallelism
Speed things up by
running several units
at once.
DMA provides
parallelism if CPU
doesn’t need the bus:
DMA + bus.
CPU.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Bus-Based Computer
Systems
Example: alarm clock
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Alarm clock interface
Alarm on
Alarm off
buzzer
PM
Alarm
ready
light
set
time
© 2008 Wayne Wolf
set
alarm
hour minute
Overheads for Computers as
Components 2nd ed.
button
Operations
Set time: hold set time, depress hour,
minute.
Set alarm time: hold set alarm, depress
hour, minute.
Turn alarm on/off: depress alarm on/off.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Alarm clock requirements
name
purpose
inputs
outputs
functions
alarm clock
24-hour digital clock with one alarm
set time, set alarm, hour, minute, alarm on/off
four-digit display, PM indicator, alarm ready, buzzer
keep time, set time, set alarm, turn alarm on/off,
activate buzzer by alarm
performance
hours and digits, no seconds; not high precision
manufacturing consumer product
cost
power
AC
physical
fits on stand
size/weight
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Alarm clock class diagram
1
Lights*
1
Display
1
1
1
1
Buttons*
Speaker*
© 2008 Wayne Wolf
1
Overheads for Computers as
Components
Mechanism
1
Alarm clock physical
classes
Lights*
Buttons*
digit-val()
digit-scan()
alarm-on-light()
PM-light()
set-time(): boolean
set-alarm(): boolean
alarm-on(): boolean
alarm-off(): boolean
minute(): boolean
hour(): boolean
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Speaker*
buzz()
Display class
Display
time[4]: integer
alarm-indicator: boolean
PM-indicator: boolean
set-time()
alarm-light-on()
alarm-light-off()
PM-light-on()
PM-light-off()
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Mechanism class
Mechanism
Seconds: integer
PM: boolean
tens-hours, ones-hours: boolean
tens-minutes, ones-minutes: boolean
alarm-ready: boolean
alarm-tens-hours, alarm-ones-hours:
boolean
alarm-tens-minutes, alarm-ones-minutes:
boolean
scan-keyboard()
update-time()
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Update-time behavior
update seconds
with rollover
Rollover?
display.set-time(current time)
F
Time >= alarm and alarm-on?
T
update hh:mm
with rollover
AM->PM
PM=true
© 2008 Wayne Wolf
T
alarm.buzzer(true)
PM->AM
PM=false
Overheads for Computers as
Components 2nd ed.
F
Scan-keyboard behavior
compute button activations
alarm-ready=
true
alarm-ready=
false
alarm.buzzer(false)
© 2008 Wayne Wolf
Alarm-on
Alarm-off
save button
states
Overheads for Computers as
Components 2nd ed.
Set-time and
not set-alarm
and hours
Increment time
tens w. rollover
and AM/PM
Increment time
ones w. rollover
and AM/PM
Set-time and
not set-alarm
and minutes
System architecture
Includes:
periodic behavior (clock);
aperiodic behavior (buttons, buzzer
activation).
Two major software components:
interrupt-driven routine updates time;
foreground program deals with buttons,
commands.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Interrupt-driven routine
Timer probably can’t handle one-minute
interrupt interval.
Use software variable to convert interrupt
frequency to seconds.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Foreground program
Operates as while loop:
while (TRUE) {
read_buttons(button_values);
process_command(button_values);
check_alarm();
}
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Testing
Component testing:
test interrupt code on the platform;
can test foreground program using a mockup.
System testing:
relatively few components to integrate;
check clock accuracy;
check recognition of buttons, buzzer, etc.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Program design and
analysis
Software components.
Representations of programs.
Assembly and linking.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Software state machine
State machine keeps internal state as a
variable, changes state based on inputs.
Uses:
control-dominated code;
reactive systems.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
State machine example
no seat/no seat/
buzzer off
idle
seat/timer on
no seat/buzzer
belt/
buzzer off
© 2008 Wayne Wolf
seated
Belt/buzzer on
belt/belted
no belt/timer on
Overheads for Computers as
Components 2nd ed.
no belt
and no
timer/-
C implementation
#define IDLE 0
#define SEATED 1
#define BELTED 2
#define BUZZER 3
switch (state) {
case IDLE: if (seat) { state = SEATED; timer_on = TRUE; }
break;
case SEATED: if (belt) state = BELTED;
else if (timer) state = BUZZER;
break;
…
}
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Signal processing and
circular buffer
Commonly used in signal processing:
new data constantly arrives;
each datum has a limited lifetime.
time time t+1
d1
d2 d3 d4
d5
d6 d7
Use a circular buffer to hold the data
stream.
Overheads for Computers as
© 2008 Wayne Wolf
Components 2nd ed.
Circular buffer
x1
x2
x3
t1
x4
t2
x5
t3
Data stream
x1
x5
x2
x6
x3
x7
x4
Circular buffer
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
x6
Circular buffers
Indexes locate currently used data,
current input data:
input
use
d1
use
d5
d2
input
d2
d3
d3
d4
d4
time t1+1
time t1
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Circular buffer
implementation: FIR filter
int circ_buffer[N], circ_buffer_head = 0;
int c[N]; /* coefficients */
…
int ibuf, ic;
for (f=0, ibuff=circ_buff_head, ic=0;
ic<N; ibuff=(ibuff==N-1?0:ibuff++), ic++)
f = f + c[ic]*circ_buffer[ibuf];
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Queues
Elastic buffer: holds data that arrives
irregularly.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Buffer-based queues
#define Q_SIZE 32
#define Q_MAX (Q_SIZE-1)
int q[Q_MAX], head, tail;
void initialize_queue() { head =
tail = 0; }
void enqueue(int val) {
if (((tail+1)%Q_SIZE) ==
head) error();
q[tail]=val;
if (tail == Q_MAX) tail = 0;
else tail++;
}
© 2008 Wayne Wolf
int dequeue() {
int returnval;
if (head == tail) error();
returnval = q[head];
if (head == Q_MAX) head =
0;
else head++;
return returnval;
}
Overheads for Computers as
Components 2nd ed.
Models of programs
Source code is not a good representation
for programs:
clumsy;
leaves much information implicit.
Compilers derive intermediate
representations to manipulate and optiize
the program.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Data flow graph
DFG: data flow graph.
Does not represent control.
Models basic block: code with no entry or
exit.
Describes the minimal ordering
requirements on operations.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Single assignment form
x = a + b;
y = c - d;
z = x * y;
y = b + d;
x = a + b;
y = c - d;
z = x * y;
y1 = b + d;
original basic block
single assignment form
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Data flow graph
x = a + b;
y = c - d;
z = x * y;
y1 = b + d;
a
b
c
+
-
y
x
single assignment form
*
z
DFG
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
d
+
y1
DFGs and partial orders
a
b
c
d
+
-
y
x
© 2008 Wayne Wolf
*
+
z
y1
Partial order:
a+b, c-d; b+d x*y
Can do pairs of
operations in any
order.
Overheads for Computers as
Components 2nd ed.
Control-data flow graph
CDFG: represents control and data.
Uses data flow graphs as components.
Two types of nodes:
decision;
data flow.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Data flow node
Encapsulates a data flow graph:
x = a + b;
y=c+d
Write operations in basic block form for
simplicity.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Control
T
v1
v4
cond
F
value
v2
Equivalent forms
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
v3
CDFG example
if (cond1) bb1();
else bb2();
bb3();
switch (test1) {
case c1: bb4(); break;
case c2: bb5(); break;
case c3: bb6(); break;
}
T
cond1
F
bb2()
bb3()
c1
c3
test1
c2
bb4()
© 2008 Wayne Wolf
bb1()
Overheads for Computers as
Components 2nd ed.
bb5()
bb6()
for loop
for (i=0; i<N; i++)
loop_body();
i=0
for loop
F
i<N
i=0;
while (i<N) {
loop_body(); i++; }
equivalent
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
T
loop_body()
Assembly and linking
Last steps in compilation:
HLL
HLL
HLL
compile
link
© 2008 Wayne Wolf
assembly
assembly
assembly
executable
Overheads for Computers as
Components 2nd ed.
assemble
link
Multiple-module programs
Programs may be composed from several
files.
Addresses become more specific during
processing:
relative addresses are measured relative to
the start of a module;
absolute addresses are measured relative to
the start of the CPU address space.
© 2008 Wayne Wolf
Overheads for Computers as
Components
Assemblers
Major tasks:
generate binary for symbolic instructions;
translate labels into addresses;
handle pseudo-ops (data, etc.).
Generally one-to-one translation.
Assembly labels:
label1
© 2008 Wayne Wolf
ORG 100
ADR r4,c
Overheads for Computers as
Components 2nd ed.
Symbol table
xx
yy
ADD r0,r1,r2
ADD r3,r4,r5
CMP r0,r3
SUB r5,r6,r7
assembly code
© 2008 Wayne Wolf
xx
yy
0x8
0x10
symbol table
Overheads for Computers as
Components 2nd ed.
Symbol table generation
Use program location counter (PLC) to
determine address of each location.
Scan program, keeping count of PLC.
Addresses are generated at assembly
time, not execution time.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Symbol table example
PLC=0x7
ADD
r0,r1,r2
PLC=0x7
xx
ADD r3,r4,r5
PLC=0x7
CMP r0,r3
PLC=0x7
yy
SUB r5,r6,r7
© 2008 Wayne Wolf
xx
yy
0x8
0x10
Overheads for Computers as
Components 2nd ed.
Two-pass assembly
Pass 1:
generate symbol table
Pass 2:
generate binary instructions
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Relative address
generation
Some label values may not be known at
assembly time.
Labels within the module may be kept in
relative form.
Must keep track of external labels---can’t
generate full binary for instructions that
use external labels.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Pseudo-operations
Pseudo-ops do not generate instructions:
ORG sets program location.
EQU generates symbol table entry without
advancing PLC.
Data statements define data blocks.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Linking
Combines several object modules into a
single executable module.
Jobs:
put modules in order;
resolve labels across modules.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Externals and entry points
entry point
xxx
yyy
a
ADD r1,r2,r3
B a external reference
%1
© 2008 Wayne Wolf
ADR r4,yyy
ADD r3,r4,r5
Overheads for Computers as
Components 2nd ed.
Module ordering
Code modules must be placed in absolute
positions in the memory space.
Load map or linker flags control the order
of modules.
module1
module2
module3
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Dynamic linking
Some operating systems link modules
dynamically at run time:
shares one copy of library among all
executing programs;
allows programs to be updated with new
versions of libraries.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Program design and
analysis
Compilation flow.
Basic statement translation.
Basic optimizations.
Interpreters and just-in-time compilers.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Compilation
Compilation strategy (Wirth):
compilation = translation + optimization
Compiler determines quality of code:
use of CPU resources;
memory access scheduling;
code size.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Basic compilation phases
HLL
parsing, symbol table
machine-independent
optimizations
machine-dependent
optimizations
assembly
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Statement translation and
optimization
Source code is translated into
intermediate form such as CDFG.
CDFG is transformed/optimized.
CDFG is translated into instructions with
optimization decisions.
Instructions are further optimized.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Arithmetic expressions
a*b + 5*(c-d)
expression
b
a
c
*
5
*
+
DFG
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
d
Arithmetic expressions,
cont’d.
b
a
1
c
*
2
5
3
4
*
+
d
-
ADR r4,a
MOV r1,[r4]
ADR r4,b
MOV r2,[r4]
ADD r3,r1,r2
ADR r4,c
MOV r1,[r4]
ADR r4,d
MOV r5,[r4]
SUB r6,r4,r5
MUL r7,r6,#5
ADD r8,r7,r3
DFG
© 2008 Wayne Wolf
code
Overheads for Computers as
Components 2nd ed.
Control code generation
if (a+b > 0)
x = 5;
else
x = 7;
a+b>0
x=7
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
x=5
Control code generation,
cont’d.
1
a+b>0
3
x=7
© 2008 Wayne Wolf
x=5
ADR r5,a
LDR r1,[r5]
ADR r5,b
2 LDR r2,b
ADD r3,r1,r2
BLE label3
LDR r3,#5
ADR r5,x
STR r3,[r5]
B stmtent
LDR r3,#7
ADR r5,x
STR r3,[r5]
stmtent ...
Overheads for Computers as
Components 2nd ed.
Procedure linkage
Need code to:
call and return;
pass parameters and results.
Parameters and returns are passed on
stack.
Procedures with few parameters may use
registers.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Procedure stacks
growth
proc1(int a) {
proc2(5);
}
proc1
FP
frame pointer
proc2
5
SP
stack pointer
© 2008 Wayne Wolf
accessed relative to SP
Overheads for Computers as
Components 2nd ed.
ARM procedure linkage
APCS (ARM Procedure Call Standard):
r0-r3 pass parameters into procedure. Extra
parameters are put on stack frame.
r0 holds return value.
r4-r7 hold register values.
r11 is frame pointer, r13 is stack pointer.
r10 holds limiting address on stack size to
check for stack overflows.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Data structures
Different types of data structures use
different data layouts.
Some offsets into data structure can be
computed at compile time, others must be
computed at run time.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
One-dimensional arrays
C array name points to 0th element:
a
a[0]
a[1]
= *(a + 1)
a[2]
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Two-dimensional arrays
Column-major layout:
a[0,0]
N
...
a[1,0]
a[1,1]
© 2008 Wayne Wolf
M
a[0,1]
...
= a[i*M+j]
Overheads for Computers as
Components 2nd ed.
Structures
Fields within structures are static offsets:
aptr
struct {
int field1;
char field2;
} mystruct;
field1
field2
struct mystruct a, *aptr = &a;
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
4 bytes
*(aptr+4)
Expression simplification
Constant folding:
8+1 = 9
Algebraic:
a*b + a*c = a*(b+c)
Strength reduction:
a*2 = a<<1
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Dead code elimination
Dead code:
#define DEBUG 0
if (DEBUG) dbg(p1);
0
0
Can be eliminated by
analysis of control
flow, constant
folding.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
1
dbg(p1);
Procedure inlining
Eliminates procedure linkage overhead:
int foo(a,b,c) { return a + b - c;}
z = foo(w,x,y);

z = w + x + y;
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Loop transformations
Goals:
reduce loop overhead;
increase opportunities for pipelining;
improve memory system performance.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Loop unrolling
Reduces loop overhead, enables some
other optimizations.
for (i=0; i<4; i++)
a[i] = b[i] * c[i];

for (i=0; i<2; i++) {
a[i*2] = b[i*2] * c[i*2];
a[i*2+1] = b[i*2+1] * c[i*2+1];
}
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Loop fusion and
distribution
Fusion combines two loops into 1:
for (i=0; i<N; i++) a[i] = b[i] * 5;
for (j=0; j<N; j++) w[j] = c[j] * d[j];
 for (i=0; i<N; i++) {
a[i] = b[i] * 5; w[i] = c[i] * d[i];
}
Distribution breaks one loop into two.
Changes optimizations within loop body.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Loop tiling
Breaks one loop into a nest of loops.
Changes order of accesses within array.
Changes cache behavior.
© 2000 Morgan
Kaufman
Overheads for Computers as
Components 2nd ed.
Loop tiling example
for (i=0; i<N; i++)
for (j=0; j<N; j++)
c[i] = a[i,j]*b[i];
© 2008 Wayne Wolf
for (i=0; i<N; i+=2)
for (j=0; j<N; j+=2)
for (ii=0; ii<min(i+2,n); ii++)
for (jj=0; jj<min(j+2,N); jj++)
c[ii] = a[ii,jj]*b[ii];
Overheads for Computers as
Components 2nd ed.
Array padding
Add array elements to change mapping
into cache:
a[0,0] a[0,1] a[0,2]
a[0,0] a[0,1] a[0,2] a[0,2]
a[1,0] a[1,1] a[1,2]
a[1,0] a[1,1] a[1,2] a[1,2]
before
© 2008 Wayne Wolf
after
Overheads for Computers as
Components 2nd ed.
Register allocation
Goals:
choose register to hold each variable;
determine lifespan of varible in the register.
Basic case: within basic block.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Register lifetime graph
w = a + b; t=1
x = c + w; t=2
y = c + d; t=3
a
b
c
d
w
x
y
1
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
2
3
time
Instruction scheduling
Non-pipelined machines do not need
instruction scheduling: any order of
instructions that satisfies data
dependencies runs equally fast.
In pipelined machines, execution time of
one instruction depends on the nearby
instructions: opcode, operands.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Reservation table
A reservation table
relates
instructions/time to
CPU resources.
© 2008 Wayne Wolf
Time/instr
instr1
instr2
instr3
instr4
Overheads for Computers as
Components 2nd ed.
A
X
X
X
B
X
X
Software pipelining
Schedules instructions across loop
iterations.
Reduces instruction latency in iteration i
by inserting instructions from iteration
i+1.
© 2008 Wayne Wolf
Overheads for Computers as
Components
Instruction selection
May be several ways to implement an
operation or sequence of operations.
Represent operations as graphs, match
possible instruction sequences onto
graph.
+
*
expression
© 2008 Wayne Wolf
*
MUL
+
ADD
templates
Overheads for Computers as
Components 2nd ed.
+
*
MADD
Using your compiler
Understand various optimization levels (O1, -O2, etc.)
Look at mixed compiler/assembler output.
Modifying compiler output requires care:
correctness;
loss of hand-tweaked code.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Interpreters and JIT
compilers
Interpreter: translates and executes
program statements on-the-fly.
JIT compiler: compiles small sections of
code into instructions during program
execution.
Eliminates some translation overhead.
Often requires more memory.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Program design and
analysis
Program-level performance analysis.
Optimizing for:
Execution time.
Energy/power.
Program size.
Program validation and testing.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Program-level performance
analysis
 Need to understand
performance in detail:
Real-time behavior, not
just typical.
On complex platforms.
 Program performance 
CPU performance:
Pipeline, cache are
windows into program.
We must analyze the entire
program.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Complexities of program
performance
Varies with input data:
Different-length paths.
Cache effects.
Instruction-level performance variations:
Pipeline interlocks.
Fetch times.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
How to measure program
performance
Simulate execution of the CPU.
Makes CPU state visible.
Measure on real CPU using timer.
Requires modifying the program to control
the timer.
Measure on real CPU using logic analyzer.
Requires events visible on the pins.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Program performance
metrics
Average-case execution time.
Typically used in application programming.
Worst-case execution time.
A component in deadline satisfaction.
Best-case execution time.
Task-level interactions can cause best-case
program behavior to result in worst-case
system behavior.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Elements of program
performance
Basic program execution time formula:
execution time = program path + instruction timing
Solving these problems independently helps
simplify analysis.
Easier to separate on simpler CPUs.
Accurate performance analysis requires:
Assembly/binary code.
Execution platform.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Data-dependent paths in
an if statement
if (a || b) { /* T1 */
if ( c ) /* T2 */
x = r*s+t; /* A1 */
else y=r+s; /* A2 */
z = r+s+u; /* A3 */
}
else {
if ( c ) /* T3 */
y = r-t; /* A4 */
}
© 2008 Wayne Wolf
a
b
c
path
0
0
0
T1=F, T3=F: no assignments
0
0
1
T1=F, T3=T: A4
0
1
0
T1=T, T2=F: A2, A3
0
1
1
T1=T, T2=T: A1, A3
1
0
0
T1=T, T2=F: A2, A3
1
0
1
T1=T, T2=T: A1, A3
1
1
0
T1=T, T2=F: A2, A3
1
1
1
T1=T, T2=T: A1, A3
Overheads for Computers as
Components 2nd ed.
Paths in a loop
for (i=0, f=0; i<N; i++)
f = f + c[i] * x[i];
i=0
f=0
N
i=N
Y
f = f + c[i] * x[i]
i=i+1
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Instruction timing
 Not all instructions take the same amount of time.
Multi-cycle instructions.
Fetches.
 Execution times of instructions are not independent.
Pipeline interlocks.
Cache effects.
 Execution times may vary with operand value.
Floating-point operations.
Some multi-cycle integer operations.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Mesaurement-driven
performance analysis
Not so easy as it sounds:
Must actually have access to the CPU.
Must know data inputs that give worst/best
case performance.
Must make state visible.
Still an important method for performance
analysis.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Feeding the program
Need to know the desired input values.
May need to write software scaffolding to
generate the input values.
Software scaffolding may also need to
examine outputs to generate feedbackdriven inputs.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Trace-driven measurement
Trace-driven:
Instrument the program.
Save information about the path.
Requires modifying the program.
Trace files are large.
Widely used for cache analysis.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Physical measurement
In-circuit emulator allows tracing.
Affects execution timing.
Logic analyzer can measure behavior at pins.
Address bus can be analyzed to look for events.
Code can be modified to make events visible.
Particularly important for real-world input
streams.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
CPU simulation
Some simulators are less accurate.
Cycle-accurate simulator provides
accurate clock-cycle timing.
Simulator models CPU internals.
Simulator writer must know how CPU works.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
SimpleScalar FIR filter
simulation
int x[N] = {8, 17, … };
int c[N] = {1, 2, … };
main() {
int i, k, f;
for (k=0; k<COUNT;
k++)
for (i=0; i<N; i++)
f += c[i]*x[i];
}
© 2008 Wayne Wolf
N
total sim
cycles
sim cycles
per filter
execution
100
25854
259
1,000
155759
156
1,0000
1451840
145
Overheads for Computers as
Components 2nd ed.
Performance optimization
motivation
Embedded systems must often meet
deadlines.
Faster may not be fast enough.
Need to be able to analyze execution
time.
Worst-case, not typical.
Need techniques for reliably improving
execution time.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Programs and performance
analysis
Best results come from analyzing
optimized instructions, not high-level
language code:
non-obvious translations of HLL statements
into instructions;
code may move;
cache effects are hard to predict.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Loop optimizations
Loops are good targets for optimization.
Basic loop optimizations:
code motion;
induction-variable elimination;
strength reduction (x*2 -> x<<1).
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Code motion
for (i=0; i<N*M; i++)
z[i] = a[i] + b[i];
i=0; Xi=0;
= N*M
i<N*M
i<X
Y
z[i] = a[i] + b[i];
i = i+1;
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
N
Induction variable
elimination
Induction variable: loop index.
Consider loop:
for (i=0; i<N; i++)
for (j=0; j<M; j++)
z[i,j] = b[i,j];
Rather than recompute i*M+j for each
array in each iteration, share induction
variable between arrays, increment at end
Overheads for Computers as
of
loop
body.
© 2008 Wayne Wolf
Components 2 ed.
nd
Cache analysis
Loop nest: set of loops, one inside other.
Perfect loop nest: no conditionals in nest.
Because loops use large quantities of
data, cache conflicts are common.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Array conflicts in cache
a[0,0]
1024
1024
b[0,0]
...
4099
main memory
© 2008 Wayne Wolf
4099
cache
Overheads for Computers as
Components 2nd ed.
Array conflicts, cont’d.
Array elements conflict because they are
in the same line, even if not mapped to
same location.
Solutions:
move one array;
pad array.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Performance optimization
hints
Use registers efficiently.
Use page mode memory accesses.
Analyze cache behavior:
instruction conflicts can be handled by
rewriting code, rescheudling;
conflicting scalar data can easily be moved;
conflicting array data can be moved, padded.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Energy/power optimization
Energy: ability to do work.
Most important in battery-powered systems.
Power: energy per unit time.
Important even in wall-plug systems---power
becomes heat.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Measuring energy
consumption
Execute a small loop, measure current:
I
while (TRUE)
a();
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Sources of energy
consumption
Relative energy per operation (Catthoor et
al):
memory transfer: 33
external I/O: 10
SRAM write: 9
SRAM read: 4.4
multiply: 3.6
add: 1
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Cache behavior is
important
Energy consumption has a sweet spot as
cache size changes:
cache too small: program thrashes, burning
energy on external memory accesses;
cache too large: cache itself burns too much
power.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Cache sweet spot
[Li98] © 1998 IEEE
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Optimizing for energy
First-order optimization:
high performance = low energy.
Not many instructions trade speed for
energy.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Optimizing for energy,
cont’d.
Use registers efficiently.
Identify and eliminate cache conflicts.
Moderate loop unrolling eliminates some
loop overhead instructions.
Eliminate pipeline stalls.
Inlining procedures may help: reduces
linkage, but may increase cache
thrashing.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Efficient loops
General rules:
Don’t use function calls.
Keep loop body small to enable local repeat
(only forward branches).
Use unsigned integer for loop counter.
Use <= to test loop counter.
Make use of compiler---global optimization,
software pipelining.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Single-instruction repeat loop
example
STM #4000h,AR2
; load pointer to source
STM #100h,AR3
; load pointer to destination
RPT #(1024-1)
MVDD *AR2+,*AR3+
; move
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Optimizing for program
size
Goal:
reduce hardware cost of memory;
reduce power consumption of memory units.
Two opportunities:
data;
instructions.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Data size minimization
Reuse constants, variables, data buffers in
different parts of code.
Requires careful verification of correctness.
Generate data using instructions.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Reducing code size
Avoid function inlining.
Choose CPU with compact instructions.
Use specialized instructions where
possible.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Program validation and
testing
But does it work?
Concentrate here on functional
verification.
Major testing strategies:
Black box doesn’t look at the source code.
Clear box (white box) does look at the
source code.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Clear-box testing
Examine the source code to determine whether
it works:
Can you actually exercise a path?
Do you get the value you expect along a path?
Testing procedure:
Controllability: rovide program with inputs.
Execute.
Observability: examine outputs.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Controlling and observing
programs
firout = 0.0;
for (j=curr, k=0; j<N; j++, k++)
firout += buff[j] * c[k];
for (j=0; j<curr; j++, k++)
firout += buff[j] * c[k];
if (firout > 100.0) firout = 100.0;
if (firout < -100.0) firout = -100.0;
Controllability:
Must fill circular buffer
with desired N values.
Other code governs
how we access the
buffer.
Observability:
Want to examine
firout before limit
testing.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Execution paths and
testing
Paths are important in functional testing
as well as performance analysis.
In general, an exponential number of
paths through the program.
Show that some paths dominate others.
Heuristically limit paths.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Choosing the paths to test
Possible criteria:
Execute every
statement at least
not covered
once.
Execute every branch
direction at least once.
Equivalent for
structured programs.
Not true for gotos.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Basis paths
Approximate CDFG
with undirected
graph.
Undirected graphs
have basis paths:
All paths are linear
combinations of basis
paths.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Cyclomatic complexity
Cyclomatic complexity
is a bound on the size
of basis sets:
e = # edges
n = # nodes
p = number of graph
components
M = e – n + 2p.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Branch testing
Heuristic for testing branches.
Exercise true and false branches of
conditional.
Exercise every simple condition at least once.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Branch testing example
Correct:
Test:
if (a || (b >= c)) {
printf(“OK\n”); }
Incorrect:
if (a && (b >= c)) {
printf(“OK\n”); }
© 2008 Wayne Wolf
a = F
(b >=c) = T
Example:
Correct: [0 || (3 >=
2)] = T
Incorrect: [0 && (3
>= 2)] = F
Overheads for Computers as
Components 2nd ed.
Another branch testing
example
Correct:
if ((x == good_pointer) &&
x->field1 == 3)) {
printf(“got the value\n”); }
Incorrect:
 if ((x = good_pointer) &&
x->field1 == 3)) {
printf(“got the value\n”); }
© 2008 Wayne Wolf
Incorrect code
changes pointer.
Assignment returns
new LHS in C.
Test that catches
error:
(x != good_pointer)
&& x->field1 = 3)
Overheads for Computers as
Components 2nd ed.
Domain testing
Heuristic test for
linear inequalities.
Test on each side +
boundary of
inequality.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Def-use pairs
Variable def-use:
Def when value is
assigned (defined).
Use when used on
right-hand side.
Exercise each def-use
pair.
Requires testing
correct path.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Loop testing
Loops need specialized tests to be tested
efficiently.
Heuristic testing strategy:
Skip loop entirely.
One loop iteration.
Two loop iterations.
# iterations much below max.
n-1, n, n+1 iterations where n is max.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Black-box testing
Complements clear-box testing.
May require a large number of tests.
Tests software in different ways.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Black-box test vectors
Random tests.
May weight distribution based on software
specification.
Regression tests.
Tests of previous versions, bugs, etc.
May be clear-box tests of previous versions.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
How much testing is
enough?
Exhaustive testing is impractical.
One important measure of test quality---bugs
escaping into field.
Good organizations can test software to give
very low field bug report rates.
Error injection measures test quality:
Add known bugs.
Run your tests.
Determine % injected bugs that are caught.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Program design and
analysis
Software modem.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Theory of operation
Frequency-shift keying:
separate frequencies for 0 and 1.
0
1
time
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
FSK encoding
Generate waveforms based on current bit:
0110101
© 2008 Wayne Wolf
bit-controlled
waveform
generator
Overheads for Computers as
Components 2nd ed.
A/D converter
FSK decoding
© 2008 Wayne Wolf
zero filter
detector
0 bit
one filter
detector
1 bit
Overheads for Computers as
Components 2nd ed.
Transmission scheme
Send data in 8-bit bytes. Arbitrary spacing
between bytes.
Byte starts with 0 start bit.
Receiver measures length of start bit to
synchronize itself to remaining 8 bits.
start (0)
© 2008 Wayne Wolf
bit 1
bit 2
bit 3
Overheads for Computers as
Components 2nd ed.
...
bit 8
Requirements
Inputs
Analog sound input, reset button.
Outputs
Analog sound output, LED bit display.
Functions
Transmitter: Sends data from memory
in 8-bit bytes plus start bit.
Receiver: Automatically detects bytes
and reads bits. Displays current bit on
LED.
1200 baud.
Performance
Manufacturing cost
Power
Physical
size/weight
© 2008 Wayne Wolf
Dominated by microprocessor and
analog I/O
Powered by AC.
Small desktop object.
Overheads for Computers as
Components 2nd ed.
Specification
Line-in*
1
1
Receiver
sample-in()
bit-out()
input()
Transmitter
1
1
bit-in()
sample-out()
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Line-out*
output()
System architecture
Interrupt handlers for samples:
input and output.
Transmitter.
Receiver.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Transmitter
Waveform generation by table lookup.
float sine_wave[N_SAMP] = { 0.0, 0.5,
0.866, 1, 0.866, 0.5, 0.0, -0.5, -0.866, -1.0, 0.866, -0.5, 0};
time
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Receiver
Filters (FIR for simplicity) use circular
buffers to hold data.
Timer measures bit length.
State machine recognizes start bits, data
bits.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Hardware platform
CPU.
A/D converter.
D/A converter.
Timer.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Component design and
testing
Easy to test transmitter and receiver on
host.
Transmitter can be verified with speaker
outputs.
Receiver verification tasks:
start bit recognition;
data bit recognition.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
System integration and
testing
Use loopback mode to test components
against each other.
Loopback in software or by connecting D/A
and A/D converters.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Processes and operating
systems
Multiple tasks and multiple processes.
Specifications of process timing.
Preemptive real-time operating systems.
Processes and UML.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Reactive systems
Respond to external events.
Engine controller.
Seat belt monitor.
Requires real-time response.
System architecture.
Program implementation.
May require a chain reaction among
multiple processors.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Tasks and processes
A task is a functional
description of a
connected set of
operations.
(Task can also mean
a collection of
processes.)
 A process is a unique
execution of a program.
Several copies of a
program may run
simultaneously or at
different times.
 A process has its own
state:
registers;
memory.
 The operating system
manages processes.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Why multiple processes?
Multiple tasks means multiple processes.
Processes help with timing complexity:
multiple rates
multimedia
automotive
asynchronous input
user interfaces
communication systems
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Multi-rate systems
Tasks may be synchronous or
asynchronous.
Synchronous tasks may recur at different
rates.
Processes run at different rates based on
computational needs of the tasks.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Example: engine control
Tasks:
spark control
crankshaft sensing
fuel/air mixture
oxygen sensor
Kalman filter
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
engine
controller
Typical rates in engine
controllers
Variable
Full range time (ms)
Update period (ms)
Engine spark timing
300
2
Throttle
40
2
Air flow
30
4
Battery voltage
80
4
Fuel flow
250
10
Recycled exhaust gas
500
25
Status switches
100
20
Air temperature
Seconds
400
Barometric pressure
Seconds
1000
Spark (dwell)
10
1
Fuel adjustment
80
8
Carburetor
500
25
Mode actuators
100
Overheads for Computers as
100
© 2008 Wayne Wolf
Components 2nd ed.
Real-time systems
Perform a computation to conform to external
timing constraints.
Deadline frequency:
Periodic.
Aperiodic.
Deadline type:
Hard: failure to meet deadline causes system failure.
Soft: failure to meet deadline causes degraded
response.
Firm: late response is useless but some late
responses can be tolerated.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Timing specifications on
processes
Release time: time at which process
becomes ready.
Deadline: time at which process must
finish.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Release times and
deadlines
deadline
P1
initiating
event
© 2008 Wayne Wolf
period
aperiodic
process
periodic
process
periodic process initiated
initiated
by event
at start
of period
Overheads for Computers as
Components 2nd ed.
time
Rate requirements on
processes
Period: interval
between process
activations.
Rate: reciprocal of
period.
Initiatino rate may be
higher than period--several copies of
process run at once.
© 2008 Wayne Wolf
CPU 1
CPU 2
CPU 3
CPU 4
Overheads for Computers as
Components 2nd ed.
P11
P12
P13
P14
time
Timing violations
What happens if a process doesn’t finish
by its deadline?
Hard deadline: system fails if missed.
Soft deadline: user may notice, but system
doesn’t necessarily fail.
© 2000 Morgan
Kaufman
Overheads for Computers as
Components
Example: Space Shuttle
software error
Space Shuttle’s first launch was delayed
by a software timing error:
Primary control system PASS and backup
system BFS.
BFS failed to synchronize with PASS.
Change to one routine added delay that
threw off start time calculation.
1 in 67 chance of timing problem.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Task graphs
 Tasks may have data
dependencies---must
execute in certain order.
 Task graph shows
data/control
dependencies between
processes.
 Task: connected set of
processes.
 Task set: One or more
tasks.
© 2008 Wayne Wolf
P1
P2
P5
P3
P6
P4
task 1
Overheads for Computers as
Components 2nd ed.
task 2
task set
Communication between
tasks
 Task graph assumes that
all processes in each task
run at the same rate,
tasks do not
communicate.
 In reality, some amount
of inter-task
communication is
necessary.
It’s hard to require
immediate response for
multi-rate communication.
© 2008 Wayne Wolf
MPEG
system
layer
MPEG
audio
Overheads for Computers as
Components 2nd ed.
MPEG
video
Process execution
characteristics
Process execution time Ti.
Execution time in absence of preemption.
Possible time units: seconds, clock cycles.
Worst-case, best-case execution time may be useful
in some cases.
Sources of variation:
Data dependencies.
Memory system.
CPU pipeline.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Utilization
CPU utilization:
Fraction of the CPU that is doing useful work.
Often calculated assuming no scheduling
overhead.
Utilization:
U =
(CPU time for useful work)/ (total available CPU time)
=[S
= T/t
© 2008 Wayne Wolf
t1 ≤ t ≤ t2
T(t) ] / [t2 – t1]
Overheads for Computers as
Components 2nd ed.
State of a process
A process can be in
one of three states:
executing on the CPU;
ready to run;
waiting for data.
executing
gets
CPU
preempted
needs
data
gets data
ready
waiting
needs data
© 2008 Wayne Wolf
gets data
and CPU
Overheads for Computers as
Components 2nd ed.
The scheduling problem
Can we meet all deadlines?
Must be able to meet deadlines in all cases.
How much CPU horsepower do we need
to meet our deadlines?
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Scheduling feasibility
Resource constraints
make schedulability
analysis NP-hard.
Must show that the
deadlines are met for
all timings of resource
requests.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
P1
P2
I/O device
Simple processor feasibility
Assume:
No resource conflicts.
Constant process
execution times.
Require:
T2
T
T ≥ Si Ti
Can’t use more than
100% of the CPU.
© 2008 Wayne Wolf
T1
Overheads for Computers as
Components 2nd ed.
T3
Hyperperiod
Hyperperiod: least common multiple
(LCM) of the task periods.
Must look at the hyperperiod schedule to
find all task interactions.
Hyperperiod can be very long if task
periods are not chosen carefully.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Hyperperiod example
Long hyperperiod:
P1 7 ms.
P2 11 ms.
P3 15 ms.
LCM = 1155 ms.
Shorter hyperperiod:
P1 8 ms.
P2 12 ms.
P3 16 ms.
LCM = 96 ms.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Simple processor feasibility
example
P1 period 1 ms, CPU
time 0.1 ms.
P2 period 1 ms, CPU
time 0.2 ms.
P3 period 5 ms, CPU
time 0.3 ms.
© 2004 Wayne Wolf
LCM
P1
P2
P3
5.00E-03
peirod
CPU time CPU time/LCM
1.00E-03 1.00E-04 5.00E-04
1.00E-03 2.00E-04 1.00E-03
5.00E-03 3.00E-04 3.00E-04
total CPU/LCM
utilization
Overheads for Computers as
Components 2nd ed.
1.80E-03
3.60E-01
Cyclostatic/TDMA
Schedule in time
slots.
T
Same process
activation
irrespective of
workload.
1
T2
P
Time slots may be
equal size or
unequal.
© 2004 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
T3
T1
T2
P
T3
TDMA assumptions
Schedule based on
least common
multiple (LCM) of
the process
periods.
Trivial scheduler > very small
scheduling
overhead.
© 2008 Wayne Wolf
P1
P1
P2
Overheads for Computers as
Components 2nd ed.
P1
P2
PLCM
TDMA schedulability
Always same CPU utilization (assuming
constant process execution times).
Can’t handle unexpected loads.
Must schedule a time slot for aperiodic
events.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
TDMA schedulability example
TDMA period = 10
ms.
P1 CPU time 1 ms.
P2 CPU time 3 ms.
P3 CPU time 2 ms.
P4 CPU time 2 ms.
© 2008 Wayne Wolf
TDMA period
CPU time
P1
1.00E-03
P2
3.00E-03
P3
2.00E-03
P4
2.00E-03
total
8.00E-03
utilization 8.00E-01
Overheads for Computers as
Components 2nd ed.
1.00E-02
Round-robin
Schedule process
only if ready.
Always test
processes in the
same order.
Variations:
T1
T2
P
Constant system
period.
Start round-robin
again after finishing
a round.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
T3
T2
T3
P
Round-robin assumptions
Schedule based on least common multiple
(LCM) of the process periods.
Best done with equal time slots for
processes.
Simple scheduler -> low scheduling
overhead.
Can be implemented in hardware.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Round-robin schedulability
Can bound maximum CPU load.
May leave unused CPU cycles.
Can be adapted to handle unexpected
load.
Use time slots at end of period.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Schedulability and
overhead
The scheduling process consumes CPU
time.
Not all CPU time is available for processes.
Scheduling overhead must be taken into
account for exact schedule.
May be ignored if it is a small fraction of total
execution time.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Running periodic
processes
Need code to control execution of
processes.
Simplest implementation: process =
subroutine.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
while loop implementation
Simplest
implementation has
one loop.
No control over
execution timing.
© 2008 Wayne Wolf
while (TRUE) {
p1();
p2();
}
Overheads for Computers as
Components 2nd ed.
Timed loop implementation
Encapuslate set of all
processes in a single
function that
implements the task
set,.
Use timer to control
execution of the task.
void pall(){
p1();
p2();
}
No control over timing
of individual
processes.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Multiple timers
implementation
Each task has its own
function.
Each task has its own
timer.
May not have enough
timers to implement
all the rates.
© 2008 Wayne Wolf
void pA(){ /* rate A */
p1();
p3();
}
void B(){ /* rate B */
p2();
p4();
p5();
}
Overheads for Computers as
Components 2nd ed.
Timer + counter
implementation
Use a software count
to divide the timer.
Only works for clean
multiples of the timer
period.
© 2008 Wayne Wolf
int p2count = 0;
void pall(){
p1();
if (p2count >= 2) {
p2();
p2count = 0;
}
else p2count++;
p3();
}
Overheads for Computers as
Components 2nd ed.
Implementing processes
All of these implementations are
inadequate.
Need better control over timing.
Need a better mechanism than
subroutines.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Processes and operating
systems
Operating systems.
Operating systems
The operating system controls resources:
who gets the CPU;
when I/O takes place;
how much memory is allocated.
The most important resource is the CPU
itself.
CPU access controlled by the scheduler.
Process state
A process can be in
one of three states:
executing on the CPU;
ready to run;
waiting for data.
executing
gets
CPU
preempted
gets data
and CPU
needs
data
gets data
ready
waiting
needs data
Operating system
structure
OS needs to keep track of:
process priorities;
scheduling state;
process activation record.
Processes may be created:
statically before system starts;
dynamically during execution.
Embedded vs. generalpurpose scheduling
Workstations try to avoid starving
processes of CPU access.
Fairness = access to CPU.
Embedded systems must meet deadlines.
Low-priority processes may not run for a
long time.
Priority-driven scheduling
Each process has a priority.
CPU goes to highest-priority process that
is ready.
Priorities determine scheduling policy:
fixed priority;
time-varying priorities.
Priority-driven scheduling
example
Rules:
each process has a fixed priority (1 highest);
highest-priority ready process gets CPU;
process continues until done.
Processes
P1: priority 1, execution time 10
P2: priority 2, execution time 30
P3: priority 3, execution time 20
Priority-driven scheduling
example
P3 ready t=18
P2 ready t=0 P1 ready t=15
P2
0
P1
10
20
P2
30
P3
40
60
50
time
The scheduling problem
Can we meet all deadlines?
Must be able to meet deadlines in all cases.
How much CPU horsepower do we need
to meet our deadlines?
Process initiation
disciplines
Periodic process: executes on (almost)
every period.
Aperiodic process: executes on demand.
Analyzing aperiodic process sets is harder--must consider worst-case combinations
of process activations.
Timing requirements on
processes
Period: interval between process
activations.
Initiation interval: reciprocal of period.
Initiation time: time at which process
becomes ready.
Deadline: time at which process must
finish.
Timing violations
What happens if a process doesn’t finish
by its deadline?
Hard deadline: system fails if missed.
Soft deadline: user may notice, but system
doesn’t necessarily fail.
Example: Space Shuttle
software error
Space Shuttle’s first launch was delayed
by a software timing error:
Primary control system PASS and backup
system BFS.
BFS failed to synchronize with PASS.
Change to one routine added delay that
threw off start time calculation.
1 in 67 chance of timing problem.
Interprocess
communication
Interprocess communication (IPC): OS
provides mechanisms so that processes
can pass data.
Two types of semantics:
blocking: sending process waits for response;
non-blocking: sending process continues.
IPC styles
Shared memory:
processes have some memory in common;
must cooperate to avoid destroying/missing
messages.
Message passing:
processes send messages along a
communication channel---no common
address space.
Shared memory
Shared memory on a bus:
CPU 1
memory
CPU 2
Race condition in shared
memory
Problem when two CPUs try to write the
same location:
CPU 1 reads flag and sees 0.
CPU 2 reads flag and sees 0.
CPU 1 sets flag to one and writes location.
CPU 2 sets flag to one and overwrites
location.
Atomic test-and-set
Problem can be solved with an atomic
test-and-set:
single bus operation reads memory location,
tests it, writes it.
ARM test-and-set provided by SWP:
ADR r0,SEMAPHORE
LDR r1,#1
GETFLAG SWP r1,r1,[r0]
BNZ GETFLAG
Critical regions
Critical region: section of code that cannot
be interrupted by another process.
Examples:
writing shared memory;
accessing I/O device.
Semaphores
Semaphore: OS primitive for controlling
access to critical regions.
Protocol:
Get access to semaphore with P().
Perform critical region operations.
Release semaphore with V().
Message passing
Message passing on a network:
CPU 1
CPU 2
message
message
message
Process data
dependencies
One process may not
be able to start until
another finishes.
Data dependencies
defined in a task
graph.
All processes in one
task run at the same
rate.
P1
P2
P3
P4
Other operating system
functions
Date/time.
File system.
Networking.
Security.
Processes and operating
systems
Scheduling policies:
RMS;
EDF.
Scheduling modeling assumptions.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Metrics
How do we evaluate a scheduling policy:
Ability to satisfy all deadlines.
CPU utilization---percentage of time devoted
to useful work.
Scheduling overhead---time required to make
scheduling decision.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Rate monotonic scheduling
RMS (Liu and Layland): widely-used,
analyzable scheduling policy.
Analysis is known as Rate Monotonic
Analysis (RMA).
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
RMA model
All process run on single CPU.
Zero context switch time.
No data dependencies between
processes.
Process execution time is constant.
Deadline is at end of period.
Highest-priority ready process runs.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Process parameters
Ti is computation time of process i; ti is
period of process i.
period ti
Pi
computation time Ti
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Rate-monotonic analysis
Response time: time required to finish
process.
Critical instant: scheduling state that gives
worst response time.
Critical instant occurs when all higherpriority processes are ready to execute.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Critical instant
interfering processes
P1
P1
P2
critical
instant
P3
P1 P1
P2
P1
P2
P3
P4
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
RMS priorities
Optimal (fixed) priority assignment:
shortest-period process gets highest priority;
priority inversely proportional to period;
break ties arbitrarily.
No fixed-priority scheme does better.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
RMS example
P2 period
P2
P1
0
P1 period
P1
P1
5
10
time
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
RMS CPU utilization
Utilization for n processes is
S i Ti / ti
As number of tasks approaches infinity,
maximum utilization approaches 69%.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
RMS CPU utilization,
cont’d.
RMS cannot use 100% of CPU, even with
zero context switch overhead.
Must keep idle cycles available to handle
worst-case scenario.
However, RMS guarantees all processes
will always meet their deadlines.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
RMS implementation
Efficient implementation:
scan processes;
choose highest-priority active process.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Earliest-deadline-first
scheduling
EDF: dynamic priority scheduling scheme.
Process closest to its deadline has highest
priority.
Requires recalculating processes at every
timer interrupt.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
EDF analysis
EDF can use 100% of CPU.
But EDF may fail to miss a deadline.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
EDF implementation
On each timer interrupt:
compute time to deadline;
choose process closest to deadline.
Generally considered too expensive to use
in practice.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Fixing scheduling
problems
What if your set of processes is
unschedulable?
Change deadlines in requirements.
Reduce execution times of processes.
Get a faster CPU.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Priority inversion
Priority inversion: low-priority process
keeps high-priority process from running.
Improper use of system resources can
cause scheduling problems:
Low-priority process grabs I/O device.
High-priority device needs I/O device, but
can’t get it until low-priority process is done.
Can cause deadlock.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Solving priority inversion
Give priorities to system resources.
Have process inherit the priority of a
resource that it requests.
Low-priority process inherits priority of
device if higher.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Data dependencies
Data dependencies
allow us to improve
utilization.
Restrict combination
of processes that can
run simultaneously.
P1 and P2 can’t run
simultaneously.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
P1
P2
Context-switching time
Non-zero context switch time can push
limits of a tight schedule.
Hard to calculate effects---depends on
order of context switches.
In practice, OS context switch overhead is
small (hundreds of clock cycles) relative to
many common task periods (ms – ms).
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Processes and operating
systems
Interprocess communication.
Operating system performance.
Power management.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Interprocess
communication
OS provides interprocess communication
mechanisms:
various efficiencies;
communication power.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Interprocess
communication
Interprocess communication (IPC): OS
provides mechanisms so that processes
can pass data.
Two types of semantics:
blocking: sending process waits for response;
non-blocking: sending process continues.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
IPC styles
Shared memory:
processes have some memory in common;
must cooperate to avoid destroying/missing
messages.
Message passing:
processes send messages along a
communication channel---no common
address space.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Shared memory
Shared memory on a bus:
CPU 1
© 2008 Wayne Wolf
memory
Overheads for Computers as
Components 2nd ed.
CPU 2
Race condition in shared
memory
Problem when two CPUs try to write the
same location:
CPU 1 reads flag and sees 0.
CPU 2 reads flag and sees 0.
CPU 1 sets flag to one and writes location.
CPU 2 sets flag to one and overwrites
location.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Atomic test-and-set
Problem can be solved with an atomic
test-and-set:
single bus operation reads memory location,
tests it, writes it.
ARM test-and-set provided by SWP:
ADR r0,SEMAPHORE
LDR r1,#1
GETFLAG SWP r1,r1,[r0]
BNZ GETFLAG
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Critical regions
Critical region: section of code that cannot
be interrupted by another process.
Examples:
writing shared memory;
accessing I/O device.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Semaphores
Semaphore: OS primitive for controlling
access to critical regions.
Protocol:
Get access to semaphore with P().
Perform critical region operations.
Release semaphore with V().
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Message passing
Message passing on a network:
CPU 1
CPU 2
message
message
message
© 2008 Wayne Wolf
Overheads for Computers as
Components
Process data
dependencies
One process may not
be able to start until
another finishes.
Data dependencies
defined in a task
graph.
All processes in one
task run at the same
rate.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
P1
P2
P3
P4
Signals in UML
More general than Unix signal---may carry
arbitrary data:
<<signal>>
aSig
someClass
<<send>>
sigbehavior()
p : integer
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Evaluating RTOS
performance
Simplifying assumptions:
Context switch costs no CPU time,.
We know the exact execution time of
processes.
WCET/BCET don’t depend on context
switches.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Scheduling and context
switch overhead
Process
Execution deadline
time
P1
3
5
P2
3
10
With context switch overhead of
1, no feasible schedule.
2TP1 + TP2 = 2*(1+3)+(1_3)=11
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Process execution time
Process execution time is not constant.
Extra CPU time can be good.
Extra CPU time can also be bad:
Next process runs earlier, causing new
preemption.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Processes and caches
Processes can cause additional caching
problems.
Even if individual processes are wellbehaved, processes may interfere with each
other.
Worst-case execution time with bad
behavior is usually much worse than
execution time with good cache behavior.
© 2008 Wayne Wolf
Overheads for Computers as
Components
Effects of scheduling on
the cache
Process
WCET
Avg. CPU
time
P1
8
6
P2
4
3
P3
4
3
Schedule 1 (LRU cache):
Schedule 2 (half of cache
reserved for P1):
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Power optimization
Power management: determining how
system resources are scheduled/used to
control power consumption.
OS can manage for power just as it
manages for time.
OS reduces power by shutting down units.
May have partial shutdown modes.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Power management and
performance
Power management and performance are
often at odds.
Entering power-down mode consumes
energy,
time.
Leaving power-down mode consumes
energy,
time.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Simple power management
policies
Request-driven: power up once request is
received. Adds delay to response.
Predictive shutdown: try to predict how
long you have before next request.
May start up in advance of request in
anticipation of a new request.
If you predict wrong, you will incur additional
delay while starting up.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Probabilistic shutdown
Assume service requests are probabilistic.
Optimize expected values:
power consumption;
response time.
Simple probabilistic: shut down after time
Ton, turn back on after waiting for Toff.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Advanced Configuration
and Power Interface
ACPI: open standard for power
management services.
applications
device
drivers
OS kernel
ACPI BIOS
Hardware platform
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
power
management
ACPI global power states
G3: mechanical off
G2: soft off
S1:
S2:
S3:
S4:
low wake-up latency with no loss of context
low latency with loss of CPU/cache state
low latency with loss of all state except memory
lowest-power state with all devices off
G1: sleeping state
G0: working state
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Processes and operating
systems
Telephone answering machine.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Theory of operation
Compress audio using adaptive
differential pulse code modulation
(ADPCM).
analog
time
ADPCM
3 2 1 -1 -2 -3
time
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ADPCM coding
Coded in a small alphabet with positive
and negative values.
{-3,-2,-1,1,2,3}
Minimize error between predicted value
and actual signal value.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ADPCM compression
system
S
quantizer
integrator
inverse
quantizer
encoder
samples
inverse
quantizer
integrator
decoder
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Telephone system terms
Subscriber line: line to phone.
Central office: telephone switching
system.
Off-hook: phone active.
On-hook: phone inactive.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Real and simulated
subscriber line
Real subscriber line:
90V RMS ringing signal;
companded analog signals;
lightning protection, etc.
Simulated subscriber line:
microphone input;
speaker output;
switches for ring, off-hook, etc.
© 2008 Wayne Wolf
Overheads for Computers as
Components
Requirements
Inputs
Performance
Telephone: voice samples, ring.
User interface: microphone, play
messages button, record OGM button.
Telephone: voice samples, onhook/off-hook command.
User interface: speaker, # messages
indicator, message light.
Default mode: detects ring, signals offhook, pays OGM, records ICM
Playback: play all messages, wait 5
seconds for new playback.
OGM editing: OGM up to 10 sec.
About 30 minutes voice (@ 8kHz).
Manufacturing cost
Consumer product range ($50)
Power
Physical
size/weight
AC plug
Comparable to desk phone.
Outputs
Functions
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Comments on analysis
DRAM requirement influenced by DRAM
price.
Details of user interface protocol could be
tested on a PC-based prototype.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Answering machine class
diagram
1
1
Microphone*
Line-in*
Controls
1
1
1
Speaker*
1
1
1
Playback
1
Buttons*
© 2008 Wayne Wolf
11
1
Line-out*
1
Record
1
1
1
Lights
Overheads for Computers as
Components 2nd ed.
1
* Outgoing* message
*
Incoming* message
Physical interface classes
Microphone*
Line-in*
Line-out*
sample()
sample()
ring-indicator()
sample()
pick-up()
Buttons*
Lights*
record-OGM
play
messages
num-messages
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Speaker*
sample()
Message classes
Message
length
start-adrs
next-msg
samples
Incoming-message
Outgoing-message
msg-time
length=30 sec
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Operational classes
Controls
Record
Playback
operate()
record-msg()
playback-msg()
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Software components
Front panel module.
Speaker module.
Telephone line module.
Telephone input and output modules.
Compression module.
Decompression module.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Controls activate behavior
Compute buttons, line activations
Activations?
Play OGM
Record OGM
Play ICM
Wait for timeout
Erase
Answer
Play OGM
Allocate ICM
Erase
Record ICM
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Record-msg/playback-msg
behaviors
nextadrs = 0
nextadrs = 0
msg.samples[nextadrs] =
sample(source)
speaker.samples() =
msg.samples[nextadrs];
nextadrs++
F
nextadrs=msg.length
F
End(source)
T
T
playback-msg
record-msg
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Hardware platform
CPU.
Memory.
Front panel.
2 A/Ds:
subscriber line, microphone.
2 D/A:
subscriber line, speaker.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Component design and
testing
Must test performance as well as testing.
Compression time shouldn’t dominate other
tasks.
Test for error conditions:
memory overflow;
try to delete empty message set, etc.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
System integration and
testing
Can test partial integration on host
platform; full testing requires integration
on target platform.
Simulate phone line for tests:
it’s legal;
easier to produce test conditions.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Multiprocessors
Why multiprocessors?
CPUs and accelerators.
Multiprocessor performance analysis.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Why multiprocessors?
Better cost/performance.
Match each CPU to its tasks or use custom
logic (smaller, cheaper).
CPU cost is a non-linear function of
performance.
cost
performance
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Why multiprocessors?
cont’d.
Better real-time performance.
Put time-critical functions on less-loaded
processing elements.
Remember RMS utilization---extra CPU cycles
must be reserved to meet deadlines.
cost
deadline w.
RMS overhead
deadline
performance
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Why multiprocessors?
cont’d.
Using specialized
processors or custom
logic saves power.
Desktop
uniprocessors are not
power-efficient
enough for batterypowered applications.
© 2008 Wayne Wolf
[Aus04] © 2004 IEEE Computer Society
Overheads for Computers as
Components 2nd ed.
Why multiprocessors?
cont’d.
Good for processing I/O in real-time.
May consume less energy.
May be better at streaming data.
May not be able to do all the work on
even the largest single CPU.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Accelerated systems
Use additional computational unit
dedicated to some functions?
Hardwired logic.
Extra CPU.
Hardware/software co-design: joint
design of hardware and software
architectures.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Accelerated system
architecture
request
CPU
accelerator
result
data
data
memory
I/O
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Accelerator vs. coprocessor
A co-processor executes instructions.
Instructions are dispatched by the CPU.
An accelerator appears as a device on the
bus.
The accelerator is controlled by registers.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Accelerator
implementations
Application-specific integrated circuit.
Field-programmable gate array (FPGA).
Standard component.
Example: graphics processor.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
System design tasks
Design a heterogeneous multiprocessor
architecture.
Processing element (PE): CPU, accelerator,
etc.
Program the system.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Accelerated system design
First, determine that the system really
needs to be accelerated.
How much faster is the accelerator on the
core function?
How much data transfer overhead?
Design the accelerator itself.
Design CPU interface to accelerator.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Accelerated system
platforms
Several off-the-shelf boards are available
for acceleration in PCs:
FPGA-based core;
PC bus interface.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Accelerator/CPU interface
Accelerator registers provide control
registers for CPU.
Data registers can be used for small data
objects.
Accelerator may include special-purpose
read/write logic.
Especially valuable for large data transfers.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
System integration and
debugging
Try to debug the CPU/accelerator
interface separately from the accelerator
core.
Build scaffolding to test the accelerator.
Hardware/software co-simulation can be
useful.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Caching problems
Main memory provides the primary data
transfer mechanism to the accelerator.
Programs must ensure that caching does
not invalidate main memory data.
CPU reads location S.
Accelerator writes location S.
CPU writes location S.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
BAD
Synchronization
As with cache, main memory writes to
shared memory may cause invalidation:
CPU reads S.
Accelerator writes S.
CPU reads S.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Multiprocessor
performance analysis
Effects of parallelism (and lack of it):
Processes.
CPU and bus.
Multiple processors.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Accelerator speedup
Critical parameter is speedup: how much
faster is the system with the accelerator?
Must take into account:
Accelerator execution time.
Data transfer time.
Synchronization with the master CPU.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Accelerator execution
time
Total accelerator execution time:
taccel = tin + tx + tout
Data input
Data output
Accelerated
computation
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Accelerator speedup
Assume loop is executed n times.
Compare accelerated system to nonaccelerated system:
S = n(tCPU - taccel)
 = n[tCPU - (tin + tx + tout)]
Execution time on CPU
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Single- vs. multi-threaded
One critical factor is available parallelism:
single-threaded/blocking: CPU waits for accelerator;
multithreaded/non-blocking: CPU continues to
execute along with accelerator.
To multithread, CPU must have useful work to
do.
But software must also support multithreading.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Total execution time
Single-threaded:
Multi-threaded:
P1
P1
P2
A1
P2
P3
P3
P4
P4
© 2008 Wayne Wolf
Overheads for Computers as
Components
A1
Execution time analysis
Single-threaded:
Count execution time
of all component
processes.
© 2008 Wayne Wolf
Multi-threaded:
Find longest path
through execution.
Overheads for Computers as
Components 2nd ed.
Sources of parallelism
Overlap I/O and accelerator computation.
Perform operations in batches, read in
second batch of data while computing on first
batch.
Find other work to do on the CPU.
May reschedule operations to move work
after accelerator initiation.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Data input/output times
Bus transactions include:
flushing register/cache values to main
memory;
time required for CPU to set up transaction;
overhead of data transfers by bus packets,
handshaking, etc.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Scheduling and allocation
Must:
schedule operations in time;
allocate computations to processing
elements.
Scheduling and allocation interact, but
separating them helps.
Alternatively allocate, then schedule.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Example: scheduling and
allocation
P1
P2
M1
d1
M2
d2
P3
Task graph
© 2008 Wayne Wolf
Hardware platform
Overheads for Computers as
Components 2nd ed.
First design
Allocate P1, P2 -> M1; P3 -> M2.
M1
P1
P1C
P2
P2C
M2
P3
time
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Second design
Allocate P1 -> M1; P2, P3 -> M2:
M1
P1
M2
P2
P1C
P3
time
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Example: adjusting
messages to reduce delay
Task graph:
3
execution time
3
P1
P2
d1
4
Network:
allocation
M1
d2
P3
Transmission time = 4
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
M2
M3
Initial schedule
M1
P1
M2
P2
M3
P3
network
d1
d2
Time = 15
0
© 2008 Wayne Wolf
5
10
Overheads for Computers as
Components 2nd ed.
15
20 time
New design
Modify P3:
reads one packet of d1, one packet of d2
computes partial result
continues to next packet
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
New schedule
M1
P1
M2
P2
M3
P3 P3 P3 P3
network
d1d2d1d2d1d2d1d2
Time = 12
0
© 2008 Wayne Wolf
5
10
Overheads for Computers as
Components 2nd ed.
15
20 time
Buffering and performance
Buffering may sequentialize operations.
Next process must wait for data to enter
buffer before it can continue.
Buffer policy (queue, RAM) affects
available parallelism.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Buffers and latency
Three processes
separated by buffers:
B1
© 2008 Wayne Wolf
A
B2
B
Overheads for Computers as
Components 2nd ed.
B3
C
Buffers and latency
schedules
A[0]
A[1]
…
B[0]
B[1]
…
C[0]
C[1]
…
© 2008 Wayne Wolf
Must wait for
all of A before
getting any B
A[0]
B[0]
C[0]
A[1]
B[1]
C[1]
…
Overheads for Computers as
Components 2nd ed.
Multiprocessors
Consumer electronics systems.
Cell phones.
CDs and DVDs.
Audio players.
Digital still cameras.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Consumer electronics use
cases
 Multimedia: stored in
compressed form,
uncompressed on
viewing.
 Data storage and
management: keep track
of your multimedia, etc.
 Communication:
download, upload, chat.
© 2000 Morgan
Kaufman
Overheads for Computers as
Components
Non-functional
requirements for CE
Often battery-operated, strict power
budget.,
Very inexpensive.
User interface must be capable but
inexpensive.
© 2000 Morgan
Kaufman
Overheads for Computers as
Components
CE devices and hosts
 Many devices talk to host
system.
PC host does things that
are hard to do on the
device.
Increasingly, CE
devices communicate
directly over the
network, avoiding the
host for access.
© 2000 Morgan
Kaufman
Overheads for Computers as
Components
Platforms and operating
systems
Many CE devices use
a DSP for signal
processing and a
RISC CPU for other
tasks.
I/O devices include
buttons, screen, USB.
© 2000 Morgan
Kaufman
Overheads for Computers as
Components
Flash file systems
Flash is widely used for mass storage.
Flash wears out on writing (up to 1 million
cycles).
Directory is most often written, wears out
first.
Flash file system has layer that moves
contents to levelize wear.
Hides wear leveling from API.
© 2000 Morgan
Kaufman
Overheads for Computers as
Components
Cell phones
Most popular CE
device in history;
most widely used
computing device.
1 billion sold per year.
Handset talks to cell.
Cells hand off
handset as it moves.
© 2000 Morgan
Kaufman
Overheads for Computers as
Components
Cell phone platforms
 Today’s cell phones use analog
front end, digital baseband
processing.
 Future cell phones will
perform IF processing with
DSP.
 Baseband processing in DSP:
 Voice compression.
 Network protocol.
 Other processing:




Multimedia functions.
User interface.
File system.
Applications (contacts, etc.)
© 2000 Morgan
Kaufman
Overheads for Computers as
Components
CD/MP3 player
Audio
CPU
memory
Jog
memory
display
amp DAC
I2S
© 2008 Wayne Wolf
Error
corrector
Servo
CPU
memory
Analog
out
focus,
tracking,
sled,
Analog
motor
in
FE, TE, amp
Overheads for Computers as
Components 2nd ed.
drive
head
CD medium
Rotational speed: 1.2-1.4 m/s (CLV).
Track pitch: 1.6 microns.
Diameter: 120 mm.
Pit length: 0.8 -3 microns.
Pit depth: .11 microns.
Pit width: 0.5 microns.
Laser wavelength: 780 nm.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
CD mechanism
Laser, lens, sled:
CD
focus
track
track
© 2008 Wayne Wolf
diffraction
grating
laser
sled
detectors
Overheads for Computers as
Components 2nd ed.
Laser focus
Focus controlled by vertical position of
lens.
Unfocused beam causes irregular spot:
Out of focus
© 2008 Wayne Wolf
In focus
Overheads for Computers as
Components 2nd ed.
Out of focus
Laser pickup
Side spot
detectors
F
A
B
D
E
© 2008 Wayne Wolf
C
Overheads for Computers as
Components 2nd ed.
Level:
A+B+C+D
Focus error:
(A+C)-(B+D)
Tracking error:
E-F
Servo control
Four main signals:
focus (laser) @ 245 kHz;
tracking (laser) @ 245 kHz;
sled (motor): @ 800 Hz;
Disc motor.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Optical pickup
EFM
Eight-to-fourteen modulation:
Fourteen-bit code guarantees a maximum
distance between transitions.
00000011
© 2008 Wayne Wolf
00100100000000
Overheads for Computers as
Components 2nd ed.
Error correction
CD capacity: 6.99 GB raw, 700 MB formatted.
Reed-Solomon code:
g(x) = (x-a) (x- a2) … (x- an-k-1) (x- an-k)
Produces data, erasure bits.
Time to solve varies greatly depending on noise.
CD interleaves Reed-Solomon blocks to reduce
effects of large data gaps.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Control and error
correction
Skips caused by physical disturbance.
Wait for disturbance to subside.
Retry.
Read errors caused by disc/servo problems.
Detect error.
Choose location for retry.
Retry.
Fail and interpolate.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
MPEG audio standards
Layer 1:
Lossless compression of subbands + optional
simple masking model
Layer 2:
More advanced masking model.
Layer 3:
Additional processing for lower bit rates.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
MPEG audio rates
Input sampling rates:
32, 44.1, 48 kHz.
Output bit rates:
23, 48, 64, 96, 112, 128, 192, 256, 384
kbits/sec.
Output can be mono, dual-channel
(bilingual, etc.), stereo.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Other standards
Dolby Digital (AC-3):
Uses modified discrete cosine transform.
ATRAC (MiniDisc):
Uses subband + modified DCT.
MPEG-2 AAC.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
MPEG Layer 1
384 samples/block at all frequencies.
Equals 8 ms at 48 kHz.
Optional masking model.
Driven by separate FFT for better accuracy.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
MPEG Layer 1 data frame
Bit allocation codes specify word length in
each subband.
Scale factors give gain for each band.
header
CRC
© 2008 Wayne Wolf
bit
allocation
scale
factors
subband samples
Overheads for Computers as
Components 2nd ed.
aux
data
MPEG Layer 1 encoder
Choose
Scale factor
Filter
bank
mux
*
requantize
0101..
FFT
© 2008 Wayne Wolf
Masking
model
Overheads for Computers as
Components 2nd ed.
MPEG Layer 1 decoder
Scale
factor
demux
0101..
inverse
quantize
*
*
expand
Step
size
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Inverse
filter
bank
MP3
Decoding is easier than encoding, but
requires:
decompression;
filtering.
Basic CD standard for data discs.
No standards for MP3 disc file structure:
player must understand Windows, Mac,
Unix discs.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Audio players
 Audio players may use
flash, hard disk, or CD for
mass storage.
 Decompression requires
small amount of CPU:
10% of ARM7.
 File system must be
compatible (FAT).
© 2000 Morgan
Kaufman
Overheads for Computers as
Components
Digital still cameras
DSC must determine
exposure before
taking picture.
After taking picture:
Improve image
quality.
Compress.
Save as file.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Digital still camera
architecture
 DSC uses CPU for
general-purpose
processing, DSP for
image processing.
 Internal memory buffers
the passes on the image.
 Display is lower
resolution than image
sensor.
Image must be
downsampled.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Image capture
Before taking picture:
Determine exposure.
Determine focus.
Optimize white
balance.
Bayer pattern
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Image processing
Must perform basic processing to get
usable picture:
Bayer->RGB interpolation.
DSCs perform many functions formerly
performed by photoprocessors for film:
Image sharpening.
Color balance.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
File management
EXIF standard gives format for digital
pictures:
Format of data in a file.
Directory structure.
EXIF file includes:
Image (JPEG, etc.)
Thumbnail.
Metadata (camera type, date/time, etc.)
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Accelerators
Example: video accelerator
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Concept
Build accelerator for block motion
estimation, one step in video
compression.
Perform two-dimensional correlation:
f2
f2f2
f2f2f2f2f2f2f2
Frame 1
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Block motion estimation
MPEG divides frame into 16 x 16
macroblocks for motion estimation.
Search for best match within a search
range.
Measure similarity with sum-of-absolutedifferences (SAD):
S | M(i,j) - S(i-ox, j-oy) |
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Best match
Best match produces motion vector for
motion block:
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Full search algorithm
bestx = 0; besty = 0;
bestsad = MAXSAD;
for (ox = - SEARCHSIZE; ox < SEARCHSIZE; ox++) {
for (oy = -SEARCHSIZE; oy < SEARCHSIZE; oy++) {
int result = 0;
for (i=0; i<MBSIZE; i++) {
for (j=0; j<MBSIZE; j++) {
result += iabs(mb[i][j] - search[iox+XCENTER][j-oy-YCENTER]);
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Full search algorithm,
cont’d.
}
}
if (result <= bestsad) { bestsad = result; bestx =
ox; besty = oy; }
}
}
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Computational
requirements
Let MBSIZE = 16, SEARCHSIZE = 8.
Search area is 8 + 8 + 1 in each
dimension.
Must perform:
nops = (16 x 16) x (17 x 17) = 73984 ops
CIF format has 352 x 288 pixels -> 22 x
18 macroblocks.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Accelerator requirements
name
block motion estimator
purpose
block motion est. in PC
inputs
macroblocks, search areas
outputs
motion vectors
functions
performance
compute motion vectors with
full search
as fast as possible
manufacturing cost
hundreds of dollars
power
from PC power supply
physical size/weight
PCI card
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Accelerator data types,
basic classes
Motion-vector
Macroblock
Search-area
x, y : pos
pixels[] : pixelval
pixels[] : pixelval
PC
Motion-estimator
memory[]
compute-mv()
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Sequence diagram
:PC
:Motion-estimator
compute-mv()
Search area
memory[]
memory[]
macroblocks
© 2000 Morgan
Kaufman
memory[]
Overheads for Computers as
Components 2nd ed.
Architectural
considerations
Requires large amount of memory:
macroblock has 256 pixels;
search area has 1,089 pixels.
May need external memory (especially if
buffering multiple macroblocks/search
areas).
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
© 2008 Wayne Wolf
network
PE 0
PE 1
comparator
ctrl
...
network
macroblock
Address
generator
search area
Motion estimator
organization
Motion
vector
PE 15
Overheads for Computers as
Components 2nd ed.
Pixel schedules
PE 0
PE 1
PE 2
|M(0,0)-S(0,0)|
M(0,0)
|M(0,1)-S(0,1)|
|M(0,0)-S(0,1)|
|M(0,2)-S(0,2)|
|M(0,1)-S(0,2)|
|M(0,0)-S(0,2)|
S(0,2)
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
System testing
Testing requires a large amount of data.
Use simple patterns with obvious answers
for initial tests.
Extract sample data from JPEG pictures
for more realistic tests.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Networking for Embedded
Systems
Why we use networks.
Network abstractions.
Example networks.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Network elements
distributed computing platform:
PE
PE
communication link
network
PE
PEs may be CPUs or ASICs.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Networks in embedded
systems
initial processing
more processing
PE
sensor
PE
actuator
PE
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Why distributed?
Higher performance at lower cost.
Physically distributed activities---time
constants may not allow transmission to
central site.
Improved debugging---use one CPU in
network to debug others.
May buy subsystems that have embedded
processors.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Network abstractions
International Standards Organization
(ISO) developed the Open Systems
Interconnection (OSI) model to describe
networks:
7-layer model.
Provides a standard way to classify
network components and operations.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
OSI model
application
end-use interface
presentation
data format
session
© 2008 Wayne Wolf
application dialog control
transport
connections
network
end-to-end service
data link
reliable data transport
physical
mechanical, electrical
Overheads for Computers as
Components 2nd ed.
OSI layers
Physical: connectors, bit formats, etc.
Data link: error detection and control
across a single link (single hop).
Network: end-to-end multi-hop data
communication.
Transport: provides connections; may
optimize network resources.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
OSI layers, cont’d.
Session: services for end-user
applications: data grouping,
checkpointing, etc.
Presentation: data formats,
transformation services.
Application: interface between network
and end-user programs.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Hardware architectures
Many different types of networks:
topology;
scheduling of communication;
routing.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Point-to-point networks
One source, one or more destinations, no
data switching (serial port):
PE 1
PE 2
link 1
© 2008 Wayne Wolf
PE 3
link 2
Overheads for Computers as
Components 2nd ed.
Bus networks
Common physical connection:
PE 1
header
© 2008 Wayne Wolf
PE 2
address
PE 3
data
PE 4
ECC
Overheads for Computers as
Components 2nd ed.
packet format
Bus arbitration
Fixed: Same order of resolution every
time.
Fair: every PE has same access over long
periods.
round-robin: rotate top priority among Pes.
fixed
round-robin
A
B
C
A
B
C
A
B
C
B
C
A
A,B,C
© 2008 Wayne Wolf
A,B,C
Overheads for Computers as
Components 2nd ed.
Crossbar
out4
out3
out2
out1
in1
© 2008 Wayne Wolf
in2
in3
in4
Overheads for Computers as
Components 2nd ed.
Crossbar characteristics
Non-blocking.
Can handle arbitrary multi-cast
combinations.
Size proportional to n2.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Multi-stage networks
Use several stages of switching elements.
Often blocking.
Often smaller than crossbar.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Message-based
programming
Transport layer provides message-based
programming interface:
send_msg(adrs,data1);
Data must be broken into packets at
source, reassembled at destination.
Data-push programming: make things
happen in network based on data
transfers.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
I2C bus
Designed for low-cost, medium data rate
applications.
Characteristics:
serial;
multiple-master;
fixed-priority arbitration.
Several microcontrollers come with builtin I2C controllers.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
I2C physical layer
master 1
data line
SDL
clock line
SCL
slave 1
© 2008 Wayne Wolf
master 2
Overheads for Computers as
Components 2nd ed.
slave 2
I2C data format
SCL
...
SDL
...
start
© 2008 Wayne Wolf
MSB
Overheads for Computers as
Components 2nd ed.
...
ack
I2C electrical interface
Open collector interface:
+
SDL
+
SCL
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
I2C signaling
Sender pulls down bus for 0.
Sender listens to bus---if it tried to send a
1 and heard a 0, someone else is
simultaneously transmitting.
Transmissions occur in 8-bit bytes.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
I2C data link layer
Every device has an address (7 bits in
standard, 10 bits in extension).
Bit 8 of address signals read or write.
General call address allows broadcast.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
I2C bus arbitration
Sender listens while sending address.
When sender hears a conflict, if its
address is higher, it stops signaling.
Low-priority senders relinquish control
early enough in clock cycle to allow bit to
be transmitted reliably.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
I2C transmissions
multi-byte write
S
adrs
0
data
data
1
data
P
0
data
S
P
read from slave
S
adrs
write, then read
S
adrs
© 2008 Wayne Wolf
adrs
Overheads for Computers as
Components 2nd ed.
1
data
P
Ethernet
Dominant non-telephone LAN.
Versions: 10 Mb/s, 100 Mb/s, 1 Gb/s
Goal: reliable communication over an
unreliable medium.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Ethernet topology
Bus-based system, several possible
physical layers:
A
© 2008 Wayne Wolf
B
Overheads for Computers as
Components 2nd ed.
C
CSMA/CD
Carrier sense multiple access with collision
detection:
sense collisions;
exponentially back off in time;
retransmit.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Exponential back-off times
time
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Ethernet packet format
preamble
start
frame
© 2008 Wayne Wolf
source
adrs
dest
data
length
padding CRC
adrs
payload
Overheads for Computers as
Components 2nd ed.
Ethernet performance
Quality-of-service tends to non-linearly
decrease at high load levels.
Can’t guarantee real-time deadlines.
However, may provide very good service
at proper load levels.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Fieldbus
Used for industrial control and
instrumentation---factories, etc.
H1 standard based on 31.25 MB/s twisted
pair medium.
High Speed Ethernet (HSE) standard
based on 100 Mb/s Ethernet.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
System design techniques
Design methodologies.
Requirements and specification.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Design methodologies
Process for creating a system.
Many systems are complex:
large specifications;
multiple designers;
interface to manufacturing.
Proper processes improve:
quality;
cost of design and manufacture.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Product metrics
Time-to-market:
beat competitors to market;
meet marketing window (back-to-school).
Design cost.
Manufacturing cost.
Quality.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Mars Climate Observer
Lost on Mars in September 1999.
Requirements problem:
Requirements did not specify units.
Lockheed Martin used English; JPL wanted
metric.
Not caught by manual inspections.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Design flow
Design flow: sequence of steps in a
design methodology.
May be partially or fully automated.
Use tools to transform, verify design.
Design flow is one component of
methodology. Methodology also includes
management organization, etc.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Waterfall model
Early model for software development:
requirements
architecture
coding
testing
maintenance
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Waterfall model steps
Requirements: determine basic
characteristics.
Architecture: decompose into basic
modules.
Coding: implement and integrate.
Testing: exercise and uncover bugs.
Maintenance: deploy, fix bugs, upgrade.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Waterfall model critique
Only local feedback---may need iterations
between coding and requirements, for
example.
Doesn’t integrate top-down and bottomup design.
Assumes hardware is given.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Spiral model
system feasibility
specification
prototype
initial system
enhanced system
design
requirements
test
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Spiral model critique
Successive refinement of system.
Start with mock-ups, move through simple
systems to full-scale systems.
Provides bottom-up feedback from
previous stages.
Working through stages may take too
much time.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Successive refinement
model
specify
specify
architect
architect
design
design
build
build
test
initial system
© 2008 Wayne Wolf
test
refined system
Overheads for Computers as
Components 2nd ed.
Hardware/software design
flow
requirements and
specification
architecture
software design
hardware design
integration
testing
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Co-design methodology
Must architect hardware and software
together:
provide sufficient resources;
avoid software bottlenecks.
Can build pieces somewhat
independently, but integration is major
step.
Also requires bottom-up feedback.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Hierarchical design flow
Embedded systems must be designed
across multiple levels of abstraction:
system architecture;
hardware and software systems;
hardware and software components.
Often need design flows within design
flows.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Hierarchical HW/SW flow
spec
spec
spec
architecture
HWSW
architecture
architecture
HW
SW
detailed
detailed
design
design
integrate
integration
integration
test
testtest
system
© 2008 Wayne Wolf
hardware
software
Overheads for Computers as
Components 2nd ed.
Concurrent engineering
Large projects use many people from
multiple disciplines.
Work on several tasks at once to reduce
design time.
Feedback between tasks helps improve
quality, reduce number of later design
problems.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Concurrent engineering
techniques
Cross-functional teams.
Concurrent product realization.
Incremental information sharing.
Integrated product management.
Supplier involvement.
Customer focus.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
AT&T PBX concurrent
engineering
Benchmark against competitors.
Identify breakthrough improvements.
Characterize current process.
Create new process.
Verify new process.
Implement.
Measure and improve.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Requirements analysis
Requirements: informal description of
what customer wants.
Specification: precise description of what
design team should deliver.
Requirements phase links customers with
designers.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Types of requirements
Functional: input/output relationships.
Non-functional:
timing;
power consumption;
manufacturing cost;
physical size;
time-to-market;
reliability.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Good requirements
Correct.
Unambiguous.
Complete.
Verifiable: is each requirement satisfied in
the final system?
Consistent: requirements do not
contradict each other.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Good requirements, cont’d.
Modifiable: can update requirements
easily.
Traceable:
know why each requirement exists;
go from source documents to requirements;
go from requirement to implementation;
back from implementation to requirement.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Setting requirements
Customer interviews.
Comparison with competitors.
Sales feedback.
Mock-ups, prototypes.
Next-bench syndrome (HP): design a
product for someone like you.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Specifications
Capture functional and non-functional
properties:
verify correctness of spec;
compare spec to implementation.
Many specification styles:
control-oriented vs. data-oriented;
textual vs. graphical.
UML is one specification/design language.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
SDL
Used in
telecommunications
protocol design.
Event-oriented state
machine model.
telephone
on-hook
caller goes
off-hook
dial tone
caller gets
dial tone
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Statecharts
Ancestor of UML state diagrams.
Provided composite states:
OR states;
AND states.
Composite states reduce the size of the
state transition graph.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Statechart OR state
s123
i1
i1
S1
S1
i2
i1
S2
i2
i1
S4
S2
i2
S3
S3
traditional
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
OR state
i2
S4
Statechart AND state
sab
c
S1-3
S1
S1-4
S3
d
a
b
b
a
b
a
c
d
c
S2-3
S2
S2-4
d
r
r
r
S5
AND state
traditional
© 2008 Wayne Wolf
S4
Overheads for Computers as
Components 2nd ed.
S5
AND-OR tables
Alternate way of specifying complex
conditions:
cond1 or (cond2 and !cond3)
OR
AND
cond1
cond2
cond3
© 2008 Wayne Wolf
T
-
T
F
Overheads for Computers as
Components 2nd ed.
TCAS II specification
TCAS II: aircraft collision avoidance
system.
Monitors aircraft and air traffic info.
Provides audio warnings and directives to
avoid collisions.
Leveson et al used RMSL language to
capture the TCAS specification.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
RMSL
State description:
state1
Transition bus for
transitions between
many states:
inputs
a
state description
b
c
outputs
© 2008 Wayne Wolf
d
Overheads for Computers as
Components 2nd ed.
TCAS top-level description
CAS
power-on
Inputs:
power-off
TCAS-operational-status {operational,not-operational}
fully-operational
own-aircraft
other-aircraft i:[1..30]
mode-s-ground-station i:[1..15]
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
C
standby
Own-Aircraft AND state
CAS
Inputs:
own-alt-radio: integer standby-discrete-input: {true,false}
own-alt-barometric:integer, etc.
Effective-SL
Alt-SL
1
1
2
2
...
...
7
7
Alt-layer
Climb-inibit
...
Descend-inibit
...
...
Increase-Descend-inibit ...
Increase-climb-inibit
...
Advisory-Status
Outputs:
sound-aural-alarm: {true,false} aural-alarm-inhibit: {true, false}
combined-control-out: enumerated, etc.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
...
CRC cards
Well-known method for analyzing a
system and developing an architecture.
CRC:
classes;
responsibilities of each class;
collaborators are other classes that work with
a class.
Team-oriented methodology.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
CRC card format
Class name:
Superclasses:
Subclasses:
Responsibilities: Collaborators:
Class name:
Class’s function:
Attributes:
front
© 2008 Wayne Wolf
back
Overheads for Computers as
Components 2nd ed.
CRC methodology
Develop an initial list of classes.
Simple description is OK.
Team members should discuss their choices.
Write initial responsibilities/collaborators.
Helps to define the classes.
Create some usage scenarios.
Major uses of system and classes.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
CRC methodology, cont’d.
Walk through scenarios.
See what works and doesn’t work.
Refine the classes, responsibilities, and
collaborators.
Add class relatoinships:
superclass, subclass.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
CRC cards for elevator
Real-world classes:
elevator car, passenger, floor control, car
control, car sensor.
Architectural classes: car state, floor
control reader, car control reader, car
control sender, scheduler.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Elevator responsibilities
and collaborators
class
responsibilities
Elevator car*
Move up and down Car control, car
sensor, car control
sender
Transmits car
Passenger, floor
requests
control reader
Car control*
Car state
© 2008 Wayne Wolf
Reads current
position of car
Overheads for Computers as
Components 2nd ed.
collaborators
Scheduler, car
sensor
System design techniques
Quality assurance.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Quality assurance
Quality judged by how well product
satisfies its intended function.
May be measured in different ways for
different kinds of products.
Quality assurance (QA) makes sure that
all stages of the design process help to
deliver a quality product.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Therac-25 Medical Imager
(Leveson and Turner)
Six known accidents: radiation overdoses
leading to death and serious injury.
Radiation gun controlled by PDP-11.
Four major software components:
stored data;
scheduler;
set of tasks;
interrupt services.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Therac-25 tasks
Treatment monitor controlled and
monitored setup and delivery of treatment
in eight phases.
Servo task controlled radiation gun.
Housekeeper task took care of status
interlocks and limit checks.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Treatment monitor task
Treat was main monitor task.
Eight subroutines.
Treat rescheduled itself after every
subroutine.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Software timing race
Timing-dependent use of mode and
energy:
if keyboard handler sets completion behavior
before operator changes mode/energy data,
Datent task will not detect the change, but
Hand task will.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Software timing errors
Changes to parameters made by operator
may show on screen but not be sensed by
Datent task.
One accident caused by entering
mode/energy, changing mode/energy,
returning to command line in 8 seconds.
Skilled operators typed faster, more likely
to exercise bug.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Leveson and Turner
observations
Performed limited safety analysis:
guessed at error probabilities, etc.
Did not use mechanical backups to check
machine operation.
Used overly complex programs written in
unreliable styles.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
ISO 9000
Developed by International Standards
organization.
Applies to a broad range industries.
Concentrates on process.
Validation based on extensive
documentation of organization’s process.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
CMU Capability Maturity
Model
Five levels of organizational maturity:
Initial: poorly organized process, depends on
individuals.
Repeatable: basic tracking mechanisms.
Defined: processes documented and
standardized.
Managed: makes detailed measurements.
Optimizing: measurements used for
improvement.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Verification
cost to fix
Verification and testing are important
throughout the design flow.
Early bugs are more expensive to fix:
© 2008 Wayne Wolf
requirements
bug coding bug
Overheads for Computers as
Components 2nd ed.
time
Verifying requirements and
specification
Requirements:
prototypes;
prototyping languages;
pre-existing systems.
Specifications:
usage scenarios;
formal techniques.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Design review
Uses meetings to catch design flaws.
Simple, low-cost.
Proven by experiments to be effective.
Use other people in the project/company
to help spot design problems.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Design review players
Designers: present design to rest of team,
make changes.
Review leader: coordinates process.
Review scribe: takes notes of meetings.
Review audience: looks for bugs.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Before the design review
Design team prepares documents used to
describe the design.
Leader recruits audience, coordinates
meetings, distributes handouts, etc.
Audience members familiarize themselves
with the documents before they go to the
meeting.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Design review meeting
Leader keeps meeting moving; scribe
takes notes.
Designers present the design:
use handouts;
explain what is going on;
go through details.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Design review audience
Look for any problems:
Is the design consistent with the
specification?
Is the interface correct?
How well is the component’s internal
architecture designed?
Did they use good design/coding practices?
Is the testing strategy adequate?
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Follow-up
Designers make suggested changes.
Document changes.
Leader checks on results of changes, may
distribute to audience for further review
or additional reviews.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Measurements
Measurements help ground our beliefs:
Do our practices really work?
Do they work where we think they work?
Types of measurements:
bugs found at different stages of design;
bugs as a function of time;
bugs in different types of components;
how bugs are found.
© 2008 Wayne Wolf
Overheads for Computers as
Components 2nd ed.
Download