Final Review ECE 455/555 Embedded System Design Wei Gao

ECE 455/555
Embedded System Design
Final Review
Wei Gao
Fall 2015
Final Exam
 When: 12/8 2:45pm-4:45pm
 Where: Min Kao 406
 20% of your final grade
 What about: Everything covered in this course
 Closed-book, closed-notes, no laptop, no discussion
 All included in class slides (as well as textbook, assigned papers)
 Will include 1 programming question
 Make your answers short to include only key points
• Do answer every question, DON’T leave blank
 Final project report due on 12/8 before final exam
 15% of your final grade
 Remember to submit your SAIS review to receive 5% bonus
credit for your final project report!
First half of the semester
 Introduction to embedded systems
 Real-time, low power, small memory footprint, low cost
 Soft/hard real-time system
 Design methodology
 Microprocessors, FPGA and ASIC
 Design procedure and example: GPS
 Microprocessors
 von Neumann vs. Harvard, RISC vs. CISC, SHARC, ARM7
 CPUs, I/O, interrupt
 Busy-wait I/O, interrupt-based I/O, interrupt mechanism
 Caches and memory
 Memory system, average memory access time in multi-level cache
 Cache organization: direct-mapped, N-way set-associative
 Embedded computing platforms
 I/O devices, hardware/software architecture, state machine, testing
Definition
 Embedded system: any device that includes a
computer but is not itself a general-purpose
computer.
 System characteristics
 Non-functional requirements: Real-time, Low power, Small
memory footprint, Low cost
 Hard vs. soft real-time
Alternative Technology
 Application-Specific Integrated Circuits (ASICs)
 Microprocessors
 Field-Programmable Gate Arrays (FPGAs)
 Why should we use microprocessors?
 Reprogrammability and low development cost outweigh the low performance/watt
Microprocessors
 Performance
 Con: Programmable architecture is fundamentally slow!
• Fetch, decode instructions
 Pro: Highly optimized architecture and manufacturing
• Pipelines; cache; clock frequency; circuit density; manufacturing
technology
 Power
 Processors perform poorly in terms of performance/watt!
 Power management can alleviate the power problem.
 Flexibility, development cost and time
 Let software do the work!
Design Methodologies
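[Figure: design flow through requirements, specification, architecture, component design, and system integration. Top-down design moves from requirements toward component design; bottom-up design moves from components toward system integration.]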
Microprocessors
 von Neumann
 Same memory holds data, instructions.
 A single set of address/data buses between CPU and
memory
 Harvard
 Separate memories for data and instructions.
 Two sets of address/data buses between CPU and memory
 RISC vs. CISC
 CISC: Many addressing modes and instructions; High code
density.
 RISC: Compact, uniform instructions facilitate pipelining, but lower code density gives a poor memory footprint
Busy-Wait I/O Programming
 Simplest way to program I/O devices.
 Devices are usually slower than the CPU and require more cycles
 CPU has to wait for the device to finish before starting the next operation
 Use the peek instruction to test when the device is finished
 Test-and-set
/* send a string to the device using busy-wait handshaking */
char *current_char = mystring;
while (*current_char != '\0') {
    /* send character to device (data register) */
    poke(OUT_CHAR, *current_char);
    /* wait for device to finish by checking its status */
    while (peek(OUT_STATUS) != 0)
        ;
    /* advance character pointer to next one */
    current_char++;
}
Interrupt-based I/O
 Busy-wait is very inefficient.
 CPU can’t do other work while testing device.
 Hard to do simultaneous I/O.
 Interrupts allow a device to change the flow of control in the CPU.
 Call interrupt handler (i.e. device driver) to handle device.
[Figure: the CPU (PC, IR) and the device (status register, data register, mechanism) are connected by interrupt request, interrupt ack, and data/address lines.]
Microprocessor Bus
 Bus is a set of wires and a protocol for the CPU to
communicate with memory and devices
 Five major components to support reads and writes
[Figure: CPU, Memory, Device 1, and Device 2 share the bus signals Clock, R/W', Address (a bits), Data ready', and Data (n bits).]
Typical Bus Access
 Timing diagram syntax:
 Tri-state: Constant value (0/1), stable, changing, unknown.
[Figure: timing diagram of a bus read and a bus write, showing Clock, R/W', Address enable, Address, Data ready, and Data over time.]
Memory System and Caches
 Memory is slower than CPU
 CPU clock rates increase faster than memory
 Caches are used to speed up memory
 Cache is a small but fast memory that holds copies of the
contents of main memory
 More expensive than main memory, but faster
 Memory Management Units (MMU)
 Physical memory may not be large enough for all applications
 Provide a larger virtual memory than physical memory
Memory Devices
 Types of memory devices
 RAM (Random-Access Memory)
• Addresses can be read in any order, unlike magnetic disk/tape
• Usually used for data storage
• DRAM vs. SRAM.
 ROM (Read-Only Memory)
• Usually used for program storage
• Mask-programmed vs. field-programmable.
TinyOS System
 Support concurrency: event-driven architecture
 Modularity: application = scheduler + graph of components
 Compiled into one executable
 Efficiency: Get done quickly and sleep
 Event/command = function calls
 Fewer context switches: FIFO/non-preemptable scheduling
 No kernel/application boundary: completely open-source


[Figure: TinyOS component layers: Main (includes Scheduler); Application (User Components); Actuating, Sensing, Communication; Communication; Hardware Abstractions.]
Modified from D. Culler et al., TinyOS boot camp presentation, Feb 2001
TinyOS Programming Model: nesC
 Component model
 An application consists of wired components
 Application = graph of components
 Components are wired through interfaces
 Wiring specified by configurations
 Configuration can be hierarchical
[Figure: an application drawn as a graph of Components A through F, grouped into configurations.]
TinyOS Programming Model: nesC
 Interface: events vs. commands
 command needs to be implemented by components providing the interface
 event needs to be handled by components using the interface
interface Receive
{
    event message_t *receive(message_t *msg, void *payload, uint8_t len);
    command void *getPayload(message_t *msg, uint8_t *len);
    command uint8_t payloadLength(message_t *msg);
}
Second Half of the semester
 Program optimizations
 Power management
 Operating systems
 Real-time scheduling
Basic Compilation Optimization
 Expression simplification
 Dead code elimination
 Function inlining
 Loop optimizations
 Array conflicts in cache
 Register allocation
Function inlining
int foo(int a, int b, int c) { return a + b - c; }
z = foo(w, x, y);
after inlining:
z = w + x - y;
 An inline function’s body is inserted directly (like a
substitution) in the compiled code at the point where the
function is called.
 Improve performance by reducing function call overhead
 “inline” in different cases
 TinyOS does whole-program inlining
Loop Optimizations
 Loops are good targets for optimization.
 Basic loop optimizations:
 Code motion;
 Reduce loop overhead: loop unrolling
 Increase opportunities for pipelining and parallelism: loop
fusion
Register Allocation
 Processor registers
 A very small amount of very fast computer memory
 Used to speed the execution of computer programs
 Provides quick access to most commonly used values
 Memory hierarchy: register – cache – main memory – disk
 Reduce the number of used registers
 Fit more frequently used variables in registers
 Load once, use many times
Register Lifetime Graph
no. of needed registers = 5
1. w = a + b;
2. x = c + w;
3. y = c + d;
4. z = a - b;
[Figure: lifetime graph with variables a, b, c, d, w, x, y, z on one axis and statements 1 to 4 on the other; a mark means this variable should be loaded to a register.]
After Rescheduling
no. of needed registers = 4
1. w = a + b;
2. z = a - b;
3. x = c + w;
4. y = c + d;
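[Figure: lifetime graph for the rescheduled statements, with variables a, b, c, d, w, x, y, z on one axis and statements 1 to 4 on the other.]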
Cannot change dependencies between instructions!
Power Management
 Hardware support
 CMOS features: voltage drops, toggling, leakage
 Clock gating, supply shutdown, dynamic voltage scaling
 Power management policy
 Dynamic power management
 Power state machine, break-even time $T_{BE}$
 Energy saving calculation based on a known idle time
 Predictive techniques
• Metrics of prediction quality: safety and efficiency
• Fixed timeout vs. predictive shutdown/wakeup
 Power manager
 Advanced Configuration and Power Interface (ACPI)
 Holistic approach
 Memory system, cache behavior
Break-Even Time $T_{BE}$
 $T_{BE}$ of an inactive state is the total time for entering and leaving the state
 Assumption: transition doesn’t cause extra power consumption
 $T_{BE} = T_{TR} = T_{On,Off} + T_{Off,On}$
 Ex. $T_{BE}$ = 160 ms + 90 µs for SLEEP in SA-1100
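[Figure: SA-1100 power state machine. States: run ($P_{run}$ = 400 mW), idle ($P_{idle}$ = 50 mW), sleep ($P_{sleep}$ = 0.16 mW). Transitions: run to idle and idle to run take 10 µs each; run to sleep and idle to sleep take 90 µs each; sleep to run takes 160 ms. Power consumption during transition ≈ $P_{run}$.]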
Energy Saving Calculation
 Given an idle period $T_{idle} > T_{BE}$
 $E_S(T_{idle}) = (T_{idle} - T_{TR})(P_{On} - P_{Off}) + T_{TR}(P_{On} - P_{TR})$
• $P_{On} > P_{TR}$: total = idle saving + transition saving
• $P_{On} < P_{TR}$: total = idle saving - transition cost
 Achievable power saving depends on workload!
 Distribution of idle periods
Operating Systems
 OS: manages multiple, concurrent tasks
 Engine control, sensor motes
 Process
 Co-routines methodology, co-operative multitasking,
preemptive multitasking
 Context switch
 Process states and scheduling
 Inter-process communication
 Shared memory
 Race conditions
 Examples
 TinyOS, POSIX
 Real-Time OS
 Proprietary kernels, real-time extensions to general-purpose OS
Cooperative Multitasking
 Improvement to co-routines:
 hides context switching mechanism;
 still relies on processes to voluntarily give up CPU.
 Each process allows a context switch at cswitch() call.
 Separate scheduler chooses which process runs next.
Process 1 (Student A):
if (x > 2)
    sub1(y);
else
    sub2(y, 2);
cswitch();
proca(a, b, c);
Scheduler (TA):
save_state(current);
p = choose_process();
load_and_go(p);
Process 2 (Student B):
proc_data(r, s, t);
cswitch();
if (val1 == 3)
    abc(val2);
rst(val3);
Preemptive Multitasking
 No more voluntary release of CPU
 Operating System (OS) is now in charge
 Most powerful form of multitasking:
 OS controls when context switches occur;
 OS determines what process runs next.
 Use periodic timer interrupts to call OS to switch contexts
[Figure: flow of control with preemption. Timer interrupts transfer the CPU from the running process (P1, P2) to the OS, which chooses the process to run next.]
Process States
 A process can be in one of three states:
 executing on the CPU;
 ready to run;
 waiting for data.
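[Figure: process state diagram with states executing, ready, and waiting, and a Scheduler. Transition labels: gets CPU (ready to executing), preempted (executing to ready), needs data (to waiting), gets data (waiting to ready), gets data and CPU (waiting to executing).]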
Shared Memory and Problems
 Process 1 and 2 take turns executing on the CPU
 Problem when two processes try to write the same shared memory location: race condition
 process 1 reads flag and sees 0.
 process 2 reads flag and sees 0.
 process 1 sets flag to one and writes location.
 process 2 sets flag to one and overwrites the same location.
process 1 (var = 5):
if (flag == 0)
    /* preempted */
    flag = 1; loc = var;
    /* preempted */
    print(loc);
process 2 (var = 2):
if (flag == 0)
    flag = 1; loc = var;
[Figure: both processes access flag and loc in shared memory.]
Race Conditions

 Conditions for race conditions to happen
 Concurrent processes/tasks access shared variables.
 Preemption/interruption at a “wrong” time.
 Atomic section: section of code that cannot be interrupted by another process.
 Critical section: section of code that must not be concurrently accessed by more than one thread of execution.
 Mutual exclusion
 Prevent race conditions
 Atomic section
 Semaphores
Real-time Scheduling
 Terminologies and timing parameters
 Task, job, subtask
 Metrics to evaluate scheduling algorithms
 Schedulability, overhead
 Optimal scheduling algorithms
 When relative deadline = period: RMS, EDF, utilization bound
 When relative deadline < period: EDF, processor demand analysis
 CPU utilization analysis and bound
 Priority inversion
 Sources, unbounded priority inversion, priority inheritance
 End-to-end scheduling framework
 Task allocation: bin packing
 Synchronization protocol: greedy protocol, release guard
 Subdeadline assignment: ultimate deadline, proportional deadline
RMS Meeting the Deadline
 T1 = (10,20), T2 = (10,30), utilization is 83%
[Figure: RMS timeline of tasks 1 and 2 with jobs T1_1, T1_2, T2_1, T2_2. Job 1 of T2 meets its deadline.]
EDF Meeting a Deadline
 T1 = (10,20), T2 = (15,30), utilization is 100%
[Figure: EDF timeline of tasks 1 and 2 with jobs T1_1, T1_2, T2_1, T2_2. T2 takes priority because its deadline is sooner.]
Priority Inversion
[Figure: timeline from 0 to 22 of tasks T1 and T4, with T4's critical section marked. T4 starts to run; T4 acquires a semaphore; T4 is preempted by T1; T1 tries to get the same semaphore; T1 is blocked.]
Unbounded Priority Inversion
[Figure: timeline from 0 to 22 of tasks T1, T2, T3, and T4, with T4's critical section marked. T1 tries to get the semaphore and is blocked by T4, T2, and T3.]
Solution: Priority Inheritance
 Let the low-priority task inherit the priority of the
blocked high-priority task.
[Figure: timeline from 0 to 22 of tasks T1, T2, T3, and T4, with T4's critical section marked. T1 tries to get the semaphore, so T4 inherits T1’s priority; T4 returns to priority 4 after the critical section; T1 is only blocked by T4.]
Multi-Processor Systems
 Tight coupling among processors.
 Communicate through shared memory and on-board
bus.
 Scheduled by a common scheduler/OS.
 Global scheduling
 Partitioned scheduling
 States of all processors available to each other.
End-to-End Task Model
 An (end-to-end) task is composed of
multiple subtasks running on multiple
processors
 Message/event
 Remote method invocation
 Subtasks are subject to precedence
constraints
 Task = a chain/tree/graph of subtasks
 E.g. ship navigation
Sonar → Signal processing → Obstacle detection → Navigation
End-to-End Scheduling Framework
1. Task allocation
2. Synchronization protocol
3. Subdeadline assignment
4. Schedulability analysis
Greedy Protocol
 After a subtask is finished, the next subtask starts
immediately
 Release job $J_{i,j;k}$ as soon as $J_{i,j-1;k}$ is completed
 Subsequent subtasks may not be periodic under a
greedy protocol
 Difficult for schedulability analysis
 High-priority tasks arrive early → high worst-case response time for lower-priority tasks
[Figure: Sonar → Signal processing → Obstacle detection → Navigation]
Greedy Protocol
Illustrated
 Task: (C,P)
 On P1: T1 (2,4), T2,1 (2,6)
 On P2: T2,2 (2,6), T3 (4,7)
[Figure: timelines from 0 to 12 of T1 and T2,1 on P1 and of T2,2 and T3 on P2 under the greedy protocol, marking where T3 starts and T3’s deadline. T3 misses its deadline.]
Release Guard
 After a subtask is finished, the next subtask may wait for a
while before release
 Every subtask (if not a first subtask) has a release guard, which
 waits for the preceding subtask for a result/event
 then releases the job
• exactly one period after the last release time (Rule 1)
OR
• whenever the processor becomes idle (Rule 2)
 Release guard strategy improves worst-case response time without affecting schedulability
Release Guard
Illustrated
 Task: (C,P)
 On P1: T1 (2,4), T2,1 (2,6)
 On P2: T2,2 (2,6), T3 (4,7)
[Figure: timelines from 0 to 12 of T1 and T2,1 on P1 and of T2,2 and T3 on P2 under the release guard protocol, marking where T3 starts and T3’s deadline. Next release = 4+6=10, when the release guard releases the job of T2,2. T3 meets its deadline.]