ECE 455/555 Embedded System Design
Final Review
Wei Gao
Fall 2015

Final Exam
  • When: 12/8, 2:45pm-4:45pm
  • Where: Min Kao 406
  • 20% of your final grade
  • What about: everything covered in this course
  • Closed-book, closed-notes, no laptop, no discussion
  • All included in class slides (as well as textbook, assigned papers)
  • Will include 1 programming question
  • Make your answers short to include only key points
    • Do answer every question, DON’T leave blank
  • Final project report due on 12/8 before final exam
    • 15% of your final grade
    • Remember to submit your SAIS review to receive 5% bonus credit for your final project report!

First half of the semester
  • Introduction to embedded systems
    • Real-time, low power, small memory footprint, low cost
    • Soft/hard real-time system
  • Design methodology
    • Microprocessors, FPGA and ASIC
    • Design procedure and example: GPS
  • Microprocessors
    • von Neumann vs. Harvard, RISC vs. CISC, SHARC, ARM7
  • CPUs, I/O, interrupt
    • Busy-wait I/O, interrupt-based I/O, interrupt mechanism
  • Caches and memory
    • Memory system, average memory access time in multi-level cache
    • Cache organization: direct-mapped, N-way set-associative
  • Embedded computing platforms
    • I/O devices, hardware/software architecture, state machine, testing

Definition
  • Embedded system: any device that includes a computer but is not itself a general-purpose computer.
  • System characteristics
    • Non-functional requirements: real-time, low power, small memory footprint, low cost
    • Hard vs. soft real-time

Alternative Technology
  • Application-Specific Integrated Circuits (ASICs)
  • Microprocessors
  • Field-Programmable Gate Arrays (FPGAs)
  • Why should we use microprocessors?
    • Reprogrammability and low development cost outweigh low performance/watt

Microprocessors
  • Performance
    • Con: Programmable architecture is fundamentally slow!
      • Fetch, decode instructions
    • Pro: Highly optimized architecture and manufacturing
      • Pipelines; cache; clock frequency; circuit density; manufacturing technology
  • Power
    • Processors perform poorly in terms of performance/watt!
    • Power management can alleviate the power problem.
  • Flexibility, development cost and time
    • Let software do the work!

Design Methodologies
[Figure: design stages in order: requirements, specification, architecture, component design, system integration. Top-down design moves from requirements toward system integration; bottom-up design moves in the opposite direction.]

Microprocessors
  • von Neumann
    • Same memory holds data and instructions.
    • A single set of address/data buses between CPU and memory
  • Harvard
    • Separate memories for data and instructions.
    • Two sets of address/data buses between CPU and memory
  • RISC vs. CISC
    • CISC: many addressing modes and instructions; high code density.
    • RISC: compact, uniform instructions; facilitates pipelining, but poor memory footprint

Busy-Wait I/O Programming
  • Simplest way to program I/O devices.
    • Devices are usually slower than the CPU and require more cycles
    • CPU has to wait for the device to finish one operation before starting the next one
    • Use peek instruction to test when device is finished
    • Test-and-set

    // send a string to device using busy-wait handshaking
    current_char = mystring;
    while (*current_char != '\0') {
        // send character to device (data register)
        poke(OUT_CHAR, *current_char);
        // wait for device to finish by checking its status
        while (peek(OUT_STATUS) != 0);
        // advance character pointer to next one
        current_char++;
    }

Interrupt-based I/O
  • Busy-wait is very inefficient.
    • CPU can’t do other work while testing device.
    • Hard to do simultaneous I/O.
  • Interrupts allow a device to change the flow of control in the CPU.
    • Call interrupt handler (i.e. device driver) to handle device.
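For contrast with the busy-wait loop, here is a minimal sketch of interrupt-driven output. It assumes a simulated device: `out_char_reg`, `chars_sent`, `start_send`, `send_done`, and `device_ready_isr` are hypothetical names, and on real hardware the handler would be installed in the interrupt vector table rather than called directly.

```c
#include <stddef.h>

/* Simulated device data register (stands in for OUT_CHAR). */
static char out_char_reg;
/* Number of characters the simulated device has accepted so far. */
static size_t chars_sent;
/* Next character to send; NULL means no transfer is in progress. */
static const char *next_char;

/* Interrupt handler: invoked each time the device signals "ready".
   It writes at most one character and returns, so the CPU is free
   to do other work between interrupts. */
void device_ready_isr(void) {
    if (next_char == NULL || *next_char == '\0') {
        next_char = NULL;            /* string finished: go idle */
        return;
    }
    out_char_reg = *next_char++;     /* write the data register */
    chars_sent++;
}

/* Foreground code: start a transfer and return immediately. */
void start_send(const char *s) {
    next_char = s;
    device_ready_isr();              /* prime the device with the first character */
}

/* Nonzero once the whole string has been handed to the device. */
int send_done(void) {
    return next_char == NULL;
}
```

Unlike the busy-wait version, `start_send` returns after the first character; the remaining characters go out one per interrupt while the foreground code continues.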
[Figure: interrupt mechanism. The CPU (PC, IR) and the device (status register, data register, interrupt mechanism) are connected by interrupt request, interrupt ack, and data/address lines.]

Microprocessor Bus
  • A bus is a set of wires and a protocol for the CPU to communicate with memory and devices
  • Five major components to support reads and writes: Clock, R/W’, Address, Data ready’, Data
[Figure: CPU, Memory, Device 1, and Device 2 attached to the bus signals listed above.]

Typical Bus Access
  • Timing diagram syntax: constant value (0/1), stable, changing, unknown, tri-state.
[Figure: bus timing diagram over time showing Clock, R/W’, Address enable, Address, Data ready’, and Data for a read followed by a write.]

Memory System and Caches
  • Memory is slower than the CPU
    • CPU clock rates increase faster than memory speeds
  • Caches are used to speed up memory
    • A cache is a small but fast memory that holds copies of the contents of main memory
    • More expensive than main memory, but faster
  • Memory Management Units (MMU)
    • Physical memory not large enough for all applications?
    • Provide a larger virtual memory than physical memory

Memory Devices
  • Types of memory devices
    • RAM (Random-Access Memory)
      • Addresses can be read in any order, unlike magnetic disk/tape
      • Usually used for data storage
      • DRAM vs. SRAM.
    • ROM (Read-Only Memory)
      • Usually used for program storage
      • Mask-programmed vs. field-programmable.

TinyOS System
  • Support concurrency: event-driven architecture
  • Modularity: application = scheduler + graph of components
    • Compiled into one executable
  • Efficiency: get done quickly and sleep
    • Event/command = function calls
    • Fewer context switches: FIFO/non-preemptable scheduling
  • No kernel/application boundary: completely open-source
[Figure: TinyOS layers: Main (includes Scheduler); Application (User Components); Actuating, Sensing, Communication; Communication; Hardware Abstractions.]
Modified from D. Culler et
al., TinyOS boot camp presentation, Feb 2001

TinyOS Programming Model: nesC
  • Component model
    • An application consists of wired components
    • Application = graph of components
    • Components are wired through interfaces
    • Wiring specified by configurations
    • Configuration can be hierarchical
[Figure: an application built from Components A through F, grouped into nested configurations.]

TinyOS Programming Model: nesC
  • Interface: events vs. commands
    • A command needs to be implemented by components providing the interface
    • An event needs to be handled by components using the interface

    interface Receive {
        event message_t * receive(message_t * msg, void * payload, uint8_t len);
        command void * getPayload(message_t * msg, uint8_t * len);
        command uint8_t payloadLength(message_t * msg);
    }

Second Half of the semester
  • Program optimizations
  • Power management
  • Operating systems
  • Real-time scheduling

Basic Compilation Optimization
  • Expression simplification
  • Dead code elimination
  • Function inlining
  • Loop optimizations
  • Array conflicts in cache
  • Register allocation

Function inlining
    int foo(int a, int b, int c) { return a + b - c; }
    z = foo(w, x, y);
  becomes, after inlining:
    z = w + x - y;
  • An inline function’s body is inserted directly (like a substitution) in the compiled code at the point where the function is called.
  • Improves performance by reducing function call overhead
  • “inline” in different cases
  • TinyOS does whole-program inlining

Loop Optimizations
  • Loops are good targets for optimization.
  • Basic loop optimizations:
    • Code motion
    • Reduce loop overhead: loop unrolling
    • Increase opportunities for pipelining and parallelism: loop fusion

Register Allocation
  • Processor registers
    • A very small amount of very fast computer memory
    • Used to speed the execution of computer programs
    • Provide quick access to the most commonly used values
    • Memory hierarchy: register – cache – main memory – disk
  • Reduce the number of used registers
    • Fit more frequently used variables in registers
    • Load once, use many times

Register Lifetime Graph
  • No. of needed registers = 5
    1. w = a + b;
    2. x = c + w;
    3. y = c + d;
    4. z = a - b;
[Figure: lifetime graph of variables a, b, c, d, w, x, y, z across statements 1 to 4; a mark means the variable should be loaded into a register.]

After Rescheduling
  • No. of needed registers = 4
    1. w = a + b;
    2. z = a - b;
    3. x = c + w;
    4. y = c + d;
[Figure: lifetime graph of variables a, b, c, d, w, x, y, z across the reordered statements 1 to 4.]
  • Cannot change dependencies between instructions!

Power Management
  • Hardware support
    • CMOS features: voltage drops, toggling, leakage
    • Clock gating, supply shutdown, dynamic voltage scaling
  • Power management policy
    • Dynamic power management
      • Power state machine, break-even time $T_{BE}$
      • Energy saving calculation based on a known idle time
    • Predictive techniques
      • Metrics of prediction quality: safety and efficiency
      • Fixed timeout vs. predictive shutdown/wakeup
  • Power manager
    • Advanced Configuration and Power Interface (ACPI)
  • Holistic approach
    • Memory system, cache behavior

Break-Even Time $T_{BE}$
  • $T_{BE}$ of an inactive state is the total time for entering and leaving the state
    • Assumption: transition doesn’t cause extra power consumption
    • $T_{BE} = T_{TR} = T_{On,Off} + T_{Off,On}$
  • Ex.:
    $T_{BE}$ = 160 ms + 90 µs for SLEEP in SA-1100
[Figure: SA-1100 power state machine. States: run ($P_{run}$ = 400 mW), idle ($P_{idle}$ = 50 mW), sleep ($P_{sleep}$ = 0.16 mW). Transition times: 10 µs between run and idle, 90 µs into sleep, 160 ms from sleep back to run. Power consumption during transition ≈ $P_{run}$.]

Energy Saving Calculation
  • Given an idle period $T_{idle} > T_{BE}$:
    $E_S(T_{idle}) = (T_{idle} - T_{TR})(P_{On} - P_{Off}) + T_{TR}(P_{On} - P_{TR})$
    • $P_{On} > P_{TR}$: total = idle saving + transition saving
    • $P_{On} < P_{TR}$: total = idle saving - transition cost
  • Achievable power saving depends on workload!
    • Distribution of idle periods

Operating Systems
  • OS: manages multiple, concurrent tasks
    • Engine control, sensor motes
  • Process
    • Co-routines methodology, co-operative multitasking, preemptive multitasking
    • Context switch
    • Process states and scheduling
  • Inter-process communication
    • Shared memory
    • Race conditions
  • Examples
    • TinyOS, POSIX
    • Real-Time OS
      • Proprietary kernels, real-time extensions to general-purpose OS

Cooperative Multitasking
  • Improvement to co-routines: hides context switching mechanism; still relies on processes to voluntarily give up the CPU.
  • Each process allows a context switch at a cswitch() call.
  • A separate scheduler chooses which process runs next.

    Process 1 (Student A):
        if (x > 2)
            sub1(y);
        else
            sub2(y, 2);
        cswitch();
        proca(a, b, c);

    Process 2 (Student B):
        proc_data(r, s, t);
        cswitch();
        if (val1 == 3)
            abc(val2);
        rst(val3);

    Scheduler (TA):
        save_state(current);
        p = choose_process();
        load_and_go(p);

Preemptive Multitasking
  • No more voluntary release of the CPU
    • The operating system (OS) is now in charge
  • Most powerful form of multitasking:
    • OS controls when context switches occur;
    • OS determines what process runs next.
  • Use periodic timer interrupts to call the OS to switch contexts
[Figure: flow of control with preemption over time. A timer interrupt moves the CPU from P1 to the OS, which resumes P1; a later interrupt moves it to the OS again, which switches to P2.]

Process States
  • A process can be in one of three states:
    • executing on the CPU;
    • ready to run;
    • waiting for data.
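The three process states can be written down as a small state machine. This is a minimal sketch, not part of the slides: the event names follow the arc labels of the process state diagram (gets CPU, preempted, needs data, gets data, gets data and CPU), the type and function names are hypothetical, and the assignment of each label to a pair of states is an assumption reconstructed from the diagram.

```c
/* The three process states from the slide. */
typedef enum { READY, EXECUTING, WAITING } proc_state;

/* Events, named after the arc labels of the state diagram. */
typedef enum {
    GETS_CPU,
    PREEMPTED,
    NEEDS_DATA,
    GETS_DATA,
    GETS_DATA_AND_CPU
} proc_event;

/* Return the state reached from `s` on event `e`.
   Events that do not apply in the current state leave it unchanged. */
proc_state next_state(proc_state s, proc_event e) {
    switch (s) {
    case READY:
        if (e == GETS_CPU) return EXECUTING;          /* scheduler picks it */
        break;
    case EXECUTING:
        if (e == PREEMPTED) return READY;             /* loses the CPU */
        if (e == NEEDS_DATA) return WAITING;          /* blocks on data */
        break;
    case WAITING:
        if (e == GETS_DATA) return READY;             /* data arrived, CPU busy */
        if (e == GETS_DATA_AND_CPU) return EXECUTING; /* data arrived, CPU free */
        break;
    }
    return s;
}
```

For example, a process that is preempted and later rescheduled goes EXECUTING → READY → EXECUTING without ever passing through WAITING.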
[Figure: process state diagram. ready → executing: gets CPU (from the scheduler); executing → ready: preempted; executing → waiting: needs data; waiting → ready: gets data; waiting → executing: gets data and CPU.]

Shared Memory and Problems
  • Processes 1 and 2 take turns executing on the CPU
  • Problem when two processes try to write the same shared memory location: race condition
    • process 1 reads flag and sees 0.
    • process 2 reads flag and sees 0.
    • process 1 sets flag to one and writes location.
    • process 2 sets flag to one and overwrites the same location.

    process 1 (var = 5):
        if (flag == 0) {
            /* preempted */
            flag = 1;
            loc = var;
            /* preempted */
            print(loc);
        }

    process 2 (var = 2):
        if (flag == 0) {
            flag = 1;
            loc = var;
        }

Race Conditions
  • Conditions for race conditions to happen
    • Concurrent processes/tasks access shared variables.
    • Preemption/interruption at a “wrong” time.
  • Atomic section: section of code that cannot be interrupted by another process.
  • Critical section: section of code that must not be concurrently accessed by more than one thread of execution.
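One way to remove the race in the `flag` example is to make the test and the set of `flag` a single atomic step, so a preemption can never fall between them. The sketch below uses the C11 `atomic_flag` type as a stand-in for the atomic sections and semaphores covered in the course; `claim_and_write` and `release_loc` are hypothetical names, not from the slides.

```c
#include <stdatomic.h>

/* Clear flag = shared location is free; set flag = already claimed. */
static atomic_flag flag = ATOMIC_FLAG_INIT;
static int loc;   /* the shared memory location */

/* Try to claim the location and write `var` into it.
   atomic_flag_test_and_set reads the old value and sets the flag in
   one indivisible operation, so only one caller can see "was clear".
   Returns 1 if this caller won and wrote, 0 otherwise. */
int claim_and_write(int var) {
    if (!atomic_flag_test_and_set(&flag)) {
        loc = var;
        return 1;
    }
    return 0;
}

/* Give the location back so it can be claimed again. */
void release_loc(void) {
    atomic_flag_clear(&flag);
}
```

With this version, if process 1 claims the location with `var = 5`, a later attempt by process 2 with `var = 2` is refused until `release_loc()` is called, so `loc` is not silently overwritten.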
  • Mutual exclusion
    • Prevents race conditions
    • Atomic sections, semaphores

Real-time Scheduling
  • Terminologies and timing parameters
    • Task, job, subtask
  • Metrics to evaluate scheduling algorithms
    • Schedulability, overhead
  • Optimal scheduling algorithms
    • When relative deadline = period: RMS, EDF, utilization bound
    • When relative deadline < period: EDF, processor demand analysis
    • CPU utilization analysis and bound
  • Priority inversion
    • Sources, unbounded priority inversion, priority inheritance
  • End-to-end scheduling framework
    • Task allocation: bin packing
    • Synchronization protocol: greedy protocol, release guard
    • Subdeadline assignment: ultimate deadline, proportional deadline

RMS Meeting the Deadline
  • T1 = (10,20), T2 = (10,30), utilization is 83%
[Figure: RMS schedule of jobs T1_1, T1_2, T2_1, T2_2.]
  • Job 1 of T2 meets its deadline

EDF Meeting a Deadline
  • T1 = (10,20), T2 = (15,30), utilization is 100%
[Figure: EDF schedule of jobs T1_1, T1_2, T2_1, T2_2.]
  • T2 takes priority because its deadline is sooner

Priority Inversion
[Figure: timeline from 0 to 22 for tasks T1 and T4. T4 starts to run; T4 acquires a semaphore; T4 is preempted by T1; T1 tries to get the same semaphore; T1 is blocked while T4 is in its critical section.]

Unbounded Priority Inversion
[Figure: timeline from 0 to 22 for tasks T1 to T4. T1 tries to get the semaphore held by T4 in its critical section; T1 is blocked by T4, T2, and T3!]

Solution: Priority Inheritance
  • Let the low-priority task inherit the priority of the blocked high-priority task.
[Figure: timeline from 0 to 22 for tasks T1 to T4. T1 tries to get the semaphore, so T4 inherits T1’s priority; T1 is blocked only by T4; T4 returns to priority 4 after the critical section.]

Multi-Processor Systems
  • Tight coupling among processors.
    • Communicate through shared memory and on-board bus.
    • Scheduled by a common scheduler/OS.
      • Global scheduling
      • Partitioned scheduling
    • States of all processors available to each other.

End-to-End Task Model
  • An (end-to-end) task is composed of multiple subtasks running on multiple processors
    • Message/event
    • Remote method invocation
  • Subtasks are subject to precedence constraints
    • Task = a chain/tree/graph of subtasks
  • E.g. ship navigation: Sonar → Signal processing → Obstacle detection → Navigation

End-to-End Scheduling Framework
  1. Task allocation
  2. Synchronization protocol
  3. Subdeadline assignment
  4. Schedulability analysis

Greedy Protocol
  • After a subtask is finished, the next subtask starts immediately
    • Release job $J_{i,j;k}$ as soon as $J_{i,j-1;k}$ is completed
  • Subsequent subtasks may not be periodic under a greedy protocol
    • Difficult for schedulability analysis
    • High-priority tasks arrive early → high worst-case response time for lower-priority tasks
[Figure: subtask chain Sonar → Signal processing → Obstacle detection → Navigation.]

Greedy Protocol Illustrated
  • Tasks (C,P): T1 (2,4), T2,1 (2,6), T2,2 (2,6), T3 (4,7); T2,1 runs on P1, T2,2 on P2
[Figure: greedy-protocol schedules of T1, T2,1, T2,2, and T3 on P1 and P2, time axis 2 to 12, with T3’s start time and deadline marked.]
  • T3 misses deadline

Release Guard
  • After a subtask is finished, the next subtask may wait for a while before release
  • Every subtask (if not a first subtask) has a release guard, which waits for a result/event from the preceding subtask and then releases the job
    • exactly one period after the last release time (Rule 1), OR
    • whenever the processor becomes idle (Rule 2)
  • Release guard strategy improves worst-case response time without affecting schedulability

Release Guard Illustrated
  • Tasks (C,P): T1 (2,4), T2,1 (2,6), T2,2 (2,6), T3 (4,7); T2,1 runs on P1, T2,2 on P2
  • Next release = 4 + 6 = 10
  • Release guard
    releases the job
[Figure: release-guard schedules of T2,2 and T3, time axis 2 to 12, with T3’s start time and deadline marked.]
  • T3 meets deadline
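The two release guard rules can be sketched in a few lines of C. This is a simplified model under stated assumptions, not the protocol as implemented in any particular RTOS: times are integer ticks, `release_guard`, `rg_release`, and `rg_on_idle` are hypothetical names, and Rule 2 is modeled only as lowering the guard time when the processor goes idle. The values 4 and 4 + 6 = 10 come from the illustration; the arrival time 8 for the second result is an assumed value.

```c
/* Release guard state for one non-first subtask. */
typedef struct {
    int period;  /* period of the subtask */
    int guard;   /* earliest time the next job may be released */
} release_guard;

/* The predecessor's result arrives at time `arrival`.
   The job is released at the later of the arrival and the guard time,
   and the guard then moves one period past that release (Rule 1).
   Returns the release time. */
int rg_release(release_guard *rg, int arrival) {
    int t = arrival > rg->guard ? arrival : rg->guard;
    rg->guard = t + rg->period;
    return t;
}

/* Rule 2: when the processor becomes idle at time `now`,
   the guard may be lowered to `now` so a job can go out early. */
void rg_on_idle(release_guard *rg, int now) {
    if (now < rg->guard)
        rg->guard = now;
}
```

Starting from `release_guard rg = {6, 0}`, a first result at time 4 is released at 4 and sets the guard to 10; a second result at time 8 is then held until 10, which keeps the releases of the subtask one period apart.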