Announcements and Reminders Next Monday (10/26)’s class is canceled due to travel Lab 3 is due next Wednesday 10/28 at 5:30pm Return your TelosB motes to the TA if you don’t need them for your project ECE 455/555 Embedded System Design 1 ECE 455/555 Embedded System Design Program Optimization - II Wei Gao Fall 2015 2 Program Optimization for Embedded Systems 1. Optimizing for execution time. 2. Optimizing for energy/power. 3. Optimizing for program size. Those goals may conflict with each other! ECE 455/555 Embedded System Design 3 Execution Time Analysis Real-time embedded systems must meet deadlines Predictability is the key Inaccuracy: cache, pipeline, various optimizations Execution time analysis Average-case • For typical data values, whatever they are. • Good for soft real-time systems Worst-case • For any possible input set • Hard real-time Techniques for improving execution time? ECE 455/555 Embedded System Design 4 Execution Time Affected by program path and instruction timing Program path depends on input data Sensor readings, User input Instruction timing depends on Cache behavior: memory access is slower Instruction level variations • Floating point vs. integer operations ECE 455/555 Embedded System Design 5 Program Path for (i=0, f=0; i<N; i++) f = f + c[i]*x[i]; i=0; f=0; Loop initiation block executed once. Loop test executed N+1 times. Loop body and variable update executed N times. i<N N Y f = f + c[i]*x[i]; i = i+1; Find the longest path length for execution time analysis. ECE 455/555 Embedded System Design 6 Measurement-Driven Analysis CPU simulator. I/O may be hard. May not be totally accurate. Time stamping Requires instrumented program. Timer granularity • Gettimeofday on UNIX/Linux: 10 ms • Gethrtime on Pentium: read a 64 bit clock cycle counter. and return the number of clock cycles since the CPU was powered up or reset: nanoseconds resolution. ECE 455/555 Embedded System Design 7 Measurement-Driven Analysis A program trace is a record of the execution path of a program. Measured during program execution Trace is used to Determine the total execution time Study cache behavior and power management policies. Analyze the program performance on different CPUs A useful trace: requires proper input values; is large (gigabytes). ECE 455/555 Embedded System Design 8 Trace Generation Hardware capture Logic analyzer • Limited buffer space • Cannot observe on-chip cache Hardware assist in CPU • Pentium supports automatic tracing of branches Software PC sampling • OS periodically interrupt program and save away CPU values Instrumentation instructions • Add additional instructions to write info to trace buffer Simulation ECE 455/555 Embedded System Design 9 2. Optimizing for Energy/Power Important for battery-powered systems and for system reliability and cost Energy: ability to do work. Most important in battery-powered systems. Power: energy per unit time. Important even in wall-plug systems---power becomes heat. We have two classes on power management ECE 455/555 Embedded System Design 10 Measuring Energy Consumption Measure the power consumption for an instruction or a small code segment 1. Executes the code under test over and over in a loop 2. Measure the current flowing to the CPU 3. Delete the test code from the loop 4. Measure the current flowing to the CPU again 5. Calculate the difference Ammeter + Power supply _ ECE 455/555 Embedded System Design current while (TRUE) { test_code(); other_code(); } CPU 11 Sources of Energy Consumption Relative energy per operation Relative energy of CPU per operation (Catthoor et al): 33 30 20 9 10 1 3.6 10 4.4 0 16-bit add 16-bit 8x128x16 8x128x16 External multiply SRAM SRAM I/O read write access 16-bit memory transfer Memory transfer is the most expensive operation Biggest energy optimization comes from properly organizing memory Energy consumption: memory > caches > registers ECE 455/555 Embedded System Design 12 Optimizing Energy Consumption Design algorithms in a clever way to consume less power Moderate loop unrolling eliminates some loop overhead instructions. Eliminate pipeline stalls. Optimize memory access Use registers efficiently. Identify and eliminate cache conflicts. Turn off parts of the system Run parts in low power states ECE 455/555 Embedded System Design 13 3. Optimizing for Program Size Benefits Reduce hardware cost; Reduce power consumption. Size is determined by data and instructions Two opportunities: Data; Instructions. ECE 455/555 Embedded System Design 14 Reduce Data Size Reuse constants, variables, data buffers in different parts of code. E.x., pack multiple flags in one byte • int flag1=1; flag2=0; flag3=1 one-byte flag: 101 Requires careful verification of correctness. Generate data using instructions. Instead of using static data with initial values ECE 455/555 Embedded System Design 15 Reduce Code Size Avoid loop unrolling. Reduces loop overhead but increase code size Inlining? Size of function Number of calls Choose CPU with compact instructions. Ex. DSPs (CISC) Some CPUs support dense instruction set ARM Thumb, MIPS-16 ECE 455/555 Embedded System Design 16 Code Compression Compressed code is stored in and fetched from main memory Decompress on-the-fly only on cache misses. Table generated from compression is used for decompression 0101101 main memory 0101101 decompressor Compressed code table LDR r0,[r4] cache CPU ECE 455/555 Embedded System Design 17 Summary Optimization for embedded systems Optimizing for execution time • Execution time analysis: Program path, instruction timing • Execution time metrics: Average-case, worst-case • Execution time measurement: trace analysis Optimizing for energy/power • Measurement, sources of energy consumption, cache Optimizing for program size • Reduce data size and code size ECE 455/555 Embedded System Design 18 Reading Textbook: 5.6-5.9 ECE 455/555 Embedded System Design 19