Announcements and Reminders Next ’s class is canceled due to travel

advertisement
Announcements and Reminders
 Next Monday (10/26)’s class is canceled due to travel
 Lab 3 is due next Wednesday 10/28 at 5:30pm
 Return your TelosB motes to the TA if you don’t need them
for your project
ECE 455/555 Embedded System Design
1
ECE 455/555
Embedded System Design
Program Optimization - II
Wei Gao
Fall 2015
2
Program Optimization for Embedded
Systems
 1. Optimizing for execution time.
 2. Optimizing for energy/power.
 3. Optimizing for program size.
 Those goals may conflict with each other!
ECE 455/555 Embedded System Design
3
Execution Time Analysis
 Real-time embedded systems must meet deadlines
 Predictability is the key
 Inaccuracy: cache, pipeline, various optimizations
 Execution time analysis
 Average-case
• For typical data values, whatever they are.
• Good for soft real-time systems
 Worst-case
• For any possible input set
• Hard real-time
 Techniques for improving execution time?
ECE 455/555 Embedded System Design
4
Execution Time
 Affected by program path and instruction timing
 Program path depends on input data
 Sensor readings, User input
 Instruction timing depends on
 Cache behavior: memory access is slower
 Instruction level variations
• Floating point vs. integer operations
ECE 455/555 Embedded System Design
5
Program Path
for (i=0, f=0; i<N; i++)
f = f + c[i]*x[i];
i=0; f=0;
 Loop initiation block executed
once.
 Loop test executed N+1 times.
 Loop body and variable update
executed N times.
i<N
N
Y
f = f + c[i]*x[i];
i = i+1;
 Find the longest path length
for execution time analysis.
ECE 455/555 Embedded System Design
6
Measurement-Driven Analysis
 CPU simulator.
 I/O may be hard.
 May not be totally accurate.
 Time stamping
 Requires instrumented program.
 Timer granularity
• Gettimeofday on UNIX/Linux: 10 ms
• Gethrtime on Pentium: read a 64 bit clock cycle counter. and
return the number of clock cycles since the CPU was powered up
or reset: nanoseconds resolution.
ECE 455/555 Embedded System Design
7
Measurement-Driven Analysis
 A program trace is a record of the execution path of a
program.
 Measured during program execution
 Trace is used to
 Determine the total execution time
 Study cache behavior and power management policies.
 Analyze the program performance on different CPUs
 A useful trace:
 requires proper input values;
 is large (gigabytes).
ECE 455/555 Embedded System Design
8
Trace Generation
 Hardware capture
 Logic analyzer
• Limited buffer space
• Cannot observe on-chip cache
 Hardware assist in CPU
• Pentium supports automatic tracing of branches
 Software
 PC sampling
• OS periodically interrupt program and save away CPU values
 Instrumentation instructions
• Add additional instructions to write info to trace buffer
 Simulation
ECE 455/555 Embedded System Design
9
2. Optimizing for Energy/Power
 Important for battery-powered systems and for
system reliability and cost
 Energy: ability to do work.
 Most important in battery-powered systems.
 Power: energy per unit time.
 Important even in wall-plug systems---power becomes
heat.
 We have two classes on power management
ECE 455/555 Embedded System Design
10
Measuring Energy Consumption
 Measure the power consumption for an instruction
or a small code segment
1. Executes the code under
test over and over in a loop
2. Measure the current flowing
to the CPU
3. Delete the test code from
the loop
4. Measure the current flowing
to the CPU again
5. Calculate the difference
Ammeter
+
Power
supply
_
ECE 455/555 Embedded System Design
current
while (TRUE) {
test_code();
other_code();
}
CPU
11
Sources of Energy Consumption
Relative energy per operation
 Relative energy of CPU per operation (Catthoor et al):
33
30
20
9
10
1
3.6
10
4.4
0
16-bit
add
16-bit 8x128x16 8x128x16 External
multiply
SRAM
SRAM
I/O
read
write
access
16-bit
memory
transfer
 Memory transfer is the most expensive operation
 Biggest energy optimization comes from properly organizing memory
 Energy consumption: memory > caches > registers
ECE 455/555 Embedded System Design
12
Optimizing Energy Consumption
 Design algorithms in a clever way to consume less
power
 Moderate loop unrolling eliminates some loop overhead
instructions.
 Eliminate pipeline stalls.
 Optimize memory access
 Use registers efficiently.
 Identify and eliminate cache conflicts.
 Turn off parts of the system
 Run parts in low power states
ECE 455/555 Embedded System Design
13
3. Optimizing for Program Size
 Benefits
 Reduce hardware cost;
 Reduce power consumption.
 Size is determined by data and instructions
 Two opportunities:
 Data;
 Instructions.
ECE 455/555 Embedded System Design
14
Reduce Data Size
 Reuse constants, variables, data buffers in different
parts of code.
 E.x., pack multiple flags in one byte
• int flag1=1; flag2=0; flag3=1  one-byte flag: 101
 Requires careful verification of correctness.
 Generate data using instructions.
 Instead of using static data with initial values
ECE 455/555 Embedded System Design
15
Reduce Code Size
 Avoid loop unrolling.
 Reduces loop overhead but increase code size
 Inlining?
 Size of function
 Number of calls
 Choose CPU with compact instructions.
 Ex. DSPs (CISC)
 Some CPUs support dense instruction set
 ARM Thumb, MIPS-16
ECE 455/555 Embedded System Design
16
Code Compression
 Compressed code is stored in and fetched from main
memory
 Decompress on-the-fly only on cache misses.
 Table generated from compression is used for
decompression
0101101
main
memory
0101101
decompressor
Compressed code
table
LDR r0,[r4]
cache
CPU
ECE 455/555 Embedded System Design
17
Summary
 Optimization for embedded systems
 Optimizing for execution time
• Execution time analysis: Program path, instruction timing
• Execution time metrics: Average-case, worst-case
• Execution time measurement: trace analysis
 Optimizing for energy/power
• Measurement, sources of energy consumption, cache
 Optimizing for program size
• Reduce data size and code size
ECE 455/555 Embedded System Design
18
Reading
 Textbook: 5.6-5.9
ECE 455/555 Embedded System Design
19
Download