Data-flow Analysis for Interrupt-driven Microcontroller Software
Nathan Cooprider
Advisor: John Regehr

Dissertation defense
School of Computing
University of Utah
Data-flow Analysis for Interrupt-driven Microcontroller Software
• A whole program analysis
• Targeting embedded C programs
• Suitable for use in a compiler
2
Microcontrollers (MCUs)
• 10 billion units / year
• $12.5 billion market in 2006
• Cheap
• Resource constrained
• e.g. Wireless sensor networks
– Mica2 mote
• ATmega 128L (4 MHz 8-bit MCU)
• 128 kB code, 4 kB data SRAM
3
Problem
• Resources are constrained
• Software outlives hardware
– Code reuse leads to bloat
• Low-level code confuses analysis
– Interrupt-driven concurrency
– Device register access
4
Solution
• Traditional data-flow analysis
– Not adequate precision for MCU software
• New techniques to increase precision
– Deal with concurrency
– Track volatile data
• Use in code transformations
– Optimizations
[Callout: thesis statement]
5
Contributions
• Analysis techniques
– Interatomic concurrent data-flow (ICD)
– Tracking data through volatile variables
• Tool – cXprop
• Applications
– Practical memory safety – Safe TinyOS
– Offline RAM Compression
6
TinyOS
• Open-source OS for WSNs
• Written in nesC
– Dialect of C
• Concurrency
– Tasks and interrupts
– No threads
– Atomic sections
[Figure: execution timeline showing main, tasks, and interrupts]
7
[Roadmap figure: cXprop at the center, built from abstract interpretation, conditional X propagation, pointer analysis, ICD, and volatile tracking, and applied to Safe TinyOS and RAM compression]
8
Abstract interpretation
• Abstract domain
– Abstract values
– Form poset
• Subset relation (⊆)
– Lattice
• Undefined (⊤)
• Unknown (⊥)
Example code:
  switch (x) {
  ...
    break;
  case 42: case 7: case -1:
    if (x < 0)          /* x = {42,7,-1} */
      x *= -1;
    x++;
    if (x == 0)
      assert(0);
    break;
  ...
Lattice for the example, top to bottom:
  {} or ⊤
  {42}  {7}  {-1}
  {42,7}  {7,-1}  {42,-1}
  {42,7,-1} or ⊥
9
Abstract interpretation
  switch (x) {
  ...
    break;
  case 42: case 7: case -1:
    if (x < 0)          /* x = {42,7,-1} */
      x *= -1;
    x++;
    if (x == 0)
      assert(0);
    break;
  ...
• Abstract domain
– Abstract values
– Form poset
• Subset relation (⊆)
– Lattice
• Undefined (⊤)
• Unknown (⊥)
• Data-flow analysis
– Transfer functions
– Merging (⊔)
– Fixed point
10
Abstract interpretation
[Figure: abstract value of x on each edge of the example's flow graph; every edge starts at ⊤ and is raised by the analysis]
• Entry to the case: x = {42,7,-1}
• x < 0, true edge: x = {-1}; false edge: x = {42,7}
• After x *= -1: x = {1}
• After merging (⊔) both edges: x = {42,7,1}
• After x++: x = {43,8,2}
• x == 0, false edge: x = {43,8,2}; true edge stays ⊤
• assert(0): stays ⊤
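The walk through the example can be reproduced with a small value-set domain. The following is only a sketch, not cXprop's implementation: the type `vset`, the capacity `VS_MAX`, and every function name are assumptions made for illustration. Following the slides' convention, the empty set plays the role of "undefined" (⊤) and the `unknown` flag plays the role of ⊥.

```c
#include <stdbool.h>
#include <stddef.h>

#define VS_MAX 8  /* hypothetical capacity; larger sets fall to "unknown" */

/* Abstract value: a small set of ints. n == 0 means "undefined" (empty set);
   unknown == true means any value is possible. */
typedef struct {
    int    vals[VS_MAX];
    size_t n;
    bool   unknown;
} vset;

static bool vs_has(const vset *s, int v) {
    for (size_t i = 0; i < s->n; i++)
        if (s->vals[i] == v) return true;
    return false;
}

static void vs_add(vset *s, int v) {
    if (s->unknown || vs_has(s, v)) return;
    if (s->n == VS_MAX) { s->unknown = true; s->n = 0; return; }
    s->vals[s->n++] = v;
}

/* Merge at a control-flow join: set union. */
vset vs_merge(vset a, vset b) {
    if (a.unknown || b.unknown) { vset u = { {0}, 0, true }; return u; }
    for (size_t i = 0; i < b.n; i++) vs_add(&a, b.vals[i]);
    return a;
}

/* Refine by a branch on "x < 0": keep negatives (want_neg) or the rest. */
vset vs_filter_neg(vset s, bool want_neg) {
    vset r = { {0}, 0, s.unknown };
    for (size_t i = 0; i < s.n; i++)
        if ((s.vals[i] < 0) == want_neg) vs_add(&r, s.vals[i]);
    return r;
}

/* Transfer function for x = x * k + c, applied element-wise. */
vset vs_affine(vset s, int k, int c) {
    vset r = { {0}, 0, s.unknown };
    for (size_t i = 0; i < s.n; i++) vs_add(&r, s.vals[i] * k + c);
    return r;
}

/* Abstract value of x just before "if (x == 0)" in the slide's example. */
vset analyze_example(void) {
    vset x   = { {42, 7, -1}, 3, false };
    vset neg = vs_affine(vs_filter_neg(x, true), -1, 0);  /* x *= -1 */
    vset pos = vs_filter_neg(x, false);                   /* branch not taken */
    return vs_affine(vs_merge(neg, pos), 1, 1);           /* x++ */
}
```

Because 0 is not in the resulting set, the branch guarding `assert(0)` can be treated as unreachable, which is what the figure shows with ⊤ on that edge.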
11
Interrupt-driven concurrency
• Problems
– C statements not necessarily atomic
x = 0x4242; compiles to more than one instruction:
  ldi r24, 0x42
  (an interrupt may fire here)
  ldi r25, 0x42
13
Interrupt-driven concurrency
• Problems
– C statements not necessarily atomic
– Preempts sequential control flow
• Complicated control flow
• Synchronization
– One flow does not “break” another
– Bad synchronization happens (a race)
• Difficult or impossible to reason about
• Must deal with conservatively (⊥)
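The non-atomicity problem can be modeled on a host machine. The sketch below is an assumption-laden illustration rather than real AVR code: it splits a 16-bit variable into two bytes, and a hypothetical callback `isr` stands in for an interrupt that fires between the two byte-sized updates.

```c
#include <stdint.h>

/* Two bytes standing in for the storage that holds the 16-bit variable x. */
static uint8_t x_lo, x_hi;

/* Value a handler would observe if it read x right now. */
static uint16_t read_x(void) { return (uint16_t)((x_hi << 8) | x_lo); }

typedef void (*isr_fn)(void);

/* Model of a non-atomic 16-bit store: low byte, possible interrupt, high byte. */
void store_x_nonatomic(uint16_t v, isr_fn isr) {
    x_lo = (uint8_t)(v & 0xFF);
    if (isr) isr();               /* the "interrupt" lands between the halves */
    x_hi = (uint8_t)(v >> 8);
}

static uint16_t seen;             /* what the handler saw */
static void snoop_isr(void) { seen = read_x(); }

/* Starts from x == 0x0000, stores 0x4242, and reports the torn value. */
uint16_t torn_value_seen(void) {
    x_lo = 0;
    x_hi = 0;
    store_x_nonatomic(0x4242, snoop_isr);
    return seen;
}
```

In this model the handler observes 0x0042, a value the C program never assigned, which is why an analysis has to treat such racing data conservatively.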
14
Related work
• Thread-based concurrency
– M. B. Dwyer, L. A. Clarke, J. M. Cobleigh, and G. Naumovich. Flow analysis for verifying properties of software systems. TOSEM 2004.
– M. C. Rinard. Analysis of multithreaded programs. SAS 2001.
• Leveraging race detection
– R. Chugh, J. W. Voung, R. Jhala, and S. Lerner. Dataflow analysis for concurrent programs using datarace detection. PLDI 2008.
• Formal semantics
– X. Feng, Z. Shao, Y. Dong, and Y. Guo. Certifying low-level programs with hardware interrupts and preemptive threads. PLDI 2008.
15
Race detection
• Lockset analysis - standard technique
– Lock status = interrupt enable bit status
– Only one lock – no lock aliasing
– nesC uses lexical nesting
• Data classification
– Unshared – accessed only from main
– Shared – accessed from interrupts
16
Race detection


[Figure: conditions that together mark a variable as a RACE]
• Accessed without locking
• Written in shared or unlocked unshared code
• Accessed in shared code
• Data classification
– Unshared – accessed only from main
– Shared – accessed from interrupts
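One plausible reading of the classification above, written as code. This is a sketch under assumptions: the struct fields, the enum, and the rule that all three conditions must hold for a race are interpretations of the figure, not cXprop's actual data structures. The second half shows why a single lock with lexical nesting reduces the lockset to a nesting depth.

```c
#include <stdbool.h>

/* Per-variable access summary from a whole-program pass (hypothetical names). */
typedef struct {
    bool accessed_in_shared;          /* accessed from interrupt-reachable code */
    bool accessed_unlocked;           /* some access runs with interrupts enabled */
    bool written_shared_or_unlocked;  /* written in shared or unlocked unshared code */
} access_summary;

typedef enum { UNSHARED, SHARED_NOT_RACING, RACING } data_class;

/* Assumed rule: unshared data never races; shared data races only when it is
   both accessed without locking and written where a conflict is possible. */
data_class classify(access_summary s) {
    if (!s.accessed_in_shared) return UNSHARED;
    if (s.accessed_unlocked && s.written_shared_or_unlocked) return RACING;
    return SHARED_NOT_RACING;
}

/* One lock (the interrupt enable bit) plus lexically nested atomic sections:
   the lockset collapses to a nesting depth. */
typedef struct { unsigned atomic_depth; } lock_state;

void atomic_enter(lock_state *l) { l->atomic_depth++; }
void atomic_leave(lock_state *l) { if (l->atomic_depth > 0) l->atomic_depth--; }
bool is_locked(const lock_state *l) { return l->atomic_depth > 0; }
```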
17
Race detection case analysis
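[Figure: case analysis of a write, read, or use in an interrupt against a write, read, or access in an interrupt or task; pairs are marked racing or not racing, depending on whether the access sits inside an atomic section]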
18
Data classification
Data: heap, stack, and static (global)
Static (global) data:
• Sequential (unshared): 50%
• Concurrent (shared)
– Racing (⊥): 6%
– Not racing: 44%

19
Atomic interleaving
[Figure: main and an interrupt, each divided into atomic sections that interleave]
Interatomic Concurrent Data-flow
Published at LCTES 2006
20
Volatile
• C type qualifier – volatile int
• Special case of C’s memory model
– Read value may change “randomly”
– Write may affect system state
• E.g., racing data, device registers
• Behavior opaque at C level
• Prevents compiler optimizations
21
Tracking volatile RAM
• Locate variables backed by RAM
• Introduce concurrency information
– Interatomic concurrent dataflow
• Have sound approximation of mutators
– Behavior not opaque at system level
• Safely analyze volatile variables in RAM
22
Tracking volatile device registers
• Hardware registers
– Memory mapped I/O
– Hardware not actually random (volatile)
• Can track using MCU-specific information
– OK to track individual bits
• Instead of whole register
• Interrupt bit of status register
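Tracking a single bit of a device register can be sketched with a known/value bit pair per register. The code below is an illustration under assumptions: the type `abits` and the function names are invented here, and only the `cli` and `sei` instructions are modeled. The one hardware fact relied on is that bit 7 of the AVR status register is the global interrupt enable flag.

```c
#include <stdint.h>

/* Abstract 8-bit register: bit i is known iff (known >> i) & 1, and then its
   value is (value >> i) & 1. Unknown bits keep a 0 in value by convention. */
typedef struct { uint8_t known, value; } abits;

#define SREG_I 0x80u   /* AVR status register bit 7: global interrupt enable */

/* Nothing is known about the register. */
abits ab_unknown(void) { abits r = { 0, 0 }; return r; }

/* Transfer functions for cli and sei: the I bit becomes known, others unchanged. */
abits ab_cli(abits s) { s.known |= SREG_I; s.value &= (uint8_t)~SREG_I; return s; }
abits ab_sei(abits s) { s.known |= SREG_I; s.value |= SREG_I; return s; }

/* Merge at a join: a bit stays known only if both sides know it and agree. */
abits ab_merge(abits a, abits b) {
    abits r;
    r.known = (uint8_t)(a.known & b.known & ~(a.value ^ b.value));
    r.value = (uint8_t)(a.value & r.known);
    return r;
}

/* 1 = interrupts surely enabled, 0 = surely disabled, -1 = cannot tell. */
int ab_interrupts_enabled(abits s) {
    if (!(s.known & SREG_I)) return -1;
    return (s.value & SREG_I) ? 1 : 0;
}
```

Knowing the state of this one bit is what lets an analysis decide whether a program point lies inside an atomic section, even though the rest of the register stays unknown.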
23
Pointer analysis
• Points-to sets – must and may alias
– Two pluggable domains
– Subtleties from context-insensitivity
• Targets:
– Device registers
– Scalars
– Structs
– Arrays
– not-NULL
– Heap
24
Conditional X propagation
• Pluggable abstract domains
– From conditional constant propagation
• Clean domain interface
– Transfer functions
– Utility functions
[Figure: an abstract domain plugs into the conditional X propagation analysis, which performs the abstract interpretation]
25
Domains (pluggable into conditional X propagation)
• Constant
• Bitwise
• Interval
• Value set
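To make the difference between domains concrete, here is a minimal interval domain with a merge and one transfer function. It is a sketch only: the names are hypothetical, and the saturating addition is a simplification chosen to keep the analyzer itself free of overflow, not a statement about how cXprop models arithmetic.

```c
#include <limits.h>

/* Interval [lo, hi]; lo > hi encodes the empty interval. */
typedef struct { int lo, hi; } interval;

interval iv_const(int c) { interval r = { c, c }; return r; }
interval iv_empty(void)  { interval r = { 1, 0 }; return r; }
int iv_is_empty(interval a) { return a.lo > a.hi; }

/* Merge: smallest interval covering both inputs. */
interval iv_merge(interval a, interval b) {
    if (iv_is_empty(a)) return b;
    if (iv_is_empty(b)) return a;
    interval r = { a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi };
    return r;
}

/* Addition that clamps at INT_MIN / INT_MAX instead of overflowing. */
static int sat_add(int x, int y) {
    if (y > 0 && x > INT_MAX - y) return INT_MAX;
    if (y < 0 && x < INT_MIN - y) return INT_MIN;
    return x + y;
}

/* Transfer function for x + y on intervals. */
interval iv_add(interval a, interval b) {
    if (iv_is_empty(a) || iv_is_empty(b)) return iv_empty();
    interval r = { sat_add(a.lo, b.lo), sat_add(a.hi, b.hi) };
    return r;
}
```

Merging the constants 42, 7, and 1 in this domain gives [1, 42], which is sound but admits 40 values the program never produces; a value-set domain would keep exactly {42,7,1} at the cost of more storage per variable.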
26
[Figure: cXprop architecture]
• Struct splitter
• Inliner
• Cleaner
• Fixed point computation
– Value-flow and pointer-flow
– ICD
– Volatile tracking
• Transformations
– Constant propagation
– Dead code elimination
– Dead data elimination
• Cleaner
Implemented as a CIL extension
28
Suppose we have a WSN…
29
Suppose we have a WSN…
• What happened?
– State got corrupted by an array out-of-bounds access (a memory safety error)
– Hard to debug
• Limited visibility into executing systems
• Difficult to replicate complex bugs
• Memory safety can
– Catch all pointer and array bounds errors
• Before they corrupt state
– Provide a choice of recovery action
• Display error message or reboot
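The idea of catching a bounds error before it corrupts state, then handing control to a chosen recovery action, can be sketched by hand. This is not Deputy's generated code: `checked_store`, `set_failure_handler`, and the handler type are hypothetical names used only to show the shape of such a check.

```c
#include <stddef.h>
#include <stdint.h>

/* Recovery action chosen by the application (print a message, reboot, ...). */
typedef void (*fail_fn)(const char *msg);

static fail_fn on_failure;   /* hypothetical hook, unset by default */
int failures;                /* counter used by the sample handler below */

void set_failure_handler(fail_fn f) { on_failure = f; }

/* Sample recovery action: just count the failure. */
void count_failure(const char *msg) { (void)msg; failures++; }

/* Checked store: refuses to write outside buf[0..n-1]. Returns 1 on success,
   0 after invoking the recovery action. Memory is untouched on failure. */
int checked_store(uint8_t *buf, size_t n, size_t idx, uint8_t v) {
    if (buf == NULL || idx >= n) {
        if (on_failure) on_failure("array index out of bounds");
        return 0;
    }
    buf[idx] = v;
    return 1;
}
```

The check runs before the write, so a bad index is reported at the faulting statement instead of surfacing later as corrupted state somewhere else in the program.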
30
Safe TinyOS
• Deputy: existing solution for making C safe
• Expand it into system safety
– Modify TinyOS to work with Deputy
– Enforce Deputy’s safety model under concurrency
– Reduce overhead (cXprop)
Published at SenSys 2007
31
Safe TinyOS toolchain
Annotation example:
  int post(val_t* buf, int n);
becomes
  int post(val_t* COUNT(n) buf, int n);
[Figure: toolchain] TinyOS code → annotate → Safe TinyOS code → run modified nesC compiler → enforce safety using Deputy → deal with concurrency (cXprop) → compress error messages → whole-program optimization (cXprop) → Safe TinyOS app
• Modify TinyOS to work with Deputy
• Enforce Deputy’s safety model under concurrency
• Reduce overhead
32
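• Deputy enforces safety in sequential code
• cXprop avoids extraneous protection
– Only racing variables need protection
[Figure: under concurrency, an interrupt can fire between a Deputy check and the potentially unsafe read it guards; the alternatives shown are an atomic block around both, or a potentially unsafe read to a local followed by a check and a read of the local]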
33
Code size
[Figure: code size of Safe TinyOS; labeled values 35%, 13%, and -11%]
36
A closer look at RAM usage
• On-chip RAM for MCUs expensive
– Kilobytes, not megabytes or gigabytes
– Data in SRAM – 6 transistors / bit
– SRAM can dominate power consumption of a sleeping chip
37
A closer look at RAM usage
• On-chip RAM is persistently scarce in tiny MCU-based systems
• Is RAM used efficiently?
– Performed value profiling for MCU apps
• Apps already heavily tuned for RAM usage
– Result: Average byte stores four values!
38
Offline RAM compression
• Automated sub-word packing for statically allocated scalars, pointers, structs, arrays
– No heap on targeted MCUs
– Trades ROM and CPU cycles for RAM
Published at PLDI 2007
39
Method
x ≝ variable that occupies n bits
Vx ≝ conservative estimate of the value set of x
⌈log2|Vx|⌉ < n ⇒ RAM compression possible
Cx ≝ another set such that |Cx| = |Vx|
fx ≝ bijection between Vx and Cx
n - ⌈log2|Cx|⌉ ⇒ bits saved through compression of x
40
Example Compression
void (*function_queue[8])(void);
41
Example Compression
void (*function_queue[8])(void);
x ≝ one element of function_queue
n = size of a function pointer = 16 bits
42
Example Compression
Vx = {&function_A, &function_B, &function_C, NULL}
43
Example Compression
n = 16 bits
|Vx| = 4
log2|Vx| < n
2 < 16

44
Example Compression
Cx = {0, 1, 2, 3}
fx ≝ Vx to Cx ≝ compression
fx⁻¹ ≝ Cx to Vx ≝ decompression
45
Example Compression
[Figure: the members of Vx stored as a table in ROM, indexed by Cx = {0, 1, 2, 3}]
fx ≝ compression ≝ table scan
fx⁻¹ ≝ decompression ≝ table lookup
46
Example Compression
[Figure: the members of Vx stored as a table in ROM, indexed by Cx = {0, 1, 2, 3}]
128 bits reduced to 16 bits
112 bits of RAM saved
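The worked example can be written out as a hand-made version of what the transformation does automatically. This is a sketch under assumptions: the function bodies, the table order, and the helper names `queue_store` and `queue_load` are invented for illustration, and the table is `const` here in place of a real placement in ROM.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*fn_t)(void);

int last;  /* records which function ran, so the three bodies differ */
static void function_A(void) { last = 1; }
static void function_B(void) { last = 2; }
static void function_C(void) { last = 3; }

/* Vx as a table; the index of each entry is its 2-bit code in Cx. */
static const fn_t value_table[4] = { NULL, function_A, function_B, function_C };

/* Eight 2-bit codes in 16 bits of RAM, replacing eight 16-bit pointers. */
static uint16_t packed_queue;

/* fx: compression by table scan. Returns -1 if f is not in Vx. */
static int compress(fn_t f) {
    for (int c = 0; c < 4; c++)
        if (value_table[c] == f) return c;
    return -1;
}

/* Inverse of fx: decompression by table lookup. */
static fn_t decompress(unsigned code) { return value_table[code & 3u]; }

/* Store f into slot 0..7. Returns 1 on success, 0 on a bad slot or value. */
int queue_store(unsigned slot, fn_t f) {
    int c = compress(f);
    if (c < 0 || slot >= 8) return 0;
    unsigned shift = slot * 2;
    packed_queue = (uint16_t)((packed_queue & ~(3u << shift)) | ((unsigned)c << shift));
    return 1;
}

/* Load the function pointer held in slot 0..7 (NULL for a bad slot). */
fn_t queue_load(unsigned slot) {
    if (slot >= 8) return NULL;
    return decompress((packed_queue >> (slot * 2)) & 3u);
}
```

Every read now pays for a shift, a mask, and a table lookup, and every write pays for a scan, which is the ROM and CPU cost traded for the RAM that is saved.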
47
RAM compression results
• cXprop (no compression)
– 10% RAM reduction
– 20% ROM reduction
– 5.9% duty cycle reduction
• Compression
– 22% RAM reduction
– 3.6% ROM reduction
– 29% duty cycle increase
50
Conclusion
• Interatomic concurrent data-flow
• Volatile data may be tracked
• Better analysis → more optimizations
– Safe TinyOS – practical memory safety
– RAM compression – 22% RAM reduction
http://www.cs.utah.edu/~coop/research/cxprop/
http://www.cs.utah.edu/~coop/safetinyos/
http://www.cs.utah.edu/~coop/research/ccomp/
Thank you
52
53
Cost/Benefit Ratio
$$\frac{\sum_i C_i \, (A_i + B_i V)}{S_u - S_c}$$
C ≝ access profile
A, B ≝ platform-specific costs
V ≝ cardinality of value set
Su ≝ original size
Sc ≝ compressed size
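The ratio can be evaluated with a few lines of code. The sketch rests on an assumption that should be checked against the published paper: it reads the formula as the sum over i of C_i (A_i + B_i V), divided by Su - Sc, with C, A, and B as equal-length vectors. The function name and parameter layout are hypothetical.

```c
#include <stddef.h>

/* Cost/benefit ratio under the assumed reading
       sum_i C[i] * (A[i] + B[i] * V) / (Su - Sc).
   C: access profile, A and B: platform-specific costs, V: value set size,
   Su and Sc: original and compressed size. Caller ensures Su > Sc. */
double cost_benefit(const double *C, const double *A, const double *B,
                    size_t n_ops, double V, double Su, double Sc) {
    double cost = 0.0;
    for (size_t i = 0; i < n_ops; i++)
        cost += C[i] * (A[i] + B[i] * V);
    return cost / (Su - Sc);
}
```

A lower ratio means less added cost per unit of RAM saved, so variables can be ranked by it when only part of the compressible data is to be compressed.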
54
Turning the RAM Knob
[Figure sequence: one chart per knob setting: 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, then 95%]
66
Future work
• Triggering and sequencing
[Figure: a timer interrupt handler and a data-ready interrupt handler connected through sense and data, with edges labeled fire and trigger]
• Caching compressed values
[Figure: three successive reads of x, each followed by its own decompress]
67
More related work
• Safe TinyOS
– R. K. Rengaswamy, E. Kohler, and M. Srivastava. Software-based memory protection in sensor nodes. EmNets 2006.
– B. L. Titzer. Virgil: Objects on the head of a pin. OOPSLA 2006.
– S. Kowshik, D. Dhurjati, and V. Adve. Ensuring code safety without runtime checks for real-time control systems. CASES 2002.
• Offline RAM compression
– Y. Zhang and R. Gupta. Compressing heap data for improved memory performance. Software: Practice and Experience 2006.
– L. S. Bai, L. Yang, and R. P. Dick. Automated compile-time and run-time techniques to increase usable memory in MMU-less embedded systems. CASES 2006.
68
PAG
• Program Analysis Generator
– Domain specific language input describes
• Domain lattice
• Transfer functions
• Language-describing grammar
• Fixed point solution method
– Data-flow analyzer as output
• Does not deal with concurrency
• Used to evaluate fixed point solutions
69
Feature comparison
[Figure: feature comparison chart; labeled values 12% and 5.5%]
70
Domain comparison
71
Resource reduction
[Figure: resource reduction chart; labeled values 12%, 8.3%, 2.5%, and 1.8%]
72
Atomic interleaving
[Figure: main and two interrupts, each divided into atomic sections that interleave]
Interatomic Concurrent Data-flow
Published at LCTES 2006
73
Context insensitivity
a is a global variable
foo:
  int x = 7;
  bar(&x);
  Abstract state: a = {27}; x = {7}, then x = {7,42}
bar(int *y):
  goo(y);
  Abstract state: a = {27}; y = {&x}
goo(int *z):
  *z = 42;
  a = *z;
  Abstract state: a = {27}, then a = {7,27,42}; z = {&x}
74
Benchmark descriptions
• AVR ATmega128 code
• TinyOS
• 3,000-26,000 lines of C code
• Analysis times - seconds to an hour
• Metrics
– Duty cycle
• % of time processor is on
• Obtained from Avrora
– Cycle-accurate simulator for WSNs
– Code size and data size
75