Static program analysis and its application to TI's DSP Processor

Static Analysis of Executable Assembly Code to Ensure QA and Reuse
Ramakrishnan Venkitaraman
Graduate Student, Research Track
Computer Science, UT-Dallas
Advisor: Dr. Gupta
Software Reuse & System Integration
Companies
Cost of Project
But, the Integrated System does not work
Outline
• Need for reusable software binaries
• Our framework for reuse of software binaries
• Automated tool to enforce standard compliance.
Need for reusable software binaries
• Most third-party software is proprietary.
• COTS marketplace
• No recompiling, only linking
• Reduced development time
• Fewer bugs
• Time to Market
Motivation
• Problems faced by embedded chip manufacturers.
• System integration is very difficult.
Scope of the Framework
• Gives sufficient conditions for software binary code reusability.
• Usability vs. Reusability
• Usability is a precondition for reusability
• E.g. Array index out of bound reference.
Framework for reusable software Binaries
• Code should not contain hard-coded addresses
• Binaries should not be assumed to be located at a
fixed virtual memory location
• Code should be reentrant
• No self-modifying code
• Should not make symbol resolution invalid
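The reentrancy rule above can be illustrated with a small C sketch. The names accumulate_static, Accumulator and accumulate are hypothetical and chosen only for this illustration. The first function keeps its state in a static variable, so two clients that share the binary interfere with each other; the second keeps all state in a caller-owned structure.

```c
#include <assert.h>
#include <stddef.h>

/* Non-reentrant: the running total lives in a static variable that is
   shared by every caller of this function. */
int accumulate_static(int x)
{
    static int total = 0;
    total += x;
    return total;
}

/* Reentrant: all state is owned by the caller (hypothetical type). */
typedef struct {
    int total;
} Accumulator;

int accumulate(Accumulator *acc, int x)
{
    assert(acc != NULL);
    acc->total += x;
    return acc->total;
}
```

With two independent Accumulator objects, each client sees only its own total, whereas interleaved calls to accumulate_static mix the totals of all clients.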
Problem and Solution
• Problem: Detection of hard coded addresses in
programs without accessing source code.
• Solution: “Static Program Analysis”
Interest in Static Analysis
• “We actually went out and bought for 30 million dollars, a
company that was in the business of building static analysis
tools and now we want to focus on applying these tools to
large-scale software systems”
• Remarks by Bill Gates, 17th Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), November 2002.
Static Analysis
• Defined as any analysis of a program carried out
without completely executing the program.
• Undecidability: it is impossible to build a tool that detects hard coding precisely in every program, so the analysis approximates.
Hard Coded Addresses
• Bad Programming Practice.
• Results in non-relocatable code.
• Results in non-reusable code.
Some examples showing hard-coding
Example1: Directly Hardcoded
void main()
{
    int * p = 0x8800;
    // Some code
    *p = …;
}
Example2: Indirectly Hardcoded
void main()
{
    int *p = 0x80;
    int *q = p;
    //Some code
    *q = …;
}
Example3: Conditional Hardcoding
void main()
{
    int *p, val;
    p = …;
    val = …;
    if(val)
        p = 0x900;
    else
        p = malloc(…);
    *p;
}
NOTE: We don’t care if a pointer is hard coded and is never dereferenced.
Overview Of Our Approach
• Input: Object Code of the Software
• Output: Compliant or Not Compliant status
Disassemble Object Code → Split Into Functions → Obtain Basic Blocks → Obtain Flow Graph → Static Analysis → Output the Result
Fig. Activity Diagram for our Static Analyzer
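The "Obtain Basic Blocks" step can be sketched with the textbook leader rule. This is a minimal sketch, not the tool's actual code: the Insn record and find_leaders are hypothetical, branch targets are assumed to be already resolved to instruction indices, and branch delay slots on the DSP are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction record (not a real decoder). */
typedef struct {
    int is_branch;  /* nonzero if the instruction transfers control */
    int target;     /* index of the branch target, or -1 if unknown */
} Insn;

/* Sets leader[i] = 1 for every instruction that starts a basic block
   and returns the number of basic blocks found. */
size_t find_leaders(const Insn *code, size_t n, int *leader)
{
    size_t i, count = 0;
    assert(n == 0 || (code != NULL && leader != NULL));
    for (i = 0; i < n; i++)
        leader[i] = 0;
    if (n > 0)
        leader[0] = 1;                       /* first instruction */
    for (i = 0; i < n; i++) {
        if (!code[i].is_branch)
            continue;
        if (code[i].target >= 0 && (size_t)code[i].target < n)
            leader[code[i].target] = 1;      /* target of a branch */
        if (i + 1 < n)
            leader[i + 1] = 1;               /* instruction after a branch */
    }
    for (i = 0; i < n; i++)
        count += (size_t)leader[i];
    return count;
}
```

Each basic block then runs from one leader up to, but not including, the next leader; the flow graph connects a block to its branch target and to its fall-through successor.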
Basic Aim Of Analysis
• Find a path to trace pointer origin.
• Problem: Exponential Complexity
• Static Analysis approximation makes it linear
Analyzing Source Code – Easy
#include<stdio.h>
void main()
{
    int *p, *q;
    //some code
    p = (int*)8000;    // P IS HARD CODED
    //some code
    q = p;             // {{p}}
    //some code
    *q = 5;            // {{q}}
}
Tracing backward from the dereference *q: {{q}} becomes {{p}}, and p is assigned a constant.
So, the program is not compliant with the standard
Analyzing Assembly Code is Hard
• Problem
• No type information is available
• Instruction level pipeline and parallelism
• Solution
• Backward analysis
• Use Abstract Interpretation
Analyzing Assembly – Hard
000007A0 main:
000007A0 07BD09C2 SUB.D2 SP,0x8,SP
000007A4 020FA02A MVK.S2 0x1f40,B4
000007A8 023C22F6 STW.D2T2 B4,*+SP[0x1]
000007AC 00002000 NOP 2
000007B0 023C42F6 STW.D2T2 B4,*+SP[0x2]
000007B4 00002000 NOP 2
000007B8 0280A042 MVK.D2 5,B5
000007BC 029002F6 STW.D2T2 B5,*+B4[0x0]
000007C0 00002000 NOP 2
000007C4 008C8362 BNOP.S2 B3,4
000007C8 07BD0942 ADD.D2 SP,0x8,SP
000007CC 00000000 NOP
000007D0 00000000 NOP
Backward trace from the store through *+B4[0x0]: {{ }} → {{ B4 }} → B4 = 0x1f40
So, B4 is HARD CODED
Code is NOT Compliant
Abstract Interpretation Based Analysis
• Domains from which variables draw their
values are approximated by abstract domains.
• The original domains are called concrete
domains.
Lattice Abstraction
• Lattice-based abstraction is used to determine pointer hard-codedness.
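The slides do not spell out the lattice, so the following is only a sketch under an assumed four-point lattice: BOTTOM (no information), HARD_CODED, NOT_HARD_CODED, and TOP (may be either). All names here (AbsVal, abs_join, abs_leq, abs_may_be_hard_coded) are hypothetical. The values are encoded as two bits so that the join is a bitwise OR.

```c
#include <assert.h>

/* Assumed abstract domain for one pointer value. */
typedef enum {
    ABS_BOTTOM         = 0,  /* no information yet */
    ABS_HARD_CODED     = 1,  /* derived from a constant address */
    ABS_NOT_HARD_CODED = 2,  /* derived from a non-constant source */
    ABS_TOP            = 3   /* may be either */
} AbsVal;

/* Least upper bound of two abstract values. */
AbsVal abs_join(AbsVal a, AbsVal b)
{
    assert((unsigned)a <= 3u && (unsigned)b <= 3u);
    return (AbsVal)((unsigned)a | (unsigned)b);
}

/* Partial order: a is below b exactly when joining a into b changes nothing. */
int abs_leq(AbsVal a, AbsVal b)
{
    return abs_join(a, b) == b;
}

/* Conservative check: TOP is reported because it may be hard coded. */
int abs_may_be_hard_coded(AbsVal v)
{
    return ((unsigned)v & (unsigned)ABS_HARD_CODED) != 0;
}
```

Under this assumption, a pointer that is a constant on one path and a malloc result on another joins to ABS_TOP and is still reported.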
Contexts
• Contexts to Abstract Contexts
• Abstract Context to Context
Phases In Analysis
• Phase 1: Find the set of dereferenced pointers.
• Phase 2: Check the safety of dereferenced
pointers.
Building Unsafe Sets (Phase 1)
• The first element is added to the unsafe set
during pointer dereferencing.
• E.g. if “*Reg” appears in the disassembled code, the unsafe set is initialized to {Reg}.
• ‘N’ pointers dereferenced → ‘N’ unsafe sets
• Maintained as SOUS (Set Of Unsafe Sets)
Populating Unsafe Sets (Phase 2)
• E.g., if Reg = reg1 + reg2, the element “Reg” is deleted from the unsafe set, and the elements “reg1” and “reg2” are inserted into the unsafe set.
• Contents of the unsafe set will now become {reg1, reg2}.
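The two phases can be sketched for straight-line code as follows. This is a simplified illustration, not the tool's implementation: the Insn record, the operation kinds, and the register indices are hypothetical, pipeline delays are ignored, and the sketch assumes that a register in the unsafe set is hard coded as soon as it is defined by a constant move. A real tool would also have to tell constant offsets apart from constant base addresses.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_REGS = 32 };
enum { REG_B4 = 4, REG_B5 = 5, REG_SP = 15 };  /* hypothetical indices */

typedef unsigned int RegSet;  /* bit r set: register r is in the unsafe set */

typedef enum { OP_CONST, OP_ADD, OP_DEREF, OP_OTHER } OpKind;

typedef struct {
    OpKind kind;
    int dst;   /* register defined by the instruction, or -1 */
    int src1;  /* first source; the address register for OP_DEREF */
    int src2;  /* second source, or -1 for an immediate */
} Insn;

static RegSet bit(int r)
{
    assert(r >= 0 && r < MAX_REGS);
    return (RegSet)1u << r;
}

/* Traces the address register of the dereference at index d backward.
   Returns 1 if it is traced to a constant, 0 otherwise. */
int deref_is_hard_coded(const Insn *code, size_t d)
{
    RegSet unsafe;
    size_t i;
    assert(code[d].kind == OP_DEREF);
    unsafe = bit(code[d].src1);              /* Phase 1: set is {Reg} */
    for (i = d; i-- > 0; ) {                 /* Phase 2: walk backward */
        const Insn *in = &code[i];
        if (in->dst < 0 || !(unsafe & bit(in->dst)))
            continue;                        /* does not define a member */
        if (in->kind == OP_CONST)
            return 1;                        /* member is a constant */
        unsafe &= ~bit(in->dst);             /* delete Reg ... */
        if (in->kind == OP_ADD) {            /* ... insert reg1, reg2 */
            if (in->src1 >= 0) unsafe |= bit(in->src1);
            if (in->src2 >= 0) unsafe |= bit(in->src2);
        }
        if (unsafe == 0)
            return 0;                        /* nothing left to trace */
    }
    return 0;
}

/* One unsafe set per dereference (the SOUS); counts the flagged ones. */
size_t count_hard_coded_derefs(const Insn *code, size_t n)
{
    size_t d, count = 0;
    for (d = 0; d < n; d++)
        if (code[d].kind == OP_DEREF && deref_is_hard_coded(code, d))
            count++;
    return count;
}
```

Encoding the assembly listing from the earlier slide in this form (SUB on SP, MVK into B4, two stores through SP, MVK into B5, one store through B4), only the store through B4 is flagged, which matches the result shown on that slide.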
Pointer Arithmetic
• All pointer operations are abstracted during analysis.
Handling Loops
• Complex: # iterations of loop may not be
known until runtime.
• Cycle the loop until the unsafe set reaches a
“fixed point”.
• No new information is added to the unsafe set
during successive iterations.
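A minimal sketch of the fixed-point iteration, assuming a loop body made only of register copies "dst = src" and an unsafe set stored as a bit mask. Copy, transfer_backward and loop_fixed_point are hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int RegSet;  /* bit r set: register r is in the unsafe set */

typedef struct {
    int dst;
    int src;
} Copy;                       /* models the instruction "dst = src" */

/* Pushes an unsafe set backward through one pass over the loop body. */
static RegSet transfer_backward(const Copy *body, size_t n, RegSet out)
{
    size_t i;
    for (i = n; i-- > 0; ) {
        RegSet d;
        assert(body[i].dst >= 0 && body[i].dst < 32);
        assert(body[i].src >= 0 && body[i].src < 32);
        d = (RegSet)1u << body[i].dst;
        if (out & d)
            out = (out & ~d) | ((RegSet)1u << body[i].src);
    }
    return out;
}

/* Cycles the loop until the unsafe set at the end of the body stops
   growing. exit_set is the unsafe set required after the loop. */
RegSet loop_fixed_point(const Copy *body, size_t n, RegSet exit_set,
                        unsigned *iterations)
{
    RegSet cur = exit_set;
    unsigned rounds = 0;
    for (;;) {
        /* merge the loop-exit information with the back-edge information */
        RegSet next = cur | transfer_backward(body, n, cur);
        rounds++;
        if (next == cur)
            break;            /* fixed point: no new information */
        cur = next;
    }
    if (iterations != NULL)
        *iterations = rounds;
    return cur;
}
```

The set can only grow and the number of registers is finite, so the iteration terminates without knowing how many times the loop runs at run time.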
Merging Information
• If no merging, then exponential complexity.
• Mandatory when loops are present.
• Merging causes information loss.
Flow graph: Block A → If (Cond) Then Block B Else Block C → Block D → Block E
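As a rough sketch of the trade-off, assume unsafe sets are bit masks and that merging is set union. merge_unsafe_sets and paths_without_merging are hypothetical helpers: the first merges the sets arriving from several paths into one, the second counts the paths that would have to be tracked separately through k consecutive if/else constructs.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int RegSet;  /* bit r set: register r is in the unsafe set */

/* Merges the unsafe sets of several paths into one set by union. After
   the merge it is no longer known which path contributed which register. */
RegSet merge_unsafe_sets(const RegSet *sets, size_t n)
{
    RegSet merged = 0;
    size_t i;
    assert(n == 0 || sets != NULL);
    for (i = 0; i < n; i++)
        merged |= sets[i];
    return merged;
}

/* Number of distinct paths through k consecutive if/else constructs
   when no merging is done: it doubles with every construct. */
unsigned long paths_without_merging(unsigned k)
{
    assert(k < 32);
    return 1ul << k;
}
```

With merging, the analysis keeps one set per block regardless of k, at the price of the information loss noted above.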
Proof – Analysis is Sound
• Consistency of the α and γ functions is established by showing the existence of a Galois connection. That is,
• x = α(γ(x))
• y ∈ γ(α(y))
Extensive Compliance Checking
• Handle all cases occurring in programs.
• Single pointer, double pointer, triple pointer…
• Global pointer variables.
• Static and Dynamic arrays.
Extensive Compliance Checking
• Loops – all forms (e.g. for, while…)
• Function calls.
• Pipelining and Parallelism.
• Merging information from multiple paths.
Analysis Stops when…
• Compliance of all the pointers is established.
• Errors and warnings are reported.
• Log file containing statistics of the analysis is
created.
Sample Code
Fig. Flow Graph
Analysis Results
Program    # Lines  # * Ptrs  # Hard Coded  Chain Length  Running Time (ms)
t_read          80         3             0             0               1280
timer1         126        17             6             1               1441
mcbsp1         196         0             0             0               1270
figtest        292        19            10             2               1521
m_hdrv         345         6             2             1               2262
dat            949        10             8            12               2512
gui_codec     1139       109            28             1               3063
codec         1188       109            28             1               3043
stress        1203       105             0             1               4505
demo          1350        82            47             9               4716
Related Work
• UNO Project – Bell Labs
• Analyze at source level
• TI XDAIS Standard
• Contains 35 rules and 15 guidelines.
• SIX General Programming Rules.
• No tool currently exists to check for compliance.
Current Status and Future Work
• Prototype Implementation done
• But, context insensitive, intra-procedural
• Extend to context sensitive, inter-procedural.
• Extend compliance check for other rules.
So…
• Reuse of software binaries is essential.
• Hard Coding and non-reentrancy are bad
programming practices.
• They result in non-relocatable, non-reusable code.
• A Static Analysis based technique is useful
and practical.
Software Reuse & System Integration
Select ONLY Compliant Software
WOW!!!! It works…
Questions…
• Extra slides
TI XDAIS Standard
• Six General Programming Rules
1) All programs should follow the runtime conventions of TI’s C programming language.
2) Algorithms must be re-entrant.
3) No hard coded data memory locations.
4) No hard coded program memory locations.
5) Algorithms must characterize their ROM-ability.
6) No peripheral device accesses.
• No tool exists to check for compliance