Program Analysis

ANALYSIS OF PROG. LANG.
PROGRAM ANALYSIS
Instructors: Crista Lopes
Copyright © Instructors.
Motivation(s)
Where do you see program analysis (PA) in your everyday life?
How does PA "work"?
What is PA anyway?
Auto-completion
Pre-compilation error detection
E.g., a missing parenthesis
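How do you know ...
    class Example {
        static int a;
        static void increment_a() {
            a++;
        }
        public static void main(String[] args) {
            while (true) {
                String a = "hello";
                increment_a();
            }
        }
    }
The "a" incremented inside increment_a() is the int field, not the String a declared in the loop: this "a" is not that "a".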
How do you remember ...
    class Example {
        static int a;                  // "a" is of type int (FYI...)
        static void increment_a() {
            a++;                       // Wait, what's the type of "a" again?
        }
        public static void main(String[] args) {
            while (true) {
                String a = "hello";
                increment_a();
            }
        }
    }
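Answering "which a is this, and what is its type?" is the job of name resolution. One common approach, sketched here under our own assumptions rather than taken from any particular IDE, is a scoped symbol table: a stack of name-to-type maps searched from the innermost scope outwards. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical scoped symbol table: one map (name -> type) per scope, innermost scope on top.
class ScopedSymbolTable {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    void enterScope() { scopes.push(new HashMap<>()); }

    void exitScope() { scopes.pop(); }

    // Declare `name` with `type` in the innermost scope (shadowing outer declarations).
    void declare(String name, String type) { scopes.peek().put(name, type); }

    // Resolve `name` from the innermost scope outwards; null means "undeclared".
    String lookup(String name) {
        for (Map<String, String> scope : scopes) {   // ArrayDeque iterates from the top of the stack
            String type = scope.get(name);
            if (type != null) return type;
        }
        return null;
    }

    public static void main(String[] args) {
        ScopedSymbolTable table = new ScopedSymbolTable();
        table.enterScope();                          // class scope
        table.declare("a", "int");                   // static int a;

        table.enterScope();                          // body of increment_a(): resolved lexically,
        System.out.println(table.lookup("a"));       // so "a" is the int field: prints int
        table.exitScope();

        table.enterScope();                          // body of the while loop in main
        table.declare("a", "String");                // String a = "hello"; shadows the field
        System.out.println(table.lookup("a"));       // prints String
        table.exitScope();
    }
}
```

Because increment_a() is resolved against the scopes enclosing its declaration, not its call site, the loop's local String a never affects it.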
Outline
Introduction/motivations
Program representation
  - AST
  - 3-address code
Control flow analysis
Data flow
Intermediate Representation (IR)
Initial point
Abstract Syntax Tree
  - Abstract vs concrete syntax
  - Parse tree vs abstract syntax tree
Three-address code
IR-1 Starting Point
Source code → Lexical analysis, parsing → Intermediate representation (IR) → Code generation, optimization → Target code → Code execution
Analyze the IR: perform analysis on the results, then use this information for applications
IR-2. Abstract Syntax Tree (AST)
Concrete vs abstract syntax
  - Concrete syntax shows structure and is language-specific
  - Abstract syntax shows only the structure, without language-specific details such as delimiters and assignment symbols
Representations
  - A parse tree represents concrete syntax
  - An abstract syntax tree represents abstract syntax
IR-2. Example: Grammar
Example
  a := b+c   (Language 1)
  a = b+c;   (Language 2)
Grammar for 1
  stmtlist → stmt | stmt stmtlist
  stmt → assign | if-then | …
  assign → ident ":=" ident binop ident
  binop → "+" | "-" | …
Grammar for 2
  stmtlist → stmt ";" | stmt ";" stmtlist
  stmt → assign | if-then | …
  assign → ident "=" ident binop ident
  binop → "+" | "-" | …

IR-2. Example: Parse Tree
Parse tree for a:=b+c
stmtlist
  stmt
    assign
      ident: a
      ":="
      ident: b
      binop: "+"
      ident: c
Parse tree for a=b+c;
stmtlist
  stmt
    assign
      ident: a
      "="
      ident: b
      binop: "+"
      ident: c
  ";"
IR-2 Example: Abstract Syntax Tree
The same abstract syntax tree represents both 1. a:=b+c and 2. a=b+c;
assign
  a
  add
    b
    c
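To make the concrete/abstract distinction tangible, here is a minimal sketch (not a general parser) that accepts exactly the `assign` rule of each grammar and builds the same AST for both. The record names `Ident`, `BinOp`, and `Assign` are our own, and the tokenizer only knows the symbols used in the example.

```java
import java.util.ArrayList;
import java.util.List;

class TwoSyntaxes {
    // Abstract syntax: only the structure survives (no ":=", "=", or ";").
    interface Node {}
    record Ident(String name) implements Node {}
    record BinOp(String op, Node left, Node right) implements Node {}
    record Assign(Ident target, Node value) implements Node {}

    // Split a statement into identifiers and the symbols ":=", "=", "+", "-", ";".
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else if (src.startsWith(":=", i)) {
                tokens.add(":=");
                i += 2;
            } else {
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    static boolean isIdent(String tok) { return Character.isLetter(tok.charAt(0)); }

    // assign -> ident assignOp ident binop ident, optionally followed by `terminator`.
    // Language 1: assignOp ":=", no terminator.  Language 2: assignOp "=", terminator ";".
    static Assign parseAssign(String src, String assignOp, String terminator) {
        List<String> t = tokenize(src);
        int expected = (terminator == null) ? 5 : 6;
        boolean ok = t.size() == expected
                && isIdent(t.get(0)) && t.get(1).equals(assignOp) && isIdent(t.get(2))
                && (t.get(3).equals("+") || t.get(3).equals("-")) && isIdent(t.get(4))
                && (terminator == null || t.get(5).equals(terminator));
        if (!ok) throw new IllegalArgumentException("syntax error: " + src);
        String op = t.get(3).equals("+") ? "add" : "sub";
        return new Assign(new Ident(t.get(0)), new BinOp(op, new Ident(t.get(2)), new Ident(t.get(4))));
    }

    public static void main(String[] args) {
        Assign ast1 = parseAssign("a:=b+c", ":=", null);   // Language 1
        Assign ast2 = parseAssign("a = b+c;", "=", ";");   // Language 2
        System.out.println(ast1);
        System.out.println(ast1.equals(ast2));             // true: same AST, different concrete syntax
    }
}
```

Records compare structurally, so `equals` directly checks that the two concrete syntaxes produced the same abstract syntax.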
IR-3. Three Address Code
General form: x = y op z
More generally: (operator, operand1, operand2, result), i.e. at most 3 slots besides the operator
May include temporary variables
Examples
Assignment
  Binary   x := y op z            (op, y, z, x)
  Unary    x := op y              (op, y, _, x)
  Copy     x := y                 (_, y, _, x)
Jumps
  Unconditional   goto L                 (goto, L, _, _)
  Conditional     if x relop y goto L    (relop, x, y, L)
…
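A minimal sketch, assuming a simple expression tree of our own (`Var`, `Bin`), of how an expression is flattened into quadruples: each interior node gets a fresh temporary, which is where the temporary variables come from. The `_` placeholder for an unused slot follows the notation above.

```java
import java.util.ArrayList;
import java.util.List;

class ThreeAddressGen {
    // Hypothetical expression trees: a variable, or a binary operation on two subexpressions.
    interface Expr {}
    record Var(String name) implements Expr {}
    record Bin(String op, Expr left, Expr right) implements Expr {}

    // A quadruple (operator, operand1, operand2, result); "_" marks an unused slot.
    record Quad(String op, String arg1, String arg2, String result) {
        @Override public String toString() { return "(" + op + ", " + arg1 + ", " + arg2 + ", " + result + ")"; }
    }

    private final List<Quad> code = new ArrayList<>();
    private int temps = 0;

    // Emit code for e and return the name (variable or temporary) that holds its value.
    String gen(Expr e) {
        if (e instanceof Var v) return v.name();
        Bin b = (Bin) e;
        String l = gen(b.left());
        String r = gen(b.right());
        String t = "t" + (++temps);              // fresh temporary for this node
        code.add(new Quad(b.op(), l, r, t));
        return t;
    }

    // x := e becomes the code for e followed by a copy into x.
    List<Quad> assign(String x, Expr e) {
        String v = gen(e);
        code.add(new Quad("_", v, "_", x));
        return code;
    }

    public static void main(String[] args) {
        // x := (a + b) * (c - d)
        Expr e = new Bin("*", new Bin("+", new Var("a"), new Var("b")),
                              new Bin("-", new Var("c"), new Var("d")));
        new ThreeAddressGen().assign("x", e).forEach(System.out::println);
        // (+, a, b, t1)
        // (-, c, d, t2)
        // (*, t1, t2, t3)
        // (_, t3, _, x)
    }
}
```

The final copy could be folded into the last quadruple; it is kept separate here so that all three assignment forms from the list appear.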
IR-3. Example: Three Address Code
    if a>10
    then x=y+z
    else x=y-z
becomes
    1. if a>10 goto 4
    2. x = y-z
    3. goto 5
    4. x = y+z
    5. …
Analysis Levels
Local: within a single basic block or statement
Intraprocedural: within a single procedure, function, or method
Interprocedural: across procedure boundaries (procedure calls, shared globals, etc.)
Intraclass: within a single class
Interclass: across class boundaries
…
Outline
Introduction/motivations
Program representation
Control flow analysis
  - Computing control flow (analysis and representation)
  - Search and traversals
  - Applications
Data flow
Computing Control flow (example)
Procedure AVG
S1     count = 0;
S2     fread(fptr, n);
S3     while (not EOF) do
S4         if (n < 0)
S5             return (error)
           else
S6             nums[count] = n
S7             count++
           endif
S8         fread(fptr, n);
       endwhile
S9     avg = mean(nums, count)
S10    return (avg)
[Figure: CFG of AVG with nodes entry, S1-S10 and EXIT]
CF1: Control Flow (Basic Blocks)
- A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end, without halt or possibility of branching except at the end
- A basic block may or may not be maximal
  - For compiler optimizations, maximal blocks are desirable
  - For software engineering tasks, basic blocks that represent one source code statement are often used
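One standard way to build maximal basic blocks is the "leaders" rule: the first instruction, every jump target, and every instruction immediately after a jump start a new block, and each block runs up to the next leader. The sketch below applies it to the three-address code for the if/else example; the `Instr` record (text plus an optional 1-based jump target) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class BasicBlocks {
    // Minimal instruction model (an assumption for this sketch): text plus the 1-based
    // index of its jump target, or null if the instruction is not a jump.
    record Instr(String text, Integer target) {}

    // Maximal basic blocks via "leaders"; returns 1-based, inclusive [first, last] ranges.
    static List<int[]> partition(List<Instr> code) {
        TreeSet<Integer> leaders = new TreeSet<>();
        leaders.add(1);                                   // the first instruction is a leader
        for (int i = 1; i <= code.size(); i++) {
            Integer target = code.get(i - 1).target();
            if (target != null) {
                leaders.add(target);                      // every jump target is a leader
                if (i < code.size()) leaders.add(i + 1);  // so is the instruction right after a jump
            }
        }
        List<Integer> starts = new ArrayList<>(leaders);
        List<int[]> blocks = new ArrayList<>();
        for (int k = 0; k < starts.size(); k++) {
            int last = (k + 1 < starts.size()) ? starts.get(k + 1) - 1 : code.size();
            blocks.add(new int[] {starts.get(k), last});  // a block runs up to the next leader
        }
        return blocks;
    }

    // The three-address code for "if a>10 then x=y+z else x=y-z".
    static final List<Instr> IF_ELSE = List.of(
        new Instr("if a>10 goto 4", 4),
        new Instr("x = y-z", null),
        new Instr("goto 5", 5),
        new Instr("x = y+z", null),
        new Instr("...", null));

    public static void main(String[] args) {
        for (int[] b : partition(IF_ELSE)) System.out.println(b[0] + ".." + b[1]);  // 1..1, 2..3, 4..4, 5..5
    }
}
```

For the statement-per-block variant used in software engineering tasks, every instruction would simply be its own block.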
CF1: Computing Control Flow
Input: a list of program statements in some form
Output: a list of CFG nodes and edges
Procedure:
1. Construct basic blocks
2. Create entry and exit nodes; create edge (entry, B1); create edge (Bk, exit) for each Bk that represents an exit from the program
3. Add a CFG edge from Bi to Bj if Bj can immediately follow Bi in some execution, i.e.,
   - there is a conditional or unconditional goto from the last statement of Bi to the first statement of Bj, or
   - Bj immediately follows Bi in the order of the program and Bi does not end in an unconditional goto statement
4. Label edges that represent conditional transfers of control
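A sketch of the edge-construction steps, assuming blocks are already given as 1-based instruction ranges (for the if/else three-address code, the leaders rule yields [1,1], [2,3], [4,4], [5,5]). Edges labelled T/F mark conditional transfers; as a simplification of our own, the last block is treated as the one that exits the program. The `Instr` and `Edge` records are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CfgEdges {
    // Instruction model (an assumption for this sketch): 1-based jump target or null,
    // and whether the jump is conditional ("if ... goto L") or not ("goto L").
    record Instr(String text, Integer target, boolean conditional) {}

    record Edge(String from, String to, String label) {
        @Override public String toString() {
            return from + "->" + to + (label.isEmpty() ? "" : " [" + label + "]");
        }
    }

    // blocks: 1-based, inclusive [first, last] instruction ranges, in program order.
    static List<Edge> build(List<Instr> code, List<int[]> blocks) {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge("entry", "B1", ""));
        for (int i = 0; i < blocks.size(); i++) {
            String from = "B" + (i + 1);
            Instr last = code.get(blocks.get(i)[1] - 1);
            boolean jumps = last.target() != null;
            if (jumps) {                                       // goto from the last statement of Bi
                edges.add(new Edge(from, blockStartingAt(blocks, last.target()), last.conditional() ? "T" : ""));
            }
            if (!jumps || last.conditional()) {                // Bi does not end in an unconditional goto
                String next = (i + 1 < blocks.size()) ? "B" + (i + 2) : "exit";
                edges.add(new Edge(from, next, jumps ? "F" : ""));
            }
        }
        return edges;
    }

    // Jump targets are always leaders, so they start some block.
    static String blockStartingAt(List<int[]> blocks, int instr) {
        for (int i = 0; i < blocks.size(); i++) if (blocks.get(i)[0] == instr) return "B" + (i + 1);
        throw new IllegalArgumentException("jump target is not the start of a block: " + instr);
    }

    public static void main(String[] args) {
        List<Instr> code = List.of(
            new Instr("if a>10 goto 4", 4, true), new Instr("x = y-z", null, false),
            new Instr("goto 5", 5, false), new Instr("x = y+z", null, false), new Instr("...", null, false));
        List<int[]> blocks = List.of(new int[] {1, 1}, new int[] {2, 3}, new int[] {4, 4}, new int[] {5, 5});
        System.out.println(build(code, blocks));
        // [entry->B1, B1->B3 [T], B1->B2 [F], B2->B4, B3->B4, B4->exit]
    }
}
```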
CF2: Search and Ordering
Many ways to visit the nodes in the graph:
- Depth-first search: visits the descendants of a node before visiting any of its siblings
- Breadth-first search: all of a node's immediate descendants are processed before any of their unprocessed children
- Preorder traversal: a node is processed before its descendants
- Postorder traversal: a node is processed after its descendants
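The four orders can be compared on a small graph. The sketch below assumes an adjacency-list `Map` (node to successor list) for a hypothetical CFG with a branch, a join, and a loop; it records preorder and postorder during one recursive DFS and computes a BFS order with a queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Traversals {
    // Depth-first search from n, recording preorder (node processed before its
    // descendants) and postorder (node processed after its descendants).
    static void dfs(Map<Integer, List<Integer>> succ, int n, Set<Integer> seen,
                    List<Integer> pre, List<Integer> post) {
        seen.add(n);
        pre.add(n);
        for (int s : succ.getOrDefault(n, List.of())) {
            if (!seen.contains(s)) dfs(succ, s, seen, pre, post);
        }
        post.add(n);
    }

    // Breadth-first search: all immediate successors of a node are queued before their own successors.
    static List<Integer> bfs(Map<Integer, List<Integer>> succ, int start) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> seen = new HashSet<>(List.of(start));
        Deque<Integer> queue = new ArrayDeque<>(List.of(start));
        while (!queue.isEmpty()) {
            int n = queue.poll();
            order.add(n);
            for (int s : succ.getOrDefault(n, List.of())) {
                if (seen.add(s)) queue.add(s);    // Set.add returns false if s was already queued
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical CFG: a branch at 1, a join at 4, and a loop 4 -> 5 -> 2 -> 4.
        Map<Integer, List<Integer>> succ = Map.of(
            1, List.of(2, 3), 2, List.of(4), 3, List.of(4), 4, List.of(5), 5, List.of(2));
        List<Integer> pre = new ArrayList<>(), post = new ArrayList<>();
        dfs(succ, 1, new HashSet<>(), pre, post);
        System.out.println("preorder  " + pre);              // [1, 2, 4, 5, 3]
        System.out.println("postorder " + post);             // [5, 4, 2, 3, 1]
        System.out.println("BFS       " + bfs(succ, 1));     // [1, 2, 3, 4, 5]
    }
}
```

The `seen` set is what keeps both searches finite on a CFG with loops; the orders depend on the order of each successor list.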
CF2: Search and Ordering (cont'd): DFS
[Figure: a CFG with nodes numbered 1-10]
One DFS of the CFG: 1, 3, 4, 6, 7, 8, 10, back to 8, 9, back to 8, 7, 6, 4, 5, back to 4, 3, 1, 2, back to 1
In other words, the DFS visits nodes in the order 1,3,4,6,7,8,10,8,9,8,7,6,5,4,3,1,2,1
The number assigned to a node during DFS is its depth-first number
The depth-first ordering of the nodes is the reverse of the order in which the DFS finishes them (reverse postorder); here the nodes finish in the order 10,9,8,7,6,5,4,3,2,1
Depth-first ordering is 1,2,3,4,5,6,7,8,9,10
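Depth-first ordering can be computed by numbering each node as its DFS call finishes, counting down from the number of nodes. Because the figure is lost, the sketch assumes only the tree edges implied by the traversal listed above (1→3, 1→2, 3→4, 4→6, 4→5, 6→7, 7→8, 8→10, 8→9), with successors ordered so the search reproduces that traversal.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class DepthFirstOrder {
    // Depth-first ordering = reverse postorder: number nodes N, N-1, ..., 1 as their DFS calls finish.
    static Map<Integer, Integer> dfn(Map<Integer, List<Integer>> succ, int entry, int nodeCount) {
        Map<Integer, Integer> dfn = new HashMap<>();
        int[] next = {nodeCount};                       // mutable counter shared with the helper
        search(succ, entry, new HashSet<>(), dfn, next);
        return dfn;
    }

    private static void search(Map<Integer, List<Integer>> succ, int n, Set<Integer> seen,
                               Map<Integer, Integer> dfn, int[] next) {
        seen.add(n);
        for (int s : succ.getOrDefault(n, List.of()))
            if (!seen.contains(s)) search(succ, s, seen, dfn, next);
        dfn.put(n, next[0]--);                          // assigned when n is finished
    }

    public static void main(String[] args) {
        // Tree edges reconstructed from the traversal in the text (assumption: the figure is lost).
        Map<Integer, List<Integer>> succ = Map.of(
            1, List.of(3, 2), 3, List.of(4), 4, List.of(6, 5),
            6, List.of(7), 7, List.of(8), 8, List.of(10, 9));
        System.out.println(new TreeMap<>(dfn(succ, 1, 10)));
        // {1=1, 2=2, 3=3, 4=4, 5=5, 6=6, 7=7, 8=8, 9=9, 10=10}: the ordering 1,2,...,10
    }
}
```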
CF: Types of Edges
A depth-first representation is a depth-first spanning tree together with the other edges that are not part of the tree, so every edge is either a tree edge or some other edge
Three kinds of edges:
- Advancing (forward) edges: go from a node to one of its proper descendants in the tree; these include the tree edges
- Back edges: go from a node to one of its ancestors in the tree (possibly the node itself)
- Cross edges: connect nodes such that neither is an ancestor of the other
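Edge kinds can be assigned during a single DFS by tracking discovery order and which nodes are still on the recursion stack. A minimal sketch on a hypothetical graph (not the one in the figure): an edge to an unvisited node is a tree edge, to a node still on the stack a back edge, to a finished node discovered later an advancing non-tree edge, and anything else a cross edge.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class EdgeKinds {
    enum Kind { TREE, ADVANCING, BACK, CROSS }   // ADVANCING here means a forward edge that is not a tree edge

    private final Map<Integer, List<Integer>> succ;
    private final Map<Integer, Integer> disc = new HashMap<>();   // discovery order of each node
    private final Set<Integer> onStack = new HashSet<>();         // nodes whose DFS call is still active
    private final Map<String, Kind> kinds = new LinkedHashMap<>();
    private int time = 0;

    EdgeKinds(Map<Integer, List<Integer>> succ) { this.succ = succ; }

    Map<String, Kind> classify(int entry) { visit(entry); return kinds; }

    private void visit(int u) {
        disc.put(u, time++);
        onStack.add(u);
        for (int v : succ.getOrDefault(u, List.of())) {
            String e = u + "->" + v;
            if (!disc.containsKey(v)) { kinds.put(e, Kind.TREE); visit(v); }
            else if (onStack.contains(v)) kinds.put(e, Kind.BACK);              // v is an ancestor of u (or u)
            else if (disc.get(u) < disc.get(v)) kinds.put(e, Kind.ADVANCING);   // v is a finished descendant of u
            else kinds.put(e, Kind.CROSS);                                      // neither is an ancestor of the other
        }
        onStack.remove(u);
    }

    public static void main(String[] args) {
        // Hypothetical graph: 1 branches to 2, 3 and 4; loop 4 -> 5 -> 2 -> 4.
        Map<Integer, List<Integer>> succ = Map.of(
            1, List.of(2, 3, 4), 2, List.of(4), 3, List.of(4), 4, List.of(5), 5, List.of(2));
        new EdgeKinds(succ).classify(1).forEach((e, k) -> System.out.println(e + "  " + k));
        // 1->2 TREE, 2->4 TREE, 4->5 TREE, 5->2 BACK, 1->3 TREE, 3->4 CROSS, 1->4 ADVANCING
    }
}
```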
Applications of Control Flow
- Complexity: pointers to refactoring
- Testing: branch, path, and basis-path coverage
  - Branch: must test edges 1-2, 1-3, 4-5, 4-8, 5-6, 5-7
  - Path: infinitely many paths, due to the loop
  - Basis path: a set of linearly independent paths that together cover every edge at least once, e.g. 1,2,4,8; 1,3,4,5,6,7,4,8
- Program understanding: recover program structure
- Impact analysis
- …
[Figure: example CFG with nodes 1-8]
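For the complexity and basis-path items, McCabe's cyclomatic complexity V(G) = E - N + 2 gives the number of paths in a basis set; when every decision has exactly two outcomes it also equals the number of decisions plus one. The edge list below is one reading of the lost figure (an assumption), chosen to be consistent with the branch list and the example paths: decisions at 1, 4 and 5, and a loop back from 7 to 4.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Cyclomatic {
    // McCabe's cyclomatic complexity of a connected CFG: V(G) = E - N + 2.
    static int fromGraph(Map<Integer, List<Integer>> succ) {
        Set<Integer> nodes = new HashSet<>(succ.keySet());
        int edges = 0;
        for (List<Integer> out : succ.values()) {
            edges += out.size();
            nodes.addAll(out);                 // include nodes with no successors, e.g. the exit
        }
        return edges - nodes.size() + 2;
    }

    // Equivalent count when every decision node has exactly two successors: V(G) = P + 1.
    static int fromDecisions(Map<Integer, List<Integer>> succ) {
        int p = 0;
        for (List<Integer> out : succ.values()) if (out.size() == 2) p++;
        return p + 1;
    }

    public static void main(String[] args) {
        // Assumed reading of the figure: branches at 1, 4 and 5; loop edge 7 -> 4; exit at 8.
        Map<Integer, List<Integer>> cfg = Map.of(
            1, List.of(2, 3), 2, List.of(4), 3, List.of(4),
            4, List.of(5, 8), 5, List.of(6, 7), 6, List.of(7), 7, List.of(4));
        System.out.println(fromGraph(cfg) + " " + fromDecisions(cfg));   // 4 4
    }
}
```

Under this reading, a basis set would contain four paths; the two listed on the slide are examples of such paths.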
Outline
Introduction/motivations
Program representation
Control flow
Data flow
  - Introduction
  - Reaching definitions
Data flow - Introduction
- Flow of various data throughout the program
  - Obtained from the AST or CFG
  - Used in software engineering tasks
- Exact solutions to most data flow problems are undecidable
  - May depend on input
  - May depend on the outcome of a conditional statement
  - May depend on termination of a loop
- Thus we compute approximations of the exact solution
Data flow - Introduction
- Some approximations "overestimate" the solution
  - They contain the actual information plus some spurious information, but do not omit any actual information
  - Conservative and safe approach
- Some approximations "underestimate" the solution
  - They may not contain all the information of the actual solution
  - Unsafe
- Research challenge: providing safe but precise information in an efficient way
- Uses of data flow:
  - Compiler optimization requires conservative analysis
  - Software engineering tasks may only need unsafe information
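Data flow - Compiler Optimization
Common subexpression elimination
Before: a+b is computed on two paths that later join
    c = a+b        (one path)
    d = a+b        (the other path)
    e = a+b        (after the join)
After: the value is kept in a temporary t and reused
    t = a+b; c = t
    t = a+b; d = t
    e = t
Need to know available expressions: which expressions have been computed, on every path, before this statement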
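Available expressions is a forward "must" analysis: an expression is available at a block only if it has been computed, and not invalidated by an assignment to one of its operands, on every path reaching it, so IN is the intersection of the predecessors' OUT sets. A minimal sketch, assuming a hypothetical CFG in which B0 branches to B1 (c = a+b) and B2 (d = a+b), which join at B3 (e = a+b).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AvailableExpressions {
    record Expr(String left, String op, String right) {
        boolean mentions(String v) { return left.equals(v) || right.equals(v); }
        @Override public String toString() { return left + op + right; }
    }
    record Stmt(String target, Expr rhs) {}                   // target = left op right

    // Effect of one block: each statement makes its right-hand side available, then
    // kills every available expression that mentions the variable it assigns.
    static Set<Expr> transfer(List<Stmt> block, Set<Expr> in) {
        Set<Expr> avail = new HashSet<>(in);
        for (Stmt s : block) {
            avail.add(s.rhs());
            avail.removeIf(e -> e.mentions(s.target()));
        }
        return avail;
    }

    // IN[B] = intersection of OUT[P] over all predecessors P. OUT sets start at
    // "all expressions", so the iteration can only shrink them until nothing changes.
    static Map<String, Set<Expr>> solve(Map<String, List<Stmt>> blocks,
                                        Map<String, List<String>> preds, String entry) {
        Set<Expr> all = new HashSet<>();
        blocks.values().forEach(b -> b.forEach(s -> all.add(s.rhs())));
        Map<String, Set<Expr>> in = new HashMap<>(), out = new HashMap<>();
        for (String b : blocks.keySet()) out.put(b, new HashSet<>(all));
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String b : blocks.keySet()) {
                Set<Expr> inB = new HashSet<>();
                if (!b.equals(entry)) {                       // nothing is available on entry
                    inB.addAll(all);
                    for (String p : preds.get(b)) inB.retainAll(out.get(p));
                }
                in.put(b, inB);
                Set<Expr> newOut = transfer(blocks.get(b), inB);
                if (!newOut.equals(out.get(b))) { out.put(b, newOut); changed = true; }
            }
        }
        return in;
    }

    public static void main(String[] args) {
        Expr ab = new Expr("a", "+", "b");
        Map<String, List<Stmt>> blocks = Map.of(
            "B0", List.of(),
            "B1", List.of(new Stmt("c", ab)),
            "B2", List.of(new Stmt("d", ab)),
            "B3", List.of(new Stmt("e", ab)));
        Map<String, List<String>> preds = Map.of(
            "B0", List.of(), "B1", List.of("B0"), "B2", List.of("B0"), "B3", List.of("B1", "B2"));
        System.out.println(solve(blocks, preds, "B0").get("B3"));  // [a+b]: e = a+b can reuse t
    }
}
```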
Data Flow - Compiler Optimization
Register (de)allocation
- When assigning memory locations to registers, if a value in a register (i.e., a memory location) is not used again, there is no need to keep it in a register
      R1 = R2+10
  Is R2 needed after this statement?
- Need to know "live variables": which variables are still used after the current line
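Within a single straight-line block, live variables can be computed with one backward pass: a variable is live before a statement if the statement uses it, or if it is live afterwards and the statement does not redefine it. The sketch is restricted to one block (across blocks the same rule becomes a backward data-flow analysis over the CFG); the two statements after R1 = R2+10 are a hypothetical continuation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Liveness {
    // Simplified statement: defines at most one variable (def may be null) and uses a set of variables.
    record Stmt(String text, String def, Set<String> uses) {}

    // Backward pass: live before s = (live after s - def(s)) ∪ uses(s).
    // Returns the set of variables live immediately after each statement.
    static List<Set<String>> liveAfter(List<Stmt> block, Set<String> liveAtExit) {
        List<Set<String>> result = new ArrayList<>(Collections.nCopies(block.size(), Set.<String>of()));
        Set<String> live = new HashSet<>(liveAtExit);
        for (int i = block.size() - 1; i >= 0; i--) {
            result.set(i, Set.copyOf(live));
            Stmt s = block.get(i);
            if (s.def() != null) live.remove(s.def());
            live.addAll(s.uses());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Stmt> block = List.of(
            new Stmt("R1 = R2 + 10", "R1", Set.of("R2")),
            new Stmt("R3 = R1 * R1", "R3", Set.of("R1")),
            new Stmt("print(R3)", null, Set.of("R3")));
        List<Set<String>> after = liveAfter(block, Set.of());
        for (int i = 0; i < block.size(); i++)
            System.out.println(block.get(i).text() + "   live after: " + after.get(i));
        // R2 is not live after "R1 = R2 + 10", so its register can be released there.
    }
}
```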
Data Flow - Compiler Optimization
Suppose every assignment that reaches this statement assigns 5 to c:
    a = c+10     // needs 3 registers
then c+10 can be replaced by 15:
    a = 15       // needs 2 registers
But: need to know reaching definitions: which definition(s) of variable c reach this statement
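Given the definitions of c that reach a statement, the folding decision itself is small. A hedged sketch, assuming reaching definitions have already been computed and each is recorded as a hypothetical `Def(id, var, value)`, with `value == null` when the assigned value is not a constant.

```java
import java.util.Optional;
import java.util.Set;

class ConstantFromReachingDefs {
    // A definition "var = value"; value is null if it is not a compile-time constant.
    record Def(int id, String var, Integer value) {}

    // If every reaching definition of `var` assigns the same constant, return it.
    static Optional<Integer> constantAt(String var, Set<Def> reaching) {
        Integer common = null;
        boolean seen = false;
        for (Def d : reaching) {
            if (!d.var().equals(var)) continue;
            if (d.value() == null) return Optional.empty();                  // some definition is not constant
            if (seen && !d.value().equals(common)) return Optional.empty();  // two different constants
            common = d.value();
            seen = true;
        }
        return seen ? Optional.of(common) : Optional.empty();               // no reaching definition: unknown
    }

    public static void main(String[] args) {
        // Two definitions of c reach "a = c + 10", both assigning 5 (hypothetical example).
        Set<Def> reaching = Set.of(new Def(1, "c", 5), new Def(2, "c", 5), new Def(3, "b", 7));
        constantAt("c", reaching).ifPresent(c -> System.out.println("a = " + (c + 10)));  // a = 15
    }
}
```

Returning "unknown" when no definition reaches is the conservative choice that compiler optimization requires.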
Data Flow - Sw Eng Tasks
Data-flow testing
- Suppose that a statement assigns a value but the use of that value is never executed under test:
      a = c+10
      (a is never used on this path)
      d = a+y
- Need to know definition-use pairs: links between definition(s) and use(s) of a variable (or a memory location)
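Definition-use pairs can be derived from reaching definitions: each use of a variable v is paired with every definition of v that reaches it. A minimal sketch at the statement level, on a hypothetical four-statement program (1: a = c + 10, 2: if (p) goto 4, 3: d = a + y, 4: exit) in which the use of a at statement 3 is reached only along the path 1, 2, 3.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class DefUsePairs {
    // Statement-level CFG node: defines at most one variable (def may be null), uses some
    // variables, and lists its successors by 1-based statement number.
    record Node(String def, Set<String> uses, List<Integer> succ) {}

    // Reaching definitions per statement (a definition is named by its statement number),
    // then every use of v is paired with each definition of v that reaches it.
    static List<String> pairs(List<Node> g) {
        int n = g.size();
        List<Set<Integer>> in = new ArrayList<>(), out = new ArrayList<>();
        for (int i = 0; i < n; i++) { in.add(new HashSet<>()); out.add(new HashSet<>()); }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = 0; i < n; i++) {
                Set<Integer> o = new HashSet<>(in.get(i));
                String v = g.get(i).def();
                if (v != null) {
                    o.removeIf(d -> v.equals(g.get(d - 1).def()));   // kill other definitions of v
                    o.add(i + 1);                                   // generate this one
                }
                if (!o.equals(out.get(i))) { out.set(i, o); changed = true; }
                for (int s : g.get(i).succ()) {
                    if (in.get(s - 1).addAll(o)) changed = true;     // IN[s] includes OUT[i]
                }
            }
        }
        List<String> result = new ArrayList<>();
        for (int u = 0; u < n; u++)
            for (String var : new TreeSet<>(g.get(u).uses()))
                for (int d : new TreeSet<>(in.get(u)))
                    if (var.equals(g.get(d - 1).def())) result.add(var + ": (" + d + ", " + (u + 1) + ")");
        return result;
    }

    public static void main(String[] args) {
        List<Node> g = List.of(
            new Node("a", Set.of("c"), List.of(2)),          // 1: a = c + 10
            new Node(null, Set.of("p"), List.of(3, 4)),      // 2: if (p) goto 4
            new Node("d", Set.of("a", "y"), List.of(4)),     // 3: d = a + y
            new Node(null, Set.of(), List.of()));            // 4: exit
        System.out.println(pairs(g));   // [a: (1, 3)]: covered only by a test that reaches statement 3
    }
}
```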
Data Flow - Sw Eng Tasks
Debugging
- Suppose that 'a' has an incorrect value in the statement a = c+y (e.g., an int overflow):
      a = c+y
      d = a+y
- Need data dependence information: some statements produce erroneous values, others are affected by those values
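For straight-line code, the statements affected by an erroneous value can be found by following data dependences forward: a statement is affected if it uses a suspect variable, and its own result then becomes suspect. A minimal sketch under that straight-line assumption (branches and loops would need the CFG and reaching definitions); the statements after d = a+y are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AffectedStatements {
    record Stmt(String text, String def, Set<String> uses) {}

    // Straight-line code only: 1-based numbers of the statements whose values depend,
    // directly or transitively, on the value defined by statement `bad`.
    static List<Integer> affectedBy(List<Stmt> code, int bad) {
        Set<String> tainted = new HashSet<>();
        tainted.add(code.get(bad - 1).def());
        List<Integer> affected = new ArrayList<>();
        for (int i = bad; i < code.size(); i++) {          // 0-based index of the statements after `bad`
            Stmt s = code.get(i);
            if (s.uses().stream().anyMatch(tainted::contains)) {
                affected.add(i + 1);
                tainted.add(s.def());                       // its result is now suspect too
            } else {
                tainted.remove(s.def());                    // redefined from unaffected values
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        List<Stmt> code = List.of(
            new Stmt("a = c + y", "a", Set.of("c", "y")),   // suppose this overflows
            new Stmt("d = a + y", "d", Set.of("a", "y")),
            new Stmt("e = d * 2", "e", Set.of("d")),
            new Stmt("f = y + 1", "f", Set.of("y")),
            new Stmt("d = y", "d", Set.of("y")),
            new Stmt("g = d", "g", Set.of("d")));
        System.out.println(affectedBy(code, 1));          // [2, 3]
    }
}
```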
Data flow - Example
B1:  1. i=2
     2. k=i+1
B2:  3. i=1
B3:  4. k=k+1
B4:  5. k=k-4
Compute the flow of data throughout the program:
- Where does the assignment to i in statement 1 reach?
- Where does the expression computed in statement 2 reach?
- Which uses of variables are reachable from the end of B1?
- Is the value of variable i live after statement 2?
Reaching definitions analysis
- Definition: a statement where a variable is assigned a value (e.g., an input statement or an assignment statement)
- A definition of 'a' reaches a point 'p' if there exists a control flow path in the CFG from the definition to 'p' with no other definitions of 'a' on the path
- Such a path may exist in the graph but may not be possible in any execution: an infeasible path
Reaching definitions analysis
What are the definitions in the program?
- Of variable i: 1, 3
- Of variable k: 2, 4, 5
Which basic blocks do these definitions reach (at the point before the block)?
- Def 1 reaches: B2
- Def 2 reaches: B1, B2, B3
- Def 3 reaches: B1, B3, B4
- Def 4 reaches: B4
- Def 5 reaches: exit
Reaching definitions analysis
Method
- Compute two kinds of basic information (within the block):
  - GEN[B]: set of definitions generated within B
  - KILL[B]: set of definitions that, if they reach the point before B, won't reach the end of B
- Compute two other sets by propagation:
  - IN[B]: set of definitions that reach the beginning of B
  - OUT[B]: set of definitions that reach the end of B
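GEN and KILL depend only on each block's own definitions and on the list of all definitions. A sketch, assuming definitions are recorded as hypothetical `Def(id, variable)` pairs in program order; for the example it reproduces the GEN and KILL columns of the reaching-definitions table: {1,2}/{3,4,5}, {3}/{1}, {4}/{2,5}, {5}/{2,4}.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class GenKill {
    record Def(int id, String var) {}

    // GEN[B]: definitions in B that are not overwritten later in B.
    static Set<Integer> gen(List<Def> block) {
        Map<String, Integer> lastDef = new LinkedHashMap<>();
        for (Def d : block) lastDef.put(d.var(), d.id());    // a later definition replaces an earlier one
        return new TreeSet<>(lastDef.values());
    }

    // KILL[B]: all definitions (anywhere) of variables that B defines, except those B generates.
    static Set<Integer> kill(List<Def> block, List<Def> allDefs) {
        Set<String> vars = new HashSet<>();
        for (Def d : block) vars.add(d.var());
        Set<Integer> kill = new TreeSet<>();
        for (Def d : allDefs) if (vars.contains(d.var())) kill.add(d.id());
        kill.removeAll(gen(block));        // definitions that survive to the end of B are not killed
        return kill;
    }

    public static void main(String[] args) {
        List<Def> all = List.of(new Def(1, "i"), new Def(2, "k"), new Def(3, "i"), new Def(4, "k"), new Def(5, "k"));
        Map<String, List<Def>> blocks = new LinkedHashMap<>();
        blocks.put("B1", all.subList(0, 2));   // 1. i=2   2. k=i+1
        blocks.put("B2", all.subList(2, 3));   // 3. i=1
        blocks.put("B3", all.subList(3, 4));   // 4. k=k+1
        blocks.put("B4", all.subList(4, 5));   // 5. k=k-4
        blocks.forEach((b, defs) -> System.out.println(b + " GEN=" + gen(defs) + " KILL=" + kill(defs, all)));
        // B1 GEN=[1, 2] KILL=[3, 4, 5] ... B4 GEN=[5] KILL=[2, 4]
    }
}
```

Removing GEN from KILL matters when a block defines the same variable twice: the earlier definition is then killed, while the later one is generated.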

Reaching definitions analysis
Block  Statements          GEN   KILL    Init IN  Init OUT  IN    OUT
B1     1. i=2; 2. k=i+1    1,2   3,4,5   --       1,2       2,3   1,2
B2     3. i=1              3     1       --       3         1,2   2,3
B3     4. k=k+1            4     2,5     --       4         2,3   3,4
B4     5. k=k-4            5     2,4     --       5         3,4   3,5
(Init IN, Init OUT: initial values; IN, OUT: values after the iteration converges)
Iterative Data-Flow analysis algorithm
Algorithm for Reaching Definitions
Input: CFG with GEN[B], KILL[B] for all B
Output: IN[B], OUT[B] for all B
Begin RD
  IN[B] = empty, OUT[B] = GEN[B] for all B; change = true
  While change do begin
    change = false
    For each B do begin
      IN[B] = union of OUT[P] over all predecessors P of B
      OLDOUT = OUT[B]
      OUT[B] = GEN[B] union (IN[B] - KILL[B])
      if (OUT[B] != OLDOUT) then change = true
    End for
  End while
End RD
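A direct Java transcription of the algorithm, assuming GEN and KILL are given. The CFG figure did not survive extraction; the predecessor lists used here (B1←B2, B2←B1, B3←B2, B4←B3, plus an entry edge into B1 that contributes no definitions) are the ones consistent with the IN/OUT table, which is an assumption on our part. With them, the loop converges to that table's final IN and OUT columns.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ReachingDefinitions {
    // IN[B] = union of OUT[P] over predecessors P; OUT[B] = GEN[B] ∪ (IN[B] - KILL[B]).
    static void solve(List<String> blocks, Map<String, List<String>> preds,
                      Map<String, Set<Integer>> gen, Map<String, Set<Integer>> kill,
                      Map<String, Set<Integer>> in, Map<String, Set<Integer>> out) {
        for (String b : blocks) { in.put(b, new TreeSet<>()); out.put(b, new TreeSet<>(gen.get(b))); }
        boolean change = true;
        while (change) {
            change = false;
            for (String b : blocks) {
                Set<Integer> inB = new TreeSet<>();
                for (String p : preds.getOrDefault(b, List.of())) inB.addAll(out.get(p));
                in.put(b, inB);
                Set<Integer> newOut = new TreeSet<>(inB);   // IN[B] - KILL[B] ...
                newOut.removeAll(kill.get(b));
                newOut.addAll(gen.get(b));                  // ... union GEN[B]
                if (!newOut.equals(out.get(b))) { out.put(b, newOut); change = true; }
            }
        }
    }

    public static void main(String[] args) {
        List<String> blocks = List.of("B1", "B2", "B3", "B4");
        // Predecessors implied by the result table (assumption: the CFG figure is lost).
        Map<String, List<String>> preds = Map.of("B1", List.of("B2"), "B2", List.of("B1"),
                                                 "B3", List.of("B2"), "B4", List.of("B3"));
        Map<String, Set<Integer>> gen = Map.of("B1", Set.of(1, 2), "B2", Set.of(3), "B3", Set.of(4), "B4", Set.of(5));
        Map<String, Set<Integer>> kill = Map.of("B1", Set.of(3, 4, 5), "B2", Set.of(1), "B3", Set.of(2, 5), "B4", Set.of(2, 4));
        Map<String, Set<Integer>> in = new HashMap<>(), out = new HashMap<>();
        solve(blocks, preds, gen, kill, in, out);
        for (String b : blocks) System.out.println(b + " IN=" + in.get(b) + " OUT=" + out.get(b));
        // B1 IN=[2, 3] OUT=[1, 2]
        // B2 IN=[1, 2] OUT=[2, 3]
        // B3 IN=[2, 3] OUT=[3, 4]
        // B4 IN=[3, 4] OUT=[3, 5]
    }
}
```

Visiting blocks in depth-first order, as here, typically reduces the number of passes, but the final sets do not depend on the visiting order.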
Tools
- Eclipse JDT/AST (APIs to construct, traverse and manipulate ASTs)
  http://www.vogella.de/articles/EclipseJDT/article.html
- Sourcerer
  http://sourcerer.ics.uci.edu/index.html
- Crystal (data-flow analysis framework, mostly for academic purposes)
  http://code.google.com/p/crystalsaf/wiki/Installation

Mandatory Reading List
- Representation and Analysis of Software: RepAnalysis.pdf
- Crystal notes: CrystalTutorialNotes.pdf, CrystalTutorial.ppt
- Eclipse JDT - AST: http://www.vogella.de/articles/EclipseJDT/article.html
More (optional) Reading List
- Principles of Program Analysis, Nielson and Hankin
- Invariant Detection using Daikon: daikon.pdf
- More optional readings are available in the Program Analysis course material at CMU: http://www.cs.cmu.edu/~aldrich/courses/15-819M/