
CS 343 presentation
Concrete Type Inference
Department of Computer Science
Stanford University
Concrete type analysis… why we care
• Runtime cost of virtual method resolution is high
• Reduction of code size
• Call graphs needed for interprocedural analysis
• Function inlining
• Inference algorithms very expensive – coming up
with efficient algorithms is the challenge
Fast Static Analysis of C++
Virtual Function Calls
Bacon and Sweeney
Overview
• Goal: Resolving virtual function calls
• Three static analysis algorithms
– Unique name
– Class Hierarchy Analysis (CHA)
– Rapid Type Analysis (RTA)
Example
class A {
public:
  virtual int foo() { return 1; }
};
class B : public A {
public:
  virtual int foo() { return 2; }
  virtual int foo(int t) { return t + 1; }
};
int main() {
  B* p = new B();
  int result1 = p->foo(1);  // 2: calls B::foo(int)
  int result2 = p->foo();   // 2: calls B::foo()
  A* q = p;
  int result3 = q->foo();   // 2: virtual dispatch finds B::foo()
  return 0;
}
[Class hierarchy diagram: A declares int foo(); B derives from A, overriding int foo() and adding int foo(int)]
Unique Name
• Link-time process
• Doesn’t require access to source code
• Checks mangled names
• Unique signature implies replacing virtual
call with direct call
Class Hierarchy
• Uses static declared type with class
hierarchy information
• Builds call graph
• Replaces virtual calls with direct calls when
there are no derived classes for the static
type
• Relies on the type safety of the language (sometimes
need to disable downcasts)
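The core of CHA can be sketched as a query over the hierarchy: the possible targets of p->foo() are the foo() bodies defined in p’s static type or any of its subclasses. The toy hierarchy and all names below are hypothetical, not the paper’s implementation:

```cpp
// Class Hierarchy Analysis sketch over the deck's A/B hierarchy.
#include <map>
#include <set>
#include <string>
#include <vector>

// subclasses["A"] lists the direct subclasses of A.
std::map<std::string, std::vector<std::string>> subclasses = {
    {"A", {"B"}}, {"B", {}}};
// Which classes provide their own body for foo().
std::set<std::string> definesFoo = {"A", "B"};

// Collect every class in the cone (subtree) rooted at the static type.
void cone(const std::string& c, std::set<std::string>& out) {
    out.insert(c);
    for (const auto& s : subclasses[c]) cone(s, out);
}

// CHA target set for a call p->foo() where p has static type staticType.
std::set<std::string> chaTargets(const std::string& staticType) {
    std::set<std::string> classes, targets;
    cone(staticType, classes);
    for (const auto& c : classes)
        if (definesFoo.count(c)) targets.insert(c + "::foo");
    return targets;
}
```

chaTargets("B") has one element, so that call can be made direct; chaTargets("A") has two, so it stays virtual. (This sketch ignores the case where a subclass inherits a base-class body rather than overriding it.)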
Rapid Type Analysis
• Starts with call graph generated from CHA
• Prunes the call graph based on static
information about class instantiation
• Flow insensitive like CHA
– Results in efficiency
– Inherits limitations of flow insensitive analysis 
• Relies on the type safety of the language (sometimes need
to disable downcasts)
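The pruning step can be sketched as a set intersection: keep a CHA target only if its receiver class is ever instantiated by a `new` expression anywhere in the program. The concrete sets below are hypothetical, echoing the deck’s example where only B is instantiated:

```cpp
// Rapid Type Analysis sketch: prune CHA's target set using the set of
// instantiated classes (flow-insensitive, whole-program).
#include <set>
#include <string>

// CHA says p->foo(), with p of static type A, may reach either body.
std::set<std::string> chaResult = {"A::foo", "B::foo"};
// The only `new` expression in the program is `new B()`.
std::set<std::string> instantiated = {"B"};

// Keep a target only if its defining class is instantiated somewhere.
std::set<std::string> rtaTargets() {
    std::set<std::string> out;
    for (const auto& t : chaResult) {
        std::string cls = t.substr(0, t.find("::"));
        if (instantiated.count(cls)) out.insert(t);
    }
    return out;
}
```

Here RTA shrinks the target set to {B::foo}, turning a call CHA left virtual into a direct call. (The sketch glosses over an instantiated subclass inheriting a base-class body, which RTA must also count.)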
Results
• What biases the results? (C++)
• Ran analysis algorithms on seven real
programs of varying size (large - small)
• RTA wins 4 out of 7 WHY? Discuss
• Static analysis can fail with certain
programming idioms (e.g. base* b = new sub())
• Code Size: often reduces code size
dramatically
Practical Virtual Method Call
Resolution for Java
Sundaresan et al
Overview
• Study practical, context-insensitive, flow
insensitive techniques to resolve virtual
function calls in Java
• Present Reaching-type analysis
– Variable-type analysis
– Refers-to analysis
• Uses the Soot (Jimple) framework
Three Groups of analysis
• Baseline (discussed previously)
– Class hierarchy analysis
– Rapid Type Analysis
• Reaching type
– Declared type analysis
– Variable type analysis (finer-grained/more accurate)
• Refers-to
– Developed for C but ported to Java
Reaching-type Analysis
• Build a type propagation graph
• Initialize the graph with type information
generated by new()
• Propagate type information along directed
edges
• Nodes are associated with all reaching types
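The three steps above can be sketched as a worklist fixpoint over the propagation graph. The graph below is hypothetical, seeded from the deck’s earlier example (`p = new B(); A* q = p;`) with variables as nodes, as in variable-type analysis:

```cpp
// Reaching-type propagation sketch: seed nodes from new() expressions,
// then push type sets along assignment edges until nothing changes.
#include <map>
#include <set>
#include <string>
#include <vector>

// Directed edges follow assignments: `A* q = p;` gives p -> q.
std::map<std::string, std::vector<std::string>> edges = {{"p", {"q"}}};
// Seeds: `p = new B()` puts B into p's type set.
std::map<std::string, std::set<std::string>> types = {{"p", {"B"}}};

void propagate() {
    bool changed = true;
    while (changed) {  // iterate to a fixed point
        changed = false;
        for (const auto& [src, dsts] : edges)
            for (const auto& dst : dsts)
                for (const auto& t : types[src])
                    changed |= types[dst].insert(t).second;
    }
}
```

After propagate(), q’s reaching types are {B} even though its declared type is A, so q->foo() resolves to the single target B::foo.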
Variable and Declared Type
Analysis
• Variable Type (pg 10)
– Uses variable name as the representative
• Declared Type (pg 11)
– Uses the type by which the initial variable was declared
– Puts all variables of the same declared type into the
same equivalence class
– Coarser and less precise
• Both algorithms have an initialization phase and
a propagation phase
• Size of propagation graph: O(C*Mc) edges
Refers-to Analysis
• Takes into account aliasing
• Nodes
– Reference nodes (locals, parameters, instance fields)
– Abstract location nodes (heap locations)
• Algorithm: Each reference node initially refers to
a unique abstract location, assignments merge
abstract locations as the algorithm progresses
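The merging step is essentially union-find: every reference node starts with its own abstract location, and each assignment unifies two locations. A minimal sketch with hypothetical reference names a, b, c:

```cpp
// Refers-to sketch: union-find over abstract heap locations.
#include <map>
#include <string>

// Each reference node initially refers to its own abstract location.
std::map<std::string, std::string> loc = {
    {"a", "a"}, {"b", "b"}, {"c", "c"}};

// Find the representative location, with path compression.
std::string find(const std::string& x) {
    if (loc[x] == x) return x;
    return loc[x] = find(loc[x]);
}

// An assignment a = b merges the two abstract locations, so a and b
// (and anything else sharing those locations) become aliases.
void assign(const std::string& a, const std::string& b) {
    loc[find(a)] = find(b);
}
```

After assign("a", "b"), a and b share one abstract location while c keeps its own, which is how the analysis accounts for aliasing that the type-based analyses ignore.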
Alternative Approaches
• Type prediction
– Requires profiling code
– Making the common case fast
– Runtime type test
– Resolves more calls
• Alias analysis
– Very expensive (interprocedural, flow sensitive)
• Sometimes static analysis is not possible e.g.
dynamically loaded classes based on command
line inputs or newly available classes.  Does
anyone see a way to address this?
Benchmarks and Results
• Ran on 9 programs, 7 of which are used in the
SPECjvm benchmark suite
• Variable type analysis best at improving call graph
precision
• Type-based analysis is more efficient because it builds
nodes based on the classes in the program and not
each individual variable
• Table II shows exact numbers for how many
monomorphic edges… so why couldn’t they
resolve all of these? How did they get this
information in the first place?
“The Cartesian Product
Algorithm”
Simple and Precise Type Inference of
Parametric Polymorphism
Polymorphism
• Explicit concrete type declarations undesirable for
programmer
• Algorithms must be used to infer types
• Parametric polymorphism: ability of routines to be
invoked on arguments of several different types
• CPA uses context sensitivity, whereas other
inference algorithms do not; this is key because
CPA uses different code for each context
Basic Type Inference Algorithm
• Step 1: Allocate type variables (associate a type
var with every slot and expression in the program)
• Step 2: Seed type variables (to capture the initial
state of the target program)
• Step 3: Establish constraints, propagate (builds a
directed graph that expresses propagation of types
through assignments)
• Basic algorithm analyzes polymorphism
imprecisely
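The imprecision comes from step 1 allocating one set of type variables per method: every call site feeds the same template, so types from different callers mix. A hypothetical sketch (the identity function and all names here are illustrative, not from the paper):

```cpp
// Basic-algorithm sketch: one shared template for `id`, so the type
// variables for its argument and result are shared by all call sites.
#include <set>
#include <string>

std::set<std::string> idArg, idResult;  // id's type variables

// Analyze a call to id with the given actual argument types.
std::set<std::string> callId(const std::set<std::string>& actual) {
    idArg.insert(actual.begin(), actual.end());  // propagate in (step 3)
    idResult = idArg;                            // id returns its argument
    return idResult;                             // propagate out
}
```

After analyzing `id(anInt)` and then `id(aFloat)`, both call sites see the merged result set {int, float}, even though each site is monomorphic. The improvements on the next slide attack exactly this.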
Improvements on Basic
Algorithm
• 1-Level Expansion
– Different templates for each send
– Inefficient
• P-Level (precise, yet worst-case complexity
is exponential)
• Iterative algorithm (precise, more efficient
than expansion)
Cartesian Product Algorithm
• “There is no such thing as a polymorphic call,
only polymorphic call sites”
• Turns the analysis of each send into a case
analysis (makes exact type info available for each
case immediately, eliminates iteration)
• Maintain per-method pools of templates so that
template-sharing can be achieved (efficiency)
• Iteration is avoided because of
– Monotonicity of cartesian product
– Monotone context of application (iterative is not
monotone because comparing types for equality is not a
monotone function)
• Efficient and precise (also, no need to expand
away inheritance)
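The case analysis can be sketched for a two-argument send such as the deck’s x mod: y example: CPA makes one template per tuple in the cartesian product of the argument type sets. The per-tuple result types in resultOf below are hypothetical stand-ins for what analyzing each template would produce:

```cpp
// Cartesian Product Algorithm sketch for a two-argument send.
#include <map>
#include <set>
#include <string>
#include <utility>

// One template per (x, y) tuple; its (assumed) inferred result type.
std::map<std::pair<std::string, std::string>, std::string> resultOf = {
    {{"smallInt", "smallInt"}, "smallInt"},
    {{"smallInt", "float"}, "float"},
    {{"float", "smallInt"}, "float"},
    {{"float", "float"}, "float"}};

// Result types at a call site: the union over the cartesian product of
// the argument type sets seen at that site.
std::set<std::string> cpa(const std::set<std::string>& xs,
                          const std::set<std::string>& ys) {
    std::set<std::string> out;
    for (const auto& x : xs)      // one template per tuple, analyzed
        for (const auto& y : ys)  // with exact argument types
            out.insert(resultOf[{x, y}]);
    return out;
}
```

A call site where both arguments are {smallInt} only touches the (smallInt, smallInt) template and gets the precise result {smallInt}; a monolithic template would pollute it with the float tuples. Because the product only grows monotonically, no iteration is needed.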
Precision improvements
possible?
• Yes
• mod: arg = ( self - (arg * (self div: arg)) )
• x mod: y, where type(x) = type(y) = {smallInt,
float}
• Iterative algorithm infers {smallInt, float}
• CPA infers {smallInt}
• In this case, there is a benefit from having four
templates connected, one for each tuple in the
product of the types of x and y
Results
• “Extractor” – having less precise information
about types forces it to extract more
• CPA delivers the smallest extractions, and the best
CPU time of the different algorithms
• How generalizable are the results from the Self
system?
• How much type inference is even necessary for
the programs they benchmarked (Unix diff
command)?
Thanks 