powerpoint - The Stanford SUIF Compiler Group

Interprocedural Program Analyses
David Heine
Vladimir Livshits
Brian Murphy
Christopher Unkel
Hansel Wan
Stanford University
http://suif.stanford.edu/
Outline
I. Data structures for program analysis
II. Interprocedural analysis framework
III. Interprocedural passes and parallelizer
IV. Pointer alias analysis
I. Data structures: Lattice values
 Commonly used in data flow analysis
 bottom, top, meet operators
 Includes definitions of some common lattices, e.g.
 bitvectors, constants, intervals
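To make the bottom/top/meet vocabulary concrete, here is a minimal sketch of a flat constant-propagation lattice. The type `ConstValue` and the free function `meet` are hypothetical names chosen for illustration; they are not the SUIF lattice classes.

```cpp
#include <cassert>

// Hypothetical flat lattice for constant propagation (not the SUIF API).
// Top = no information yet, Const = known constant, Bottom = not a constant.
struct ConstValue {
    enum Kind { Top, Const, Bottom } kind;
    long value;  // meaningful only when kind == Const

    static ConstValue top()            { return {Top, 0}; }
    static ConstValue bottom()         { return {Bottom, 0}; }
    static ConstValue constant(long c) { return {Const, c}; }

    bool operator==(const ConstValue& o) const {
        return kind == o.kind && (kind != Const || value == o.value);
    }
};

// Meet: Top is the identity, Bottom absorbs, and two different
// constants fall to Bottom.
ConstValue meet(const ConstValue& a, const ConstValue& b) {
    if (a.kind == ConstValue::Top) return b;
    if (b.kind == ConstValue::Top) return a;
    if (a == b) return a;
    return ConstValue::bottom();
}
```

For example, meeting `constant(3)` with `constant(4)` yields `bottom()`, while meeting either with `top()` leaves it unchanged.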
Graphs
 Common algorithms
 Iterated dominance frontier
 strongly connected components
 Generates dot graph output
 Example: control flow graphs and call graphs
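As an illustration of one of the common graph algorithms listed above, the following sketch finds strongly connected components with Tarjan's algorithm. The `Graph` alias and the function name `find_sccs` are assumptions made for this example and do not reflect the SUIF graph classes.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical adjacency-list graph: succ[v] lists the successors of node v.
using Graph = std::vector<std::vector<int>>;

// Tarjan's algorithm. Returns the SCCs in reverse topological order of the
// condensed graph, so for a call graph the callees come before their callers.
std::vector<std::vector<int>> find_sccs(const Graph& succ) {
    const int n = static_cast<int>(succ.size());
    std::vector<int> index(n, -1), low(n, 0), stack;
    std::vector<bool> on_stack(n, false);
    std::vector<std::vector<int>> sccs;
    int next_index = 0;

    std::function<void(int)> visit = [&](int v) {
        index[v] = low[v] = next_index++;
        stack.push_back(v);
        on_stack[v] = true;
        for (int w : succ[v]) {
            if (index[w] < 0) {          // tree edge: recurse first
                visit(w);
                low[v] = std::min(low[v], low[w]);
            } else if (on_stack[w]) {    // edge back into the current stack
                low[v] = std::min(low[v], index[w]);
            }
        }
        if (low[v] == index[v]) {        // v is the root of an SCC
            std::vector<int> scc;
            int w;
            do {
                w = stack.back();
                stack.pop_back();
                on_stack[w] = false;
                scc.push_back(w);
            } while (w != v);
            std::sort(scc.begin(), scc.end());
            sccs.push_back(scc);
        }
    };

    for (int v = 0; v < n; ++v)
        if (index[v] < 0) visit(v);
    return sccs;
}
```

The callees-first ordering is convenient for a bottom-up traversal: each component can be summarized once all components it calls into are done.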
Region Graphs
 Capture the hierarchical program structure alongside the statements
 An interpretation of the statements without dismantling them
 Useful for elimination-style algorithms
 A region
 has one entry and possibly multiple exits
 may be a terminal region (straight line control flow internally)
 or a composite region
 Flow between subregions is specified by
 control flow graph (adjacency lists)
 a regular expression (path expression with composition, meet and Kleene star)
 Extensible with new nodes
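The path-expression view of a region can be sketched as a small expression tree evaluated over a lattice of summaries. Everything below is hypothetical: `WriteSummary` (bitmasks of variables that may or must be written), `PathExpr`, and the helper functions are illustrative stand-ins, not the SUIF region classes.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical summary: bit i is set if variable i may / must be written.
struct WriteSummary {
    std::uint32_t may;
    std::uint32_t must;
};

// Hypothetical path-expression node over region summaries.
struct PathExpr {
    enum Op { Leaf, Compose, Meet, Star } op;
    WriteSummary leaf;                            // used when op == Leaf
    std::vector<std::shared_ptr<PathExpr>> kids;  // operands otherwise
};
using PathExprPtr = std::shared_ptr<PathExpr>;

PathExprPtr leaf(std::uint32_t may, std::uint32_t must) {
    return std::make_shared<PathExpr>(PathExpr{PathExpr::Leaf, {may, must}, {}});
}
PathExprPtr node(PathExpr::Op op, std::vector<PathExprPtr> kids) {
    return std::make_shared<PathExpr>(PathExpr{op, {}, std::move(kids)});
}

// Evaluates the expression bottom-up. Meet assumes at least one operand.
WriteSummary evaluate(const PathExpr& e) {
    switch (e.op) {
    case PathExpr::Leaf:
        return e.leaf;
    case PathExpr::Compose: {  // operands execute one after another
        WriteSummary r{0, 0};
        for (const auto& k : e.kids) {
            WriteSummary s = evaluate(*k);
            r.may |= s.may;
            r.must |= s.must;
        }
        return r;
    }
    case PathExpr::Meet: {     // exactly one operand executes
        WriteSummary r{0, ~std::uint32_t{0}};
        for (const auto& k : e.kids) {
            WriteSummary s = evaluate(*k);
            r.may |= s.may;
            r.must &= s.must;
        }
        return r;
    }
    case PathExpr::Star: {     // body runs zero or more times
        WriteSummary s = evaluate(*e.kids.at(0));
        return WriteSummary{s.may, 0};
    }
    }
    return WriteSummary{0, 0};  // unreachable; keeps compilers quiet
}
```

Note how the three operators differ only in how "must" information is combined: composition accumulates it, meet intersects it, and star discards it because the body may not run at all.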
Region Transformations
 Flattening regions
 Conversions from regular expression RE -> CFG and CFG -> RE
 May involve some code cloning
II. Interprocedural Analysis
 Two important design choices in program analysis
 Across procedures
No interprocedural analysis
Interprocedural: context-insensitive
Interprocedural: context-sensitive
 Within a procedure
Flow-insensitive
Flow-sensitive: interval/region based
Flow-sensitive: iterative over flow-graph
Efficient Context-Sensitive Analysis
[Figure: graph of regions; labels: call, inner loop, scc]
 Bottom-up
 A region/interval: a procedure or a loop
 An edge: call or code in inner scope
 Summarize each region (with a transfer function)
 Find strongly connected components (sccs)
 Bottom-up traversal of sccs
 Iteration to find fixed-point for recursive functions
 Top-down
 Top-down propagation of values
 Iteration to find fixed-point for recursive functions (sccs)
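The bottom-up phase can be sketched as follows, using flow-insensitive "modified variables" bitmasks as the summaries. The names `CallGraph` and `summarize_mod` are hypothetical, and the sketch assumes the SCCs are supplied callees-first; it is not the SUIF driver.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical inputs: callees[p] lists the procedures that p calls, and
// local_mod[p] is a bitmask of the variables p writes directly.
using CallGraph = std::vector<std::vector<int>>;

// Computes, for every procedure, the variables it may modify directly or
// through calls. sccs must be listed callees-first (reverse topological order).
std::vector<std::uint32_t> summarize_mod(
        const CallGraph& callees,
        const std::vector<std::uint32_t>& local_mod,
        const std::vector<std::vector<int>>& sccs) {
    std::vector<std::uint32_t> mod = local_mod;
    for (const auto& scc : sccs) {
        bool changed = true;
        while (changed) {  // iterate within a recursive SCC to a fixed point
            changed = false;
            for (int p : scc) {
                std::uint32_t m = mod[p];
                for (int q : callees[p]) m |= mod[q];
                if (m != mod[p]) {
                    mod[p] = m;
                    changed = true;
                }
            }
        }
    }
    return mod;
}
```

For a non-recursive procedure the inner loop runs one productive pass plus one confirming pass; only mutually recursive procedures need real iteration.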
Interprocedural Framework Architecture
 Driver
 Bottom-up
 Top-down
 Linear traversal
 Compound handlers
 Procedure calls and returns
 Composite regions
 User-defined handlers and lattice values
 E.g. array summaries
 E.g. mod/ref analysis
 Data structures
 Call graphs, SCCs, lattice values
 Regions, control flow graphs
Interprocedural Framework Architecture
 Interprocedural analysis data structures
 e.g. call graphs, regions or intervals
 Handlers: Orthogonal sets of handlers for different groups of constructs
 Primitives: user specifies analysis-specific semantics of primitives
 Compound: handles compound statements and calls
User chooses between handlers of different styles
• e.g. no interprocedural analysis versus context-sensitive
• e.g. flow-insensitive vs. flow-sensitive
 All the handlers are registered in a visitor
 Driver
 Driver is invoked by the user’s request for information (demand-driven)
 Builds prepass data structures
 Invokes the right set of handlers in the right order (e.g. bottom-up traversal of the call graph)
III. Interprocedural Passes
 Scalar analysis
 Mod/ref, reduction recognition: Bottom-up flow-insensitive
 Liveness for privatization: Bottom-up and top-down, flow-sensitive
 Constraint propagation: Top-down, flow-insensitive
 Array analysis
 Dependence analysis
 Privatization analysis
Region-Based Array Analysis
 Array sections are represented as sets of linear inequalities (Omega)
 Bottom-up and backward-flow analysis
 For each region: compute 4 sections for each array accessed
 M: may have been written
 W: must have been written
 R: may have been read
 E: exposed read; values read that were defined before the region executes
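 Dependence test: a loop-carried dependence exists if
 $\exists$ iterations $i \neq j$ s.t. $M_i \cap R_j \neq \emptyset$
 Privatization test: the array is privatizable if
 $\forall$ iterations $i$, $E_i = \emptyset$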
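The two tests can be illustrated with a deliberately simplified model in which each array section is a single closed index interval and the iterations are enumerated explicitly. This is an assumption made for the sketch: the analysis described above represents sections symbolically as sets of linear inequalities. All names (`Section`, `IterationSections`, and the two predicates) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D array section: the closed index range [lo, hi];
// the section is empty when lo > hi.
struct Section {
    long lo, hi;
    bool empty() const { return lo > hi; }
};

bool overlaps(const Section& a, const Section& b) {
    return !a.empty() && !b.empty() && a.lo <= b.hi && b.lo <= a.hi;
}

// Sections of one array for a single loop iteration.
struct IterationSections {
    Section may_write;     // M_i
    Section may_read;      // R_i
    Section exposed_read;  // E_i
};

// True if some pair of distinct iterations i != j has M_i overlapping R_j.
bool has_loop_carried_dependence(const std::vector<IterationSections>& it) {
    for (std::size_t i = 0; i < it.size(); ++i)
        for (std::size_t j = 0; j < it.size(); ++j)
            if (i != j && overlaps(it[i].may_write, it[j].may_read))
                return true;
    return false;
}

// True if every E_i is empty, i.e. no iteration reads a value that was
// defined outside of that iteration.
bool is_privatizable(const std::vector<IterationSections>& it) {
    for (const auto& s : it)
        if (!s.exposed_read.empty()) return false;
    return true;
}
```

A loop that writes and then reads the same scratch array in every iteration fails the dependence test but passes the privatization test, so giving each iteration its own copy removes the conflict.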
Example: ModRef Analysis
class ModRefProblem : public BUProblem {
public:
    ModRefProblem(SuifEnv* suif_env, PtrAnalysisType the_ptrAnalysisType);
    virtual void initialize();
    ...
};
// Passes the lattice values and the three handlers to the bottom-up framework.
ModRefProblem::ModRefProblem(SuifEnv* suif_env,
                             PtrAnalysisType the_ptrAnalysisType) :
    BUProblem(suif_env, "ModRef",
              new ModRefValue(), new ModRefValue(),
              new ModRefUserBUHandler(suif_env, the_ptrAnalysisType),
              new CallGraphIPBUHandler(suif_env),
              new FlowInsensitiveIntraBUHandler(suif_env)),
    ptrAnalysisType(the_ptrAnalysisType)
{
    initialize();
}
Lattice Values
class ModRefValue : public LatticeValue {
public:
    ModRefValue();
    ~ModRefValue();
    AbslocSetValue* get_mod() const { return modVars; }
    AbslocSetValue* get_ref() const { return refVars; }
    virtual void do_meet(const LatticeValue* other, bool* changed = NULL);
    virtual LatticeValue* top() const;
    virtual LatticeValue* id() const;
    virtual void do_compose(const LatticeValue* other, bool* changed = NULL);
    virtual void do_star(const VariableSymbol* idx,
                         const Expression* lb, const Expression* ub,
                         bool* changed) {}
    virtual void do_widen(const LatticeValue* other, bool* changed);
    LatticeValue* clone() const;
    bool is_top() const;
    bool is_id() const;
    String to_string() const;
    ...
};
User-Defined Handler
class ModRefUserBUHandler : public UserBUHandler {
public:
    ModRefUserBUHandler(SuifEnv* suif_env, PtrAnalysisType ptrAnalysisType);
    virtual UNSHARED LatticeValue* handle_statement(
        BUProblem* problem, Statement* stmt);
    virtual LatticeValue* handle_simple_region(
        BUProblem* problem, SimpleRegion* region);
    virtual LatticeValue* handle_predicate_region(
        BUProblem* problem, PredicateRegion* region);
    virtual LatticeValue* handle_mwb_default_region(
        BUProblem* problem, MWBDefaultRegion* region);
    virtual LatticeValue* handle_eval_predicate_region(
        BUProblem* problem, EvalPredicateRegion* region);
    virtual LatticeValue* handle_undef_proc_region(
        BUProblem* problem, UndefProcRegion* region);
    ...
};
Most of the work is done here!
UNSHARED LatticeValue* ModRefUserBUHandler::handle_statement
(BUProblem* problem, Statement* stmt)
{
    ModRefValue* curr_value = new ModRefValue();
    // Every source variable of the statement is referenced.
    for (SemanticHelper::SrcVarIter iter(stmt); iter.is_valid(); iter.next())
        curr_value->add_ref(iter.current());
    if (is_kind_of<StoreVariableStatement>(stmt)) {  // x = ...
        StoreVariableStatement* s = to<StoreVariableStatement>(stmt);
        VarAbsLocation* dest =
            VarAbsLocation::create_var_absloc(s->get_destination());
        curr_value->get_mod()->add(dest);
    } else if (is_kind_of<StoreStatement>(stmt)) {   // *x = y
        StoreStatement* s = to<StoreStatement>(stmt);
        // Every location the store may write counts as modified.
        curr_value->get_mod()->do_join(
            new AbslocSetValue(query->get_absloc_set(s), true));
    }
    return curr_value;
}
Parallelizer
 Parallelizes a loop if
 there is no abnormal exit out of a loop
 all scalar variables are either
read-only variables
privatizable variables
reduction variables
 all array variables
either have no dependence
or can be privatized
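The conditions above amount to a simple predicate over the results of the earlier analyses. The sketch below assumes those results are already available in hypothetical record types (`LoopInfo`, `ArrayInfo`, `ScalarKind`); it does not show how SUIF stores them.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of a scalar variable used in the loop.
enum class ScalarKind { ReadOnly, Privatizable, Reduction, Other };

// Hypothetical per-array analysis results.
struct ArrayInfo {
    bool has_dependence;
    bool privatizable;
};

// Hypothetical per-loop analysis results.
struct LoopInfo {
    bool has_abnormal_exit;
    std::vector<ScalarKind> scalars;
    std::vector<ArrayInfo> arrays;
};

// A loop is parallelized only if every condition on the slide holds.
bool can_parallelize(const LoopInfo& loop) {
    if (loop.has_abnormal_exit) return false;
    for (ScalarKind k : loop.scalars)
        if (k == ScalarKind::Other) return false;
    for (const ArrayInfo& a : loop.arrays)
        if (a.has_dependence && !a.privatizable) return false;
    return true;
}
```

A single scalar or array that fails its test is enough to keep the whole loop sequential.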
IV. Pointer Alias Analysis
 Steensgaard’s pointer alias analysis
 Flow-insensitive and context-insensitive, type-inference based analysis
 Very efficient: near linear-time analysis
 Very inaccurate: unification merges the targets of pointers that are assigned to each other
 A good bootstrapping step for interprocedural C program analysis
 Enables the construction of a call graph with indirect function calls
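The flavor of a unification-based analysis can be conveyed with a small union-find sketch in which every equivalence class of locations has at most one pointee class. This is a simplified illustration in the style of Steensgaard's analysis, not the SUIF implementation; the class `Unifier` and its method names are hypothetical, and function types, structures, and casts are ignored.

```cpp
#include <cassert>
#include <vector>

// Hypothetical unification-based points-to solver (simplified sketch).
class Unifier {
public:
    // Creates a new abstract location and returns its id.
    int fresh() {
        parent_.push_back(static_cast<int>(parent_.size()));
        pointee_.push_back(-1);
        return parent_.back();
    }
    // Representative of x's equivalence class.
    int find(int x) {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // path halving
            x = parent_[x];
        }
        return x;
    }
    // Class that x points to, created on demand.
    int pointee_of(int x) {
        x = find(x);
        if (pointee_[x] < 0) {
            int t = fresh();  // call first: fresh() may reallocate the vectors
            pointee_[x] = t;
        }
        return find(pointee_[x]);
    }
    // Merges two classes and, recursively, what they point to.
    void join(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return;
        parent_[b] = a;
        int pa = pointee_[a], pb = pointee_[b];
        if (pa < 0) pointee_[a] = pb;
        else if (pb >= 0) join(pa, pb);
    }
    void address_of(int x, int y) { join(pointee_of(x), y); }                     // x = &y
    void copy(int x, int y) { join(pointee_of(x), pointee_of(y)); }               // x = y
    void load(int x, int y) { join(pointee_of(x), pointee_of(pointee_of(y))); }   // x = *y
    void store(int x, int y) { join(pointee_of(pointee_of(x)), pointee_of(y)); }  // *x = y
    // True if p and q point into the same class of locations.
    bool may_alias(int p, int q) { return pointee_of(p) == pointee_of(q); }

private:
    std::vector<int> parent_;
    std::vector<int> pointee_;
};
```

The loss of precision shows up quickly: after `p = &a; q = &b; r = p; r = q;` the targets of `p` and `q` are merged into one class, even though `p` never points to `b`.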
Context-Sensitive Pointer Analysis
 Implementation in SUIF 2 of the analysis described in
“Scalable Context-Sensitive Flow Analysis Using Instantiation Constraints”,
Fahndrich, Rehof, and Das (PLDI ’00).
 Context-sensitive, flow-insensitive flow analysis.
 Instantiation constraints represent caller-callee relationships.
 Handles function pointers smoothly, and is efficient.
 One application is pointer alias analysis.
 Implementation runs in three phases:
 constraint generation
 constraint solution
 reachability analysis
 Implemented in SUIF in ~6 weeks (as a first project in SUIF)
Demo of Two Visualization Tools
 From the implementation in SUIF, running on sizeable programs:
 Progress of the analysis
 Resulting type graphs
Progress Visualization
 Simple X Window System progress monitor.
 One pixel for each node.
 Allocated in scan order as they are created.
 White: initially created; red: callee; green: caller; grey: merged node.
 Visualization results:
 Constraint generation:
white nodes created; some functions and call sites.
 Constraint solution:
nodes merged together and many greyed out.
Several passes of working down pointer chains: a=b, *a=*b, **a=**b.
Red and green spread to formal and actual parameters.
Some new nodes created for “product types”.
Scattered merging as the algorithm deduces flow through functions.
 Nearly 1,000,000 nodes created for gcc. 2.5 minutes CPU time on this laptop.
Result Visualization
 pointergraph compress.suif compress.ps
 ghostview compress.ps &
 Resulting type graphs courtesy of Dot:
 Pointees below pointers.
 Arguments below functions.
 Callers below callees.
 Nodes marked with variable names.
 Optional grouping by function (only for small programs).