Automatic Extraction of Object-Oriented Component Interfaces
John Whaley, Michael C. Martin, Monica S. Lam
Computer Systems Laboratory, Stanford University
ISSTA 2002, July 24, 2002

Slide 2: Motivation
• Component programming is widespread.
• Interface specifications are important!
    • Misunderstanding the API is a common source of error.
• Ideally, we want formal specifications.
    • However, many components don't have any specifications, formal or informal!
• Our goal: automatic generation of interface specifications
    • For large, object-oriented programs
    • Partial specifications

Slide 3: Why Automatic Extraction?
• Documentation
    • Based on the actual code, so no divergence
• Rules for static or dynamic checkers
    • Find errors in API usage
• Find API bugs
    • Discrepancy between code & intended API
• Dynamic extraction: evaluation of test coverage

Slide 4: Overview
• Component model: product of finite state machines
• Static analysis
• Dynamic analysis and checker
• Implemented for Java
• Analyzed >1 million lines of code
    • Java class libraries
    • Java 2 Enterprise Edition
    • Java network libraries
    • joeq virtual machine

Slide 5: Example: File
• Use a Finite State Machine (FSM) to express ordering constraints.
[Figure: FSM for File with states START, open, read, write, close, END.]

Slide 6: A Simple OO Component Model
• Each object follows an FSM model.
• One state per method, plus START & END states.
• A method call causes a transition to a new state.
[Figure: the File FSM (START, open, read, write, close, END) next to a generic pair of states m1 and m2: "m1 ; m2" is legal, new state is m2.]

Slide 7: Problem 1
• An object has two fields, a and b.
• Each field must be set before being read.
• Solution: a product of FSMs, one for each field.
[Figure: a single FSM with states START, set_a, set_b, get_a, get_b, END.]

Slide 8: Splitting by fields
[Figure: the combined FSM (START, set_a, set_b, get_a, get_b, END) split into two FSMs: one with START, set_a, get_a, END and one with START, set_b, get_b, END.]
• Separate by fields into different, independent submodels.
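The product-of-FSMs idea from Problem 1 and the field split can be illustrated with a small Java sketch. It is not the authors' implementation: the class and method names (`ProductModel`, `Submodel`, `fieldSubmodel`, `twoFields`) are hypothetical, and the edge sets are an assumption chosen only to encode "set before get", since the slide figures do not survive in the text. Each submodel tracks its own current state, and a call is checked only against the submodels that contain the called method.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ProductModel {
    /** One submodel: an FSM whose states are method names plus START. */
    static final class Submodel {
        final Map<String, Set<String>> edges; // state -> methods legal from that state
        String state = "START";

        Submodel(Map<String, Set<String>> edges) {
            this.edges = edges;
        }

        /** True if the method appears as a target in this submodel. */
        boolean owns(String method) {
            return edges.values().stream().anyMatch(s -> s.contains(method));
        }

        /** Takes the transition if it is legal; otherwise leaves the state unchanged. */
        boolean step(String method) {
            if (!edges.getOrDefault(state, Set.of()).contains(method)) {
                return false;
            }
            state = method;
            return true;
        }
    }

    private final List<Submodel> submodels;

    ProductModel(List<Submodel> submodels) {
        this.submodels = submodels;
    }

    /** A call is legal only if every submodel that owns the method accepts it. */
    boolean call(String method) {
        boolean legal = true;
        for (Submodel m : submodels) {
            if (m.owns(method) && !m.step(method)) {
                legal = false;
            }
        }
        return legal;
    }

    /** Assumed edges encoding "set before get" for one field (not from the slides). */
    static Submodel fieldSubmodel(String field) {
        String set = "set_" + field;
        String get = "get_" + field;
        Map<String, Set<String>> edges = new HashMap<>();
        edges.put("START", Set.of(set));
        edges.put(set, Set.of(set, get));
        edges.put(get, Set.of(set, get));
        return new Submodel(edges);
    }

    /** The two-field object from Problem 1: one independent submodel per field. */
    static ProductModel twoFields() {
        return new ProductModel(List.of(fieldSubmodel("a"), fieldSubmodel("b")));
    }

    public static void main(String[] args) {
        ProductModel obj = twoFields();
        System.out.println(obj.call("set_a")); // legal: a is being set
        System.out.println(obj.call("get_a")); // legal: a was set
        System.out.println(obj.call("get_b")); // illegal: b was never set
    }
}
```

Because the submodels are independent, setting `a` does not change what is legal for `b`. A single combined FSM would have to encode every interleaving of the two fields' histories to express the same constraint.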
Slide 9: Problem 2
• Model for Socket: getFileDescriptor is state-preserving.
• Solution: distinguish between state-modifying and state-preserving methods.
[Figure: FSM for Socket with states START, create, connect, getFileDescriptor, close, END.]

Slide 10: State-preserving methods
[Figure: FSM for Socket with states START, create, getFileDescriptor, connect, close, END, next to a generic pair of methods m1 and m2.]
• m1 is state-modifying
• m2 is state-preserving
• "m1 ; m2" is legal, new state is m1

Slide 11: Summary of Model
• Product of FSMs
    • Per-thread, per-instance
• One submodel per field
    • Interprocedural mod-ref analysis
        • Identifies methods belonging to submodel
        • Separates state-modifying and state-preserving methods
• One submodel per Java interface
    • Implementation not required

Slide 12: Extraction Techniques
Static                                   Dynamic
For all possible program executions      For one particular program execution
Conservative                             Exact (for that execution)
Analyze implementation                   Analyze component usage
Detect illegal transitions               Detect legal transitions
Superset of ideal model (upper bound)    Subset of ideal model (lower bound)

Slide 13: Static Model Extractor
• Defensive programming
    • Implementation throws exceptions (user or system defined) on illegal input.
    public void connect() {
        connection = new Socket();
    }

    public void read() throws IOException {
        // Defensive check: calling read() before connect() is illegal.
        if (connection == null)
            throw new IOException();
    }

[Figure: FSM with states START, connect, read.]

Slide 14: Detecting Illegal Transitions
• Only support simple predicates
    • Comparisons with constants, implicit null pointer checks
• Find <source, target> pairs such that:
    • Source must execute:
        • field = const;
    • Target must execute:
        • if (field == const) throw exception;

Slide 15: Algorithm
• Source method: constant propagation
    • Constant at exit node
• Target method: control dependence
    • Throw of exception is control dependent on predicate

Slide 16: Dynamic Extractor
• Goal: find the legal transitions that occur during an execution of the program
• Java bytecode instrumentation
• For each thread, each instance of a class:
    • Track last state-modifying method for each submodel.
• Same mechanism for dynamic checking
    • Instead of adding to model, flag exception.

Slide 17: Experiences
• We applied our tool to several real-life applications.
Program                 Description               Lines of code
Java.net 1.3.1          Networking library        12,000
Java libraries 1.3.1    General purpose library   300,000
J2EE 1.2.1              Business platform         900,000
joeq                    Java virtual machine      65,000

Slide 18: Automatic documentation
• java.util.AbstractList.ListItr, slice on lastRet field (static)
[Figure: FSM with states START, set, "next, previous", add, remove.]

Slide 19: Automatic documentation
• J2EE TransactionManager (dynamic)
[Figure: FSM with states START, begin, commit, suspend, rollback, resume, END.]

Slide 20: Test coverage
• J2EE IIOPOutputStream (dynamic)
[Figure: FSM with states START, increaseRecursionDepth, simpleWriteObject, decreaseRecursionDepth, END. Annotation on increaseRecursionDepth: no self-edges implies a max recursion depth of 1.]

Slide 21: Upper/lower bound of model
• SocketImpl model (dynamic) (+static)
[Figure: FSM with states START, create, getFileDescriptor, connect, available, getInputStream, getOutputStream, close, END.]

Slide 22: Finding API bugs
• Applied our tool to the joeq virtual machine
[Figure: expected API for jq_Method, with states START, load, prepare, compile.]
[Figure: actual API for jq_Method, with states START, load, prepare, setOffset, compile.]

Slide 23: Related Work
• Dynamic
    • Daikon (Ernst99)
    • DIDUCE (Hangal02)
    • K-limited FSM extraction (Reiss01)
    • Machine-learning (Ammons02)
• Static
    • Metal (Engler00)
    • Vault (DeLine01), NIL, Hermes (Strom86)
    • SLAM toolkit (Ball01)
    • ESC (Detlefs98)
• Dynamic and static
    • ESC + Daikon (Flanagan01, Nimmer02)

Slide 24: Conclusion
• Product of FSMs
    • Model is simple, but useful
• Upper/lower bound: static/dynamic
• Useful for:
    • Documentation generation
    • Test coverage
    • Rules for automatic checkers
    • Finding API bugs
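The mechanism on the Dynamic Extractor slide (track the last state-modifying method per instance, and reuse the same tracking for checking) can be sketched in Java as follows. This is a simplified illustration under stated assumptions, not the tool itself: the class `DynamicModel` and its methods are hypothetical, the bytecode instrumentation is replaced by explicit `observe` and `check` calls, there is a single thread and a single submodel, and the set of state-preserving methods is passed in by hand, where the talk derives it with mod-ref analysis. Checking returns `false` where the slide says "flag exception".

```java
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

class DynamicModel {
    private final Set<String> statePreserving;
    private final Set<String> transitions = new HashSet<>(); // entries look like "from->to"
    // Keyed by object identity: each instance has its own current state.
    private final Map<Object, String> lastModifying = new IdentityHashMap<>();

    DynamicModel(Set<String> statePreserving) {
        this.statePreserving = statePreserving;
    }

    /** Extraction mode: record the observed transition as legal. */
    void observe(Object instance, String method) {
        String from = lastModifying.getOrDefault(instance, "START");
        transitions.add(from + "->" + method);
        advance(instance, method);
    }

    /** Checking mode: report an unseen transition instead of adding it to the model. */
    boolean check(Object instance, String method) {
        String from = lastModifying.getOrDefault(instance, "START");
        boolean legal = transitions.contains(from + "->" + method);
        if (legal) {
            advance(instance, method);
        }
        return legal;
    }

    /** Only state-modifying methods change the tracked state. */
    private void advance(Object instance, String method) {
        if (!statePreserving.contains(method)) {
            lastModifying.put(instance, method);
        }
    }

    Set<String> transitions() {
        return transitions;
    }

    public static void main(String[] args) {
        // Method names taken from the Socket example; the call sequence is made up.
        DynamicModel model = new DynamicModel(Set.of("getFileDescriptor"));
        Object trained = new Object();
        for (String m : new String[] {"create", "getFileDescriptor", "connect", "close"}) {
            model.observe(trained, m);
        }
        Object checked = new Object();
        System.out.println(model.check(checked, "create")); // seen during extraction
        System.out.println(model.check(checked, "close"));  // create->close was never seen
    }
}
```

Because `getFileDescriptor` is state-preserving, the sequence create, getFileDescriptor, connect records the edge create->connect rather than getFileDescriptor->connect, which matches the "new state is m1" rule on the State-preserving methods slide.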