Static Control-Flow Analysis for Reverse Engineering of UML Sequence Diagrams

Atanas (Nasko) Rountev
Ohio State University
with Olga Volgin and Miriam Reddoch

PRESTO Research Group - Ohio State University
PASTE'05

Example of a UML Sequence Diagram
[Figure: sequence diagram with lifelines start:X, p:A, and n:A; messages m1(), m2(), m3(), create(), and m4(); one opt fragment]

UML Sequence Diagrams
- Popular UML artifacts for modeling of object interactions
- Design-time sequence diagrams
- Reverse-engineered sequence diagrams
  - Based on existing code
  - Iterative development; design recovery for software maintenance; software testing
  - Implemented in some commercial UML tools
    - Together ControlCenter (Borland)

Reverse-Engineering Analyses
- Dynamic analysis: tracks a set of representative run-time executions
  - Several research tools
- Static analysis: examines only the code
  - Commercial tools (deficiencies)
  - Some research work (not comprehensive)
  - RED tool for Java: PRESTO group at OSU
    URL: presto.cse.ohio-state.edu/red

Representation of Intraprocedural Flow of Control
- Given: the methods whose bodies will be used to construct the diagram
- How should we represent the intraprocedural flow of control inside these bodies?
- Solution: general algorithm for mapping a method’s CFG to UML 2.0 interaction fragments
  - Any reducible exception-free CFG
  - Precise mapping: preserves all call sequences
  - Subsequent diagram transformations

UML 2.0 Interaction Fragments
- Opt, alt, loop, break; added generalized break
[Figure: sequence diagram "sd example" with lifelines :MergeCollation, s:String, e:PatternEntry, and patterns:Vector; messages example(e), s = getChars(), i = charAt(0), i = indexOf(e), e1 = elementAt(i), fixEntry(e1), and removeElementAt(i); fragments ALT, LOOP L, OPT, and BREAK L]

Analysis Stages
- CFG
- Phase I: Preprocessing
- Phase II: Fragment Construction
- Phase III: Transformations
- Data Structure for Fragments

Phase I: Preprocessing
- Post-dominance tree
  - Node n2 post-dominates n1 if all paths from n1 to exit go through n2
  - Immediate post-dominator; parent in the tree
- Analyze branch nodes
  - What is the merge point for all branches?
- Analyze loops
  - Nesting relationships
  - What is the merge point for all loop exits?

Post-dominance Tree
Example CFG (T/F mark the outcomes of the branch nodes 3, 6, 7, and 9):
   1: i = -1
   2: s = e.getChars()
   3: s != null
   4: e = s.charAt(0)
   5: i = patterns.indexOf(e)
   6: i >= 0
   7: statusArray[i] != 0
   8: e1 = patterns.elementAt(i)
   9: e1 != null
  10: patterns.removeElementAt(i)
  11: fixEntry(e1)
  12: exit
[Figure: post-dominance tree for the example CFG, rooted at node 12]

Branch Nodes and Branch Successors
- Branch successor: node where the outgoing paths for a branch node merge
- Example: the branch successor of 3 is 6
[Figure: the example CFG and its post-dominance tree]

Loops and Loop Successors
- Reducible CFG: contains only natural loops
- Loop successor: merge point of all paths exiting the loop
- Example: the loop successor of L is 12
[Figure: the example CFG and its post-dominance tree]

Branch/Loop Successors Inside Loop L
- Consider only edges inside L
- Create a post-dominance tree for L and use it for:
  - branch successors for nodes in L
  - loop successors for loops nested in L
- Example: the branch successor of 7 is 10
[Figure: the example CFG and the post-dominance tree for L, over nodes 6, 7, 8, 9, and 10]
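The branch-successor examples above can be reproduced with a small sketch. It rests on several assumptions: the edge set in CFG below is not spelled out on the slides and was chosen because it yields the results quoted there; the function names and the "back" sink that stands for the return to the loop header are hypothetical; and the iterative set-based computation is a textbook method, not necessarily the one used in RED.

```python
def post_dominators(succ, exit_node):
    """Return {node: set of nodes that post-dominate it} by fixed-point iteration."""
    nodes = set(succ) | {t for targets in succ.values() for t in targets} | {exit_node}
    pdom = {n: set(nodes) for n in nodes}   # start from "everything", then shrink
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            targets = succ.get(n, [])
            common = set.intersection(*(pdom[t] for t in targets)) if targets else set()
            updated = {n} | common
            if updated != pdom[n]:
                pdom[n] = updated
                changed = True
    return pdom


def immediate_post_dominators(succ, exit_node):
    """Return {node: its parent in the post-dominance tree}."""
    pdom = post_dominators(succ, exit_node)
    ipdom = {}
    for n, doms in pdom.items():
        strict = doms - {n}
        if strict:
            # Strict post-dominators form a chain; the closest one has the largest set.
            ipdom[n] = max(strict, key=lambda d: len(pdom[d]))
    return ipdom


# Assumed edges for the 12-node example (not given explicitly on the slides).
CFG = {1: [2], 2: [3], 3: [4, 5], 4: [6], 5: [6], 6: [7, 12],
       7: [8, 10], 8: [9], 9: [11, 10], 10: [6], 11: [12]}

# Assumed model of "only edges inside L": exit edges are dropped and the
# back edge 10 -> 6 is redirected to a hypothetical sink named "back".
LOOP_L = {6: [7], 7: [8, 10], 8: [9], 9: [10], 10: ["back"]}

ipdom_method = immediate_post_dominators(CFG, 12)
ipdom_loop = immediate_post_dominators(LOOP_L, "back")

print(ipdom_method[3])  # 6: the branch successor of 3
print(ipdom_loop[7])    # 10: the branch successor of 7 inside L
```

Under these assumptions the branch successor of a branch node is simply its immediate post-dominator, computed on the whole method for node 3 and on the loop-restricted graph for node 7.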
Phase II: Fragment Construction
Fragment structure built for the example CFG (entries in slide order):
  TOP
  PatternEntry:getChars()
  ALT 1      cond: s != null
  String:charAt(0)
  Vector:indexOf(e)
  LOOP 1     cond: i >= 0
  BREAK 1    cond: i < 0               breaks_from: LOOP 1
  OPT 1      cond: statusArray[i] != 0
  Vector:elementAt(i)
  BREAK 2    cond: e1 != null          breaks_from: LOOP 1
  MergeCollation:fixEntry(e1)
  Vector:removeElementAt(i)
[Figure: the example CFG (nodes 1 to 12) shown next to the fragment structure]

Various Issues
- UML additions
  - Multi-level break fragments
- Multiple method exits
  - Opt-like fragments: return fragments
  - Algorithm uses info about control dependencies
- Exceptions (Java)
  - “throw e”: similar to method exit - throw fragment
  - Ignore catches and implicit exceptions
- Node replication: the same CFG node may have to produce multiple identical [...]

Average Running Time per Method [milliseconds]
[Figure: bar chart, y-axis from 0 to 50 ms, one bar per component: io, jflex, bytecode, checked, bigdecimal, vector, pushback, sql, html, jess, mindbright, pdf, zip, math, gzip, collator, date, decimal, message, boundaries]

Methods Requiring Return/Throw Fragments
[Figure: bar chart, y-axis from 0% to 100%, one bar per component (same components as above)]

Methods Requiring Multi-level Break Fragments
[Figure: bar chart, y-axis from 0% to 18%, one bar per component (same components as above)]
Methods Requiring Node Replication
[Figure: bar chart, y-axis from 0% to 45%, one bar per component: io, jflex, bytecode, checked, bigdecimal, vector, pushback, sql, html, jess, mindbright, pdf, zip, math, gzip, collator, date, decimal, message, boundaries]

Summary and Future Work
- General and fast algorithm
- Creates detailed and precise representation
- Subsequent simplifications
  - Lossless: e.g. merge a fragment with the surrounding fragment [OSUCISRC-3/04-TR12]
  - Lossy: e.g. give up on multi-level breaks
- Interactive visualization [VISSOFT’05]
  - Collapse and un-collapse fragments; slice the diagram w.r.t. a fragment
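Collapsing and un-collapsing fragments, as mentioned under interactive visualization, can be pictured with a minimal sketch of a fragment tree. Everything here is an assumption made for illustration: the Call and Fragment classes, the collapsed flag, the render function, and the small two-level nesting are hypothetical and are not taken from the RED tool or from the fragment structure on the Phase II slide.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Call:
    target: str                      # e.g. "Vector:elementAt(i)"


@dataclass
class Fragment:
    kind: str                        # "opt", "alt", "loop", "break", ...
    cond: str = ""
    body: List[Union["Call", "Fragment"]] = field(default_factory=list)
    collapsed: bool = False          # hypothetical view state, not part of UML


def render(item, depth=0):
    """Return the indented text lines for a call or a fragment subtree."""
    pad = "  " * depth
    if isinstance(item, Call):
        return [pad + item.target]
    header = pad + item.kind.upper() + (f" [{item.cond}]" if item.cond else "")
    if item.collapsed:
        return [header + " ..."]     # hide the body, keep the fragment visible
    lines = [header]
    for child in item.body:
        lines.extend(render(child, depth + 1))
    return lines


# Illustrative nesting only; the real nesting is decided by the analysis.
opt = Fragment("opt", "statusArray[i] != 0", [Call("Vector:elementAt(i)")])
loop = Fragment("loop", body=[opt, Call("Vector:removeElementAt(i)")])

expanded = render(loop)
opt.collapsed = True
collapsed = render(loop)

print("\n".join(expanded))
print("\n".join(collapsed))
```

Keeping the collapsed state on the fragment itself means that un-collapsing is just resetting the flag, so no information about the underlying calls is lost.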