STRUCTURE COMPARISON AND SEMANTIC INTERPRETATION OF DIFFERENCES*

Wellington Yu Chiu
USC Information Sciences Institute
4676 Admiralty Way
Marina del Rey, California 90291

ABSTRACT

Frequently situations are encountered where the ability to differentiate between objects is necessary. The typical situation is one in which one is in a current state and wishes to achieve a goal state. Abstractly, the problem we shall address is that of comparing two data structures and determining all differences between the two structures. Current work on differencing techniques addresses the syntactic issues well but falls short of addressing the semantic issues. We address this gap by applying AI techniques that yield semantic interpretations of differences.

I INTRODUCTION

Frequently situations are encountered where the ability to differentiate between objects is necessary. The typical situation is one in which one is in a current state and wishes to achieve a goal state. Such situations are encountered in several guises in our research on transformational based programming [1, 2, 3]. A simple case is one in which we are given two program states and need to discover the changes from one to the other. Another case is one in which a transformation we wish to apply to effect a desired change does not match in the current state. An extension of this second case is the situation where a sequence of transformations, called a development, is to be applied (replayed) on a slightly different problem.

Abstractly, the problem we shall address is that of comparing two data structures and determining all syntactic and semantic differences between the two structures. Current differencing techniques address the syntactic issues well but fall short of addressing the semantic issues. We address this gap by applying AI techniques that yield semantic interpretations of differences. We present the design of a semantic based differencer and its ongoing implementation.

II AN EXAMPLE

Below is an example. The after state was produced from the before state via a transformation, but the following explanations were generated without knowledge of that transformation.

BEFORE:
  1  while there exists character in text do
  2    if character is linefeed then
  3      replace character by space;
  4  while there exists character in text do
  5    if P(character) then
  6      remove character from text;

AFTER:
  a  while there exists character in text do
  b  begin
  c    if character is linefeed then
  d      replace character by space;
  e    if character in text then
  f      if P(character) then
  g        remove character from text
  h  end;

- The current syntactic differencing techniques in [4, 5, 6, 7, 8, 9, 10] typically explain differences in terms of Delete operations on lines of the BEFORE state and Insert operations on lines of the AFTER state.

- A higher level syntactic explanation is achieved by generalizing and combining the techniques for syntactic differencing to explain differences in terms of embeds, extracts, swaps and associations, in addition to inserts and deletes, and by incorporating syntactic information about the structures being compared [3].

* This work was supported by National Science Foundation Grant MCS 7683880. The views expressed are those of the author.
- The proposed explanation of the difference is: Loops merged. The second loop is coerced into a conditional. The conditional is embedded into the first loop.

- Infer the similarity of 5,6 to e,f,g from the syntactic equivalence of 5,6 to f,g. 5,6 embedded in a test for generator consistency inferred from semantic knowledge of loop merging.

trees for 23, semantic - Conclusion: The following is presented differencer is the derivation to show of the syntactic the mechanisms explanation. The upon which the semantic explanation plausible loop. second merged in a conditional This of the body. to any explanation: is done into a similar that tests without The two adjacent side effects, caused by the objects Our current difference syntactic available by imposing thereby structural advance, structure the on dimensions In the from the proposed in its fall short the text analysis type below, based differencer domain we are made for the following dealing with of this semantics code for optimization. Within such merged” Building a domain, of changes we can to derive the use Our syntactic is presented. changes are the the constraints following explanation. the derivation derivation of the comparison interpretation above, that “loops merged”. merge from the fact that the The information we provided are by this information. differencer makes use of the information profile used super of the tree is a brief to determine in the differencer combinations base provided The matches. to determine common unique, techniques description techniques We then substructures loop where structures base matches common residual described in and [3, 4, 6, 7, 10]. of the common unique and common for linear structures.

- Common Unique (CU): The key to this technique is the use of substructures that are common to both structures and unique within each as anchors [4]. For the above example, the common unique substructures are: linefeed, replace, space, P, remove.
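The Common Unique idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: it works on whole program lines rather than on parse-tree substructures, the line split and the labels 1-6 and a-h are a reading of the example, and the names `BEFORE`, `AFTER`, `common_unique` and `label` are hypothetical.

```python
from collections import Counter

# Assumed line-level rendering of the example (punctuation dropped).
BEFORE = [
    "while there exists character in text do",  # 1
    "if character is linefeed then",            # 2
    "replace character by space",               # 3
    "while there exists character in text do",  # 4
    "if P(character) then",                     # 5
    "remove character from text",               # 6
]
AFTER = [
    "while there exists character in text do",  # a
    "begin",                                    # b
    "if character is linefeed then",            # c
    "replace character by space",               # d
    "if character in text then",                # e
    "if P(character) then",                     # f
    "remove character from text",               # g
    "end",                                      # h
]


def common_unique(before, after):
    """Return (i, j) index pairs of items that occur exactly once in each sequence."""
    count_before = Counter(before)
    count_after = Counter(after)
    position_after = {item: j for j, item in enumerate(after)}
    return [
        (i, position_after[item])
        for i, item in enumerate(before)
        if count_before[item] == 1 and count_after[item] == 1
    ]


def label(pairs):
    """Translate 0-based index pairs into the 1..6 / a..h labels of the example."""
    return [(str(i + 1), "abcdefgh"[j]) for i, j in pairs]


print(label(common_unique(BEFORE, AFTER)))
```

At this granularity the anchors pair lines 2, 3, 5, 6 with c, d, f, g. The loop header is not an anchor because it occurs twice in the before state.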
- Syntactically, 2,3 (the first loop body) is equivalent to c,d and 5,6 (the second loop body) is equivalent to f,g. The context trees (i.e. super trees) containing 2,3 and 5,6 and the context tree for c,d,f,g are the same. Infer FACTORING of context trees with supports being the 2 to 1 mapping of substructures and the equivalence of context trees.

- Infer on objects, the

- Longest Common Subsequence (LCS): The main concern with LCS is that of finding the minimal edit distance between two linear structures. The only edit operations allowed are: delete a substructure, or insert a substructure [6, 8]. For the above example, the LCS is: 1,2,3,5,6 matching a,c,d,f,g.

“loop that generated is a proposed the semantic on the explanation and at the same time rule out the syntactic on the mechanisms A). (profiles) of of: object techniques residual semantic Appendix Positional Below semantics relations substructures of an explanation are 2. To prepare (see consists common code defining the

- The sequence of productions used in the derivation of the substructure, given the context tree above as the starting point.

reasons: 1. To optimize loop body. being and the Despite is one where loop first consistency

- Sequence of nonterminals from the left hand side of productions used in generating the context tree of the substructure. A context tree (i.e. super tree) is that part of the parse tree with the substructure part deleted.

by The by are profile strings [3]. start comparing currently constraints of the desired SCENARIO section semantic this techniques making use of structural the explanations changes. produces It extends the

III DESIGN OF THE SEMANTIC BASED DIFFERENCER

in the first loop. differencer explanation. compared, extra second loop embedded the second loop loops can now be

- Infer coercion of loop 4,5,6 to conditional e,f,g based on 5,6 being equivalent to f,g, and the similarity of the loop generator to the conditional predicate.
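The Longest Common Subsequence technique described above, with delete and insert as the only edit operations, can be illustrated with the standard dynamic programming formulation. This is a sketch and not the algorithm of [6] or [8]: it assumes a line-level rendering of the example, and the names `BEFORE`, `AFTER`, `lcs_pairs` and `edit_script` are hypothetical.

```python
# Assumed line-level rendering of the example (lines 1..6 and a..h).
BEFORE = [
    "while there exists character in text do",
    "if character is linefeed then",
    "replace character by space",
    "while there exists character in text do",
    "if P(character) then",
    "remove character from text",
]
AFTER = [
    "while there exists character in text do",
    "begin",
    "if character is linefeed then",
    "replace character by space",
    "if character in text then",
    "if P(character) then",
    "remove character from text",
    "end",
]


def lcs_pairs(a, b):
    """Return the (i, j) index pairs of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the front to recover the matched positions.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def edit_script(a, b):
    """Return (deleted indices of a, inserted indices of b) implied by the LCS."""
    pairs = lcs_pairs(a, b)
    kept_a = {i for i, _ in pairs}
    kept_b = {j for _, j in pairs}
    deleted = [i for i in range(len(a)) if i not in kept_a]
    inserted = [j for j in range(len(b)) if j not in kept_b]
    return deleted, inserted


matches = [(str(i + 1), "abcdefgh"[j]) for i, j in lcs_pairs(BEFORE, AFTER)]
deleted, inserted = edit_script(BEFORE, AFTER)
print(matches)
print([i + 1 for i in deleted], ["abcdefgh"[j] for j in inserted])
```

On this input the matched pairs are 1,2,3,5,6 against a,c,d,f,g, which agrees with the LCS given for the example. The implied edit script deletes line 4 and inserts b, e and h, for an edit distance of 4.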
around for changing

- Infer an embed of composite structure 4,5,6 into the composite structure 1,2,3 to produce a,b,c,d,e,f,g. The support for this inference comes from 5,6 being equivalent to f,g, the adjacency of 1,2,3 to 4,5,6, and the adjacency of c,d to f,g.

check is not a loop into The body of the body, that will not be caught by the loop generator - Conclude yield this new conditional is the desired is embedded subject differencer make sense to transform to embed consistency. functionality We the only loop by our syntactic doesn’t The following generator generated because a conditional

- Syntactically, 2,3 (first loop body of the before state) is equivalent to c,d (part of the loop body of the after state) and 5,6 is equivalent to f,g.

the Loops merged. It will be based.

- Infer composite structure 1,2,3 similar to composite structure a,b,c,d,e,f,g,h based on 2,3 being equivalent to c,d.

5,6 and c,d,f,g are loop generators. context example build on these that contain of this is inferring base matches substructures that 1,2,3 by inferring matches of that are matches. An is similar to a,b,c,d,e,f,g,h from the assertion inferred and that 2,3 matches c,d. matches: those those without Syntactic with conflicts. from embedding, extracting There syntactic boundary or associating Since are two types of boundary conflicts we 1,2,3,4,5,6 conflicts c,d even result though boundary substructures. are The third here type of profile substructures are: adjacency, There are structure description profiles, structure. the relations semantics considered matches, rules, changes due and the to optimization this set needed be of A found grammar, in rules given the small when our in a transformation of for optimization, is differencer LISP programs written in our the differencer With this reduced technique In our example, them anchors constraint and as a backup intuitive With the infer that 4,5,6 f,g and this tree.
two with the With asserted relaxation The super same the position super of tree tree with relations regarding is similar to e,f,g because without scope conflicting is made. respect shows 4 and the conditional (i.e. relationships predicate between the relationships and relative A). All the requirements are met. But once that the operator being and that we can relax to and body4 to that Final body4. the of for analysis we see that the derivation involves syntactic domain as well the knowledge. of the structures interpretation and the for a design that both syntactic being compared and so as to This allows us to the information information of semantic as techniques We present of changes. between and needed provided by by current VI -APPENDIX A: TEMPLATES Every is following FROM THE SEMANTIC DIFFERENCER ----I__ substructure associated with of the it an Object structures Profile being compared that is an record has with the role names as fields: to a Content: Value Type Sequence of nonterminals of of the grammar, productions, in the derivation of the substructure. Context: content 2,3 to f,g, we now Once made, further by tasks. of the substructure. used Posi t ional Context: (i.e. a Position in parse tree. sequence of directions in reaching the substructure from the root of the parse tree. Abetraction: If a grammar is used to generate substructure, this refers to the sequence of productions used to generate the substructure itself. 5,6 is common unique evidence the in more applies. substructures being equivalent makes in body4. above, structure differencers differencing has to do with technique, the a semantic current and technique techniques we could with adjacency to body4 body2 the gap that exists the requirement, we 2,3 and c,d This follows from the support difference knowledge provide are common to both of the above to c,d and 5,6 the assertion reduced generator a two problem). triggered the issue of managing and applying semantic is relaxed. 
equivalent violations) as unique for this third technique occupying super equivalence being within each. when neither explanation objects common well common matches). as equivalent is embedded and applying addresses to c,d structure of managing have been made, we use (residual be the small example semantic knowledge base positional The of explanations, is triggered, a loop generator that body2 n-m because is we discover between or optimizations that such of the two bodies. Given uses the on about (see Appendix body2 arise. mapping all plausible being equivalent that 2,3 to the second loop is embedded Semantics considered, relationship the isolate on content based are is indeed bridge in case no substructures and unique as a default works in the to 2,3 is equivalent relations matches for body2 cases the segmental Semantics are asserted violations is With this assertion, structure, that the inference to a,b,c,d,e,f,g,h say that Factor technique, tree V CONCLUSION containing the differencer assertions to produce Factor given the knowledge the the (i.e. matches. matching. similarity to f,g. assert technique condition structures both to content residual uniqueness The possible unique plausible a instance is similar Semantics requirement is performed tries relations want 1,2,3 super we assert due to boundary information. substructure substructure 5,6 is equivalent all of these as scope, to assert first we the this one segmental For programs differencing composite and substructure Once language, in But our reveals S-expressions. The trees. unique matches. about into the smallest all changes. common used first. factored programming makes use of structural it know specification parse differences acts the the Our syntactic This Embed the IV -A SCENARIO on our except system. 
For that positioning, to the set of based that assert within same domain compared Given When object tried, we add to our set of guess, or preparation will be fairly transformations can substructures intuitive by our semantic between our and of conflicts into are common factor currently With each set of examples semantic that for and rules of A. Factors support via a,b,c,d,e,f,g,h violation substructures between given, matches boundary analysis of the loop e.

A Relation Profile describes the relationship between two substructures, one from each of the structures being compared. The role names of this record are: Content Context Base Matches: Inferred Conflict free boundary (i.e. no syntactic violations) conflicts (inferences depends heuristics regarding current substructure abstraction and weights associated with substructure matches). Positional constraint and Content from the largest common (i.e. residual substructure technique where uniqueness of context matches is relaxed). With given relationships (i.e. common unique) (i.e. positional determined from context trees)

A second type of relation profile is one that describes the relations between substructures within a given structure. Some considerations here are: adjacency, containment, and relative positioning.

There are several rules that describe a majority of semantic changes. Some are: factoring, extracting, embedding, associating, distributing, commuting, folding, unfolding and partial generating.

Below are the Factor Semantics used in the interpretation above:

SEMANTICS:      Factor Semantics
FORM:           LHS: op1 body1 op2 body2
                RHS: op3 body3 body4
KEY:            Segmental matching
CASES:          Mappings: 1-1, 2-1, n-m
REQUIREMENTS:   op1=op2
                op1=op3
                body1=body3
                body2=body4
                relative positioning of body1 to body2 holds for body3 to body4
                adjacency of body1 to body2 holds for body3 to body4
RELAXATIONS:    loop generator: body2=body4 relaxed to body2 similar to body4
                adjacency requirement
                relative positioning
                equivalence (=) to similar

VII REFERENCES

1. Balzer, R., Transformational Implementation: An Example, Information Sciences Institute, Research Report 79-79, September 1979.
2. Balzer, R., Goldman, N., and Wile, D., “On the Transformational Implementation Approach to Programming,” in Second International Conference on Software Engineering, pp. 337-344, IEEE, October 1976.
3. Chiu, W., Structure Comparison, 1979. Presented at the Second International Transformational Implementation Workshop in Boston, September, 1979.
4. Heckel, P., “A Technique for Isolating Differences Between Files,” Communications of the ACM 21, (4), April 1978, 264-268.
5. Hirschberg, D., The Longest Common Subsequence Problem, Ph.D. thesis, Princeton University, August 1975.
6. Hirschberg, D., “Algorithms for the Longest Common Subsequence Problem,” Journal of the ACM 24, (4), October 1977, 664-675.
7. Hunt, J., An Algorithm for Differential File Comparison, Bell Laboratories, Computer Science Technical Report 41, 1976.
8. Hunt, J., “A Fast Algorithm for Computing Longest Common Subsequences,” Communications of the ACM 20, (5), May 1977, 350-353.
9. Tai, K., Syntactic Error Correction in Programming Languages, Ph.D. thesis, Cornell University, January 1977.
10. Tai, K., “The Tree-to-Tree Correction Problem,” Journal of the ACM 26, (3), July 1979, 422-433.