STRUCTURE Wellington Yu Chiu USC 4676 Admiralty Way

advertisement
STRUCTURE
COMPARISON AND SEMANTIC INTERPRETATION OF DIFFERENCES*
Wellington Yu Chiu
USC Information Sciences Institute
4676 Admiralty Way
Marina de1 Roy, California 90291
II AN
-- EXAMPLE
ABSTRACT
Frequently
situations
differentiate
between
are
encountered
objects
is one in which one is in a current
goal state.
Abstractly,
comparing
two
between
work
of
the
well
data
two
addressing
structures
applying
Al techniques
that
ability
to
and wishes to achieve
we shall address
Current
The
issues.
We
yield
semantic
a
Below
differentiate
encounters
between
objects
situations
techniques
is one in which one is in a current
goal
state.
guises
in
research
two
Such
our
[I,
this gap
by
interpretations
of
to the other.
Another
to effect
state
extension
of
structures.
determining
the
address
This
semantic
to identify
system
comparison
In
the
viowr
before
and
program
the
development
where
The after
application
An
applied
between
state
must be more semantically
states
from
in
text
based
differencer
a part
were
For
in
differencing
oriented.
For
We
that yield semantic
of my thesis:
cwpremed
are thaw
a
1
do
2
in
in text
by
text
do
from
in
space:3
:
6
text:
text
do
;:
in
text
by
from
text
:
e
f
space:
h9
was produced
from the before
state via
but the following explanations
generated
without
knowledge
of
BEFORE:
Delete
Delete
AFT&I:’
’
Insert
Insert
. * .
while
there
in
in
line
line
if in line
f
then
in line
4
4
f
of differences.
describes
of
[ 1 J.
- The
current
syntactic
differencing
techniques
in
[4, 5, 6, 7, 8, 9, lo] typically explain differences
the following terms:
the
well
above
but fall short of addressing
and
types
differences.
transformation.
to
is that of comparing
work
state
of a transformation
differences
a sequence
is to be
techniques
replay
after
the
we
the
design
of
- A higher level syntactic explanation is achieved by
generalizing
and combining
the techniques
for
syntactic
differencing
to explain
differences
in
terms of embeds, extracts, swaps and associations,
in
addition
to
inserts
and
deletes,
and
by
incorporating
syntactic
information
about
the
structures
being compared [3]
a
and its ongoing implementation.
* This work woa rupportrd by Nitional Scknce Foundation Grant MCS 7683880.
the
are
to show
from one
differences.
all differences
this gap by applying Al techniques
paper
several
problem.
differences
issues.
comparisons
the
we shall address
Current
interpretations
a
change does not match in the
and determining
all syntactic
semantic
situations,
the changes
a development,
the problem
structures
under
programming
case is the situation
called
on) a slightly different
Abstractly,
two
wishes
the semantic
AFTER:
while
there
exists
character
begin
if character
is Iinefeed
then
replace
character
if character
in text
then
i f P (character)
then remove
character
end:
to
The typical situation
encountered
based
a desired
of this second
two data
ability
case is one in which a transformation
and one
transformations,
(replayed
are
and need to discover
wish
current
the
A simple case is one in which we are given
states
to apply
where
state and wishes to achieve
transformation
2, 31.
program
situations
and deduce
BEFORE :
uhile
there
exists
character
if character
is linefeed
then
replace
character
uhile
there
exists
character
i f Pfcharacter)
then remove
character
but fall short
address
is necessary.
is presented
to infer
all differences
I INTRODUCTION
frequently
example
used
transformational
differences.
One
following
information
is that of
comparison
differences
all syntactic
semantic
the
The typical situation
and determining
structures.
the
state
the problem
in determining
where
is necessary.
The
of thr author.
259
second
loop is coerced
a conditional.
into
of
the
The
conditional
the first
- The
proposed
difference
is:
is
explanation
Loops
embedded
loop,
of
the
into
trees
for 23,
- Infer the similarity of 5,6 to e,f,g from the syntactic
equivalence
of 5,6 to f,g. 5,6 embedded in a test
for generator
consistency
inferred
from semantic
knowledge of loop merging.
semantic
merged.
- Conclusion:
The following
is presented
differencer
is the derivation
to show
of the syntactic
the mechanisms
explanation.
The
upon which the semantic
explanation
plausible
loop.
decond
merged
in a conditional
This
of the body.
to any
explanation:
is
done
into a similar
that
tests
without
The two adjacent
side effects,
caused
by the
objects
Our
current
difference
syntactic
available
by
imposing
thereby
structural
advance,
structure
the
on
dimensions
In the
from the proposed
in its
fall short
the
text
analysis
type
below,
based differencer
domain
we
are
made for the following
dealing
with
of
this
semantics
code for optimization.
Within
such
merged”
Building
a domain,
of changes
we
can
to derive
the
use
Our syntactic
is presented.
changes
are
the
the
constraints
following
explanation.
the derivation
derivation
of the comparison
interpretation
above,
that
“loops merged”.
merge
from
the
fact
that
the
The information
we
provided
are
by this
information.
differencer
makes use of the information
profile
used
super
of the
tree
is a brief
to
determine
in the differencer
combinations
base
provided
The
matches.
to determine
common unique,
techniques
description
techniques
We then
substructures
loop
where
structures
base
matches
common residual
described
in
and
[3, 4, 6, 7, lo].
of the common unique and common
for linear structures.
- Common Unique KU):
The key to this technique the
use of substructures
that are common to both
structures
and unique within each as anchors [4].
For
the
above
example,
the
common
unique
substructures
are:
linefeed,
replace,
space, P,
remove.
- Syntactically,
2,3 (the first loop body) is equivalent
to c,d and 5,6 (the second loop body) is equivalent
to f,g.
The
context
trees
(i.e. super
trees)
containing
2,3 and 5,6 and the context tree for
c,d,f,g are the same.
Infer FACTORING of context
trees with supports being the 2 to 1 mapping of
substructures
and the equivalence of context trees.
- Infer
on objects,
the
The main
- Longest
Common Subsequence
(LCS):
coniern
with LCS is that .of finding the minimal edit
distance between
two linear structures.
The only
edit ooerations
allowed are: delete a substructure,
or insert
a substructure
[6, 81.
For the above
example, the LCS is: 1,2,3,5,6 matching a,c,d,f,g.
“loop
that generated
is a proposed
the semantic
on the
explanation
and at the same time rule out the syntactic
on the mechanisms
A).
(profiles)
of
of:
object
techniques
residual
semantic
Appendix
Positional
Below
semantics
relations
substructures
of
an explanation
are
2. TO prepare
(see
consists
common
code
defining
the
- The sequence of productions used in the derivation
of the substructure,
given the context tree above
as the starting point.
reasons:
1. To optimize
loop body.
being
and the
Despite
is one where
loop
first
consistency
- Sequence of nonterminals from the left hand side of
productions
used in generating the context tree of
the substructure.
A context tree (i.e. super tree) is
that part of the parse tree with the substructure
part deleted.
by
The
by
are
profile
strings
[3].
start
comparing
currently
constraints
of the desired
SCENARIO section
semantic
this
techniques
making use of structural
the explanations
changes.
produces
It exte_nds
the
Ill DESIGN
OF THE SEMANTIC BASED DIFFERENCER
-e--p
in the first loop.
differencer
explanation.
compared,
extra
second loop embedded
the second
loop
loops can now be
- Infer coercion
of loop 4,5,6
to conditional e,f,g
based
on 5,6 being equivalent
to f,g, and the
similarity of the loop generator
to the conditional
predicate.
around
for
changing
- tnfer an embed of composite structure 4,5,6 into the
composite
structure
1,2,3 to produce a,b,c,d,e,f,g.
The support
for this inference
comes from 5,6
being equivalent
to f,g, the adjacency of 1,2,3 to
4,5,6, and the adjacency of c,d to f,g.
check
is not
a loop into
The body of the
body, that will not be caught by the loop generator
- Conclude
yield
this new conditional
is the desired
is embedded
subject
clifferencer
make sense to transform
to embed
consistency.
functionality
We
the
only
loop
by our syntactic
i? doesn’t
The following
generator
1,2,3 similar to composite
based
on
2,3
being
generated
because
a conditional
- Syntactically,
2,3 (first loop body of the before
state) is equivalent to c,d (part of the loop body of
the after state) and 5,6 is equivalent to f,g.
the
Loops merged.
It
will be based.
- Infer composite structure
structure
a,b,c,d,e,f,g,h
equivalent to c,d.
5,6 and c,d,f,g are loop generators.
context
example
260
build on these
that
contain
of this is inferring
base matches
substructures
that
1,2,3
by inferring
matches of
that are matches.
An
is similar to a,b,c,d,e,f,g,h
from
the assertion
inferred
and
that 2,3 matches c,d.
matches:
those
those
without
Syntactic
with
conflicts.
from embedding,
extracting
There
syntactic
boundary
or associating
Since
are two types Of
boundary
conflicts
we
1,2,3,4,5,6
conflicts
c,d even
result
though
boundary
substructures.
are
The third
here
type of profile
substructures
are: adjacency,
There
are
structure
description
profiles,
structure.
the
relations
semantics
considered
matches,
rules,
changes
due
and
the
to optimization
this set
needed
be
of
A
found
grammar,
in
rules
given
the
small when
our
in a transformation
of
for optimization,
is
differencer
LISP programs
written
in our
the
differencer
With this reduced
technique
In our example,
them
anchors
constraint
and
as a backup
intuitive
With
the
infer
that
4,5,6
f,g
and
this
tree.
two
with
the
With
asserted
relaxation
The super
same
the
position
super
of
tree
tree
with
relations
regarding
is similar to e,f,g because
without
scope
conflicting
is made.
respect
shows
4 and the conditional
(i.e.
relationships
predicate
between
the
relationships
and
relative
A). All the requirements
are met.
But once
that the operator
being
and that we can relax
to
and
body4
to
that
Final
body4.
the
of
for
analysis
we see that the derivation
involves
syntactic
domain
as well
the knowledge.
of the structures
interpretation
and
the
for
a design that
both syntactic
being compared
and
so as to
This allows us to
the information
information
of
semantic
as techniques
We present
of changes.
between
and
needed
provided
by
by
current
VI -APPENDIX A:
TEMPLATES
Every
is
following
FROM THE SEMANTIC DIFFERENCER
----I__
substructure
associated
with
of
the
it an Object
structures
Profile
being
compared
that is an record
has
with the
role names as fields:
to a
Content:
Value
Type
Sequence
of nonterminals
of
of the grammar,
productions,
in the derivation
of the
substructure.
Context:
content
2,3
to f,g, we now
Once made, further
by
tasks.
of
the
substructure.
used
Posi t ional
Context:
(i.e.
a
Position
in parse
tree.
sequence
of directions
in reaching
the substructure
from the root
of
the parse
tree.
Abetraction:
If a grammar
is used to generate
substructure,
this
refers
to the
sequence
of productions
used to
generate
the substructure
itself.
5,6 is common unique
evidence
the
in
more
applies.
substructures
being equivalent
makes
in body4.
above,
structure
differencers
differencing
has to do with
technique,
the
a semantic
current
and
technique
techniques
we could with
adjacency
to body4
body2
the gap that exists
the
requirement,
we
2,3 and c,d
This follows from the support
difference
knowledge
provide
are common to both
of the above
to c,d and 5,6
the assertion
reduced
generator
a
two
problem).
triggered
the issue of managing and applying
semantic
is relaxed.
equivalent
violations)
as
unique
for this third technique
occupying
super
equivalence
being
within each.
when neither
explanation
objects
common
well
common
matches).
as
equivalent
is embedded
and applying
addresses
to c,d
structure
of
managing
have been made, we use
(residual
be
the small example
semantic
knowledge
base
positional
The
of
explanations,
is triggered,
a loop generator
that body2
n-m
because
is
we discover
between
or
optimizations
that
such
of the two bodies.
Given
uses the
on
about
(see Appendix
body2
arise.
mapping
all plausible
being equivalent
that
2,3 to
the second loop is embedded
Semantics
considered,
relationship
the
isolate
on content
based
are
is indeed
bridge
in case no substructures
and unique
as a default
works
in the
to
2,3 is equivalent
relations
matches
for body2
cases
the
segmental
Semantics
are asserted
violations
is
With this assertion,
structure,
that
the inference
to a,b,c,d,e,f,g,h
say that
Factor
technique,
tree
V CONCLUSION
containing
the differencer
assertions
to produce
Factor
given
the
knowledge
the
the
(i.e.
matches.
matching.
similarity
to f,g.
assert
technique
condition
structures
both
to
content
residual
uniqueness
The
possible
unique
plausible
a
instance
is similar
Semantics
requirement
is performed
tries
relations
want
1,2,3
super
we assert
due to boundary
information.
substructure
substructure
5,6 is equivalent
all of these
as
scope,
to assert
first
we
the
this
one
segmental
For programs
differencing
composite
and substructure
Once
language,
in
But our
reveals
S-expressions.
The
trees.
unique
matches.
about
into the smallest
all changes.
common
used
first.
factored
programming
makes use of structural
it know
specification
parse
differences
acts
the
the
Our syntactic
This
Embed
the
IV -A SCENARIO
on
our
except
system.
For
that
positioning,
to the set of
based
that
assert
within
same
domain
compared
Given
When
object
tried, we add to our set of
guess,
or preparation
will be fairly
transformations
can
substructures
intuitive
commuting,
unfolding.
by our semantic
generating
between
our
and
of
conflicts
into
are common
a majority
distributing,
folding,
factor
currently
positioning.
that describe
With each set of examples
semantic
that
for
and
rules
embedding,
of
A. Factors
support
and relative
Some are: factoring,
extracting,
Appendix
are:
Considerations
semantic
changes.
oartial
the relationship
containment,
several
associating,
is one that describes
within a given structure.
via
a,b,c,d,e,f,g,h
violation
substructures
between
given,
matches
boundary
analysis of
the
loop
e.
261
the
A Relation
Profile
substructures,
The role
describes
names of this record
Content
Context
Base
Matches:
Inferred
Conflict
4.
Heckel, P., “A Technique for Isolating Differences Between
of the ACM 21, (41, April 1978,
Files,” Communications
264-268.
5.
Hirschberg,
D., The Longest Common Subsequence
Ph.D. thesis, Princeton University, August 1975.
6.
Hirschberg,
D., “Algorithms for the Longest Common
Subsequence
Problem,” journal of the ACM 24, (41, October
free
boundary
(i.e.
no syntactic
violations)
conflicts
(inferences
depends
heuristics
regarding
current
substructure
abstraction
and
weights
associated
with
substructure
matches).
substructure
substructure
substructure
relation
profile
two
of
the
substructures,
substructures
considerations
within
here
containment,
and
a
several
changes.
associating,
is
FACTOR
semantic
embedding,
description
the semantic
of
Hunt, J., An Algorithm for Differential
file Comparison, Bell
Laboratories,
Computer Science Technical Report 41, 1976.
8.
Hunt, J., “A Fast Algorithm for Computing Longest Common
Subsequences,”
Communications
of the ACM 20, (51, May
are:
1977,350-353.
relative
RI-IS:
KEY:
Segmental
REDUIREMENTS:
requirements
requirements
requirements
requirements
requirement9
requirements
a
opl
op2
op3
a majority
distributing,
Factor
Semantics
10.
of
commuting,
folding and unfolding.
the
interpretation
SEllANTICS:
FORM:
LHS:
CASES :
opl
is
rules that describe
Some are factoring,
extracting,
partial
generating
7.
9.
are
Below
used
in
above:
body1
body2
body3
body4
matching
opl=opZ
opl =op3
bodyl=body3
bodyZ=body4
relative
positioning
of
body1
to body2 holds
for
body3
to body4
adjacency
of body1 to
body2 holds
for body3 to
body4
loop generator
relaxations
body2=body4
relaxed
to
body2 similar
to body4
RELAXATIONS:
relaxations
relaxations
relaxations
Problem,
1977, 664-675.
matches
matches
matches
is one between
Some
on
positioning.
There
Baiter, R., Goldman, N., and Wile, D., “On the
Transformational
Implementation Approach to Programming,”
in Second International
Conference on Software
Engineering,
pp. 337-344,
IEEE, October 1976.
Chiu, W., Structure Comparison, 1979. Presented at the
Second International
Transformational
Implementation
Workshop in Boston, September,
1979.
of
structure
1.
are:
3.
structure.
adjacency
VII REFERENCES
Balzer, R., Transformational Implementation:
An Example,
Information Sciences Institute, Research Report 79-79,
September
1979.
n-m
A second
two
2.
l-l
2-1
Happ i ngs:
between
being compared.
Positional
constraint
and Content
from the largest
common
(i.e.
residual
substructure
technique
where
uniqueness
of context
matches
is relaxed).
With
given
relationships
(i.e.
common unique)
(i.e.
positional
determined
from context
trees)
Matches:
a
the
one from each of the structures
adjacency
requirement
relative
positioning
equivalence
(=I
to similar
262
Tai, K., Syntactic
Error Correction in Programming
Languages, Ph.D. thesis, Cornell University, January
Tai, K., “The Tree-to-Tree
Correction
the ACM 26, (3), July 1979, 422-433.
1977.
Problem,” Journal
of
Download