Analysis of Automated Code Refactoring

Riggs
Analysis of Refactoring
Draft-2
Introduction
During the 1970s, many software developers learned techniques for structuring their procedural
language code. Design principles such as high cohesion, low coupling, span of control, scope of
effect, and others [1] were taught to help programmers develop highly maintainable code. With the
advent of object-oriented languages, much work has been done on how to structure and partition
objects during the design phase of software development. One technique, refactoring, has emerged as
a recommended practice for designers.
Refactoring is a key design-phase technique for restructuring and redesigning working object-oriented
code. It originated in the Smalltalk community and is spreading into software development shops
where highly maintainable and reusable object-oriented code is desirable. Some basic heuristics of
refactoring are documented by Martin Fowler in his book Refactoring: Improving the Design of Existing
Code [2]. The intent of this article is to examine a few of the recommended refactoring techniques
explicitly from the viewpoint of developing an automated, general-purpose refactoring tool usable with
code written in any object-oriented language, given the necessary compiler and a simple code
generator. Within that framework, this article focuses on carrying out a refactoring automatically
rather than on selecting which refactoring to apply. While selecting the specific refactorings is of
great interest, it is the more
difficult task and may depend on knowledge not normally codified.¹
¹ For example, what might be required in future uses of the code [2, p. 6, p. 144].
2/13/2016
Approach
To share our understanding of how these methods might be automated, we selected a few simple
examples. After a brief description of our general method, the selected examples are presented using
the proposed automation process. Each analysis begins with a formal definition of the specific
refactoring as a grammar-like text transformation. After establishing this basis, we discuss what
knowledge must be parsed from the original code.² Given this parsed knowledge, the refactoring method
becomes a set of transformation rules applied to the results of the parse analysis. Executing these
rules produces a set of output records that are later used to rewrite the code. The new refactored
code lines are generated by a language-dependent code generator. Finally, the new lines are edited
into the original code. The complete sequence is carried out as in the figure below:
Figure 1 - Code Viewer → 1 Analysis → 2 Method → 3 Code → 4 Edit → Code Viewer
This architecture was chosen for the following reasons.
1. Although most refactorings require parsing, we parse only the relevant parts of the selected
code segments. The parser is constructed from commonly available grammars and parser generators; in
our example, a freely available Java grammar [JAVA] and yacc were used.
2. Each refactoring transformation requires reasoning only about small segments of code. Fowler
presents his refactorings as heuristics defined by a sequence of steps, so the scope and nature of the
methods make it natural to express these steps as production rules. Rules can also readily check
constraints on the steps.
These rules output an intermediate code representing the content of the statements required to
create and call the new method. They thus supply essentials such as the name of the function, the
form of the call, and the types and names of local variables, while avoiding language-specific
details. Such rules should be standard across families of languages, such as the imperative
languages (e.g. C++ or Java).
² We have assumed that variable names are 'standardized apart' in the original code, i.e. that no
two distinct variables share a name.
3. The actual lines of code that must be changed or added necessarily depend on the syntax of the
target language, so they are produced by a separate module. Facts from the method module are
adequate input for this module.³ Its output is a set of lines in the target language and the editing
commands to insert them into the original code.
4. The final editing is left to system utilities. In this manner, existing software is reused and we
benefit from its optimized implementation. Note, for example, that even a simple refactoring such as
renaming may require processing large files.
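As a minimal sketch of how the four stages might be wired together (assuming Java 16+ records; the names Facts, OutputRecords, EditCommand and RefactoringPipeline are hypothetical, not part of the original tool), each of the first three stages can be treated as a function whose output is the next stage's input, while stage 4 is left to an external editing utility:

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical records for the data handed from one stage to the next.
record Facts(List<String> facts) {}               // 1 Analysis: DF, LV, RV, ... facts
record OutputRecords(List<String> records) {}     // 2 Method: FN, PM, DECL, RTRN, CALL records
record EditCommand(int line, String text) {}      // 3 Code: a target-language line and where it goes

final class RefactoringPipeline {
    private final Function<String, Facts> analysis;                // language grammar + parser
    private final Function<Facts, OutputRecords> method;           // language-independent rules
    private final Function<OutputRecords, List<EditCommand>> code; // target-language generator

    RefactoringPipeline(Function<String, Facts> analysis,
                        Function<Facts, OutputRecords> method,
                        Function<OutputRecords, List<EditCommand>> code) {
        this.analysis = analysis;
        this.method = method;
        this.code = code;
    }

    // Runs stages 1-3 on the selected source text. Stage 4 (applying the edits
    // to the file) is deliberately not done here; an editing utility does it.
    List<EditCommand> run(String selectedSource) {
        return analysis.andThen(method).andThen(code).apply(selectedSource);
    }
}
```

Passing the stages in as functions keeps the language-independent method stage separate from the parser and the code generator, which is the separation the reasons above argue for.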
In the following sections, the automation technique is described for two methods. First, a brief
description of the refactoring technique is given. Second, a semi-formal definition of the technique
is presented. This is followed by a definition of the inputs from the analysis and a set of rules for
the automation method. Finally, the output records sent to the code module are defined.
Extract Method
The first refactoring technique chosen for automation is Extract Method. The intent of the technique
is to examine the code in a class and restructure any long segments identified as having high
"semantic distance" [2]. Applying the technique breaks such long, low-cohesion segments into one or
more small, highly cohesive code segments. The mechanics require creating new methods, function
headers, parameter lists, declarations for any orphaned variables, return statements, and suitable
calls to the new methods. The intent is to extract these small, cohesive segments safely. This
refactoring requires recompilation and retesting after it is complete to ensure correctness.
³ For example, in Fowler's Java examples, extracted methods have private visibility [2]. This can be
assumed for Extract Method in Java. Other languages would require different identifiers, syntax or
approaches.
Formal Description
A formal description of Extract Method is given below:
fn(pm) { p c s } ⇒ fn(pm) { p CALL s } FN(PM) { DECL c RTRN }
where c must form a block, i.e. the following must parse:
fn(pm) { p { c } s }
The variables represent the following code elements:
fn is the function header of the original method, FN that of the new method
pm is the parameter list of the old method, PM that of the new method
c is the code to be extracted
p is the code in the body of fn before (prefix to) c
s is the code in the body of fn after (suffix to) c
DECL is the set of needed declarations
RTRN is a new return statement, or null
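The split of a method body into p, c and s, together with the requirement that c form a block, can be sketched as follows. This is an assumption-laden stand-in: a real tool would reparse fn(pm) { p { c } s } with the language grammar, whereas this sketch (hypothetical names Split and ExtractSplit, with the selection markers given as character offsets) only checks that the braces inside c balance, and it ignores braces in string literals and comments.

```java
// Hypothetical result of splitting a method body around the selection markers.
record Split(String p, String c, String s) {}

final class ExtractSplit {
    // start and end are hypothetical selection-marker offsets into the method body.
    static Split split(String body, int start, int end) {
        if (start < 0 || end > body.length() || start > end) {
            throw new IllegalArgumentException("selection out of range");
        }
        String c = body.substring(start, end);
        if (!isBalancedBlock(c)) {
            // Plays the role of the parse of fn(pm) { p { c } s } failing.
            throw new IllegalArgumentException("selection is not a block: " + c);
        }
        return new Split(body.substring(0, start), c, body.substring(end));
    }

    // True if the braces in c balance and c never closes a block opened in p.
    // Braces inside string literals or comments are not handled.
    static boolean isBalancedBlock(String c) {
        int depth = 0;
        for (char ch : c.toCharArray()) {
            if (ch == '{') {
                depth++;
            } else if (ch == '}' && --depth < 0) {
                return false;
            }
        }
        return depth == 0; // no block left open into s
    }
}
```

A selection that ends inside a block, such as one stopping after "if (a > 0) {", is rejected, just as a misplaced marker makes the block parse fail.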
Output of Analysis
The analysis must find the variables defined and used in the code segments pm, p and c. The parser
locates these variables, giving the facts below:
DF(pm), DF(p), DF(c)
where DF(x) = { (t v) | v is defined in x and has type t }
LV(p), LV(c)
where LV(x) = { v | v appears as an l-value⁴ in x }
RV(p), RV(c)
where RV(x) = { v | v appears as an r-value in x }
Parsing c as a block establishes that the code segment c is syntactically well-formed; incorrect
placement of the selection markers would cause the parse to fail. It also ensures that no variable
defined in c is used in s, since such a use would be outside the scope of the block.
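To make the DF, LV and RV facts concrete, the sketch below computes them over a deliberately simplified statement model. The Stmt record and FactExtractor class are hypothetical stand-ins for the yacc-generated parser; each statement is assumed to write at most one target variable (a declaration counts as a write, as in the LV(p) facts of the example) and to read a known list of variables.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simplified statement: if declType is non-null the statement declares
// target with that type; target (may be null) is written; reads are the variables read.
record Stmt(String declType, String target, List<String> reads) {}

final class FactExtractor {
    // DF(x) = { (t v) | v is defined in x and has type t }, kept as v -> t
    static Map<String, String> df(List<Stmt> x) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Stmt s : x) {
            if (s.declType() != null) out.put(s.target(), s.declType());
        }
        return out;
    }

    // LV(x) = { v | v appears as an l-value in x }
    static Set<String> lv(List<Stmt> x) {
        Set<String> out = new LinkedHashSet<>();
        for (Stmt s : x) {
            if (s.target() != null) out.add(s.target());
        }
        return out;
    }

    // RV(x) = { v | v appears as an r-value in x }
    static Set<String> rv(List<Stmt> x) {
        Set<String> out = new LinkedHashSet<>();
        for (Stmt s : x) out.addAll(s.reads());
        return out;
    }
}
```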
⁴ Recall that l-values are written and r-values are read.
We do need a minor flow analysis of p c s, carried out in rules, to determine whether an assigned
variable is an accumulator, i.e. whether its new value depends on its previous value (e.g. x += e).
Using the facts DF(s), LV(s) and RV(s), along with the block and assignment structure of the original
code, the rules check for any circular dependencies on variables in p. This analysis is needed only
for the return value. At this point, a name for the new function is obtained from user input:
FN.name = user_input()
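The accumulator test is described only informally here, so the following is a simplified sketch under stated assumptions rather than the rules actually used: a variable is treated as an accumulator when some assignment to it in c depends on its own previous value (a compound assignment such as v += e, or v on both sides) and it is declared outside the loop enclosing c. The Assign record and AccumulatorCheck class are hypothetical. The loop condition reproduces the reasoning applied to thisAmount in the example later in this section.

```java
import java.util.List;

// Hypothetical simplified assignment in c: target is written; compound is true for
// forms such as target += e; reads lists the variables read on the right-hand side.
record Assign(String target, boolean compound, List<String> reads) {}

final class AccumulatorCheck {
    // Simplified test (an assumption): v accumulates if some assignment to v in c
    // depends on v's previous value and v is declared outside the loop enclosing c,
    // so that the value survives from one iteration to the next.
    static boolean isAccumulator(String v, List<Assign> assignsInC,
                                 boolean declaredInsideEnclosingLoop) {
        if (declaredInsideEnclosingLoop) {
            return false; // re-declared, hence re-initialized, on every iteration
        }
        for (Assign a : assignsInC) {
            if (a.target().equals(v) && (a.compound() || a.reads().contains(v))) {
                return true;
            }
        }
        return false;
    }
}
```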
Rules
Given these analysis facts, the following rules carry out the method; the corresponding CLIPS rules
are given in Appendix A. The rules are evaluated in order.
1. If exactly one variable declared before c is an l-value in c (i.e. LV(c) ∩ (DF(pm) ∪ DF(p)) - DF(c)
is a singleton, taking each DF set as its set of variable names), the new function returns that
variable. Otherwise the return is null.
2. If the new function returns a value, the type of the new function is the type of the value.
3. All variables used and not assigned in c, but declared in the original function before c, must
be passed as value parameters. (Unless the variable is returned as the value of the new function.)
4. All variables assigned in c, but declared in the original function before c, must be passed as
reference parameters (unless the variable is returned as the value of the new function).⁵
5. Variables used in c but not passed must be declared in the new function.
6. If a variable var is returned and it is not an accumulator, CALL is var = FN.name(vars(PM)). If
it is an accumulator, CALL is var += FN.name(vars(PM)). Otherwise, the call is just an expression:
FN.name(vars(PM)).
Here vars(PM) denotes the list of parameter names in PM.
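Rules 1 to 6 can also be expressed directly as set operations, which may help when checking the CLIPS version in Appendix A. The sketch below is hypothetical (class ExtractMethodRules, Java 16+ records). Like the Appendix A rule RTRN, it treats the returned value as an l-value of c that is declared before c, and it assumes variable names are standardized apart. Facts are passed as maps from variable name to type and as ordered lists of names; the accumulator set is assumed to come from the flow analysis.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

final class ExtractMethodRules {
    // One PM entry; byRef distinguishes #ref from #val.
    record Param(String type, String name, boolean byRef) {}

    // The FN, PM, DECL, RTRN and CALL records; rtrn is null when nothing is returned.
    record Output(String fnName, String fnType, List<Param> pm,
                  Map<String, String> decl, String rtrn, String call) {}

    static Output apply(String fnName,
                        Map<String, String> dfBefore, // DF(pm) and DF(p): variable -> type
                        Map<String, String> dfC,      // DF(c)
                        List<String> lvC,             // LV(c)
                        List<String> rvC,             // RV(c)
                        Set<String> accumulators) {   // result of the flow analysis
        // Rule 1: l-values of c declared before c; a single one is returned.
        List<String> assigned = new ArrayList<>();
        for (String v : lvC) {
            if (dfBefore.containsKey(v) && !dfC.containsKey(v)) assigned.add(v);
        }
        String rtrn = assigned.size() == 1 ? assigned.get(0) : null;

        // Rule 2: the function type is the type of the returned value, else void.
        String fnType = rtrn == null ? "void" : dfBefore.get(rtrn);

        // Rule 3: variables read but not assigned in c are passed by value.
        List<Param> pm = new ArrayList<>();
        for (String v : rvC) {
            if (dfBefore.containsKey(v) && !lvC.contains(v) && !v.equals(rtrn)) {
                pm.add(new Param(dfBefore.get(v), v, false));
            }
        }
        // Rule 4: the other assigned variables are passed by reference.
        for (String v : assigned) {
            if (!v.equals(rtrn)) pm.add(new Param(dfBefore.get(v), v, true));
        }

        // Rule 5: variables used in c, declared outside c and not passed are declared.
        Set<String> passed = new HashSet<>();
        for (Param p : pm) passed.add(p.name());
        Map<String, String> decl = new LinkedHashMap<>();
        List<String> used = new ArrayList<>(lvC);
        used.addAll(rvC);
        for (String v : used) {
            if (dfBefore.containsKey(v) && !passed.contains(v)) decl.put(v, dfBefore.get(v));
        }

        // Rule 6: build the call from FN.name and vars(PM).
        StringJoiner args = new StringJoiner(", ", fnName + "(", ")");
        for (Param p : pm) args.add(p.name());
        String call = rtrn == null ? args.toString()
                : rtrn + (accumulators.contains(rtrn) ? " += " : " = ") + args;
        return new Output(fnName, fnType, pm, decl, rtrn, call);
    }
}
```

For the example that follows, with the added assumption that thisAmount is declared as a double in p (the text notes it is defined within the while loop), these rules yield the same content as the CLIPS output: a double function returning thisAmount, one value parameter Rental each, a declaration of thisAmount, and the call thisAmount = amountFor(each).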
Outputs
These rules produce the following output records for input to the code rewriting module:
FN = name type
CALL = (assign | reassign | nil) (var | NULL)
PM = (type var (#val | #ref))*
RTRN = var | NULL
DECL = (type var)*
⁵ Note that some languages (e.g. Java) would reject this. We leave this check to the language-specific
rewrite module.
;; parse analysis
(DF pm )
(DF p double totalAmount int frequentRenterPoints Enumeration rentals
string result Rental each )
(DF c )
(LV p totalAmount frequentRenterPoints rentals result thisAmount each)
(LV c thisAmount)
(RV c each Movie)
;; user input
(FN amountFor )
Figure 2 - Analysis facts from the parse of the code in [2, pp. 10-11]
Example
We use an example from Fowler's text [2]. Figure 2 shows the analysis facts for the Extract Method
refactoring technique as represented in CLIPS. The facts state that the original routine has no
parameters, that the variables totalAmount, frequentRenterPoints, rentals, result and each are defined
in p, and that no variables are defined in the code segment to extract (c). They also show that p
assigns values to totalAmount, frequentRenterPoints, rentals, result, thisAmount and each, whereas the
extracted segment c assigns only thisAmount. The segment c references each and Movie. Moreover,
thisAmount is not an accumulator, since it is defined within the scope of the while loop containing
the assignment.
f-13 (RTRN thisAmount)
f-14 (FN name amountFor type double)
f-15 (PM Rental each #val)
f-16 (DECL double thisAmount)
f-17 (CALL assign thisAmount)
Figure 3 - Output from CLIPS rules
The output of the CLIPS rules, shown in Figure 3, gives the content of the actual code to be
generated and subsequently edited into the program text. Although code generation is not the focus
here, the resulting generated code is shown to relate the output records more clearly to the example
in the text. To investigate the automation of these techniques further, the Move Method refactoring
is evaluated next.
RTRN = "return thisAmount;"
FN   = "private double amountFor"
PM   = "Rental each"
DECL = "double thisAmount;"
CALL = "thisAmount = amountFor(each);"
C    = "private double amountFor(Rental each) {
           double thisAmount;
           // original code here
           return thisAmount;
       }"
Figure 4 - Edit strings and final code derived from CLIPS output
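A Java-specific code module could turn the CLIPS output records into the edit strings above roughly as follows. This is a hypothetical sketch (class JavaCodeModule); it assumes private visibility for extracted methods, following Fowler's Java examples, keys each edit string by the record it comes from, and leaves an empty line in the assembled method when there is no declaration or return.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

final class JavaCodeModule {
    record Param(String type, String name) {}

    // Renders the FN, PM, DECL, RTRN and CALL records as Java edit strings.
    // Assumes private visibility, as in Fowler's Java examples.
    static Map<String, String> render(String name, String type, List<Param> pm,
                                      Map<String, String> decl, String rtrn,
                                      boolean accumulate) {
        StringJoiner params = new StringJoiner(", ");
        StringJoiner args = new StringJoiner(", ");
        for (Param p : pm) {
            params.add(p.type() + " " + p.name());
            args.add(p.name());
        }
        StringJoiner decls = new StringJoiner(" ");
        decl.forEach((v, t) -> decls.add(t + " " + v + ";"));

        String call = name + "(" + args + ");";
        Map<String, String> edits = new LinkedHashMap<>();
        edits.put("RTRN", rtrn == null ? "" : "return " + rtrn + ";");
        edits.put("FN", "private " + type + " " + name);
        edits.put("PM", params.toString());
        edits.put("DECL", decls.toString());
        edits.put("CALL", rtrn == null ? call
                : rtrn + (accumulate ? " += " : " = ") + call);
        return edits;
    }

    // Assembles the new method; originalCode stands for the extracted segment c.
    static String method(Map<String, String> e, String originalCode) {
        return e.get("FN") + "(" + e.get("PM") + ") {\n"
                + "    " + e.get("DECL") + "\n"
                + "    " + originalCode + "\n"
                + "    " + e.get("RTRN") + "\n"
                + "}";
    }
}
```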
Move Method
The Move Method refactoring technique locates a method that would be better placed in another class
because of high coupling [2] between the method and that class. It moves the method from one class to
the other.
Formal Description
Let A be the current class of the method fn, and B the class that is eventually to hold the method.
Thus we have:⁶
class A { x fn(y B b z) { c } w };
// x (w) is all the code before (after) method fn
// y (z) are the parameters before (after) parameter b
class B { r };
// r is all the code in the body of B
⁶ We avoid here the complexity of B being a subtype of A, as discussed in [2, pp. 143-145].
The transformation target (after all steps are carried out) can be described as:
class A { x w };
// old method 'fn(y B b z) { c }' removed
class B { r FN(y z) { C } };
// new function added to the end of B (after r)
Program | "fn(y B b z)" / "b.FN(y z)"
// all calls to fn replaced by calls to FN on b
In Fowler's refactoring technique, this is accomplished in two steps. First, a new method FN is
created in B, and the old method fn in A delegates to it. This requires creating a CALL to replace c,
moving c to B, and 'fixing' c to give C. The step 1 transformation requires the following:
FN = fn
// Fowler also renames at this point; we have not
PM = y z
// the new parameters omit the B object b
C = c | b.str ⇒ self.str
// replace references through the parameter with self references⁷
CALL = b.FN(y z)
// delegate to the new method
This gives the code of Figure 5, which is then compiled and tested.
class A { x fn(y B b z) { b.FN(y z); } w };
class B { r FN(y z) { C } };
Figure 5 - Code after step 1, delegating function
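Step 1 can be sketched for Java, where the self reference is written this. The class MoveMethodStep1 and its methods are hypothetical. fixBody performs the substitution b.str ⇒ this.str with a regular expression, so it assumes b is not shadowed and does not skip string literals or comments; delegateCall builds the body left behind in A, prefixing return when fn returns a value (an assumption the formal description leaves implicit).

```java
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Pattern;

final class MoveMethodStep1 {
    // C = c | b.str -> this.str: references through parameter b become self
    // references. A use is rewritten only when b is not part of a longer name
    // and is not itself qualified (the lookbehind rejects a preceding word char or '.').
    static String fixBody(String c, String b) {
        Pattern ref = Pattern.compile("(?<![\\w.])" + Pattern.quote(b) + "\\.");
        return ref.matcher(c).replaceAll("this.");
    }

    // CALL = b.FN(y z): the body of the delegating method that stays in class A.
    static String delegateCall(String b, String fnName, List<String> otherParams,
                               boolean returnsValue) {
        StringJoiner call = new StringJoiner(", ", b + "." + fnName + "(", ");");
        otherParams.forEach(call::add);
        return (returnsValue ? "return " : "") + call;
    }
}
```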
In the second step, the references are modified throughout the program, and the original function is
removed if it was private or if we want to change the interface of the class. If the method is
private:
class A … | fn(x?X b?B y?Y) ⇒ b.FN(x y);
fn(y B b z) { CALL } ⇒
This states that calls to fn within class A must be replaced by calls to FN on b (when their
arguments have the same class signature as x, b and y), and that the original method is removed (the
second rule rewrites it to nothing).
⁷ A more thorough analysis could determine whether a simple variable reference would be adequate,
e.g. for a getter or setter method.
If the method is public and we can change the interface, the transformation below is necessary:
Program | ?A.fn(x ?b y) ⇒ ?b.FN(x y)
// replace all calls on class A objects by calls to FN on ?b
As always, we then finish the method by compiling and testing.
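Both call-site transformations of step 2 (the private case within class A and the public case throughout the program) can be approximated textually. The sketch below (hypothetical class MoveMethodStep2) rewrites fn(a0, ..., ak, ...) and recv.fn(...) into ak.FN(remaining arguments), where k is the position of the B-typed parameter. It does no type checking, so it stands in for the ?A and ?b pattern variables only under the assumptions that receivers are simple names, that the declaration of fn has already been removed, that calls to fn are not nested inside the arguments of another call to fn, and that no parentheses or commas appear inside string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MoveMethodStep2 {
    // Rewrites fn(...) and recv.fn(...) into ak.newName(other args), where
    // k = bIndex is the position of the B-typed argument. No type checking.
    static String rewriteCalls(String src, String fn, String newName, int bIndex) {
        Pattern call = Pattern.compile("(?:\\b\\w+\\.)?\\b" + Pattern.quote(fn) + "\\s*\\(");
        Matcher m = call.matcher(src);
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (m.find(pos)) {
            int open = m.end() - 1;                 // index of the '(' after fn
            int close = matchingParen(src, open);
            List<String> args = splitArgs(src.substring(open + 1, close));
            String b = args.remove(bIndex);         // the object that will own FN
            out.append(src, pos, m.start())
               .append(b).append('.').append(newName)
               .append('(').append(String.join(", ", args)).append(')');
            pos = close + 1;
        }
        return out.append(src.substring(pos)).toString();
    }

    // Index of the ')' that closes the '(' at openIdx.
    private static int matchingParen(String s, int openIdx) {
        int depth = 0;
        for (int i = openIdx; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '(') {
                depth++;
            } else if (ch == ')' && --depth == 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("unbalanced call at " + openIdx);
    }

    // Splits an argument list at top-level commas (nested calls stay intact).
    private static List<String> splitArgs(String s) {
        List<String> args = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '(') {
                depth++;
            } else if (ch == ')') {
                depth--;
            } else if (ch == ',' && depth == 0) {
                args.add(s.substring(start, i).trim());
                start = i + 1;
            }
        }
        if (!s.isBlank()) args.add(s.substring(start).trim());
        return args;
    }
}
```

For instance, with bIndex 1, fn(x, rental, y) becomes rental.FN(x, y), and a.fn(1, r2, g(2, 3)) becomes r2.FN(1, g(2, 3)).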
Conclusion
These two simple examples show how refactoring techniques can be automated. Since refactoring is
labor intensive, it is important to investigate the promise of automating other refactoring
techniques. This research establishes a workable foundation for that investigation and the
subsequent automation. The ability to specify a refactoring formally, establish its automation rules,
and define the mechanisms that carry them out is especially useful to the development community.
More research is necessary to determine how to select specific techniques, and more work is needed
to define how to apply the automated solutions in a knowledgeable manner.
References
[1] Yourdon, Edward, and Constantine, Larry, Structured Design, Englewood Cliffs, NJ: Prentice-Hall,
1978.
[2] Fowler, Martin, Refactoring: Improving the Design of Existing Code, Addison-Wesley, 2000.
[JAVA] Bronnikov, Dmitri, Free Yacc-able Java(tm) grammar, 1998,
http://home.inreach.com/bronikov/grammars/java.html (accessed 7/13/01).
Appendix A – Example CLIPS rules for Extract Method
;; ===== transformation rules
;; ---- determine returned values
(defrule RTRN
  (declare (salience 600))        ; enforce rule order (higher salience fires first)
  (LV c $? ?v $?)                 ; match some l-value in c
  (DF p $? ?v $?)                 ; match some variable defined in p
  ?f <- (RTRN $?vs)
  (test (not (member$ ?v $?vs)))
  =>
  (retract ?f)
  (assert (RTRN ?v $?vs)))
;; ---- determine function type
(defrule FN-type
  (declare (salience 500))
  ?f <- (FN name ?n type void)
  (RTRN ?v)                       ; exactly one returned value
  (DF ~c $? ?typ ?v $?)           ; its declared type
  =>
  (retract ?f)
  (assert (FN name ?n type ?typ)))
;; ----- find parameters
(defrule parameters1 "pass variables read but not assigned in c by value"
  (declare (salience 400))
  (RV c $? ?v $?)
  (DF ~c $? ?typ ?v $?)
  (not (LV c $? ?v $?))
  (not (RTRN $? ?v $?))
  ?f <- (PM $?ps)
  (test (not (member$ ?v $?ps)))
  =>
  (retract ?f)
  (assert (PM $?ps ?typ ?v #val)))
(defrule parameters2 "pass assigned variables by reference when several values are assigned"
  (declare (salience 400))
  (LV c $? ?v $?)
  (DF ~c $? ?typ ?v $?)
  (or (RTRN $? ? ?v $?)
      (RTRN $? ?v ? $?))
  ?f <- (PM $?ps)
  (test (not (member$ ?v $?ps)))
  =>
  (retract ?f)
  (assert (PM $?ps ?typ ?v #ref)))
;; ---- create necessary declarations
(defrule declarations
  (declare (salience 300))
  (or (LV c $? ?v $?)
      (RV c $? ?v $?))
  (DF ~c $? ?typ ?v $?)
  (not (PM $? ?v $?))
  ?f <- (DECL $?ds)
  (test (not (member$ ?v $?ds)))
  =>
  (retract ?f)
  (assert (DECL ?typ ?v $?ds)))
;; --- determine the call statement
(defrule call-statement1a "single value returned, not accumulated"
  (declare (salience 200))
  (RTRN ?v)
  (isNotAccumulated $? ?v $?)
  ?f <- (CALL)
  =>
  (retract ?f)
  (assert (CALL assign ?v)))
(defrule call-statement1b "single value returned, accumulated"
  (declare (salience 200))
  (RTRN ?v)
  (isAccumulated $? ?v $?)
  ?f <- (CALL)
  =>
  (retract ?f)
  (assert (CALL reassign ?v)))
(defrule call-statement2 "multiple values returned"
  (declare (salience 200))
  (RTRN ? ? $?)
  (FN name ?n $?)
  ?f <- (CALL)
  =>
  (retract ?f)
  (assert (CALL ?n of)))