An Ordinal Metric for Intra-Method Class Cohesion
Frank Tsui, Challa Bonja, Sheryl Duggins, Orlando Karam
Southern Polytechnic State University
1100 S. Marietta Pkwy, Marietta, Georgia, USA
ABSTRACT
Cohesion has been studied by numerous researchers, and linked to
good quality. More recently, the study of cohesion has extended
to object oriented design and programming. Many of the cohesion
metrics in object oriented design address the cohesion of a class
through tracking the interactions of methods within the class and
the accessing of instance variables in the class and may be
classified as inter-method cohesion metrics of a class. The
individual methods within the class need also be cohesive. That is,
intra-method cohesion is also important.
In this paper, we propose an extension to the Bieman and Ott
(1994) metrics and the Bieman and Kang (1995, 1998) metrics,
which are based on the notion of data slices and may be used for
either inter-method or intra-method cohesion. We also propose an
intra-method cohesion metric, ITRA-C, which covers a broader set
of situations and is in turn based on the attributes of Effect and
Proximity. We also investigate the positioning of the metrics of
Effect Indicator, Proximity Indicator, and ITRA-C at different
metric scale levels from measurement theory and show that
ITRA-C is at the ordinal scale level.
Keywords
cohesion metrics, inter-method cohesion, intra-method cohesion.
1. INTRODUCTION
Cohesion has been studied by numerous researchers. This
has been a topic of interest since structured design (Stevens,
et al, 1974; Yourdon & Constantine, 1979). Cohesion has
also been linked to good quality (Bansiya & Davis, 2002;
Basili, et al, 1996). More recently, the study of cohesion
has extended to the object oriented design and
programming environment (Bansiya & Davis, 2002; Bonja
& Kidanmarian, 2006; Briand, et al, 1998; Chae, et al,
2004; Chidamber & Kemerer, 1994; Counsel, et al, 2006;
Henderson-Sellers, 1996; Hitz & Montazeri, 1995; Zhou, et
al, 2004). A large number of the cohesion metrics in object
oriented design address the cohesion of a class through
tracking the interactions of methods within the class and the
accessing of instance variables in the class. These metrics,
in a sense, address the coupling among methods within a
class. According to these metrics, the more coupled the
methods are within a class, through instance variables or
through messaging each other, the more cohesive is the
class. Thus, these metrics may be classified as inter-method
cohesion metrics of a class.
The individual methods within the class need also be
cohesive. That is, intra-method cohesion is also important.
Furthermore, inter-method and intra-method cohesion should
be considered together for class cohesion; that combination,
however, is a topic for another paper. In this paper, we
propose an extension to the Bieman and Ott (1994) metrics
and the Bieman and Kang (1995, 1998) metrics, which are
based on the notion of data slices. Within a class, each
method's cohesion is measured individually and separately
with an intra-method cohesion metric. Our main contribution
is the development of an ordinal intra-method cohesion
metric, ITRA-C, which covers a broader set of situations and
is in turn based on the attributes of Effect and Proximity. We
also investigate the positioning of the Effect Indicator, the
Proximity Indicator, and ITRA-C at one of the four metric
scale levels (nominal, ordinal, interval and ratio) from
measurement theory (Fenton & Pfleeger, 1997; Kitchenham,
et al, 1995) and show that ITRA-C is at the ordinal scale
level.
2. INTRA-METHOD COHESION
Most of the cohesion metrics for object-oriented classes
address the inter-method cohesion among the methods
within a class, counting the access of instance variables or
messaging activities among methods (Bonja & Kidanmariam,
2006; Briand, et al, 1998; Chidamber & Kemerer, 1994). In
this section, we discuss the notion of cohesion within each
individual method as opposed to cohesion within a class.
The cohesion of each method may be viewed as a micro-level of the inter-method cohesion in that we can analyze
the structural relationships of the data to the operations and
relationships among the operations. Thus, for intra-method
cohesion, we believe that each method should be viewed
from the perspective of relatedness of the operations and
data to achieve a single functionality. The key is the phrase
“single functionality.” For this we may consider reverting to
the early, traditional definition of cohesion in terms of
levels of cohesion, from coincidental cohesion to functional
cohesion (Yourdon & Constantine, 1979).
The problem is that there is no clear and simple way to
numerically measure intra-method cohesion when it is
defined through a categorization of cohesion levels, from
coincidental to functional, that is ordered by fiat of the
definition. In that definition, only the best level, functional
cohesion, has one function. All other levels may have
multiple functions, and the manner in which the multiple
functions operate determines the level of cohesion.
One way is to naively assign the functional level the value
7/7, the sequential level 6/7, the communicational level 5/7,
and so on down to the worst case, the coincidental level,
which takes the value 1/7. This primitive, numerical
approach incorrectly assumes that each level differs from
the next by exactly the same amount.
Furthermore, there is no differentiation by the number of
functions performed at different levels. Consider the
situation where one method performs 5 functions at the
procedural level and another method performs 2 functions
at the logical level. According to the early cohesion metric
by level, the one with 5 functions at the procedural level
would be 4/7, and the one with 2 functions at the logical
level would be 2/7. Thus, this metric serves only as a
guideline and is quite limited in its utility, because it is not
clear that such a metric is even at the ordinal scale level
(Fenton & Pfleeger, 1997).
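To make this criticism concrete, the naive level-to-number assignment can be sketched as follows. This is our own illustration, not a metric proposed by any of the cited authors; the enum and the naiveScore helper are hypothetical names.

```java
// Sketch of the naive level-to-number assignment criticized above
// (our illustration). The seven classic levels are simply mapped to
// k/7, which wrongly treats adjacent levels as equally spaced and
// ignores how many functions a method performs.
public class NaiveLevels {
    enum Level { COINCIDENTAL, LOGICAL, TEMPORAL, PROCEDURAL,
                 COMMUNICATIONAL, SEQUENTIAL, FUNCTIONAL }

    static double naiveScore(Level level) {
        return (level.ordinal() + 1) / 7.0;
    }

    public static void main(String[] args) {
        // A method with 5 functions at the procedural level still scores
        // 4/7, while one with only 2 functions at the logical level
        // scores 2/7 -- the count of functions never enters the metric.
        System.out.println(NaiveLevels.naiveScore(Level.PROCEDURAL)); // 4/7
        System.out.println(NaiveLevels.naiveScore(Level.LOGICAL));    // 2/7
    }
}
```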
Bieman and Ott (1994) have suggested three metrics, based
on data slicing, to measure cohesion: Strong Functional
Cohesion, SFC, Weak Functional Cohesion, WFC, and
adhesiveness, A. Perhaps, a better alternative to the levels
of cohesion is to consider these metrics based on data slices
for intra-method cohesion. SFC is defined as the ratio of
super glue-tokens to total number of data tokens, and WFC
is defined as the ratio of glue tokens to the total number of
data tokens. The adhesiveness of a data token, t, in a
procedure is defined as the ratio of the number of slices that
contain t to the total number of slices in the procedure. If the
method contains only one function, then every data token
will reside in that one function's slice, and the adhesiveness
of each token is defined to be one. The average adhesiveness of all
the data tokens in the method is defined as A(m) = [ Σ A(ti)
] / |t|, where A(ti) is the adhesiveness of data token ti and |t|
is the cardinality of the set of data tokens in the method. For
the functional level where the method is performing only
one function, A(m) would be 1. For the other situations,
where more than one function is performed in the
method, A(m) may be less than 1. It provides a metric that
would address cohesion of the method in terms of the
adhesiveness of the data tokens or the connectedness of
these functions through the data tokens. The adhesiveness
metric does not differentiate the intra-method cohesion by
pre-defined levels. So it is possible to have a numerical
adhesiveness metric for intra-method cohesion that is the
same for two different levels of cohesion such as sequential
cohesion and communicational cohesion. The nature of the
functionality which differentiated the cohesion level in the
traditional, pre-ordered, cohesion categorization is not part
of metric of cohesion when measured through adhesiveness
of data tokens or the other two (SFC and WFC) metrics.
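As a sketch of how adhesiveness could be computed once the slices of a method have been identified (slice extraction itself is outside this sketch, and the class and method names are our own, not code from Bieman and Ott):

```java
// Sketch (our illustration): compute the adhesiveness A(m) of a method
// given a precomputed token-by-slice membership table. Row i answers:
// does data token i occur in slice j?
public class Adhesiveness {

    // A(ti) = (number of slices containing token ti) / (total slices)
    static double tokenAdhesiveness(boolean[] tokenInSlices) {
        int inSlices = 0;
        for (boolean b : tokenInSlices) if (b) inSlices++;
        return (double) inSlices / tokenInSlices.length;
    }

    // A(m) = [ sum of A(ti) ] / |t|, the average over all data tokens
    static double methodAdhesiveness(boolean[][] membership) {
        double sum = 0.0;
        for (boolean[] row : membership) sum += tokenAdhesiveness(row);
        return sum / membership.length;
    }

    public static void main(String[] args) {
        // Two slices, three tokens: token 0 is glue (in both slices),
        // tokens 1 and 2 each belong to a single slice.
        boolean[][] membership = {
            {true, true},   // A = 2/2 = 1.0
            {true, false},  // A = 1/2 = 0.5
            {false, true},  // A = 1/2 = 0.5
        };
        System.out.println(Adhesiveness.methodAdhesiveness(membership)); // (1 + .5 + .5)/3
    }
}
```

In the single-function case every row is all true, every A(ti) is 1, and A(m) is 1, matching the definition above.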
We propose a variation of Bieman and Ott's data-slice
metrics as a metric for intra-method cohesion. We will
expand the notion of "output" in Bieman and Ott to a
broader set of situations. The intra-method cohesion metric
should take into account two characteristics of a method:
- the effect of the functionality in the method, and
- the chaining within the functionality.
3. EFFECT AND EFFECT INDICATOR
The “Effect” of the functionality is a defined set of
observables. Once these are defined, then it is a much easier
characteristic to observe than general functionality. We
define the set of effects as characterized by the following
specific activities over a variable:
1. Printing, displaying, or writing of a variable,
2. Returning a value of a variable, and
3. Storing of a variable.
We will discuss these three types of effects. 1) Printing,
displaying, or writing a variable is often, though not always,
the culmination of some specific set of activities and
indicates a function is completed. Thus, tracing the slice of
code that resulted in the printing of that variable would
provide us a hint of the cohesiveness. The notion of code
slice here is the same as that provided by Weiser (1981). 2)
Similarly, returning a value of a variable implies the
completion of some functionality. However, this is a more
difficult Effect in that the return variable may not allow us
to perform a trace of the functionality. It may be the
situation where the particular method’s major activity is to
perform a synchronization activity or a sorting activity on
an instance variable array. The return value is just a success
indicator of that activity. Thus, tracing the slice of code
from the return value, in this situation, may not provide us
with a view of the real functionality. In the more traditional
case where the return value is usually the variable that
contains the result of some functionality, tracing the slice of
code from the return value would give us an idea of the
cohesiveness. 3) The final storing of a variable may be
accompanied by a previous retrieval of that variable.
This pair of retrieve/store activities often represents an
updating function. The slice of code between the retrieving
and the storing would represent the functionality, such as
sorting an array variable, performed for updating the
variable. A simple, perhaps trivial, example is the
constructor method with input parameters. The storing of a
variable without the retrieve part would imply storing the
variable after completion of some functionality. The final
storing of variable is similar to the effect of printing and
writing.
Besides the type of Effect, the number of these Effects in a
method should be taken into account. More Effects in a
method represent more functionality and potential
diversity in functionalities. Also, the number of variables
involved in the slice of code that produced the Effect
provides an indication of the size and diversity of
functionalities involved in the resulting Effect. We define
the Effect Indicator, EI, as follows:
EI = Σi Σj V(i, j), summed over all Effect slices i and all variables j,
where V(i,j) represents the jth variable in the ith Effect
code slice. The Effect Indicator actually measures the
number of variables involved in the various Effects in a
method. Thus, the larger the Effect Indicator, the less
cohesive the method. We will use the reciprocal of EI
and define Effect quantitatively as:
Effect, E = 1/EI.
The Effect metric is equal to 1 when there is only one
Effect in the method and also when the slice of code
associated with that one Effect involves only one variable.
As the number of Effects and the number of variables
involved in each Effect increase, EI increases and E
decreases. Thus E varies from 1, the best case, toward 0,
the latter representing a large number of Effects and
variables. Note that the Effect Indicator is a count of the
number of variables in the Effect code slices; it is a metric at
the interval scale level, which is higher than the ordinal scale level.
Thus the metric for Effect matches well with the intuitive
notion of “single functionality” involving a single variable
for strong cohesion to “multiple functionalities” that
involve several variables for weaker cohesion.
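Assuming the Effect slices and their variables have already been extracted, EI and E reduce to simple counting. The following sketch uses hypothetical names of our own; the example values come from the deposit ( ) analysis in Section 6.

```java
// Sketch (our illustration): compute the Effect Indicator EI and the
// Effect metric E = 1/EI. We assume slice extraction has already been
// done and each Effect slice is given as its list of variable names.
import java.util.List;

public class EffectMetric {

    // EI = sum over Effect slices of the number of variables in that slice
    static int effectIndicator(List<List<String>> effectSlices) {
        int ei = 0;
        for (List<String> vars : effectSlices) ei += vars.size();
        return ei;
    }

    static double effect(List<List<String>> effectSlices) {
        return 1.0 / effectIndicator(effectSlices);
    }

    public static void main(String[] args) {
        // The deposit() example from Section 6: a printing Effect over
        // {acctNumber, amount, fmt} and a return Effect over {balance, amount}.
        List<List<String>> slices = List.of(
            List.of("acctNumber", "amount", "fmt"),
            List.of("balance", "amount"));
        System.out.println(EffectMetric.effectIndicator(slices)); // 5
        System.out.println(EffectMetric.effect(slices));          // 0.2
    }
}
```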
4. CHAINING AND PROXIMITY INDICATOR
The second characteristic for intra-method cohesion is the
notion of the chain of Effect. The chaining characteristic is
also based on the slicing concepts from Weiser (1981) and
Bieman and Kang (1995, 1998). For each Effect, the slice
of code for that Effect is identified first. Then the variable
or variables that participate in the slice of code for that
Effect are traced in a chain fashion, much like the
define-usage (or d-u) paths used in program testing (Jorgensen,
2002). The length of the chain for each variable is a count of
the number of statements involved in the completion of an
Effect. Thus the chain length provides an indication of the
size of the function. In the event that the same variable
appears several times in the chain, we only trace the longest
chain for that variable. Let the Chain Length of the slice of
code traced from the variable in the Effect all the way back
to those that affect the first definition of that variable be
CL. Let span of the chain or Chain Span, CS, be all the
statements of the code, including those not in the slice,
between the variable in the Effect to the first definition or
assignment of that variable. Since both CL and CS are
metrics based on counting of statements, they are at the
interval scale level. The ratio, CL/ CS, would represent the
proximity attribute of the variable in the Effect slice. For
each variable in each effect slice there is a Proximity
Indicator PI = CL/CS. This Proximity of Effect shows how
physically spread out the variable in each Effect in the
method is. Thus, it is an indication of the physical cohesion
of the method. A method may contain more than one Effect;
thus we need to compute the PI for each variable in the
Effect slice for all the Effects in a method. The Average PI,
or API for a method is:
API = (Σ PI) / |PI|
For a method that has only one Effect, one variable in that
Effect slice and the Effect slice associated with the method
is the complete method, then CL = CS. Then PI = 1, and
API will also be 1. As API moves towards 0, it indicates
that the slice of an Effect is more physically spread out in
the method. Note that API, composed of CS and CL, is a
metric that is at the interval scale level. Thus this physical
cohesion also matches well with our intuition of cohesion,
especially from a maintenance perspective. That is, the more
spread out the Effect (the smaller the value of API), the
more difficult the method is to maintain.
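Given the (CL, CS) pair for each variable of each Effect slice, PI and API are straightforward ratios. A minimal sketch with hypothetical names; the example pairs are the five worked out for deposit ( ) in Section 6.

```java
// Sketch (our illustration): Proximity Indicator PI = CL/CS for each
// variable of each Effect slice, and API as the plain average of all
// PI values in a method. CL and CS are statement counts obtained from
// a separate slicing step, so here they are simply passed in as pairs.
public class Proximity {

    static double proximityIndicator(int chainLength, int chainSpan) {
        return (double) chainLength / chainSpan;
    }

    // API = (sum of PI values) / |PI|
    static double averagePI(int[][] clCsPairs) {
        double sum = 0.0;
        for (int[] p : clCsPairs) sum += proximityIndicator(p[0], p[1]);
        return sum / clCsPairs.length;
    }

    public static void main(String[] args) {
        // The five (CL, CS) pairs worked out for deposit() in Section 6:
        // acctNumber (2,4), amount-in-print (3,5), fmt (2,4),
        // balance (2,8), amount-in-return (4,7).
        int[][] pairs = {{2, 4}, {3, 5}, {2, 4}, {2, 8}, {4, 7}};
        System.out.printf("%.2f%n", Proximity.averagePI(pairs)); // 0.48
    }
}
```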
5. ITRA-C METRIC
The Intra-method Cohesion for a method, m, in an object is
defined to be the combination of Effect and Average
Proximity Indicator of that method or Intra-method
Cohesion of method
ITRA-C (m) = (w1* E) + (w2*API ),
where w1 =w2 = ½.
Certainly one may choose to have w1 and w2 be different
weighting factors. For simplicity we will use w1 = w2 = ½.
The choice of the weights and the combining of two
different attributes are what make the ITRA-C metric only
an ordinal scale metric, lower than the interval scale level.
For an object, O, which contains multiple methods, the
intra-method cohesion for the object is:
ITRA-C(O) = (Σ ITRA-C(mj)) / |m|, where |m| is the number of methods in O.
This Intra-method Cohesion (O), or ITRA-C(O), of an
object will vary from the ideal value of 1 to the worst case
of 0. The best case is that each Intra-method Cohesion (m),
or ITRA-C(m), is equal to 1. As each of the ITRA-C (m)
decreases from 1, so will the ITRA-C(O).
The combining of two sub-attributes related to cohesion at
the method level was achieved by just averaging the metrics
for these sub-attributes. In this case, the intuitive notion of
cohesion is still preserved with the averaging. The ordering
of the intra-method cohesion matches that of the intuitive
ordering of cohesion.
Note that while we can say ITRA-C(m1) > ITRA-C(m2),
we cannot pinpoint which of the sub-attributes, E or API,
or both, contributed to this relationship between m1 and m2
without looking at E of m1, E of m2, API of m1 and API of
m2 separately. Note also that ITRA-C(m1) – ITRA-C(m2) = k
does not have a clear meaning. Another pair of methods,
mj and mk, may have the same ITRA-C measurement
difference of k, yet the two pairs may differ in ITRA-C
through very different combinations of E and API. This
shows why the ITRA-C metric is not at the interval scale
level, but is only at the ordinal scale level.
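The combination step itself can be sketched directly from the definitions. The class and method names are hypothetical; the numeric inputs are the values derived for the Account class in Section 6.

```java
// Sketch (our illustration): combine E and API into ITRA-C(m) with
// equal weights, and average the per-method values into ITRA-C(O).
public class ItraC {

    static double itraC(double effect, double api) {
        return 0.5 * effect + 0.5 * api; // w1 = w2 = 1/2
    }

    static double itraCForObject(double[] methodScores) {
        double sum = 0.0;
        for (double s : methodScores) sum += s;
        return sum / methodScores.length;
    }

    public static void main(String[] args) {
        // deposit() from Section 6: E = 0.2, API = 0.48
        System.out.printf("%.2f%n", ItraC.itraC(0.2, 0.48)); // 0.34
        // Account class: the six per-method ITRA-C values from Section 6
        double[] account = {0.34, 1.0, 0.28, 0.75, 1.0, 0.625};
        System.out.printf("%.4f%n", ItraC.itraCForObject(account)); // 0.6658 (reported as .66)
    }
}
```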
6. EXAMPLES
In this section, we will explore a Class sample. This is not a
sample that we have artificially developed but one that we
purposely obtained from another source. The sample code
is from Lewis and Loftus's textbook (2001) on Java
programming. The Class is a bank account Class with seven
methods, including the constructor which initializes the
Class. The Class, without the constructor, is shown in
Figure 1. There is an instance variable, fmt, of a
NumberFormat Class type. The formatting method from
this object, fmt, will be used by other methods in Account
Class for printing purposes. There is an instance variable,
RATE, which is set to a constant, and it is used by one
method in Account Class to compute the new balance with
interest. There are three other instance variables,
acctNumber, balance, and name used by multiple methods
in Account Class. The constructor method, in this case,
allows the initialization of the three instance variables:
acctNumber, balance and name.
Figure 1: Account Class
The first method is the deposit method, which accepts a
deposit amount parameter. It verifies that the amount is
non-negative. If it is negative, an error message with the
acctNumber and the amount is sent out. Otherwise, a new
account balance is computed with the new deposit amount,
and the new balance is returned. So this method uses the
instance variables, balance and acctNumber, and a local
variable, amount. It also uses a formatting method from the
instance variable, fmt.
The second method accepts two parameters, the withdraw
amount and the fee attached to the withdraw process. The
withdraw amount and the withdraw fee are added together to
form a total withdraw amount. If either the total withdraw
amount is negative or the total withdraw amount is
greater than the current bank balance, an error message with
the acctNumber, withdraw amount, and the balance is sent
out. Otherwise, the bank balance is debited by the total
withdraw amount, and the new balance is returned. This
method uses the instance variables, balance and
acctNumber, and two local variables, amount and fee. It
also uses the instance variable, fmt, for printing.
The addInterest method takes the current balance, adds the
interest computed with RATE to the balance, and returns
the new balance. It uses the instance variable, balance, and
the instance variable RATE, which is a constant.
Both getBalance and getAccountNumber are access
methods for the private instance variables balance and
acctNumber, respectively. They each use the instance
variables, balance and acctNumber, respectively, and
nothing else. Lastly, the toString method returns the format
to be used when an Account object is a parameter in a
println instruction. It uses all three instance variables,
acctNumber, balance and name. It also uses the instance
variable, fmt, for formatting.
To illustrate the ITRA-C(O) metric, we will show the code
sample of the deposit ( ) method in Figure 2 from the
Account Class in Figure 1.
Figure 2: deposit method
We start with the Effect Indicator, EI, which is the sum of
all the variables involved in the effects of the method.
There are two Effects in deposit ( ) method: printing Effect
and returning-value Effect. The slice of code included in the
printing Effect includes three variables, acctNumber,
amount, and special instance variable fmt. The slice of code
included in the return-value Effect includes two variables:
balance and amount. Thus EI = 3 + 2 = 5. The Effect
metric, E, is defined as the reciprocal of EI, E = 1/EI = 1/5
= .2.
Next we analyze the Proximity Indicator, PI, which is in
turn defined as the ratio of Chain Length, CL, and Chain
Span, CS. There are two Effects. We need to take each
variable in each of the effects separately. First let us address
the print Effect, which includes three variables:
acctNumber, amount and fmt. For the variable,
acctNumber, in print Effect, the CL includes two
statements: “System.out.println” statement that included
acctNumber and the “if” statement which influenced the
decision on printing. So CL = 2. For CS, all the statements
from the printing of acctNumber to the “if” statement are
included. So CS = 4. Thus for acctNumber, PI = CL/CS =
2/4 = .5. The variable, amount, is in the same print
statement as the acctNumber. However, the variable,
amount, is passed to the deposit ( ) method as a parameter.
Thus the CL will include the deposit method signature
statement and will be three statements. The CS for the
variable, amount, will be five. The PI for variable, amount,
is CL/CS = 3/5 = .6. For the special instance variable, fmt,
which is used for formatting the variable amount, CL
includes the print statement and the "if" statement. The CS
for fmt includes all the statements from its print statement
to the “if” statement. So CS for fmt is 4. For the variable,
fmt, PI = CL/CS = 2/4 = .5. Next we address the return
Effect. There are two variables, balance and amount, that
need to be analyzed for the return effect. The CL for the
variable balance includes 2 statements, the return statement
itself and the defining of new balance statement. CS, in this
case, contains all eight statements in the method. Thus PI =
2/8 = .25 for variable, balance. For the variable, amount, in
the return Effect slice, we include the computation of the
new balance statement, the “else” statement, the “if”
statement, and the deposit( ) method signature statement for
its CL. So CL = 4. The CS for the variable, amount,
includes everything except the return statement; thus CS = 7.
For the variable, amount, in the return Effect,
PI = 4/7 = .57. The average PI, or API, for the deposit ( )
method is (.5 + .6 + .5 + .25 + .57) / 5 = .48. The Intra-method
Cohesion for deposit ( ) method would be:
ITRA-C(deposit) = (E + API)/2 = (.2 + .48)/2 = .34.
For purposes of contrast, we will show a much simpler
method, getBalance ( ), from the Account Class and
compute its Intra-method Cohesion. The sample code from
getBalance is shown below.
public double getBalance ( )
{
  return balance;
}
The Effect Indicator, EI, for getBalance ( ) is 1, and E = 1/EI
= 1. The Proximity Indicator, PI, for the return Effect has only
one variable, balance. For balance, PI = CL/CS = 1/1 = 1.
Since this is the only Effect, API = 1. Then ITRA-C
(getBalance) = (E + API)/2 = (1 + 1)/2 = 1. The intra-method
cohesion for the getBalance ( ) method is a perfect 1.
Since the Intra-method Cohesion for the class or object is
defined as the average of all the Intra-method Cohesion
values of the methods in that class, we need to perform the
same analysis for all the methods in the Account class. The
following is the Intra-method Cohesion for the remaining
methods in the Account class:
- ITRA-C for withdraw ( ) = .28
- ITRA-C for addInterest ( ) = .75
- ITRA-C for getAccountNumber ( ) = 1
- ITRA-C for toString ( ) = .625
Thus the Intra-method Cohesion for the complete Account
class = (.34 + 1 + .28 + .75 + 1 + .63)/6 = .66.
In working through the ITRA-C values for each method, one
can see that the individual ITRA-C for the methods also
provides a guide for potential re-factoring of the methods.
7. CONCLUDING REMARKS
In this paper, we have developed an ordinal-level
intra-method metric, ITRA-C, for addressing cohesion within a
class. In our study of intra-method cohesion and samples of
code, we have found that ITRA-C has been helpful in
guiding us towards improving our detailed design. At the
global level, we use the relation, ITRA-C(mi) >
ITRA-C(mj), to order all the methods within a class. Once
we have the ordering of the methods by the ITRA-C metric,
inevitably we investigate the Effect and Proximity
characteristics of the method with the lowest ITRA-C. So
far, even though ITRA-C is only at the ordinal level, we
have found it to be of value as guidance in refactoring
exercises. In the future, we would need to establish a
multi-attribute metric that is higher in the metric scale of
measurement theory.
8. REFERENCES
1. Bansiya, J. and Davis, C., (2002), "A Hierarchical Model for Object-Oriented Design Quality Assessment," IEEE Transactions on Software Engineering, Vol. 28, No. 1: 4-17.
2. Basili, V.R., Briand, L.C., and Melo, W.L., (1996), "A Validation of Object-Oriented Design Metrics as Quality Indicators," IEEE Transactions on Software Engineering, Vol. 22, No. 10: 751-761.
3. Bieman, J. and Ott, L.M., (1994), "Measuring Functional Cohesion," IEEE Transactions on Software Engineering, Vol. 20, No. 8: 644-657.
4. Bieman, J. and Kang, B.K., (1995), "Cohesion and Reuse in an Object-Oriented System," Proceedings of the Symposium on Software Reusability, Seattle, Washington, USA.
5. Bieman, J. and Kang, B.K., (1998), "Measuring Design-Level Cohesion," IEEE Transactions on Software Engineering, Vol. 24, No. 2: 111-124.
6. Bonja, C. and Kidanmariam, E., (2006), "Metrics for Class Cohesion and Similarity Between Methods," Proceedings of the 44th ACM Southeast Conference, Melbourne, Florida, USA.
7. Briand, L.C., Daly, J.W., and Wust, J., (1998), "A Unified Framework for Cohesion Measurement in Object-Oriented Systems," Empirical Software Engineering, Vol. 3, No. 1: 65-117.
8. Chae, H.S., et al, (2004), "Improving Cohesion Metrics for Classes by Considering Dependent Instance Variables," IEEE Transactions on Software Engineering, Vol. 30, No. 11: 826-832.
9. Chidamber, S.R. and Kemerer, C.F., (1994), "A Metrics Suite for Object-Oriented Design," IEEE Transactions on Software Engineering, Vol. 20, No. 6: 476-493.
10. Counsel, S., Swift, S., and Crampton, J., (2006), "The Interpretation and Utility of Three Cohesion Metrics for Object-Oriented Design," ACM Transactions on Software Engineering and Methodology, Vol. 15, No. 2: 123-149.
11. Fenton, N.E. and Pfleeger, S.L., (1997), Software Metrics: A Rigorous and Practical Approach, 2nd edition, PWS Publishing Company.
12. Henderson-Sellers, B., (1996), Object-Oriented Metrics: Measures of Complexity, Prentice Hall.
13. Hitz, M. and Montazeri, B., (1995), "Measuring Coupling and Cohesion in Object-Oriented Systems," International Symposium on Applied Corporate Computing, Monterrey, Mexico: 25-27.
14. Jorgensen, P.C., (2002), Software Testing: A Craftsman's Approach, 2nd edition, CRC Press.
15. Kitchenham, B., Pfleeger, S., and Fenton, N., (1995), "Towards a Framework for Software Measurement Validation," IEEE Transactions on Software Engineering, Vol. 21, No. 12: 929-944.
16. Lewis, J. and Loftus, W., (2001), Java Software Solutions: Foundations of Program Design, Addison-Wesley.
17. Stevens, W.P., et al, (1974), "Structured Design," IBM Systems Journal, Vol. 13, No. 2: 200-224.
18. Weiser, M., (1981), "Program Slicing," Proceedings of the 5th International Conference on Software Engineering, San Diego, California, USA: 439-449.
19. Yourdon, E. and Constantine, L., (1979), Structured Design, Prentice Hall.
20. Zhou, Y., et al, (2004), "A Comparative Study of Graph Theory-based Class Cohesion Measures," ACM SIGSOFT Software Engineering Notes, Vol. 29, No. 2: 13-13.