ON INTER-METHOD 1
ON INTER-METHOD AND INTRA-METHOD OBJECT-ORIENTED CLASS COHESION

Frank Tsui, Orlando Karam, Sheryl Duggins, Challa Bonja
School of Computing and Software Engineering
Southern Polytechnic State University
Marietta, Georgia, USA 30060

KEYWORDS: Object-Oriented Design, Software Metrics, Software Quality, Systems Evaluation

Abstract

Cohesion has been a topic of interest since the advent of structured design in the 1970s. Cohesion may also be viewed as a characterization of a system attribute. Today, numerous researchers are continuing this work in object-oriented design. Most current research has focused on the interaction of methods within a class, that is, inter-method cohesion. In this paper, we consider both the inter-method cohesion and the intra-method cohesion of a class. We use the concept of a program slice (Weiser, 1981) and extend Functional Cohesion (Bieman & Ott, 1994) to devise a new intra-method cohesion metric, ITRA-C, which measures the cohesion of each method within the class. This intra-method cohesion is based on the notions of effects and chaining in an effect slice. We further combine the (inter-method, intra-method) tuple into one combined Class Cohesion, which provides a quick view of bands of cohesion for categorizing classes.

Introduction

Developing high-quality software continues to be a difficult task. Many attributes may be studied to understand software. Since software engineering is still a relatively young discipline, applying the “systems approach” as defined by R. L. Ackoff (Ackoff, 1971), in which the complete software system is studied holistically, is still a challenge. In this paper, we focus on a specific software attribute, cohesion, and study it further by measuring this attribute from an object-oriented class perspective.
Cohesion has been shown to be an important attribute of good-quality software (Bansiya & Davis, 2002; Bieman & Ott, 1994; Briand, Morasca, & Basili, 1995). In this paper, instead of the complete software, the object-oriented class itself is viewed as the system. Cohesion is an attribute that characterizes connectedness and thus allows us to view a system as a set of connected elements (Checkland, 1981) rather than as separate parts. We pursue an in-depth analysis of this single attribute of the system through the various views of inter-method and intra-method cohesion metrics. We will also show how the cohesion metrics may be used to help us design better object-oriented classes. Thus, the value of the paper lies not only in extending the concepts of cohesion and the various associated metrics, but also in applying these metrics to guide improvements in class, or system, design.

This emphasis on engineering software has led to research into measurements for evaluating the quality of software. Low coupling and high cohesion have been identified as attributes of good software design (Bansiya & Davis, 2002; Briand et al., 1994), and a wide range of metrics have been developed to measure these quality attributes. The notion of cohesion has existed for several decades (Stevens et al., 1974; Yourdon & Constantine, 1979). These early papers introduced the concept of the “functional relatedness” of modules. The relatedness among modules was called coupling, and the relatedness within a module was called cohesion. Relatedness itself is an abstract concept that asks whether items belong together. Intuitively, items that “belong” together ought to be designed into one entity. This made sense especially for the follow-on maintenance people who had to understand and modify the design and the code: if “related” entities are spread across the system, they are more difficult to find.
As Checkland (1981) advocates, a system should be thought of as a connected set of elements rather than as separate parts. Other than the now well-known seven levels of cohesion (coincidental, logical, temporal, procedural, communicational, sequential, and functional), which define ordered categories of cohesion, there was no numeric metric for modular cohesion in those early days. Bieman and Ott (1994) and Bieman and Kang (1995; 1998) introduced numeric metrics based on program slices to gauge “relatedness,” or cohesion. Following the same concept of relatedness, several metrics have been designed to measure the cohesion of an object-oriented class. Briand et al. (1994; 1998), Hitz and Montazeri (1995), Chidamber and Kemerer (1994), Bansiya and Davis (2002), Counsel et al. (2006), Henderson-Sellers (1996), Bonja and Kidanmariam (2006), Chae et al. (2004), and Zhou et al. (2002; 2004) have proposed different approaches to measuring cohesion in an object-oriented class. For the most part, these metrics all revolve around the notion of relatedness of the methods in a class. The relatedness of the methods is primarily gauged by the amount and the type of sharing of the attributes, or data. The methods in a class are considered more cohesive if the amount or the type (or both) of attribute sharing is higher. The amount of interaction among methods, in the form of invocations of other methods in the class, is also considered an important factor for cohesion among the methods of a class. That is, the connectedness of the methods is considered important. Still, whether each individual method is itself cohesive is not clearly accounted for. In this paper, we consider class cohesiveness to be composed of two attributes:
- Relatedness, and
- Singularity in function or purpose.
If one views a class as a system, then the relatedness of the methods in that system is similar to the coupling of the methods in the class. The more “coupled” the methods within a class are, the higher the cohesion of that class. Thus, inter-method cohesion is captured by the notion of coupling among the methods in the class. In such a context, one is led to ask what the cohesion of an individual method is. That is, the singularity of function of each method in a class is important, so intra-method cohesion must also be considered. Intra-method cohesion should answer how singular in purpose the method is, or to what degree. Intuitively, the more singular the method's functional purpose, the higher its intra-method cohesion. The ideal situation for a class is to maximize the number of single-purpose methods (intra-method cohesion) and also to have these cohesive methods be strongly related within the class (inter-method cohesion). Both inter-method cohesion and intra-method cohesion need to be included when discussing the cohesion of a class. Furthermore, one may want to consider which of the sub-attributes, inter-method or intra-method, is more important.

Design quality metrics for object-oriented systems can be categorized as either static or dynamic. Dynamic metrics measure object-level coupling and dynamic complexity (Yacoub et al., 1999). This paper addresses static metrics, which measure the static cohesion of a class. The static structure of a class is considered to have only three main parts: the class name, the instance variables, and the methods. Class cohesion is analyzed by examining the instance variables and the methods of the class and their interplay within the class. We will first discuss the traditional relatedness of methods in a class, as captured by inter-method cohesion metrics.
The notion of inter-method coupling will be studied through a set of evolving scenarios in which an instance variable or a method is added to a class that is “ideally” cohesive from the inter-method perspective. We will then explore the concept of intra-method cohesion and introduce an intra-method cohesion metric. In the process of extending the metric definition, we also expand the notion of intra-method cohesion. Finally, the combination of inter-method and intra-method cohesion will be considered. Here, the difficulties involving multi-attribute metrics pointed out by Fenton and Pfleeger (1997) are explored. A potential combination metric will be proposed, and its characteristics will be discussed.

Inter-Method Cohesion

Cohesion of an entity is based on several basic and similar concepts (Stevens et al., 1974; Yourdon & Constantine, 1979). These range from how much the entity serves a common goal to how related the parts of the entity are. These are intuitively similar in that if an entity has many unrelated parts, then chances are it serves more than a single purpose. Here, we use a very simple and contrived example for illustration purposes. Consider a Class Math designed to perform a single service: providing the sum of a set of integers. This Class Math may be expanded to include more services, in the form of methods that provide the maximum, the minimum, and the average of the set of integers. As Class Math matures and enters maintenance mode, it is further expanded to also accept floating-point numbers. A further enhancement of Class Math may include a method that performs an input check and restricts the input to integers and floating-point numbers. In a way, these enhancements are not atypical of a Class that evolves through post-release enhancements.
We can easily see how a very limited, single-purpose Class Math can be expanded into a broader, multi-purpose Class Math. Using this Class Math example, let us examine how the various inter-method metrics treat the change in the single-purposefulness and relatedness of a Class. The inter-method metrics that define cohesion based on the interaction of methods with the instance variables will all treat the above Class Math in a similar way. That is, the methods all interact with the same set of instance variables: the input integers and the input floating-point numbers. In Table 1, we summarize the cohesion metrics of interest, whose respective changes we will trace as Class Math evolves.

Table 1: Some Major Inter-Method Cohesion Metrics (Metric: Explanation)
- Briand et al. (1998), RCI: RCI = |CI(C)| / |Max(C)|, where CI(C) is the set of all data-declaration (DD) interactions and data-method (DM) interactions in Class C, and Max(C) is the set of all possible DD and DM interactions.
- Bieman and Kang (1995; 1998), TCC and LCC: TCC = NDC / NP, where NDC is the number of directly connected pairs of methods, that is, pairs of methods that directly or indirectly use common attributes, and NP is the maximum possible number of such pairs. LCC = (NDC + NIC) / NP, where NIC is the number of pairs of methods that are indirectly connected.
- Bonja and Kidanmariam (2006), CC: CC = ( ∑ (|IVC| / |IVT|) ) / |Max Pairs|, where IVC is the set of common instance variables used by a pair of methods and IVT is the set of all instance variables used by that pair. The numerator sums these ratios over all the pairs of methods in the Class. Max Pairs is the maximum possible number of pairs of methods, which is n!/(2*(n-2)!) = n(n-1)/2 for a class with n methods.
- Chidamber and Kemerer (1994), LCOM: LCOM = |P| - |Q| if |P| > |Q|; otherwise 0. If there are n methods, let {Ii} be the set of instance variables used by method i, Mi. Then P = {(Ii, Ij) | Ii ∩ Ij = Ø} and Q = {(Ii, Ij) | Ii ∩ Ij ≠ Ø}. If {Ii} = Ø for all i, then P = Ø.
- Hitz and Montazeri (1995), LCOM4: LCOM4 = the number of connected components in a class, where method a and method b are connected if 1) they share an instance variable or 2) either method a invokes method b or vice versa.
- Henderson-Sellers (1996), LCOM5: LCOM5 = [ (1/a) ( ∑ u(Aj) ) - m ] / (1 - m), where a is the number of attributes or instance variables, u(Aj) is the number of methods accessing attribute Aj, m is the number of methods, and ∑ u(Aj) is summed over all the attributes j = 1, ..., a.
- Bansiya et al. (2002), CACM: CACM = ( ∑∑ Oij ) / (K L), where Oij is the (i, j)th entry of the parameter occurrence matrix: Oij = 1 if the jth data type occurs as a parameter in the ith method, and Oij = 0 otherwise. K is the number of columns, or data types, and L is the number of rows, or methods, in the parameter occurrence matrix. ∑∑ Oij is summed over all K parameter data types and all L methods.
- Counsel et al. (2006), NHD: NHD = ( ∑∑ Aij ) / [ L * (K(K-1)/2) ], where Aij is an entry of the parameter agreement matrix, that is, the number of parameter agreements between methods mi and mj; K is the number of methods and L is the number of attribute types. NHD is the ratio of method agreements on parameter types to the maximum potential of every method agreeing with every other method on parameter types; the denominator is L attributes times the number of pairs of methods out of K methods.

Consider how each metric in Table 1 evolves from the most ideal cohesive situation to a less ideal case, as described by the following five scenarios.
a) All the methods within the Class use/share a single instance variable (e.g., the integers).
b) Add another instance variable (e.g., the floating-point numbers) that is shared by all the methods.
c) Add one more method that also uses/shares the same instance variables.
d) Add an instance variable that disturbs the “uniformity” of all the methods sharing all the instance variables. In software maintenance or evolution mode, we often introduce an additional instance variable into a Class without fully considering the erosion of that Class's cohesiveness.
e) Add a method that similarly disturbs the “uniformity” of all the methods sharing all the instance variables. Again, during software evolution we often introduce an additional method into a Class without considering how it might erode the cohesiveness of that Class.

In Table 2, we summarize the evolution of the metric values as Class Math evolves from condition (a) through condition (e). Class Math starts with a set of integers as its input instance variable. There are four methods in Class Math that compute the sum, min, max, and average of the integers, respectively. Then Class Math is expanded to accept floating-point numbers, and the same four methods are enhanced to compute the sum, min, max, and average of the floating-point numbers. Then a fifth method is added to check that both the integer and the floating-point input numbers are between -10,000 and +10,000. Then Class Math may evolve to include either an instance variable that only some of the methods use or a method that uses only some of the instance variables. Let us pick one sample metric from Table 2, Henderson-Sellers' (1996) LCOM5, as we consider these scenarios. As we go through the scenarios, it will become evident that even this simple evolution is not as clear-cut as it looks.

LCOM5 Computations

LCOM5 was defined by Henderson-Sellers (1996).
It looks predominantly at the number of methods that access each attribute, specifically only the instance variables. Thus, LCOM5 does not deal with data-to-data interactions or with non-instance variables; it focuses on instance-variable-to-method interactions. For LCOM5, a value of 0 is considered perfect cohesion.

For scenario (a), there are 5 data elements. One is an instance variable, I1, which is accessed by all 4 methods; let I1 be represented as A1. The other 4 data elements are the variables sum, min, max, and average. They are defined within each of the methods and accessed only by their respective methods, m1 through m4. One may argue that these should be declared as instance variables, but for this example we purposely choose not to do so. Since these 4 data elements are not instance variables, they do not enter into the LCOM5 computation. We have only u(A1) = 4, so LCOM5 = [(1/1) * (4) - 4] / (1 - 4) = 0. For scenario (a), LCOM5 = 0 is considered perfect cohesion.

For scenario (b), we introduce another instance variable, a floating-point I2 (represented as A2), which is also accessed by all 4 methods. In this case, u(A1) = 4 and u(A2) = 4, so LCOM5 = [(1/2) * (4 + 4) - 4] / (1 - 4) = 0. Thus, for scenario (b), LCOM5 = 0 indicates that the Class cohesion remains perfect.

Consider scenario (c), where a fifth method, the input check method, is introduced to check both of the instance variables. Therefore, u(A1) = 5 and u(A2) = 5, and LCOM5 = [(1/2) * (5 + 5) - 5] / (1 - 5) = 0. For scenario (c), LCOM5 continues to indicate that the Class cohesion remains perfect.

For scenario (d), we introduce a third instance variable, I3 (represented as A3), that is accessed by only 1 of the 5 methods. In this case, u(A1) and u(A2) remain the same as before, and u(A3) = 1. LCOM5 = [(1/3) * (5 + 5 + 1) - 5] / (1 - 5) = (-4/3) / (-4) = 1/3.
This indicates that the Class cohesion has deteriorated, as LCOM5 is moving from the perfect case of 0 towards 1, the worst case. The final scenario (e) introduces a sixth method that accesses only 1 of the 3 existing instance variables. We arbitrarily pick that instance variable to be I1. Now u(A1) = 6, u(A2) = 5, and u(A3) = 1, so LCOM5 = [(1/3) * (6 + 5 + 1) - 6] / (1 - 6) = (-2) / (-5) = 2/5. This time LCOM5 has increased slightly in value, indicating further deterioration of Class cohesion.

The LCOM5 metric showed perfect cohesion for scenarios (a) through (c). Intuitively, this makes sense when one considers only the instance variables. As scenarios (d) and (e) show, the class cohesion eroded, and LCOM5 increased in value, when we introduced an instance variable used by only one method and then a method that uses only I1. The only inconvenience is that LCOM5 starts at a perfect 0 and increases towards the worst case, 1, as cohesion deteriorates.

Summarizing Scenarios (a) through (e)

In Table 2, we summarize the evolution of the metric values as Class Math evolves from condition (a) through condition (e) for all the metrics listed in Table 1.
Table 2: Summarizing Scenarios (a) through (e). Values are given for scenarios (a), (b), and (c); the effects of scenario (d), adding an instance variable, and scenario (e), adding a method, are described.
- Briand et al. (1998), RCI: (a) .6, (b) .6, (c) .65. (d) Adding an instance variable that interacts with only some methods decreases the value of RCI. (e) Adding a method that does not access all instance variables further decreases RCI.
- Bieman and Kang (1995; 1998), TCC: (a) 1, (b) 1, (c) 1. (d) Adding an instance variable that interacts with only some methods creates no change in TCC. (e) Adding a method that does not access all the instance variables decreases TCC.
- Bonja and Kidanmariam (2006), CC: (a) 1, (b) 1, (c) 1. (d) Adding an instance variable that interacts with only some methods decreases CC. (e) Adding a method that does not interact with all the instance variables decreases CC.
- Chidamber and Kemerer (1994), LCOM: (a) 0, (b) 0, (c) 0. (d) Adding an instance variable creates a “non-uniform” situation that affects |P| and |Q|; if |P| < |Q|, then LCOM = 0, and if |P| > |Q|, then LCOM increases in value from 0. (e) Adding a method creates a “non-uniform” situation that affects |P| and |Q|; if |P| < |Q|, then LCOM = 0, and if |P| > |Q|, then LCOM increases in value from 0.
- Hitz and Montazeri (1995), LCOM4: (a) 1, (b) 1, (c) 1. (d) Adding an instance variable that is not accessed by all methods creates a non-uniformity and increases LCOM4. (e) Adding a method that does not access all instance variables creates a non-uniformity and increases LCOM4.
- Henderson-Sellers (1996), LCOM5: (a) 0, (b) 0, (c) 0. (d) Adding an instance variable that is accessed by only some methods increases LCOM5 above 0 towards 1, deteriorating cohesion. (e) Adding a method that accesses only some instance variables also increases LCOM5 towards 1 and further deteriorates cohesion.
- Bansiya et al. (2002), CACM: (a) 1, (b) 1, (c) 1. (d) Adding an instance variable that is not used by all the methods decreases CACM from 1. (e) Adding a method that does not use all instance variables decreases CACM from 1.
- Counsel et al. (2006), NHD: (a) 1, (b) 1, (c) 1. (d) Adding an instance variable that is not used by all methods decreases NHD from 1. (e) Adding a method that does not access all the instance variables decreases NHD from 1.

This plethora of Class cohesion metrics shows that the ideal value for cohesion varies. Some metrics start at 0 and increase in value as cohesion is compromised; others start at 1 and decrease in value as cohesion erodes. While each has its own strengths as a metric, it is nevertheless difficult to keep track of all the details behind these metrics. From these Class cohesion metrics, it is clear that inter-method cohesion in a Class resembles the concept of coupling of methods within a Class. In this case, however, we want tight coupling among methods, not loose coupling. As such, inter-method cohesion should be concerned with the following two characteristics:
1. Methods coupled through the sharing of control via method invocations, and
2. Methods coupled through the sharing of data among methods.
From these two characteristics, one can also see that the sharing of control may be multi-leveled. That is, a method may invoke another method, which may in turn invoke a third method. This chain of invocations creates different degrees of inter-method cohesion. Similarly, there may be a chain of data sharing, or data dependencies (Chae et al., 2004; Zhou et al., 2002). For example, consider an instance variable, x, that is assigned a value by a method A. In a different method, B, x is used to define another variable, y. A third method, C, may use the variable y to compute something. In this scenario, methods A and C do not directly share any data; however, there exists a data dependency relationship between them that should be accounted for when considering inter-method cohesion.
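The LCOM5 arithmetic for scenarios (a) through (e), together with the behavior of Bonja and Kidanmariam's CC, can be checked with a short sketch. It models Class Math only as a mapping from each method to the set of instance variables it uses. The method names m1 through m6 are hypothetical, and choosing m5 as the single method that also uses I3 in scenario (d) is an assumption: the text says only that one of the five methods uses it, and by symmetry the choice changes neither value. Local variables such as sum, min, max, and average are left out, as in the LCOM5 computations.

```python
from fractions import Fraction
from itertools import combinations


def lcom5(usage: dict[str, set[str]]) -> Fraction:
    """Henderson-Sellers LCOM5: 0 is perfect cohesion, 1 is the worst case."""
    attributes = set().union(*usage.values())
    m, a = len(usage), len(attributes)
    if m < 2 or a == 0:
        raise ValueError("LCOM5 needs at least two methods and one attribute")
    # Sum of u(Aj) over all attributes; counting per method gives the same total.
    accesses = sum(len(used) for used in usage.values())
    return (Fraction(accesses, a) - m) / (1 - m)


def cc(usage: dict[str, set[str]]) -> Fraction:
    """Bonja and Kidanmariam CC: mean of |IVC| / |IVT| over all method pairs."""
    pairs = list(combinations(usage.values(), 2))
    if not pairs:
        raise ValueError("CC needs at least two methods")
    total = Fraction(0)
    for vars_a, vars_b in pairs:
        union = vars_a | vars_b  # IVT: variables used by either method
        if union:  # assumption: a pair that uses no variables contributes 0
            total += Fraction(len(vars_a & vars_b), len(union))  # |IVC| / |IVT|
    return total / len(pairs)


# Scenarios (a)-(e) of the Class Math example (hypothetical method names).
scenario_a = {f"m{i}": {"I1"} for i in range(1, 5)}        # sum, min, max, average
scenario_b = {name: used | {"I2"} for name, used in scenario_a.items()}
scenario_c = {**scenario_b, "m5": {"I1", "I2"}}             # input check method
scenario_d = {**scenario_c, "m5": {"I1", "I2", "I3"}}       # I3 used by one method
scenario_e = {**scenario_d, "m6": {"I1"}}                   # new method uses only I1

scenarios = [scenario_a, scenario_b, scenario_c, scenario_d, scenario_e]
for label, usage in zip("abcde", scenarios):
    print(f"({label}) LCOM5 = {lcom5(usage)}, CC = {cc(usage)}")
```

Run as written, the LCOM5 column reproduces 0, 0, 0, 1/3, and 2/5, while CC stays at 1 for (a) through (c) and then drops to 13/15 and 11/15, the direction Table 2 reports for CC.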
From Table 1, one would like to pick the inter-method cohesion metric that comes closest to covering both of these characteristics, and then modify or enhance the metric if necessary. For the rest of the paper, especially in the section on Combining Inter-Method and Intra-Method Cohesion, we choose the popular LCOM5 inter-method cohesion metric, but with one modification: we reverse it so that it starts at the lowest cohesion, 0, and moves towards perfect cohesion, 1.

Intra-Method Cohesion

In this section, we discuss the notion of the cohesion of each individual method. The cohesion of each method may be viewed as a micro-level version of inter-method cohesion, in that we can analyze the structural relationships of the data to the operations and the relationships among the operations. Thus, for intra-method cohesion, we believe that each method should be viewed from the perspective of the relatedness of its operations and data in achieving a single functionality. The key is the phrase “single functionality.” For this, we may consider reverting to the earlier definition of cohesion in terms of levels, from coincidental to functional. The problem is that there is no clear and simple way to measure intra-method cohesion numerically when it is defined through levels that are only ordered. In that definition, only the best situation, the functional cohesion level, has one function. All other levels have multiple functions, and the manner in which the multiple functions operate determines the level of cohesion. One way is to assign the functional level a value of 1/1, the sequential level 1/2, the communicational level 1/3, and so on down to the worst case, the coincidental level, which takes the value 1/7. This primitive numerical metric assumes that each level differs from the next by exactly the same amount. Furthermore, it does not differentiate by the number of functions performed at each level.
Consider the situation where one method performs 5 functions at the procedural level and another method performs 2 functions at the logical level. According to the level-based cohesion metric, the method with 5 functions at the procedural level would score 1/4, and the method with 2 functions at the logical level would score 1/6, so the method performing more functions would be rated as more cohesive. Thus, this metric serves only as a guideline and is quite limited in its utility.

Bieman and Ott (1994) have suggested three metrics, based on data slicing, to measure cohesion: Strong Functional Cohesion (SFC), Weak Functional Cohesion (WFC), and adhesiveness (A). Perhaps a better alternative to the levels of cohesion is to use these slice-based metrics for intra-method cohesion. SFC is defined as the ratio of super-glue tokens (data tokens that lie on every slice) to the total number of data tokens, and WFC is defined as the ratio of glue tokens (data tokens that lie on more than one slice) to the total number of data tokens. The adhesiveness of a data token, t, in a procedure is defined as the ratio of the number of slices that contain t to the total number of slices in the procedure. If the method contains only one function, then every data token resides in only one function, and the adhesiveness of each token is one. The average adhesiveness of all the data tokens in the method is defined as A(m) = ( Σ A(ti) ) / |t|, where A(ti) is the adhesiveness of data token ti and |t| is the cardinality of the set of data tokens in the method. This provides a metric that addresses the cohesion of the method in terms of the adhesiveness of the data tokens, or the connectedness of its functions through the data tokens. The adhesiveness metric does not differentiate intra-method cohesion by predefined levels, so two methods at different levels of cohesion, such as the sequential and communicational levels, may have the same numerical adhesiveness.
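The three slice-based measures can be sketched by representing each data slice as a set of data-token identifiers. This is a minimal sketch, not Bieman and Ott's own tooling: the token names are hypothetical placeholders rather than tokens extracted from real code, and treating the tokens of a single slice as both glue and super-glue (so that a one-slice method scores 1 on all three measures) is an assumption made here.

```python
from fractions import Fraction
from typing import Optional


def slice_metrics(slices: list[set[str]],
                  tokens: Optional[set[str]] = None) -> dict[str, Fraction]:
    """Return SFC, WFC, and average adhesiveness A for one method.

    slices: one set of data tokens per data slice (hypothetical representation).
    tokens: all data tokens in the method; defaults to the union of the slices.
    """
    if not slices:
        raise ValueError("at least one slice is required")
    all_tokens = tokens if tokens is not None else set().union(*slices)
    if not all_tokens:
        raise ValueError("the method has no data tokens")
    super_glue = set.intersection(*slices)  # tokens that lie on every slice
    if len(slices) == 1:
        glue = set(slices[0])  # assumption: a lone slice's tokens count as glue
    else:
        # tokens that lie on more than one slice
        glue = {t for t in all_tokens if sum(t in s for s in slices) >= 2}
    # adhesiveness of token t = (# slices containing t) / (# slices)
    adhesiveness = [Fraction(sum(t in s for s in slices), len(slices))
                    for t in all_tokens]
    return {
        "SFC": Fraction(len(super_glue), len(all_tokens)),
        "WFC": Fraction(len(glue), len(all_tokens)),
        "A": sum(adhesiveness, Fraction(0)) / len(all_tokens),
    }


# Three hypothetical slices over six hypothetical tokens t1..t6.
example = [{"t1", "t2", "t3", "t4"}, {"t1", "t2", "t5"}, {"t1", "t6"}]
print(slice_metrics(example))
```

For this example only t1 is super-glue and only t1 and t2 are glue, giving SFC = 1/6, WFC = 1/3, and A = 1/2; note that SFC and WFC differ only when some token is shared by some, but not all, slices.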
The nature of the functionality, which differentiated the cohesion levels in the earlier ordered scheme, is not part of the cohesion metric when cohesion is measured through the adhesiveness of data tokens or through the other two metrics (SFC and WFC). We propose a variation of Bieman and Ott's (1994) slice-based metrics as an intra-method cohesion metric, and we expand the notion of “output” in Bieman and Ott (1994) to a broader set of situations. The intra-method cohesion metric should take into account two characteristics of a method:
- The effect of the functionality in the method, and
- The chaining within the functionality.
The “effect” of the functionality is a defined set of observables. Once these are defined, effects are a much easier characteristic to observe than general functionality. We define the set of effects as characterized by the following specific activities on a variable:
1. Printing, displaying, or writing a variable,
2. Returning the value of a variable, and
3. Storing a variable.
We discuss these three types of effects in turn. 1) Printing, displaying, or writing a variable is often the culmination of a specific set of activities and indicates that a function is completed. Thus, tracing the slice of code that resulted in the printing of that variable provides a hint of the cohesiveness. 2) Similarly, returning the value of a variable implies the completion of some functionality. However, this is a more difficult effect, in that the returned variable may not allow us to trace the functionality. For example, the method may perform a synchronization activity, or a sorting activity on an instance-variable array, and the return value is just a success indicator. In this situation, tracing the slice of code from the return value will not provide us with a view of the functionality.
In the more traditional case, where the return value is the variable that contains the result of some functionality, tracing the slice of code from the return value gives us an idea of the cohesiveness. 3) The final storing of a variable may be accompanied by a retrieval of that variable. This pair of activities often represents an updating function, and the slice of code between the retrieval and the storing represents the functionality, such as sorting an array variable, performed to update the variable. A simple, perhaps trivial, example is a constructor method with input parameters. Storing a variable without a corresponding retrieval implies storing the variable after the completion of some functionality; in this case, the final storing of the variable is similar to the effect of printing and writing.

More effects in a method indicate more functionality and potentially more diverse functionalities. Also, the number of variables involved in the slice of code that produced an effect indicates the size and diversity of the functionalities involved in that effect. The notion of a code slice is the same as that provided by Weiser (1981). The Effect Indicator, EI, is defined as follows:

$$EI = \sum_{i \in \text{slices}} \; \sum_{j \in \text{variables}} V(i, j)$$

where V(i, j) is the jth variable in the ith effect code slice, so that each variable in each effect slice contributes to EI. The conjecture here is that the larger the Effect Indicator, the less cohesive the method. We use the reciprocal of EI and define Effect, E = 1/EI. The Effect metric is equal to 1 when there is only one effect in the method and that one effect involves only one variable. As more effects, and more variables in each effect, are involved, EI increases and E decreases. Thus, E varies from 1, the best case, towards 0, which represents a large number of effects over many variables. The second characteristic for intra-method cohesion is the notion of the chain of the effect.
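Under the reading that each variable in each effect slice contributes one to EI, the Effect metric reduces to counting. The sketch below assumes the effect slices have already been extracted (for example, by a slicing tool) and are given as hypothetical sets of variable names, one set per effect.

```python
from fractions import Fraction


def effect_indicator(effect_slices: list[set[str]]) -> int:
    """EI: number of (effect slice, variable) pairs in the method."""
    return sum(len(variables) for variables in effect_slices)


def effect(effect_slices: list[set[str]]) -> Fraction:
    """E = 1 / EI; equals 1 for a single effect over a single variable."""
    ei = effect_indicator(effect_slices)
    if ei == 0:
        raise ValueError("a method with no effect variables has no defined E")
    return Fraction(1, ei)


# Hypothetical methods: one returns a single variable; the other prints a
# total (slice over 3 variables) and stores an average (slice over 2 variables).
single_purpose = [{"total"}]
two_effects = [{"total", "i", "values"}, {"average", "total"}]
print(effect(single_purpose), effect(two_effects))  # prints: 1 1/5
```

A variable such as total that participates in two effect slices is counted once per slice here, which is one possible reading of the double sum; counting distinct variables across the whole method would be a different design choice.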
The chaining characteristic is also based on the slicing concepts of Weiser (1981). For each effect, the slice of code for that effect is identified first. Then the variable or variables that participate in the slice of code for that effect are traced in chain fashion, much like the define-use (d-u) paths used in program testing (Jorgensen, 2002). The length of the chain for each variable is a count of the number of steps involved in completing an effect; thus, the chain length provides an indication of the size of the function. If the same variable appears several times in the chain, we trace only the longest chain for that variable. Let CL, the Chain Length, be the length of the slice of code traced from the variable in the effect all the way back to the statements that affect the first definition of that variable. Let the span of the chain, or Chain Span, CS, be the number of all the steps of code, including those not in the slice, between the variable in the effect and the first definition or assignment of that variable. The ratio CL/CS represents the proximity attribute of the variable in the effect slice. For each variable in each effect slice, there is a Proximity Indicator, PI = CL/CS. This proximity shows how physically spread out each variable of each effect is within the method; thus, it is an indication of the physical cohesion of the method. A method may contain more than one effect, so we need to compute PI for each variable in the effect slice of every effect in the method. The Average PI, or API, for a method is:
API = Σ PI / |PI|
where |PI| is the number of Proximity Indicators computed for the method. For a method that has only one effect, with one variable in that effect slice, and whose effect slice is the complete method, CL = CS. Then PI = 1, and API is also 1. As API moves towards 0, it indicates that the slice of an effect is more physically spread out in the method.
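Given CL and CS for each variable of each effect slice, PI and API are simple ratios. The sketch takes the (CL, CS) pairs as already-measured inputs; how the steps are counted along the d-u chain is left to the slicing step, and the example counts are hypothetical.

```python
from fractions import Fraction


def proximity_indicators(chains: list[tuple[int, int]]) -> list[Fraction]:
    """PI = CL / CS for each (chain length, chain span) pair."""
    pis = []
    for cl, cs in chains:
        if cs <= 0:
            raise ValueError("chain span must be positive")
        pis.append(Fraction(cl, cs))
    return pis


def average_proximity(chains: list[tuple[int, int]]) -> Fraction:
    """API = sum(PI) / |PI| over all variables in all effect slices of a method."""
    pis = proximity_indicators(chains)
    if not pis:
        raise ValueError("no effect variables to average over")
    return sum(pis, Fraction(0)) / len(pis)


# One effect, one variable, slice covering the whole method: CL == CS.
print(average_proximity([(7, 7)]))
# Two hypothetical effect variables whose slices are spread out.
print(average_proximity([(3, 6), (4, 5)]))
```

The first call yields 1, the ideal case described in the text; the second averages PI values of 1/2 and 4/5 to give 13/20.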
The physical cohesion measured by API also matches our intuition well, especially from a maintenance perspective. The Intra-method Cohesion of a method, m, in an object is defined as the combination of the Effect and the Average Proximity Indicator of that method:
ITRA-C(m) = ( E + API ) / 2
For an object, O, which contains n methods m1, ..., mn, the intra-method cohesion of the object is:
ITRA-C(O) = Σ ITRA-C(mj) / n
This Intra-method Cohesion of an object, ITRA-C(O), varies from the ideal value of 1 to the worst case of 0. The best case is when every ITRA-C(m) is equal to 1; as any ITRA-C(m) decreases from 1, so does ITRA-C(O). The two method-level sub-attributes of cohesion are combined simply by averaging their metrics. In this case, the intuitive notion of cohesion is still preserved by the averaging: the ordering of the intra-method cohesion values matches the ordering of cohesion. Note that while we can say ITRA-C(m1) > ITRA-C(m2), we cannot pinpoint which sub-attribute, E or API, or both, contributed to this relationship between m1 and m2 without looking specifically at E and API for m1 and for m2. In the next section, we investigate the problem of combining metrics of different sub-attributes. Specifically, we are interested in combining the two sub-attributes, inter-method cohesion and intra-method cohesion, into one unified Class Cohesion metric.

Combining Inter-Method and Intra-Method Cohesion

The unification of the inter-method and intra-method cohesion metrics is explored in this section. In software metric theory (Fenton & Pfleeger, 1997), we are reminded of the Representation Condition.
That is, the mapping from the empirical world to the numerical world should preserve relations: the relations that hold in the empirical world must be preserved by the corresponding relations in the numerical world. Consider the case where an inter-method metric, such as LCOM5, is picked to represent “relatedness.” Recall that LCOM5 varies from 0 to 1, where 0 is the most cohesive and 1 is the least cohesive. For intra-method cohesion, or singularity of purpose, consider ITRA-C(O), defined in the previous section. ITRA-C(O) also varies from 0 to 1, but here 1 is the most cohesive and 0 is the least cohesive. It would be desirable to combine these two metrics into a single Class (or Object) Cohesion metric that takes into account both relatedness and singularity of purpose. Such a combined inter-method and intra-method Class Cohesion metric would provide a quick view, or comparison, of cohesion among different classes. One simple way is to take the average of LCOM5 and ITRA-C(O). Immediately, we run into trouble, because LCOM5 works in the opposite direction to ITRA-C(O). Making two metrics move in the same direction may sometimes be a non-trivial task that requires redefining the attribute. In this case, we can define an adjusted metric, LCOM5' = 1 - LCOM5, which reverses the direction of LCOM5 so that 0 is the least cohesive and 1 the most cohesive. We use LCOM5' as the adjusted inter-method cohesion in this section. Even when the two metrics are adjusted to the same 0 to 1 interval, combining them by a simple average means that the two sub-attributes, inter-method and intra-method, are viewed as contributing equally, which may not always be true. For the time being, let us assume that LCOM5' and ITRA-C(O) contribute equally to Class Cohesion.
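The definitions of ITRA-C(m), ITRA-C(O), the adjusted LCOM5' and the naive equal-weight average can be written down directly. The following Python sketch assumes that E, API and LCOM5 have already been measured for each method and class and lie in [0, 1]; all function names are hypothetical.

```python
def itra_c_method(effect: float, api: float) -> float:
    """ITRA-C(m) = (E + API) / 2 for a single method."""
    return (effect + api) / 2


def itra_c_object(method_scores: list[float]) -> float:
    """ITRA-C(O): mean of ITRA-C(m_j) over the n methods of object O."""
    if not method_scores:
        raise ValueError("the object has no methods")
    return sum(method_scores) / len(method_scores)


def lcom5_adjusted(lcom5: float) -> float:
    """LCOM5' = 1 - LCOM5, so that 1 is most cohesive, as for ITRA-C(O)."""
    if not 0.0 <= lcom5 <= 1.0:
        raise ValueError("LCOM5 is assumed to lie in [0, 1]")
    return 1.0 - lcom5


def averaged_class_cohesion(lcom5_prime: float, itra_c: float) -> float:
    """Naive equal-weight average of LCOM5' and ITRA-C(O)."""
    return (lcom5_prime + itra_c) / 2


# Hypothetical class: (E, API) measured for three methods, LCOM5 = 0.25.
scores = [itra_c_method(e, api) for e, api in [(1.0, 1.0), (0.8, 0.6), (0.5, 0.3)]]
itra_c = itra_c_object(scores)
print(round(itra_c, 3), round(averaged_class_cohesion(lcom5_adjusted(0.25), itra_c), 3))  # 0.7 0.725
```

For this hypothetical class, ITRA-C(m) is 1, .7 and .4, so ITRA-C(O) = .7; with LCOM5 = .25, LCOM5' = .75 and the averaged Class Cohesion is .725. The single averaged number no longer shows which sub-attribute drives a difference between two classes.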
Even under this equal-weight assumption, the Class Cohesion derived from the average of LCOM5' and ITRA-C(O) cannot satisfy the Representation Condition. This is illustrated in Figure 1 below.

[Figure 1: Mapping from inter and intra cohesion to a unified Class Cohesion. Inter-method cohesion (LCOM5'): O1 = .3, O2 = .6. Intra-method cohesion (ITRA-C(O)): O1 = .5, O2 = .4. Class Cohesion (O1) = (.3 + .5)/2 = .4; Class Cohesion (O2) = (.6 + .4)/2 = .5.]

As Figure 1 illustrates, Class Cohesion (O2) > Class Cohesion (O1). However, this single relationship cannot agree with the relationships given by both LCOM5' and ITRA-C(O). At the inter-method level, LCOM5'(O2) > LCOM5'(O1), but at the intra-method level, ITRA-C(O1) > ITRA-C(O2). These relations are therefore not preserved through the mapping from inter-method and intra-method cohesion to Class Cohesion when we use averaging as the mapping function.

One alternative is to keep the sub-attributes and represent Class Cohesion (O) as a 2-tuple:

Class Cohesion (O) = (LCOM5', ITRA-C(O))

We further define the “>” relation such that Class Cohesion (O1) > Class Cohesion (O2) if and only if

i) LCOM5'(O1) > LCOM5'(O2), AND
ii) ITRA-C(O1) > ITRA-C(O2).

Class Cohesion (O) defined in this fashion at least preserves the relationships in the empirical worlds of both inter-method and intra-method cohesion when the logical AND condition is met. The question remains of what to do when the two AND conditions are not both met. The following are the major, but not all, possible situations.
a) LCOM5'(O1) > LCOM5'(O2) AND ITRA-C(O1) > ITRA-C(O2): then Class Cohesion (O1) > Class Cohesion (O2)
b) LCOM5'(O1) < LCOM5'(O2) AND ITRA-C(O1) < ITRA-C(O2): then Class Cohesion (O1) < Class Cohesion (O2)
c) LCOM5'(O1) = LCOM5'(O2) AND ITRA-C(O1) = ITRA-C(O2): then Class Cohesion (O1) = Class Cohesion (O2)
d) LCOM5'(O1) < LCOM5'(O2) AND ITRA-C(O1) > ITRA-C(O2): then Class Cohesion (O1) ≠ Class Cohesion (O2)
e) LCOM5'(O1) > LCOM5'(O2) AND ITRA-C(O1) < ITRA-C(O2): then Class Cohesion (O1) ≠ Class Cohesion (O2)

For conditions d) and e), we can only say that Class Cohesion (O1) ≠ Class Cohesion (O2); neither class can be said to be more cohesive than the other. This mapping of Class Cohesion into a 2-tuple, with the very strict logical AND operator, creates a metric that is difficult to use under certain circumstances. Next, we consider relaxing the logical AND of the inter-method and intra-method conditions to a logical OR. That is, define Class Cohesion (O1) >= Class Cohesion (O2) if (the reverse is not necessarily true):

i) LCOM5'(O1) >= LCOM5'(O2), OR
ii) ITRA-C(O1) >= ITRA-C(O2).

The logical OR relaxes the constraints we placed on the mapping. This may allow us to devise a metric that resolves the situations in which we could only conclude that Class Cohesion (O1) ≠ Class Cohesion (O2). Using the 2-tuple (inter-method, intra-method), or more specifically (LCOM5', ITRA-C(O)), the Class Cohesion (O) metric runs from (0,0) to (1,1). The question is what happens when the Class Cohesion (O) values being compared are not on the diagonal, or not on the same “radial line,” of Figure 2.

[Figure 2: The Class Cohesion Metric on and off the diagonal. The axes are inter-method and intra-method cohesion, with corners (0,0), (1,0), (0,1) and (1,1); points off the diagonal fall either in the region where intra-method < inter-method cohesion or in the region where intra-method > inter-method cohesion.]

With the logical AND condition, the “>” relationship between two Class Cohesion (O) values may be unclear if they reside on different radial lines.
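The strict AND comparison of cases a) to e) and the relaxed OR rule can be sketched as below. The tuples are (LCOM5', ITRA-C(O)), and the function names are hypothetical. As an assumption of this sketch, pairs that tie on one sub-metric and differ on the other, which fall outside cases a) to e), are also reported as '!='.

```python
# (LCOM5', ITRA-C(O)) for one class, both on a 0..1 scale where 1 is most cohesive.
CohesionTuple = tuple[float, float]


def compare_and(o1: CohesionTuple, o2: CohesionTuple) -> str:
    """Strict AND comparison of cases a) to e); returns '>', '<', '=' or '!='."""
    inter1, intra1 = o1
    inter2, intra2 = o2
    if inter1 > inter2 and intra1 > intra2:
        return ">"  # case a)
    if inter1 < inter2 and intra1 < intra2:
        return "<"  # case b)
    if inter1 == inter2 and intra1 == intra2:
        return "="  # case c)
    # Cases d) and e); ties on only one sub-metric also land here (an assumption).
    return "!="


def geq_or(o1: CohesionTuple, o2: CohesionTuple) -> bool:
    """Relaxed rule: O1 >= O2 if either sub-metric of O1 is at least that of O2."""
    return o1[0] >= o2[0] or o1[1] >= o2[1]


o1, o2 = (0.3, 0.5), (0.6, 0.4)  # values from Figure 1
print(compare_and(o1, o2), geq_or(o1, o2), geq_or(o2, o1))  # != True True
```

With the Figure 1 values, `compare_and` returns '!=' (case d), while simple averaging gives .4 and .5 and ranks O2 above O1. `geq_or` holds in both directions for this pair, illustrating that the OR rule is a permissive condition rather than an ordering on its own.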
The strict AND condition makes such comparisons very constrained. Figure 3 shows that, besides using the radial lines, we can also look at where the Class Cohesion values fall and roughly determine the “>” relationship. As Figure 3 shows, given a Class Cohesion (O) at some point (a,b), there is an area that is clearly less cohesive and an area that is clearly more cohesive; both areas are hatched in Figure 3. There are, however, two areas that are unclear, and we are also interested in what happens in them.

[Figure 3: The Class Cohesion (O) in terms of Distance Function. Around a point (a,b) in the plane of inter-method and intra-method cohesion, the regions are labeled “more,” “less,” and “unclear” (two regions).]

With the relaxed OR condition, we have more latitude in devising a metric. We explore defining Class Cohesion as a distance function, namely the distance of (LCOM5', ITRA-C(O)) from (0,0):

$Class\ Cohesion(O) = \sqrt{(LCOM5' - 0)^2 + (ITRA\text{-}C(O) - 0)^2}$

The maximum distance in this case is $\sqrt{2}$, or about 1.41. Class Cohesion (O) defined as a distance function under the logical OR condition handles the cases where Class Cohesion is on or off the diagonal or the same radial line. Class Cohesion (O1) > Class Cohesion (O2) if:

$\sqrt{LCOM5'(O_1)^2 + ITRA\text{-}C(O_1)^2} > \sqrt{LCOM5'(O_2)^2 + ITRA\text{-}C(O_2)^2}$

This numerical distance function satisfies the OR condition: because both coordinates are non-negative, if neither LCOM5'(O1) > LCOM5'(O2) nor ITRA-C(O1) > ITRA-C(O2) held, the distance of O1 from (0,0) could not exceed that of O2. Figure 4 shows the combined metric Class Cohesion (O) moving from band y to band z.

[Figure 4: The Class Cohesion (O) in terms of Distance Function. The plane of inter-method and intra-method cohesion, from (0,0) to (1,1), with two bands labeled Class Cohesion = band y and Class Cohesion = band z.]

On the band Class Cohesion = band y, for example, there are classes with different combinations of inter-method and intra-method cohesion.
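The distance-based Class Cohesion and a grouping into bands can be sketched as follows. The paper does not specify how bands are delimited; the fixed band width of .25 below is a hypothetical choice for illustration only, as are the function names.

```python
import math


def class_cohesion_distance(lcom5_prime: float, itra_c: float) -> float:
    """Distance of (LCOM5', ITRA-C(O)) from (0, 0); ranges from 0 to sqrt(2)."""
    return math.hypot(lcom5_prime, itra_c)


def more_cohesive(o1: tuple[float, float], o2: tuple[float, float]) -> bool:
    """Class Cohesion(O1) > Class Cohesion(O2) under the distance function."""
    return class_cohesion_distance(*o1) > class_cohesion_distance(*o2)


def cohesion_band(distance: float, width: float = 0.25) -> int:
    """Hypothetical banding: index of the fixed-width ring that holds the distance."""
    if width <= 0:
        raise ValueError("band width must be positive")
    return int(distance // width)


o1, o2 = (0.3, 0.5), (0.6, 0.4)  # values from Figure 1
d1, d2 = class_cohesion_distance(*o1), class_cohesion_distance(*o2)
print(round(d1, 2), round(d2, 2), more_cohesive(o2, o1))  # 0.58 0.72 True
print(cohesion_band(d1), cohesion_band(d2))                # 2 2
```

For the Figure 1 values, O1 lies at distance √.34 ≈ .58 and O2 at √.52 ≈ .72, so O2 is ranked more cohesive; with a band width of .25, both still fall in band 2 despite their different (inter-method, intra-method) combinations.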
Because a single band contains classes with different combinations of inter-method and intra-method cohesion, the relaxed OR condition gives more flexibility in representing Class Cohesion (O) than the AND condition does. According to Figure 4, all classes on the band Class Cohesion = band z have higher cohesion than those on band y. This combined metric is a good initial step toward combining the inter-method and intra-method cohesion of a class. It eliminates the earlier “unclear” situations that arise for Class Cohesion (O) under the AND condition, namely [LCOM5'(O1) < LCOM5'(O2) AND ITRA-C(O1) > ITRA-C(O2)] or [LCOM5'(O1) > LCOM5'(O2) AND ITRA-C(O1) < ITRA-C(O2)]. With the OR condition, the Class Cohesion (O) metric appears to better preserve the relationships of the empirical world in the numerical world. This unified Class Cohesion provides bands of cohesion and gives a quick global view of classes by cohesion. The grouping may be viewed as a shorthand for cataloging classes, or systems, by cohesion band.

Concluding Remarks

In this paper, we have treated an object-oriented class as a system. The connectedness or relatedness of the system, known as cohesion, has been shown to be associated with quality (Bansiya & Davis, 2002; Bieman & Ott, 1994; Briand et al., 1994). Cohesion is thus an important characteristic to investigate, and we have studied it as a characterization of an object-oriented class. More specifically, we have limited our view to a static analysis perspective, analyzing the interaction between the instance variables and the methods within a class and analyzing each individual method within the class. These were classified as inter-method cohesion and intra-method cohesion, respectively. The literature is rich with inter-method cohesion metrics, and any one of them may be picked for inter-method cohesion. Our main contributions in this paper are twofold.
First, we used the concept of program slices and modified the Functional Cohesion concepts of Bieman and Ott to formulate a new intra-method cohesion metric, ITRA-C. Second, while it may be necessary to keep the (inter-method, intra-method)-tuple as the class metric, we also used the distance function to unify these two metrics into one Class Cohesion metric that provides “bands” of class cohesion as a high-level categorization of classes or systems. In the future, we plan to extend this work in several areas. One is to further study the bands of Class Cohesion by sampling classes and observing them through multiple maintenance cycles. Another is to use the (inter-method, intra-method)-tuple as a guide to further analyze and refactor class design. A third is to extend the study to include the interactions among classes and the dynamic analysis of cohesion, such as applying Class Cohesion to the run-time analysis of classes (Mitchell & Power, 2004; Yacoub et al., 1999). Lastly, we plan to explore the potential of applying the cohesion concepts and metrics to other, more general software systems beyond object-oriented classes.

References

Ackoff, R. L. (1971). Towards a System of Systems Concepts. Management Science, 17(11), 661-671.
Bansiya, J. & Davis, C. (2002). A Hierarchical Model for Object-Oriented Design Quality Assessment. IEEE Transactions on Software Engineering, 28(1), 4-17.
Bieman, J. & Ott, L. M. (1994). Measuring Functional Cohesion. IEEE Transactions on Software Engineering, 20(8), 644-657.
Bieman, J. & Kang, B. K. (1995). Cohesion and Reuse in an Object-Oriented System. In Proceedings of the Symposium on Software Reusability, Seattle, Washington, USA.
Bieman, J. & Kang, B. K. (1998). Measuring Design-Level Cohesion. IEEE Transactions on Software Engineering, 24(2), 111-124.
Bonja, C. & Kidanmariam, E. (2006). Metrics for Class Cohesion and Similarity Between Methods. In Proceedings of the 44th ACM Southeast Conference, Melbourne, Florida, USA.
Briand, L., Morasca, S., & Basili, V. R. (1994). Defining and Validating High-Level Design Metrics. University of Maryland, CS-TR-3301-1, Maryland, USA.
Briand, L. C., Daly, J. W., & Wust, J. (1998). A Unified Framework for Cohesion Measurement in Object-Oriented Systems. Empirical Software Engineering, 3(1), 65-117.
Chae, H. S., Kwon, Y. R., & Bae, D. H. (2004). Improving Cohesion Metrics for Classes by Considering Dependent Instance Variables. IEEE Transactions on Software Engineering, 30(11), 826-832.
Checkland, P. (1981). Systems Thinking, Systems Practice. West Sussex, England: John Wiley & Sons.
Chidamber, S. R. & Kemerer, C. F. (1994). A Metrics Suite for Object-Oriented Design. IEEE Transactions on Software Engineering, 20(6), 476-493.
Counsell, S., Swift, S., & Crampton, J. (2006). The Interpretation and Utility of Three Cohesion Metrics for Object-Oriented Design. ACM Transactions on Software Engineering and Methodology, 15(2), 123-149.
Fenton, N. E. & Pfleeger, S. L. (1997). Software Metrics: A Rigorous and Practical Approach (2nd ed.). PWS Publishing Company.
Henderson-Sellers, B. (1996). Object-Oriented Metrics: Measures of Complexity. Upper Saddle River, New Jersey: Prentice Hall.
Hitz, M. & Montazeri, B. (1995). Measuring Coupling and Cohesion in Object-Oriented Systems. In Proceedings of the International Symposium on Applied Corporate Computing (25-27), Monterrey, Mexico.
Jorgensen, P. C. (2002). Software Testing: A Craftsman's Approach (2nd ed.). Boca Raton, Florida: Auerbach Publications.
Kitchenham, B., Pfleeger, S., & Fenton, N. (1995). Towards a Framework for Software Measurement Validation. IEEE Transactions on Software Engineering, 21(12), 929-944.
Kramer, S. & Kaindl, H. (2004). Coupling and Cohesion Metrics for Knowledge-Based Systems Using Frames and Rules. ACM Transactions on Software Engineering and Methodology, 13(3), 332-358.
Lewis, J. & Loftus, W. (2001). Java Software Solutions: Foundations of Program Design. Reading, Massachusetts: Addison Wesley Longman, Inc.
Mitchell, A. & Power, J. F. (2004). Run-Time Cohesion Metrics: An Empirical Investigation. In Proceedings of the International Conference on Software Engineering Research and Practice, Las Vegas, Nevada, USA.
Sarkar, S., Rama, G. M., & Kak, A. C. (2007). API-Based and Information-Theoretic Metrics for Measuring the Quality of Software Modularization. IEEE Transactions on Software Engineering, 33(1), 14-32.
Stevens, W. P., Myers, G. J., & Constantine, L. (1974). Structured Design. IBM Systems Journal, 13(2), 200-224.
Weiser, M. (1981). Program Slicing. In Proceedings of the 5th International Conference on Software Engineering (439-449), San Diego, California, USA.
Yacoub, S., Ammar, H., & Robinson, T. (1999). Dynamic Metrics for Object-Oriented Design. In Proceedings of the Software Metrics Symposium (50-61), Boca Raton, Florida, USA.
Yourdon, E. & Constantine, L. (1979). Structured Design. Upper Saddle River, New Jersey: Prentice Hall.
Zhou, Y., Xu, B., Zhao, J., & Yang, H. (2002). ICBMC: An Improved Cohesion Measure for Classes. In Proceedings of the International Conference on Software Maintenance (44-53), Montreal, Canada.
Zhou, Y., Lu, J., Lu, H., & Xu, B. (2004). A Comparative Study of Graph Theory-Based Class Cohesion Measures. ACM SIGSOFT Software Engineering Notes, 29(2), 13-18.