Measuring Levels of Abstraction in Software Development Frank Tsui, Abdolrashid Gharaat, Sheryl Duggins, Edward Jung School of Computing and Software Engineering Southern Polytechnic State University Marietta, Georgia, USA Abstract – In software engineering and development, we are expected to utilize the technique of abstraction. Yet, this is one of the most confounding topics, and there is a dearth of guideline. In this paper we explore and enhance the concepts of abstraction as applied to software engineering, define and discuss a metric called levels-of-abstraction, LOA, and discuss the potential of utilizing this metric in software engineering. Keyword: software, abstractions abstraction, levels of I. Introduction In developing software from requirements we are often faced with a “first step” syndrome of where should one start. Many high level architectural styles and patterns [1] have helped overcome this initial hurdle. However, we are still faced with further analysis of the requirements to somehow group similar requirements together along functional line or along some data usage line into sub-components. The question that faces many developers during this early stage of requirements analysis, design and code is the decision of what should be the granularity level of these major components. Stated in another way, the question is what should be the appropriate level of abstraction for specifying, for designing and for implementing the functional requirements. In this paper we first explore the general notion of abstraction and enhance the concept to include levelsof-abstraction as it applies to software engineering. Next we propose a general metric for levels-ofabstraction, LOA, which allows us to gauge the amount of abstraction. Finally, we show some characteristics of LOA and how one may decide on what is the appropriate level of abstraction when developing software. This is a report on the current status of our research in this area. This research is showing some promise in the area of explaining and formulating guidelines for the amount of abstraction and the depth of abstraction required in performing different software engineering activities. II. General Concepts Related to Abstractions in Software Engineering One of the fundamental reasons for engaging in the task of abstraction in software analysis, design and development is to reduce the complexity to a certain level so that the “relevant” aspects of the requirements, design and development may be easily articulated and understood. This starts with the requirements definition through the actual code implementation. In the three major activities of requirements definition, of design definition, and of implementation code definition, there are different degrees of abstraction based on what is considered relevant and necessary. The general relationships of the individual world domain, the abstractions of those domain entities, and the artifacts specifying those abstractions are shown in Figure 1. The bold, vertical arrows represent the intra-transformations occurring within each individual world domain of requirements, design, and implementation. The horizontal arrows represent the inter-transformations occurring across those domains. In software engineering, we are concerned with both the horizontal and the vertical transformations. specifications of abstractions Requirements Document Design Document Implementation Code abstractions Requirements Models of Abstraction Design Models of Abstraction Implementation Models of Abstraction User Design Solutions Executing Software System Individual world Needs/Wants domains Figure 1: Relationships Among Abstractions, Software Artifacts, and Individual World Domains In this paper we focus on the vertical intratransformations, especially the transformations from the individual world domains to the abstractions. The term abstraction used in the form of a verb, as represented with bold vertical arrows in Figure 1 from individual world domain to abstractions, would include the notion of simplification. Simplification represents the concept of categorizing and grouping domain entities into components and relating those components. We simplify by: (i) reduction and (ii) generalization. By reduction, we mean the elimination of the details. Generalization, on the other hand, is the identification and specification of common and important characteristics. Through these two specific subtasks of reduction and generalization we carry out the task of abstraction. Rugaber [7] states that design abstraction is a “unit of design vocabulary that subsumes more detailed information.” This notion of abstraction via simplification is also similar to the notion explicated by Wagner and Deissenboeck [9] and Kramer [3]. Thus via the process of abstraction, we aim to simplify or decrease the complexity of the domain of software design solution by reducing the details and by generalization. One may view the well established concept of modularization in software engineering as a specific technique within the broader context of abstraction. The employment of abstraction in software engineering where we aim to reduce complexity is certainly not just limited to the domain of software design. As Figure 1 shows, many other domains also employ abstraction. At the early stage of software development, the requirements represent the needs and wants of the users and customers. The user requirement is represented in some form of the user business flow, user functional needs, user information flow, user information representation, user information interface, etc. Each of these categories is an abstraction of the user world. Different degrees of abstraction may be employed [3, 4, 8, 11] depending on the amount of details that need to be portrayed in the requirements. This intratransformation is represented by the vertical arrow from User Needs/Wants domain to Requirements Models of Abstraction in Figure 1. As we move from requirements towards the development of the solution for the user requirements, a new form of abstraction takes place. The new form of abstraction is necessitated due to the fact that the solution world includes the computing machines and other logical constructs that may not exist in the original user requirements. One of the first artifacts from the solution side is the software architecture and high level design of the software system. At this point, the abstraction, once again, should not include all the details. The intertransformation of the requirements models of abstraction to the design models of abstraction is shown as the horizontal arrow in Figure 1. Note that the box labeled “Design Models of Abstraction” is the result of two types of transformations: a) inter-transformation from the requirements domain and b) intra-transformation within the design solution domain. The “Implementation Models of Abstraction” box in Figure 1 represents the packaging/loading models of the processes, information and control of the mechanical execution of the solution in the form of source code to satisfy the functionalities and the system attributes described in the requirements and design documents. Thus it is a result of the intertransformations from requirements models of abstraction through the design models of abstractions and the intra-transformations from the actual software execution domain. The “Implementation Code” box is the specification of this abstraction. The actual execution of the Implementation Code and the interactions with the users formulate the final deployment of the source code abstraction or the “Executing Software System” box in Figure 1. Thus the employment of various abstractions and the transformations of these abstract entities are crucial, integral parts of software engineering. III. Measuring Abstraction Abstraction, both as a verb and as a noun, is a crucial element in software engineering. As a verb, we have defined it as the activity of simplification, composed of reduction of details and the generalization of crucial and common attributes. Now, it is relevant to ask how much abstraction would be appropriate so that we can arrive at the “Implementation Code” box and the “Executing Software System” box in Figure 1. Jackson [2] admonishes us that we need to be careful with abstraction and the degree of abstraction because so many seemingly good designs fall apart at implementation time. His warning is well founded in that many design abstractions, the noun, are often missing some vital information for the detail coding activities. In the past, we have utilized the technique of decomposition to move from abstraction to details. However, if our abstraction is generalizing too much to not include the vital information, then Jackson’s warning will turn into reality. The levels of abstraction should be different for various software artifacts and be dictated by the purpose of abstraction. Wang [10] has expressed a similar concern and defined a Hierarchical Abstraction Model for software engineering; his hierarchical model of abstraction describes the necessary levels of preciseness in representing abstractions of different objects. He argues for more precise, rigorous and formal ways to describe different software artifacts such as software architecture and design. In terms of our Figure 1, Wang addressed the issue of rigor of specifications of abstraction in the requirement and design documents, not how much should be included in the abstraction. The how much, or the amount, of abstraction is a reflection of the result of the simplification activity, which is in turn composed of reduction and generalization activities. Measuring the amount of abstraction is gauging the extent of reduction and generalization that took place. For example, this may be possible in the requirements domain. We may consider the set of the original requirements statements of needs and wants as X in the “User Needs/Wants” box in Figure 1. Then, |X|, the cardinality of X is a count of the raw requirement statements collected through some solicitation process. These are the pre-analysis requirements statements. We then designate Y as the statements in the “Requirements Models of Abstraction” box. The cardinality of Y, |Y|, is a count of the statements that resulted from requirements analysis, which include activities such as organizing, grouping, prioritizing, etc. In other words, the post-analysis of the solicited requirements is a form of abstraction of the raw user needs and wants requirement statements. Then the “difference” between |X| and |Y| is: (Level-of-Abstraction)REQ. = |X| - |Y|. (Level-of-Abstraction)REQ represents the “difference” between pre-analysis and post analysis of requirements, and it may be considered a metric of abstraction for requirements. Since simplification is a vital characteristic of abstraction, we expect |Y| to be less than |X|. Thus we will need to further refine this definition with the constraint that if |Y| is not less than |X|, then no abstraction activity really took place. We will also take the subscript, REQ, off the terminology for the general case. In general, let X be the statements in the domain world, and let Y be the set of statements in the abstraction, the noun, then Level-of-Abstraction (LOA) = |X| - |Y|, if |X| > |Y| else =0 Note that when the abstraction activity is carried to its extreme, |Y| should just be 1. For example, all the raw requirement statements of needs and wants are abstracted into one abstract statement. Thus Level-ofAbstraction is bounded by (|X| - 1) and 0. This also implies that Level-of-Abstraction is heavily influenced by |X|. Consider two domain world representations, X1 and X2, where |X1| > |X2|, with their respective abstractions of Y1 and Y2. If |Y1| = |Y2|, then |X1|-|Y1| > |X2|-|Y2|. Thus Level-ofAbstraction states that more abstraction, the verb, took place in the X1 to Y1 scenario than the X2 toY2 case. A more interesting consideration is the situation where we have two different abstractions, Y1 and Y2, from the same world domain, X. Assume |Y1| < |Y2|, then (|X| - |Y1|) > (|X| - |Y2|). Thus, as we would expect, Level-of-Abstraction from X to Y1 is more than that of X to Y2. While we are not certain of the real interval size between one Level-of-Abstraction and its immediate next level, we can say that Level-of-Abstraction is an ordinal metric that provides us a sense of ordering. IV. Measuring Requirements Abstraction: In this section we will further explore the Level-ofAbstraction measurement concept, using requirements analysis as an example. Note that it is very likely that X and Y are not expressed with the same language. English sentences and some diagrams may be the main ingredients of the wants and needs expressed by the users and customers. The result of requirements prioritization, categorization and analysis is some form of abstraction, Y, which may be expressed with a Use Case Diagram. The amount of requirements abstraction defined as the “difference” between pre and post analysis of requirements deserves some further explanation and illustration here. Suppose there are X = {x1, x2, ---, xz} raw requirement statements. The set X may contain a variety of statements, referring to functionality, data and other attributes. A common type of abstraction that may be employed is to categorize and group X by functionality. Thus only a subset of X, X’, is addressed. X’ is the subset of requirement statements that addresses functionality needs. Let X’ = {xx1, xx2, ---, xxn}, where |X’| ≤ |X|. The subset X’ is analyzed and partitioned into some set of categories of functionalities. Let us call this partitioned set, set Y. Y may look as follows. Y = {(xx1, xx2); (xx3, xx5, xx10); ----} Every functionality xxi € X’ is in one of the partitions of Y and in only one of the partitions. Renaming the partitioned set Y as follows, yields: Y = {y1, y2, ----, yk} where y1 = (xx1, xx2) y2 = (xx3, xx5, xx10) . . Yk. The set Y may be represented by a Use Case diagram where y1, y2, ---, yk are the named interaction represented as “bubbles” in the Use Case diagram. Clearly there is more than one way to partition X’; thus there may be different Y’s. A use case diagram with one “bubble” would be an extreme case as well as a use case diagram with a bubble for each xxi. The extreme points of |Y| = 1 nor |Y| = |X’| would be very rare. Now consider a specific case where the domain set X has 4 functional requirements x1, x2, x3, x4. Then there are the following partitioning sets, P’s for different functional abstractions, Y’s. P0 has Y01 = {(x1); (x2); (x3); (x4)} = X P1 has Y11 = {(x1); (x2); (x3,x4)}, Y12 = {(x1); (x3); (x2,x4)}, Y13 = {(x1); (x4); (x2,x3)}, Y14 = {(x2); (x3); (x1,x4)}, Y15 = {(x2); (x4); (x1,x3)} and Y16 = {(x3); (x4); (x1,x2)} P2 has Y21 = {(x1); (x2,x3,x4)}, Y22 = {(x2); (x1,x3,x4)}, Y23 = {(x3); (x1,x2,x4)}and Y24 = {(x4); (x1,x2,x3)} P3 has Y31= {(x1,x2); (x3,x4)}, Y32= {(x1,x3); (x2,x4)}, and Y33= {(x1,x4); (x2,x3)} P4 has Y41 = {(x1,x2,x3,x4)} At P0, there is only one abstraction, Y01, which is the same as the original requirement set X. So Levelof-Abstraction is |X| - |Y| = 0. There is no abstraction at P0. At P1, any of the Y1x has a cardinality of 3. So at P1, |X| - |Y| = 4 – 3 = 1. The Level of Abstraction is 1. At P2, the Y2x’s are grouped differently and each has a cardinality of 2. Thus, at P2, |X|-|Y| = 4-2 = 2. P3 partitioning has the functionalities grouped differently, but each Y3x has a cardinality of 2, just like those in P2. At P3, |X| - |Y| = 4 -2 = 2. Thus all partitions in P2 and in P3 are at the same Level-ofAbstraction, 2. Finally at P4, there is again only one abstraction, Y41, which combined all four functionalities into 1 category. Thus at P4, |X| - |Y| = 4 – 1 = 3. The Level-of- Abstraction is the highest here. From this example, one can easily see that abstraction of functionalities into groups for a relatively small set of four functional requirements has many choices. In this case there are 15 choices and there are 4 different Level-of-Abstraction, namely 0 through 3. Coming up with the one “best” abstraction is not an easy task even with this small example. V. General Theorems We have shown that for a |Y| = k, there may be several different abstractions with that same cardinality. That is, given an abstraction level j, there may be more than one solution. Theorem 1: For Level-of-Abstraction = |X| - |Y| = j, there exists more than one abstraction or partitioned set, Y, at that Level-of-Abstraction, unless j = 0 or j = |X| -1. Proof: Given |x| - |y| = j, if j=0 then |x| - |y| = 0 and there is no abstraction. If |x| - 1 = j, then |y| has only 1 category with all functionalities included, so |x| - |y| = |x| - 1, which is the highest level of abstraction. From combinatorics, we know for any set with n>0 elements and r an integer such that 0 ≤ r ≤ n, then the number of subsets that contain r elements of S is n!__ . Hence for all the values in between, there r!(n-r)! will be partitioned sets. Note that the different Level-of-Abstraction’s as shown in our simple example of P0 through P4 form a partially ordered set. Next we introduce an Up and a Down operator on Level-of-Abstraction. Given a Level-of-Abstraction, Px, and a Level-of-Abstraction, Py, any x€Px and y€Py, then UP and Down operators are defined a follows: UP(x,y) = x if Py ≤ Px and = y otherwise. Down(x,y) = y if Py ≤ Px and = x otherwise. Theorem 2: Set of Level-of-Abstractions, with the operators of Up and Down form a lattice structure. Proof: The set of Level-of-Abstraction is a partially ordered set, or a Poset. A lattice is a Poset in which any two elements have a lowest upper bound (lub) and a greatest lower bound (glb). The Up operator provides us the glb and the Down operator provides us the lub. Thus the set of Level-of Abstraction forms a lattice. VI. SUMMARY AND RESULTS In this paper we explored the general notion of abstraction as applied to software engineering. We further proposed a general metric for levels-ofabstraction, LOA, which allows us to gauge the amount of abstraction. Lastly, we showed some characteristics of LOA and indicated how one may decide on the appropriate level of abstraction when developing software. This paper indicates our preliminary work in this area. Our next task is to further explore the properties of our Up and Down operators. For example, we believe for the UP operator, the glb is the identity element. With it, the set of LOA forms a Group. With the Down operator, the identity element is the lub. Thus with both Up and Down and their respective identity elements, the set of LOA may be a Ring. We further will explore the fact that when you go Up and Down in LOA, how do you know which gives the best coverage of abstraction? Our initial idea is that the top level of abstraction may not be the best in terms of design. References 1. D. Garlan and M. Shaw, “An Introduction to Software Architecture,” CMU-CS-94-166, Carnegie Mellon University, Pittsburgh, PA, 1994. 2. D. Jackson, Software Abstractions Logic, Language, and Analysis, MIT Press, 2006. 3. J. Kramer, “Is Abstraction the Key to Computing,” Communications of the ACM, Vol. 50, No 4, April 2007, pp 37-42. 4. J. Kramer and O. Hazzan, “Introduction to The Role of Abstraction in Software Engineering,” International Workshop on Role of Abstraction in Software Engineering,” Shanghai, China, May, 2006. 5. S. Morasca, “On the Use of Weighted Sums in the Definition of Measures,” Workshop on Emerging Trend in Software Metrics, Cape Town, S. Africa, May, 2010. 6. D. E. Perry, “Large Abstractions for Software Engineering,” 2nd International Workshop on Role of Abstraction in Software Engineering, Leipzig, Germany, May, 2008. 7. S. Rugaber, “Cataloging Design Abstractions,” International Workshop on Role of Abstraction in Software Engineering,” Shanghai, China, May, 2006. 8. F. Tsui and O. Karam, Essentials of Software Engineering, 2nd edition, Jones and Bartlett Publishers, 2010. 9. S. Wagner and F. Deissenboeck, “Abstractness, Specificity, and Complexity in Software Design,” 2nd International Workshop on Role of Abstraction in Software Engineering, Leipzig, Germany, May, 2008. 10. Y. Wang, “A Hierarchical Abstraction Model for Software Engineering,” 2nd International Workshop on Role of Abstraction in Software Engineering, Leipzig, Germany, May, 2008. 11. F. Tsui, A. Gharaat, S. Duggins and E. Jung, “Measuring Levels of Abstraction in Software Development”, Internal Research Report, Software Engineering, Southern Polytechnic State University, July 2010.