Component Mining
Mahdi Cheraghchi-Bashi-Astaneh
cheraghchi@ce.sharif.edu

Outline
What is a component?
Software reuse
What is component retrieval?
Pros and cons of reuse
How to retrieve?
Evaluation

What is a component?
A part of the whole.
“A piece of software small enough to create and maintain, big enough to deploy and support, and with standard interfaces for interoperability” - Jed Harris, President, CI Labs.
Self-contained binary pieces of software, but not complete applications.
Can be combined with other components to produce complete applications, regardless of the languages the components are implemented in or the platforms they run on.
Object-oriented methods are often used for component development and reuse.

Some Examples in Practice
Borland Delphi
Borland C++ Builder
Borland Kylix
OLE / COM / ActiveX
JavaBeans
CORBA

Software Reuse
Software reuse is the process of creating software systems from existing software rather than building software systems from scratch. [Krueger, 1992]
Levels of software reuse: source code, algorithms, architectures, domain models, design, program transformations, documentation, … every possible aspect of a software system

What is Component Retrieval?
The mere existence of a component library does not automatically entail its reuse.
“Component mining” is the deliberate, organized and automated process of extracting reusable components from an existing rich software base.
Re-users need support in identifying components that suit their needs. This task is the topic of software component retrieval.
The goal is to develop reusable, adaptable software components rather than large, monolithic applications.
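
As a rough illustration of the retrieval task described above, the sketch below assumes a hypothetical in-memory library in which each component carries a short free-text description. The names `Component`, `retrieve` and `LIBRARY`, and the sample entries, are assumptions made for illustration and do not come from any real repository system. The matching is the simplest possible keyword search, not a dedicated component retrieval algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A hypothetical library entry: a name plus free-text documentation."""
    name: str
    description: str


def retrieve(library, query):
    """Return the components whose description contains every query word.

    This is naive keyword matching on whole words. It is cheap, but it
    ignores synonyms, word forms and the structure of the component.
    """
    words = query.lower().split()
    return [
        c for c in library
        if all(w in c.description.lower().split() for w in words)
    ]


# Illustrative entries only; not taken from a real component library.
LIBRARY = [
    Component("QuickSorter", "sort a list of items in memory"),
    Component("CsvReader", "read a csv file into a list of rows"),
    Component("HttpClient", "send http requests and read responses"),
]

matches = retrieve(LIBRARY, "read list")
print([c.name for c in matches])
```

A re-user looking for something that reads data into a list would only find the component whose description happens to use both words, which hints at why formulating a good query is hard.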

Types of Reuse
Black-box reuse: a client may reuse the retrieved components “as is.”
Component-adaptive grey-box reuse: a client may reuse the retrieved components without meeting any additional conditions, but only after interface-level modifications of the components.
White-box reuse: arbitrary additions and modifications are required.

Pros and Cons of Reuse
Advantages:
1. Reduces the time and cost spent on programming.
2. Increases programmers’ productivity.
3. Increases program quality and reliability.
4. Enables expertise sharing.
Problems:
1. It is hard to find things, especially at a large scale.
2. Components are typically not (easily) modifiable.
3. It is hard to manage a large pool of components.
4. Reuse is worthwhile only if it is easier to locate and modify a reusable component than to write it from scratch.

How to Retrieve?
Component retrieval is in fact a form of information retrieval.
Despite this fact, “dedicated” component retrieval algorithms are being developed, since software is more than ordinary text.
Component retrieval is a complex and heuristic process.
It typically needs a well-structured repository of components.
Methods of retrieval:
1. Algorithms based on the meta-data accompanying software components.
2. Algorithms based on the structure of the components.
Exact retrieval versus approximate retrieval

Retrieval by Meta-Data
By meta-data we mean the documentation accompanying the component.
This method relies on the existence and quality of the documentation and needs some preprocessing.
How to find?
1. Using full-text search on documents and program files: no cost, but inaccurate.
2. By classification of the components, either automatically or manually (depending on the cost and accuracy we need).

Retrieval by Structure
Depends on the availability of the structure in some form (source code, interface, etc.).
Depends on the availability of computer language processors.

Some Other Methods
Formal component specification
1. Domain theories: algebraic model, signatures, etc.
2. Interface specifications
3. Interface matching (automated theorem proving, etc.)
Semantic classification
Feature-based methods (What possible features can a component have?)

Some Other Methods
Deduction-based component retrieval
  It is the only method that retrieves only proven matches.
  Suitable for the development of high-reliability or safety-critical applications, e.g. spacecraft control systems.

Searching and Browsing
Searching: Software developers formulate a query, and the repository system returns components that match the query.
  Problem: Formulating an effective query is a challenging task.
Browsing: Developers determine the relevance of the components currently being displayed in terms of their development task, and traverse the associated links.
  It is an incremental task, and is usually preferred.
  Problem: The software developer may be puzzled.
Context-aware browsing: Infers developers’ tasks by monitoring their interactions with the environment.
  Similar to browsing, but results in a significantly smaller browsing space.
  Uses learning methods to refine itself.
  Problem: It is difficult to “understand” the content.

The Reuse Environment
A component database.
A library management system providing access to the database.
A software component retrieval system (e.g. an ORB) that enables client applications to retrieve components from the library server.
CBSE tools that support the integration of reused components into a new design.

Evaluation Measures
Recall = ratio of the number of relevant components retrieved to the total number of relevant components in the repository
Precision = ratio of the number of relevant components retrieved to the total number of components retrieved
Response time

Summary and Conclusion
Software reuse is a crucial concern in today’s world of complex software products.
The component-based development model plays an important role in software reuse.
The component-based model is useful only when a satisfactory means of retrieval is available.
No definitive answer has yet been developed for describing components in unambiguous, classifiable terms.
Component retrieval is a difficult problem, and more work is needed to find an efficient solution.

References
D. Spinellis, K. Raptis, Component Mining: a process and its pattern language, Information and Software Technology 42 (2000) pp 609-617
Hafedh Mili et al., An experiment in software component retrieval, Information and Software Technology 45 (2003) pp 633-649
K. McArthur et al., An evaluation of the impact of component-based architectures on software reusability, Information and Software Technology 44 (2002) pp 351-359
P.A.V. Hall, Architecture-driven component reuse, Information and Software Technology 41 (1999) pp 963-968
I. Crnkovic, M. Larsson, Challenges of component-based development, The Journal of Systems and Software 61 (2002) pp 201-212
Y. Ye, G. Fischer, Context-Aware Browsing of Large Component Repositories, IEEE 16th International Conference on Automated Software Engineering, 2001
A. M. Zaremski, J. M. Wing, Signature Matching, A Key to Reuse
B. Fischer, Deduction-Based Software Component Retrieval (Thesis)
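
The recall and precision measures from the Evaluation Measures slide can be sketched as follows. This is a minimal illustration, assuming that components are identified by name and that the set of relevant components for a query is known in advance. The function names and the sample data are hypothetical, and returning 0.0 for an empty denominator is a convention chosen here, not something the slides specify.

```python
def recall(retrieved, relevant):
    """Relevant components retrieved / all relevant components in the repository."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # assumed convention: no relevant components at all
    return len(set(retrieved) & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Relevant components retrieved / all components retrieved."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0  # assumed convention: nothing was retrieved
    return len(retrieved & set(relevant)) / len(retrieved)


# Hypothetical query result and hand-labelled relevant set.
retrieved = ["CsvReader", "HttpClient", "JsonParser", "QuickSorter"]
relevant = ["CsvReader", "JsonParser", "XmlReader"]

print(recall(retrieved, relevant))     # 2 of the 3 relevant components were found
print(precision(retrieved, relevant))  # 2 of the 4 retrieved components are relevant
```

The two measures pull in opposite directions: returning the whole repository maximizes recall but usually gives poor precision, which is why both are reported together with response time.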