Component Mining - Mahdi Cheraghchi

Component Mining
Mahdi Cheraghchi-Bashi-Astaneh
cheraghchi@ce.sharif.edu
Outline
• What is a component?
• Software reuse
• What is component retrieval?
• Pros and cons of reuse
• How to retrieve?
• Evaluation

What is a component?
• A part of the whole.
• “A piece of software small enough to create and maintain, big
  enough to deploy and support, and with standard interfaces for
  interoperability” - Jed Harris, President, CI Labs.
• Self-contained binary pieces of software, but not complete
  applications.
• Can be combined with other components to produce complete
  applications, regardless of the languages the components are
  implemented in or the platforms they run on.
• Object-oriented methods are often used for component development
  and reuse.
Some Examples in Practice
• Borland Delphi
• Borland C++ Builder
• Borland Kylix
• OLE / COM / ActiveX
• JavaBeans
• CORBA

Software Reuse
• Software reuse is the process of creating software systems from
  existing software rather than building software systems from
  scratch. [Krueger, 1992]
• Levels of software reuse: source code, algorithms, architectures,
  domain models, design, program transformations, documentation, …
  every possible aspect of a software system.

What is Component Retrieval?
• The mere existence of a component library does not automatically
  entail its reuse.
• “Component mining” is the deliberate, organized and automated
  process of extracting reusable components from an existing rich
  software base.
• Reusers need support in identifying components that suit their
  needs. This task is the topic of software component retrieval.
• The goal is to develop reusable, adaptable software components
  rather than large, monolithic applications.
Types of Reuse
• Black-Box Reuse: a client may reuse the retrieved components
  “as is.”
• Component-adaptive Grey-Box Reuse: a client may reuse the retrieved
  components without meeting any additional conditions, but only
  after interface-level modifications of the components.
• White-Box Reuse: arbitrary additions and modifications are required.
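Grey-box reuse can be illustrated with a thin adapter. This is a minimal sketch and not taken from the slides: `LegacyStack` stands in for a hypothetical retrieved component, and `StackAdapter` for the interface-level modification a client might write. Both classes and their method names are assumptions made for illustration.

```python
class LegacyStack:
    """Hypothetical retrieved component; its code is left untouched."""

    def __init__(self):
        self._items = []

    def add_element(self, item):
        self._items.append(item)

    def remove_element(self):
        return self._items.pop()


class StackAdapter:
    """Interface-level modification: exposes push/pop on top of the
    component without changing its internals (grey-box reuse)."""

    def __init__(self, component):
        self._component = component

    def push(self, item):
        self._component.add_element(item)

    def pop(self):
        return self._component.remove_element()


stack = StackAdapter(LegacyStack())
stack.push("a")
stack.push("b")
top = stack.pop()  # last pushed element comes back first
```

Using `LegacyStack` directly would correspond to black-box reuse, while editing the body of `LegacyStack` itself would be white-box reuse.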

Pros and Cons of Reuse
• Advantages:
  1. Reduces the time and cost spent on programming.
  2. Increases programmers’ productivity.
  3. Increases program quality and reliability.
  4. Enables expertise sharing.
• Problems:
  1. It is hard to find things, especially at large scale.
  2. Components are typically not (easily) modifiable.
  3. It is hard to manage a large pool of components.
  4. Reuse is only worthwhile if it is easier to locate and modify a
     reusable component than to write it from scratch.
How to Retrieve?
• Component retrieval is in fact a form of information retrieval.
  Even so, “dedicated” component retrieval algorithms are being
  developed, since software is more than ordinary text.
• Component retrieval is a complex and heuristic process.
• It typically needs a well-structured repository of components.
• Methods of retrieval:
  1. Algorithms based on the meta-data accompanying software
     components.
  2. Algorithms based on the structure of the components.
• Exact retrieval versus approximate retrieval.
Retrieval by Meta-Data
• By meta-data we mean the documentation accompanying the component.
• This method relies on the existence and quality of the
  documentation and needs some preprocessing.
• How to find?
  1. Using full-text search on documents and program files: no cost,
     but inaccurate.
  2. By classifying the components, either automatically or manually
     (depending on the cost and accuracy we need).
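The full-text option above can be sketched as a small inverted index over the documentation. This is only an illustration under stated assumptions: the `DOCS` library, its component names and the helper functions `tokenize`, `build_index` and `search` are all hypothetical, not part of any real repository system.

```python
import re
from collections import defaultdict

# Hypothetical component library: name -> accompanying documentation.
DOCS = {
    "StringTokenizer": "Splits a string into tokens using a delimiter.",
    "CsvReader": "Reads rows from a CSV file and splits each row into fields.",
    "HttpClient": "Sends HTTP requests and returns the response body as a string.",
}


def tokenize(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def build_index(docs):
    """Preprocessing: map each word to the components whose docs contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the components whose documentation contains every query word."""
    words = tokenize(query)
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))


index = build_index(DOCS)
narrow = search(index, "splits string")  # only StringTokenizer matches both words
broad = search(index, "string")          # also returns HttpClient
```

The `broad` query shows the inaccuracy mentioned above: `HttpClient` is returned only because its documentation happens to contain the word "string", not because it does string processing.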
Retrieval by Structure
• Depends on the availability of the structure in some form (source
  code, interface, etc.).
• Depends on the availability of computer language processors.
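Both dependencies can be seen in a small sketch that uses Python's own parser (the standard `ast` module) as the language processor. The `SOURCE` text, the helper names `extract_structure` and `find_by_arity`, and the choice of "number of parameters" as the query are assumptions made for illustration; real structure-based retrieval would use richer structural information.

```python
import ast

# Hypothetical source code of a small component, available as text.
SOURCE = '''
def push(stack, item):
    stack.append(item)

def pop(stack):
    return stack.pop()

def is_empty(stack):
    return not stack
'''


def extract_structure(source):
    """Parse the source and list each top-level function with its
    positional parameter names."""
    tree = ast.parse(source)
    return {
        node.name: [arg.arg for arg in node.args.args]
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }


def find_by_arity(structure, arity):
    """Retrieve the functions that take exactly `arity` positional parameters."""
    return sorted(
        name for name, params in structure.items() if len(params) == arity
    )


structure = extract_structure(SOURCE)
one_arg = find_by_arity(structure, 1)
```

Without the source text there is nothing to parse, and without a parser for the component's language the structure cannot be extracted, which is why both items above are listed as preconditions.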

Some Other Methods
• Formal component specification
  1. Domain theories: algebraic models, signatures, etc.
  2. Interface specifications
  3. Interface matching (automated theorem proving, etc.)
• Semantic classification
• Feature-based methods (What possible features can a component
  have?)
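The signature and interface-matching items above can be illustrated with a simplified sketch that compares type signatures, first exactly and then up to argument order. Everything here is assumed for illustration: the `LIBRARY` contents, the representation of a signature as `(argument types, result type)`, and the two match functions. It is far simpler than matching by automated theorem proving.

```python
from collections import Counter

# Hypothetical library: component name -> (argument types, result type).
LIBRARY = {
    "index_of": (("list", "item"), "int"),
    "insert_at": (("list", "int", "item"), "list"),
    "count": (("item", "list"), "int"),
}


def exact_match(query, signature):
    """Exact retrieval: argument order and result type must be identical."""
    return query == signature


def reorder_match(query, signature):
    """Relaxed retrieval: same result type, same argument types in any order."""
    (q_args, q_result), (s_args, s_result) = query, signature
    return q_result == s_result and Counter(q_args) == Counter(s_args)


def retrieve(query, match):
    """Return the names of all library components accepted by `match`."""
    return sorted(name for name, sig in LIBRARY.items() if match(query, sig))


query = (("list", "item"), "int")
exact = retrieve(query, exact_match)      # ['index_of']
relaxed = retrieve(query, reorder_match)  # ['count', 'index_of']
```

The relaxed match also returns `count`, whose arguments are merely in a different order. A client could still reuse it after an interface-level adjustment.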
Some Other Methods
• Deduction-Based Component Retrieval
  • The only method that retrieves only proven matches.
  • Suitable for the development of high-reliability or
    safety-critical applications, e.g. spacecraft control systems.

Searching and Browsing
• Searching: Software developers formulate a query, and the
  repository system returns components that match the query.
  • Problem: Formulating an effective query is a challenging task.
• Browsing: Developers determine the relevance of the components
  currently being displayed in terms of their development task, and
  traverse the associated links.
  • It is an incremental task, and is usually preferred.
  • Problem: The software developer may be puzzled.
• Context-Aware Browsing: Infers developers’ tasks by monitoring
  their interactions with the environment.
  • Similar to browsing, but results in a significantly smaller
    browsing space.
  • Uses learning methods to refine itself.
  • Problem: It is difficult to “understand” the content.
The Reuse Environment
• A component database.
• A library management system providing access to the database.
• A software component retrieval system (e.g. an ORB) that enables
  client applications to retrieve components from the library server.
• CBSE tools that support the integration of reused components into
  a new design.

Evaluation Measures
• Recall = ratio of the number of relevant components retrieved to
  the total number of relevant components in the repository
• Precision = ratio of the number of relevant components retrieved
  to the total number of components retrieved
• Response time
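The two ratios above can be computed directly from the set of retrieved components and the set of relevant ones. The following is a minimal sketch; the function name `precision_recall` and the example component names are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: components returned by the retrieval system
    relevant:  components in the repository that actually suit the need
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant components that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 components returned, 3 of them relevant,
# out of 6 relevant components in the repository.
retrieved = ["Stack", "Queue", "Deque", "Logger"]
relevant = ["Stack", "Queue", "Deque", "Heap", "List", "Tree"]
precision, recall = precision_recall(retrieved, relevant)
# precision is 3/4 = 0.75, recall is 3/6 = 0.5
```

Returning every component in the repository would give perfect recall but poor precision, so the two measures are normally reported together.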

Summary and Conclusion
• Software reuse is a crucial concern in today’s world of complex
  software products.
• The component-based development model plays an important role in
  software reuse.
• The component-based model is useful only when a satisfactory means
  of retrieval is available.
• No definite answer has yet been developed for describing
  components in unambiguous, classifiable terms.
• Component retrieval is a difficult problem, and more work is needed
  to find an efficient solution.