A Case-Based approach to Software Component Retrieval

A Case-Basedapproachto Software Component
Retrieval
Carmen Fern~indez-Chamizo,
Pedro A. Gonz~lez-Calero
Luis Hermtndez-Yltfiez,
and Alvaro Urech Baqu~
Dep. Informtttica -Fac. F/sica - Universidad Complutense
28040 Madrid, SPAIN
34-1-3944381 (Phone); 34-1-3944687 (FAX); cfernan@dia.ucm.es
From: AAAI Technical Report SS-93-07. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved.
Abstract
A major problemconcerning the reusability of software is the retrieval of software
components. Different approaches, ranging from automatic indexing methods to
knowledge-basedsystems, have been followed to solve this problem.
In this paper we present three prototypes which implementdifferent methods
of componentindexing. From these prototypes we propose a hybrid system that
uses CBRas primary approach but taking advantage of IR methods to facilitate
the access to the adequate components.
1. Introduction
Softwarereuse is widely’believedto be one of the most
promisingtechnologiesfor improvingsoftware quality
andproductivity(Biggerstaff& Richter1987).
Work on reusability has followed several
approaches.As Krueger(Krueger1992)says, wecan find
verydifferent types andlevels of reusability. Hedivides
the different approachesto softwarereuse into eight
categories: high-level languages, design and code
scavenging,source codecomponentes,
softwareschemas,
application generators, very high-level languages,
transformational systemsand software architectures.
Theseapproachesrely on reuse techniquesthat rangefrom
abstractions at a very low level, such as assembly
languagepatterns, to very high level abstractions, as
softwaredesigns. However,
there are manytechnical and
non-technicalproblemsto be solvedbefore widespread
softwarereuse becomesa reality. Oneof the problemsis
the one concernedwith the classification, storage and
retrieval of reusable components.Ourcurrent workis
aimedat the solutionof this problem.
If a reusabilitylibraryis to be successful,it mustbe
structuredto help the reuser to find softwarecomponents
of interest, browsethroughrelated components
to locate
the component
that best suits the current needs,andbuild
a custom-tailored software componentusable in the
softwaresystembeing developed.
Theuser shouldbe able to find possiblecomponents
with only an approximate idea of the component’s
function, without requiring to write a detailed formal
specification.
2. Software Component Retrieval
Oneof the fundamentalissues in the reusability of
softwareis the retrieval of softwarecomponents.
In order
to reuse a software component,wehave to be able to
retrieve it froma component
base in an easy andefficient
way.Therefore,the softwarecomponent
retrieval focuses
in two majorproblemareas: the classification of the
software componentsand the retrieval method. Both
problems are interrelated, because the way the
components
are organizedinfluencesthe retrieval method
to be used.
Traditionalapproaches
to softwareretrieval fall into
two complementary
categories: high.level classification
techniques, which emphasizeretrieval by software
category, and low-level cross-reference tools, which
facilitate variouskindsof browsing
at the codelevel. We
Thisworkis supported
by the SpanishComitteeof Science&Technology
(CICYT,
TIC92-0058)
35
are primarily concernedwith high-level techniquesbut
low-level tools are also needed to establish some
importantrelationships betweencomponents
at the code
level.
The high-level classification of the software
components
can followbasically two approaches.Wecan
extract the classification informationfromthe component
itself (code, documentation).
Or, on the other hand,
can implement
a classification schemeusing information
about the components
that lies outsideof them.That is,
we can provide for each componentsomeadditional
informationfor the classificationscheme
to base on; this
information could be obtained, for example,from an
analysis of the domain.Mostof the systemsbasedon the
first approachuse automaticindexingmethods,whilethe
systems based on the second approach are usually
"knowledge-based
systems".
ThoseLAswhich have a high resolving powerin the
documentwith respect to the corpus are selected as
indices. In Guru,all softwarecomponents
are classified,
stored, compared,and retrieved according to these
indices. Theycan evenbe organizedinto hierarchies for
browsingusing clustering techniques. At the retrieval
stag.e, the user canspecifya queryin free-style natural
language, whichis indexed using the sameindexing
technique.Theset of LAsextractedfromthe querydirects
the repositorysearch. Variousrankingmeasurescan then
be usedby the Gurusystemto select the best candidate
for a particular query.Thissystemhas beenappliedto the
UNIXcommand
set and also to an object-oriented class
library (Helm& Maarek1991).
As the informationprovidedby IR tools is derived
automatically,this approachpresentsadvantagesin cost,
transportability and scalability. Statistical methods,
however,can’t substitute for meaning.
2.1. Automaticindexing approach
2.2. Knowledge-Basedapproach
Dueto the increasingsize of natural language
descriptions
of softwarecomponents
in recent libraries, information
retrieval (IR) techniquesbasedon statistical methods
(Salton 1986) are becomingmore usual in component
retrieval. This approachextracts informationfromthe
natural-language documentation of the software
components.It doesn’t use any semanticknowledgeand
it doesn’t intend to understandthe documentation.The
goal of this approachis to characterizeeach component
by a set of indicesthat are automaticallyextractedfrom
its natural languagedocumentation.
Maarek(Maarek et al. 1991) identifies three
requirements
that shouldbe fulfilled by IR techniquesin
the softwaredomain:allowmultiplewordretrieval, select
onlykeyindices, andachievehigh precision.
Thereis a growinginterest in the potential contributions
of artificial intelligenceto softwareengineering(Arango
1988). Developmentof knowledge-basedtools for
softwarereusability is oneof the mostpromisingresearch
topicsin this area.
Thekey feature of this approachis that it draws
semanticinformationabout softwarecomponentsfroma
humanexpert. Knowledge-based
systemsare often very
sophisticated.Asa tradeoff, they requiredomainanalysis
and a great deal of pre-encoded, manuallyprovided
semanticinformation.
Prieto-Dfaz(Prieto-Diaz &Freeman1987) created
a classification schemebasedon the library science. He
proposes a set of six facets: three related to the
functionality of the component
and three related to its
environment.Thedifferent values a facet can haveare
called terms. Theseterms are structured aroundcertain
supertypes that represent organizing concepts.
Conceptualdistances betweenterms are assignedby the
user. Toclassify a component,
a value for eachfacet, a
term, mustbe given, so eachcomponent
is characterized
by a six-tuple of terms. The search for a reusable
component
is accomplishedby entering a querywith six
terms.Theset of termsis finite, but a thesaurus
is provided
to help makingthe query. Theconceptual graph that
organizesdomainconcepts represents manuallyencoded
knowledgeabout the domain.
Thesystem proposedin (Frakes and Nejmeh1987)
uses an existing IR system, CATALOG,
for storing and
retrieving C software components.Each componentis
characterizedby a set of single- termindices that are
automatically extracted from the natural- language
headers of C programs.
The atomic indexing unit in the Guru system
(Maareket al. 1991)is the lexical affinity (LA).An
betweentwounits of languagestandsfor a correlation of
their common
appearance.A set of LA-based
indices (or
profile) is built for each documentby performing
statistical analysisof the distributionof wordsnotonlyin
the document,
but also in the corpusformedbythe set of
all the documents
in the repository.LA-based
profiles are
automatically built for each componentdescription.
The system proposed in (Woodand Sommerville
1988) uses Conceptual Dependency(Schank 1972)
36
represent knowledgeabout software components.They
define the "componentdescriptor frames" in order to
represent the function performedby the component
and
the objects manipulatedby the function. After analyzing
severalapplicationdomains,theyidentified from10 to 25
basic functions(primitiveactions) for software.There
onebasic functionfor eachclassification of conceptually
similar verbs, that is, verbsthat describe semantically
similar software functions. Also required is a
classification of the objects manipulatedby software
components,into classes or "nominals"that represent
conceptuallysimilar objects. Theinterface to the system
is forms-based.
hierarchyensuresthat components
are properlyorganized
andcategorized.Thetaxonomy
can also be useful in query
formulation and reformulation. Whenquerying the
database,if there are no answersor if there are too many
answers, the hierarchy can be used to specialize,
generalize, or look for alternatives for an appropiate
portionof the query;modifythis portionandqueryagain.
LASSIE
incorporates a natural-languageinterface which
uses a list of compatibilitytuples to parse the input.
Compatibility tuples, which indicate plausible
associations amongobjects, are obtained from the
frame-like knowledgerepresentation mechanism
used by
LASSIE.
Embley and Woodfield (Embley & Woodfield
1987)definea knowledge
structure for a softwarelibrary
consistingof abstract data types (ADTs).This knowledge
structure supportsdifferent relations amongADTs.The
ADTs
in the library includenatural languagedescriptions
andkeywords
to assist in findingandbrowsing
activities.
Relationships in the ADTlibrary can be automatic
(relationships implied by ADTinformationcontainedin
the library) or user-defined(explicitly providedby the
user). Defined relationships for automatically
determiningcloseness, generalizations,specializations,
generics, and aggregationsimplicitly imposea useful
structureonthe library. In addition,library administrators
may also define their own generalizations,
specializations,andclassificationsto furtheraid the user.
The"find" operation, implementedin the prototype,
allows several meansby whichan ADTcan be located:
by name,by general descriptionsin natural-language,by
keywordsor by operation descriptions. Closeness
betweennatural-languagedescriptions is determinedby
evaluating the numberand frequencyof content words
foundin both the givenandcandidatedescriptions.
All these knowledge-based
systems have several
things in common.
Theyall represent the knowledgein
frame-likestructureswithsimilarsets of slot-fillers. They
all organize the knowledgearound the functions
performedby the components,although someof them
also includeframerepresentationsof the objectsinvolved.
In every single case, the characterization of the
components
with slot-fillers is donemanually,following
a pre-establishedmodelof the domain.
TheLASSIE
system(Devanbuet al. 1991) embodies
a frame-based knowledgerepresentation. Software
components
are describedin terms of the operationsthat
theyperform.Eachof these actionsis describedby giving
its actor, object, recipient, agent,environment,
andso on.
Theserelationshipsare codedin a specializedknowledge
representation system whichclassifies them into a
conceptualhierarchy.Thefour principalobjects typesare
OBJECT,ACTION,DOER,and STATE.Nodes below
DOERand OBJECTrepresent the architectural
component of the system. Nodes below ACTION
represent the system’s functional component. The
relationship between the two system componentsis
captured by various slot-filler relationships between
ACTIONs, OBJECTsand DOERs. This taxonomic
37
Another approach is undertaken in CODEFINDER
(Henninger1991). This systemfaces a slightly different
goal. It exploresthe problemof findingprogramexamples
whichare relevant to a design task. CODEFINDER
uses
an associative spreadingactivation methodfor locating
softwareobjects.
Curtis(Curtis 1989)has analyzeddifferent software
indexing methodsused by knowledge-based
systems. He
postulatesthat the effectiveuse of a reusablelibrary will
require an indexing schemesimilar to the knowledge
structures built into mostprogrammers
workingin an
applicationarea. Thechallengeof suchindexingschemes
is that the organizationof a programmer’s
knowledge
will
changewith increasedexperience.
3. Previous
work
Wehave developedthree prototypes: ARGOS,
S ARCand
PROSA.ARGOS
is an IR system which helps UNIX
users find out the commands
they need. TheSARC
system
implementsa component
classification mechanism
based
on facets. The PROSA
system is a CBRsystem for
ProgramSynthesis whichindexes software components
in a DynamicKnowledgeBase. These prototypes are
describedbelow.
3.1.
ARGOS: An IR assistant
for
Total values of frequencyof appearanceof the
wordsand LAsin the wholecorpora are computed.
UNIX
ARGOS
(Buenagaet al. 1992) is an on line intelligent
assistant for UNIX
basedon IR techniques,user modeling
and hypertext. The goal of this system is to obtain
information about the UNIXcommand(or commands)
that implements
an operationdescribedby the user using
natural-language.TheUNIX
manualtext files constitute
the document database. ARGOS
uses the method
proposedin (Maarek1991) to characterizeeachdocument
by detectingthe lexical affinities (LAs).
The user will normally use ARGOS
to solve a
particularproblem,but he or she will needto makeseveral
queriesto find the solution. Todisplaythe information
on
screen, the systemtakes into accountnot only the last
query,but the wholeset of previousqueriesrelative to the
user’s problem.Thereis a buttonfor the user to click on
whenthe documentation
displayedis irrelevant to his or
her problem.Thesystemwill use this informationabout
irrelevance whendoingthe subsequentsearches. This is
accomplishedby the ARGOS’
User ModelModule.The
user maypartially reset the user modelto indicatethat he
or she wantsto start a newsearchon a differentsubject.
ARGOS
uses hypertext techniques as navigating
tool through the information. Whenthe user wants to
knowmoreabout a topic, ARGOS
searches the related
information
in the document
database,anddisplaysit. The
hypertext links are not fixed. They are generated
dynamically,depending
on the interest of the user.
Theuser modeltries to guess whatare the user’s
interests in order to modifythe subsequent
searchesor to
makecorrectionsto the outputof the IR module.Theuser
modelis incremental. The information is implicitly
obtainedby the interactionwiththe user.
Theanalysis of the documentsin the database is
performed
onlyonce.Theresults of the analysisare stored
in auxiliary files. In the analysis of eachdocument,
the
followinggoals are accomplished:
¯ Onlyopenclass words(nouns,verbs, adjectivesand
adverbs) are considered. Closed class words
(pronouns, prepositions, conjunctions and
interjections)are ignored.
¯ Wordsare taken into their canonical form,
eliminating morphemes
and suffixes.
¯ For each document,the set of wordsand their
frequencyof appearanceis computed.The set of
lexical affinities andtheir frequencyof appearance
is also computed.
38
Valuesfor the resolvingpowerof eachwordand LA
are calculated for eachdocument.
3.2.
SARC: A faceted system for class
retrieval
Wehave also developed the SARC
system (Cervig6n
Hernfindez
1992),a prototypethat helpsthe reusersin the
retrieval of software components. The software
components
are the classes of an object-orientedlibrary
andthe systemlets the user select oneof twolibraries:
S malltalkor ObjectProfessional.Thesystemincorporates
a friendly user interface with multiple windows,mouse
supportandcontext-sensitivehelp.
SARC
integrates high-level functional knowledge
and low-level knowledge.The high-level functional
knowledgeis built into the classification scheme,a
simplified version of the faceted classification of
(Prieto-Dfaz 1991) with somemodificationsdue to the
differences betweenobject classes and general software
components.
Thecomponents
are characterizedby facets.
For eachfacet, a component
has an associatedset of terms
(or keywords).In order to find candidatecomponents,
the
user selects onetermfor eachfacet andthe systemshows
a list with all the components
that matchthe required
terms,if any.
Thesystemalso uses low-level knowledge
acquired
by a deepanalysis of the sourcecodeof the components.
Thefacetedclassificationis constructed
by the interaction
with expert programmerswhocharacterize the new
components
they incorporate to the library. This means
that the characterization
is moreor less subjective.Forthis
reason,the systemusesan affinity metricbasedin the code
shared betweenany pair of components(Chidamber
Kemerer1991). These affinity values let the system
associateto eachcandidatea set of "near"components
to
be considered
by the user. Theuser establishesthe affinity
thresholdthe systemuses in this search. SARC
uses other
metrics in order to computethe reuse complexityof the
selected components,sorting them by the complexity
values.
Oncethe systemdisplays the selected components,
the user can inspect the code and access a hypertext
browserthat allowshimor her to get moreinformation
about the selected andother related components.
Finally, whena newcomponentis added to the
library, the systemasksthe programmer
for the termsthat
characterizethe facets of the newcomponent.
In order to
help the user in the characterizationof the facet that
reflects the functionality of the component,
the system
extracts keywords
fromthe comments
of the sourcecode.
Theuser mayselect anyof those keywords,or any other
word,as terms.
integers, links, etc. andnon-primitive
objectslike linear
lists, trees, graphs,etc. Thesystemknows
implicitlyabout
the primitive objects. Non-primitive objects are
introducedinto the systemgivingtheir namesanda set of
attributes. Newobjectsare definedin termsof the objects
(primitive or non-primitive)knownby the system.
Operationsrefer to the actions that manipulatethe
objects. Wecan distinguish betweenprimitiveoperations
(non-definedoperations) and non-primitiveoperations
(stated in terms of primitive or previously defined
operations).
3.3. PROSA:A CBRsystem for Program
Synthesis
The study of the behavior of humanprogrammerscan
offer useful guidancefor supporting software reuse,
because human programmers perform many
programming
tasks by "reusing" previously acquired
mental schemas,rather than by reasoning from first
principles (Rich andWaters1988)(Steier 1991).
Programming
is usually taught by example.Froma
set of training programs,students can abstract some
programschemas, or even somegeneral programming
techniques. Therefore, weconsider that CBR(Riesbeck
&Schank1989) is a natural approachto the software
design problem.Followingthis approachwedevelopeda
prototype, the PROSA
System(Mdndizet al. 1992),
whichis describedbelow..
The core of the system is a DynamicKnowledge
Base (DKB)(Fermindez-Valmayor and Fermindez
Chamizo1992), implemented
as a discrimination net of
frame-like
structures
named MOPs(Memory
OrganizationPackages)(Schank1980). This system
initially developedto modelthe learningprocessesin a
different application domain(Fem/mdez-Valmayor
and
Fern~dez Chamizo1991).
Initially, weintroducedinto the DKB
two kinds of
information: a basic Concept Dictionary and a Case
Library. The Concept Dictionary is a hierarchical
structurethat containsthe basic terminology
to be usedin
the applicationdomain.At anytime, the user can expand
the dictionary(and the CaseLibrary)accordingto his
her needs.Eachconceptdefinitionis introducedas a name
and a property list (set of attribute-value pairs
characterizing the concept). Weaccess the conceptsby
givingtheir names
or bygivingtheir attribute lists. Access
by nameis providedby a concepttable whichlinks each
namewith the MOP
correspondingto the concept in the
DKB.
There are two kinds of entities in the Concept
Dictionary:objects andoperations.
The system makesabstractions from the common
attributes of various objects or operations. Whenan
abstract conceptis obtainedin this manner,the system
asks the user for a name.This name,if providedby the
user, will identifythe abstractconcept.
Eachease stored in the CaseLibraryconsists of a
problemand its solution. A solution is described at
different levels of abstraction (stepwise refinement
paradigm).At each level, the solution is expressedin
pseudocodeas the compositionof several subproblem
solutions (software components).Therefore, we can
represent recursively the refinement tree which
correspondsto the problemsolution. Thefinal refinement
levelin this tree canbe directlytranslatedto an imperative
programminglanguage. Problem descriptions and
solutionsare also introducedinto the systemas property
lists and they are stored in the DKB
as instance MOPs.
Problem(and subproblem)solutions are accessible
giving the problemattributes or someof the concepts
mentionedin the problem(or subproblem)
description.
Casesare indexedin the DKB
by the operationsthey
implement. The subproblems obtained in the
decomposition
of the initial problemare also indexedby
the operationsthey implement.Theseindexesallow the
reuse of subproblem
solutions in other problems.
Whenintroducinga newtraining case, the system
attemptsto find common
attributes betweenthe newcase
andexisting cases resident in the DKB,similar to the
abstraction process we described in the Concept
Dictionary. Fromthese common
attributes the system
generates abstraction MOPswhichrepresent generic
problems. Common
attributes are selected from the
problem descriptions (without refinement trees).
Therefore, they correspond to abstract problem
descriptionsinsteadof abstractsolutions.
Whena new problem is presented, the sytem
searches the DKB
for analogousproblems,by following
a top-downstrategy. The major goal of PROSA
is the
Objectsrefer to entities that are manipulated
by a
program. Weuse primitive objects like characters,
39
adaptation of the retrieved solutions to solve the new
problems. The problem solving mechanismsused in this
adaptationprocessare out of the scopeof this paper, where
we are concernedwith the software indexing methods.
4. A Hybrid Approach to
Software Components Retrieval
The goal of our current work is to develop a tool for
indexing and retrieving software components from a
library of object classes. In fact, this tool will be part of a
wider environment that will embodyseveral tools for
helping and teaching the usage of the class library. We
intend to take advantage of the results of the three
previously developedsystems, but taking into account the
different requirements of the newsystem:
¯ Object-oriented design, unlike classical functional
design, bases the modular decomposition of a
software systemon the classes of objects the system
manipulates, not on the functions the system
performs (Meyer 1987). Therefore, in a class
library, the software componentsare classes of
objects which include the operational blocks
(methods)that are applicable to these objects.
In a class library, the documentationis organized
aroundobject classes instead of aroundfunctional
blocks. ARGOSand PROSAassumed that each
software description correspondedto a functional
block.
Due to the inheritance mechanism, many class
descriptions are incomplete because the inherited
methodsare decribed in the ancestor classes.
The Case Library of software componentswe used
in the PROSA
system was specifically designed for
it. Therefore, the functional description of each
software componentwas tailored to fit the property
list required by the prototype. Now,the goal is to
apply the same indexing methodsto a commercially
available class library.
Load
Declaration
constructorLoad(varS : idStream);
Purpose
Loada string dictionary froma stream.
Ducription
S is a properlyinitialized streamobject (a streamfile openedfor reading,for example).Notethat S canbe any
descendant of the IdStreamtype,which includes DosIdStream,BufldStream,and MemIdStream.
Loadreads the next sequenceof bytes fromthe streamS. Thesebytes musthavebeenwritten by a previouscall to
StringDict.Store.Loadallocates heapspacefor the hash table, then reads a series of strings and data valuesfromthe
stream and adds themto the dictionary.
If Loadfails, the constructorwill returnFalse andan error codewill be stored in the globalvariableInitStatus.
Thestreamregistration procedurefor a StringDictis StringDictStream.
Declaration
procedureStore(varS : idStream);
Purpox
Storea string dictionaryin a stream.
Deecription
S is a properlyinitialized streamobject (a streamfile openedfor writing, usually). Notethat S can be anydescendant
the IdStreamtype, whichincludes DosldStream,BufldStream,and MemIdStream
(although youtypically wouldn’t
write to a Meml-dStream).
Store writes the current size of the hashtable followedby a packedimageof all the strings anddata values for the
dictionary. Thisimagemaybe read by a later call to Load.
Tocheckfor errors that mayhaveocurredduringthe Stere, call the stream’sGetStatusfunctionas shownin the
examplebelow.
Thestreamregistration procedurefor a StringDictis StringDictStream.
Figure 1
40
As starting point, wehave selected two widely
available class libraries, Object Professional (Turbo
PowerSoftware1990)and Smalltalk(Digitalk 1986),
experimentwith. In this way,wecan benefit fromthe
experienceacquiredusing the faceted SARC
system,that
workswith the sameclass libraries. Weare particularly
interestedin the set of termsselectedto characterizeeach
class, and in the low-levelknowledge
obtainedfromthe
deepanalysis of the sourcecode(inheritance andclient
relationshipsbetween
classes).
("string dictionary")is describedin twelvemanualpages,
includingthe descriptionof its thirteen methods.Some
fragmentsof two of its methoddescriptionsare shownin
Fig. 1.
Wehaveanalyzedthe purposedescription of every
methodin this library, andalso somemethodsof other
different libraries, and wecan concludethat the syntax
and the vocabularyused in these descriptions is very
simple.Byperforminga domainanalysis, wecan identify
a set of basic functions and objects that allowsus to
expressthe functionalities of the method.Atthe moment,
weare characterizing each methodby a MOP
structure
that will be stored in the DynamicKnowledgeBase
(DKB)that we used in the PROSA
system. Following
with the previous example, we will have two MOPs
characterizingthe twomethodsof Fig.1.
Theclass libraries wehaveselected are relatively
small. Theycontain aroundone hundredclasses andone
thousandmethods.Theassociate documentation
contains
approximately
twothousandpagesof text, mostof which
are natural languagedescriptions with someexamplesof
use at the sourcecode-level.
Theuser will describehis or her needsby givinga
free-text generaldescriptionof the class or of the methods
he or she requires. Theretrievingprocessconstitutes an
iterative process of queryingthe systemto retrieve
adequatecomponents.This process can be implemented
as a formulate-retrieve-reformulate
cycle (Devanbu
et al.
1991).If the answerto an initial queryis unsatisfactory,
the user can reformulate the query by using partial
descriptionsof the retrieved classes, or browseamongst
functionally related classes. Anotherapproachwouldbe
basedona interactiveprocesswhere,after the first query,
the systeminterrogates the user to disambiguatethe
meaningof the query. Weconsider that the browsing
approachis the mostadequateapproachfor our initial
prototype.
Thekey points of the systemwill be the indexing
andrelrieving methods.
I-M -LOAD.#12 isA M-LOAD
selector :
implementor :
Load
l-M-StringDict
code: I.M-LOAD-IMPLEMENTATION.#15
origin :
destination :
I-M-Stream
I-M-StringDict
I-M-STORE.#5 iaA M-STORE
selector :
implementor :
code :
origin :
Store
I-M-StringDict
I-M~TORE-IMPLEMENTATION.#8
l.M~StringDict
d~tination
: I.M-Stream
EverymethodMOP
includes three slots specifying
the ownerclass of that method,implementor,the actual
implementation
of the method,code, andthe identifier of
the method,selector. The MOP
also includes the slots
correspondingto the specific action being described
(origin anddestinationin this case).
Indexing methods
Wewill assignan indexnot onlyto the classes but also to
every methodof each class. Eachmethodwill be linked
to the ownerclass andeachclass will be linked to every
oneof its methods.Thiswill allowthe accessbyclass and
the access by method.
These two MOPS
will be linked in the DKBunder
the corresponding action type, LOAD
or STORE.
Both
actions are placed under the basic action ASSIGN,
becausetheyare instancesof it.
Weintend to use a hybrid approach (knowledge
basedandautomaticfree-text indexing)to characterize
the software components.Wedescribe this approachin
the context of the ObjectProfessionallibrary. Aclass
descriptionmayrangefrom3 to 55 text pages.Amethod,
however,
is usuallydescribedin less thanonepageandits
purposeis describedwith only oneline. This figures are
not mantained
in other libraries, but the proportionis more
or less conserved.For example,the object StringDict
This processwill producea classification of the
methods
in the library. Amethod
will usuallybe classified
under only one action, but whena methodimplements
severalactionsit will be placedunderall of them.
MOPscorresponding to functionally related
methodswill be close in the DKB.
For example,there are
manyother "store methods"in the library. Oneof them
41
corresponds
to the class SingleList and its associate MOP
is:
I-M-STORE.#7isA M-STORE
selector : Store
implementor: I-M-SingleList
code : I-M-STORE-IMPLEMENTATION.#12
origin : I-M-SingleList
affinity basedindices of every class in the library. These
indices will be usedin the first step of the analysis of the
user’s query. Furthermore,we will create for each class a
MOPcontaining information about its structural
description, its relationships with other classes
(inheritance, clien0 and the links to all its methods.The
structural description definesa set of values for the class,
i.e. a list of other classes neededto construct the current
class. For example,
destination : I-M.Stream
This MOPwill be placed under the same M-STORE
MOPthat the MOPrepresenting the previous "store
method". These methods, however,were not close in the
class hierarchy becausethey belongto different classes.
StringDict {Dictionary of strings}
object(Root)
sdSize : Word; {Maximum elements in hash pool}
sdUsed : Word; {Current elements in hash pool}
sdPool : HashPoolPtr;
sdStatus : Word;
{Pointer to hash pool}
{Status variable}
The slructural description and the relationships with
other classes will be obtained from the source code using
the low-level
tool of the SARC system. MOPs
correspondingto classes are stored in the DKB
under the
MOPOBJECTat the root level of the memory. A
fragment of the resulting memoryorganization is shown
in Fig. 2.
Anabstraction MOPwill be generated from the two
MOPsbecause they have the same filler for the
destinationslot:
M-STORE-TO-STREAM
isA M.STORE
destination : I.M-Stream
Aclass description, as previouslystated, can extend
to several text pages. Therefore, it would be a very
complextask to express all this information as MOPs.We
think this is the right place to use statistical indexing
methods. So, we will use ARGOS
to obtain the lexical
Retrieving method
The analysis of the user’s query will be madein two
steps:
Figure 2
42
Automaticindexing methodshave proved to be
succesfulwhenapplied to softwarecomponents.
Wedon’t
haveanyresults that predict better results in component
retrieval by addingknowledgebasedmethods.
1. Thequery is indexedby the samestatistical
methodusedto indexthe classes. Theprofile of lexical
affinities detectedin the queryallowsthe accessto the
classes with similar profiles. MOPs
correspondingto
these classes andall their methods
will be activated.
2. The query is analyzed again by an
expectation-based parser (Dyer 1983) (Martin 1989)
which uses the MOPsactivated in the step 1 as
expectations.Whenthe parser recognizesthe methodsor
the structural descriptionof a class, this class will be
displayedas an answerto the query.Acompletematching
between
the queryandthe selectedclass is not necessary,
becausethe class requiredby the user maynot exist in the
library. In this case,the user wantsto obtaina functionally
closeclass to adaptit. If the parserhas recognized
several
classes, the class with morerecognizedmethodswill be
displayed. If several classes have the samenumberof
recognizedmethods,the rate betweenthe functionality
requiredby the user andthe global functionality of the
class will be used to select the candidate class. For
example,
if the querywas"a dictionaryto be usedto detect
whethera particularwordis in the dictionaryor not", two
classes wouldbe selected, SlringDictandStringSet.Both
implementthe two methodsrequired, but the first one
implementsanother eleven methodsand the secondone
doesn’thaveanyothers. Sothe S tringSetwill be displayed
becauseit fits better the user requirements.
Thisparser has to be completed
witha browser.This
browserwill allow to explorethe classes functionally
related to the class selected by the system.Thebrowser
will also allowto exploresimilar methodsbelongingto
differentclasses.
Thecurrentstate of the prototypeonly supportsthe
indexingmethods.Weare undertakingthe designof the
parser and specifyingthe browserfeatures. Weare also
recodifying someparts of the prototype becausethe
previous systems were implemented in different
languagesand machines.
Anywaywe believe that CBRapproaches can be
usefulwhenteaching(or aiding)the user to reusesoftware
components.If someof the componentsin the library
comefrompast designs, the experienceacquiredin the
designingof sucha systemcan be of use to novices.Even
experienceddesigners, can save time and effort using
reusable design templatesand exploringrationale from
previousdesigns,insteadof findingtheir waythroughtrial
anderror prototyping.
Asweundertook
not onlythe retrievingtask but also
the educationaltask, weneedto represent the user’s
mentalmodelof the domain,and also be able to reason
about the functionality of the components.
Therefore,in
order to develop an environment which helps
programmerslearn about the library’s structure and
contents, wehavechosenCBRas the primaryapproach.
Weexpect that this approach will produce, as a
side-effect, a better performancein the retrieving of
softwarecomponents.
Evenin this case, the information
provided by IR tools wouldbe useful since it can be
derivedautomatically.
6. References
Arango,G., Baxter, I. & Freeman,P., 1988.AFramework
for IncrementalProgressin the Applicationof Artificial
Intelligence to Software Engineering, ACM
Software
Eng.Notes,vol. 13, no. 1,46-50.
Biggerstaff, T.J. & Richter, C., 1987. Reusability
Framework,
Assessment,and Directions, IEEESoftware,
vol.4, 2.
Theprototype is being implemented
in BIMProlog
underSunOS4.1 in a SUNSparc workstation.
5. Conclusions
Wehavepresenteda hybridapproachto the indexingand
retrieving problem for software components.This
approachconsists of using CBRtechniquesto represent
knowledge
about classes and methods,but simultaneously
usingstatistical indexingmethods
to facilitate the access
to classes.
43
Buenaga,M., Fernandez,B., Vaquero,A. & Urech, A.,
1992.An OnLine Intelligent Assistant for Unix, Tech.
Report DIA92/16. Departmentof ComputerScience,
Complutense
University of Madrid.
Cervig6n,C., Hernfindez,L., 1992. Manualde uso del
Sistemade Ayudaparala Recuperacitnde Clases. Tech.
Report DIA92/11. Departmentof ComputerScience,
Complutense
University of Madrid.
Chidamber,S.R. & Kemerer, C.F., 1991. "Towardsa
Metrics Suite for Object-OrientedDesign". OOPSLA-91.
Curtis, B., 1989."Cognitiveissues in reusing software
artifacts", SoftwareReusability,volumeH, Applications
and Experience, (Biggerstaff, T.J. &Perlis, A.J., eds.),
ACMPress, Addison-Wesley Publishing Company.
Martin, C., 1989. "Case-based parsing". Inside
Cased-Based Reasoning. Riesbeck, C.K. & Schank,
R.C.(eds.) 1989. Hillsdale, N.J. Lawrence Erlbaum
Associates.
Devanbu,P., Ballard, B.W., Brachman,RJ. &Selfridge,
P.G., 1991. "LASSIE: A Knowledge-Based Software
Information System", Automating Software Design
(Lowry, M.R. & McCarlney, R.D., eds.), AAAIPress/
The MITPress.
Digitalk, 1986. SmalltalklV. Tutorial and Programming
Handbook.
M6ndiz,
I.,
Fern~Indez-Chamizo,
C. &
Fermindez-Valmayor, A., 1992. "Program Synthesis
Using Case Based Reasoning", 12 IFIP Congress, Poster
Session: Software Development and Maintenance,
Madrid, Sept. 1992.
Meyer, B., 1987. Reusability:
The Case for
Object-OrientedDesign. IEEESoftware, vol.4, 2.
Dyer, M., 1983. In Depth Understanding. MITPress.
Embley, D.W. & Woodfield, S.N., 1987. "A Knowledge
Structure for ReusingAbstract Data Types", Proceedings
of the Ninth International Conference on Software
Engineering, ACM,360-368.
Fern~dez-Valmayor,A.&Fermtndez Chamizo, C., 1991.
"An Educational Application of MemoryOrganization
Models", AdvancedResearch on Computersin Education
(R. Lewis&S. Otsuki, eds.). Elsevier Science Publishers
B.V., IFIP.
Prieto-D[az, R., Freeman,P., 1987. Classifying Software
for Reusability. IEEESoftware, January.
Prieto-Dfaz,
R., 1991. Implementing Faceted
Classification for Software Reuse. Communication
s of the
~
ACM,vol. 34, n 5.
Rich, C. &Waters, R.C., 1988. Automatic Programming
:
Myths and Prospects, Computer, August 1988, 40-51.
Fermtndez-Valmayor,A.& FermtndezChamizo, C., 1992.
Educational and Research Utilization of a Dynamic
KnowledgeBase, Computers &Education, 18, No. 1-3,
51-61.
Frakes, W. B. & Nejmeh, B. A., 1987. "Software Reuse
through Information Retrieval". Proceedingsof the 20th
Annual HICSS, Kona, HI.
Riesbeck, C.K. & Schank, R.C.,
Cased-Based Reasoning. Hillsdale,
ErlbaumAssociates.
1989. Inside
N.J. Lawrence
Salton, G., 1986. Another look at Automatic TextRetrieval Systems, Communicationsof the ACM,vol.29,
.
Helm,R., Maarek, Y.S., 1991. "Integrating Information
Retrieval and DomainSpecific Approachesfor Browsing
and Retrieval in Object-Oriented Class Libraries".
OOPSLA-91.
Schank, R.C., 1972. Conceptual Dependency: A Theory
of Natural Language Understanding,
Cognitive
Psychology, 552-631.
Schank, R.C., 1980. Language and Memory,Cognitive
Science, 4, 243-284.
Steier, D., 1991. "AutomatingAlgorithmDesign within a
General Architecture for Intelligence", Automating
Software Design (Lowry, M.R. &McCartney,R.D., eds.),
AAAIPress/The MIT Press.
Henninger,S., 1991. "Retrieving Software Objects in an
Example-Based
Programming Environment".
Proceedings of the Fourteenth Annual International
ACM/SIGIRConference on Research and Development
in InformationRetrieval.
Krueger, C.W., 1992. Software Reuse. ACMComputing
Surveys,vol. 24, 2.
Maarek, Y.S. et al., 1991. An Information Retrieval
Approach for Automatically Constructing Software
Libraries. IEEETransactions on Software Engineering,
vol. 17, 8.
44
Turbo PowerSoftware, 1990. Object Professional User’s
Manual.
Wood, M. & SommerviUe, I., 1988. An Information
Retrieval System for Software Components, ACMSIGIR,
vol. 22, 3,4.