Feature mining in software product lines Trial lecture – April 15 2010

Jon Oldevik
UiO / SINTEF
jonold at ifi.uio.no
Outline
The context: Software product lines
An overview of feature mining
Related concepts
Techniques for feature mining
Tools for feature mining
Summary
What is a Software Product Line?

A software product line is a set of software-intensive systems (products) that:
• share a common, managed feature set satisfying a particular market segment’s specific needs or mission
• are developed from a common set of core assets in a prescribed way

It involves strategic (planned) reuse

It has both a business and a technical perspective
L. Northrop, SEI’s Software Product Line Tenets, IEEE SW
Software Product Lines and Features

Features represent functional, quality, or design characteristics of the products

A software product line is commonly described by features
• Common features
• Variable features
• Examples of features: “audio playback”, “video recording”

Products are derived in a product resolution process
• By composing common features and selected variable features
• Configuration process
Example of a feature model for the iPod family
[Figure: the product line feature model. Root feature: IPod. Features shown: FM Radio, Live Pause, File system, Audio playback, Shake to shuffle, Voiceover, iTunes Synch, Video playback, Pedometer, Web Browsing, Video recording. File system has a <1..1> choice between Flexible (FAT32 v HFS+) and Locked (FAT32).]
[Figure: product configuration for the IPod Shuffle. Selected features: File system with Locked (FAT32), Audio playback, Voiceover, iTunes Synch.]
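A feature model and a product configuration like the iPod example can be sketched in a few lines of Python. This is an illustration only: the class name `FeatureModel`, the validation rules, and above all the choice of which features count as mandatory are assumptions, since the slide only names the features and the <1..1> choice under File system.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureModel:
    """A deliberately small feature model: mandatory/optional features and xor groups."""
    root: str
    mandatory: set = field(default_factory=set)
    optional: set = field(default_factory=set)
    # parent feature -> alternatives of which exactly one must be chosen (<1..1>)
    xor_groups: dict = field(default_factory=dict)

    def known_features(self):
        features = {self.root} | self.mandatory | self.optional
        for parent, alternatives in self.xor_groups.items():
            features |= {parent} | set(alternatives)
        return features

    def validate(self, selection):
        """Return a list of violations; an empty list means a valid configuration."""
        selection = set(selection)
        problems = []
        for unknown in sorted(selection - self.known_features()):
            problems.append(f"unknown feature: {unknown}")
        for required in sorted({self.root} | self.mandatory):
            if required not in selection:
                problems.append(f"missing mandatory feature: {required}")
        for parent, alternatives in self.xor_groups.items():
            chosen = selection & set(alternatives)
            if parent in selection and len(chosen) != 1:
                problems.append(f"{parent} needs exactly one of {sorted(alternatives)}")
            if parent not in selection and chosen:
                problems.append(f"{sorted(chosen)} selected without {parent}")
        return problems


# Assumption: Audio playback and File system are treated as common (mandatory) features.
ipod = FeatureModel(
    root="IPod",
    mandatory={"Audio playback", "File system"},
    optional={"FM Radio", "Live Pause", "Shake to shuffle", "Voiceover", "iTunes Synch",
              "Video playback", "Pedometer", "Web Browsing", "Video recording"},
    xor_groups={"File system": {"Flexible", "Locked"}},
)

# The IPod Shuffle configuration from the slide.
shuffle = {"IPod", "File system", "Locked", "Audio playback", "Voiceover", "iTunes Synch"}
print(ipod.validate(shuffle))                 # no violations
print(ipod.validate(shuffle | {"Flexible"}))  # violates the <1..1> choice
```

Returning a list of violations rather than a boolean keeps the sketch usable as a configuration checker that explains why a product is invalid.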
Feature mining – What is it?
Features represent high-level characteristics of systems.
Feature mining is the process of extracting information
• About features
• Feature relationships
• Commonalities and variabilities
• Relationships to implementation
From existing SPLs or systems
With the purpose of understanding, maintaining, evolving, reusing
Existing systems implement features. It is not always easy to see which parts of a system implement which features.
[Figure: features mapped onto the system components of an existing system / legacy system]
Several products may have common – or similar – implementation parts.
[Figure: Product 1, Product 2 and Product 3, with common features and specific features]
Why care about feature mining?
Feature-oriented re-engineering of legacy systems to product line assets (an example)
[Figure, labels: Legacy applications (home service robots), Reverse engineering, Legacy Architecture, Feature model, New market needs, Refined Feature model, Feature-oriented product line engineering, Reengineering, Product Line Architecture, Product Line Components]
Feature-oriented Re-engineering of Legacy Systems into Product Line Assets – a Case Study, Kang et al., SPLC 05
Feature mining overlaps with several other concepts
Terms for the same concept
• Feature location
• Feature identification
• Feature extraction
• Feature refactoring
Other, overlapping concepts
• Asset mining
• Concern mining
• Aspect mining
• Program comprehension
• Data mining
• Information Retrieval
Program comprehension
The domain of computing science dealing with the processes used by software engineers and code analysis to understand programs (Wikipedia)
The concept assignment problem
Program comprehension is supported in many different ways by tools
Rigi – a toolset for architecture re-construction and analysis
Program understanding and the concept assignment problem, Biggerstaff et al., ACM COMM, 1994
Data mining and Information Retrieval
Data mining

Data mining is the process of extracting patterns from data.
• Gaining information and knowledge
• Typically done on prepared data sets, which are mined and validated.
• Applications within profiling practices: marketing, fraud detection, surveillance, scientific discovery (med, biomed, education)

In software engineering, data mining techniques are applied
• On version histories, e.g. to find change patterns
• On source code to facilitate program understanding
• It could also be used to extract information on features in existing systems
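As an illustration of mining version histories for change patterns, the sketch below counts which files tend to change together. The commit data, the file names and the `min_support` threshold are made up for the example; a real study would read them from a version control log.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count how often each pair of files is changed in the same commit."""
    pairs = Counter()
    for files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs


def change_patterns(commits, min_support=2):
    """Return file pairs that changed together at least `min_support` times."""
    return {pair: n for pair, n in co_change_counts(commits).items() if n >= min_support}


# Hypothetical version history: each entry lists the files touched by one commit.
history = [
    ["player.c", "visualiser.c"],
    ["player.c", "visualiser.c", "radio.c"],
    ["filesys.c"],
    ["radio.c", "player.c"],
]
print(change_patterns(history))
```

Files that repeatedly change together are candidates for belonging to the same feature, which is one way such patterns could feed into feature mining.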
Information Retrieval
The science of searching for documents, information within documents, and metadata about documents
Document content & structure analysis
Classification, grouping and segmentation
Visualization
Indexing, search, and relevance ranking
Personalized interaction & collaboration
Information retrieval: documentation to code traceability (an example)
[Figure: documentation (Doc) and Code are the inputs to Traceability Recovery, which produces trace links]
Pre-processing
• Structured representation of terms from documentation and code (corpus): <d1, d2, d3, d4, d5>, <c1, c2, c3, c4, c5, c6>
The structured data is analysed
• Latent Semantic Indexing
Traceability links are established
• Provides quality measures for the correctness of traceability
Similar techniques have been used for mining features
Recovering documentation-to-source-code traceability links using latent semantic indexing, Marcus and Maletic, ICSE 2003
SNIAFL: Towards a Static Noninteractive Approach to Feature Location, Zhao et al., TOSEM 2006
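The pre-processing, analysis and linking steps can be sketched as follows. Note the simplification: instead of Latent Semantic Indexing, which additionally reduces the term space with a singular value decomposition, this sketch compares plain TF-IDF vectors by cosine similarity. The artefact names, texts and the threshold are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split prose and identifiers into lower-case terms (camelCase and snake_case are split)."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(corpus):
    """Map each artefact name to a TF-IDF weighted term vector."""
    tokens = {name: Counter(tokenize(text)) for name, text in corpus.items()}
    doc_freq = Counter(term for counts in tokens.values() for term in counts)
    n = len(corpus)
    return {
        name: {t: tf * math.log((1 + n) / (1 + doc_freq[t])) for t, tf in counts.items()}
        for name, counts in tokens.items()
    }


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def trace_links(docs, code, threshold=0.1):
    """Return (document, code unit, similarity) for every pair above the threshold."""
    vectors = tfidf_vectors({**docs, **code})
    links = []
    for d in docs:
        for c in code:
            score = cosine(vectors[d], vectors[c])
            if score >= threshold:
                links.append((d, c, round(score, 2)))
    return sorted(links, key=lambda link: -link[2])


# Hypothetical corpus: two documentation fragments and two code units.
docs = {"d1": "The player opens an audio file and starts playback",
        "d2": "The file system is locked to FAT32"}
code = {"c1": "def startPlayback(audio_file): open(audio_file)",
        "c2": "def mount_fat32(file_system): lock(file_system)"}
print(trace_links(docs, code))
```

The similarity score doubles as a rough quality measure for each recovered link; the threshold trades precision against recall.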
Asset mining
Mining existing assets in software product line context

Locating useful information from an asset base for reuse

Candidates for mining
• program code, designs, system architectures, specifications, algorithms …

Mining is done to rehabilitate parts of an old system for reuse
• Purpose: Architecture reconstruction of legacy systems – including features (commonalities and variabilities)

Methods exist that address this process
• E.g. the Options Analysis for Reengineering (OAR) method
[Figure: OAR process]
Clements, Northrop, Software Product Lines, Practices and Patterns, Addison Wesley
Options Analysis for Reengineering (OAR): A Method for Mining Legacy Assets, Bergey et al., 2001, SEI
Aspect and concern mining

Discovering the crosscutting concerns that potentially could be turned into aspects.

Example of potentially crosscutting concerns:
• Transaction code, error handling, ..

This can be used for migrating toward an aspect-oriented paradigm

Or gaining knowledge on how concerns (= features) are scattered in the code base
An example tool: the Aspect Browser
Techniques for feature mining
Feature mining requires analysis of the system assets
Data Extraction
• Parsing source code, executing system...
Information Representation
• Abstract syntax tree
• Symbol tables
• Dependency graphs
Knowledge Exploration
• Interpret data: infer features ...
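The first two stages can be illustrated with Python's standard `ast` module: the source is parsed into an abstract syntax tree (data extraction), from which a simple call dependency graph is built (information representation). The sample program and the function name `call_graph` are invented for the example, and only direct calls to plain names are tracked.

```python
import ast


def call_graph(source):
    """Map each function defined in `source` to the set of names it calls directly."""
    tree = ast.parse(source)
    graph = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for inner in ast.walk(node):
                # Only calls of the form name(...) are recorded; method calls are ignored.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
            graph[node.name] = callees
    return graph


# Hypothetical system under analysis.
SAMPLE = '''
def open_file(path):
    return path

def read(handle):
    return handle

def playback(data):
    return data

def play(path):
    handle = open_file(path)
    return playback(read(handle))
'''

for function, callees in call_graph(SAMPLE).items():
    print(function, "->", sorted(callees))
```

Knowledge exploration would then interpret such a graph, for instance by asking which functions are reachable from the entry point of a feature.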
The main strategies for mining features
Interactive approaches
Dynamic analysis
Static analysis
There are several approaches to analysing data in source code
Control flow graph analysis
Analysing patterns of execution traces
Dependency graph analysis
Clone detection (code duplication)
Fan-in analysis
Information retrieval
Program slicing
Formal concept analysis (FCA) (e.g. applied on execution traces or class and method names)
Natural language processing (of source code)
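Of the approaches listed, fan-in analysis is among the simplest to sketch: a routine called from many distinct places is a candidate crosscutting concern. The call graph, the routine names and the threshold below are hypothetical.

```python
def fan_in(call_graph):
    """Count the distinct callers of every callee in a caller -> callees mapping."""
    callers = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    return {callee: len(who) for callee, who in callers.items()}


def high_fan_in(call_graph, threshold=3):
    """Return the routines whose fan-in reaches the threshold, sorted by name."""
    counts = fan_in(call_graph)
    return sorted(name for name, n in counts.items() if n >= threshold)


# Hypothetical call graph: caller -> routines it calls.
graph = {
    "play": {"open", "read", "log_error"},
    "stop": {"close", "log_error"},
    "forward": {"read", "log_error"},
    "next_track": {"open", "read", "log_error"},
}
print(fan_in(graph))
print(high_fan_in(graph))  # ['log_error', 'read']
```

Here `log_error` stands out with a fan-in of four, which matches the intuition that error handling is scattered across features.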
Locating features by dynamic analysis (example)
[Figure: a test case / requirement / feature (Play audio: play) is run on the system, which produces execution traces]
Feature: Trace(s)
Play: open, read, playback, visualise
Stop: stop, stop-visualiser, close
Pause: pause, pause-visualiser
Forward: pause, forward, playback
Next: stop, find-next, open, read, playback, visualise
[Figure: execution profile relating the features Play, Stop, Pause, Forward and Next to the routines open, read, playback, visualise, stop, stop-vis, close, pause, pause-vis, forward and find-next, which is the input to concept analysis]
Locating Features in Source Code, Eisenbarth et al., IEEE Trans. of Software Eng. 2002
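The feature/trace table can be fed directly into a small formal concept analysis. The sketch below uses the traces from the slide and enumerates concepts by brute force over all feature subsets, which is fine for five features but exponential in general; the function names are invented and this is not the tooling used by Eisenbarth et al.

```python
from itertools import combinations

# Execution traces per feature, as listed on the slide.
TRACES = {
    "Play": {"open", "read", "playback", "visualise"},
    "Stop": {"stop", "stop-visualiser", "close"},
    "Pause": {"pause", "pause-visualiser"},
    "Forward": {"pause", "forward", "playback"},
    "Next": {"stop", "find-next", "open", "read", "playback", "visualise"},
}


def common_routines(features, traces):
    """Routines executed by every feature in `features` (the intent)."""
    sets = [traces[f] for f in features]
    return set.intersection(*sets) if sets else set().union(*traces.values())


def features_running(routines, traces):
    """Features whose trace contains all of `routines` (the extent)."""
    return {f for f, trace in traces.items() if routines <= trace}


def concepts(traces):
    """Enumerate all formal concepts (extent, intent) by closing every feature subset."""
    found = set()
    names = sorted(traces)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_routines(subset, traces)
            extent = features_running(intent, traces)
            found.add((frozenset(extent), frozenset(intent)))
    return found


def specific_routines(feature, traces):
    """Routines that only this feature executes."""
    others = set().union(*(t for f, t in traces.items() if f != feature))
    return traces[feature] - others


for extent, intent in sorted(concepts(TRACES), key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), "share", sorted(intent))
print("Only Next runs:", specific_routines("Next", TRACES))
```

Each concept pairs a set of features with exactly the routines they all execute, so routines shared by Play and Next end up in one concept while `find-next` is isolated as specific to Next.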
Mining features based on pragmas in source code
Pragmas have been a common way of representing conditional compilation – or program variability
#ifdef COLOR_DISPLAY
#else
#endif
Optional Feature: COLOR_DISPLAY
#ifdef FILESYS_FAT32
#elif defined(FILESYS_HFS)
#endif
Alternative Features: FAT32 and HFS
A Case Study in Refactoring a Legacy Component for Reuse in a Product Line, Kolb et al., ICSM 05
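A rough way to mine such features automatically is to scan the conditional compilation directives and group the macros they test. The sketch below is a heuristic, not the method of Kolb et al.: a single macro guarding a block is reported as optional, and several macros in one `#if`/`#elif` chain are reported as alternatives, which would misclassify compound conditions such as `#if A && B`. The sample source is invented.

```python
import re

DIRECTIVE = re.compile(r"^\s*#\s*(ifdef|ifndef|if|elif|else|endif)\b(.*)$")
NAME = re.compile(r"[A-Za-z_]\w*")


def mine_features(source):
    """Classify macros as optional or alternative based on the conditional they guard."""
    optional, alternative = set(), set()
    stack = []  # one list of macro names per open conditional block
    for line in source.splitlines():
        match = DIRECTIVE.match(line)
        if not match:
            continue
        keyword, rest = match.groups()
        names = [n for n in NAME.findall(rest) if n != "defined"]
        if keyword in ("ifdef", "ifndef", "if"):
            stack.append(list(names))
        elif keyword == "elif" and stack:
            stack[-1].extend(names)
        elif keyword == "endif" and stack:
            block = stack.pop()
            # Heuristic: several macros in one #if/#elif chain are alternatives.
            (alternative if len(set(block)) > 1 else optional).update(block)
    return optional, alternative


# Hypothetical legacy source using the macros from the slide.
SOURCE = """
#include <stdio.h>
#ifdef COLOR_DISPLAY
draw_color();
#else
draw_mono();
#endif
#ifdef FILESYS_FAT32
mount_fat32();
#elif defined(FILESYS_HFS)
mount_hfs();
#endif
"""

optional, alternative = mine_features(SOURCE)
print("optional:", sorted(optional))
print("alternative:", sorted(alternative))
```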
Clone detection for mining features
Clone detection is the systematic identification of code clones in a code base.
Can help / guide migration toward a product line
• By highlighting identical and near-identical code fragments
• These can be used for establishing specifications of variable and common features
• and for refactoring the code for use in a product line
Clone detection is supported by a range of analysis tools
• CloneTracker, ConQAT, Clone Doctor, Clone Digger, …
Extending the Reflexion Method for Consolidating Software Variants into Product Lines, Frenzel et al., WCRE 2007
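The idea of highlighting identical and near-identical fragments can be sketched with a naive line-based detector: identifiers and numbers are replaced by placeholders so that renamed copies still match, and every window of consecutive lines is hashed. This is far cruder than the tools listed above; the file contents, the window size and the function names are assumptions for the example.

```python
import hashlib
import re
from collections import defaultdict


def normalise(line):
    """Replace identifiers and numbers by placeholders, then drop all whitespace."""
    line = re.sub(r"[A-Za-z_]\w*", "ID", line)
    line = re.sub(r"\d+", "N", line)
    return re.sub(r"\s+", "", line)


def find_clones(files, window=3):
    """Return groups of (file, first line) whose `window` normalised lines are equal."""
    seen = defaultdict(list)
    for name, text in files.items():
        lines = [(i + 1, normalise(l)) for i, l in enumerate(text.splitlines())]
        lines = [(no, l) for no, l in lines if l]  # skip blank lines
        for start in range(len(lines) - window + 1):
            chunk = "\n".join(l for _, l in lines[start:start + window])
            digest = hashlib.sha1(chunk.encode()).hexdigest()
            seen[digest].append((name, lines[start][0]))
    return [locations for locations in seen.values() if len(locations) > 1]


# Hypothetical variants of the same routine in two products.
files = {
    "product1.c": "int play(int track) {\n  open(track);\n  read(track);\n  return 0;\n}\n",
    "product2.c": "int start(int song) {\n  open(song);\n  read(song);\n  return 1;\n}\n",
}
for group in find_clones(files):
    print(group)
```

Fragments that match across products point to common features, while the places where otherwise cloned code diverges point to variability.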
Example of feature mining from non-code artifacts: logical models
Logical Formulas
(IPod → Audio) and
(IPod → Filesys) and
(Filesys → Flexible or Locked)
[Figure, labels: Logical Formulas, Binary Decision Diagrams, Implication Hypergraph, Feature Diagrams; the feature diagram shows the nodes IPod, Audio, Filesys, Flexible and Locked]
Feature Diagrams and Logics: There and Back Again, Czarnecki, Wasowski, SPLC 2007
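To make the step from formulas to feature relationships concrete, the sketch below enumerates all valid configurations of the three formulas and mines the implications that hold in every one of them. Brute-force enumeration stands in here for the binary decision diagrams used by Czarnecki and Wasowski, so it only works for a handful of features; the function names are invented.

```python
from itertools import product

FEATURES = ["IPod", "Audio", "Filesys", "Flexible", "Locked"]


def implies(a, b):
    return (not a) or b


def formula(c):
    """The logical formulas from the slide, as a predicate over a configuration dict."""
    return (implies(c["IPod"], c["Audio"])
            and implies(c["IPod"], c["Filesys"])
            and implies(c["Filesys"], c["Flexible"] or c["Locked"]))


def valid_configurations():
    """Yield every truth assignment over FEATURES that satisfies the formula."""
    for values in product([False, True], repeat=len(FEATURES)):
        config = dict(zip(FEATURES, values))
        if formula(config):
            yield config


def mined_implications():
    """All pairs (a, b) such that every valid configuration with a also has b."""
    configs = list(valid_configurations())
    return {(a, b) for a in FEATURES for b in FEATURES
            if a != b and all(implies(c[a], c[b]) for c in configs)}


def is_or_group(parent, children):
    """True if every valid configuration with `parent` selects at least one child."""
    return all(any(c[ch] for ch in children)
               for c in valid_configurations() if c[parent])


print(sorted(mined_implications()))
print(is_or_group("Filesys", ["Flexible", "Locked"]))
```

The mined implications contain only IPod → Audio and IPod → Filesys: the formulas as written do not force a child feature to imply its parent, so the hierarchy of a feature diagram needs additional constraints or heuristics.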
Some tool examples
CIDE – Colored Integrated Development Environment – An interactive approach
A tool for software product line development, especially for analysing and decomposing legacy code
The user defines the features in a feature model
http://wwwiti.cs.uni-magdeburg.de/iti_db/research/cide/
CIDE [2]
Lets the user annotate source code with feature information
• In a disciplined manner – using the underlying AST
Uses colours to visualise features
Feature and variant generation
Feature and variant views
Bauhaus – Software architecture, reengineering, understanding
Code quality metrics
Code duplicates
Mature support for C, C++
Research: Java, Ada, COBOL
Architecture reconstruction
http://www.bauhaus-stuttgart.de/bauhaus/index-english.html
List of some tools for mining
Feature mining / discovery
• CIDE
• FEAT
• Bauhaus
• …
Generic code query tools
• JQuery
• CodeQuest
• ....
Information retrieval
• Apache Lucene
• Google Analytics…
• Any web search engine
• (looong list....)
Aspect mining
• Aspect Browser
• Aspect mining tool
• Dynamo
• FINT (Fan-In Tool)
• EA-Miner
• …
Code Analysis & Metrics (code comprehension)
• Rigi
• ConQAT
• Understand
• ...
Summary

We have seen
• an overview of feature mining and supporting techniques
We addressed feature mining by
• interactive, dynamic, and static analysis
We saw concrete examples of different approaches
• some tool examples
Some interesting areas we haven’t covered
• Details on methods for architecture recovery and asset mining, such as the Options Analysis for Reengineering (OAR) method
• Mining features from framework code
• Early aspect mining: mining concerns from requirements documents
• Many tools
List of references for further reading

L. Northrop, SEI’s Software Product Line Tenets, IEEE SW
Feature-oriented Re-engineering of Legacy Systems into Product Line Assets – a Case Study, Kang et al., SPLC 05
Program understanding and the concept assignment problem, Biggerstaff et al., ACM COMM, 1994
Recovering documentation-to-source-code traceability links using latent semantic indexing, Marcus and Maletic, ICSE 2003
SNIAFL: Towards a Static Noninteractive Approach to Feature Location, Zhao et al., TOSEM 2006
Clements, Northrop, Software Product Lines, Practices and Patterns, Addison Wesley
Locating Features in Source Code, Eisenbarth et al., IEEE Trans. of Software Eng. 2002
A Case Study in Refactoring a Legacy Component for Reuse in a Product Line, Kolb et al., ICSM 05
Extending the Reflexion Method for Consolidating Software Variants into Product Lines, Frenzel et al., WCRE 2007
Feature Diagrams and Logics: There and Back Again, Czarnecki, Wasowski, SPLC 2007
Options Analysis for Reengineering (OAR): A Method for Mining Legacy Assets, Bergey et al., 2001, SEI
Representing Concerns in Source Code, Robillard and Murphy, ACM TOSEM
Visualizing Software Product Line Variabilities in Source Code, Kästner et al.
…
Processes - Methods focusing on asset mining (for software product lines)
Options Analysis for Reengineering (OAR)
Comes from one of the leading authorities on product lines
• Software Engineering Institute (SEI) at Carnegie Mellon
Method for evaluating feasibility and economy of mining existing components for a product line
OAR provides
• a set of mining options
• estimates of the cost, effort, and risks associated with those options.
PuLSE and ADORE
Defined by Fraunhofer IESE
• Another leading authority on product lines
PuLSE - a general method for software product line development
ADORE
• Architecture- and Domain-Oriented Reengineering
• Framework for integration of reengineering of legacy systems and transitioning to product line
[Figure: backup slide for the formulas Ipod -> Audio and Ipod -> Filesys and Filesys -> Locked xor Fixed, showing a truth table over Locked, Fixed, Filesys, Audio and Ipod and the corresponding binary decision diagrams]