Feature mining in software product lines
Trial lecture, April 15th 2010
Jon Oldevik, UiO / SINTEF
jonold at ifi.uio.no

Outline
- The context: software product lines
- An overview of feature mining
- Related concepts
- Techniques for feature mining
- Tools for feature mining
- Summary

What is a Software Product Line?
A software product line is a set of software-intensive systems (products) that:
- share a common, managed feature set satisfying a particular market segment's specific needs or mission
- are developed from a common set of core assets in a prescribed way
It involves strategic (planned) reuse, from both a business and a technical perspective.
L. Northrop, SEI's Software Product Line Tenets, IEEE Software

Software Product Lines and Features
- Features represent functional, quality, or design characteristics of the products
- A software product line is commonly described by features
  - Common features, e.g. "audio playback"
  - Variable features, e.g. "video recording"
- Products are derived in a product resolution (configuration) process, by composing common features and selected variable features

Example of a feature model for the iPod family
[Figure: the product line feature model. Root: IPod. Features: FM Radio, Live Pause, File system, Audio playback, Shake to shuffle, Voiceover, iTunes Synch, Video playback, Pedometer, Web Browsing, Video recording. File system has a <1..1> group: Flexible (FAT32 v HFS+) or Locked (FAT32).]
[Figure: product configuration for the IPod Shuffle. File system: Locked (FAT32); Audio playback; Voiceover; iTunes Synch.]

Feature mining: what is it?
Features represent high-level characteristics of systems.
Feature mining is the process of extracting information
- about features, feature relationships, commonalities and variabilities, and relationships to implementation
- from existing SPLs or systems
- with the purpose of understanding, maintaining, evolving, and reusing them

Existing systems implement features. It is not always easy to see which parts of a system implement which features.
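Looking back at the iPod example, the product resolution step (composing common features with selected variable features) can be sketched in a few lines of Python. This is only an illustrative sketch: the `FeatureModel` class and `derive_product` function are hypothetical names, the model is flattened to a single level, and the choice of which features count as common is an assumption, since the figure notation did not survive in the slides. Only the <1..1> file system group and the IPod Shuffle configuration are taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureModel:
    """A flat feature model: common features, optional features, <1..1> groups."""
    common: frozenset
    optional: frozenset
    xor_groups: tuple  # each group is a frozenset; exactly one member must be chosen


def derive_product(model: FeatureModel, selected: set) -> frozenset:
    """Compose the common features with the selected variable features."""
    selected = frozenset(selected)
    grouped = frozenset().union(*model.xor_groups)
    unknown = selected - model.optional - grouped
    if unknown:
        raise ValueError(f"not variable features of this product line: {sorted(unknown)}")
    for group in model.xor_groups:
        chosen = selected & group
        if len(chosen) != 1:
            raise ValueError(f"choose exactly one of {sorted(group)}, got {sorted(chosen)}")
    return model.common | selected


# Assumption: the set of common features below is guessed for illustration only.
IPOD = FeatureModel(
    common=frozenset({"Audio playback", "File system", "iTunes Synch"}),
    optional=frozenset({
        "FM Radio", "Live Pause", "Shake to shuffle", "Voiceover",
        "Video playback", "Pedometer", "Web Browsing", "Video recording",
    }),
    xor_groups=(frozenset({"Flexible (FAT32 v HFS+)", "Locked (FAT32)"}),),
)

if __name__ == "__main__":
    # The IPod Shuffle configuration from the slide.
    shuffle = derive_product(IPOD, {"Voiceover", "Locked (FAT32)"})
    print(sorted(shuffle))
```

Selecting both file system alternatives, or neither, raises a `ValueError`, which is the behaviour the <1..1> cardinality on the slide suggests.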
[Figure: features mapped to the system components of an existing (legacy) system]

Several products may have common, or similar, implementation parts.
[Figure: Product 1, Product 2 and Product 3, each made up of common features and specific features]

Why care about feature mining?

Feature-oriented re-engineering of legacy systems to product line assets (an example)
[Figure: process diagram with the elements: legacy applications (home service robots), reverse engineering, legacy architecture, feature model, new market needs, refined feature model, feature-oriented product line engineering, reengineering, product line architecture, product line components]
Feature-oriented Re-engineering of Legacy Systems into Product Line Assets: a Case Study, Kang et al., SPLC 05

Feature mining overlaps with several other concepts
- Terms for the same concept
  - Feature location
  - Feature identification
  - Feature extraction
  - Feature refactoring
- Other, overlapping concepts
  - Asset mining
  - Concern mining
  - Aspect mining
  - Program comprehension
  - Data mining
  - Information retrieval

Program comprehension
- The domain of computing science dealing with the processes used by software engineers and code analysis to understand programs (Wikipedia)
- The concept assignment problem
- Program comprehension is supported in many different ways by tools
  - Rigi: a toolset for architecture reconstruction and analysis
Program understanding and the concept assignment problem, Biggerstaff et al., Communications of the ACM, 1994

Data mining and Information Retrieval

Data mining
- Data mining is the process of extracting patterns from data, to gain information and knowledge
- Typically done on prepared data sets, which are mined and validated
- Applications within profiling practices: marketing, fraud detection, surveillance, scientific discovery (medicine, biomedicine, education)
- In software engineering, data mining techniques are applied
  - on version histories, e.g.
to find change patterns
  - on source code, to facilitate program understanding
- It could also be used to extract information on features in existing systems

Information Retrieval
The science of searching for documents, information within documents, and metadata about documents
- Document content and structure analysis
- Classification, grouping and segmentation
- Visualization
- Indexing, search, and relevance ranking
- Personalized interaction and collaboration
- Documentation to code traceability

Information retrieval: documentation to code traceability (an example)
[Figure: documentation (Doc) and code feed into traceability recovery]
- Pre-processing of documentation and code
- Structured representation of terms from documentation and code (corpus): <d1, d2, d3, d4, d5>, <c1, c2, c3, c4, c5, c6>
- The structured data is analysed with Latent Semantic Indexing
- Traceability links (trace links) are established
- Provides quality measures for the correctness of traceability
- Similar techniques have been used for mining features
Recovering documentation-to-source-code traceability links using latent semantic indexing, Marcus and Maletic, ICSE 2003
SNIAFL: Towards a Static Noninteractive Approach to Feature Location, Zhao et al., TOSEM 2006

Asset mining

Mining existing assets in software product line context
- Locating useful information from an asset base for reuse
- Mining is done to rehabilitate parts of an old system for reuse
- Candidates for mining: program code, designs, system architectures, specifications, algorithms …
- Purpose: architecture reconstruction of legacy systems, including features (commonalities and variabilities)
- Methods exist that address this process, e.g. the Options Analysis for Reengineering (OAR) method
[Figure: OAR process]
Clements, Northrop, Software Product Lines: Practices and Patterns, Addison Wesley
Options Analysis for Reengineering (OAR): A Method for Mining Legacy Assets, Bergey et al., 2001, SEI

Aspect and concern mining
Discovering the crosscutting concerns that potentially could be turned into aspects.
- Examples of potentially crosscutting concerns: transaction code, error handling, ..
- This can be used for migrating toward an aspect-oriented paradigm
- Or for gaining knowledge on how concerns (= features) are scattered in the code base
- An example tool: the Aspect Browser

Techniques for feature mining

Feature mining requires analysis of the system assets
- Data extraction: parsing source code, executing the system, ...
- Information representation
  - Abstract syntax tree
  - Symbol tables
  - Dependency graphs
- Knowledge exploration: interpret data, infer features, ...

The main strategies for mining features
- Interactive approaches
- Dynamic analysis
- Static analysis

There are several approaches to analysing data in source code
- Control flow graph analysis
- Analysing patterns of execution traces
- Dependency graph analysis
- Clone detection (code duplication)
- Fan-in analysis
- Information retrieval
- Program slicing
- Formal concept analysis (FCA), e.g. applied on execution traces or on class and method names
- Natural language processing (of source code)

Locating features by dynamic analysis (example)
A test case / requirement / feature (e.g. "Play audio": play) is run on the system, which yields execution traces.

Feature   Trace(s)
Play      open, read, playback, visualise
Stop      stop, stop-visualiser, close
Pause     pause, pause-visualiser
Forward   pause, forward, playback
Next      stop, find-next, open, read, playback, visualise

[Figure: execution profile relating the features Play, Stop, Pause, Forward and Next to the routines open, read, playback, visualise, stop, stop-vis, close, pause, pause-vis, forward and find-next, and the result of concept analysis over that profile]
Locating Features in Source Code, Eisenbarth et al., IEEE Trans. on Software Eng.
2002

Mining features based on pragmas in source code
Pragmas (preprocessor directives) have been a common way of representing conditional compilation, or program variability.

Optional feature COLOR_DISPLAY:

    #ifdef COLOR_DISPLAY
      /* code for colour displays */
    #else
      /* code for other displays */
    #endif

Alternative features FAT32 and HFS:

    #ifdef FILESYS_FAT32
      /* FAT32 code */
    #elif defined(FILESYS_HFS)
      /* HFS code */
    #endif

A Case Study in Refactoring a Legacy Component for Reuse in a Product Line, Kolb et al., ICSM 05

Clone detection for mining features
- Clone detection is the systematic identification of code clones in a code base
- It can help guide migration toward a product line by highlighting identical and near-identical code fragments
- These can be used for establishing specifications of variable and common features, and for refactoring the code for use in a product line
- Clone detection is supported by a range of analysis tools: CloneTracker, ConQAT, Clone Doctor, Clone Digger, …
Extending the Reflexion Method for Consolidating Software Variants into Product Lines, Frenzel et al., WCRE 2007

Example of feature mining from non-code artifacts: logical models
- Logical formulas: (IPod → Audio) and (IPod → Filesys) and (Filesys → Flexible or Locked)
- Binary decision diagrams
- Implication hypergraph
- Feature diagrams
[Figure: feature diagram with IPod, Audio, Filesys, Flexible and Locked]
Feature Diagrams and Logics: There and Back Again, Czarnecki, Wasowski, SPLC 2007

Some tool examples

CIDE: Colored Integrated Development Environment
- An interactive approach
- A tool for software product line development, especially for analysing and decomposing legacy code
- The user defines the features in a feature model
http://wwwiti.cs.uni-magdeburg.de/iti_db/research/cide/

CIDE [2]
- Lets the user annotate source code with feature information, in a disciplined manner, using the underlying AST
- Uses colours to visualise features
- Feature and variant generation
- Feature and variant views

Bauhaus: software architecture, reengineering, understanding
- Code quality metrics
- Code duplicates
- Mature support for C, C++
- Research: Java,
Ada, COBOL
- Architecture reconstruction
http://www.bauhaus-stuttgart.de/bauhaus/index-english.html

List of some tools for mining
- Feature mining / discovery: CIDE, FEAT, Bauhaus, …
- Generic code query tools: JQuery, CodeQuest, ...
- Information retrieval: Apache Lucene, Google Analytics, …, any web search engine (a long list)
- Aspect mining: Aspect Browser, Aspect mining tool, Dynamo, FINT (Fan-In Tool), EA-Miner, …
- Code analysis and metrics (code comprehension): Rigi, ConQAT, Understand, ...

Summary
- We have seen an overview of feature mining and supporting techniques
- We addressed feature mining by interactive, dynamic, and static analysis
- We saw concrete examples of different approaches and some tool examples
- Some interesting areas we haven't covered
  - Details on methods for architecture recovery and asset mining, such as the Options Analysis for Reengineering (OAR) method
  - Mining features from framework code
  - Early aspect mining: mining concerns from requirements documents
  - Many tools

List of references for further reading
- L. Northrop, SEI's Software Product Line Tenets, IEEE Software
- Feature-oriented Re-engineering of Legacy Systems into Product Line Assets: a Case Study, Kang et al., SPLC 05
- Program understanding and the concept assignment problem, Biggerstaff et al., Communications of the ACM, 1994
- Recovering documentation-to-source-code traceability links using latent semantic indexing, Marcus and Maletic, ICSE 2003
- SNIAFL: Towards a Static Noninteractive Approach to Feature Location, Zhao et al., TOSEM 2006
- Clements, Northrop, Software Product Lines: Practices and Patterns, Addison Wesley
- Locating Features in Source Code, Eisenbarth et al., IEEE Trans. on Software Eng., 2002
- A Case Study in Refactoring a Legacy Component for Reuse in a Product Line, Kolb et al., ICSM 05
- Extending the Reflexion Method for Consolidating Software Variants into Product Lines, Frenzel et al.,
WCRE 2007
- Feature Diagrams and Logics: There and Back Again, Czarnecki, Wasowski, SPLC 2007
- Options Analysis for Reengineering (OAR): A Method for Mining Legacy Assets, Bergey et al., 2001, SEI
- Representing Concerns in Source Code, Robillard and Murphy, ACM TOSEM
- Visualizing Software Product Line Variabilities in Source Code, Kästner et al.
- …

Processes: methods focusing on asset mining (for software product lines)

Options Analysis for Reengineering (OAR)
- Comes from one of the leading authorities on product lines, the Software Engineering Institute (SEI) at Carnegie Mellon
- A method for evaluating the feasibility and economy of mining existing components for a product line
- OAR provides
  - a set of mining options
  - estimates of the cost, effort, and risks associated with those options

PuLSE and ADORE
- Defined by Fraunhofer IESE, another leading authority on product lines
- PuLSE: a general method for software product line development
- ADORE: Architecture- and Domain-Oriented Reengineering
  - A framework for integrating the reengineering of legacy systems with the transition to a product line

Logical formulas for the iPod example
- Ipod -> Audio and Ipod -> Filesys
- Filesys -> Locked xor Fixed
[Figure: truth table over Locked, Fixed, Filesys, Audio and Ipod, and the corresponding binary decision diagrams; the individual rows and nodes are not recoverable from the extracted text]
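The two formulas above are small enough to check by brute force. The following Python sketch enumerates all truth assignments over the five features and keeps those that satisfy the formulas, which is what a truth table or a binary decision diagram encodes more compactly. It is an illustration only, not the procedure from Czarnecki and Wasowski; the names `implies`, `is_valid` and `valid_configurations` are hypothetical, and the formulas are taken literally from this slide, with no extra child-implies-parent constraints.

```python
from itertools import product

# Feature names as they appear in the formulas on the slide.
FEATURES = ("ipod", "audio", "filesys", "locked", "fixed")


def implies(a: bool, b: bool) -> bool:
    """Material implication: a -> b."""
    return (not a) or b


def is_valid(cfg: dict) -> bool:
    """Ipod -> Audio, Ipod -> Filesys, Filesys -> (Locked xor Fixed)."""
    return (
        implies(cfg["ipod"], cfg["audio"])
        and implies(cfg["ipod"], cfg["filesys"])
        and implies(cfg["filesys"], cfg["locked"] != cfg["fixed"])
    )


def valid_configurations() -> list:
    """Enumerate every truth assignment and keep the satisfying ones."""
    configs = []
    for values in product((False, True), repeat=len(FEATURES)):
        cfg = dict(zip(FEATURES, values))
        if is_valid(cfg):
            configs.append(cfg)
    return configs


if __name__ == "__main__":
    # Only assignments that actually contain the root feature are products.
    for cfg in valid_configurations():
        if cfg["ipod"]:
            print(sorted(name for name, on in cfg.items() if on))
```

Restricting the result to assignments where `ipod` is true leaves two products, one with the locked and one with the fixed file system. Exhaustive enumeration grows exponentially with the number of features, which is one reason compact representations such as binary decision diagrams are used for realistic models.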