Software maintenance and evolution Software re-engineering and reverse engineering Prof. Robertas Damaševičius, robertas.damasevicius@ktu.lt Prof. Vytautas Štuikys Kaunas University of Technology Terminų žodynėlis • Tiesioginė inžinerija – Forward Engineering • Apgrąžos (atvirkštinė) inžinerija – Reverse Engineering • Dvipusė inžinerija (reinžinerija) – Reengineering • Rekonstravimas – Reconstruction, re-factoring • Restruktūrizavimas – Restructuring, remodularization Definitions • Forward engineering - traditional software engineering approach starting with requirements analysis and progressing to implementation of a system • Reverse engineering – system analysis process to: – identify the system's components and their interrelationships and – create representations of the system in another form or at higher levels of abstraction • Reengineering - process of analysis and change whereby a system is modified by first reverse engineering and then forward engineering. • Re-factoring (restructuring) - transformation of a system from one representational form to another Software re-engineering and reverse engineering 3 Forward Engineering and Reengineering* System specification Design and implementation Ne w system Understanding and transformation Re-engineered system Forward engineering Existing software system Software re-engineering *According to Sommerville Software re-engineering and reverse engineering 4 Contents • Re-engineering vs maintenance • Reverse engineering • Re-factoring (re-structuring) Software re-engineering and reverse engineering 5 System Reengineering • Restructuring or rewriting part or all of a system without changing its functionality • Required when some (but not all) subsystems of a larger system require frequent maintenance • Reengineered system may also be restructured and should be re-documented Software re-engineering and reverse engineering 6 Goals of Reengineering • Port to other Platform – when hardware or software support becomes obsolete • Design extraction – to improve maintainability, portability, etc. • Exploitation of New Technology – new language features, standards, libraries, etc. – when tools to support restructuring are readily available Software re-engineering and reverse engineering 7 Reengineering Categories • • • • Automatic restructuring Automatic and semi-automatic transformation Design recovery and reimplementation Code reverse engineering and forward engineering • Data reverse engineering and schema migration • Migration of legacy systems to modern platforms Software re-engineering and reverse engineering 8 Reengineering Techniques • Restructuring – automatic conversion from unstructured to structured code – source code translation — Chikofsky and Cross • Data reengineering – integrating and centralizing multiple databases – unifying multiple, inconsistent representations – upgrading data models — Sommerville, ch 32 • Refactoring – renaming/moving methods/classes etc. Software re-engineering and reverse engineering 9 Life-Cycle of Reengineering (0) requirement analysis Requirements (2) problem detection (3) problem resolution Designs (1) model capture • people centric • lightweight Code (4) program transformation (Acc. To S. Ducasse et al.) Software re-engineering and reverse engineering 10 Generic Reengineering Process • Requirement analysis: analyse on which parts of your requirements have changed • Model capture: reverse engineer from the source-code into a more abstract form, typically some form of a design model • Problem detection: identify design problems in that abstract model • Problem resolution: propose an alternative design that will solve the identified problem • Program transformations: make the necessary changes to the code, so that it adheres to the new design yet preserves all the required functionality Software re-engineering and reverse engineering 11 Reengineering Process* Program documentation Original program Modularised program Original data Reverse engineering Program modularisation Source code translation Data reengineering Program structure improvement Structured program *According to Sommerville Software re-engineering and reverse engineering Reengineered data 12 Software Reengineering Process Model (1) • Software Inventory analysis – sorting active software applications by business criticality, longevity, current maintainability, and other local criteria – helps to identify reengineering candidates • Document restructuring options – update poor documents if they are used – fully rewrite the documentation for critical systems focusing on the "essential minimum" Software re-engineering and reverse engineering 13 Software Reengineering Process Model (2) • Reverse engineering – process of design recovery – analyzing a program in an effort to create a representation of the program at some abstraction level higher than source code • Code restructuring – source code is analyzed and violations of structured programming practices are noted and repaired – revised code needs to be reviewed and tested Software re-engineering and reverse engineering 14 Software Reengineering Process Model (3) • Data restructuring – existing data structures are reviewed for quality • Forward engineering – sometimes called reclamation or renovation – recovers design information from existing source code – uses this design information to reconstitute the existing system to improve its overall quality or performance Software re-engineering and reverse engineering 15 Re-engineering advantages • Reduced risk – There is a high risk in new software development: development problems, staffing problems and specification problems • Reduced cost – Cost of re-engineering is often less than costs of developing new software Software re-engineering and reverse engineering 16 How? Reengineering Manual Reading code Automated Extracting info Presenting info Static analysis Querying Program slicing Browsing Dependency graphs Animation Dynamic analysis Profiling Stepwise execution Contents • Re-engineering vs maintenance • Reverse engineering • Re-factoring (re-structuring) Software re-engineering and reverse engineering 18 Principles of reverse engineering • Reverse engineering: – Systematic process of acquiring important design factors and information regarding engineering aspects from an existing product – A process which analyses a product/technology to find out the design aspects and its functions – A kind of analysis which engages an individual in a process of constructive learning of design and its functionality of systems and products Reverse engineering • Goal: to facilitate change by allowing a software system to be understood in terms of what it does, how it works and its architectural representation. • Objectives: – – – – – – – – – to recover lost information, to facilitate migration between platforms, to improve and/or provide new documentation, to extract reusable components, to reduce maintenance effort, to cope with complexity, to detect side effects, to assist migration to a CASE environment to develop similar or competitive products. Software re-engineering and reverse engineering 20 Goals of Reverse Engineering • Cope with complexity – need techniqes to understand large, complex systems • Recover lost information – extract what changes have been made and why • Detect side effects – help understand ramifications of changes • Synthesize higher abstractions – identify latent abstractions in software • Facilitate reuse – detect candidate reusable artifacts and components — Chikofsky and Cross Software re-engineering and reverse engineering 21 Uses of Reengineering • Recreating the design, making the design decision, and information which is developed by the original design team • Learning the working principle of a system • Justifications of design decisions of the original design team • Redesigning the product to improvise, modify to suit the modern circumstances etc. • Understanding the functionality of product in depth • Evaluating a product to understand its limitations and comparison with other products using simulation technology Role of Reverse Engineering (Reengineering) • Reengineering method plays a significant role in the development of science and technology innovations • Reengineering involves disassembling and disclosing the method in which it works • It often shows an effective way for learning to construct a product/ technology or improvise it Reverse Engineering Concepts (1) • Abstraction level – ideally want to be able to derive design information at the highest level possible • Completeness – level of detail provided at a given abstraction level • Interactivity – degree to which humans are integrated with automated reverse engineering tools Software re-engineering and reverse engineering 24 Reverse Engineering Concepts (2) • Directionality – one-way means the software engineer doing the maintenance activity is given all information extracted from source code – two-way means the information is fed to a reengineering tool that attempts to regenerate the old program • Extract abstractions – meaningful specification of processing performed is derived from old source code Software re-engineering and reverse engineering 25 Reverse Engineering Activities • Understanding process – source code is analyzed to at varying levels of detail – to understand procedural abstractions and overall functionality • Understanding data – internal data structures – database structure • Understanding user interfaces – what are basic actions processed by the interface? – what is system's behavioral response to these actions? Software re-engineering and reverse engineering 26 Reverse Engineering Techniques • Re-documentation – pretty printers – diagram generators – cross-reference listing generators • Design recovery – – – – – software metrics browsers visualization tools static analyzers dynamic (trace) analyzers Software re-engineering and reverse engineering 27 Factors that Motivate the Application of Reverse Engineering Software re-engineering and reverse engineering 28 Summary of Objectives and Benefits of Reverse Engineering Software re-engineering and reverse engineering 29 Benefits of Reverse Engineering for Software Maintenance • Corrective change: – abstraction of unnecessary detail gives greater insight into the parts of the program to be corrected – easier to identify defective program components and the source of residual errors • Adaptive/perfective change: – Eases understanding of system’s components and their interrelationships, showing where new requirements fit and how they relate to existing components – Extracted information whean be used during enhancement of the system or for the development of another product • Preventive change: – brings benefit to future maintenance of a system Software re-engineering and reverse engineering 30 Legality of software re-engineering • Software reengineering is done – to retrieve the source code of a program because the source code was lost, – to study how the program performs certain operations – to improve the performance of a program – to fix a bug (correct an error in the program when the source code is not available) – to identify malicious content in a program such as a virus or to adapt a program written for use with one microprocessor for use with another. • Reengineering for the purpose of copying or duplicating programs may constitute a copyright violation. • In some cases, the licensed use of software specifically prohibits reverse engineering Reengineering for software • Reengineering takes a program and constructs a high level representation for documentation, maintenance or reuse • To accomplish this, most current reengineering techniques begin by analyzing a program’s structure • The structure is determined by lexical, syntactic and semantic rules for legal program constructs • Because we know how to do these kinds of analyses quite well, it is natural to try and apply them to understanding a program Reengineering and program rephrasing • Imagine trying to understand a program in which all identifiers have been systematically replaced by random names and in which all indentation and comments have been removed • Example of transformation by rephrasing is obfuscation • The task of understanding would be difficult if not impossible Obfuscation • Source code obfuscation which is a useful anti-reversing technique for source code • There may exist a requirement to ship the source code of an application so that the machine code can be generated on the end user’s computer • If the source code contains intellectual property that is worth protecting, one can perform transformations to the source code which make it difficult to read, but have no impact on the machine code that would ultimately be generated when the program is compiled. 34 Applications of Reengineering • Reverse-engineering is used for many purposes: – as a learning tool; – as a way to make new, compatible products that are cheaper than what's currently on the market; – for making software interoperate more effectively; or – to bridge data between different operating systems or databases; – and to uncover the undocumented features of commercial products. Reengineering and program understanding • Given that the source code by itself is not sufficient to understand the program • The problem is that programs have a purpose: their job is to compute something • For the computation to be of value, the program must model or approximate some aspect of the real world • To the extent that the model is accurate, the program will succeed in accomplishing its purpose • To the extent that the model is comprehended by the reverse engineer, the process of understanding the program will be eased Contents • Re-engineering vs maintenance • Reverse engineering • Re-factoring (re-structuring) Software re-engineering and reverse engineering 37 Basic Issues in refactoring • • • • 38 What is refactoring? Why should you refactor? When should you refactor? Categories of refactorings What is refactoring? • A refactoring is a software transformation that – preserves the external behaviour of the software – improves the internal structure of the software • Refactoring [Fowler 1999] – [noun] a change (keitimas) made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behaviour – [verb] to restructure software by applying a series of refactorings without changing its (user) observable behavior 39 Why should you refactor? (1) • To improve the software design • To counter code decay (software aging, decay) and increase its life-time • To increase software understandibility (comprehensibility) • To reduce costs of software maintenance 40 Why should you refactor? (2) • … • To find bugs and write more robust code • To reduce testing – automatic refactorings are guaranteed to be behaviourpreserving • To prepare for / facilitate future customisations • To turn an OO application into a framework – introduce design patterns in a behaviourally preserving way 41 When should you refactor? • Only if you see the need for it – Not on a preset periodical basis • Apply the rule of three – When 1st time: implement from scratch – When 2nd time: implement something similar by code duplication – When 3rd time: do not implement similar things again or duplicate, but refactor • Refactor when adding new features – Especially if feature is difficult to integrate with the existing code • Refactor during bug fixing – If a bug is very hard to trace, refactor first to make the code more understandable • Refactor during code reviews 42 Categories of refactorings • Based on granularity – low-level (primitive) vs – high-level (composite) refactorings • Based on programming language – language-specific (e.g. Java, Smalltalk, ...) – language-independent (e.g. [Tichelaar&al 2000]) • Degree of formality – formal (e.g. [Bergstein 1997]) – ad-hoc (nesisteminis, be metodikos) (e.g. [Fowler 1999]) – semi-formal • Degree of automation – fully automated (e.g. [Moore 1996]) – interactive (e.g. Refactoring Browser of [Roberts&al 1997]) – fully manual (e.g. [Fowler 1999]) 43 Refactoring to Understand Problem: How do you decipher cryptic code? Solution: Re-factor it till it makes sense • • • • Goal (for now) is to understand, not to reengineer Work with a copy of the code Refactoring requires an adequate test base Restructuring is a transformation from one form of presentation to another • Refactoring is the object-oriented variant of restructuring • The subjects external behavior is preserved • Idea is to make existing code more extensible Software re-engineering and reverse engineering 44 Program structure improvement • Maintenance tends to corrupt the structure of a program. It becomes harder and harder to understand • The program may be simplified to make them more readable Software re-engineering and reverse engineering 45 Types of Restructuring • Code restructuring – Program transformation – Architecture transformations • Data restructuring – – – – – analysis of source code data redesign data record standardization data name rationalization file or database translation Software re-engineering and reverse engineering 46 Program modularization • Process of re-organising a program so that related program parts are collected together in a single module – Data abstractions: Abstract data types where data structures and associated operations are grouped – Hardware modules: All functions required to interface with a hardware unit – Functional modules: Modules containing functions that carry out closely related tasks – Process support modules: Modules where the functions support a business process or process fragment Software re-engineering and reverse engineering 47 Restructuring Approaches* Automated program restructuring Automated source code conversion Program and data restructuring Automated restructuring with manual changes Restructuring plus architectural changes Increased cost *According to Sommerville Software re-engineering and reverse engineering 48 Refactoring activities • Identify where the software should be re-factored • Determine which re-factoring(s) should be applied to the identified places • Guarantee that the applied refactoring preserves behaviour • Apply the refactoring • Assess the effect of the refactoring on quality characteristics of the software • Maintain the consistency between the re-factored program code and other software artefacts Software re-engineering and reverse engineering 49 Types of software artifacts • Refactoring program source code • Refactoring of models, independent of the underlying programming language – Refactoring of UML diagrams • Restructuring software architectures • Restructuring of software requirements Software re-engineering and reverse engineering 50 Qualities of refactoring tools • Reliability: check if the provided re-factorings are truly behavior preserving • Coverage: number of refactoring activities supported by the tool • Configurability: modifying refactoring pattern specifications • Scalability: combine frequently used primitive re-factorings into composite re-factorings • Language independence: applicability to different languages Software re-engineering and reverse engineering 51 Program Translation Process* System to be re-engineered Identify source code differences *According to Sommerville Design translator instructions System to be re-engineered Re-engineered system Automatically transla te code Manually transla te code Software re-engineering and reverse engineering 52 Source Code Translation • Converting the code from one language (or language version) to another • May be necessary because of – hardware platform updates – staff skill shortages – organizational policy changes • Cost and time effective only if an automatic translator is available • Manual fine tuning of new code is always required Software re-engineering and reverse engineering 53 Restructuring problems • Problems with re-structuring are: – Loss of comments – Loss of documentation – Heavy computational demands • Restructuring doesn’t help with poor modularisation where related components are dispersed throughout the code. • The understandability of data-driven programs may not be improved by re-structuring. Software re-engineering and reverse engineering 54 Restructuring Benefits • Improved program and documentation quality • Makes programs easier to learn – improves productivity – reduces developer frustration • Reduces effort required to maintain software • Software is easier to test and debug Software re-engineering and reverse engineering 55 Summary: acquired knowledge • Objective of re-engineering is to improve the system structure to make it easier to understand and maintain • Re-engineering involves source code translation, reverse engineering, program structure improvement and data reengineering • Reverse engineering is the process of deriving the system design (documents, models, e.g., control graph and specification) from its source code • Program structuring involves reorganization to group related items • Data re-engineering may be necessary because of inconsistent data management Software re-engineering and reverse engineering 56 Conclusions • Reverse engineering is a general transformation approach to transform various products and artefacts from lowerlevel representation to higher-level representation • Directly related to learning and understanding of facts, products, processes and artefacts • Process of new knowledge discovery can be also conceived of as a re-engineering process or extending previous knowledge • Reengineering is the way for obtaining know-how or knowledge, which can be used for many purposes, including better software maintenance References • I. Somerville, Software Engineering. New York, NY, Addison-Wesley, 1995. • R. Arnold, Software Reengineering. IEEE Computer Society Press, 1993. • J.H. Cross, E.J. Chikofsky, C.H. May, ”Reverse engineering”. Advances in Computers, 35: 199-254, 1992. Software re-engineering and reverse engineering 58