Object-Oriented Reengineering Patterns and Techniques
Wahyu Andhyka Kusuma, S.Kom
kusuma.wahyu.a@gmail.com
081233148591

Module 5: Problem Detection

Topics
• Metrics
• Object-Oriented Metrics in Practice
• Code Duplication

Topics
• Metrics
  – Software quality
  – Analyzing trends
• Object-Oriented Metrics in Practice
• Code Duplication

Why use metrics in reengineering?
• To assess the quality of the software
  – Which components have poor quality? (candidates for reengineering)
  – Which components have good quality? (candidates to be reverse engineered)

Metrics as a tool for reengineering
• To control the reengineering process
  – Trend analysis:
    • Which components have changed?
  – Which refactorings can be applied?
• Metrics as a reverse engineering tool!

7.4 ISO 9126 Quantitative Quality Model
The ISO 9126 model maps Factors to Characteristics to Metrics:
• Factors: Functionality, Reliability, Efficiency, Usability, Maintainability, Portability
• Characteristics (for Maintainability): error tolerance, accuracy, consistency, simplicity, modularity
• Metrics: defect density = #defects / size; correction time; correction impact = #components changed per correction

7.5 Product & Process Attributes
• Process attribute
  – Definition: measures an aspect of the process that produces the product
  – Examples: time to fix a defect, number of components changed per correction
• Product attribute
  – Definition: measures an aspect of the product delivered to the customer
  – Examples: number of defects in the system, time needed to learn the system

7.6 External & Internal Attributes
• Internal attribute
  – Definition: measured purely in terms of the product itself (its form, separate from its behaviour in context)
  – Examples: class coupling and cohesion, method size
• External attribute
  – Definition: measures how the product/process behaves in its environment
  – Examples: mean time to failure, #components changed

7.7 External vs.
Internal Product Attributes

External attributes
• Advantage: close relationship with the quality factors
• Disadvantages:
  – measurable only after the product is in use
  – data collection is difficult and often involves the user
  – relating external effects back to internal causes is very difficult

Internal attributes
• Disadvantage: the relationship with the quality factors is not empirically validated
• Advantages:
  – can be measured at any time
  – data collection can be easy and automated
  – directly related to what is measured and to its causes

7.8 Metrics and Measurements
• Weyuker [1988] defined nine properties that a software metric should satisfy
• For OO, only six of these properties really matter [Chidamber 94, Fenton & Pfleeger]
  – Non-coarseness:
    • Given a class P and a metric m, another class Q can be found such that m(P) ≠ m(Q)
    • Not all classes have the same value for the metric
  – Non-uniqueness:
    • There can be distinct classes P and Q such that m(P) = m(Q)
    • Two classes can have the same metric value
  – Monotonicity:
    • m(P) ≤ m(P+Q) and m(Q) ≤ m(P+Q), where P+Q is the "combination" of the classes P and Q

7.9 Metrics and Measurements
  – Design details are important
    • The particular design of a class must influence the metric value: two classes that do the same thing but differ in design detail should yield different metric values
  – Nonequivalence of interaction
    • m(P) = m(Q) does not imply m(P+R) = m(Q+R), where R interacts with the classes
  – Interaction increases complexity
    • m(P) + m(Q) < m(P+Q)
    • When two classes are combined, the interaction between them increases the metric value
• Conclusion: not every measurement is a metric

7.10 Selecting Metrics
• Fast
  – Scalable: we cannot afford O(n²) algorithms when n is 1 million LOC (lines of code)
• Precise
  – (e.g., #methods — do we count all methods: public ones, inherited ones too?)
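The precision concern above — what exactly does "#methods" count? — can be illustrated with a small Python sketch. The class names and helper functions below are hypothetical, chosen only to show how the NOM metric changes depending on whether inherited methods are included:

```python
import inspect

class Base:
    def a(self): pass
    def b(self): pass

class Sub(Base):
    def b(self): pass   # overridden
    def c(self): pass   # added

def nom_local(cls):
    """NOM counting only the methods defined in the class itself."""
    return sum(1 for name, v in vars(cls).items() if inspect.isfunction(v))

def nom_with_inherited(cls):
    """NOM counting inherited methods as well."""
    return sum(1 for name, v in inspect.getmembers(cls, inspect.isfunction)
               if not name.startswith('__'))

print(nom_local(Sub))           # 2 (b, c)
print(nom_with_inherited(Sub))  # 3 (a, b, c)
```

The same class thus yields two different "number of methods" values; a metric definition must pin down which variant is meant before measurements from different tools can be compared.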
• Code-based
  – We want to collect the metrics from the source code itself, at any time
• Simple
  – Complex metrics are hard to interpret

7.11 Assessing Maintainability
• Size of the system and of its entities
  – class size, method size, inheritance depth
  – the size of an entity affects its maintainability
• Cohesion of the entities
  – internal to a class
  – a change should remain local to one class
• Coupling between entities
  – within inheritance: coupling between class and subclass
  – outside inheritance
  – strong coupling propagates changes to the coupled classes

7.12 Sample Size and Inheritance Metrics
Class size metrics
• # methods (NOM)
• # instance attributes (NIA, NCA)
• sum of method sizes (WMC)
Inheritance metrics
• hierarchy nesting level (HNL)
• # immediate children (NOC)
• # inherited methods, unmodified (NMI)
• # overridden methods (NMO)
Method size metrics
• # invocations (NOI)
• # statements (NOS)
• # lines of code (LOC)
(Figure: a Class inherits from a Class; a Method belongs to a Class; a Method invokes Methods and accesses Attributes)

7.13 Sample Class Size Metrics
• (NIV) [Lore94] Number of Instance Variables
• (NCV) [Lore94] Number of Class Variables (static)
• (NOM) [Lore94] Number of Methods (public, private, protected)
• (LOC) Lines of Code
• (NSC) Number of Semicolons [Li93] — the number of statements
• (WMC) [Chid94] Weighted Method Count
  – WMC = Σ ci
  – where ci is the complexity of method i (number of exits, or the McCabe cyclomatic complexity)

7.14 Hierarchy Layout
• (HNL) [Chid94] Hierarchy Nesting Level; (DIT) [Li93] Depth of Inheritance Tree
  – HNL, DIT = maximum hierarchy level
• (NOC) [Chid94] Number of Children
• (WNOC) Total number of children (all descendants)
• (NMO, NMA, NMI, NME) [Lore94] Number of Methods Overridden, Added, Inherited, Extended (via a super call)
• (SIX) [Lore94]
  – SIX(C) = NMO × HNL / NOM
  – weighted percentage of overridden methods

7.15 Method Size
• (MSG) Number of Message Sends
• (LOC) Lines of Code
• (MCX) Method Complexity
  – total complexity / total number of methods
  – API call = 5, assignment =
0.5, arithmetic op = 2, message with parameters = 3, ...

7.16 Sample Metrics: Class Cohesion
• (LCOM) Lack of Cohesion in Methods
  – [Chidamber 94] for the definition
  – [Hitz 95] for a critique
  – Let Ii be the set of instance variables used by method Mi
  – let P = { (Ii, Ij) | Ii ∩ Ij = ∅ } and Q = { (Ii, Ij) | Ii ∩ Ij ≠ ∅ }
  – if all the sets Ii are empty, P is empty
  – LCOM = |P| − |Q| if |P| > |Q|, 0 otherwise
• Tight Class Cohesion (TCC) and Loose Class Cohesion (LCC)
  – [Bieman 95] for the definition
  – measure method cohesion across invocations

7.17 Sample Metrics: Class Coupling (i)
• Coupling Between Objects (CBO)
  – [Chidamber 94a] for the definition, [Hitz 95a] for a discussion
  – number of other classes to which a class is coupled
• Data Abstraction Coupling (DAC)
  – [Li 93] for the definition
  – number of ADTs defined in a class
• Change Dependency Between Classes (CDBC)
  – [Hitz 96a] for the definition
  – impact of changes in a server class (SC) on a client class (CC)

7.18 Sample Metrics: Class Coupling (ii)
• Locality of Data (LD)
  – [Hitz 96] for the definition
  – LD = Σ |Li| / Σ |Ti|
  – Li = the non-public instance variables, the inherited protected variables of the superclass, and the static variables of the class used in Mi
  – Ti = all variables used in Mi, except non-static local variables
  – Mi = the methods of the class, excluding accessors

7.19 The Trouble with Coupling and Cohesion
• Coupling and cohesion are intuitive notions
  – cf. "computability"
  – e.g., is a library of mathematical functions "cohesive"?
  – e.g., is a package of classes that subclass framework classes cohesive? Is it strongly coupled to the framework package?

7.20 Conclusion: Metrics for Quality Assessment
• Can internal product metrics reveal which components have good/poor quality?
• Yes, but...
  – Not reliable
    • false positives: "bad" measurements, yet good quality
    • false negatives: "good" measurements, yet poor quality
  – Heavyweight approach
    • requires the team to develop (customize?)
a quantitative quality model
    • requires the definition of thresholds (trial and error)
  – Difficult to interpret
    • requires complex combinations of simple metrics
• However...
  – cheap once you have the quality model and the thresholds
  – good focus (± 20% of the components are selected for further inspection)
• Note: focus on the most complex components first!

7.21 Topics
• Metrics
• Object-Oriented Metrics in Practice
  – Detection strategies, filters and composition
  – Sample detection strategies: God Class, ...
• Code Duplication

7.22 Detection Strategy
• A detection strategy is a metrics-based predicate used to identify candidate software artifacts that conform to (or violate) a particular design rule

7.23 Filters and Composition
• A data filter is a predicate used to focus attention on an interesting subset of a larger data set
  – Statistical filters
    • e.g., the top and bottom 25% are considered outliers
  – Other relative thresholds
    • e.g., other percentages to identify outliers (e.g., the top 10%)
  – Absolute thresholds
    • i.e., fixed criteria, independent of the data set
• A useful detection strategy can often be expressed as a composition of data filters

7.24 God Class
• A God Class centralizes intelligence in the system
  – impairs understandability
  – increases system fragility

7.25 Feature Envy
• Methods that are more interested in the data of other classes than in their own [Fowler et al.
99]

7.26 Data Class
• A Data Class provides data to other classes but little or no functionality of its own

7.27 Data Class (2)
(Figure: example of a Data Class)

7.28 Shotgun Surgery
• A change in one operation implies many (small) changes to a lot of different operations and classes

7.29 Topics
• Metrics
• Object-Oriented Metrics in Practice
• Code Duplication
  – Detection techniques
  – Visualizing duplicated code

7.30 Copied Code
An example from the Mozilla distribution (Milestone 9), taken from /dom/src/base/nsLocation.cpp:

  NS_IMETHODIMP
  LocationImpl::GetPathname(nsString& aPathname)
  {
    nsAutoString href;
    nsIURI *url;
    nsresult result = NS_OK;

    result = GetHref(href);
    if (NS_OK == result) {
  #ifndef NECKO
      result = NS_NewURL(&url, href);
  #else
      result = NS_NewURI(&url, href);
  #endif // NECKO
      if (NS_OK == result) {
  #ifdef NECKO
        char* file;
        result = url->GetPath(&file);
  #else
        const char* file;
        result = url->GetFile(&file);
  #endif
        if (result == NS_OK) {
          aPathname.SetString(file);
  #ifdef NECKO
          nsCRT::free(file);
  #endif
        }
        NS_IF_RELEASE(url);
      }
    }
    return result;
  }

  NS_IMETHODIMP
  LocationImpl::SetPathname(const nsString& aPathname)
  {
    nsAutoString href;
    nsIURI *url;
    nsresult result = NS_OK;

    result = GetHref(href);
    if (NS_OK == result) {
  #ifndef NECKO
      result = NS_NewURL(&url, href);
  #else
      result = NS_NewURI(&url, href);
  #endif // NECKO
      if (NS_OK == result) {
        char *buf = aPathname.ToNewCString();
  #ifdef NECKO
        url->SetPath(buf);
  #else
        url->SetFile(buf);
  #endif
        SetURL(url);
        delete[] buf;
        NS_RELEASE(url);
      }
    }
    return result;
  }

  NS_IMETHODIMP
  LocationImpl::GetPort(nsString& aPort)
  {
    nsAutoString href;
    nsIURI *url;
    nsresult result = NS_OK;

    result = GetHref(href);
    if (NS_OK == result) {
  #ifndef NECKO
      result = NS_NewURL(&url, href);
  #else
      result = NS_NewURI(&url, href);
  #endif // NECKO
      if (NS_OK == result) {
        aPort.SetLength(0);
  #ifdef NECKO
        PRInt32 port;
        (void)url->GetPort(&port);
  #else
        PRUint32 port;
        (void)url->GetHostPort(&port);
  #endif
        if (-1 != port) {
          aPort.Append(port, 10);
        }
        NS_RELEASE(url);
      }
    }
    return result;
  }

7.31 How Much Code Is Duplicated?
Usually estimated at 8 to 12% of the code.

  Example          LOC       Duplication without comments   with comments
  gcc              460'000   8.7%                           5.6%
  Database Server  245'000   36.4%                          23.3%
  Payroll          40'000    59.3%                          25.4%
  Message Board    6'500     29.4%                          17.4%

7.32 What Is Code Duplication?
• Code duplication = a fragment of program code that also occurs elsewhere in the same system
  – in a different file
  – in the same file, but in a different method
  – in the same method
• The fragments must share enough logic or structure that they could be abstracted away:

  ... computeIt(a,b,c,d); ...
  ... computeIt(w,x,y,z); ...

  is not considered duplicated code, whereas

  ... getIt(hash(tail(z))); ...
  ... getIt(hash(tail(a))); ...

  could be abstracted into a new function

7.33 Problems of Duplication
• Generally has negative effects
  – code bloat
  – negative effects when maintaining or correcting the system
  – copying also multiplies defects in the code
  – software aging, "hardening of the arteries"
  – "software entropy" increases: even small design changes become very difficult to carry out

7.34 Detecting Duplicated Code
Nontrivial problem:
• no a priori knowledge about which code has been copied
• how to find all clone pairs among all possible pairs of segments?
Possible levels of comparison: lexical equivalence, syntactical equivalence, semantic equivalence.

7.35 General Schema of the Detection Process
Source Code → (transformation) → Transformed Code → (comparison) → Duplication Data

  Author           Level        Transformed Code        Comparison Technique
  Johnson 94       Lexical      Substrings              String matching
  Ducasse 99       Lexical      Normalized strings      String matching
  Baker 95         Syntactical  Parameterized strings   String matching
  Mayrand 96       Syntactical  Metric tuples           Discrete comparison
  Kontogiannis 97  Syntactical  Metric tuples           Euclidean distance
  Baxter 98        Syntactical  AST                     Tree matching

7.36 Recall and Precision
(Figure: recall and precision of the different approaches)

7.37 Simple Detection Approach (i)
• Assumption: code segments are just copied, then changed in a few places
• Noise elimination transformation
  – remove white space and comments
  – remove lines that contain only uninteresting code elements (e.g., just 'else' or '}')

  ... //assign same fastid as container
  fastid = NULL;
  const char* fidptr = get_fastid();
  if (fidptr != NULL) {
    int l = strlen(fidptr);
    fastid = new char[l + 1];
  ...

  becomes

  fastid=NULL;
  constchar*fidptr=get_fastid();
  if(fidptr!=NULL)
  intl=strlen(fidptr)
  fastid=newchar[l+1]

7.38 Simple Detection Approach (ii)
• Code comparison step
  – line-based comparison (assumption: the layout did not change during copying)
  – compare each line with each other line
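The comparison step just described can be sketched in Python. This is a naive O(n²) version with simplified normalization rules; the function names are illustrative, not taken from any of the tools above:

```python
def normalize(lines):
    """Noise elimination: strip // comments and white space, drop uninteresting lines."""
    result = []
    for no, line in enumerate(lines, 1):
        code = line.split('//')[0]           # crude removal of // comments
        code = ''.join(code.split())         # remove all white space
        if code in ('', '{', '}', 'else'):   # uninteresting code elements
            continue
        result.append((no, code))
    return result

def find_duplicate_lines(lines):
    """Compare each normalized line with each other normalized line."""
    norm = normalize(lines)
    matches = []
    for i in range(len(norm)):
        for j in range(i + 1, len(norm)):
            if norm[i][1] == norm[j][1]:
                matches.append((norm[i][0], norm[j][0]))  # clone pair of line numbers
    return matches

src = [
    "nsAutoString href; // local",
    "nsIURI *url;",
    "}",
    "nsAutoString href;",
    "nsIURI * url;",
]
print(find_duplicate_lines(src))  # [(1, 4), (2, 5)]
```

Run on the Mozilla excerpt above, such a comparison would report the lines shared by GetPathname, SetPathname and GetPort as clone pairs.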
  – reduce the search space by hashing:
    • preprocessing: compute a hash value for each line
    • actual comparison: compare only the lines in the same hash bucket
• Evaluation of the approach
  – advantages: simple, language independent
  – disadvantages: results can be difficult to interpret

7.39 A Perl Script for C++ (i)

  $equivalenceClassMinimalSize = 1;
  $slidingWindowSize = 5;
  $removeKeywords = 0;
  @keywords = qw( if then else );
  $keywordsRegExp = join '|', @keywords;
  @unwantedLines = qw( else return return; { } ; );
  push @unwantedLines, @keywords;

  while (<>) {
      chomp;
      $totalLines++;
      # remove comments of type /* */
      my $codeOnly = '';
      while (($inComment && m|\*/|) || (!$inComment && m|/\*|)) {
          unless ($inComment) { $codeOnly .= $` }
          $inComment = !$inComment;
          $_ = $';
      }
      $codeOnly .= $_ unless $inComment;
      $_ = $codeOnly;

      s|//.*$||;                                 # remove comments of type //
      s/\s+//g;                                  # remove white space
      s/$keywordsRegExp//og if $removeKeywords;  # remove keywords

7.40 A Perl Script for C++ (ii)

      $codeLines++;
      push @currentLines, $_;
      push @currentLineNos, $.;
      if ($slidingWindowSize < @currentLines) {
          shift @currentLines;
          shift @currentLineNos;
      }
      #print STDERR "Line $totalLines >$_<\n";
      my $lineToBeCompared = join '', @currentLines;
      my $lineNumbersCompared = "<$ARGV>";   # append the name of the file
      $lineNumbersCompared .= join '/', @currentLineNos;
      #print STDERR "$lineNumbersCompared\n";
      if ($bucketRef = $eqLines{$lineToBeCompared}) {
          push @$bucketRef, $lineNumbersCompared;
      } else {
          $eqLines{$lineToBeCompared} = [ $lineNumbersCompared ];
      }
      if (eof) { close ARGV }   # reset line-number count for the next file
  }

• Handles multiple files
• Removes comments and white space
• Controls noise (if, {, ...)
• Granularity (number of lines)
• Keywords can optionally be removed

7.41 Output Sample
Lines:
create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
create_property(pd,pnElttype,stReference,true,*iEltType);
create_property(pd,pnMinelt,stInteger,true,*iMinelt);
create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
create_property(pd,pnOwnership,stBool,true,*iOwnership);
Locations:
</face/typesystem/SCTypesystem.C>6178/6179/6180/6181/6182
</face/typesystem/SCTypesystem.C>6198/6199/6200/6201/6202

Lines:
create_property(pd,pnSupertype,stReference,true,*iSupertype);
create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
create_property(pd,pnElttype,stReference,true,*iEltType);
create_property(pd,pMinelt,stInteger,true,*iMinelt);
create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
Locations:
</face/typesystem/SCTypesystem.C>6177/6178
</face/typesystem/SCTypesystem.C>6229/6230

Lines = the duplicated lines; Locations = file names and line numbers

7.42 Enhanced Simple Detection Approach
• Code comparison step
  – as before, but now:
    • collect consecutive matching lines into match sequences
    • allow holes in a match sequence
• Evaluation of the approach
  – advantages
    • identifies more real duplication; language independent
  – disadvantages
    • less simple
    • misses copies with (small) changes on every line

7.43 Abstraction
• Abstracting selected syntactic elements can increase recall, at the possible cost of precision

7.44 Metrics-Based Detection Strategy
• Duplication is significant if:
  – it is the largest possible duplication chain uniting all exact clones that are close enough to each other
  – the duplication is large enough
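A minimal sketch of such a significance filter in Python — the chunk representation and the threshold values are illustrative, not taken from a specific tool:

```python
def significant_duplications(chunks, min_chunk_size=3, max_gap=2, min_total=7):
    """Unite exact clone chunks that are close enough into chains,
    then keep only the chains that are large enough.

    chunks: list of (start_line, length) pairs of exact clones within
    one pair of compared files (an illustrative representation)."""
    # drop chunks that are too small to count as exact clones
    chunks = [c for c in chunks if c[1] >= min_chunk_size]
    chains, current = [], []
    for start, length in sorted(chunks):
        # start a new chain when the gap to the previous chunk is too large
        if current and start - (current[-1][0] + current[-1][1]) > max_gap:
            chains.append(current)
            current = []
        current.append((start, length))
    if current:
        chains.append(current)
    # a chain is significant only if its united length is large enough
    return [ch for ch in chains if sum(l for _, l in ch) >= min_total]

# two 3- and 4-line chunks one line apart form one significant chain;
# the isolated chunk at line 30 is too small on its own
print(significant_duplications([(10, 3), (14, 4), (30, 3)]))
# [[(10, 3), (14, 4)]]
```

The three parameters correspond directly to the two criteria above: `max_gap` decides what "close enough" means, while `min_chunk_size` and `min_total` decide what "large enough" means.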
7.45 Automated Detection in Practice
• Wettel [MSc thesis, 2004] uses three thresholds:
  – minimum clone length: the minimum number of lines present in a clone (e.g., 7)
  – maximum line bias: the maximum number of lines between two exact chunks (e.g., 2)
  – minimum chunk size: the minimum number of lines of an exact chunk (e.g., 3)
• Mihai Balint, Tudor Gîrba and Radu Marinescu, "How Developers Copy," ICPC 2006

7.46 Visualization of Duplicated Code
• Visualization provides insight into the duplication situation
  – a simple version can be implemented in three days
  – scalability is an issue
• Dotplots — a technique from DNA analysis
  – the code is put on the vertical as well as the horizontal axis
  – a match between two elements is a dot in the matrix
(Figure: dotplot patterns for exact copies, copies with variations, inserts/deletes, and repetitive code elements)

7.47 Visualization of Copied Code Sequences
• Detected problem
  – File A contains two copies of a piece of code
  – File B contains another copy of this code
• Possible solution: Extract Method
• All examples were made with Duploc, from an industrial case study (a 1 MLOC C++ system)

7.48 Visualization of Repetitive Structures
• Detected problem
  – 4 object factory clones: a switch statement over a type variable is used to call individual construction code
• Possible solution: Strategy Method

7.49 Visualization of Cloned Classes
• Detected problem: Class A is an edited copy of Class B.
  – editing and insertion distinguish Class A from Class B
• Possible solution: subclassing

7.50 Visualization of Clone Families
(Figure: overview and detail of a clone family — 20 classes implementing lists for different data types)

7.51 Conclusions
• Duplicated code is a real problem
  – it makes a system progressively harder to change
• Detecting duplicated code is a hard problem
  – some simple techniques can help
  – tool support is also needed
• Visualization of the code is very useful
• Dealing with duplicated code is still a research topic