ORGANIZATIONAL MEASUREMENT OF DEFECT DENSITY

NURDATILLAH BINTI HASIM

A project report submitted in partial fulfillment of the requirements for the award of the degree of MSc. (Computer Science - Real Time Software Engineering)

Centre for Advanced Software Engineering
Faculty of Computer Science and Information Systems
Universiti Teknologi Malaysia

APRIL 2009

Special dedication to my beloved mother Che Saerah Rahmani and father Hasim Din.

ACKNOWLEDGEMENT

First and foremost, thank you to Allah the Almighty, for I finally managed to complete my industrial attachment and this project report. Deep appreciation goes to my supervisor, Mr. Mohd Ridzuan Ahmad, for all the guidance, support, constructive comments, valuable knowledge and motivation in writing this project report. A million thanks are dedicated to the Practices and Compliance Management Department (PCM) in HeiTech Padu Berhad. Thank you very much for the opportunity given and the priceless knowledge and experience, especially to my Industrial Mentor Mdm Lenny Ruby Atai, to Mdm Yanti Salwani Che Saad as the head of the PCM department, and to the other PCM team members, Miss Azreen Diana Abdullah and Miss Siti Fatimah Abu Bakar. A thousand thanks to my lovely friends who gave many useful opinions and thoughts and shared every good and bad moment during this five-month attachment period. Thank you to my family for the continuous encouragement, full support and understanding that allowed me to complete my master's study in UTM. May Allah bless all of you.

ABSTRACT

The Measurement and Analysis process is one of the process areas covered at Capability Maturity Model Integration (CMMI) Level 2. However, a proper plan and process are required in order to produce the measurements. Organizational Measurement of Defect Density is a measurement conducted in HeiTech Padu Berhad (HeiTech).
This project report discusses the issue of defect density measurement, focusing on defects found during the testing phase. The issue raised with defect density measurement is that its results tend to be imprecise because of the various ways used to count lines of code. To this end, this project report presents a method for the measurement process used to calculate defect density, namely the HeiTech Measurement and Analysis Process. This method consists of three phases: Planning, Implementing and Improving. For the implementation, three projects were selected: BHEUU-CSMBBG, BHEUU-SKHD and iProcurement-Vendor Management. The lines of code for all three projects were counted using the selected code counter, GeroneSoft Code Counter Pro V1.32. Moreover, the defect classification developed proved useful and effective in identifying the defects discovered in a program. As for the defect density measurement output, the results were gathered and presented in the Measurement and Analysis Report.

ABSTRAK

Proses pengiraan dan analisis adalah salah satu lapangan proses dalam Capability Maturity Model Integration (CMMI) peringkat 2. Namun begitu, perancangan dan proses yang teratur amat penting bagi memastikan keputusan hasil pengiraan adalah seperti yang dijangkakan. Pengiraan Kepadatan Kecacatan Berorganisasi adalah sejenis pengiraan yang dijalankan di HeiTech Padu Berhad (HeiTech). Laporan projek ini akan membincangkan perincian tentang pengiraan kepadatan kecacatan dan penumpuan dilakukan lebih kepada kecacatan yang dijumpai semasa fasa pengujian. Isu yang berbangkit dalam pengiraan kepadatan kecacatan adalah keputusan yang diperolehi hasil daripada pengiraan tersebut adalah tidak tepat kerana wujudnya kepelbagaian dalam mengira baris kod. Selain itu, laporan projek ini juga menunjukkan kaedah bagi proses pengiraan kepadatan kecacatan iaitu Proses Pengiraan dan Analisis HeiTech.
Kaedah ini terbahagi kepada tiga fasa iaitu fasa perancangan, fasa perlaksanaan dan fasa pengukuhan. Tiga jenis projek telah dipilih sebagai pembelajaran kes iaitu BHEUU-CSMBBG, BHEUU-SKHD dan iProcurement-Vendor Management. Baris kod bagi ketiga-tiga projek tersebut telah dikira menggunakan satu perisian pengira baris kod iaitu GeroneSoft Code Counter Pro V1.32. Selain itu, klasifikasi kecacatan yang telah diwujudkan juga sangat membantu dalam proses mengenal pasti kecacatan pada satu-satu program dengan lebih efektif. Akhir sekali, hasil pengiraan kepadatan kecacatan dikumpul dan dimasukkan ke dalam Laporan Pengiraan dan Analisis.

TABLE OF CONTENTS

CHAPTER   TITLE   PAGE

DECLARATION ii
DEDICATION iii
ACKNOWLEDGEMENT iv
ABSTRACT v
ABSTRAK vi
TABLE OF CONTENTS vii
LIST OF TABLES xii
LIST OF FIGURES xiv
LIST OF ABBREVIATIONS xv
LIST OF APPENDICES xvi

1 INTRODUCTION 17
1.1 Organizational Background 17
1.2 Core Business 17
1.3 Practices & Compliance Management Department Background 18
1.3.1 Practices & Compliance Management Structure 18
1.3.2 Practices & Compliance Management Scope of Work 20
1.4 Project Background 21
1.4.1 Organizational Measurement Analysis of Defect Density 21

2 SCOPE & OBJECTIVES 23
2.1 Vision Statement 23
2.2 Project Objectives 23
2.3 Project Scopes 24
2.4 Project Deliverables 24
2.5 Project Plan 25

3 LITERATURE STUDY 26
3.1 Defect Density Calculation 26
3.2 Definitions and Classifications 26
3.2.1 Defect Definitions 26
3.2.2 Defect Classifications 27
3.2.2.1 HeiTech Classifications 27
3.2.2.2 Amendments on HeiTech Classifications 28
3.2.2.3 Defect Log 29
3.3 System Size 30
3.3.1 Type of Calculation for System Size 30
3.3.1.1 Counting Source Line of Code 30
3.3.1.2 Counting Token 31
3.3.1.3 Counting Modules 34
3.3.1.4 Counting Functions 34
3.4 HeiTech Defect Density Calculation 34
3.4.1 HeiTech Defect Density 35
3.4.1.1 Advantages Using Metrics Line of Code 37
3.4.1.2 Disadvantages Using Metrics Line of Code 38
3.4.2 Proposed Tools 39
3.4.2.1 GeroneSoft Code Counter Pro V1.32 40
3.4.2.2 LOC Counter 41
3.4.2.3 LocMetrics 44
3.4.2.4 Krakatau Essential Project Management 44
3.4.2.5 Code Counter 3 45
3.4.3 Tools Comparison 46
3.5 HeiTech Measurement and Analysis Plan 46
3.5.1 Functional Correctness 47
3.5.1.1 Code and Testing Defects 47
3.5.1.2 Review Defects 48
3.5.2 General Analysis Concept 48
3.5.2.1 Quantitative Analysis 52
3.5.2.2 Structure Analysis Model 59
3.5.2.3 Performance Analysis Procedure 62
3.6 Analysis of Other Related Measurement Plan 63
3.6.1 Using In-Process Metrics to Predict Defect Density in Haskell Programs 63
3.6.2 STREW-H 64
3.6.3 Initial Set of Metrics for STREW-H Version 0.1 64
3.6.4 Feasibility Study on STREW-H 65
3.6.5 Upcoming Work 67
3.7 Comparison between Any Related Measurement Plan 67
3.7.1 Advantages of the Measurement Plan 67
3.7.2 Disadvantages of the Measurement Plan 67
3.7.3 Rework on HeiTech Measurement Plan 68

4 PROJECT METHODOLOGY 69
4.1 Project Management Information System (PROMISE) 69
4.1.1 Quality Management 69
4.2 Measurement and Analysis Process 71
4.3 Planning Phase 71
4.3.1 Identify Scope 72
4.3.2 Define Procedure 73
4.3.2.1 Define the Measures 74
4.3.2.2 Define the Counting Procedure 74
4.3.2.3 Assemble Forms or Procedure to Record the Measures 75
4.3.2.4 Select and Design Database or Spreadsheet for Storage 75
4.3.2.5 Define Analysis Method 75
4.3.2.6 Define Potential Relationship among the Measures and Construct Measurement Reporting Format 76
4.3.2.7 Define Mechanism for Feedback 76
4.4 Implementing Phase 76
4.4.1 Data Collection 77
4.4.2 Data Analyzing 78
4.4.3 Result Communicating 80
4.5 Improving Phase 80
4.5.1 Process Evolving 80
4.6 Measurement Tools 81
4.6.1 Classes of Tools and its Support Functions 81
4.6.2 Guideline for Measurement Support Tools 82

5 PROJECT DISCUSSIONS 83
5.1 Defect Log Documentation 83
5.2 Establishing Measurement and Analysis 84
5.2.1 Planning Phase 84
5.2.2 Implementing Phase 85
5.2.2.1 Defect Density Data Collection 85
5.2.2.2 Defect Density Data Analyzing 86
5.2.2.3 Defect Density Result Communicating 87
5.2.3 Project Constraints 88

6 CONCLUSION 89
6.1 Towards Capability Maturity Model Integration Level 5 in HeiTech 89
6.2 Training Experience 89
6.3 Recommendations 90
6.4 Conclusion 91

REFERENCES 92
Appendices A-D 95-98

LIST OF TABLES

TABLE NO.   TITLE   PAGE

1.1 PCM Units 19
2.1 Project Deliverables 24
3.1 Details on the GeroneSoft Code Counter Pro V1.32 40
3.2 Default Counting Rules 42
3.3 Objects Detail Description 42
3.4 Line of Code Counter Tools Comparison 46
3.5 Code and Testing Defects 48
3.6 Review Defects 49
3.7 Parts of Indicator 51
3.8 Requirement Data Type Descriptions 56
3.9 Requirement Peer Review Defects 56
3.10 Calculation Data Type Descriptions 57
3.11 Calculation for Requirements Peer Review Defects 57
3.12 Data from Seven Versions of GHC 66
4.1 Identify Scope Activity 72
4.2 Type of Tools and its Support Functions 81
5.1 Defect Density Objectives 84
5.2 Defect Density Data Collection 86

LIST OF FIGURES

FIGURE NO.   TITLE   PAGE
1.1 Organizational Structure of Practices & Compliance Management Department 19
1.2 Defect Density Calculation 22
3.1 Defect Density Calculation 26
3.2 Fault, Defect and Error Relationship 27
3.3 Equation 1 (E1) 32
3.4 Equation 2 (E2) 32
3.5 Equation 3 (E3) 32
3.6 Equation 4 (E4) 33
3.7 Screenshot of the Code Counted Using GeroneSoft Code Counter Pro V1.32 40
3.8 LOC Counter Report Result 43
3.9 LocMetrics Code Counting Results 44
3.10 CodeCounter Counting Result 45
3.11 Example of an Indicator 52
3.12 Software Quality Management and Quantitative Process Management Flow 53
3.13 Defect Prevention Flow 54
3.14 Control Chart 55
3.15 Control Chart for Requirements Peer Review Defects 58
3.16 Control Chart for Requirements Peer Review Defects without HMI 58
3.17 Issue Analysis Model Showing the Relationship among Common Issue Areas 59
3.18 Model of Common Software Issue Relationships 60
3.19 Analysis Steps 62
3.20 Defect Density (Defects/KLOC) 65
3.21 Multiple Regression Analysis 66
4.1 Quality Control Process Summary 70
4.2 HeiTech Measurement and Analysis Process 72
4.3 Individual Chart 79
4.4 Moving Range Chart 79
4.5 Cpk Analysis Distribution Curve 79
5.1 Summary of Activities Performed During Implementing Phase 85
5.2 Defect Density during UAT 87

LIST OF ABBREVIATIONS

BHEUU - Bahagian Hal Ehwal Undang-undang
CAR - Causal Analysis & Resolution
CMMI - Capability Maturity Model Integration
CMVC - Configuration Management Version Control
COBOL - Common Business-Oriented Language
COCOMO - Constructive Cost Model
EPG - Engineering Process Group
FP - Function Point
GHC - Glasgow Haskell Compiler
HMI - Human Machine Interface
IEEE - Institute of Electrical and Electronics Engineers
KLOC - Kilo Lines of Code
LOC - Lines of Code
PCM - Practices and Compliance Management
PSC - Project Steering Committee
SEI - Software Engineering Institute
SLOC - Source Lines of Code
SPC - Statistical Process Control
SRS - System Requirement Specification
STREW-H -
Software Testing and Reliability Early Warning for Haskell
STREW-J - Software Testing and Reliability Early Warning for Java
UAT - User Acceptance Test

LIST OF APPENDICES

APPENDIX   TITLE   PAGE

A Project Planning 95
B Defect Classifications 96
C Progress Report 97
D User Manual - GeroneSoft Code Counter Pro V1.32 98

CHAPTER 1

INTRODUCTION

1.1 Organizational Background

A public listed company on the main board of Bursa Malaysia, HeiTech Padu Berhad is Malaysia's leading ICT solutions and services provider. With more than 10 years at the forefront of the ICT industry, HeiTech's local testimonies include the delivery of mission-critical projects for Malaysian Government agencies such as the National Registration Department, the Immigration Department and the Road Transport Department. In addition, HeiTech is also one of the key players in driving the public sector, leveraging on its systems integration strength and capabilities. Consequently, HeiTech has further transformed itself, becoming one of the key managed services providers in Malaysia, whose products include Padu*Net (Managed Network Services) and Padu*Center (Managed Data Center Services). HeiTech's portfolio of customers has expanded to a private sector customer base, mainly from the financial services industry.

1.2 Core Business

After going through strategic changes, HeiTech's business pursuits are focused on the financial, health, defense and education sectors as well as the manufacturing, oil and gas and utilities industries. The following areas are the products and services offered by HeiTech Padu Berhad:

i. Managed Data Center Services
ii. Managed Network & Communication Services
iii. System Integration Services
iv. Solution & Consultancy Offerings

The following business entities have been established to develop, promote and deliver the respective core businesses of the company under the flagship of HeiTech Padu Berhad:

i. HeiTech e*Business Solution Sdn Bhd
ii.
HeiTech i-Solutions Sdn Bhd
iii. HeiTech Managed Services Sdn Bhd

1.3 Practices & Compliance Management Department Background

1.3.1 Practices & Compliance Management Structure

As shown in Figure 1.1, the Practices & Compliance Management Department (PCM) consists of three main units: Practices Development & Consultation, Compliance Management, and Internal SOP & Knowledge Management. Table 1.1 shows the key functions of each unit in PCM.

Figure 1.1: Organizational Structure of Practices & Compliance Management Department

Table 1.1: PCM Units

Unit: Practices Development & Consultation
Key Functions:
- Plan and develop best practices for the core business.
- Promote the HeiWay program for a work culture focused on quality and professionalism.

Unit: Compliance Management
Key Functions:
- Work on CMMI Level 5 certification.
- Evaluate and promote quality tools.
- Cover compliance for application development and project management.
- Assess the degree of compliance to the agreed standard.

Unit: Internal SOP & Knowledge Management
Key Functions:
- Administration of scope.
- Review and enhance SOPs to be effective.
- Reorganization of knowledge management.
- Establishment of the SIS portal.

1.3.2 Practices & Compliance Management Scope of Work

Basically, the scope of the implementation of best practices and compliance covers only HeiTech Business Solution (HBS). There are nine major scopes of work:

i. Formulate and continuously improve in-house best practices based on industry best practices for the core business disciplines of HBS.
ii. Facilitate or provide consultation on HeiTech Defined Best Practices and related industry best practices in the Project Management and Software Engineering (Application Development) disciplines.
iii.
Provide training on HeiTech Defined Best Practices, related industry best practices in the Project Management and Software Engineering (Application Development) disciplines, and Relationship Oriented Service Excellence (ROSE) practices as per the Customer Services Initiative Program.
iv. Provide quality control through compliance assessment and monitoring of HeiTech Defined Best Practices (process) implementation.
v. Measure and analyze organizational process performance of Project Management and Application Development related processes.
vi. Strategize and manage the HBS quality roadmap, quality initiatives and quality certification (e.g. ISO and CMMI).
vii. Facilitate the establishment and continuous improvement of HBS inter and intra standard operating procedures.
viii. Monitor Knowledge Management and play a role as the library of critical knowledge needed by users.
ix. IT Security Management Practices (focused on application-related security).

1.4 Project Background

1.4.1 Organizational Measurement Analysis of Defect Density

Measurement infuses everyday life and is an important part of every scientific and engineering discipline. Software measurement may serve several purposes, depending on the level of knowledge about a process or product. The very nature of software engineering makes measurement a necessity, because more rigorous methods for product planning, monitoring and control are needed. Otherwise the amount of risk in software projects may become extreme, and the software product may easily get out of industrial control [3, 10, 16]. The type of software measurement discussed in this project report is defect density. In Organizational Measurement Analysis of Defect Density, the focus is on measuring a solid value for defect density. In order to perform this measurement, a metric has been proposed which is known as an indirect measurement. Figure 1.2 shows the calculation formula for this measurement.
Defect Density = Defects Found / System Size

Figure 1.2: Defect Density Calculation

The total size of the product is a normalizer that allows comparisons between different software entities, for example modules, releases and products. Size is typically counted either in Lines of Code (LOC) or Function Points (FP). Essentially, the total size of the product will be measured based on LOC. The LOC metric is used because it is one of the most widely used techniques in cost estimation. It is the basic metric underlying several cost estimation models by Boehm. Due to its common use, it allows a simple comparison to data from many other projects [16]. A capable practice will be applied in order to gain a precise result. The measurement and analysis process and procedure will follow the HeiTech Measurement and Analysis process that has been defined in the HeiTech Measurement and Analysis Plan V-4.0. Three HeiTech projects have been selected as case studies for implementation purposes. The projects involved are BHEUU-CSMBBG, BHEUU-SKHD and iProcurement-Vendor Management. As for the size of the product, several LOC counter tools will be introduced and the finest tool will be selected to be utilized.

CHAPTER 2

SCOPE & OBJECTIVES

2.1 Vision Statement

To introduce organizational measurement of defect density and implement it on three selected systems at HeiTech, adopting the LOC metric as the system size, updating the defect classification, and performing the measurement and analysis process based on the HeiTech standard; and to apply the results while following the quality implementation of best approaches in software engineering.

2.2 Project Objectives

The objectives of this project are:

i. To conduct a study on defect density calculation.
ii. To gain a precise way to determine the size of the product.
iii. To study the types of defects in order to classify defects found during the documentation review and coding process.
iv. To update the HeiTech Measurement and Analysis Plan.
v.
To apply the selected measurement using raw data and information from the selected projects.
vi. To produce the HeiTech Measurement and Analysis Report based on the results from the implementation phase of measuring the defect density.

2.3 Project Scopes

The scopes of this project are:

i. The project is bounded to the established practices in EPG HeiTech.
ii. Identify alternative solutions apart from implementing the defect density calculation.
iii. For the defect density calculation, this project covers only three selected projects: BHEUU-CSMBBG, BHEUU-SKHD and iProcurement-Vendor Management.
iv. Only defects discovered during UAT are considered for the defect density calculation.
v. Two types of documents will be updated and produced, namely the HeiTech Measurement and Analysis Plan and the HeiTech Measurement and Analysis Report.

2.4 Project Deliverables

The main project deliverables are standards and documentation related to the analysis and implementation phases, as shown in Table 2.1:

Table 2.1: Project Deliverables

No. | Deliverables | Recipient
1. | HeiTech Measurement and Analysis Plan | Mdm. Lenny Ruby Atai
2. | HeiTech Measurement and Analysis Report (Defect Density) | Mdm. Lenny Ruby Atai
3. | Project Report/Thesis | Mr. Mohd Ridzuan Ahmad, Mdm. Lenny Ruby Atai

2.5 Project Plan

Refer to Appendix A for the project plan.

CHAPTER 3

LITERATURE STUDY

3.1 Defect Density Calculation

Basically, defect density is a measure of the total known defects, or defects found, divided by the size of the software being measured [7]. The defect density calculation is shown in Figure 3.1 below. The sub-paragraphs below explain the parameters used in the defect density formula.

Defect Density = Defects Found / System Size

Figure 3.1: Defect Density Calculation

3.2 Definitions and Classifications

3.2.1 Defect Definitions

A defect is defined as an error that is found after reviewing the phase where it was introduced [4].
Defects can be divided into two categories: pre-release defects and post-release defects. Pre-release defects are defects discovered before the software product is released, while post-release defects are defects found after the software product has been released. Both errors and defects are considered faults and are generally referred to as bugs. To differentiate between a defect and an error, it can be stated that when the developed software has an undesired feature, it is considered a defect. However, if the software produces incorrect values, that is considered an error [4, 14].

Figure 3.2: Fault, Defect and Error Relationship

There are a number of defect types discovered by the HeiTech EPG based on previous projects at the company. These defects have been grouped according to their main function.

3.2.2 Defect Classifications

There are several types of defects found in the projects that have been carried out at HeiTech. These defects are then classified by the EPG group in order to make it feasible and efficient to record the defects found in the future.

3.2.2.1 HeiTech Defect Classifications

Generally, before a baseline is produced, defect types are presented in one Excel sheet file named the Defect Log. At first, defects found were listed without any classification, in what is known as Defect Log V3.0. The following list shows the common types of defects:

i. Syntax
ii. Functionality
iii. Ambiguity
iv. Completeness
v. Workflow
vi. Standards
vii. Data
viii. Logic
ix. Interface
x. Consistency
xi. Code optimization
xii. Error handling
xiii. Testability
xiv. Memory handling
xv. Traceability
xvi. Security

However, several issues were raised regarding this version. The defect list created was general and redundant across some software development phases. Therefore, as a solution, Defect Log V4.0 was created [4, 16].
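To make the role of the defect log concrete, the sketch below tallies defects by type from a flat, V3.0-style list of entries. This is an illustrative Python sketch, not HeiTech's actual tooling; the log entries shown are hypothetical examples, since the report's real Defect Log is an Excel file.

```python
from collections import Counter

# Hypothetical defect log entries as (defect_type, phase) pairs; in the
# HeiTech process these rows would come from the Defect Log Excel sheet.
defect_log = [
    ("Syntax", "Coding"),
    ("Functionality", "Testing"),
    ("Completeness", "Requirement Review"),
    ("Syntax", "Coding"),
    ("Logic", "Testing"),
]

# Tally how many defects of each type were recorded.
counts = Counter(defect_type for defect_type, _ in defect_log)
for defect_type, n in counts.most_common():
    print(f"{defect_type}: {n}")
```

A tally like this gives the "defects found" numerator of the defect density formula, broken down by defect type.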
3.2.2.2 Amendments on Defect Classifications

After the baseline, the defects discovered were grouped based on their main functions, namely code and testing defects as well as review defects. This new defect list is recorded as Defect Log V4.0. The new list provides users with descriptions and examples of what each type means. Thus, it generates a better understanding of the defects discovered.

Code and testing defects cover all defects discovered specifically during the coding and testing phases. Review defects cover defects discovered during the documentation review process. Refer to Appendix B for the defect classifications list. Basically, defect classification was done based on the checklists provided by the EPG team. From those checklists, the types of defects that occur were extracted and recorded in a list. Examples of the checklists are shown below:

i. Business or System Requirement Specification Review Checklist
ii. Deployment Readiness Review Checklist
iii. Software Quality Assurance Review Checklist
iv. System Design Description Review Checklist
v. Test Report Review Checklist

Therefore, if a defect concerns the completeness of documentation such as the SRS or SDD, it will fall into the review defect category. Nevertheless, if the defect discovered concerns coding completeness or errors, it will fall into the code and testing defects category.

3.2.2.3 Defect Log

A defect log is basically a list of the defect types discovered, recorded in one file. The purpose of the defect log is to introduce the common defects that are usually found during and after developing projects. Thus, the defect log is usually used as a guideline to recognize and record the number of defects found. At HeiTech, the defect log was originally presented in one sheet of an Excel file, Defect Log V3.0. However, after the baseline, the defect log was amended into several Excel sheets, known as Defect Log V4.0. Paragraph 3.2.2.1 explains the content of the defect log in detail.
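Once the defects from the log have been counted, the defect density formula of Figure 3.1 can be applied directly. The following Python sketch normalizes the system size to KLOC, matching the Defects/KLOC presentation used later in the report; the figures in the example are invented for illustration and are not data from the three case-study projects.

```python
# Illustrative sketch of the defect density formula (Figure 3.1):
# Defect Density = Defects Found / System Size, with size in KLOC.

def defect_density(defects_found: int, sloc: int) -> float:
    """Return defects per KLOC (thousand source lines of code)."""
    if sloc <= 0:
        raise ValueError("system size must be positive")
    return defects_found / (sloc / 1000)

# Hypothetical example: 18 defects found during UAT in a 12,000-SLOC system.
print(defect_density(18, 12000))  # 1.5 defects per KLOC
```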
3.3 System Size

System sizing is one of the vital activities in software engineering; it is used to estimate the size of a software application or component in order to be able to carry out other software project management activities. Size is an inherent characteristic of software, just like weight is an inherent characteristic of any tangible material. System size can be calculated using several different techniques [8].

3.3.1 Types of Calculation for System Size

It is an interesting fact that programmers write a certain number of lines of code per day regardless of the programming language. Comparisons between high-level and low-level languages do not seem to reflect any relevant features here. Therefore, consider the following statement as quoted in [2]:

"LOC is not a good productivity measure because it penalizes high-level languages: Assembler programmers produce five statements to a COBOL programmer's one. But you should not compare COBOL to Assembler: they are as different as night and day. If you compare COBOL programs only to other COBOL programs, and PL/1 to PL/1, then LOC provides a stable comparison tool."

There are many alternative basic measurements available that can be used in practice. The next sub-paragraphs describe several selected alternative basic measurements in detail.

3.3.1.1 Counting Source Lines of Code

A line of code is any line of program text that is not a comment or blank line, regardless of the number of statements or fragments of statements on the line. This specifically includes all lines containing program headers, declarations, and executable and non-executable statements [7]. There are several reasons why lines of code are worth counting. The advantages of using lines of code are stated below [11, 14]:

i. It is one of the most widely used techniques in cost estimation.
ii. It is the basic metric underlying several cost estimation models by Boehm.
iii.
Due to its widespread use, it allows a simple comparison to data from many other projects.
iv. It is an easy method to measure effort.
v. It gives reliable average results, which is crucial especially for huge projects.

There are several tools available to count lines of code, such as GeroneSoft Code Counter Pro V1.32, LOC Counter and Krakatau Essential Project Management (KEPM). The available tools come with a number of different features, such as the programming languages supported, license status and system requirements.

3.3.1.2 Counting Token

At the beginning we mentioned another alternative method for measuring size, namely counting tokens. To put it simply, we just sum up the tokens in the program code. Lines of code with many tokens will then at least look more complicated than those with fewer tokens and thus require more effort. In his Software Science, Halstead developed a weighting scheme that works as described above and additionally produces some interesting statistics. Tokens in the program are divided into two categories: operators and operands. The operators are all defined keywords of a language, like string, for, if and while, as well as the brackets and the characters that terminate a statement. The operands are all the words and numbers that programmers add to write a program, for example the names of variables or numbers. The number of all operators in a program is called N1 and the number of all operands N2.

N = N1 + N2

Figure 3.3: Equation 1 (E1)

As shown in E1 above, N is the sum of N1 and N2. It is equal to the counted token size mentioned before. Again, Halstead did not count declarations, function headers or labels in his weighting scheme. He even omitted counting input and output statements, probably also with the purpose of counting only lines which require a certain effort from the programmer to write.
However, Halstead went on to define η1, which signifies the number of different operators appearing in the program, and η2, which signifies the number of different operands appearing in the program. Moreover, there are again several rules to follow in order to count the operators and operands. For example, the operator "-" is used in most programming languages as both a unary and a binary operator; therefore, it is necessary to define whether to count it once or twice. The sum of η1 and η2 is η, which Halstead describes as the vocabulary. Via a language-dependent constant c, the metric S is obtained from N by dividing N by c, which links these two metrics as shown in E2.

S = N / c

Figure 3.4: Equation 2 (E2)

η = η1 + η2

Figure 3.5: Equation 3 (E3)

Halstead also defines the metric volume V, shown in E4, which is the number of tokens N multiplied by the dual logarithm of the vocabulary η. To put this formula into words: each of the η vocabulary items is represented by a binary number, and the dual logarithm tells how many digits this binary number has.

V = N × log2 η

Figure 3.6: Equation 4 (E4)

where:
η1 = number of different operators
η2 = number of different operands
c = language-dependent constant

When the number of digits is multiplied by the number of tokens needed for the program, the value of the volume V is obtained. Based on the equations, it can be concluded that the metrics S, V and N are related; therefore, switching from one metric to another is possible. Moreover, when there is a change in the counting rules, this change will not affect the metrics N and V. In addition, there are several other advantages if we choose Halstead's metrics, as listed below:

i. They do not require in-depth analysis of program structure.
ii. They predict the rate of errors.
iii. They predict maintenance effort.
iv. They are useful in scheduling and reporting projects.
v. They measure the overall quality of programs.
vi. They are simple to calculate.
vii. They can be used for any programming language.
viii.
Numerous industry studies support the use of Halstead's metrics in predicting programming effort and the mean number of programming bugs.

However, this method also has its weaknesses. For example, the definition for counting LOC does not consider several aspects like the experience of the programmers, the use of tools or modern programming practices, which in turn have a significant influence on software development effort.

3.3.1.3 Counting Modules

Another method to measure size is to count groups of lines. Every program of a certain size has to be split up into parts for the sake of overlooking the whole program and reducing the complexity of the parts. This is called modularization, and it is natural to look at the possibility of using modules as the units for this kind of measurement. The advantage of counting modules and taking the result as an indicator of the size of the program is that it is simple. But there is a problem that makes this method not useful: estimating the effort for a module is impossible. Studies show that the average size of a module is about a hundred lines of code. But the same studies also show that the size of a module varies from ten lines of code to ten thousand lines of code. Therefore, this makes an estimation of effort for a number of modules impossible.

3.3.1.4 Counting Functions

Functions are used in programming as a way of thinking. The principles are the same as for modularization, and this thinking is not bound to the programmer's job but is present in our daily activities. We drive a car using a lot of functionality where we know what it does but not how it does it. We write an essay by splitting up the whole theme into sections and chapters to concentrate on some aspects of the theme. We do this because of the limitation of our thinking in overviewing everything at one time. We cannot cope with a program of more than one hundred lines of code without using abstractions. But we do well with a function of ten lines of code.
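The Halstead counts of 3.3.1.2 (E1, E3 and E4) can be sketched in a few lines of Python. The token lists here are a hypothetical toy example; as noted above, real counting requires per-language decisions, for instance whether the unary and binary "-" count as one operator or two.

```python
import math

def halstead_metrics(operators, operands):
    """Compute the basic Halstead size metrics from token lists.

    operators and operands are flat lists of tokens as they appear in
    the program, duplicates included, per the rules in 3.3.1.2.
    """
    N1, N2 = len(operators), len(operands)                # total counts
    eta1, eta2 = len(set(operators)), len(set(operands))  # distinct counts
    N = N1 + N2                      # program length (E1)
    vocabulary = eta1 + eta2         # vocabulary eta (E3)
    volume = N * math.log2(vocabulary)  # volume V (E4)
    return N, vocabulary, volume

# Token lists for a toy statement such as:  x = a + b ;
ops = ["=", "+", ";"]
opnds = ["x", "a", "b"]
N, eta, V = halstead_metrics(ops, opnds)
print(N, eta, round(V, 2))  # 6 tokens, vocabulary 6, volume 15.51
```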
3.4 HeiTech Defect Density Calculation

As discussed in 3.1, defect density is a measure of the total known defects, or defects found, divided by the size of the software being measured. HeiTech applies the same computation for its defect density calculation.

3.4.1 HeiTech Defect Density

To calculate the defect density value, the parameters involved are the number of defects found and the system size. For the number of defects found, Defect Log V4.0 can be used as a guideline to identify the defects. For the system size, HeiTech has chosen Source Lines of Code (SLOC) as its basic measurement. SLOC is a software metric, also known as a low-level metric, which counts the number of lines in the text of the program's source code. SLOC is used to predict the effort required to develop a system, and it can also be used to measure the productivity of programmers in the organization once the product has been produced. As the base metric, several things need to be considered before the counting process: whether or not to count comments, non-executable code and non-delivered code. To count lines of code consistently, comments are advisably excluded from the counting process. A specific guideline on how to write proper code has been introduced by HeiTech in its Coding Guideline. Including comments would cause differences in time and effort. The first point to consider is that different programmers have different commenting styles; the Coding Guideline was introduced to mend this problem. Furthermore, the amount of comment varies with the developer's programming style, the documentation rules of the organization and the language used. Counting lines of comment could influence the code that is written.
If lines of comment were counted, programmers could be tempted to write more lines than needed, only to enhance their productivity rate. On the other hand, if lines of comment were not counted, programmers would appear more efficient if they omitted to document their code. Looked at from the other perspective, the cost of maintenance and enhancement increases as a result of missing documentation, and the overall cost will also be higher compared with the case where programmers take some time to document their code.

The second thing to consider is non-executable code. Non-executable code, such as constants, function headers or labels, should not be counted, since it does not contain any real logic of the program; even where it does, writing it cannot be considered an effort because it is so simple to write. Furthermore, its presence or absence does not affect the functioning of the program.

The third thing is non-delivered code. The main idea here is the same as for non-executable code. In large projects, the non-delivered code from tools, utilities and even compilers can amount to four times the delivered code. Hence this distinction is useful when top-down approaches are applied, where test drivers are written but not delivered.

The fourth thing to be considered is statements versus lines. What is the particular concern when a line of code contains more than one statement? For this reason, physical and logical lines of code are distinguished. A physical line of code is a metric that counts lines but excludes empty lines and comments; it is also referred to as SLOC and is normally known as the simplest line count. A logical line of code counts the number of "statements", but the specific definition is still tied to the specific programming language (examples are C++, C#, Java and .Net). In detail, logical lines of code count the number of logical lines.
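The distinction between physical and logical lines can be sketched in a few lines of code. This is a hypothetical counter, not HeiTech's selected tool; it assumes a VB-style source where a single quote starts a comment and a trailing "_" continues a statement onto the next line.

```python
# Minimal sketch (illustrative only): count physical SLOC (non-blank,
# non-comment lines) and logical lines (statements, merging continuations).
def count_sloc(source: str):
    physical = 0   # non-blank, non-comment lines
    logical = 0    # statements; continuation lines merge into one
    continued = False
    for raw in source.splitlines():
        line = raw.strip()
        if not line or line.startswith("'"):
            continued = False  # assumption: a blank/comment ends a continuation
            continue
        physical += 1
        if not continued:
            logical += 1
        continued = line.endswith("_")
    return physical, logical

sample = """' compute a sum
Dim total As Integer
total = 1 + _
        2
"""
print(count_sloc(sample))  # (3, 2): three physical lines, two statements
```

The continued statement spans two physical lines but counts as a single logical line, which is exactly why the two metrics diverge.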
Where two or more lines are joined with the "_" line continuation sequence, they are counted as one logical line. A logical line is considered a line of code if it has any content other than a comment or whitespace; thus all executable lines, as well as declarative lines, are counted as logical lines of code. Compiler directives such as #const and #if are counted as code. On the other hand, code that is excluded by a False condition in an #if, #then, #elseif, #else, #end if block is not counted as code; in fact, it is counted neither as whitespace nor as comment, because it is not part of the program in the analyzed configuration and so has no meaning there. It is, however, included in the physical line count.

3.4.1.1 Advantages of Using the Lines of Code Metric

Besides predicting the amount of effort required to develop a program, the lines of code metric also provides an estimate of programming productivity once the software has been produced. Although there are more disadvantages than advantages, there are several reasons why this metric is still relevant in practice. The lines of code metric is used because it is one of the most widely used techniques in cost estimation; it is the basic metric underlying several cost estimation models by Boehm. Due to this widespread use, it allows simple comparison with data from many other projects, historical ones included. The alternative methods to counting LOC also struggle with problems and weaknesses. It is an easy method for estimating effort. In spite of its unreliability for individual programs, it gives reliable average results, which is crucial especially for huge projects. It is reliable for small projects when the method is made precise by defining the programming language and the rules by which lines of code are counted. The discrepancies caused by including or excluding Job Control Language, administration or clerical work are usually small. It does, in fact, account for the higher productivity of high-level languages.
Lengthy code can be discouraged by organizing reviews, by increasing, as Stevenson proposes, the pressure of work, or by introducing an adjustment factor for especially verbose programmers. Cost estimation models like COCOMO, which are based on lines of code, show close agreement between predicted and actual effort. There is a strong correlation between lines of code and effort, and there is also a strong correlation between lines of code and alternative metrics; in other words, it is possible to switch from lines of code to another metric and vice versa.

3.4.1.2 Disadvantages of Using the Lines of Code Metric

The paragraphs below describe the disadvantages of the lines of code metric in detail. First, SLOC suffers from a fundamental lack of accountability. Next is the lack of consistency with functionality: several experiments have shown that effort is highly correlated with lines of code, whereas functionality is less correlated with lines of code. Imagine two types of developer, one skilled and one unskilled, where the skilled developer has been through training and knowledge improvement. Observation shows that the skilled developer manages to develop a system with the same functionality but far less code. Thus we have two programs with the same functionality but different total lines of code, and we can conclude that a developer can still be productive while producing a program with less code rather than more. This shows that lines of code is a poor metric for distinguishing productivity. The developer's experience also influences the lines of code metric: developers produce systems containing different amounts of code depending on their styles and experience.
Typically, an experienced developer produces a program with fewer lines of code for the same functionality compared with an inexperienced developer. With the advent of GUI-based programming languages and tools such as Visual Basic, programmers can write relatively little code and achieve high levels of functionality. For example, instead of writing a program to create a window and draw a button, a user with a GUI tool can use drag-and-drop and other mouse operations to place components on a workspace. Code that is automatically generated by a GUI tool is not usually taken into consideration when using LOC methods of measurement. This results in variation between languages: the same task that can be done in a single line of code (or no code at all) in one language may require several lines of code in another.

In today's software scenario, software is often developed in more than one language; very often, a number of languages are employed depending on the complexity and requirements. Tracking and reporting of productivity and defect rates poses a serious problem in this case, since defects cannot be attributed to a particular language subsequent to integration of the system. Function Points stand out as the best measure of size in this case. There is no standard definition of what a line of code is. Do comments count? Are data declarations included? What happens if a statement extends over several lines? These are the questions that often arise. Though organizations like the SEI and IEEE have published guidelines in an attempt to standardize counting, it is difficult to put these into practice, especially in the face of newer and newer languages being introduced every year. A programmer whose productivity is being measured in lines of code will be rewarded for generating more lines of code even though he could write the same functionality with fewer lines.
The more management focuses on lines of code, the more incentive the programmer has to expand his code with unneeded complexity. Since lines of code are proportional to the subsequent cost of fixing bugs and maintaining the program in general, this is bad. It is an example of the business proverb: "What you measure is what you get".

3.4.2 Proposed Tools

Several tools have been proposed to count lines of code, each with its own special features. The sub-paragraphs below explain each proposed tool in further detail.

3.4.2.1 GeroneSoft Code Counter Pro V1.32

GeroneSoft Code Counter Pro V1.32 is a tool for counting lines of code in source or text files. Published by GeroneSoft, this software can count C, C++, Java, Delphi/Pascal, COBOL, VB, PHP, ASP, XML, FORTRAN and other programming languages. It also produces total counts for comments, blanks and source lines, sortable and exportable to CSV, HTML, XML and spreadsheets such as MS Excel. Table 3.1 describes further details of the selected tool.

Figure 3.7: Screenshot of code counted using GeroneSoft Code Counter Pro V1.32

Table 3.1: Details of GeroneSoft Code Counter Pro V1.32
Publisher: GeroneSoft
License: Shareware
Additional info: Supports files without extensions; fixed CPU utilization bug on idle/minimized; major revamp of the user interface
System requirements: Windows 95 and above, 128Mb RAM, 5Mb disc space
OS support: Win95, Win98, WinME, WinXP, WinNT 3.x, WinNT 4.x, Windows 2000, Windows 2003
Installation support: Install and uninstall

3.4.2.2 LOC Counter

To obtain consistent metrics for all its programming groups, Microsoft Information Technology (Microsoft IT) produced a freeware tool that was easy to use and provided a consistent framework in which to count lines of code in different programming languages. The tool provided the following features: i.
Measure code separately by the distinct programming languages used in the project.
ii. Provide a set of common rules to determine which code is included in or excluded from the count.
iii. Connect easily to the various source control systems (repositories) used at Microsoft IT.
iv. Count only physical lines of code, not logical lines of code.
v. Count code according to a set of customizable counting rules.

Table 3.2 shows the default rules that Microsoft IT specified as the counting standard for the tool.

Table 3.2: Default Counting Rules
Object | C# | ASP | SQL | C++ | Microsoft Visual Basic .Net | XML / XSD | Scripting Languages / ASPX / HTML / Cascading Style Sheets
User-written code | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Generated code | No | No | No | No | No | No | No
Blank lines | No | No | No | No | No | No | No
Comments | No | No | No | No | No | No | No
Metadata | No | Not applicable | Yes | Not applicable | Not applicable | Yes | Not applicable
Re-used code | No | No | No | No | No | No | No
File types | .cs | .asp, .aspx | .sql, .sp, .tbl, .vew, .fn | .cpp, .h, .idl, .def | .vd, .vds, .frm, .cls | .xml, .xsd | .js, .vbs, .css, .htm, .html, .cmd

Table 3.3 describes the objects used in the previous table.

Table 3.3: Object Detail Descriptions
User-written code: This includes declarations, compiler directives, export symbols, variable assignments, and all executable source code. Additionally, it includes language syntax elements that appear on a line of their own.
Generated code: This includes code that is generated by Visual Studio or by any other code generator.
Comments: This includes single-line and multiple-line comments. If a comment is part of a valid line of source code, the comment is counted together with the source line.
Metadata: This is the XML data that some programs use to pass messages among subsystems or functions.
Re-used code: This represents any source code library that an IT application team has not developed.
This includes Enterprise Library, the .NET Software Development Kit (SDK), third-party controls, and so on.

Figure 3.8 shows the report produced after the code counting process is executed. This software essentially supports only the C programming language, which makes it less suitable when multiple programming languages are used.

Figure 3.8: LOC Counter Report Result

3.4.2.3 LocMetrics

LocMetrics counts total lines of code (LOC), blank lines of code (BLOC), comment lines of code (CLOC), lines with both code and comments (C&SLOC), logical source lines of code (SLOC-L), McCabe VG complexity (MVG), and number of comment words (CWORDS). Physical executable source lines of code (SLOC-P) are calculated as the total lines of source code minus blank lines and comment lines. Counts are calculated on a per-file basis and accumulated for the entire project. LocMetrics also auto-generates a comment word histogram. It is a freeware tool and supports the C#, C++ and Java programming languages.

Figure 3.9: LocMetrics Code Counting Results

3.4.2.4 Krakatau Essential Project Management

KEPM is a GUI-based tool which is also an essential component of the CMVC professional's toolkit. Its main functions include the unique changed-SLOC metrics offered by KEPM, as well as the differences in the metrics between releases. Besides that, it also includes various types of source code metrics, such as Halstead size metrics and numerous LOC metrics. It is licensed software produced by Power Software; a trial version is provided for users who wish to explore the software further. The interesting part of this software is that it allows comparison analysis instead of only single-project analysis.

3.4.2.5 Code Counter 3

Code Counter is a GUI tool developed using Visual C++ which can be used for counting the number of source lines, commented lines and blank lines. For Visual C++ projects, the ".dsp" file is parsed to get the list of files in the project.
However, Code Counter can also be used for non-Visual C++ projects, such as a Java project or a C or C++ project developed on Unix and other operating systems, by providing the list of files in a ".map" text file. The Code Counter result is stored in CSV or txt format. It is a freeware tool. Figure 3.10 shows the result produced after the counting process is executed.

Figure 3.10: CodeCounter Counting Result

3.4.3 Tools Comparison

Table 3.4 shows a comparison of all the proposed tools. GeroneSoft Code Counter Pro V1.32 is selected as the most effective tool because of its support for a wide variety of languages.

Table 3.4: Line of Code Counter Tools Comparison
Tool | License | Language support | OS support
GeroneSoft Code Counter Pro V1.32 | Shareware | C, C++, Java, Delphi/Pascal, COBOL, VB, PHP, ASP, XML, FORTRAN | Win95, Win98, WinME, WinXP, WinNT 3.x, WinNT 4.x, Windows 2000, Windows 2003
LOC Counter | Freeware | C++, C#, ASP, SQL, VB, .NET, Scripting Languages | Win95, Win98, WinME, WinXP
LocMetrics | Freeware | C#, C++, Java, SQL | Win95, Win98, WinME, WinXP
Krakatau Essential Project Management | Shareware | Ada, ASP.NET, C/C++, C#, IDL, Java, JSP, Oracle SQL, Perl, PHP, VB/VB.net, VHDL or XML | Win95, Win98, WinME, WinXP
Code Counter 3 | Freeware | C++ | Win95, Win98, WinME, WinXP

3.5 HeiTech Measurement and Analysis Plan

The HeiTech Measurement and Analysis Plan is a measurement process guideline to be used for all projects in HeiTech Padu Berhad. Its purpose is to record the measurement objectives and how the measurement plan should be carried out in the project. The measurement plan shall also:
i. Ensure that activities pertinent to the collection of project measures are considered early in the project and are resolved clearly and consistently.
ii. Ensure that the project team is aware of the measurement and analysis activities and provide an easy reference to them.
iii.
Develop a set of metrics that will improve risk management and quantitative project management, and provide an information database to be used for future work effort estimation.

This guideline applies to the whole measurement process, including planning, implementing and improving. It describes the activities to be performed in each phase and the possible measures or metrics to be collected in an application development project. However, defects are the main focus of this guideline because they are still not properly defined. Thus this section defines the categories and the measures selected to implement measurement activities in the Product Quality issue area [13].

3.5.1 Functional Correctness

The measures of Functional Correctness identify the accuracy achieved in product functions and the number of functional defects observed. These measures provide an indication of product quality.

3.5.1.1 Code and Testing Defects

Table 3.5 describes the category and detailed information for implementing measurement activities in the product quality area. The data item involved is code and testing defects per thousand lines of code [4].

Table 3.5: Code and Testing Defects
Description: Quantifies the number, status and priority of defects reported. This measure is divided into two categories: pre-deployment, which covers defects found during UAT, and post-deployment, which covers defects found during maintenance. The number of defects indicates the amount of rework and has a direct impact on quality.
Business Goal: To provide customers with nearly error-free and premium quality deliverables.
Objective: (i) Defect density observed in Unit Test – 10/KLOC (ii) Defect density observed in System Test – 7/KLOC
Measure/Analysis generated: Defect density.
Measurement tools to support analysis: A graph indicating (i) defect density during the code and testing phase and (ii) defect distribution by defect type.
Data item: Code and testing defects per KLOC.
Data source: (i) Test Report (ii) Defect log
Periodic: Code and testing phases.
Count actual based on: Number of defects reported or captured in the defect log.
Questions: (i) How many (critical) defects have been reported for each component? (ii) Do defect reporting and closure rates support the scheduled completion date of integration and test? (iii) Which components have a disproportionate amount of defects and require additional testing, review or rework?

3.5.1.2 Review Defects

Table 3.6 describes the category and detailed information for implementing measurement activities in the product quality area. The data item involved is the review defect count per number of pages [4].

Table 3.6: Review Defects
Description: Quantifies the number, status and priority of defects reported. This measure is divided into two categories: pre-deployment, which covers defects found during UAT, and post-deployment, which covers defects found during maintenance. The number of defects indicates the amount of rework and has a direct impact on quality.
Business Goal: To provide customers with nearly error-free and premium quality deliverables.
Objective: (i) Document review defect density is managed at 0.1 (1 defect per 10 pages) (ii) Defect density observed in code review is 15 per KLOC
Measure/Analysis generated: Review defect density.
Measurement tools to support analysis: A graph indicating document review defect density by phase.
Data item: Review defect count per number of pages.
Data source: (i) Defect log (ii) Review Checklist Findings
Periodic: Review phases.
Count actual based on: Number of defects reported or captured in the defect log.
Questions: (i) How many (critical) defects have been discovered during the review? (ii) Do defect reporting and closure rates support the scheduled completion date of integration and test? (iii) Which deliverables have a disproportionate amount of defects and require additional review or rework?
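Given the defects-per-KLOC thresholds in the Objective rows above, a minimal defect-density check might look like the following sketch. The function names and the sample figures (45 defects, 9000 SLOC) are illustrative assumptions, not HeiTech data.

```python
# Sketch: defect density per KLOC checked against phase objectives.
def defect_density(defects: int, sloc: int) -> float:
    """Defects per thousand lines of code (KLOC)."""
    return defects * 1000.0 / sloc

# Thresholds taken from the Objective row of Table 3.5 (defects per KLOC).
OBJECTIVES = {"unit": 10.0, "system": 7.0}

def meets_objective(phase: str, defects: int, sloc: int) -> bool:
    return defect_density(defects, sloc) <= OBJECTIVES[phase]

print(defect_density(45, 9000))             # 5.0 defects per KLOC
print(meets_objective("system", 45, 9000))  # True: 5.0 <= 7.0
```

The same two inputs, defect count and SLOC, are exactly what the Data source row prescribes (defect log and counted source), which is why a consistent SLOC counting rule matters so much for the result.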
3.5.2 General Analysis Concept

During analysis, data is converted into information that is used by managers to make decisions on project-specific issues. In the general analysis concept, the indicator is introduced as the basic building block of measurement analysis. The analysis process is important in performing a measurement: it allows a composite issue to be broken into smaller fractions for better understanding. An indicator is a basic building block of measurement analysis, usually represented as a graph or table. A basic measurement indicator consists of three parts [13]:

Table 3.7: Parts of an Indicator
1. An actual value of a measure or combination of measures: Examples include hours of effort expended, or lines of code produced to date.
2. An expected value of a measure or combination of measures: This is a planned value, quantitative objective, baseline or historical value, such as planned milestone dates, target level of reliability or required productivity.
3. Decision criteria: These include rules of thumb (previous problems solved by experienced workers) and statistical techniques used to assess the difference (often called variance) between planned (expected) and actual (measured) values.

A comparison between actual data and an expectation must be made in order to decide whether the result is consequential. For example, for December 1999, 67 percent of the units expected to be complete were not completed, which exceeds the 10 percent threshold, as shown in Figure 3.11.

Figure 3.11: Example of an Indicator

Analysis determines whether the project is meeting defined plans and targets. Once a project has committed to a plan, performance can be measured against that plan; keeping performance in line with the plan is therefore crucially important.
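The three parts of an indicator (actual value, expected value, decision criterion) can be combined in a small sketch. The function name and threshold handling are illustrative assumptions; the numbers mirror the December 1999 example above.

```python
# Sketch: indicator decision = actual vs. expected vs. decision threshold.
def variance_exceeds(expected: float, actual: float, threshold: float) -> bool:
    """True when the shortfall against plan exceeds the decision threshold."""
    shortfall = (expected - actual) / expected
    return shortfall > threshold

# 67% of the expected units were not completed, i.e. only 33% of plan was
# achieved, which is far beyond the 10% decision threshold.
print(variance_exceeds(expected=100, actual=33, threshold=0.10))  # True
```

Only when the decision criterion fires does the variance become "consequential" in the sense of Table 3.7 and warrant further investigation.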
Even if the project begins with a good plan, once performance begins to deviate from the plan, the cause of the deviation must be classified and corrective actions taken to ensure success. Several other disciplines are practiced during the analysis phase: quantitative analysis, the structured analysis model and the performance analysis procedure. These types of analysis are described in detail in the next sub-paragraphs.

3.5.2.1 Quantitative Analysis

When quantitative analysis is conducted on project data, the results can be used both for Software Quality Management (SQM) and for Quantitative Process Management (QPM). If the data analyzed are defects detected, the intent is to reduce the defects during the activities that detect defects throughout development, thus satisfying SQM. When out-of-statistical-control conditions occur, the reason for the anomaly is investigated and the process brought back under control, which satisfies QPM. Figure 3.12 shows the relation between SQM and QPM. Defect prevention activities are conducted to identify the causes of defects and prevent them from recurring. They also involve analyzing defects encountered in the past and taking specific actions to prevent the occurrence of those types of defects in the future, as discussed in the next paragraph. These activities are also considered one mechanism for spreading lessons learned between projects. Figure 3.13 shows the Level 5 Defect Prevention process flow.
Figure 3.12: Software Quality Management and Quantitative Process Management Flow

Figure 3.13: Defect Prevention Flow

Statistical Process Control (SPC) is an efficient method used to monitor the behavior of a process through the use of a control chart. Figure 3.14 shows the usage of a control chart in the analysis. Based on the normal distribution, 99% of all normal random values lie within +/-3 standard deviations from the norm (3-sigma). If a process is mature and under statistical process control, all events should lie within the upper and lower control limits. If an event falls outside the control limits, the process is said to be out of statistical process control; the reason for this anomaly needs to be investigated and the process brought back under control [13].

Figure 3.14: Control Chart

Control charts are selected because they separate signal from noise, which makes it easier to recognize when anomalies occur. Besides, they identify undesirable trends and point out fixable problems and potential process improvements.
Control charts show the capability of the process, so achievable goals can be set, and they provide evidence of process stability, which justifies predicting process performance. There are two types of data used in control charts: variables data and attributes data. Variables data are usually measurements of continuous phenomena; examples in software settings are elapsed time, effort expended, and memory/CPU utilization. Attributes data are usually measurements of discrete phenomena, such as the number of defects, number of source statements, and number of people. Most software measurement used for SPC is attributes data. It is important to use the correct data on a particular type of control chart. Below is an example of a real project applying SPC to real data over a period of time. Table 3.8 explains the data types represented in the columns of Table 3.9.

Table 3.8: Requirement Data Type Descriptions
1. Sample: Series of peer reviews
2. SRSs: System Requirement Specification
3. No. Rqmts: Number of requirements reviewed for each sample
4. Defects: Number of defects detected for each sample
5. Defects/100 rqmts: Defects normalized to 100 requirements for each sample

Table 3.9: Requirement Peer Review Defects
No. | Sample | SRSs | No. Rqmts | Defects | Defects/100 Rqmts
1. | UTL | 1 | 152 | 5 | 3.28
2. | APP | 1 | 37 | 4 | 10.81
3. | HMI | 1 | 350 | 101 | 28.85
4. | MSP | 1 | 421 | 13 | 3.8
5. | EKM | 1 | 370 | 25 | 6.75
6. | CMS | 1 | 844 | 60 | 7.10
Totals | | 6 | 2174 | 208 |

The formulas for constructing the control chart are as follows [1]; the u-chart is the control chart implemented here.
i. Defects/100 Rqmts = Number of defects × 100 / Rqmts reviewed per sample (calculated for each sample) **Plotted as PLOT
ii. CL = Total number of defects / Total number of requirements reviewed × 100
iii. a(1) = Requirements reviewed / 100 (calculated for each sample)
iv. UCL = CL + 3 × SQRT(CL / a(1)) (calculated for each sample)
v. LCL
= CL – 3 × SQRT(CL / a(1)) (calculated for each sample)

Table 3.10 explains the data types represented in the columns of Table 3.11, and the calculations are shown in Table 3.11. Whenever the LCL is negative, it is set to zero. The control chart for the calculation is shown in Figure 3.15.

Table 3.10: Calculation Data Type Descriptions
1. PLOT: Defects per 100 requirements, the value plotted on the chart
2. CL: Center line, an average
3. a(1): A variable calculated for each sample
4. UCL: Upper control limit, calculated for each sample
5. LCL: Lower control limit, calculated for each sample

Table 3.11: Calculation for Requirements Peer Review Defects
No. | Sample | PLOT | CL | UCL | LCL | a(1)
1. | UTL | 3.28 | 9.56 | 17.09 | 2.04 | 1.52
2. | APP | 10.81 | 9.56 | 24.82 | 0 | 0.37
3. | HMI | 28.85 | 9.56 | 14.52 | 4.60 | 3.5
4. | MSP | 3.8 | 9.56 | 14.09 | 5.04 | 4.21
5. | EKM | 6.75 | 9.56 | 14.39 | 4.74 | 3.7
6. | CMS | 7.10 | 9.56 | 12.76 | 6.37 | 8.4

Figure 3.15: Control Chart for Requirements Peer Review Defects

From observation, an anomaly occurred in the third sample. Further analysis revealed that the data for that sample were from human interface specifications, while all the others were applications. Figure 3.15 shows the control chart for requirements peer review defects including the HMI sample. Control charts need similar data from similar processes, so the HMI sample was removed and the data charted again, as shown in Figure 3.16.

Figure 3.16: Control Chart for Requirements Peer Review Defects without HMI

The process is now under statistical process control. The root cause is that data gathered from different activities cannot be used in the same statistical process on control charts: the process for defining and implementing human-machine interfaces is different from that for defining and implementing application code, as are the teams and methodologies. The resulting defect prevention lesson concerns the process of collecting data for SPC control charts.
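The u-chart construction in formulas (i)–(v) can be reproduced directly from the data of Table 3.9. The sketch below is an illustration of those formulas, not the tool actually used; its results agree with Table 3.11 up to rounding (the thesis rounds CL to 9.56 before computing the limits).

```python
import math

# Requirement peer-review data from Table 3.9: (sample, requirements, defects).
samples = [("UTL", 152, 5), ("APP", 37, 4), ("HMI", 350, 101),
           ("MSP", 421, 13), ("EKM", 370, 25), ("CMS", 844, 60)]

total_defects = sum(d for _, _, d in samples)   # 208
total_rqmts = sum(r for _, r, _ in samples)     # 2174
cl = total_defects / total_rqmts * 100          # center line (formula ii), ~9.57

limits = {}
for name, rqmts, defects in samples:
    plot = defects * 100 / rqmts                # PLOT (formula i)
    a1 = rqmts / 100                            # a(1) (formula iii)
    sigma3 = 3 * math.sqrt(cl / a1)
    ucl = cl + sigma3                           # formula iv
    lcl = max(0.0, cl - sigma3)                 # formula v; negative LCL set to 0
    limits[name] = (plot, ucl, lcl)
    print(f"{name}: PLOT={plot:.2f} UCL={ucl:.2f} LCL={lcl:.2f}")
```

Running the sketch shows the HMI sample plotting far above its upper control limit, which reproduces the anomaly in the third sample discussed above.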
3.5.2.2 Structured Analysis Model

The structured analysis model is used to define the relationships between individual issues. Figure 3.17 illustrates the structured analysis model. This model shows the typical relationships among common issue areas and sets the stage for defining appropriate measurement indicators for analysis activities. Each of the arrows in the figure represents a relationship.

Figure 3.17: Issue Analysis Model Showing the Relationship among Common Issue Areas

Figure 3.18: Model of Common Software Issue Relationships (covering product size and stability, process performance, resources and cost, schedule and progress, customer satisfaction and product quality)

The following relationships correspond to the numbered circles in Figure 3.18:
i. Functional size represents the amount of functionality that the system must provide, usually determined by the requirements. Functional size is a primary determinant of physical size (the amount of product that must be developed or maintained).
ii. The impact and volatility of any new technology also influence product size. Most innovative technical approaches attempt to minimize the amount of new product that must be implemented for a given function; examples include COTS, common architectures, and reusable components. If the effectiveness of these approaches does not yield all of the desired benefit, more of the system must be developed than planned.
iii.
Increases in product size usually result in the need for additional personnel.

iv. Process performance usually affects the overall need for personnel resources, and influences development schedules and product quality. A more capable organization performs better, assuming that other factors are constant.

v. Adding more personnel impacts schedule and progress. If personnel are added early in the project, and if the appropriate training and communications are in place, the schedule may be shortened. If personnel are added later, schedule delays may occur because it is difficult to add unplanned staff to an ongoing project. Schedule shortfalls are associated with milestone slips, delays in completing lifecycle activities and products, and downsized build and release requirements.

vi. Schedule shortfalls can cause product quality problems, including defects in the product. In order to meet the schedule, test efforts may be curtailed or problems may not be corrected. Generally, higher system complexity makes the maintenance process harder and requires more time to fix errors.

vii. Latent quality problems represent rework that requires additional effort to make current or future releases acceptable to users. The project manager will usually make a delivery decision based on the number of open problems. High-priority problems may be fixed, while others may be deferred to the operations and maintenance phase.

viii. Rework increases the effort required to complete the project as originally specified. Projects are often forced to modify or eliminate some mission requirements to stay within cost and schedule constraints.

ix. For software-intensive systems, personnel effort (including rework) is usually the primary determinant of project cost. Cost control can be achieved only by controlling the upstream factors.

x. Resources, schedule, and product quality all impact customer satisfaction.
3.5.2.3 Performance Analysis Procedure

Analysis consists of four steps: Compare Plan vs. Actual, Assess Impact, Predict Outcome and Evaluate Alternatives. The process flow is shown in Figure 3.19.

Figure 3.19: Analysis Steps

i. Compare Plan Versus Actual: Problems are identified by quantifying the difference between planned and actual values. If the difference exceeds the threshold acceptable to management, the situation should be investigated further. Both the trend and the absolute magnitude of the difference should be analyzed. If the variance has been growing steadily over time, it should be investigated even if it has not exceeded a pre-defined threshold.

ii. Assess Impact: Localize the source of a detected variance and evaluate its scope. This may require additional focused data collection, but usually can be satisfied with existing data. Sometimes a substantial difference between planned and actual values may be caused by outliers, which are values that do not appear to be consistent with the other data.

iii. Predict Outcome: Data from statistical and other techniques are used to manage the project and to predict whether the project will be able to achieve its quality and process performance objectives.

iv. Evaluate Alternatives: If the predicted outcome does not satisfy the project objectives, alternative courses of action must be investigated. Measurement assists in evaluating project outcomes given different scenarios and actions.

3.6 Analysis of Other Related Measurement Plans

This paragraph identifies other related measurement plans, the metrics used, and the predicaments faced in obtaining a precise result for defect density.

3.6.1 Using In-Process Metrics to Predict Defect Density in Haskell Programs

For Haskell programs, a method has been proposed that uses a suite of internal, in-process metrics to estimate defect density. The suite of metrics is called the Software Testing and Reliability Early Warning for Haskell (STREW-H).
The candidate metrics selected for initial consideration in STREW-H range from testing metrics to structural metrics to compiler warnings. A feasibility study using a subset of STREW-H metrics to estimate defect density was performed using metrics from seven versions of the open-source Glasgow Haskell Compiler source code.

3.6.2 STREW-H

Research has been done on the early estimation of reliability growth of Java programs using in-process testing metrics. The early estimation provides feedback to developers so that they can correct faults during the development process and can increase the testing effort, if necessary, to provide added confidence in the software. The method used is called Software Testing Reliability Early Warning for Java systems (STREW-J). The STREW-H metric suite was proposed based on this research. Because of the differences in language paradigms, some of the metrics are not applicable to a functional language. The STREW-J metric suite was therefore used as a starting point for STREW-H: several metrics were eliminated because they do not apply to a functional language, and additions were made based on expert opinion. Expert opinion here refers to 12 Haskell researchers at Galois Connection Inc. and members of the Progmatica team at The OGI School of Science & Engineering at OHSU (OGI/OHSU).

3.6.3 Initial Set of Metrics for STREW-H Version 0.1

The initial set of metrics for STREW-H version 0.1 is as follows:

i. Number of test case asserts / source lines of code
ii. IO monadic lines of code / source lines of code
iii. Number of type signatures / source lines of code
iv. Number of overlapping patterns / source lines of code
v. Number of duplicating exports / source lines of code
vi. Number of missing fields / source lines of code
vii. Number of missing methods / source lines of code
viii. Number of incomplete patterns / source lines of code
ix. Number of missing signatures / source lines of code
x.
Number of name shadowing / source lines of code
xi. Number of unused binds / source lines of code
xii. Number of unused imports / source lines of code
xiii. Number of unused matches / source lines of code
xiv. Number of test cases / source lines of code
xv. Test lines of code / source lines of code

3.6.4 Feasibility Study on STREW-H

In order to estimate defect density, a study was conducted to analyze the potential of the STREW-H metric suite. The open-source Glasgow Haskell Compiler (GHC) was chosen as the initial test system, since seven versions were available. For the feasibility study, four metrics were selected, based on expert recommendation and their relation to the STREW-J metric suite. The metrics are:

i. IO monadic lines of code / source lines of code (T1)
ii. Number of test cases / source lines of code (T2)
iii. Test lines of code / source lines of code (T3)
iv. Number of type signatures / source lines of code (T4)

A multiple regression analysis was performed on these four metrics to determine whether they were indicative of the number of defects found in each version. Six of the seven versions of GHC were randomly chosen to formulate the coefficients of the regression model. Figure 3.20 shows the calculation used to predict the defect density.

DefectDensity = 0.08 + 0.0113 × (T1) + 0.0002 × (T2) + 0.607 × (T3) − 0.0762 × (T4)

Figure 3.20: Defect Density (Defects/KLOC)

Figure 3.21 shows the actual defect densities against the regression equation used in the model. In the graph, the estimated defect density for the seventh version was 0.04 defects/KLOC, while the actual defect density was 0.07 defects/KLOC. The equation was shown to give good estimates of defect density for the other versions of the system.

Figure 3.21: Multiple Regression Analysis

However, there is a limitation of the STREW-H method of estimating defect density: it is not based on any operational profile of the system.
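The regression model of Figure 3.20 can be evaluated directly. Note that the sign of the T4 coefficient is a reconstruction: the operator was lost in the source text, and a minus sign (more type signatures associated with fewer defects) is assumed here:

```python
def strew_h_defect_density(t1, t2, t3, t4):
    """STREW-H regression estimate of defect density (defects/KLOC).
    t1..t4 are the four selected STREW-H ratios, each metric divided
    by source lines of code. The minus sign on the T4 term is an
    assumption: the operator is missing in the source text."""
    return 0.08 + 0.0113 * t1 + 0.0002 * t2 + 0.607 * t3 - 0.0762 * t4
```

Feeding in the four ratios computed for a given GHC version yields the model's defect density estimate for that version.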
While utilizing operational profiles to estimate system defect density would be beneficial, it is often cost- and time-prohibitive. Others have found that non-operational testing is related to field quality, so there is potential value in this method.

Table 3.12: Data from seven versions of GHC

Ver.   | Source KLOC | Mon. Inst. | Test Cases | Test LOC | Type Sigs. | Defect density
4.08   | 99.73       | 1023       | 226        | 1444     | 11737      | 0.49
5.00.2 | 140.67      | 2789       | 0          | 0        | 16739      | 0.33
5.02.2 | 157.84      | 2526       | 0          | 0        | 22338      | 0.29
5.04   | 211.42      | 6453       | 76         | 1749     | 28678      | 0.04
5.04.3 | 205.95      | 6453       | 76         | 1749     | 27991      | 0.08
6.01   | 216.13      | 7258       | 76         | 1749     | 28600      | 0.07

3.6.5 Upcoming Work

Having an early warning system to estimate defect density would aid developers by giving them an indication of potential problems in the system. Metrics that are readily available in any system can be leveraged to provide this defect density estimation. An initial feasibility study was performed using a subset of metrics from STREW-H.

3.7 Comparison with Related Measurement Plans

The sub-paragraphs below describe the advantages and disadvantages of the existing HeiTech Measurement Plan V3.0.

3.7.1 Advantages of the HeiTech Measurement Plan

i. Defect density measurement was introduced to help identify candidates for additional inspection or testing, or for possible re-engineering or replacement.
ii. It also helps to identify defect-prone components.

3.7.2 Disadvantages of the HeiTech Measurement Plan

i. Although there is a measurement for defect density, the results received tend to be imprecise.
ii. This situation arises from the improper ways practiced to calculate the total number of LOC.

3.7.3 Rework on the HeiTech Measurement Plan

As an amendment, the PCM team has come out with a new version of the measurement plan, HeiTech Measurement Plan V4.0. Code counter tools have been introduced in order to improve and standardize the code counting technique. Refer to the next chapter for further details regarding the amendment.
CHAPTER 4

PROJECT METHODOLOGY

4.1 Project Management Information System (PROMISE)

The Project Management Information System, also known as PROMISE, is HeiTech's project management standard and methodology. The management process in PROMISE consists of nine knowledge areas: Integration Management, Scope Management, Time Management, Cost Management, Quality Management, Human Resource Management, Communication Management, Risk Management and Procurement Management.

4.1.1 Quality Management

The Quality Management knowledge area contains three project processes: Quality Planning, Quality Assurance and Quality Control. Figure 4.1 shows the process flow for Quality Control. Quality control involves monitoring specific project results to determine whether they comply with relevant quality standards and project performance objectives, and identifying ways to eliminate causes of unsatisfactory results. It should therefore be performed across the project.

Referring to Figure 4.1, the fourth process concerns conducting quality control using measurement tools and techniques. Quality tools and techniques can be used to make sense of quality data and to present quality information for the project team to act on. In the fifth process, the Measurement and Analysis Report (for development projects) and the Measurement and Analysis Report (for maintenance projects) are produced according to the measures and objectives in the HeiTech Measurement and Analysis Plan.

Figure 4.1: Quality Control Process Summary

4.2 Measurement and Analysis Process

As shown in Figure 4.2, the HeiTech Measurement and Analysis Process is divided into three phases: the planning phase, the implementing phase and the improving phase. Each phase possesses its own activities. The tools involved in this measurement process are explained at the end of this chapter.

Figure 4.2: HeiTech Measurement and Analysis Process

4.3 Planning Phase

The planning phase consists of two activities: Identify Scope and Define Procedures.
The sub-paragraphs below describe them in detail.

4.3.1 Identify Scope

In scope identification, the specific reason why the measurement is needed is clarified. After the need for measurement is established, the activity of identifying the scope begins. This includes identification of the objectives that measurement is to support, the issues involved in meeting these objectives, and the measures that provide insight into these issues. Several tasks are required to complete the Identify Scope activity; they are explained in Table 4.1.

Table 4.1: Identify Scope Activity

No. | Task | Description
1. | Identify the needs for measurement. | (a) A measurement need could be broad or focused. (b) Identify the audiences/users of the measurement data (e.g. EPG, project manager and process group). (c) All audiences/users should be identified, including those who will receive the measurement report, those who will make decisions based on the data, and those who will provide the data.
2. | Identify the objective of the organization or project that each measurement need is going to address. | Ensure that measurement is aligned with and will support decision-making with respect to the business objectives of the organization and the project's performance objectives.
3. | Identify the method that will be used to achieve the objectives stated in the previous tasks. | Ensure that there are methods to achieve the objectives, and that the objectives are feasible.
4. | Identify the issues to be managed, controlled, or observed. | These issues should refer to the products, processes, and/or resources that are traceable and related to the methods identified in Task 3. Several common issues are size, cost, schedule and quality.
5. | Translate each issue into a precise, quantifiable goal. | This goal may involve measurement to help individuals understand an issue (i.e. to evaluate, predict, or monitor) or to address a need for measurement improvement (i.e.
to increase, reduce, achieve or stabilize).
6. | Identify a list of questions for each goal that, when answered, will determine whether the goal is achieved. | These questions should involve products, processes, improvement, etc. Answers to the questions should indicate the status of progress toward the goals, which will in turn provide insight into the issues.
7. | For each question, identify the things to be measured that could determine the answers to the question. | The Identify Scope activity identifies questions and measures that indicate progress towards the organization's objectives. The scope of the measurement effort is limited to the measures necessary for making decisions with respect to the objectives, i.e. useful data. The process continues on to the Define Procedures activity with a set of measures.

4.3.2 Define Procedures

After the measures have been identified, operational definitions and procedures for the measurement process are constructed and documented. Several tasks are considered in the Define Procedures activity:

i. Define each of the identified measures completely, consistently and operationally.
ii. Define the collection methods, including how the data are to be recorded and stored.
iii. Define the analysis techniques to be used to determine status relative to the identified measurement goals. These analysis techniques will most likely evolve and expand through each reporting cycle as the user gains more insight into the data.

The next paragraphs describe the details and sequence of tasks needed to define the procedures for the measurement process.

4.3.2.1 Define the Measures

i. This may require the design of questionnaires, interviewing techniques and other instruments.
ii. The information is kept and well documented as part of the plans and procedures for the measurement process.

4.3.2.2 Define the Counting Procedure

i. Document the gathering procedure carried out for each measurement.
ii.
Ensure that all the procedures involved are clearly defined and repeatable, including the source of the data, collection responsibilities, frequency of collection and any special tools required.

4.3.2.3 Assemble Forms or Procedures to Record the Measures

i. The forms created should include administrative data such as the project's name, date, configuration management data and the data collector.
ii. An example of a form developed is the Defect Log template included in the Measurement and Analysis Report.

4.3.2.4 Select and Design a Database or Spreadsheet for Storage

i. A database or spreadsheet is designed to store the information collected.
ii. It also facilitates storing the data in an organized and efficient way.
iii. If the database is expanded, long-term performance can be monitored.

4.3.2.5 Define the Analysis Method

i. Identify any potential or anticipated interactions between the various measures. Analytical methods can involve the correlation of related measures, using a scatter diagram to plot one measure against another.
ii. Identify a basic set of warning signals that should arise when the collected data are significantly different from the plan or estimate. This could involve trend analysis of actual versus estimated measures and/or control charts indicating thresholds and warning limits.
iii. Illustrate, with examples, the conclusions that can be made from the reported data with respect to the goal.

4.3.2.6 Define Potential Relationships among the Measures and Construct the Measurement Reporting Format

i. These formats should include charts and graphs to present the measures and answer questions about each issue/goal.
ii. Charts and graphs allow the measures to be monitored from a statistical point of view, besides being presentable.

4.3.2.7 Define the Mechanism for Feedback

Listed next are the criteria needed to define the mechanism for feedback:

i. Show how the report is distributed to the different audiences, including the distribution list.
ii.
Identify the need for any regular meetings to present measurement reports, including typical agendas, attendances, purposes and outcomes.
iii. Specify how the Measurement and Analysis Report can be used and can affect decisions.
iv. Define procedures for receiving feedback from the audience, for example marked-up hard copy or formal change requests.

4.4 Implementing Phase

The following two activities implement the documented measurement procedures. Based on the execution of these activities, the documented procedures for measurement should be reviewed and revised to meet the evolving needs of the project or organization.

4.4.1 Data Collection

Data collection is the process of preparing and gathering data. A clear definition of the data items collected helps ensure consistent data. The collection procedures implemented are defined in the Define Procedures activity. Using the operational definitions of the measurement procedures, data are collected, recorded and stored. Data verification and validation must consider both the accuracy of the data as recorded and the reliability with which they are transmitted. All data should be identified with their collection date and source, to align data with project events and to correlate data from different sources. Configuration management of data delivery versions and dates should be implemented for electronic data sets. Explained below are common issues regarding data verification.

i. Data currency: Does the data relate to the project activities currently underway? Does the received data match the schedule?
ii. Data aggregation structures and attributes: Are values in the fields for aggregating data records consistent across records (e.g. defect classifications)? Are they consistent across project teams or organizations?
iii. Units of measure: Are the same units of measure being used across all project teams or organizations, for example hours versus days and SLOC versus KSLOC?
iv.
Data item contents: Are any data values outside the acceptable ranges? Is the format of any data item's values incorrect (e.g. data type and decimal positions)?
v. Data completeness: Is the needed measurement data provided for each issue? Are data items missing within the measurement data? Is project-context data delivered? Is this the data agreed upon?
vi. Changes to existing data: Have planned values changed unexpectedly (e.g. planned start and end dates)?

For adequacy, the data are validated and the procedures are reviewed. Once the data are validated and compiled, the data analysis team can develop statistical representations of the data such as graphs, charts, tables and diagrams. Meanwhile, any inconsistency in the data should be resolved through communication with the project team. Issues such as missing data, significant changes in values or changes in the data structure should always be discussed with the project team for completeness.

4.4.2 Data Analyzing

Data analyzing is the process of gathering, modeling and transforming data with the goal of drawing out useful information, suggesting conclusions, and supporting decision-making. The data analyzing activity consists of analyzing the data, which is then documented in a report called the HeiTech Measurement & Analysis Report. The report is reviewed with the Project Steering Committee. Subsequently, the analyzed data of each project is presented in the Measurement & Analysis Report template. The report should be produced according to the measures stated in the project's Measurement & Analysis Plan. The paragraphs below give examples of the needed analysis.

i. Analyze the chart for process stability: There are two parts to an IMR chart, shown in Figure 4.3 and Figure 4.4. Figure 4.3 shows the individual data points plotted with +/-3 sigma control limits (UCL, LCL) superimposed. For special cause analysis, events outside the control limits are the selected candidates.
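The ±3 sigma limits of the individuals chart are conventionally estimated from the average moving range (MR-bar divided by the constant 1.128, the d2 value for subgroups of two); a minimal sketch with illustrative data, not HeiTech project data:

```python
def imr_limits(data):
    """Individuals-chart control limits with sigma estimated from the
    average moving range: sigma = MRbar / 1.128, limits = mean +/- 3*sigma."""
    n = len(data)
    mean = sum(data) / n
    mrbar = sum(abs(data[i] - data[i - 1]) for i in range(1, n)) / (n - 1)
    sigma = mrbar / 1.128
    return mean - 3 * sigma, mean + 3 * sigma

# Illustrative data points
lcl, ucl = imr_limits([10, 12, 11, 13, 12])
print(f"LCL={lcl:.2f}, UCL={ucl:.2f}")
```

Points falling outside (LCL, UCL) become the candidates for the special-cause analysis described above.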
A fishbone diagram or a Pareto chart are examples of tools that can be used to analyze an event outside the control limits and verify it as a special cause. Once a special cause has been identified and understood, the data point can be removed and the chart re-run. The second part is a plot of the moving range of the data points, which represents the variation in the data. Figure 4.4 indicates improved stability and reduced variation of the process through its narrowing trend.

Figure 4.3: Individual Chart

Figure 4.4: Moving Range Chart

ii. Analyze the chart for process capability: Cpk analysis diagrams produce a bell-shaped curve with the required upper and lower specification limits (USL, LSL) superimposed. In addition, the Cpk index is calculated. If Cpk is equal to or greater than 1, the process is considered capable of functioning within the required specifications. If Cpk is less than 1, the process is more random and is not capable. Figure 4.5 shows a process that needs improvement. The improvement can be accomplished by adjusting the process so that the process average is centered between the required specification limits, and then by reducing process variation.

Figure 4.5: Cpk Analysis Distribution Curve

4.4.3 Result Communicating

The HeiTech Measurement & Analysis Report is submitted and presented as part of the monthly progress reporting to the Project Steering Committee. Any issues raised, as discussed in 4.4.1 Data Collection, are resolved through communication with the project team. The report contains the analysis of metrics according to the measures planned in the project's Measurement & Analysis Plan (PMP). Feedback is collected to update or improve the measurement procedures.

4.5 Improving Phase

The improving phase is concerned with process evolution.
There are several processes involved in this phase, described in the sub-paragraphs below.

4.5.1 Process Evolving

Process evolving is used to improve the process and to ensure that there is a structured way of incorporating issues and concerns into improvement of the process. The Project Steering Committee receives the measurement reports and makes decisions based on the data. Decisions may involve re-planning, corrective action, or simply moving on without changes. The activities also assess the adequacy of the measurement process itself. As decisions are made, the audience provides feedback on the adequacy of the data available. Additional issues may be raised, or the feedback may indicate that certain data are no longer available. This feedback, in addition to the feedback from the other activities, is addressed during the next reporting cycle by re-identifying and re-defining the measurement process. Feedback and changes are reported to the EPG, which then manages and incorporates them. The data analysis team collects any changes or feedback relevant to the measurement procedures and evaluates what kind of changes should be made for the next reporting cycle. Managers need to determine whether they have adequate information to evaluate progress with respect to their goals, and whether the procedures defined are adequate.

4.6 Measurement Tools

Several classes of tools are used to carry out the measurement process. Details about the selected tools are shown in the next sub-paragraphs.

4.6.1 Classes of Tools and their Support Functions

Table 4.2 shows the classes of tools and their support functions involved in the measurement process generally and in defect density specifically.

Table 4.2: Types of Tools and their Support Functions

Type of Tool | Support Functions
Database, graphing and reporting | Store and manage measurement data to produce graphical and text-based reports.
Statistical analysis | Provide enhanced analytical capabilities such as regression.
Data collection | Automatically extract measurement data from elements of the engineering process.

4.6.2 Guidelines for Measurement Support Tools

Several guidelines have been introduced in the HeiTech Measurement and Analysis Plan for selecting measurement tools:

i. Select tools that can be customized to meet specific project needs.
ii. Evaluate tools that are already available within the organization.
iii. When selecting engineering tools, consider their ability to provide data about the engineering tasks and products they support.
iv. Do not tailor a measurement program based solely on the availability of data from existing tools.
v. Select tools that automate the mechanics of data-handling efforts such as collecting, processing, analyzing, exporting/importing, and reporting.
vi. Select tools that run on a common platform.

CHAPTER 5

PROJECT DISCUSSIONS

5.1 Defect Log Documentation

As pointed out in Chapter 3, defect classification is important in order to produce the measurement and analysis report, and standardization of defects is needed so that an effective measurement can be done. Before the baseline, the defect types discovered were recorded in one list, known as Defect Log V3.0. After the amendment, defects discovered through the projects developed are recorded in a new list, known as Defect Log V4.0. The defect types discovered have been grouped based on their main function. In general, defect types can be divided into two categories: review defects and code defects. Review defects are defects found during the documentation review process, and code defects are defects found in the programming. Refer to Appendix B for the defect classification list. Defect classification was done based on the checklists provided by the EPG team. From those checklists, the types of defect that occur are extracted and recorded in a list.
Examples of the checklists used in HeiTech are shown below:

i. Business or System Requirement Specification Review Checklist
ii. Deployment Readiness Review Checklist
iii. Software Quality Assurance Review Checklist
iv. System Design Description Review Checklist
v. Test Report Review Checklist

Therefore, if a defect concerns the completeness of documentation such as the SRS or SDD, it falls into the review defect category; if the defect discovered concerns coding completeness or a coding error, it falls into the code and testing defect category.

5.2 Establishing Measurement and Analysis

As described in Chapter 4, several phases are involved in establishing the measurement and analysis process. The next sub-paragraphs explain the details of the process conducted.

5.2.1 Planning Phase

In the planning phase, the objectives for the defect density measurement were established, as stated in Table 5.1.

Table 5.1: Defect Density Objectives

Defect Classification | Objectives
Code and Testing Defects | Defect density observed in Unit Test – 10/KLOC; defect density observed in System Test – 7/KLOC
Review Defects | Document review defect density is managed at 0.1 defects per page; defect density observed in Code Review is 15 per thousand lines of code

5.2.2 Implementing Phase

Figure 5.1 summarizes the activities performed during the implementing phase: data collection, data analyzing and result communicating, implementing the current measurement plan and procedures, and collecting feedback to update the plans and procedures for the next reporting cycle.

Figure 5.1: Summary of Activities Performed During the Implementing Phase

5.2.2.1 Defect Density Data Collection

There are three projects involved in collecting the data used for the defect density measurement: BHEUU-CSMBBG, BHEUU-SKHD and iProcurement-Vendor Management.
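The defect density figures collected for these projects are expressed in defects per KLOC, i.e. the defect count divided by the system size in thousands of source lines; a minimal sketch with illustrative numbers:

```python
def defect_density(defects, sloc):
    """Defect density in defects per KLOC (thousand source lines of code)."""
    return defects / (sloc / 1000.0)

# Illustrative: 12 defects found in a 40,000-SLOC module
density = defect_density(12, 40000)
print(density)  # 0.3 defects/KLOC
```

This is why a standardized SLOC count matters: the denominator changes the result directly, so different counting styles give incomparable densities.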
Since there are several constraints in data collection, the defect density measurement concentrates on the testing phase generally and the UAT level specifically. The programming language used by all three selected projects is Java.

Table 5.2 shows the data requested for defects discovered during the testing phase. The data received were validated, compiled and recorded in an Excel file. Further statistical work is done in the data analyzing process, described in the next sub-paragraph.

Table 5.2: Defect Density Data Collection

5.2.2.2 Defect Density Data Analyzing

In defect density data analyzing, all the data collected are put together in a control chart, one of the statistical techniques discussed in Chapter 3. Figure 5.2 shows the chart developed after the defect density calculation was done.

Figure 5.2: Defect Density during UAT (individual data points for the three projects plotted with 1, 2 and 3 sigma control limits)

The needed analysis, analyzing the chart for process stability, has been carried out. Figure 5.2 shows the plotted individual data points with +/-3 sigma control limits. It clearly shows that all three projects stay within the control limits, so those projects are in control. The forming pattern cannot really be predicted because of the small number of projects; however, a sudden change within the zones can be seen between iProcurement-Vendor Management and BHEUU-SKHD. Analysis for process capability is not applicable at the moment because there are not enough data. Nevertheless, the chart shows that the process lies between the upper and lower control limits and is under control.
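The 1-, 2- and 3-sigma zones plotted in Figure 5.2 are symmetric bands around the centre line; a sketch (the mean and sigma values below are illustrative, since the figure's exact values are not fully recoverable from the source):

```python
def sigma_zones(mean, sigma):
    """LCL/UCL pairs at 1, 2 and 3 sigma around the centre line,
    as used for the zones of a control chart like Figure 5.2."""
    return {k: (mean - k * sigma, mean + k * sigma) for k in (1, 2, 3)}

# Illustrative centre line and sigma for a defects-per-LOC scale
for k, (lcl, ucl) in sigma_zones(0.00045, 0.00013).items():
    print(f"{k} sigma: LCL={lcl:.5f}, UCL={ucl:.5f}")
```

A point crossing from one zone to another between consecutive samples, as observed between iProcurement-Vendor Management and BHEUU-SKHD, is the kind of "sudden change within the zones" noted in the analysis.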
5.2.2.3 Defect Density Result Communicating

In this process, all the defect density results and analysis are put into an Excel sheet together with the other measurements, which finally form the HeiTech Measurement and Analysis Report. The organizational measurement of defect density data was recorded in a file named Organizational Data-Defect V2.0. The complete report is usually submitted and presented as part of the monthly progress reporting to the Project Steering Committee.

5.2.3 Project Constraints

Several constraints were faced in accomplishing the organizational measurement of defect density:

i. Incomplete Data: This is one of the major problems faced by the EPG group in producing one solid result and analysis. The data collected are usually incomplete; for example, for code defects at the testing level, the data received are usually just for UAT, with no data for Unit Testing and System Testing.

ii. Specific Tool to Count SLOC: To obtain a solid result when comparing defect density data, the system size needs to be counted using a specific tool so that the result produced is more accurate.

iii. Limited Amount of Time: Because of the previous problems, time was also affected: measurement could not be done efficiently and effectively because of the pending data. Moreover, organizational measurement needs one complete set of data for each project, so that the forming pattern can be seen and, if the same problem occurs next time, it can be predicted easily.

CHAPTER 6

CONCLUSION

6.1 Towards Capability Maturity Model Integration Level 5 in HeiTech

Since HeiTech has been certified at CMMI Level 3, hard work needs to be put in to attain CMMI Level 5, which is part of its plan.
Currently, the focus is still on enhancing and maintaining the remaining processes, while, on the other hand, proactive action also needs to be taken, especially on establishing the Measurement and Analysis process together with Causal Analysis & Resolution and Defect Prevention.

6.2 Training Experience

The five-month training at HeiTech Padu Berhad brought many experiences, in particular a great deal of knowledge on how measurement and analysis activities are conducted in the organization, gained while working with the EPG team in the Practices & Compliance Management Department. The opportunity was also given to attend every meeting session with the consultant engaged by the EPG team on the CMMI process; practically every problem faced was discussed along with possible corrective actions to be taken.

Besides that, HeiTech introduced the IBM Rational Portfolio Manager tool to all project managers and project teams, and again the opportunity was given to attend this interesting training. The tool enables project managers and project teams to work efficiently, although several training sessions need to be conducted before they can fully adopt it.

Next was the Measurement and Analysis Workshop given by an ECCI consultant. The workshop was really helpful, since defect density is known as one of the metrics in the Measurement and Analysis process area. It was a one-day workshop covering Metrics and Measures, the Measurement and Analysis Process, Measurement and Analysis Setup, SPC and Statistical Techniques, the SPC Process and Technique, and Data Analysis.

A day after the Measurement and Analysis Workshop there was another workshop, on Causal Analysis & Resolution (CAR) and Defect Prevention. That was the continuation of the previous workshop.
This workshop discussed the CAR process, its interpretation and its setup, and the participants were also introduced and briefed on the levels of CAR and Defect Prevention. Better analysis techniques based on the defects or problems that occur were introduced to all participants; the methods and techniques covered included the Pareto chart, control chart, pie chart, bar graph and SWOT analysis. The chance to join one of the presentations on finding the root cause of recurring problems in a project or organization ended up very well, because it gave each participant the opportunity to use the analysis techniques introduced earlier.

6.3 Recommendation

There is no specific tool used in the organization to obtain the count for system size, so results received from different projects could not be analysed together, because each programmer has a different way of counting or measuring system size. A recommendation has therefore been made regarding code counter tools. Five code counter tools, each with its own feature list, were introduced:

i. GeroneSoft Code Counter Pro V1.32
ii. LOC Counter
iii. LocMetrics
iv. Krakatau Essential Project Management
v. Code Counter 3

Among these, GeroneSoft Code Counter Pro V1.32 is recommended as the best because of its capability to support multiple programming languages. Further details about the features of all the tools are described in Chapter 3.

6.4 Conclusion

Gaining CMMI Level 5 certification requires a lot of hard work and cooperation from everybody in the projects and the organization. Even after several workshops, a couple of project team members still had no awareness or knowledge of CMMI, and the same can be said about the Measurement and Analysis process. The thing is, sooner or later they will need to know about it anyway.
It is not just about the standard; it is about the guidance that keeps everybody on the proper track, where fewer risks need to be taken later.

REFERENCES

1. William A. Florac, Robert E. Park, Anita D. Carleton. Practical Software Measurement: Measuring for Process Management and Improvement. Software Engineering Institute, April 1997.
2. Stevenson, T.C. Software Engineering Productivity. London: Chapman & Hall, 1997.
3. HeiTech Measurement and Analysis Plan V-4.0. HeiTech Padu Berhad, December 2008.
4. Behrooz Parhami. Defect, Fault, Error, ..., or Failure? IEEE Transactions on Reliability, Vol. 46, No. 4, December 1997.
5. Dave Zubrow. Measurement with a Focus: Goal-Driven Software Measurement. Software Engineering Institute, September 1998.
6. Project Management Information System (PROMISE). http://www.ipractices.heitech.com.my/promise
7. Nachiappan Nagappan, Thomas Ball. Use of Relative Code Churn Measures to Predict System Defect Density. St. Louis, MO, USA, May 2005.
8. Les Hatton. Estimating Source Lines of Code from Object Code: Windows and Embedded Control Systems. University of Kingston, August 2005.
9. Yashwant K. Malaiya, Jason Denton. Estimating Defect Density Using Test Coverage. Colorado State University, Fort Collins, 2005.
10. Sandro Morasca. Software Measurement. Università dell'Insubria, Como, Italy, 1995.
11. Nachiappan Nagappan, Thomas Ball. Static Analysis Tools as Early Indicators of Pre-Release Defect Density. St. Louis, MO, USA, 2005.
12. Brad Clark, Dave Zubrow. How Good is the Software: A Review of Defect Prediction Techniques. Carnegie Mellon University, 2001.
13. CMM Level 4 Quantitative Analysis and Defect Prevention With Project Examples. http://www.mitre.org/work/tech_papers/tech_papers_00/florence_cmm_level/florence_cmm_level4.pdf
14. Cem Kaner, Walter P. Bond. Software Engineering Metrics: What Do They Measure and How Do We Know? 10th International Software Metrics Symposium, 2004.
15. Quantitative Quality Management through Defect Prediction and Statistical Process Control. http://www.cse.iitk.ac.in/jalote/papers/2WCSQPaper.pdf
16. Practical Software Measurement: Measuring for Process Management and Improvement. http://www.sei.cmu.edu/pub/documents/97.reports/pdf/97hb003.pdf
17. 12 Steps to Useful Software Metrics. http://www.westfallteam.com/software_metrics,_measurement_&_analytical_methods.htm
18. Requirements Volatility and Defect Density. http://www.cs.colostate.edu/~malaiya/reqvol.pdf
19. "Why is the defect density curve U-shaped with component size?" http://www.leshatton.org/Documents/Ubend_IS697.pdf
20. William A. Florac. Software Quality Measurement: A Framework for Counting Problems and Defects. Software Engineering Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, September 1992.
21. Defect Density Estimation Through Verification and Validation. http://agile.csc.ncsu.edu/devcop/papers/Sherriff_HCSS.pdf

APPENDIX A

PROJECT PLAN

Industrial Attachment Schedule: Organizational Measurement Analysis of Defect Density

APPENDIX B

DEFECT CLASSIFICATIONS

Code and Testing Defects

1. Standards: The structure of the program developed does not follow the expected organization standard. Initiated at unit testing level.
2. Logic error: The logic is incorrect, unclear, or incomplete. Initiated at unit testing level.
3. Computation error: A calculation is incorrect, unclear, or incomplete. Initiated at unit testing level.
4. Code optimization: Concerns the techniques used to produce an optimal system (e.g. reducing total memory usage and the total instruction path length needed to complete the program). Initiated at system testing level.
5. Error handling: No error handling is found in the application. Error handling refers to the anticipation, detection, and resolution of programming, application, and communication errors. Initiated at unit testing level.
6. Memory handling: A memory leak is detected because memory is not managed. Initiated at unit and system testing levels.
7. Security: The system developed is not secure; the vulnerabilities of the system have not been identified. Initiated at unit, system and acceptance testing levels.
8. Traceability: No relationship is established between two or more products of the development process. Initiated at system testing level.
9. Testability: The program developed is not testable in terms of principles, techniques and the documentation regarding software testing. Initiated at unit testing level.
10. Functional specification: Coded per the functional specifications, but the function described in the specification is incorrect, unclear, or incomplete. Initiated at unit, system and acceptance testing levels.
11. Design: Coded per the design specifications, but the design is incorrect, unclear or incomplete. Initiated at unit testing level.
12. Architecture: Selected architectural components do not work together as expected or planned. Initiated at unit testing level.
13. Interface: Coded per specification, but the human interface (e.g. screens, reports, input documents) defined in the specifications is incorrect, unclear, or incomplete. Initiated at unit, system and acceptance testing levels.
14. Database design: Unexpected results are obtained because of improper database design (e.g. incorrect primary key definition, incorrect data type). Initiated at system testing level.

Review Defects

1. Standards: The structure of the documentation does not follow the expected organization standard. To carry out the planning, requirement, design and code walkthrough reviews, those phases need to follow all expected standards; explicit technical criteria, methods, processes and practices are anticipated.
2. Completeness: The document covers only part of the significant quality requirements, standards or data.
Project planning review: the project scope or an appropriate method to complete the project is not explicitly defined. Requirement review: part of the requirements is missing or does not meet the expected requirement (out of scope). Design review: the architecture, components, modules, interfaces and data are only partially defined. Code walkthrough review: only part of the requirements is met, with incomplete coding and associated documents.
3. Consistency: The document is not written in consistent, clear and concise language or format. To carry out the planning, requirement, design and code walkthrough reviews, those phases need a consistent documentation structure in terms of the clarity of the language and format used.
4. Correctness: A document containing any of the sub-defects explained below (items 5 to 8) is considered to have a correctness defect.
5. Ambiguous: Multiple meanings are discovered in one term. A word, term, notation, sign, symbol, phrase, sentence or any other form used in writing can be interpreted in more than one way. For example, the word "light" can mean not very heavy or not very dark.
6. Syntax: Concerns the grammatical rules of a language and the way words are arranged to form phrases and sentences. Syntax is primarily concerned with the structure of sentences; for example, the words used should relate to one another so as to form a meaningful sentence.
7. Semantic: Caused by an incorrect logical statement found in the documentation (e.g. algorithms, calculations). A semantic error is a violation of the rules of meaning of a natural language, for example: "The Software Requirement Specification (SRS) describes the test preparations, test cases, and test procedures to be used to perform qualification testing of a system."
8. Comprehension: The document is not written in comprehensible language and is barely understandable.
Comprehension concerns how the sentences are put together in terms of understandability; sentences should be written in a simple and clear structure.
9. Configuration: Proper management of the documentation in terms of a convenient process is missing. Documentation should be traceable and linked (document control): managed to a high degree of reliability for security, versioning, visibility, availability and, most importantly, a controlled, reliable audit trail.
10. Requirement: A requirement should be clear, specific with no uncertainty, measurable in terms of specific values, testable, and complete without any contradictions. A requirement defect refers to a requirement defined with insufficient detail to code and test it; if it is accurately defined, complete test cases can be written from it.
11. Maintainability: The document as written cannot be modified in terms of flexibility and restructuring. Maintainability concerns how feasible it is to modify the documentation; for example, if a certain requirement needs to change, the change should not ripple through the whole document.

APPENDIX C

PROGRESS REPORT

CENTRE FOR ADVANCED SOFTWARE ENGINEERING (CASE), UNIVERSITI TEKNOLOGI MALAYSIA
Progress Report: Industrial Attachment 2 (13th October 2008 - 13th March 2009)
Prepared by: Nurdatillah Bt. Hasim (MC071313), MSc Software Engineering Real Time (RTSE)
Academic Mentor: Mohd Ridzuan Ahmad
Industrial Mentor: Lenny Ruby Atai, HeiTech Padu Berhad, Level 12, HeiTech Village, Persiaran Kewajipan, USJ 1, UEP Subang Jaya, P.O. Box 47509, Subang Jaya, Selangor Darul Ehsan, Malaysia. Phone: 016-3067182, Fax: 03-80243140

OCTOBER 2008

Week 1
- 13 October 2008 (Monday): Registration at HeiTech Padu Berhad.
- 14 October 2008 (Tuesday): Studied the existing standards for the measurement analysis plan and the measurement analysis report.
- 15-16 October 2008 (Wednesday-Thursday): Searched for relevant journals on software metrics; revised related topics. Note: refer to the HeiTech Application Development Methodology and Standard in ADVISE (Application Development Information System).
- 17 October 2008 (Friday): Continued revision.

Week 2
- 20 October 2008 (Monday): Received the project title, Organizational Measurement of Defect Density, and studied the problem given.
- 21-24 October 2008 (Tuesday-Friday): Started structuring the Gantt chart; studied defect density; searched for relevant journals on defect density; started writing the proposal.

Week 3
- 27 October 2008 (Monday): Public holiday.
- 28-29 October 2008 (Tuesday-Wednesday): Professional talks at CASE.
- 30-31 October 2008 (Thursday-Friday): Wrote, reviewed and submitted the proposal.

NOVEMBER 2008

Week 1
- 3-4 November 2008 (Monday-Tuesday): Studied the common defect types discovered during pre-deployment and post-deployment reviews; prepared a document of the defect type classification.
- 5 November 2008 (Wednesday): Submitted the document to the supervisor for review.
- 6-7 November 2008 (Thursday-Friday): Made corrections and several changes to the reviewed document.

Week 2
- 10 November 2008 (Monday): Meeting with the consultant. Topic: CMMI Level 3 Process Review.
- 11-13 November 2008 (Tuesday-Thursday): Thesis writing (literature review and analysis phase).
- 14 November 2008 (Friday): Meeting with the consultant. Topic: CMMI Level 3 Process Review. Point raised: the defect type classification needs more detail (sub-defect types should be considered).

Week 3
- 17-18 November 2008 (Monday-Tuesday): Edited the defect classification document per the requirements raised in the previous week's meeting: sub-defect types added, defect descriptions rephrased, and several examples included for better understanding.
- 19 November 2008 (Wednesday): Submitted the defect classification document to the supervisor for review.
- 20-21 November 2008 (Thursday-Friday): Thesis writing (literature review).

Week 4
- 24-26 November 2008 (Monday-Wednesday): Thesis writing (literature review).
- 27-28 November 2008 (Thursday-Friday): Thesis writing; meeting with the consultant. Topic: CMMI Level 4 and 5 Process Definition.

DECEMBER 2008

Week 1
- 1 December 2008 (Monday): Studied tools available for line-of-code counting; updated Chapter 3 (Literature Review).
- 2 December 2008 (Tuesday): Leave.
- 3 December 2008 (Wednesday): Unit meeting; discussed the objectives of the Practices & Compliance Management team.
- 4-5 December 2008 (Thursday-Friday): Studied line-of-code counting tools; updated Chapter 3 (Literature Review).

Week 2
- 8 December 2008 (Monday): Aidil Adha.
- 9 December 2008 (Tuesday): Leave.
- 10 December 2008 (Wednesday): Sultan of Selangor's birthday.
- 11-12 December 2008 (Thursday-Friday): Studied line-of-code counting tools; found several tools that could be considered for HeiTech practices.

Week 3
- 15-16 December 2008 (Monday-Tuesday): Updated the ADVISE site based on the newly developed template; PAT meeting (process improvement updates).
- 17-19 December 2008 (Wednesday-Friday): Updated the links and content of the ADVISE site.

Week 4
- 22 December 2008 (Monday): Leave.
- 23-24 December 2008 (Tuesday-Wednesday): Emergency leave (aunt's funeral).
- 25 December 2008 (Thursday): Christmas.
- 26 December 2008 (Friday): Leave.

Week 5
- 30 December 2008 (Monday): Emergency leave (grandmother's funeral).
- 31 December 2008 (Tuesday): Studied line-of-code counting tools; updated Chapter 3 (Literature Review).

JANUARY 2009

Week 1
- 1 January 2009 (Thursday): New Year.
- 2 January 2009 (Friday): Studied the methodology for conducting the measurement and analysis process; revised the process management flow.

Week 2
- 5-9 January 2009 (Monday-Friday): Studied the methodology for the measurement and analysis process; revised the defect prevention flow; updated Chapter 4 (Project Methodology).

Week 3
- 12 January 2009 (Monday): Medical leave.
- 13-16 January 2009 (Tuesday-Friday): Measurement and analysis process (implementing); updated Chapter 4 (Project Methodology).

Week 4
- 19-20 January 2009 (Monday-Tuesday): Preparation for the IBM Rational Portfolio Manager training.
- 21-23 January 2009 (Wednesday-Friday): IBM Rational Portfolio Manager training (Administrator and Project Manager).

Week 5
- 26-27 January 2009 (Monday-Tuesday): Chinese New Year.
- 28-30 January 2009 (Wednesday-Friday): Measurement and analysis process (implementing); updated Chapter 4 (Project Methodology).

FEBRUARY 2009

Week 1
- 2-6 February 2009 (Monday-Friday): Studied Schedule Variance (SV) and Earned Value (EV) calculation; created a report template for SV and EV calculation using the IBM RPM tools; thesis chapter completion.

Week 2
- 9-13 February 2009 (Monday-Friday): Studied SV and EV calculation; created the SV and EV report template using the IBM RPM tools; action plan meeting; preparation for the IBM RPM project team training.

Week 3
- 16-17 February 2009 (Monday-Tuesday): Thesis chapter completion.
- 18 February 2009 (Wednesday): IBM RPM tools training (Batch 1): timesheet management (how to fill in timesheets using IBM RPM) and work progress reporting (communication via IBM RPM).
- 19-20 February 2009 (Thursday-Friday): Updated the SV and EV report template using the IBM RPM tools.

Week 4
- 23-24 February 2009 (Monday-Tuesday): Thesis chapter completion.
- 25 February 2009 (Wednesday): Medical leave.
- 26-27 February 2009 (Thursday-Friday): Thesis chapter completion.

MARCH 2009

Week 1
- 2 March 2009 (Monday): Emergency leave.
- 3-6 March 2009 (Tuesday-Friday): Thesis chapter completion.

Week 2
- 9 March 2009 (Monday): Prophet Muhammad's birthday.
- 10 March 2009 (Tuesday): Measurement & Analysis Workshop.
- 11 March 2009 (Wednesday): Causal Analysis & Resolution and Defect Prevention Workshop.
- 12-13 March 2009 (Thursday-Friday): Thesis chapter completion.

APPENDIX D

USER MANUAL

GeroneSoft Code Counter Pro V1.32 Tutorial

Quick Run Through

It's easy to use the program; just follow these simple steps:

1. Run the program. You should see the setup screen.
2. In the "Source Code Folder" text field, add the folder(s) where your code is located, or click the "Select" button to browse for the folder, as shown in Figure 1.

Figure 1: Open Folder and Locate the Selected File

3. Select the types of files to count in the list box on the right side of the window. You may select more than one type, but you should select at least one.
4. Click the "Start Count" button to start counting. The Results screen should appear after a few moments. You may also click the "Results" tab to see a running count.
5. View the "Results" tab.

Other things you may want to do from here:

1. To change options for the count, go to the "Setup" tab and select the "Options" sub-tab.
2. To sort the results, click the corresponding column heading. Clicking the heading multiple times toggles the sort order (ascending or descending).
3. To view individual files, double-click the row and a syntax-highlighted view of your source code will be shown.
4. To copy the results to the clipboard, press "CTRL-C" or select "Edit->Paste to Clipboard" on the main menu. You may then paste the clipboard contents into your favorite spreadsheet (such as MS Excel).
5. To search for certain files, press "CTRL-F" and a find toolbar will appear at the bottom of the results.
6. To group results by file type/extension, right-click the results grid and check the "Group by File Type" option.
7. To print the results, click the "Print" button. You can set your default settings and preview the printout before actual printing.
8. To save the results, press the "Save" button. The results table will be saved in the format you select (XML, CSV, XLS or HTML).

Loading and Saving Profiles:

1. You may want to save the current settings of the program (source folder, file types to count and other info) so that you don't need to enter the details every time you wish to count.
This is useful if you want checkpoint counts of the same files for every release or so.
2. To save the current profile, just enter your desired settings and click the "Save" button on the Setup screen. You will be asked for the profile name to use.
3. To load a previously saved profile, just click the "Load" button and select the profile to use. The program's settings will automatically be set to the profile you load.

Creating your own file types:

1. If you want to count a certain file type that isn't supported by the program out of the box, you can create your own file type/comment style and allow the program to count those files properly.
2. To create your own file type, simply go to the "Config" tab on the main screen. Add a new extension by clicking the "Add..." button in the "Extension to Modify" list. Enter the new extension (e.g. "taz" without the quotes).
3. Now edit the file description found on the right-hand side of the screen (e.g. "Twisted Abnormal Zifro").
4. In v1.28 and above, you can also specify a column limit value. This is the column number after which any characters in the file are discarded (useful for Fortran or PL/I programs that have no usable characters after column 72, for example). If there is no such limit, just set the value to zero (0).
5. If the file uses the standard comment types used in most programs, simply check the comment(s) it uses and then click the "Save" button to save your new file type.
6. If your file uses a "non-standard" comment type, simply click the "New..." button in the comment styles area and enter a description of the comment (e.g. "Dot-Dash-Dot Comment"). Finally, enter the type of comment (single-line or multi-line block comment) and the characters the comment uses (e.g. start: "-.-").
7. If the comment characters need to appear in a certain column (e.g. for COBOL or Fortran programs), specify the column number in the "Fixed Column?" field.
If the comment characters can appear anywhere, just set this field to zero (0).
8. Again, save your changes/settings by clicking the "Save..." button. The new extension will now appear in the master list of extension types. You're now ready to count "taz" files!
9. Note: as of v1.32, there is now support for counting files without an extension (e.g. "configure" or "readme"). To do this, set the extension name/type to "<none>" (without the quotes). This special name is recognized by the software to match all files without an extension.

Command Line Usage

Starting with version 1.2, Code Counter Pro can count lines automatically through a basic command line interface. Usage of the program is as follows:

    ccounter.exe -auto <profile file> <output file>

Where:
-auto: informs the program to run in automatic mode. If not specified, the rest of the parameters are ignored and the program runs in normal GUI mode.
<profile file>: the filename of the profile you wish to use for this count process. (You need to create this file by running the full GUI version of the program first and saving your profiles.)
<output file>: the CSV output file you wish to generate.

If any of these parameters is missing, the program will generate an error message.

Important notes:

1. The program does not run fully in console mode, so the Windows system should still be available when the program is run (it only skips the user-interface interactions and runs the program automatically).
2. By default, the <profile file> and <output file> use the directory from which the "ccounter" executable is run. If your profiles or output files should be in separate directories, you need to specify the full paths.
3. The program does not check whether the output file already exists and will simply overwrite the file you provide, so please make sure that your output file is correct.
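The line-classification logic that code counters of this kind perform (splitting a file into blank, comment and code lines, as discussed for the SLOC counts in Chapter 5) can be sketched minimally. This is an illustration only, not Code Counter Pro's actual algorithm: it assumes Java-style // line comments and /* */ block comments, and `count_sloc` is a hypothetical helper name. Real tools handle per-language comment styles, string literals and mixed code/comment lines.

```python
# Minimal sketch of physical-SLOC counting: classify each line as blank,
# comment, or code. Assumes Java-style // and /* */ comments only;
# this is NOT the algorithm of any particular commercial tool.

def count_sloc(source: str):
    blank = comment = code = 0
    in_block = False          # inside a /* ... */ block comment?
    for raw in source.splitlines():
        line = raw.strip()
        if in_block:
            comment += 1
            if "*/" in line:
                in_block = False
        elif not line:
            blank += 1
        elif line.startswith("//"):
            comment += 1
        elif line.startswith("/*"):
            comment += 1
            in_block = "*/" not in line
        else:
            code += 1
    return {"blank": blank, "comment": comment, "code": code}

sample = """\
/* Demo class
   for counting */
public class Demo {

    // entry point
    public static void main(String[] args) {
        System.out.println("hi");
    }
}
"""
print(count_sloc(sample))
```

Differences in exactly these classification rules are what make SLOC figures from different counters (and different programmers) incomparable, which is why Section 6.3 recommends standardizing on a single tool.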
File Exclusion List

The file exclusion list is a feature introduced in v1.24 of the software.

1. In essence, this feature allows more fine-grained exclusion of certain files from the whole list of files found. For example, you may have configured the software to count all *.pas files but want to exclude *.pas files that start with a certain prefix, such as "common_".
2. In this case, you can specify a regular expression to exclude these types of files by entering it in the "Exclude" field. For this example the value in the field should be "common_.*\.pas" (in English: do not include files that start with "common_" and have the extension "pas").
3. Please note that the exclusion field matches the whole filename, including the full directory path (not just the filename itself). This also allows you to filter out directory paths that you don't want included.
4. For a more detailed reference on how regular expressions work, please see the Regular Expression Reference below.

Regular Expression Reference

Simple matches

Any single character matches itself, unless it is a metacharacter with a special meaning described below. A series of characters matches that series of characters in the target string, so the pattern "big" would match "big" in the target string. You can cause characters that normally function as metacharacters or escape sequences to be interpreted literally by 'escaping' them with a preceding backslash "\". For instance, the metacharacter "^" matches the beginning of the string, but "\^" matches the character "^"; likewise "\\" matches "\", and so on.

Examples:
foobar      matches the string 'foobar'
\^FooBarPtr matches '^FooBarPtr'

Escape sequences

Characters may be specified using an escape sequence syntax much like that used in C and Perl: "\n" matches a newline, "\t" a tab, etc. More generally, \xnn, where nn is a string of hexadecimal digits, matches the character whose ASCII value is nn.
If you need a wide (Unicode) character code, you can use '\x{nnnn}', where 'nnnn' is one or more hexadecimal digits.

  \xnn      char with hex code nn
  \x{nnnn}  char with hex code nnnn (one byte for plain text and two bytes for Unicode)
  \t        tab (HT/TAB), same as \x09
  \n        newline (NL), same as \x0a
  \r        carriage return (CR), same as \x0d
  \f        form feed (FF), same as \x0c
  \a        alarm (bell) (BEL), same as \x07
  \e        escape (ESC), same as \x1b

Examples:
  foo\x20bar  Matches 'foo bar' (note the space in the middle)
  \tfoobar    Matches 'foobar' preceded by a tab

Character classes

You can specify a character class by enclosing a list of characters in [], which will match any one character from the list. If the first character after the "[" is "^", the class matches any character not in the list.

Examples:
  foob[aeiou]r   Finds strings 'foobar', 'foober' etc. but not 'foobbr', 'foobcr' etc.
  foob[^aeiou]r  Finds strings 'foobbr', 'foobcr' etc. but not 'foobar', 'foober' etc.

Within a list, the "-" character is used to specify a range, so that a-z represents all characters between "a" and "z", inclusive. If you want "-" itself to be a member of a class, put it at the start or end of the list, or escape it with a backslash. If you want "]", place it at the start of the list or escape it with a backslash.

Examples:
  [-az]      Matches 'a', 'z' and '-'
  [az-]      Matches 'a', 'z' and '-'
  [a\-z]     Matches 'a', 'z' and '-'
  [a-z]      Matches all twenty-six lowercase characters from 'a' to 'z'
  [\n-\x0D]  Matches any of #10, #11, #12, #13
  [\d-t]     Matches any digit, '-' or 't'
  []-a]      Matches any char from ']' to 'a'

Metacharacters

Metacharacters are special characters which are the essence of regular expressions. The different types of metacharacters are described below.

Metacharacters - line separators

  ^   start of line
  $   end of line
  \A  start of text
  \Z  end of text
  .   any character in line

Examples:
  ^foobar   Matches string 'foobar' only if it is at the beginning of a line
  foobar$   Matches string 'foobar' only if it is at the end of a line
  ^foobar$  Matches string 'foobar' only if it is the only string in the line
  foob.r    Matches strings like 'foobar', 'foobbr', 'foob1r' and so on

By default, the "^" metacharacter is only guaranteed to match at the beginning of the input string/text, and the "$" metacharacter only at the end. Embedded line separators will not be matched by "^" or "$". You may, however, wish to treat a string as a multi-line buffer, such that "^" will match after any line separator within the string, and "$" will match before any line separator. You can do this by switching on the modifier /m. The \A and \Z are just like "^" and "$", except that they won't match multiple times when the modifier /m is used, while "^" and "$" will match at every internal line separator. The "." metacharacter by default matches any character, but if you switch off the modifier /s, then '.' won't match embedded line separators.

Regular expressions work with line separators as recommended at www.unicode.org (http://www.unicode.org/unicode/reports/tr18/):

"^" matches at the beginning of the input string and, if modifier /m is on, also immediately following any occurrence of \x0D\x0A or \x0A or \x0D (if you are using the Unicode version of TRegExpr, then also \x2028 or \x2029 or \x0B or \x0C or \x85). Note that there is no empty line within the sequence \x0D\x0A.

"$" matches at the end of the input string and, if modifier /m is on, also immediately preceding any occurrence of \x0D\x0A or \x0A or \x0D (if you are using the Unicode version of TRegExpr, then also \x2028 or \x2029 or \x0B or \x0C or \x85). Note that there is no empty line within the sequence \x0D\x0A.

"." matches any character, but if you switch off modifier /s then it does not match \x0D\x0A, \x0A or \x0D (if you are using the Unicode version of TRegExpr, then also \x2028, \x2029, \x0B, \x0C and \x85).

Note that "^.*$" (an empty-line pattern) does not match the empty string within the sequence \x0D\x0A, but matches the empty string within the sequence \x0A\x0D.

Multi-line processing can easily be tuned for your own purposes with the help of the TRegExpr properties LineSeparators and LinePairedSeparator: you can use only Unix-style separators \n, only DOS/Windows-style \r\n, mix them together (as described above and used by default), or define your own line separators.

Metacharacters - predefined classes

  \w  an alphanumeric character (including "_")
  \W  a non-alphanumeric character
  \d  a numeric character
  \D  a non-numeric character
  \s  any space (same as [ \t\n\r\f])
  \S  a non-space character

You may use \w, \d and \s within custom character classes.

Examples:
  foob\dr      Matches strings like 'foob1r', 'foob6r' and so on, but not 'foobar', 'foobbr'
  foob[\w\s]r  Matches strings like 'foobar', 'foob r', 'foobbr' and so on, but not 'foob1r', 'foob=r'

TRegExpr uses the properties SpaceChars and WordChars to define the character classes \w, \W, \s and \S, so you can easily redefine them.

Metacharacters - word boundaries

  \b  Match a word boundary
  \B  Match a non-(word boundary)

A word boundary (\b) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W.

Metacharacters - iterators

Any item of a regular expression may be followed by another type of metacharacter: iterators. Using these metacharacters you can specify the number of occurrences of the previous character, metacharacter or sub-expression.

  *       zero or more ("greedy"), similar to {0,}
  +       one or more ("greedy"), similar to {1,}
  ?       zero or one ("greedy"), similar to {0,1}
  {n}     exactly n times ("greedy")
  {n,}    at least n times ("greedy")
  {n,m}   at least n but not more than m times ("greedy")
  {n}?    exactly n times ("non-greedy")
  {n,}?   at least n times ("non-greedy")
  {n,m}?  at least n but not more than m times ("non-greedy")

So digits in curly brackets of the form {n,m} specify the minimum number of times to match the item (n) and the maximum (m). The form {n} is equivalent to {n,n} and matches exactly n times. The form {n,} matches n or more times. There is no limit to the size of n or m, but large numbers will chew up more memory and slow down execution. If a curly bracket occurs in any other context, it is treated as a regular character.

Examples:
  foob.*r      Matches strings like 'foobar', 'foobalkjdflkj9r' and 'foobr'
  foob.+r      Matches strings like 'foobar' and 'foobalkjdflkj9r', but not 'foobr'
  foob.?r      Matches strings like 'foobar', 'foobbr' and 'foobr', but not 'foobalkj9r'
  fooba{2}r    Matches the string 'foobaar'
  fooba{2,}r   Matches strings like 'foobaar', 'foobaaar', 'foobaaaar' etc.
  fooba{2,3}r  Matches strings like 'foobaar' or 'foobaaar', but not 'foobaaaar'

A little explanation about "greediness": "greedy" takes as many characters as possible, "non-greedy" takes as few as possible. For example, 'b+' and 'b*' applied to the string 'abbbbc' return 'bbbb', 'b+?' returns 'b', 'b*?' returns an empty string, 'b{2,3}?' returns 'bb', and 'b{2,3}' returns 'bbb'. You can switch all iterators into "non-greedy" mode (see the modifier /g).

Metacharacters - alternatives

You can specify a series of alternatives for a pattern using "|" to separate them, so that fee|fie|foe will match any of "fee", "fie" or "foe" in the target string (as would f(e|i|o)e). The first alternative includes everything from the last pattern delimiter ("(", "[", or the beginning of the pattern) up to the first "|", and the last alternative contains everything from the last "|" to the next pattern delimiter.
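The character-class, anchor and iterator semantics described above are not specific to TRegExpr. As an illustrative sketch, the same behaviour can be checked with Python's re module, whose re.MULTILINE and re.DOTALL flags play the roles of TRegExpr's /m and /s modifiers (note one difference: TRegExpr has /s on by default, while Python's DOTALL is off by default):

```python
import re

# Character classes: [aeiou] matches one vowel; [^aeiou] negates the class.
assert re.search(r"foob[aeiou]r", "foobar")
assert not re.search(r"foob[aeiou]r", "foobbr")
assert re.search(r"foob[^aeiou]r", "foobbr")

# '-' is literal at the start or end of a class, or when backslash-escaped.
for pattern in (r"[-az]", r"[az-]", r"[a\-z]"):
    assert re.fullmatch(pattern, "-") and re.fullmatch(pattern, "a")

# Anchors: without the multi-line flag, ^ matches only at the start of the
# input; with it (like TRegExpr's /m), ^ also matches after line separators.
text = "foobar\nbaz"
assert re.search(r"^foobar", text)
assert not re.search(r"^baz", text)
assert re.search(r"^baz", text, re.MULTILINE)

# '.' matches a line separator only in "single line" mode (like /s on).
assert not re.search(r"foobar.baz", text)
assert re.search(r"foobar.baz", text, re.DOTALL)

# Greedy vs non-greedy iterators, using the 'abbbbc' example from the text.
s = "abbbbc"
assert re.search(r"b+", s).group() == "bbbb"
assert re.search(r"b+?", s).group() == "b"
assert re.search(r"b*?", s).group() == ""
assert re.search(r"b{2,3}?", s).group() == "bb"
assert re.search(r"b{2,3}", s).group() == "bbb"
```

Each assertion mirrors one of the worked examples in the text, so running the script confirms the greedy/non-greedy results quoted above.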
For this reason, it's common practice to include alternatives in parentheses, to minimize confusion about where they start and end.

Alternatives are tried from left to right, so the first alternative found for which the entire expression matches is the one that is chosen. This means that alternatives are not necessarily greedy. For example, when matching foo|foot against "barefoot", only the "foo" part will match, as that is the first alternative tried, and it successfully matches the target string. (This might not seem important, but it is important when you are capturing matched text using parentheses.)

Also remember that "|" is interpreted as a literal within square brackets, so if you write [fee|fie|foe] you're really only matching [feio|].

Examples:
  foo(bar|foo)  Matches the strings 'foobar' or 'foofoo'.

Metacharacters - subexpressions

The bracketing construct (...) may also be used to define sub-expressions (after parsing, you can find sub-expression positions, lengths and actual values in the MatchPos, MatchLen and Match properties of TRegExpr, and substitute them in template strings with TRegExpr.Substitute). Sub-expressions are numbered in left-to-right order of their opening parenthesis. The first sub-expression has number '1' (the whole match has number '0' - you can substitute it in TRegExpr.Substitute as '$0' or '$&').

Examples:
  (foobar){8,10}   Matches strings which contain 8, 9 or 10 instances of 'foobar'
  foob([0-9]|a+)r  Matches 'foob0r', 'foob1r', 'foobar', 'foobaar' etc.

Metacharacters - backreferences

The metacharacters \1 through \9 are interpreted as backreferences: \<n> matches the previously matched sub-expression #<n>.

Examples:
  (.)\1+          Matches 'aaaa' and 'cc'
  (.+)\1+         Also matches 'abab' and '123123'
  (['"]?)(\d+)\1  Matches '"13"' (in double quotes), '4' (in single quotes) or '77' (without quotes)

Modifiers

Modifiers change the behaviour of TRegExpr. There are many ways to set up modifiers.
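As a sketch of the alternation, sub-expression and backreference behaviour just described, the same conventions (left-to-right alternation, groups numbered from 1, group 0 as the whole match) can be demonstrated with Python's re module, used here purely for illustration:

```python
import re

# Alternatives are tried left to right: against "barefoot", foo|foot
# stops at the first alternative that matches, yielding just 'foo'.
assert re.search(r"foo|foot", "barefoot").group() == "foo"

# Sub-expressions are numbered from 1 by their opening parenthesis;
# group 0 is the whole match.
m = re.search(r"foob([0-9]|a+)r", "foobaar")
assert m.group(0) == "foobaar"
assert m.group(1) == "aa"

# A backreference \1 repeats whatever group 1 actually matched.
assert re.search(r"(.)\1+", "xaaaay").group() == "aaaa"
assert re.fullmatch(r"(['\"]?)(\d+)\1", '"13"')  # quotes must pair up
assert re.fullmatch(r"(['\"]?)(\d+)\1", "77")    # or be absent entirely
```

The last two assertions show why the optional-quote example in the text works: when group 1 matches an empty string, the trailing \1 also matches an empty string.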
Any of these modifiers may be embedded within the regular expression itself using the (?...) construct. Besides that, you can assign values to the appropriate TRegExpr properties (ModifierX, for example, to change /x, or ModifierStr to change all modifiers together). The default values for new instances of the TRegExpr object are defined in global variables; for example, the global variable RegExprModifierX defines the value of a new TRegExpr instance's ModifierX property.

  i  Do case-insensitive pattern matching (using your system's installed locale settings); see also InvertCase.
  m  Treat the string as multiple lines. That is, change "^" and "$" from matching only at the very start or end of the string to matching at the start or end of any line anywhere within the string; see also Line separators.
  s  Treat the string as a single line. That is, change "." to match any character whatsoever, even line separators (see also Line separators), which it normally would not match.
  g  Non-standard modifier. Switching it off switches all following operators into non-greedy mode (by default this modifier is on). So, if modifier /g is off, then '+' works as '+?', '*' as '*?' and so on.
  x  Extend your pattern's legibility by permitting whitespace and comments (see the explanation below).

The modifier /x itself needs a little more explanation. It tells TRegExpr to ignore whitespace that is neither backslashed nor within a character class. You can use this to break up your regular expression into (slightly) more readable parts. The # character is also treated as a metacharacter introducing a comment, for example:

  (
  (abc)  # comment 1
  |      # you can use spaces to format the r.e. - TRegExpr ignores them
  (efg)  # comment 2
  )

This also means that if you want real whitespace or # characters in the pattern (outside a character class, where they are unaffected by /x), you'll either have to escape them or encode them using octal or hex escapes. Taken together, these features go a long way towards making regular expression text more readable.
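The commented, whitespace-formatted pattern style above can be tried directly. Python's re.VERBOSE flag behaves like TRegExpr's /x here (whitespace outside a character class is ignored and # starts a comment), and the inline (?i) prefix illustrates the (?...) embedded-modifier construct; this is a sketch of the idea, not TRegExpr itself:

```python
import re

# re.VERBOSE plays the role of /x: whitespace and # comments are ignored,
# so the pattern below is simply (abc)|(efg), laid out readably.
pattern = re.compile(r"""
    (abc)   # comment 1
    |       # spaces are ignored, so the pattern can be formatted freely
    (efg)   # comment 2
""", re.VERBOSE)

assert pattern.search("xxabcxx").group() == "abc"
assert pattern.search("xxefgxx").group() == "efg"

# Modifiers can also be embedded in the pattern with the (?...) construct;
# (?i) turns on case-insensitive matching, like modifier /i.
assert re.search(r"(?i)FooBar", "foobar")
```

A real space or # inside a verbose pattern must be escaped or written as a hex escape such as \x20, exactly as the text notes for /x.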