Chapter 6 Software Testing – Strategies

• Software testing was the first software quality assurance tool applied to control the software product’s quality before its shipment or installation.
• At first, testing was confined to the final stage of development, after the entire package had been completed.
• Later, as early detection of software defects came to be recognized as important, SQA professionals were encouraged to extend testing to the partial, in-process products of coding, which led to software module (unit) testing and integration testing.

• Software testing is undoubtedly the largest consumer of software quality assurance resources.
• In a survey performed in November 1994, Perry (1995) found that, on average, 24% of the project development budget was allocated to testing.
• With respect to time resources, an average of 27% of project time was scheduled for testing.
• The survey’s participants also indicated that they planned to allocate substantially more time (45% on average) to testing, but that the pressures typically arising toward the close of projects generally forced project managers to reduce the scheduled testing time.

Software tests – definitions
• Testing is the process of executing a program with the intention of finding errors. Activities can include:
– code checks performed by a team leader,
– trial runs of the software performed by a colleague,
– tests carried out by a testing unit.
• Much more formal and controlled are the two definitions of testing suggested by IEEE:
– The process of operating a system or component under specified conditions, observing or recording the results, and making an evaluation of some aspect of the system or component.
– The process of analyzing a software item to detect the differences between existing and required conditions (that is, bugs) and to evaluate the features of the software item.
• Software testing is a formal process carried out by a specialized testing team in which a software unit, several integrated software units or an entire software package are examined by running the programs on a computer. All the associated tests are performed according to approved test procedures on approved test cases.

Comparison of the key characteristics of software testing with those of other software quality assurance life cycle tools:
• Software test plans are part of the project’s development and quality plans, are scheduled in advance, and are often a central item in the development agreement signed between the customer and the developer.
• In other words, ad hoc examination of software by a colleague or regular checks by the programming team leader cannot be considered software tests.
• An independent team or external consultants who specialize in testing are assigned to perform these tasks, mainly in order to eliminate bias and to guarantee effective testing by trained professionals.
• It is generally accepted that tests performed by the developers themselves will yield poor results, as the individuals who developed the original product will find it difficult to reveal errors that they were unable to identify earlier.
• Any form of quality assurance activity that does not involve running the software, for example code inspection, cannot be considered a test.
• The testing process is performed according to a test plan and testing procedures that have been approved as conforming to the SQA procedures adopted by the developing organization.
• The test cases to be examined are defined in full by the test plan. No omissions or additions are expected to occur during testing.
• In other words, once the process has begun, the tester is not allowed to exercise discretion by omitting a test case he or she considers redundant or by adding a new test case, however promising it may be.
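The rule that approved test cases are executed in full, with no omissions or additions, can be mirrored in a harness that receives the plan as a fixed input and records the outcome of every case. The Python sketch below is a minimal illustration under stated assumptions: the unit under test (`buggy_max`), the case identifiers and the expected values are all hypothetical.

```python
from typing import Callable, NamedTuple


class PlannedCase(NamedTuple):
    """One approved test case: identifier, input arguments, expected output."""
    case_id: str
    args: tuple
    expected: object


# Hypothetical approved plan. A tuple is used so the runner cannot add or drop cases.
APPROVED_PLAN = (
    PlannedCase("TC-01", (2, 3), 3),
    PlannedCase("TC-02", (5, -1), 5),
    PlannedCase("TC-03", (4, 4), 4),
)


def buggy_max(a, b):
    """Hypothetical unit under test; it wrongly returns `a` in both branches."""
    return a if a > b else a


def run_plan(func: Callable, plan: tuple) -> list:
    """Execute every approved case, in plan order, and record each outcome."""
    results = []
    for case in plan:
        actual = func(*case.args)
        results.append({
            "case_id": case.case_id,
            "expected": case.expected,
            "actual": actual,
            "passed": actual == case.expected,
        })
    return results


if __name__ == "__main__":
    for row in run_plan(buggy_max, APPROVED_PLAN):
        print(row)
```

The runner never decides which cases to run; it only reports. The recorded failures (here only `TC-01`) form the kind of error record that later corrective and preventive actions can draw on.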
Software testing objectives
Direct objectives:
– To identify and reveal as many errors as possible in the tested software.
– To bring the tested software, after correction of the identified errors and retesting, to an acceptable level of quality.
– To perform the required tests efficiently and effectively, within budgetary and scheduling limitations.
Indirect objective:
– To compile a record of software errors for use in error prevention (by corrective and preventive actions).
• If your goal is to show the absence of errors, you won’t discover many. If your goal is to show the presence of errors, you will discover a large percentage of them. (Myers, 1979)
• Bug-free software is still a utopian aspiration.
• Therefore, an acceptable level of quality means that a certain percentage of bugs, tolerable to the users, will remain unidentified upon installation of the software.

Software testing strategies
• Although test methodologies may vary, often greatly, they are applied within the framework of two basic testing strategies:
– To test the software in its entirety, once the completed package is available; this is known as “big bang testing”.
– To test the software piecemeal, in modules, as they are completed (unit tests), and then to test groups of tested modules integrated with newly completed modules (integration tests). This process continues until all the package modules have been tested. Once this phase is completed, the entire package is tested as a whole (system test). This testing strategy is usually termed “incremental testing”.

Incremental testing
• Furthermore, incremental testing is itself performed according to two basic strategies: bottom-up and top-down.
• Both incremental testing strategies assume that the software package is constructed as a hierarchy of software modules.
• In top-down testing, the first module tested is the main module, the highest-level module in the software structure; the last modules to be tested are the lowest-level modules.
• In bottom-up testing, the order of testing is reversed: the lowest-level modules are tested first, with the main module tested last.

Bottom-up versus top-down strategies
• The main advantage of the bottom-up strategy is the relative ease of its performance, whereas its main disadvantage is the lateness at which the program as a whole can be observed (that is, only after the last module has been tested).
• The main advantage of the top-down strategy is the possibility it offers to demonstrate the entire program’s functions shortly after activation of the upper-level modules has been completed.
• In many cases, this characteristic allows for early identification of analysis and design errors related to algorithms, functional requirements, and the like.
• Clearly, testers should follow the developers’ approach, because it is crucial that testing be performed immediately after a module has been coded.
• Implementation of a testing strategy that differs from the development strategy will cause substantial delays in the scheduling of the tests.

Big bang versus incremental testing
The disadvantages of big bang testing:
• If the program is not small and simple, application of big bang testing has severe disadvantages.
• Identification of errors becomes quite cumbersome when dealing with immense quantities of software (the error identification rate of big bang testing is relatively low).
• When confronted with an entire software package, error correction is often difficult, as it requires consideration of the possible effects of the correction on several modules at one and the same time.
• Estimation of the required testing resources and testing schedule is rather fuzzy.
• The prospects of keeping on schedule and within budget are substantially reduced when this testing strategy is applied.

Incremental testing
The advantages of incremental testing:
(1) Incremental testing is usually performed on relatively small software modules, as unit or integration tests.
(2) This makes it easier to identify a higher percentage of errors than when testing the entire software package.
(3) Identification and correction of errors is much simpler and requires fewer resources because it is performed on a limited volume of software.
(4) In incremental testing, a large share of the errors is identified and corrected at an earlier stage of development and testing, which prevents the “migration” of escaped defects to a later, more complex stage of development, where their correction would require significantly greater resources.
• The only disadvantage of incremental testing is the need to carry out numerous testing operations for the same program (big bang testing requires only a single testing operation).

Software test classification according to testing concept
• There is an ongoing debate over whether testing the functionality of software solely according to its outputs is sufficient to achieve an acceptable level of quality.
• Some claim that the internal structure of the software and the calculations (the underlying mathematical structure, also known as the software mechanism) should be included for satisfactory testing.
• Based on these two opposing concepts or approaches to software quality, two testing classes have been developed:
Black box (functionality) testing:
• Identifies bugs only according to software malfunctioning as revealed in its erroneous outputs.
• In cases where the outputs are found to be correct, black box testing disregards the internal path of calculations and processing performed.
White box (structural) testing:
• Examines internal calculation paths in order to identify bugs.
• Although the term “white” is meant to emphasize the contrast between this method and black box testing, the method’s other name, “glass box testing”, better expresses its basic characteristic, that of investigating the correctness of code structure.
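Returning to the incremental strategies described earlier: top-down and bottom-up testing differ only in the order in which the levels of the module hierarchy are tested. The Python sketch below derives both orders from a module hierarchy. The module names are hypothetical, and the sketch assumes a strict tree (each module is called by exactly one higher-level module).

```python
from collections import deque

# Hypothetical module hierarchy: each module maps to the modules it calls.
HIERARCHY = {
    "main": ["input", "process", "report"],
    "input": ["parse"],
    "process": ["calc", "validate"],
    "report": [],
    "parse": [],
    "calc": [],
    "validate": [],
}


def hierarchy_levels(hierarchy, root):
    """Group modules by their depth below the root (breadth-first traversal)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        module = queue.popleft()
        for child in hierarchy.get(module, []):
            if child not in depth:
                depth[child] = depth[module] + 1
                queue.append(child)
    max_depth = max(depth.values())
    return [[m for m in depth if depth[m] == d] for d in range(max_depth + 1)]


def top_down_order(hierarchy, root):
    """Main module first, lowest-level modules last."""
    return [m for level in hierarchy_levels(hierarchy, root) for m in level]


def bottom_up_order(hierarchy, root):
    """Lowest-level modules first, main module last."""
    return [m for level in reversed(hierarchy_levels(hierarchy, root)) for m in level]


if __name__ == "__main__":
    print("top-down: ", top_down_order(HIERARCHY, "main"))
    print("bottom-up:", bottom_up_order(HIERARCHY, "main"))
```

In the top-down order the whole program becomes observable early, because `main` is tested first. The bottom-up order tests the easier, self-contained leaf modules first, and the program as a whole appears only at the end.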
(Table: the McCall model, extended to classify the tests carried out to ensure full coverage of the respective requirements.)

White box testing
• Data processing and calculation correctness tests: In order to perform these tests (“white box correctness tests”), every path must be examined. This type of verification allows us to decide whether the processing operations and their sequences were programmed correctly for the path in question.
• Maintainability tests: Maintainability tests refer to special features, such as those installed for detection of causes of failure, module structures that support software adaptations and software improvements, etc.
• Software qualification tests: Here the focus shifts to examining whether the software code (including comments) complies with coding standards and work instructions.
• Reusability tests: These examine the extent to which reused software is incorporated in the package, and the adaptations performed in order to make parts of the current software reusable for future software packages.

Correctness tests and line coverage
• The line coverage concept requires that, for full line coverage, every line of code be executed at least once during the process of testing.
• The line coverage metric for the completeness of a line-testing (“basic path testing”) plan is defined as the percentage of lines actually executed, that is, covered, during the tests.
• In a flow chart, diamonds represent the options covered by conditional statements (decisions), whereas rectangles or a succession of rectangles represent the software sections connecting those conditional statements.
• In program flow graphs, nodes represent software sections and thus replace one or more flow chart rectangles.
• The edges indicate the sequence of software sections. Nodes having two or more leaving edges represent conditional statements.
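Line coverage as defined above can be measured by recording which lines execute while the test cases run. The Python sketch below does this with the standard `sys.settrace` hook and counts statement lines with the `ast` module. The module under test, `fare_cents`, is a hypothetical simplification of the ITS one-time-passenger fare rules given in the taximeter example that follows. It assumes integer yards, whole waiting minutes, a caller-supplied night flag, and rounding of the night supplement to the nearest cent; none of these details is fixed by the original rules.

```python
import ast
import sys

FILENAME = "<fare_module>"  # pseudo-filename used to recognise traced frames

# Hypothetical, simplified ITS fare for one-time passengers, in cents.
SOURCE = """\
def fare_cents(yards, wait_minutes, suitcases, night):
    fare = 200
    if yards > 1000:
        fare += 25 * ((yards - 1000 + 249) // 250)
    if wait_minutes > 3:
        fare += 20 * ((wait_minutes - 3 + 1) // 2)
    if suitcases > 1:
        fare += 100 * (suitcases - 1)
    if night:
        fare = (fare * 125 + 50) // 100
    return fare
"""

namespace = {}
exec(compile(SOURCE, FILENAME, "exec"), namespace)
fare_cents = namespace["fare_cents"]


def statement_lines(source):
    """Line numbers of all statements, excluding function/class header lines."""
    headers = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return {
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.stmt) and not isinstance(node, headers)
    }


def line_coverage(func, test_cases):
    """Run the test cases; return (coverage %, sorted list of unexecuted lines)."""
    executed = set()

    def tracer(frame, event, arg):
        if frame.f_code.co_filename != FILENAME:
            return None  # ignore frames outside the module under test
        if event == "line":
            executed.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        for args in test_cases:
            func(*args)
    finally:
        sys.settrace(None)

    all_lines = statement_lines(SOURCE)
    covered = executed & all_lines
    return 100.0 * len(covered) / len(all_lines), sorted(all_lines - covered)


if __name__ == "__main__":
    print(line_coverage(fare_cents, [(500, 2, 1, False)]))
    print(line_coverage(fare_cents, [(500, 2, 1, False), (1300, 6, 2, True)]))
```

Line numbers are counted within `SOURCE`. A single short daytime trip covers 60% of the statement lines and leaves lines 4, 6, 8 and 10 (the extra-distance, extra-waiting, extra-suitcase and night-supplement branches) unexecuted. That list is what a tester would use to design further test cases; adding one long night trip with luggage brings coverage to 100%.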
Example – the Imperial Taxi Services (ITS) taximeter
• Imperial Taxi Services (ITS) serves one-time passengers and regular clients (identified by a taxi card). The ITS taxi fares for one-time passengers are calculated as follows:
(1) Minimal fare: $2. This fare covers a distance traveled of up to 1000 yards and waiting time (stopping for traffic lights, traffic jams, etc.) of up to 3 minutes.
(2) For every additional 250 yards or part thereof: 25 cents.
(3) For every additional 2 minutes of stopping or waiting, or part thereof: 20 cents.
(4) One suitcase: no charge; each additional suitcase: $1.
(5) Night supplement: 25%, effective for journeys between 21.00 and 06.00.

• Code that complies with coding standards and work instructions should make it easier for the team leader to check the software, for a replacement programmer to comprehend the code and continue coding tasks, and for the maintenance programmer to correct bugs and/or update or change the program upon request.

Advantages and disadvantages of white box testing
The main advantages of white box testing are:
• Direct statement-by-statement checking of code enables determination of software correctness as expressed in the processing paths, including whether the algorithms were correctly defined and coded.
• It allows line coverage follow-up (using specialized software packages) that provides the tester with lists of lines of code that have not yet been executed. The tester can then initiate test cases to cover these lines of code.
• It ascertains the quality of coding work and its adherence to coding standards.
The main disadvantages of white box testing are:
• The vast resources utilized, much above those required for black box testing of the same software package.
• The inability to test software performance in terms of availability (response time), reliability, load durability, and other testing classes related to operation, revision and transition factors.
• The characteristics of white box testing limit its use to software modules of very high risk and very high cost of failure, where it is highly important to identify and fully correct as many of the software errors as possible.

Black box testing: equivalence classes for output correctness tests
• The output correctness tests apply the concept of test cases.
• Equivalence class partitioning is a black box method aimed at increasing the efficiency of testing and, at the same time, improving coverage of potential error conditions.
• An equivalence class (EC) is a set of input variable values that produce the same output results or that are processed identically.
• EC boundaries are defined by a single numeric or alphabetic value, a group of numeric or alphabetic values, a range of values, and so on.
• A test case that includes more than one invalid EC may not allow the tester to distinguish between the program’s separate reactions to each of the invalid ECs.
• Hence, the number of test cases required for the invalid ECs equals the number of invalid ECs.
• Compared to the use of a random sample of test cases, equivalence classes save testing resources because they eliminate duplication of the test cases defined for each EC.

Test cases and boundary values
• According to the definition of equivalence classes, one test case should be sufficient for each class.
• When equivalence classes cover a range of values (e.g., monthly income, apartment area), the tester has a special interest in testing boundary values when these are considered error-prone.
• In these cases, the preparation of three test cases, for the mid-range, lower boundary and upper boundary values, is recommended.

Example – the Golden Splash Swimming Center
• The following example illustrates the definition of (valid and invalid) equivalence classes and the corresponding test case values.
• The software module in question calculates entrance ticket prices for the Golden Splash Swimming Center.
• The Center’s ticket price depends on four variables: day (weekday, weekend), visitor’s status (OT = one time, M = member), entry hour (6.00–19.00, 19.01–24.00) and visitor’s age (up to 16, 16.01–60, 60.01–120).

Other operation factor testing classes
• Apart from output correctness tests, operation factor testing classes include the following classes of tests:

Documentation tests
• An erroneous user manual or programmer manual can lead to mistakes during program operation and maintenance that may cause damage as severe as that caused by software bugs.
• Common components of documentation supplied by the developer are:
– Installation manual: In commercial software packages (COTS software), the installation manual usually includes customization instructions.
– User manual: In many cases, the user manual is supplied as a computerized help manual.
– Programmer manual: It includes the information required for maintaining the system (bug corrections, adaptation to changing requirements and software improvement), the program structure, a description of the program logic including algorithms, and so on.
• Documentation tests include:
– Document completeness check: Its purpose is to check whether all the required documents have been completed as specified and as intended by the designer.
– Document correctness tests: These determine whether the instructions listed in the user documents are correct.
– Document style and editing inspection: Refers to document clarity.

Availability tests
• Availability is defined as reaction time: the time needed to obtain the requested information, or the time required for firmware installed in computerized equipment to react.
• The tests need to be carried out under regular operation load as well as under maximal load conditions, as specified in the requirement specifications.
• It should be noted that the availability requirements for regular and maximal workloads usually differ.
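Returning to the Golden Splash example: the equivalence-class rules above (one value for a single-value class; lower boundary, mid-range and upper boundary values for a range class; a separate test case for each invalid class) can be applied mechanically. The Python sketch below encodes the four variables, expressing entry hour in minutes after midnight (6.00 = 360, 24.00 = 1440) and treating “up to 16” as starting at age 0. The invalid classes and their representative values are assumptions for illustration. Ticket prices are not computed because they are not given here.

```python
VALID_ECS = {
    "day": ["weekday", "weekend"],
    "status": ["OT", "M"],
    "entry_minutes": [(360, 1140), (1141, 1440)],  # 6.00-19.00, 19.01-24.00
    "age": [(0, 16), (16.01, 60), (60.01, 120)],
}

# Assumed invalid classes, one representative value per invalid EC.
INVALID_ECS = {
    "day": ["unknown"],
    "status": ["X"],
    "entry_minutes": [300, 1500],  # before opening / beyond 24.00
    "age": [-1, 130],
}


def representatives(ec):
    """Single-value EC: that value. Range EC: lower boundary, mid-range, upper boundary."""
    if isinstance(ec, tuple):
        low, high = ec
        return [low, (low + high) / 2, high]
    return [ec]


def valid_test_cases():
    """Combine valid values so that every representative value appears at least once."""
    columns = {
        var: [value for ec in ecs for value in representatives(ec)]
        for var, ecs in VALID_ECS.items()
    }
    count = max(len(values) for values in columns.values())
    return [
        {var: values[i % len(values)] for var, values in columns.items()}
        for i in range(count)
    ]


def invalid_test_cases():
    """One test case per invalid EC: a valid baseline with exactly one invalid value."""
    baseline = valid_test_cases()[0]
    cases = []
    for var, bad_values in INVALID_ECS.items():
        for bad in bad_values:
            case = dict(baseline)
            case[var] = bad
            cases.append(case)
    return cases


if __name__ == "__main__":
    for case in valid_test_cases() + invalid_test_cases():
        print(case)
```

Several valid values share each test case, which keeps the valid set at nine cases. Each invalid value gets its own case, so the program’s reaction to each invalid EC can be observed separately.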
Reliability tests
• The software system reliability requirement deals with features that can be translated into events occurring over time, such as the average time between failures (e.g., 500 hours), the average time for recovery after system failure (e.g., 15 minutes) or the average downtime per month (e.g., 30 minutes per month).
• Reliability requirements are to be in effect during regular full-capacity operation of the system.
• It should be noted that, in addition to the software factor, reliability tests also relate to the effects of the hardware, the operating system and the data communication system.

Stress tests
a. Load tests:
• Load tests relate to the functional performance of the system under maximal operational load: maximal transactions per minute, hits per minute to an Internet site and the like.
• Load tests, which are usually conducted for loads higher than those indicated in the requirements specification, are of utmost importance for software systems planned to serve a large population of users simultaneously.
• Manual performance of load tests is impractical for most software systems; they are therefore carried out by computerized tests based on comprehensive simulations of high loads, similar to the procedures adopted for availability testing.
• They allow us to ascertain whether upgrading is necessary and which changes should be made to allow the software system to meet the planned requirements.
b. Durability tests:
• Durability tests are carried out in physically extreme operating conditions such as high temperatures, humidity, and high-speed driving along unpaved rural roads.
• Hence, these durability tests are typically required for real-time firmware integrated into systems such as weapon systems, long-distance transport vehicles, and meteorological equipment.
• Durability issues for firmware include firmware responses to climatic and physical effects such as extreme hot and cold temperatures, dust and road bumps, and to extreme operation failures resulting from sudden electrical failure, voltage “jumps” in the supply mains, sudden cutoffs in communications, and so on.
• Information system software durability tests focus on operation failures resulting from sudden electrical failures, voltage “jumps” in the supply mains and sudden cutoffs in communications.

Software system security tests
• The security components of software systems are aimed at preventing unauthorized access to the system or parts of it, detecting unauthorized access and the activities performed by the penetrator, and recovering from damage caused by unauthorized penetration.
• The main security issues dealt with by these tests are:
– Access control, where the usual requirement is for control of multilevel access (usually by a password mechanism). Of special importance here are the firewall systems that prevent unauthorized access to Internet sites.
– Backup of databases and software files, and recovery in cases of system failure.

Training usability tests
• When large numbers of users are involved in operating a system, training usability requirements are added to the testing agenda.
• The scope of training usability is defined by the resources needed to train a new employee, in other words, how many hours of training are required for a new employee to achieve a defined level of acquaintance with the system.

Operational usability tests
• The focus of this class of tests is the aspects of the system that affect the performance regularly achieved by system operators. These tests are of high importance in cases where the workings of the system can substantially affect the productivity of its users.
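Returning to the availability and load tests described earlier: both measure reaction time under a specified number of simultaneous requests, and such loads are generated by computerized simulation rather than manually. The Python sketch below simulates this with a thread pool standing in for a server of limited capacity. The service time, pool size, load levels and response-time limits are all hypothetical values. A real test would drive the actual system under test instead of `time.sleep`.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def serve_request(service_time):
    """Hypothetical stand-in for the system under test: busy for `service_time` seconds."""
    time.sleep(service_time)
    return time.perf_counter()


def measure_response_times(simultaneous_requests, server_workers=4, service_time=0.01):
    """Submit all requests together; return each one's response time in seconds."""
    with ThreadPoolExecutor(max_workers=server_workers) as pool:
        # All requests are treated as submitted at the same instant.
        submitted = time.perf_counter()
        futures = [pool.submit(serve_request, service_time)
                   for _ in range(simultaneous_requests)]
        finished = [future.result() for future in futures]
    return [t - submitted for t in finished]


def availability_report(times, max_allowed):
    """Summarise response times and check them against a required maximum."""
    return {
        "mean": statistics.mean(times),
        "max": max(times),
        "meets_requirement": max(times) <= max_allowed,
    }


if __name__ == "__main__":
    # Hypothetical requirements: regular and maximal loads have different limits.
    regular = availability_report(measure_response_times(5), max_allowed=0.05)
    maximal = availability_report(measure_response_times(40), max_allowed=0.20)
    print("regular load:", regular)
    print("maximal load:", maximal)
```

Because the pool serves only a few requests at a time, response times grow with the number of simultaneous requests. This is why the regular-load and maximal-load requirements are checked separately, each against its own limit.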
Revision factor testing classes
• Maintainability tests examine whether:
– The system structure abides by the standards and development procedures imposed on the specific components for support of future maintenance activities.
– The programmer’s manual is prepared according to approved documentation standards and provides complete system documentation.
– The internal documentation incorporated in the software code is prepared to cover the system’s documentation requirements.
• Flexibility tests
– Flexibility is required for adapting the software to the variety of customer needs and for improving system functionality.
– Flexibility tests are intended to test the software characteristics that support flexibility, such as an adequate modular structure and the application of parametric options to provide a wide range of possible applications.
• Testability tests
– Testability requirements deal with the ease of testing the software system.
– Testability here relates to the addition of special features in the program that help the testers in their work, such as the possibility of obtaining intermediate results at certain checkpoints and predefined log files.
– Another objective of testability deals with diagnostic tool applications implemented for the analysis of system performance and the reporting of any failure found. Some features of this kind are activated automatically when starting the software package or during regular operation, and report whenever conditions warranting an alarm arise.

Transition factor testing classes
• Portability tests
– Portability requirements specify the environments in which the software system has to be operable.
– The portability tests to be carried out will verify, validate and test these factors, as well as estimate the resources required for transferring a software system to a different environment.
• Reusability tests
– Reusability defines which parts of the program (modules, integrations and the like) are to be developed for future reuse in other software development projects.
– Reusability requirements are of special importance for object-oriented software projects. Tests are therefore devised to examine whether reusability standards were indeed adhered to.
• Software interoperability tests
• Equipment interoperability tests

Advantages and disadvantages of black box testing
The main advantages of black box testing are:
• Black box testing makes it possible to carry out system performance test classes, such as load tests and availability tests, which white box testing cannot perform.
• For testing classes that can be carried out by both white and black box tests, black box testing requires fewer resources than white box testing of the same software package.
The main disadvantages of black box testing are:
• The possibility that a coincidental aggregation of several errors will produce the correct response for a test case and thus prevent error detection.
• Black box tests do not readily identify cases of errors that counteract each other to accidentally produce the correct output.
• Absence of control of line coverage: black box tests may leave a substantial proportion of the code lines unexecuted, because no test case in the set covers them.
• Impossibility of testing the quality of coding and its strict adherence to the coding standards.
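The disadvantage that errors may counteract each other can be shown with a small, hypothetical Python example: an averaging function containing two errors that cancel out for some inputs. A black box test whose input happens to fall into that set passes. Only a second test case, or white box inspection of the code, reveals the errors.

```python
def correct_average(values):
    """Reference behaviour from the (hypothetical) specification."""
    return sum(values) / len(values)


def buggy_average(values):
    """Hypothetical implementation containing two errors."""
    total = 0
    for value in values[1:]:  # error 1: the first element is skipped
        total += value
    return total / (len(values) - 1)  # error 2: divides by n - 1 instead of n


def black_box_test(values):
    """Compare outputs only, ignoring how they were computed."""
    return buggy_average(values) == correct_average(values)


if __name__ == "__main__":
    print(black_box_test([4, 4, 4]))  # prints True: the two errors cancel out
    print(black_box_test([1, 2, 3]))  # prints False: 2.5 instead of 2.0
```

The two errors cancel whenever the first element equals the mean of the whole list, so a test set containing only such inputs would report the module as correct.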