Chapter 11: Testing

According to the authors, "Testing is the process of finding differences between the behavior specified by the system models and the observed behavior of the system." In effect, the main purpose of testing is to break the system. During the development phases (system design, object design, etc.) our main goal was to build a system that satisfies the requirements and specifications of the client. Having completed building the system, we now need to break it. The idea is that if we cannot break the system, then it is very likely that we did a good job; that is, the system has met the requirements and specifications provided by the client. Note also that tests should be carried out by qualified persons who are somewhat familiar with the whole system; it is better, however, that a tester not be one of the developers. Furthermore, a tester should be familiar with testing techniques.

Different types of tests are used at different stages of the application. For example, there are:
1. Unit tests.
2. Structural tests.
3. Functional tests.
4. Performance tests.

Some techniques for conducting tests that fall under quality control are:

1. Fault Avoidance Techniques: try to prevent the occurrence of errors and failures by finding faults in the system before it is released. They include:
(a) Development methodologies: provide techniques that reduce the introduction of faults into the system models and code. These include unambiguous representation of requirements, minimizing coupling and maximizing coherence, use of data abstraction and encapsulation, capture of rationale for maintenance, and early definition of subsystem interfaces.
(b) Configuration management: avoids faults caused by undisciplined change to the system. Developers need to be notified when changes are made, so that other parts of the system that may be affected by those changes can be updated.
(c) Verification: attempts to find faults before any execution of the system. We assume that the preconditions are true and verify that the postconditions are met. (A minimal sketch of this idea appears at the end of this list.)
(d) Review: a manual inspection of the system without actually executing it. There are two types, walkthrough and inspection. A walkthrough goes through the code line by line, trying to identify errors. An inspection, which is similar to a walkthrough, checks the code against the requirements, checks the algorithm for efficiency, and checks the comments for accuracy. The developer is not present during an inspection.

2. Fault Detection Techniques: attempt to find faults in the system during development and, in some cases, after release. These techniques do not attempt to recover from the fault; an example is the "black box" carried on an aircraft, which records the reasons for a crash without preventing it. There are two types of fault detection techniques:
(a) Debugging: the developer steps through a number of system states, finally arriving at the error, and is then able to fix the underlying fault. It is finding an error in what is regarded as an unplanned way. There are two types of debugging: correctness debugging finds deviations between the observed behavior and the specified functional requirements, while performance debugging finds deviations between the observed behavior and the nonfunctional requirements, such as response time.
(b) Testing: finding an error in a planned way. Note that a successful test is a test that was able to find errors. The developer tries to find errors before delivery of the system; the idea is to choose test data that gives the greatest chance of making the system fail.

Testing activities include:
(i) Unit Testing: tries to find faults in participating objects and/or subsystems with respect to the use cases from the use case model.
(ii) Integration Testing: the activity of finding faults when the individually tested components are tested together. The system structure test is the culmination of integration testing, whereby all of the components are tested together.
(iii) System Testing: tests all of the components together, seen as a single system, with respect to functional requirements, design goals, etc. It includes: functional testing, which tests the requirements from the RAD (Requirements Analysis Document) and, if available, from the user manual; performance testing, which checks the nonfunctional requirements and additional design goals and is done by the developers; and acceptance and installation testing, which check the requirements against the project agreement and are done by the client with support from the developers.

3. Fault Tolerance Techniques: there may be cases in which we are unable to prevent errors. If so, we must be able to recover from them at run time. Fault tolerance techniques are critical in highly reliable systems, for example a life-support unit or the space shuttle (whose five onboard computers provide modular redundancy).
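To make the verification and planned-testing ideas above concrete, here is a minimal sketch of checking a precondition and a postcondition around a single operation. The Account class and its withdraw method are hypothetical, invented for illustration; run with java -ea so the assertions are enabled.

    // Hypothetical operation guarded by an explicit precondition and postcondition.
    class Account {
        private int balance;
        Account(int balance) { this.balance = balance; }

        void withdraw(int amount) {
            // Precondition: assumed to hold before the operation runs.
            assert amount > 0 && amount <= balance : "precondition violated";
            int before = balance;
            balance -= amount;
            // Postcondition: verified after the operation has run.
            assert balance == before - amount : "postcondition violated";
        }

        int getBalance() { return balance; }
    }

    // A small planned test: the test data (withdrawing the full balance)
    // is chosen deliberately to probe a likely failure case.
    class AccountTestDriver {
        public static void main(String[] args) {
            Account a = new Account(100);
            a.withdraw(100);
            assert a.getBalance() == 0;
            System.out.println("Balance after withdrawal: " + a.getBalance());
        }
    }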
Testing concepts:
1. A component: a part of the system that can be isolated for testing; it could be an object, a group of objects, or one or more subsystems.
2. A fault: a bug or defect; a design or coding mistake that may cause abnormal component behavior.
3. An error: the manifestation of a fault during the execution of the system.
4. A failure: a deviation between the specification of a component and its behavior, triggered by one or more errors.
5. A test case: a set of inputs and expected results that exercises a component with the purpose of causing failures and detecting faults.
6. A test stub: a partial implementation of components on which the tested component depends.
7. A test driver: a partial implementation of a component that depends on the tested component.
8. A correction: a change to a component that repairs a fault.

Faults, errors, and failures: As noted, a fault can be the result of bad coding or design. Figure 11.3 on page 443 shows an example of a fault: the workmen may have miscommunicated, resulting in tracks that do not align. The same thing can happen in actual software development. The programmers may be divided into groups, each group responsible for one subsystem; through lack of communication, the subsystems cannot be integrated properly, even though each subsystem works correctly on its own. A fault becomes an error only once the piece of code is executed. In the example given, the line is tested via a use case: a train is run on the track, which leads to a derailment (a failure of the system).

Test cases: A test case has five attributes:

Attribute   Description
Name        The name of the test case
Location    The full pathname of the executable
Input       The input data or commands
Oracle      The expected test results, against which the output of the test is compared
Log         The output produced by the test

The name of a test case should reflect the component that is tested, so it should include part of the component's name. The location describes where the test case is found, that is, the pathname or URL of the executable program and its inputs. The input describes the set of inputs to be used, entered either by the tester or by a test driver. The expected behavior (output) is described by the oracle, and the log is a correlation of the observed behavior with the expected behavior over various runs.

Testing must be done in some sort of sequence: if a test case depends on the result of another test, then that other test should be completed first. This may seem trivial for small applications, but when applications consist of millions of lines of code and involve dozens of programmers, it becomes very important indeed. Coordination is the key.
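As a concrete illustration, here is a minimal sketch of how the five attributes of a test case might be represented in code. The class and field names are illustrative assumptions, not from the book.

    // A test case bundles a name, a location, an input, an oracle, and a log.
    class TestCase {
        String name;     // reflects the component under test, e.g. "TestScheduler1"
        String location; // full pathname or URL of the executable and its inputs
        String input;    // input data or commands, entered by a tester or a driver
        String oracle;   // expected results against which the output is compared
        String log;      // output actually produced when the test is run

        TestCase(String name, String location, String input, String oracle) {
            this.name = name;
            this.location = location;
            this.input = input;
            this.oracle = oracle;
        }

        // The test detects a failure when the log deviates from the oracle.
        boolean passed() {
            return oracle.equals(log);
        }
    }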
Test cases are classified into "black-box" and "white-box" tests, depending on which aspect of the system model is tested. A black-box test deals with the input/output behavior of the component, not with its structure or internal behavior. A white-box test focuses on the internal structure; it makes sure that every state in the dynamic model of the object and every interaction among the objects is tested. Unit testing involves both black-box and white-box testing: it tests the input/output behavior as well as the structural and dynamic aspects of the component.

Black-box Testing:
Black-box testing focuses on the functional requirements of the software. It allows the tester to choose inputs that exercise all functional requirements. It attempts to find errors in the following categories:
(a) Incorrect or missing functions.
(b) Interface errors.
(c) Errors in data structures or external database access.
(d) Performance errors.
(e) Initialization and termination errors.

Black-box testing tends to be applied later in the software development process, unlike white-box testing, which is applied earlier. Some questions used to guide a black-box test are:
1. How is functional validity tested?
2. What classes of input will make good test cases?
3. Is the system particularly sensitive to certain input values?
4. How are the boundaries of a data class isolated?
5. What data rates and data volumes can the system tolerate?
6. What effect will specific combinations of data have on system operation?

The first step in black-box testing is to understand the objects that are modeled in the software and the relationships that connect these objects. Once this has been done, the next step is to define a series of tests that verify that "all objects have the expected relationships to one another." To do this, the software engineer creates a graph with a collection of nodes (representing the objects), links or edges (representing the relationships), and node weights (representing the properties of a node). A link can be directed (the relationship holds in one direction), bidirectional (it holds in both directions), or parallel (a number of different relationships hold between two nodes).

The figure shows that a menu select on "new file" generates a document window. The node weight of the document window provides a list of attributes that are expected when the window is generated, and the link weight indicates that the window must be generated in less than one second. An undirected link establishes a symmetric relationship between the "new file" menu select and the document text, and parallel links exist between the document window and the document text.

White-box Testing:
This test uses the control structure of the code to design the test cases. The tests derived for white-box testing:
1. Guarantee that all independent paths within a module have been exercised at least once.
2. Exercise all logical decisions on their true and false sides (see the sketch after this list).
3. Execute all loops at their boundaries and within their operational bounds.
4. Exercise internal data structures to assure their validity.
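A minimal sketch of criterion 2 above: exercising a logical decision on both its true side and its false side. The classify function is hypothetical, invented here for illustration; run with java -ea.

    // Hypothetical function containing a single logical decision.
    class Threshold {
        static String classify(int load) {
            if (load > 100) {          // the decision under test
                return "overload";
            } else {
                return "normal";
            }
        }
    }

    class ThresholdTest {
        public static void main(String[] args) {
            assert Threshold.classify(150).equals("overload"); // true side
            assert Threshold.classify(50).equals("normal");    // false side
            assert Threshold.classify(100).equals("normal");   // value on the boundary
            System.out.println("Both sides of the decision have been exercised.");
        }
    }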
One technique used for white-box testing is the basis-path testing technique. The test case designer derives a logical complexity measure of the design and uses this measure as a guide for defining a basis set of execution paths. Test cases derived to exercise the basis set are guaranteed to execute every statement in the program at least once during testing. A flow graph of the application must first be derived: small sub-graphs represent the basic coding constructs (sequence, if, while, do-until, and case), and these sub-graphs are used to convert the program into a flow graph. The final graph is then scanned for independent paths, and each independent path requires a separate test case. In this way, no line of code in the program escapes testing. For an example, check the attached diagrams.

Test stubs and drivers: When we want to test a single component of a system, we need to separate that component from the rest of the system. We do this by creating stubs to represent the parts of the system the component relies on, and drivers to carry out the test. The stubs simulate the parts of the system that are called by the component under test; a stub may therefore supply the values, etc., that the tested component requires, and those values effectively provide the test for the component. Note: a test stub should simulate the component it substitutes for as closely as possible, or the tested component may not be adequately tested. For this reason it is sometimes even better to carry out the test using the actual component that the stub would have simulated. (A sketch of a stub and a driver follows the inspection steps below.)

Corrections: Once a problem has been found, corrections are made. A correction may be simple and apply only to the component under test, or it may be more involved, requiring changes to an entire class or subsystem. In some cases entire subsystems need to be redesigned, which may in turn introduce new faults. The authors suggest several techniques for tracking and handling any new faults that may arise:
1. Problem tracking: if the entire process of finding and correcting errors is documented, it becomes easy to revisit those portions of the system with the intent of finding faults.
2. Regression testing: re-executing all prior tests after a change.
3. Rationale maintenance: justifying the changes that are made against the requirements of the subsystem.

Testing activities:
1. Inspecting Components: Inspections find faults in a component by reviewing its source code, and can be done before or after the unit test. Fagan suggested a five-step method for inspection:
(a) Overview: The author of the component briefly presents the purpose and scope of the component and the goals of the inspection.
(b) Preparation: The reviewers become familiar with the implementation of the component.
(c) Inspection meeting: A reader paraphrases the source code of the component (reads each source code statement and explains what it should do), and the inspection team raises issues with the component. A moderator keeps the meeting on track.
(d) Rework: The author revises the component.
(e) Follow-up: The moderator checks the quality of the rework and determines whether the component needs to be re-inspected.
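To make the stub-and-driver idea above concrete, here is a minimal sketch. All names are hypothetical: a Scheduler component under test depends on a Storage component, so a stub stands in for Storage and a driver exercises Scheduler. Run with java -ea.

    // The component under test depends on this interface.
    interface Storage {
        String lookup(String key);
    }

    // Component under test.
    class Scheduler {
        private final Storage storage;
        Scheduler(Storage storage) { this.storage = storage; }

        String describe(String id) {
            String entry = storage.lookup(id);
            return (entry == null) ? "unknown" : "task: " + entry;
        }
    }

    // Test stub: simulates the Storage component that Scheduler calls,
    // supplying exactly the values the test needs.
    class StorageStub implements Storage {
        public String lookup(String key) {
            return key.equals("42") ? "backup job" : null;
        }
    }

    // Test driver: a partial implementation of a component that depends on
    // the tested component; it feeds in inputs and checks them against the oracle.
    class SchedulerTestDriver {
        public static void main(String[] args) {
            Scheduler s = new Scheduler(new StorageStub());
            assert s.describe("42").equals("task: backup job");
            assert s.describe("99").equals("unknown");
            System.out.println("Scheduler unit test passed.");
        }
    }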
2. Unit Testing: This focuses on the building blocks of the system, that is, the objects and subsystems. There are three main reasons for unit testing:
(a) It reduces the complexity of the overall test activity by letting us focus on smaller units, one at a time.
(b) It makes it easier to pinpoint and correct faults.
(c) It allows parallelism in testing, i.e., each component can be tested independently of the others.

There are many unit testing techniques:

Equivalence testing: a black-box testing technique that minimizes the number of test cases. The possible inputs are partitioned into equivalence classes, and a test case is selected for each class; only one member of an equivalence class needs to be tested. The test consists of two steps: identification of the equivalence classes and selection of the test inputs. To identify the equivalence classes we use:
Coverage: every possible input belongs to one of the equivalence classes.
Disjointedness: no input belongs to more than one equivalence class.
Representation: if execution demonstrates an error when a particular member of an equivalence class is used, then the same error should appear when any other member of the class is used.

For each equivalence class, two pieces of data are used: a typical input and an invalid input. In the example given, for a method that returns the number of days in a month, three equivalence classes were found for the month parameter: months with 31 days, months with 30 days, and February with either 28 or 29 days. Invalid inputs are non-positive integers and integers bigger than 12. Two equivalence classes were found for the year parameter, leap years and non-leap years, with negative integers being invalid. Together this yields six equivalence classes that need to be tested (table 11.2, page 455).

Boundary testing: focuses on the conditions at the boundaries of the equivalence classes; the testing requires that elements be selected from the "edges" of each class. Continuing the example above: generally, years that are multiples of 4 are leap years, but years that are multiples of 100 are not leap years unless they are also multiples of 400. For example, 2000 was a leap year but 1900 was not, even though both are multiples of 4 and of 100. Hence 1900 and 2000 are good boundary cases for the year, and 0 and 13 are good boundary cases for the month. (A sketch of both techniques follows the path-testing discussion below.)

Path testing: a white-box testing technique that identifies faults in the implementation of the component. The idea behind path testing is that each line of code is exercised at least once, so if a fault exists on any of the paths tested, it will be found. To carry out this test, a flow graph of the source must be developed. For the example method, the flow graph on page 456 was developed. Note that only decisions were taken into consideration; there are no looping structures in this method. There are five if statements in the code (page 457), represented by the diamond shapes, and the activities are represented by the rounded rectangles, seven in all: two for the exceptions and five for the ifs and their elses. The table on page 458 (table 11.4) shows the test cases and the paths; note that there are six test cases, indicating six paths.

Note that even though path testing can be used with OO languages, it was developed specifically for imperative languages, so polymorphism, for example, requires more test cases than can be computed from the cyclomatic complexity formula. Also, because OO methods are shorter, fewer control faults may be uncovered: OO systems make more use of inheritance, so the tests require the involvement of a larger number of objects.

Note also two things. Because path testing is heavily dependent on the structure of the program, the problem of a value such as 1900 not being a leap year was not found: the test was made only for the year modulo 4, not also modulo 100 and 400. Furthermore, none of the path tests was able to detect that August was missing from the set of months consisting of 31 days.
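Here is a minimal sketch of equivalence and boundary testing for the example above. The method name getNumDaysInMonth and its exact behavior are assumptions about the book's example; the test driver picks one typical member per equivalence class plus the boundary cases 1900 and 2000. Run with java -ea.

    // Hypothetical implementation under test.
    class MonthUtil {
        static int getNumDaysInMonth(int month, int year) {
            if (month < 1 || month > 12) throw new IllegalArgumentException("month");
            if (year < 0) throw new IllegalArgumentException("year");
            switch (month) {
                case 4: case 6: case 9: case 11:
                    return 30;
                case 2:
                    boolean leap = year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
                    return leap ? 29 : 28;
                default:
                    return 31;
            }
        }
    }

    class MonthUtilTest {
        public static void main(String[] args) {
            // One typical input per equivalence class.
            assert MonthUtil.getNumDaysInMonth(7, 1901) == 31;  // 31-day month
            assert MonthUtil.getNumDaysInMonth(6, 1901) == 30;  // 30-day month
            assert MonthUtil.getNumDaysInMonth(2, 1901) == 28;  // February, non-leap year
            assert MonthUtil.getNumDaysInMonth(2, 1904) == 29;  // February, leap year
            // Boundary cases for the year.
            assert MonthUtil.getNumDaysInMonth(2, 2000) == 29;  // multiple of 400: leap
            assert MonthUtil.getNumDaysInMonth(2, 1900) == 28;  // multiple of 100 only: not leap
            System.out.println("Equivalence and boundary tests passed.");
        }
    }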
State-based testing: This technique focuses on OO systems. It compares the resulting state of the system with the expected state, deriving the test cases from the statechart diagram for the class. Similar to equivalence testing, a representative set of stimuli is derived for each state, and the attributes of the class are checked after each stimulus is applied. The example on page 460 tests the watch from chapter 2; the states tested are MeasureTime and SetTime. State-based testing is a difficult method and is still not fully developed as a testing technique. Owing to its difficulty, the hope is to automate it, which would make it easier to use.

3. Integration Testing: Once unit testing has been successful, it is time to integrate the units into larger components: classes and/or subsystems, or larger subsystems. Integration testing should detect faults that the unit tests did not find; some of these faults lie in the interfaces used to integrate the smaller objects into bigger subsystems. The idea is to start small: integrate two objects first and test them; then, if no faults occur, add another object, then another, and so on. The key to easier and perhaps more successful integration testing is the ordering of the components. A number of ordering strategies have been developed (based on the assumption that the system components stand in a hierarchical relationship to each other):
Big bang testing
Bottom-up testing
Top-down testing
Sandwich testing

Big bang testing: assumes that all components are first tested individually and then put together and tested as a whole. This can be expensive: if a fault is found during the big test, it is difficult to locate and fix, especially in huge programs. Interface failures may also be difficult to distinguish from component failures.

Bottom-up testing: all components of the bottom layer are tested individually and then integrated with the layer above. This continues until the entire system has been tested. Note: when two components at the same level are tested together, it is known as a double test (three components tested together is a triple test, and four a quadruple test). Test drivers are used to simulate the components that are not under test.

Top-down testing: the reverse of bottom-up testing, i.e., the components of the top layer are tested first, and the lower layers are progressively integrated.

Both of these tests have advantages and disadvantages, and which is chosen depends on the tester/developer. In the case of bottom-up testing, an advantage is that interface faults can be more easily found; the disadvantage is that the top-level interface components are tested last, and these may be some of the more important components. If faults are found in them, many of the lower components may have to be revised, which of course means that a large number of components must be retested. The advantage of top-down testing is that all interface components are tested first, so if faults are found, corrections can be made to the lower components before they are even tested. The disadvantage is that the development of test stubs can be time consuming and error prone, because a large number of stubs is required. The figure on page 356 shows how both of these tests are implemented.
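As a concrete illustration of the bottom-up ordering, here is a minimal sketch with hypothetical components: two bottom-layer components, Parser and Formatter, are tested first with a driver, and only then is the layer above them, ReportService, integrated and tested. Run with java -ea.

    // Bottom-layer components.
    class Parser {
        int parse(String s) { return Integer.parseInt(s.trim()); }
    }
    class Formatter {
        String format(int n) { return "value=" + n; }
    }

    // The layer above, which integrates the two bottom-layer components.
    class ReportService {
        private final Parser parser = new Parser();
        private final Formatter formatter = new Formatter();
        String report(String raw) { return formatter.format(parser.parse(raw)); }
    }

    class BottomUpDriver {
        public static void main(String[] args) {
            // Step 1: the driver tests each bottom-layer component individually.
            assert new Parser().parse(" 7 ") == 7;
            assert new Formatter().format(7).equals("value=7");
            // Step 2: integrate with the layer above and test the combination.
            // No stubs are needed, because the real lower-layer components
            // have already been tested.
            assert new ReportService().report(" 7 ").equals("value=7");
            System.out.println("Bottom-up integration test passed.");
        }
    }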
Sandwich testing: combines the top-down and bottom-up tests, trying to make use "of the best of both strategies". The idea is to re-map the subsystem decomposition into three layers: a middle (target) layer, a layer above it, and a layer below it. The components in the top and bottom layers are used as is (no stubs are written); the middle layer is the focus, and the other two layers are tested in parallel. A major disadvantage is that the components in the target layer are not tested properly, if at all. The modified sandwich test corrects this problem, although more stubs and drivers are required. Nevertheless, the modified sandwich test is shorter than either the bottom-up or the top-down test.

4. System Testing: Once the unit and integration tests are completed, the system as a whole must be tested to make sure it meets both the functional and nonfunctional requirements. System testing includes:
(a) Functional Testing: tests the functional requirements from the use cases.
(b) Performance Testing: tests the nonfunctional requirements.
(c) Pilot Testing: tests of common functionality among a selected group of end users.
(d) Acceptance Testing: usability, functional, and performance tests done in the developers' environment by the customer against the acceptance agreement.
(e) Installation Testing: usability, functional, and performance tests done in the customer's environment by the customer against the acceptance agreement.

Functional Testing: also called requirements testing, it tries to find differences between the functional requirements and the system. A black-box testing method is used, i.e., boundary conditions are tested. The test cases are derived from the use case model. Figures 11.24 and 11.25 give an example using the use case PurchaseTicket; note the features that are likely to fail and that are actually tested (page 360).

Performance Testing: attempts to find differences between the design goals selected during system design and the system. It may include:
Stress testing: checks whether the system can respond to many simultaneous requests (a sketch follows the pilot-testing discussion below).
Volume testing: attempts to find faults associated with large amounts of data, such as static limits imposed by the data structures.
Security testing: attempts to find security faults in the system. Few systematic methods exist for this; usually it is carried out by teams of individuals who try to break into the system using their experience and knowledge.
Timing tests: attempt to find behavior that violates the timing constraints described by the nonfunctional requirements.
Recovery tests: evaluate the ability of the system to recover from errors, such as hardware failures.
After all of these tests have been completed without finding errors, the system is said to be validated.

Pilot Testing: if the software is developed to be placed on the market, a group of people is invited to test the software and give feedback. If, on the other hand, it is developed for a particular client, a group of users is chosen to test the system; they behave as if the system were installed permanently and test it as thoroughly as possible, without any test guidelines. An alpha test is a test carried out by users in the developer's environment; a beta test is carried out in the users' environment by a limited number of users. Beta tests are much more common, especially with the use of the Internet.
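Returning to the stress test mentioned above, here is a minimal sketch: many simultaneous requests are fired at a component, and the oracle checks that none were lost. The CounterService component is hypothetical. Run with java -ea.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicInteger;

    // Hypothetical component that must remain consistent under load.
    class CounterService {
        private final AtomicInteger count = new AtomicInteger();
        void request() { count.incrementAndGet(); }
        int total()    { return count.get(); }
    }

    class StressTest {
        public static void main(String[] args) throws InterruptedException {
            CounterService service = new CounterService();
            int clients = 50;
            int requestsPerClient = 1000;
            ExecutorService pool = Executors.newFixedThreadPool(clients);
            for (int i = 0; i < clients; i++) {
                pool.submit(() -> {
                    for (int j = 0; j < requestsPerClient; j++) {
                        service.request();   // one simultaneous request
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            // Oracle: every request must have been handled; none lost.
            assert service.total() == clients * requestsPerClient;
            System.out.println("Handled " + service.total() + " requests.");
        }
    }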
Acceptance Testing: there are three ways in which the client evaluates a system during acceptance testing:
Benchmark test: a set of test cases is prepared that represents the typical conditions under which the system will operate.
Competitor test: used when a new system is replacing an old system; the two are tested against each other.
Shadow test: the new and old systems are run in parallel and their outputs compared (see the sketch at the end of this section).
If all is well, the customer accepts the system. If not, the developers are notified of what is wrong, and they will modify, delete, or add features as specified by the client.

Installation Testing: after acceptance, the system is installed in the client's environment, and the installation test is carried out to make sure that the system is properly installed and that all of the requirements are met.
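As a concrete illustration of a shadow test, here is a minimal sketch: the old and new systems are run in parallel on the same inputs and their outputs are compared, with the old system serving as the oracle. All class names are hypothetical.

    // Common interface implemented by both systems (hypothetical).
    interface FareCalculator {
        int fare(int zones);
    }

    // The old system, currently in production.
    class LegacyFareCalculator implements FareCalculator {
        public int fare(int zones) { return 120 + 80 * zones; }
    }

    // The new system that is meant to replace it.
    class NewFareCalculator implements FareCalculator {
        public int fare(int zones) { return 120 + 80 * zones; }
    }

    class ShadowTest {
        public static void main(String[] args) {
            FareCalculator oldSystem = new LegacyFareCalculator();
            FareCalculator newSystem = new NewFareCalculator();
            for (int zones = 1; zones <= 5; zones++) {
                int expected = oldSystem.fare(zones);  // old output is the oracle
                int observed = newSystem.fare(zones);
                if (expected != observed) {
                    System.out.println("Deviation at zones=" + zones
                            + ": old=" + expected + ", new=" + observed);
                    return;
                }
            }
            System.out.println("Shadow test passed: outputs agree on all inputs.");
        }
    }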