What is Software Testing? And Why Is It So Hard?
J. Whittaker paper (IEEE Software, Jan/Feb 2000)
Summarized by F. Tsui

Common Reasons “Bugs” Escape to Customers
• The user executed an “untested” segment of code.
• The user executed the code in a different sequence in actual use than in any of the test cases.
• The user applied a combination of input values different from any of the test cases.
• The user’s operating environment differs from every tested environment, or is an environment that was consciously left untested because of cost.
Note: These reasons give us a “hint” of where and what to test.

Proposed: 4 Major Testing “Phases” or “Activities”
1. Model the User’s Software Environment
2. Generate and Select the Test Scenarios
3. Run and Evaluate the Test Scenarios
4. Record the Test Results and Measure the Progress

1. Modeling the Software Environment
• Simulate the interactions between the software and its environment, mostly at the interfaces that the software uses:
  – Human interface (e.g. GUI with mouse clicks, keyboard inputs, other devices)
  – Software interface (e.g. interfaces to the operating system or to other modules or apps, especially in the areas of “error” return messages)
  – File system interface (e.g. data written to or read from external files or a DB)
  – Communications interface (e.g. protocols to the network, both valid and invalid ones)
In testing an interface, the tester must consider 2 things: 1) the actual data values for the test inputs and 2) the sequencing of the test inputs (e.g. -1 in lock-stock-barrel).

Interface Testing Suggestions
• Boundary value partitioning (a combination of what we called boundary value and equivalence class testing) is often used to choose input values.
• The sequencing of inputs must consider the “dependencies” among the inputs (our “strong normal” or “worst case” testing).
  – Exercise both the “legal” combinations and sequences and the “illegal” ones.
• Use “models” to help set up sequences of inputs and transitions:
  – Graphs
  – State diagrams
  – Logic tables, etc.

2. Selecting Test Scenarios
• There are often too many test cases (possibly infinitely many) to run, but schedule and resources force us to select a subset.
  – Consider “coverage” as a criterion (my addition):
    1. Code statement coverage: how much of the source code is executed at least once?
    2. Input coverage: how many of the inputs, and how much of the variation for each input, are covered? (boundary value, robust testing, -----, decision table)
    3. Output coverage: how many of the outputs are covered, and to what depth?
  – Consider the execution paths:
    • Sequences of inputs to execute different paths of the source code
      – Major user paths in terms of “typical usage” scenarios
      – Less well-defined (thus less well coded) and minor functional paths used only by a minority of users
    A bit different from just plain path testing: it needs knowledge of the usage, the users, and the application domain.

Execution Path Testing as Discriminating “Criteria”
• For control flow testing:
  – Every source line executed
  – Every branch (case, while, if-then-else)
  Think about our lecture on path testing (code coverage, branch coverage, linearly independent paths, all logical combinations): how far would you go?
• For data flow testing:
  – Data “define” to data “usage” path
  – Data structure initialization to usage
  How many of the D-U paths would you use: just D to C-use, D to P-use, or all D-U paths? Would you include program slice testing to focus on certain data?

Fault Seeding and Discovery as “Criteria”
• Insert defects into the code.
• Design test scenarios to detect the seeded defects, although the seeded defects are not necessarily known to the testers.
• Use the number of seeded defects uncovered to predict how effective the testing methodology is, e.g.
  - Put in 10 seeded defects.
  - Set the goal to detect, say, 90% of the seeded bugs.
  - But, say, we found only 6 seeded bugs and 25 “real” bugs.
  - 6/10 = 25/x
  - x = 250/6 ≈ 41.7, which implies about 42 total “real” bugs.
  - So we believe 42 - 25 = 17 more real bugs are left.
  - The decision would be to keep testing until we discover 90% of the seeded defects.

Input Domain Testing as “Criteria”
• Select test cases to cover all physical inputs.
• Select test cases that cause each input interface control to be stimulated (e.g. window menus, radio buttons, drop-downs, etc.).
• Discriminating criteria for the above:
  – Statistically equivalent to covering the complete set (equivalence partitioning)
  – As stated before, typical or most likely to be executed by users

3. Running and Evaluating Test Scenarios
• Executing the designed test scenarios: manually executing test scenarios is expensive, so testing often includes a tool that automates some aspect of the test execution.
• Evaluating test results: comparing the actual test result against the expected test result. Does this need a human oracle to compare the results?
  – Use a formal specification to state the expected result, then compare it against the actual result, also specified in formal terms. (My example: use pre- and post-conditions with “assert”.)
  – Embed some code to show the execution results (e.g. output variable values to be checked by an external person or an automated program) in some strategically picked program slices. (My words)

Regression Testing: just re-running old test cases?
• Regression testing is a form of testing changes (and fixes).
• Application software often goes through several versions before final testing and release; each version may contain corrected code, modified code, and/or additional functional code:
  – We need to test that the newer version, with these potential changes, did not “regress” what was working in the previous versions. This is called regression testing.
How much testing do we do?
1. Test only the modified areas
2. Test only the modified areas and the “immediate neighborhood”
3. Test the new/modified areas and also re-run all the previous test scenarios

Some Other Concerns (when evaluating tests)
• Should code that is hard to test or buggy be rewritten?
  – What is “hard to test” code?
  – How much of the code should be rewritten if a piece of code is judged to be hard to test?
    - We do not have a clear definition of hard-to-test code. Is it “non-cohesive” code?
    - Furthermore, are we writing code for users, or for testers and other developers?
• Recreating or reproducing a failed case is not always easy, because the state under which the code failed is not always understood.
  – What do we do with non-reproducible failures?
    • Report them but do nothing?
    • Not even report them?

4. Measuring Test Progress
• Instead of just reporting “numbers” (test cases designed or executed, failures found, etc.), we should also ask:
  – Have we considered “common” programming errors?
  – Have we forced all data to be initialized?
  – Have we found all seeded errors?
  – Have we tried the expected user usage modes?
  We talked about this when you reported bugs for inspection, e.g. problem type and severity; problems/page (rate).
• Testers and Support would also like to know how many defects are left in the code and the chance of these showing up in the field. How do we deal with this?

More on Measurement and Models
• Testability (not sure how it is measured):
  – If software has “high” testability, it is easier to test and thus more likely to have its defects found and removed before it is released to users. (Refers to Voas’ notion of testability, from another paper.)
  If we find a small number of bugs in highly testable code, we should be happy. And is finding lots of bugs in low-testability code also a good sign?
• Reliability models
  – Model the frequency of defects found through testing
  – Use the model to predict the probability of defects being found in the user environment

Finally
• Hire the “smartest” and most “qualified” people you can get.
• Equip them with proper tools and training.
• Give them the authority and time to perform their work.
• Listen to them when they speak about the “quality” of the product.
(My addition)
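
Worked example for the fault-seeding estimate (10 seeded defects, 6 of them found, 25 “real” bugs found), as a minimal Python sketch. The function name estimate_real_defects and its parameters are hypothetical, chosen only for illustration. The sketch assumes that testing finds seeded and real defects at the same rate, which is what the proportion 6/10 = 25/x implies.

```python
def estimate_real_defects(seeded_total, seeded_found, real_found):
    """Estimate total and remaining real defects from fault-seeding counts.

    Assumption (not guaranteed in practice): seeded and real defects are
    equally easy to find, so
        seeded_found / seeded_total = real_found / real_total
    """
    if seeded_found <= 0:
        raise ValueError("no seeded defects found yet; the ratio is undefined")
    detection_rate = seeded_found / seeded_total
    real_total = real_found * seeded_total / seeded_found
    remaining = real_total - real_found
    return detection_rate, real_total, remaining


# Numbers from the fault-seeding example in these notes.
rate, total, remaining = estimate_real_defects(
    seeded_total=10, seeded_found=6, real_found=25
)
goal = 0.90  # target share of seeded defects to detect

print(f"seeded detection rate: {rate:.0%}")
print(f"estimated total real bugs: {round(total)}")
print(f"estimated real bugs left: {round(total) - 25}")
print("keep testing" if rate < goal else "seeded-defect goal met")
```

With these inputs the detection rate is 60%, below the 90% goal, so the decision is to keep testing. The estimate is only as good as the assumption that the seeded defects resemble the real ones.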