Are You Sure What Failures Your Tests Produce?
Lee White

Results on Testing GUI Systems
• CIS (Complete Interaction Sequence) approach for testing GUI systems: applied to four large commercial GUI systems
• Testing GUI systems in different environments: operating system, CPU speed, memory
• Modified CIS approach applied to regression-test two versions of a large commercial GUI system

Three Objectives for this Talk
• The use of memory tools during GUI testing discovered many more defects; observability problems arise here
• In GUI systems, defects manifested themselves as different failures (or not at all) in different environments
• In GUI systems, many more behaviors reside in the code than the designer intended

Complete Interaction Sequence (CIS)
• Identify all responsibilities (GUI activities that produce an observable effect on the surrounding user environment)
• CIS: operations on a sequence of GUI objects that collectively implement a responsibility
• Example (assume a file is already open): File_Menu -> Print -> Print_Setup_Selection -> Confirm_Print

FSM for a CIS
• Design a finite state machine (FSM) to model a CIS
• Creating the FSM model requires experience
• To test for all effects in a GUI, all paths within the CIS must be executed
• Loops may be repeated, but not consecutively

[Figure 1: Edit-Cut-Copy-Paste CIS FSM. States include Init, Ready, Edit, and File; transitions include Open, Name File, Select File, Move Cursor, Highlight, Cut, Copy, Paste, and Finish]

How to Test a CIS?
• Design tests: an FSM model based upon the design of the CIS is used to generate tests
• Implementation tests: in the actual GUI, check all CIS object selections and follow all transitions to other GUI objects within the CIS; add these transitions to the FSM model, along with any new inputs or outputs to/from the CIS, and generate tests from the extended model
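As a concrete illustration of design-test generation, here is a toy path enumerator in Python. The FSM, its transition labels, and the function name are invented for illustration, and it applies a simpler loop restriction (each transition used at most once per path) than the talk's "repeated, but not consecutively" rule.

```python
# Toy sketch of generating design tests from a CIS FSM: enumerate every
# path from the initial state to a final state, using each transition at
# most once per path. This is NOT the tool from the studies.

from collections import defaultdict

def design_test_paths(transitions, start, finals):
    """Return all label sequences from `start` to any state in `finals`,
    with each transition taken at most once per path (so each loop is
    exercised without immediate repetition)."""
    out = defaultdict(list)
    for src, label, dst in transitions:
        out[src].append((label, dst))
    paths = []

    def dfs(state, used, path):
        if state in finals and path:
            paths.append(tuple(path))
        for label, dst in out[state]:
            edge = (state, label, dst)
            if edge in used:          # each transition at most once per path
                continue
            dfs(dst, used | {edge}, path + [label])

    dfs(start, frozenset(), [])
    return paths

# Invented toy CIS: I -> A -> B -> O, with a loop B -> A.
fsm = [("I", "open", "A"), ("A", "select", "B"),
       ("A", "reselect", "B"), ("B", "back", "A"),
       ("B", "confirm", "O")]
for p in design_test_paths(fsm, "I", {"O"}):
    print(p)
```

Each printed tuple is one design test; the loop B -> A appears in some paths but is never taken twice in a row, since each transition is consumed once.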
[Figure 2: Design Tests for a Strongly Connected Component]
Design test paths:
(I1, B, C, D, A, B, C, O1)
(I2, A, B, C, D, A, B, C, O1)

[Figure 3: Implementation Tests for a Strongly Connected Component. The implementation adds input I3, output O2, and a transition marked A* that is not in the design]
Implementation test paths:
(I1, B, C, D, B, C, D, A, B, C, D, A*, B, C, O1)
(I1, B, C, D, B, C, D, A, B, C, D, A*, B, C, D, O2)
(I2, A, B, C, D, B, C, D, A, B, C, D, A*, B, C, O1)
(I2, A, B, C, D, B, C, D, A, B, C, D, A*, B, C, D, O2)
(I3, D, A, B, C, D, B, C, D, A*, B, C, O1)
(I3, D, A, B, C, D, B, C, D, A*, B, C, D, O2)

Table 1. Case Study of 4 Systems
1) Real Network Suite: RealJukeBox (Team A, 3), RealDownload (Team B, 4), RealPlayer (Team B, 4)
2) Adobe Suite: PhotoDeluxe (Team B, 4), EasyPhoto (Team A, 3), Acrobat Reader (Team A, 3)
3) Inter: WinDVD (Team C, 3)
4) Multi-Media DB: GVisual, VStore, AdminSrvr, ObjectBrowser (Team D, 4)

GUI System            GUI Objects   Design Tests   Design Faults   Impl. Tests   Impl. Faults
Real Networks             443             84             9              242            19
Adobe PS Acrobat R.       507            223             2              612            10
Inter WinDVD              112             56             0              154             3
Multi-Media DB            294             98             0              241             9

Memory Tools
• Memory tools monitor memory changes, CPU changes, and register changes
• They detected failures that would otherwise have eluded detection, accounting for 34% of the faults found in these empirical studies
• Two such tools were used: Memory Doctor and Win Gauge from Hurricane Systems

Table 2. Hidden Faults Detected by Memory Tools
GUI System            Hidden Faults   All Faults   Percent
Real Network                 7            19         37%
Adobe PS Acrobat Rd          4            10         40%
Inter WinDVD                 1             3         33%
Multi-Media DB               2             9         22%
Total                       14            41         34%

Failures of GUI Tests on Different Platforms
Lee White and Baowei Fei
EECS Department, Case Western Reserve University

Environment Effects Studied
• Environment effects: operating system, CPU speed, memory changes
• Same software tested throughout: RealOne Player
• 950 implementation tests
• For the OS comparison, the same computer was used, running Windows 98 and Windows 2000

Table 3. Faults detected by implementation tests for different operating systems
                Surprises   Defects   Faults
Windows 98          96         35       131
Windows 2000        37         24        61
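The cross-environment comparisons in these tables amount to running one implementation test suite in each environment and diffing the outcomes per test. A minimal sketch of that bookkeeping, with invented test IDs and outcome labels (the actual study ran 950 implementation tests of RealOne Player):

```python
# Hedged sketch: diff the failures one test suite produces in two
# environments. Test IDs and outcomes below are made up for illustration.

def diff_failures(results_a, results_b):
    """Given {test_id: outcome} maps from two environments, return the
    tests whose outcome differs, i.e. defects that show up as different
    failures (or not at all) depending on the environment."""
    diffs = {}
    for test in results_a.keys() | results_b.keys():
        a = results_a.get(test, "not run")
        b = results_b.get(test, "not run")
        if a != b:
            diffs[test] = (a, b)
    return diffs

# Invented outcomes for three tests under two operating systems.
win98 = {"t1": "pass", "t2": "freeze", "t3": "surprise"}
win2000 = {"t1": "pass", "t2": "pass", "t3": "crash"}
print(diff_failures(win98, win2000))
# t2 fails only under Windows 98; t3 fails differently in each environment
```

The same diff applies unchanged to the CPU-speed and memory-size comparisons, since only the environment labels change.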
Table 4. Faults detected by implementation tests for different CPU speeds
        Surprises   Defects   Faults
PC1         31         19       50
PC2         34         19       53
PC3         37         24       61

Table 5. Faults detected by implementation tests for different memory sizes
                Surprises   Defects   Faults
PC3 (256 MB)        96         35      131
PC3 (192 MB)        99         36      135
PC3 (128 MB)       101         38      139

Regression Testing GUI Systems
A case study showing the operation of the GUI firewall for regression testing.

GUI Features
• Feature: a set of closely related CISs with related responsibilities
• New features: features in a new version that are not in previous versions
• Totally modified features: features so drastically changed in a new version that the change cannot be modeled as incremental; the simple firewall cannot be used

Software Under Test
• Two versions of RealPlayer (RP) and RealJukeBox (RJB): RP7/RJB1 and RP8/RJB2
• 13 features; RP7: 208 objects, 67 CISs, 67 design tests, 137 implementation tests; RJB1: 117 objects, 30 CISs, 31 design tests, 79 implementation tests
• 16 features; RP8: 246 objects, 80 CISs, 92 design tests, 176 implementation tests; RJB2: 182 objects, 66 CISs, 127 design tests, 310 implementation tests

[Figure 4: Distribution of Faults Obtained by Testers T1 and T2. Labels in the figure: RP7/RJB1 (53 faults in the original system); RP8/RJB2 (16 features); 8 features through the firewall (21 faults, 17 faults); 59 faults; 5 totally modified features tested from scratch by T2 (0 faults); 3 new features tested by T1]

Failures Identified in Version 1 and Version 2
• We could identify identical failures in Version 1 and Version 2
• This left 9 failures in Version 2 and 7 failures in Version 1 unmatched
• The challenge was to show which pairs of failures might be due to the same fault

Different Failures in Versions V1, V2 for the Same Fault
• V1: View track in RJB freezes if an album cover is included
• V2: View track in RJB loses the album cover
• Environment problem: graphical settings from V2 were needed for testing V1

Different Failures (cont.)
• V1: Add/Remove channels in RP does not work when RJB is also running
• V2: Add/Remove channels loses previous items
• Env.
Problem: a personal browser is used in V1, but V2 uses a special RJB browser

Different Failures (cont.)
• V1: No failure present
• V2: In RP, pressing Forward crashes the system before playing a stream file
• Environment problem: the Forward button can only be pressed during play in V1, but in V2 it can be selected at any time; regression testing now finds this fault

Conclusions for Issue #1
• The use of memory tools revealed extensive observability problems in testing GUI systems
• In testing four commercial GUI systems, 34% of faults would have been missed without these tools
• In regression testing, 85% and 90% would have been missed
• Implication: GUI testing can miss defects or surprises (or produce only minor failures)

Conclusions for Issue #2
• Defects manifested as different failures (or not at all) in different environments
• Discussed in the regression testing study
• Also observed in the testing case studies, as well as in testing across different HW/SW environments

Implication for Issue #2
• When testing, you think you understand what failures will occur for certain tests and defects in the same software, but you do not know what failures (if any) will be seen by the user in another environment

Conclusions for Issue #3
• The differences between design and implementation tests are due to non-design transitions in the actual FSMs for each GUI CIS
• Observed in both case studies
• Implication: faults are commonly associated with these unknown FSM transitions and are not due to the design

Question for the Audience
• Are these same three effects valid to this extent for software other than GUI systems?
• If so, why haven't we seen many reports and papers in the software literature reporting this fact?