Skoll: A System for Distributed Continuous Quality
Assurance
Atif Memon & Adam Porter
University of Maryland
{atif,aporter}@cs.umd.edu
Quality Assurance for Large-Scale Systems
• Modern systems increasingly complex
– Run on numerous platform, compiler & library combinations
– Have 10’s, 100’s, even 1000’s of configuration options
– Are evolved incrementally by geographically-distributed teams
– Run atop other frequently changing systems
– Have multi-faceted quality objectives
• How do you QA systems like this?
Distributed Continuous Quality Assurance
• QA processes conducted around-the-world, around-the-clock
on powerful, virtual computing grids
– Grids can be made up of end-user machines, project-wide
resources or dedicated computing clusters
• General Approach
– Divide QA processes into numerous tasks
– Intelligently distribute tasks to clients who then execute them
– Merge and analyze incremental results to efficiently complete
desired QA process
• Expected benefits
– Massive parallelization allows more, better & faster QA
– Improved access to resources/environments not readily found in-house
– Carefully coordinated QA efforts enable more sophisticated analyses
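To make the divide/distribute/merge pattern concrete, a minimal sketch in Python follows. It is illustrative only: Skoll's real infrastructure uses remote clients and an Intelligent Steering Agent, and every function name and the fake pass/fail rule here are invented for the example.

```python
# Minimal sketch of the divide/distribute/merge DCQA pattern.
# Illustrative only -- not Skoll's actual implementation.
from concurrent.futures import ProcessPoolExecutor
import itertools

def run_qa_task(config):
    """Stand-in for one QA task: build/test one configuration.
    A real client would check out code, build it with `config`,
    run tests, and report results back to the server."""
    passed = config.get("ORBCollocation") != "no"   # fake outcome
    return config, passed

def dcqa(option_space, max_workers=8):
    # 1. Divide: one task per configuration in the space.
    names = list(option_space)
    tasks = [dict(zip(names, vals))
             for vals in itertools.product(*option_space.values())]
    # 2. Distribute: hand tasks to available workers.
    results = []
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # 3. Merge incremental results as they arrive.
        for config, passed in pool.map(run_qa_task, tasks):
            results.append((config, passed))
    return results

if __name__ == "__main__":
    space = {"ORBCollocation": ["global", "per-orb", "no"],
             "ORBConnectionPurgingStrategy": ["lru", "lfu", "fifo", "null"]}
    for config, ok in dcqa(space):
        print("PASS" if ok else "FAIL", config)
```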
Collaborators
Doug Schmidt & Andy Gokhale (DOC Group)
Alex Orso
Myra Cohen
Murali Haran, Alan Karr, Mike Last, & Ashish Sanil
Sandro Fouché, Alan Sussman, Cemal Yilmaz (now at
IBM TJ Watson) & Il-Chul Yoon
Skoll DCQA Infrastructure & Approach
[Animation: clients interact with the Skoll server through five steps]
1. Model
2. Reduce Model
3. Distribution (clients send test requests; the server assigns test resources)
4. Feedback (clients return test results)
5. Steering
See: A. Porter, C. Yilmaz, A. Memon, A. Nagarajan, D. C. Schmidt, and B. Natarajan, Skoll: A Process and Infrastructure for Distributed Continuous Quality Assurance. IEEE Transactions on Software Engineering, August 2007, 33(8), pp. 510-525.
The ACE+TAO+CIAO (ATC) System
• ATC characteristics
– 2M+ line open-source CORBA implementation
– maintained by 40+ geographically-distributed developers
– 20,000+ users worldwide
– Product Line Architecture with 500+ configuration options
• runs on dozens of OS and compiler combinations
– Continuously evolving – 200+ CVS commits per week
– Quality concerns include correctness, QoS, footprint, compilation
time & more
Define QA Space

Option                          Type                 Settings
Operating System                compile-time         {Linux, Windows XP, …}
TAO_HAS_MINIMUM_CORBA           compile-time         {True, False}
ORBCollocation                  runtime              {global, per-orb, no}
ORBConnectionPurgingStrategy    runtime              {lru, lfu, fifo, null}
ACE_version                     component version    {v5.4.3, v5.4.4, …}
TAO_version                     component version    {v1.4.3, v1.4.4, …}
run(ORT/run_test.pl)            test case            {True, False}

Constraints
TAO_HAS_AMI → ¬TAO_HAS_MINIMUM_CORBA
run(ORT/run_test.pl) → ¬TAO_HAS_MINIMUM_CORBA
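A hedged sketch of how such a QA space might be encoded (not Skoll's actual model format): options map to candidate settings, each constraint becomes a predicate (A → B is encoded as "not A or B"), and the valid space is the cross product filtered by those predicates. The option names come from the table above; everything else is illustrative.

```python
# Sketch: encode a (tiny) slice of the ATC QA space and enumerate
# only the configurations that satisfy the constraints.
import itertools

OPTIONS = {
    "TAO_HAS_AMI": [True, False],
    "TAO_HAS_MINIMUM_CORBA": [True, False],
    "ORBCollocation": ["global", "per-orb", "no"],
    "run(ORT/run_test.pl)": [True, False],
}

# Each constraint: antecedent implies consequent (A -> B == not A or B).
CONSTRAINTS = [
    lambda c: (not c["TAO_HAS_AMI"]) or (not c["TAO_HAS_MINIMUM_CORBA"]),
    lambda c: (not c["run(ORT/run_test.pl)"])
              or (not c["TAO_HAS_MINIMUM_CORBA"]),
]

def valid_configs():
    names = list(OPTIONS)
    for values in itertools.product(*OPTIONS.values()):
        cfg = dict(zip(names, values))
        if all(ok(cfg) for ok in CONSTRAINTS):
            yield cfg

print(sum(1 for _ in valid_configs()), "valid configurations")
```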
Nearest Neighbor Search
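The animation figures for this slide sequence are not reproduced. Per the Skoll TSE paper cited above, nearest-neighbor search reacts to a failing configuration by scheduling the configurations that differ from it in exactly one option setting, recursing on any new failures to map out the failing subspace. A small illustrative sketch (names invented):

```python
# Sketch of nearest-neighbor search: given a failing configuration,
# enumerate its neighbors (configs differing in exactly one option
# setting) so they can be scheduled next. Illustrative, not Skoll code.
def neighbors(cfg, option_space):
    """Yield every configuration at 'distance 1' from cfg."""
    for opt, settings in option_space.items():
        for alt in settings:
            if alt != cfg[opt]:
                yield {**cfg, opt: alt}

failing = {"ORBCollocation": "no", "TAO_HAS_MINIMUM_CORBA": True}
space = {"ORBCollocation": ["global", "per-orb", "no"],
         "TAO_HAS_MINIMUM_CORBA": [True, False]}
for n in neighbors(failing, space):
    print(n)   # each would be queued for testing; failures recurse
```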
Fault Characterization
• We used machine learning techniques (classification trees) to
model option & setting patterns that predict test failures
[Classification tree (figure): the root splits on CORBA_MESSAGING; further splits on AMI, AMI_POLLER, and AMI_CALLBACK partition the space into leaves labeled OK, ERR-1, ERR-2, and ERR-3]
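As a hedged illustration of the idea, the sketch below fits a classification tree over option settings using scikit-learn and synthetic pass/fail data; the actual study mined real Skoll results and may have used a different tree learner.

```python
# Illustrative fault characterization: fit a classification tree that
# maps option settings to test outcomes. Data here is synthetic.
from sklearn.tree import DecisionTreeClassifier, export_text

FEATURES = ["CORBA_MESSAGING", "AMI", "AMI_POLLER", "AMI_CALLBACK"]

# Each row: option settings (0/1); label: observed outcome.
X = [[1, 1, 1, 0], [1, 1, 0, 1], [1, 1, 0, 0],
     [1, 0, 0, 0], [0, 0, 0, 0], [0, 1, 1, 1]]
y = ["ERR-1", "ERR-2", "OK", "OK", "OK", "OK"]

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=FEATURES))
# The printed rules are the option/setting patterns that
# characterize each failing subspace.
```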
Applications & Feasibility Studies
• Compatibility testing of component-based systems
• Configuration-level fault characterization
• Test case generation & input space exploration
Compatibility Testing of Comp-Based Systems
Goal
• Given a component-based system, identify components & their
specific versions that fail to build
Solution Approach
• Sample the configuration space, efficiently test this sample &
identify subspaces in which compilation & installation fail
– Initial focus is on building & installing components; later work will
add functional and performance testing
See: I. Yoon, A. Sussman, A. Memon & A. Porter, Direct-Dependency-based
Software Compatibility Testing. International Conference on Automated Software
Engineering, Nov. 2007 (to appear).
The InterComm (IC) Framework
• Middleware for coupling large scientific simulations
– Built from up to 14 other components (e.g., PVM, MPI, GCC, OS)
– Each comp can have several actively maintained versions
– There are complex constraints between components, e.g.,
• Requires GCC version 2.96 or later
• When configured with multiple GNU compilers, all must have the
same version number
• When configured with multiple comps that use MPI, all must use the
same implementation & version
– http://www.cs.umd.edu/projects/hpsl/chaos/ResearchAreas/ic
• Developers need help to
– Identify working/broken configurations
– Broaden working set (to increase potential user base)
– Rationally manage support activities
Annotated Comp. Dependency Graph
• ACDG = (CDG, Ann)
– CDG: DAG capturing inter-comp deps
– Ann: comp. versions & constraints
• Constraints for each cfg, e.g.,
– ver(gf) = x → ver(gcr) = x
– ver(gf) = 4.1.1 → ver(gmp) ≥ 4.0
• Can generate cfgs from ACDG
– 3552 total cfgs. Takes up to ~10,700
CPU hrs to build all
Comp    Versions                      Description
ic      1.5                           InterComm
ap      0.7.9                         Array mgmt.
pvm     3.2.6, 3.3.11, 3.4.5          Parallel data comm.
lam     6.5.9, 7.0.6, 7.1.3           MPI impl.
mch     1.2.7                         MPI impl.
gf      4.0.3, 4.1.1                  GNU Fortran 95
gf77    3.3.6, 3.4.6                  GNU Fortran 77
pf      6.2                           PGI Fortran
gxx     3.3.6, 3.4.6, 4.0.3, 4.1.1    GNU C++
pxx     6.2                           PGI C++
mpfr    2.2.0                         High-prec. floating-point
gmp     4.2.1                         Arbitrary prec. arith.
pc      6.2                           PGI C
gcr     3.3.6, 3.4.6, 4.0.3, 4.1.1    GNU C
fc      4.0                           Fedora Core Linux OS
Improving Test Execution
• Cfgs often share common build subsequences. This build effort
should be reusable across cfgs
• Combine all cfgs into a data structure called a prefix tree
• Execute implied test plan across grid by (1) assigning subpaths
to clients, (2) building each subcfg in a VM & caching the VMs
to enable reuse
• Example: with 8 machines, each able to cache up to 8 VMs,
exhaustive testing takes up to 355 hours
[Figure: prefix-tree fragment, e.g., fc 4.0 → gcr v4.0.3 → gmp v4.2.1 → pf 6.2]
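A minimal sketch of the prefix-tree idea (data and names illustrative; the real system materializes shared build prefixes as cached VMs rather than in-process structures):

```python
# Sketch: merge configurations (ordered build sequences) into a prefix
# tree so shared build prefixes are constructed once and reused.
def build_prefix_tree(configs):
    root = {}
    for cfg in configs:            # cfg = ordered list of (comp, version)
        node = root
        for step in cfg:
            node = node.setdefault(step, {})
    return root

configs = [
    [("fc", "4.0"), ("gcr", "4.0.3"), ("gmp", "4.2.1"), ("pf", "6.2")],
    [("fc", "4.0"), ("gcr", "4.0.3"), ("gmp", "4.2.1"), ("gf", "4.0.3")],
    [("fc", "4.0"), ("gcr", "4.1.1"), ("gmp", "4.2.1"), ("gf", "4.1.1")],
]
tree = build_prefix_tree(configs)
# The first two configs share the fc -> gcr -> gmp prefix, so that
# partial build (a cached VM in the real system) is done only once.
```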
Direct-Dependency (DD) Coverage
• Hypothesis: A comp’s build process is most likely to be affected by the
comps on which it directly depends
– A directly depends on B iff there is a path (in the CDG) from A to B
containing no other component nodes
• Sampling approach
– Identify all DDs between every pair of components
– Identify all valid instantiations of these DDs (ver. combs that violate no
constraints)
– Select a (small) set of cfgs that cover all valid instantiations of the DDs
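A hedged sketch of the direct-dependence relation, assuming a CDG encoded as an adjacency list whose nodes are flagged as component or auxiliary nodes; the paper's actual CDG encoding may differ:

```python
# Sketch: compute the components a given component directly depends on,
# i.e., those reachable without passing through another component node.
def direct_deps(cdg, is_component, start):
    found, stack, seen = set(), list(cdg.get(start, [])), set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if is_component(node):
            found.add(node)        # stop here: a direct dependency
        else:
            stack.extend(cdg.get(node, []))
    return found

# Toy CDG: 'ic' -> aux node -> {'gcr', 'gf'}; 'gf' -> 'gmp'
cdg = {"ic": ["needs-compilers"], "needs-compilers": ["gcr", "gf"],
       "gf": ["gmp"]}
print(direct_deps(cdg, lambda n: not n.startswith("needs"), "ic"))
# -> {'gcr', 'gf'}  ('gmp' is only an indirect dependency of 'ic')
```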
Executing the DD Coverage Test Suite
• DD test suite much smaller than exhaustive
– 211 cfgs with 649 comps vs 3552 cfgs with 9919 comps
– For IC, no loss of test effectiveness (same build failures exposed)
• Speedups achieved using 8 machines w/ 8 VM cache
– Actual case: 2.54 (18 vs 43 hrs)
– Best case: 14.69 (52 vs 355 hrs)
Summary
• Infrastructure in place & working
– Complete client/server implementation using VMware
– Simulator for large scale tests on limited resources
• Initial results promising, but lots of work remains
• Ongoing activities
– Alternative algorithms & test execution policies
– More theoretical study of sampling & test exec approaches
– Apply to more software systems
Configuration-Level Fault Characterization
Goal
• Help developers localize configuration-related faults
Current Solution Approach
• Use covering arrays to sample the cfg space to test for
subspaces in which (1) compilation fails or (2) regression tests fail
• Build models that characterize the configuration options and
specific settings that define the failing subspace
See: C. Yilmaz, M. Cohen, A. Porter, Covering Arrays for Efficient Fault
Characterization in Complex Configuration Spaces, ISSTA’04, TSE v32 (1)
Covering Arrays
• Compute test schedule from t-way covering arrays
– a set of configurations in which all ordered t-tuples of option
settings appear at least once
• 2-way covering array example:
Option  C1  C2  C3  C4  C5  C6  C7  C8  C9
O1       0   0   0   1   1   1   2   2   2
O2       0   1   2   0   1   2   0   1   2
O3       0   1   2   1   2   0   2   0   1
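A quick sketch that checks the defining property of the array above: every ordered pair of settings, for every pair of options, appears in at least one configuration.

```python
# Sketch: verify the 2-way covering property of the array above.
from itertools import combinations, product

rows = {  # option -> setting per configuration C1..C9
    "O1": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "O2": [0, 1, 2, 0, 1, 2, 0, 1, 2],
    "O3": [0, 1, 2, 1, 2, 0, 2, 0, 1],
}

for a, b in combinations(rows, 2):
    covered = set(zip(rows[a], rows[b]))       # pairs actually present
    needed = set(product(range(3), repeat=2))  # all 9 setting pairs
    assert covered == needed, (a, b)
print("all option pairs fully covered by 9 configurations")
```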
Limitations
• Must choose the covering array's strength before computing it
– No way to know, a priori, what the right value is
– Our experience suggests failure patterns can change over time
• Choose too high:
– Run more tests than necessary
– Testing might not finish before next release
• Choose too low:
– Non-uniform sample negatively affects classification performance
– Must repeat process at higher strength
Incremental Covering Arrays
• Start with traditional covering array(s) of low strength (usually 2)
• Execute test schedule & classify observed failures
• If resources allow or classification performance requires
– Increment strength
– Build new covering array using previously run array(s) as seeds
See: S. Fouche, M. Cohen and A. Porter. Towards Incremental Adaptive Covering
Arrays, ESEC/FSE 2007, (to appear)
Incremental Covering Arrays (cont.)
Multiple CAs at each level of t
• Use t₁ as a seed for the first (t+1)-way array, (t+1)₁
• To create the i-th t-way array tᵢ, create a seed of size |(t−1)₁|
using non-seeded cfgs from (t−1)ᵢ
• If |seed| < |(t−1)₁|, complete the seed with cfgs from (t+1)₁
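To ground the seeding idea, here is a small greedy sketch that extends a set of already-run configurations (the seed) with just enough new configurations to cover all t-way tuples. This is a generic AETG-style heuristic for illustration, not the algorithm from the ESEC/FSE paper.

```python
# Sketch: greedily extend a seed (configs already run) until all
# t-way combinations of option settings are covered.
from itertools import combinations, product

def all_tuples(t, options):
    names = list(options)
    for opts in combinations(names, t):
        for vals in product(*(options[o] for o in opts)):
            yield frozenset(zip(opts, vals))

def covers(cfg, tup):
    return all(cfg.get(o) == v for o, v in tup)

def extend_to_strength(t, options, seed):
    configs = [dict(c) for c in seed]
    missing = {tup for tup in all_tuples(t, options)
               if not any(covers(c, tup) for c in configs)}
    while missing:
        cfg = dict(next(iter(missing)))   # start from a missing tuple
        for o in options:                 # fill remaining options with
            if o not in cfg:              # the setting that covers the
                cfg[o] = max(options[o],  # most missing tuples
                             key=lambda v: sum(covers({**cfg, o: v}, tup)
                                               for tup in missing))
        configs.append(cfg)
        missing = {tup for tup in missing if not covers(cfg, tup)}
    return configs

options = {"A": [0, 1], "B": [0, 1], "C": [0, 1]}
seed = [{"A": 0, "B": 0, "C": 0}]   # pretend these already ran
print(extend_to_strength(2, options, seed))
```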
MySQL Case Study
• Project background
– Widely-used, 2M+ line open-source database project
– Continuously evolving & maintained by geographically-distributed developers
– Dozens of cfg opts & runs on dozens of OS/compiler combos
• Case study using release 5.0.24
– Used 13 cfg opts with 2-12 settings each (> 110k unique cfgs)
– 460 tests per config across a grid of 50 machines
– Executed ~50M tests total using ~25 CPU years
Results
• Built 3 traditional and incremental covering arrays for 2 ≤ t ≤ 4
– Traditional sizes : 108, 324, 870
– Incremental sizes: 113, 336 (223), 932 (596)
• Incremental approach exposed & classified the same failures as traditional
• Costs depend on t & failure patterns
– Failures at level = t: Inc > Trad (4-9%)
– Failures at level < t: Inc < Trad (65-87%)
– Failures at level > t: Inc < Trad (28-38%)
Summary
• New application driving infrastructure improvements
• Initial results encouraging
– Applied process to a configuration space with over 110K cfgs
– Found many test failures corresponding to real bugs
– Incremental approach more flexible than traditional approach.
Appears to offer substantial savings in best case, while incurring
minimal cost in worst case
• Ongoing extensions
– MySQL continuous build process
– Community involvement starting
– Want to volunteer? Go to http://www.cs.umd.edu
GUI Test Case – Executable By A “Robot”
• JFCUnit
• Capture/replay
– Tedious
• Test “common” sequences
– Bad idea
• Other interactions
– Exponential with length
• Model-based techniques
– GUITAR (guitar.cs.umd.edu)
Modeling & Sampling the Event-Interaction Space
• Event-flow graph (EFG)
– Nodes: all GUI events (starting events marked)
– Edges: the “follows” relationship
• Reverse engineering
– EFG obtained automatically
• Test case generation
– Cover all edges
[Figure: EFG fragment with edges labeled “Follows”]
See: Atif M. Memon and Qing Xie, Studying the Fault-Detection
Effectiveness of GUI Test Cases for Rapidly Evolving Software. IEEE
Transactions on Software Engineering, vol. 31, no. 10, 2005, pp. 884-896.
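A small sketch of the edge-coverage criterion over an EFG (the adjacency-list encoding and event names are illustrative, not GUITAR's format):

```python
# Sketch: generate length-2 test cases that cover every EFG edge.
# Real GUITAR test cases also need the event sequences that reach
# each starting event; this only illustrates the coverage criterion.
efg = {                      # event -> events that may follow it
    "File.Open": ["Dialog.OK", "Dialog.Cancel"],
    "Dialog.OK": ["Edit.Copy"],
    "Dialog.Cancel": [],
    "Edit.Copy": ["Edit.Paste"],
    "Edit.Paste": ["File.Open"],
}

tests = [(e1, e2) for e1, nexts in efg.items() for e2 in nexts]
for t in tests:
    print(" -> ".join(t))    # each pair exercises one EFG edge
```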
Let’s See How It Works!
• Point to the CVS head & push the button, then read the error report
• What happens
– Gets code from CVS head
– Builds
– Reverse engineers the event-flow graph
– Generates test cases to cover all the edges (2-way covering)
– Runs them
• Four applications from SourceForge.net: CrosswordSage, FreeMind,
GanttProject, JMSN
[Chart: number of faults detected (0-9) for each application]
Digging Deeper!
• Intuition
– Non-interacting events (e.g., Save, Find)
– Interacting events (e.g., Copy, Paste)
• Key idea
– Identify interacting events
– Mark the EFG edges (annotated graph)
– Generate 3-way, 4-way, … covering test cases for interacting events only
Identifying Interacting Events
• High-level overview of approach
– Observe how events execute on the GUI
– Events interact if they influence one another’s execution
• Execute event e2; execute event sequence <e1, e2>
• Did e1 influence e2’s execution?
• If YES, they must be tested further; annotate the <e1, e2> edge in the graph
• Use feedback
– Generate seed suite
• 2-way covering test cases
– Run test cases
• Need to obtain sets of GUI states
– Collect GUI run-time states as feedback
– Analyze feedback and obtain interacting event sets
– Generate new test cases
• 3-way, 4-way, … covering test cases
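A hedged sketch of the influence check, with a hypothetical run_and_capture_state helper standing in for GUITAR's actual instrumentation:

```python
# Sketch: decide whether e1 influences e2 by comparing the GUI state
# e2 produces alone against the state it produces after e1.
# run_and_capture_state is hypothetical; in GUITAR, a harness executes
# events on the live GUI and records widget properties.
def interacts(e1, e2, run_and_capture_state):
    state_alone = run_and_capture_state([e2])
    state_after = run_and_capture_state([e1, e2])
    return state_alone != state_after   # difference => e1 influenced e2

# Any <e1, e2> pair flagged here gets its EFG edge annotated, feeding
# the 3-way, 4-way, ... generation for interacting events only.
```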
Did We Do Better?
• Compare feedback-based approach to 2-way coverage
[Chart: number of faults detected (0-9) for CrosswordSage, FreeMind,
GanttProject, JMSN]
Summary
• Manually developed test cases
– JFCUnit, Capture/replay
– Can be deployed and executed by a “robot”
– Too many interactions to test
• Exponential
• The GUITAR Approach
– Develop a model of all possible interactions
– Use abstraction techniques to “sample” the model
• Develop adequacy criteria
– Generate an “initial test suite”; Develop an “Execute tests – collect
feedback – annotate model – generate tests” cycle
– Feasibility study & results
Future Work
• Need volunteers for MySQL Build Farm Project
– http://skoll.cs.umd.edu
• Looking for more example systems (help!)
• Continue improving Skoll system
• New problem classes
– Performance and robustness optimization
– Improved use of test data
• Test case ROI analysis
• Configuration advice
• Cost-aware testing (e.g., minimize power, network, disk)
• Use source code analysis to further reduce state spaces
• Extend test generation technology outside GUI applications
• QA for distributed systems
The End