Experimentation in Computer Science – Part 3

Experimentation in Software Engineering:
Outline
• Empirical Strategies
• Measurement
• Experiment Process (Continued)

Experiment Process:
Phases

[Figure: the experiment process. An experiment idea enters the process, which proceeds through Experiment Definition, Experiment Planning, Experiment Operation, Analysis & Interpretation, and Presentation & Package, yielding conclusions.]

Experiment Planning:
Overview

[Figure: the experiment planning steps, between Experiment Definition and Experiment Operation: Context Selection, Hypothesis Formulation, Variables Selection, Selection of Subjects, Experiment Design, Instrumentation, and Validity Evaluation.]

Experiment Planning:
Instrumentation
• Instrumentation types:
  • Objects (e.g., specs, code)
  • Guidelines (e.g., process descriptions, checklists, tutorial documents)
  • Measurement instruments (surveys, forms, automated data collection tools; a sketch follows this slide)
• Overall goal of instrumentation: provide the means for performing and monitoring the experiment without affecting its control (instrumentation must not affect outcomes)

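As a loose illustration of the "automated data collection tool" category, here is a minimal Python sketch of a measurement instrument that records task-completion times without requiring subject action; the class name, file name, and column layout are hypothetical, not from these slides.

import csv
import time

class TaskTimer:
    """Hypothetical instrument: records how long each subject spends on each task."""

    def __init__(self, out_path="timings.csv"):
        self.out_path = out_path
        self.start = None

    def begin(self, subject_id, task_id):
        # Mark the start of a task for one subject.
        self.subject_id, self.task_id = subject_id, task_id
        self.start = time.perf_counter()

    def end(self):
        # Append (subject, task, elapsed seconds) to the data file.
        elapsed = time.perf_counter() - self.start
        with open(self.out_path, "a", newline="") as f:
            csv.writer(f).writerow([self.subject_id, self.task_id, elapsed])

# Usage: timer = TaskTimer(); timer.begin("S01", "T1"); ...; timer.end()

Because the timing happens in the background, the instrument itself is unlikely to alter subject behavior, in line with the goal above.
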
Experiment Planning:
Validity Evaluation
• Threats to external validity concern the ability to generalize results outside the experimental setting
• Threats to internal validity concern the ability to conclude that a causal effect exists between independent and dependent variables
• Threats to construct validity concern the extent to which variables and measures accurately reflect the constructs under study
• Threats to conclusion validity concern issues that affect our ability to draw accurate statistical conclusions

Experiment Planning:
Process and Threats Related

[Figure: theory vs. observation. At the theory level, the hypothesis relates a cause construct to an effect construct (the cause-effect construct). At the observation level, a treatment (the independent variable) produces an outcome (the dependent variable): the treatment-outcome relationship. The treatment corresponds to the cause construct and the outcome to the effect construct.]

Experiment Planning:
Process and Threats Related

[Figure: the same diagram annotated with threat types. Construct validity concerns the mappings between the cause construct and the treatment, and between the effect construct and the outcome; internal validity concerns the causal link from treatment to outcome; conclusion validity concerns the observed treatment-outcome relationship; external validity concerns generalizing from the observed setting to the cause-effect construct.]

Experiment Planning:
Threats to External Validity
• Population: subject population not representative of the population we wish to generalize to
• Place: experimental setting or materials not representative of the setting we wish to generalize to
• Time: experiment is conducted at a time that affects results

Reduce external validity threats in a given experiment by making the environment as realistic as possible; however, reality is not homogeneous, so it is important to report environment characteristics. Reduce external validity threats long-term through replication.

Experiment Planning:
Threats to Internal Validity
• Instrumentation: measurement tools report inaccurately or affect results
• Selection: groups selected are not equivalent
• Learning: subjects learn over the course of the experiment, altering later results
• Mortality: subjects drop out of the experiment
• Social effects: e.g., control group resents treatment group (demoralization or rivalry)

Reduce internal threats through careful experiment design.

Experiment Planning:
Threats to Construct Validity
• Inadequate preoperational explication of constructs: the theory isn't clear enough (e.g., what does "better" mean?)
• Mono-operation or mono-method bias: using a single independent variable, case, subject, treatment, or measure may under-represent constructs
• Levels of constructs: using incorrect levels of constructs may confound the presence of a construct with its level
• Interaction of testing and treatment: testing itself makes subjects sensitive to the treatment; the test is part of the treatment
• Social effects: experimenter expectancy, evaluation apprehension, hypothesis guessing

Reduce construct threats through careful design and replication.

Experiment Planning:
Threats to Conclusion Validity
• Low statistical power: increases the risk of being unable to reject a false null hypothesis (a power-analysis sketch follows this slide)
• Violated assumptions of statistical tests: some tests have assumptions, e.g., about normally distributed and independent samples
• Fishing: searching for a specific result means analyses are not independent, and researchers may influence results by seeking particular outcomes
• Reliability of measures: if you can't measure the result twice with equal outcomes, the measures aren't reliable

Reduce conclusion validity threats through careful design, and perhaps through consultation with statistical experts.

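To make the low-power threat concrete, here is a small sketch of an a priori power analysis using statsmodels; the effect size (Cohen's d = 0.5), alpha = 0.05, and the 0.8 power target are illustrative assumptions, not values from these slides.

from statsmodels.stats.power import TTestIndPower

# Solve for the per-group sample size that gives 80% power to detect a
# medium effect (d = 0.5) with a two-sided independent-samples t-test.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"Subjects needed per group: {n_per_group:.0f}")  # ~64 per group

Running this before recruiting subjects shows whether the planned group sizes even make rejection of a false H0 plausible.
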
Experiment Planning:
Priorities Among Validity Threats
• Decreasing some types of threats may cause others to increase (e.g., using CS students increases group size and reduces heterogeneity, which aids conclusion validity but reduces external validity)
• Tradeoffs need to be considered for the type of study:
  • Theory testing is more interested in internal and construct validity than external
  • Applied experimentation is more interested in external and possibly conclusion validity

Experiment Process:
Phases

[Figure: experiment process phases, as above; the next slides cover Experiment Operation.]

Experiment Operation:
Overview
• Experiment operation: carrying out the actual experiment and collecting data
• Three phases:
  • Preparation
  • Execution
  • Data validation

Experiment Operation:
Preparation
• Locate participants
  • Offer inducements to obtain participants
  • Obtain participant consent, and IRB approval where required
  • Consider confidentiality (maintain it, and inform participants about it)
  • Avoid deception where it affects participants; reveal it afterward and discuss why it was necessary (beware validity tradeoffs: providing information is good but may affect results)
• Prepare instrumentation
  • Objects, guidelines, tools, forms
  • Use pilot studies and walkthroughs to reduce threats

Experiment Operation:
Execution
• Execution might take place over a small set of specified occasions, or across a long time span
• Data collection takes place: subjects or interviewers fill out forms, tools collect metrics
• Consider interaction between the experiment and the environment, e.g., if the experiment is performed in vivo, watch for confounding effects (the experiment process altering behavior)

Experiment Operation:
Data Validation
• Verify that data has been collected correctly (a sketch follows this slide)
• Verify that data is reasonable
  • Consider whether outliers exist and should be removed (removal must be for good reasons)
• Verify that the experiment was conducted as intended
  • Post-experiment questionnaires can assess whether subjects understood instructions

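A minimal sketch of such validation checks in Python with pandas; the column names (subject, task, minutes) and the 120-minute session limit are hypothetical assumptions for illustration.

import pandas as pd

# Hypothetical collected data; in practice this would come from the forms
# or tools used during execution (e.g., pd.read_csv("experiment_data.csv")).
df = pd.DataFrame({
    "subject": ["S01", "S02", "S03", "S04"],
    "task":    ["T1",  "T1",  "T1",  "T1"],
    "minutes": [35.0, 42.5, None, 400.0],
})

# Collected correctly: flag missing values in required columns.
print("Missing entries:\n", df[df["minutes"].isna()])

# Reasonable: times must be positive and within the assumed session length.
print("Out-of-range entries:\n", df[(df["minutes"] <= 0) | (df["minutes"] > 120)])

Flagged rows are candidates for inspection, not automatic deletion; as the slide says, removal needs a good reason.
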
Experiment Process:
Phases

[Figure: experiment process phases, as above; the next slides cover Analysis & Interpretation.]

Analysis and Interpretation:
Overview
• Quantitative interpretation can include:
  • Descriptive statistics: describe and graphically present the data set; used before hypothesis testing to better understand the data and identify outliers (a sketch follows this slide)
  • Data set reduction: locate and possibly remove anomalous data points
  • Hypothesis testing: apply statistical tests to determine whether the null hypothesis can be rejected

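A short sketch of descriptive statistics with pandas, on made-up scores for two treatments; note how the per-group summary surfaces a suspicious maximum before any hypothesis testing is attempted.

import pandas as pd

# Invented scores for two treatment groups.
df = pd.DataFrame({
    "treatment": ["A"] * 5 + ["B"] * 5,
    "score": [12, 15, 11, 14, 13, 18, 17, 19, 40, 16],
})

# Count, mean, std, and quartiles per group; the max of 40 in group B
# stands out as a candidate outlier worth investigating.
print(df.groupby("treatment")["score"].describe())
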
Analysis and Interpretation:
Visualizing Data Sets
• Graphs are effective ways to provide an overview of a data set
• Basic graph types for use in visualization (a plotting sketch follows this slide):
  • Scatter plots
  • Box plots
  • Line plots
  • Bar charts
  • Cumulative bar charts
  • Pie charts

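A sketch of two of the listed graph types (a box plot and a scatter plot) using matplotlib; the data values are invented for illustration.

import matplotlib.pyplot as plt

group_a = [12, 15, 11, 14, 13]
group_b = [18, 17, 19, 40, 16]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.boxplot([group_a, group_b])            # one box per treatment
ax1.set_xticks([1, 2], ["A", "B"])
ax1.set_title("Box plot per treatment")
ax2.scatter(range(len(group_b)), group_b)  # the point at 40 is visibly extreme
ax2.set_title("Scatter plot (group B)")
plt.tight_layout()
plt.show()
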
Analysis and Interpretation:
Data Set Reduction
• Hypothesis testing techniques depend on the quality of the data set; data set reduction improves data set quality by removing anomalous data (outliers)
• Outliers can be removed, but only for good reasons, such as that they represent rare events not likely to recur (a sketch follows this slide)
  • Scatter plots can help find outliers
  • Statistical tests can determine probabilities that points are outliers
• Data with too much redundancy can be hard to analyze; factor analysis and principal components analysis can identify orthogonal factors with which to replace redundant factors

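A sketch of both ideas on assumed data: the conventional 1.5×IQR rule to flag outlier candidates, and principal components analysis (scikit-learn) to replace redundant metrics with orthogonal components. The threshold and data are illustrative, not prescribed by these slides.

import numpy as np
from sklearn.decomposition import PCA

# Flag outlier candidates with the interquartile-range rule.
scores = np.array([12, 15, 11, 14, 13, 18, 17, 19, 40, 16])
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1
outliers = scores[(scores < q1 - 1.5 * iqr) | (scores > q3 + 1.5 * iqr)]
print("Outlier candidates:", outliers)  # flags 40; removal still needs a reason

# PCA: replace correlated (redundant) metrics with orthogonal components.
metrics = np.random.default_rng(0).normal(size=(30, 5))  # 30 cases, 5 metrics
pca = PCA(n_components=2)
components = pca.fit_transform(metrics)
print("Variance explained:", pca.explained_variance_ratio_)
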
Analysis and Interpretation:
Hypothesis Testing
• Hypothesis testing: can we reject H0?
  • If statistical tests say we can't, we draw no conclusions
  • If tests say we can, H0 is rejected at a given significance level α = P(type-I error) = P(reject H0 | H0 is true)
• We also calculate the p-value: the lowest significance level at which we can reject H0
• Typically, α is 0.05; to claim significance, the p-value must be < α (a sketch follows this slide)

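A minimal sketch of the reject / can't-reject decision using scipy's two-sample t-test; the data are invented, and H0 here is "the two treatments have equal mean scores".

from scipy import stats

group_a = [12, 15, 11, 14, 13]
group_b = [18, 17, 19, 20, 16]

alpha = 0.05
t_stat, p_value = stats.ttest_ind(group_a, group_b)
if p_value < alpha:
    print(f"Reject H0 at alpha={alpha} (p={p_value:.4f})")
else:
    print(f"Cannot reject H0 (p={p_value:.4f}); draw no conclusions")
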
Analysis and Interpretation:
Statistical Tests per Design

Design                                             | Parametric     | Non-parametric
One factor, one treatment                          |                | Chi-2, Binomial test
One factor, two treatments, completely randomized  | t-test, F-test | Mann-Whitney, Chi-2
One factor, two treatments, paired comparison      | Paired t-test  | Wilcoxon, Sign test
One factor, more than two treatments               | ANOVA          | Kruskal-Wallis, Chi-2
More than one factor                               | ANOVA          |

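A sketch of the second row of the table: for a one-factor, two-treatment, completely randomized design, scipy offers both the parametric and a non-parametric alternative. The data are invented, with one extreme value that makes the non-parametric choice attractive.

from scipy import stats

group_a = [12, 15, 11, 14, 13]
group_b = [18, 17, 19, 40, 16]   # the 40 makes normality doubtful

print(stats.ttest_ind(group_a, group_b))     # parametric: t-test
print(stats.mannwhitneyu(group_a, group_b))  # non-parametric: Mann-Whitney
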
Analysis and Interpretation:
Statistical Tests
• Important to choose the right test; the type of data must be appropriate (a sketch follows this slide):
  • Are data items paired or not?
  • Is data normally distributed or not?
  • Are data sets completely independent or not?
• Take a stats course, see texts such as Montgomery, consult with statisticians, use statistical packages

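A sketch of one of these checks: the Shapiro-Wilk test for normality (scipy), used here to guide the parametric vs. non-parametric choice. The data and the 0.05 threshold are illustrative.

from scipy import stats

data = [18, 17, 19, 40, 16]
stat, p = stats.shapiro(data)
if p < 0.05:
    print("Normality doubtful: prefer a non-parametric test (e.g., Mann-Whitney)")
else:
    print("No evidence against normality: a t-test may be appropriate")
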
Analysis and Interpretation:
Statistical vs. Practical Significance
• Statistical significance does not imply practical importance: e.g., if T1 is shown with statistical significance to be 1% more effective than T2, it must still be decided whether 1% matters (a sketch follows this slide)
• Lack of statistical significance does not imply lack of practical importance: the fact that H0 cannot be rejected at level α does not mean that H0 is true, and results of high practical importance may justify accepting a lower confidence level

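One common way to quantify practical significance separately from statistical significance is an effect size such as Cohen's d; here is a sketch on invented data (the pooled-standard-deviation formula below assumes equal group sizes).

import numpy as np

a = np.array([12, 15, 11, 14, 13])
b = np.array([18, 17, 19, 20, 16])

# Cohen's d: mean difference in units of the pooled standard deviation.
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
d = (b.mean() - a.mean()) / pooled_sd
print(f"Cohen's d = {d:.2f}")  # magnitude of the difference, not its p-value
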
Experiment Process:
Phases

[Figure: experiment process phases, as above; the next slides cover Presentation & Package.]

Presentation:
An Outline for an Experiment Report
1. Introduction, Motivation
2. Background, Prior Work
3. Empirical Study
   3.0 Research questions
   3.1 Objects of analysis
       3.1.1 Participants
       3.1.2 Objects
   3.2 Variables and measures
       3.2.1 Independent variables
       3.2.2 Dependent variables
       3.2.3 Other factors
   3.3 Experiment setup
       3.3.1 Setup details
       3.3.2 Operational details
   3.4 Analysis strategy
   3.5 Threats to validity
   3.6 Data and analysis
4. Interpretation
5. Conclusions

Presentation Issues
• Supporting replicability
• What to say and what not to say?
• How much to say?
• Describing design decisions