Extreme Metrics Analysis for Fun and Profit
Paul Below
Agenda
• Statistical Thinking
• Metrics Use: Reporting and Analysis
• Measuring Process Improvement
• Surveys and Sampling
• Organizational Measures
Agenda
Statistical Thinking
“Experiments should be reproducible. They should all fail in the same way.”
Statistical Thinking
• You already use it, at home and at work
• We generalize in everyday thinking
• Often, our generalizations or predictions are wrong
Uses for Statistics
• Summarize our experiences so others can understand
• Use information to make predictions or estimates
• Goal is to do this more precisely than we would in everyday conversation
Listen for Questions
• We are not used to using numbers in our professional lives
  – “What does this mean?”
  – “What should we do with this?”
• We need to take advantage of our past experience
Statistical Thinking
is more important than methods or technology.
Analysis is iterative, not one-shot.
[Diagram: the learning cycle alternates deduction, from Model to Data, with induction, from Data back to Model; repeated passes produce Learning]
(Modification of the Shewhart/Deming cycle by George Box, 2000 Deming Lecture, Statistics for Discovery)
Agenda
Metrics Use:
Reporting and Analysis
"It ain't so much the things we don't
know that get us in trouble. It's the
things we know that ain't so."
Artemus Ward, 19th Century
American Humorist
Purpose of Metrics
• The purpose of metrics is to take action. All types of analysis and reporting have the same high-level goal: to provide information to people who will act upon that information and thereby benefit.
• Metrics offer a means to describe an activity in a quantitative form that allows a knowledgeable person to make rational decisions. However,
  – Good statistical inference on bad data is no help.
  – Bad statistical analysis, even on the right variable, is still bad statistics.
Therefore…
• Metrics use requires implemented processes for:
– metrics collection,
– reporting requirements determination,
– metrics analysis, and
– metrics reporting.
Types of Metrics Use
“You go to your tailor for a suit of clothes and the first thing that he does is make some measurements; you go to your physician because you are ill and the first thing he does is make some measurements. The objects of making measurements in these two cases are different. They typify the two general objects of making measurements. They are: (a) To obtain quantitative information; (b) To obtain a causal explanation of observed phenomena.”
Walter Shewhart
The Four Types of Analysis
1. Ad hoc: Answer specific questions, usually in a short time frame. Example: sales support.
2. Reporting: Generate predefined output (graphs, tables) and publish or disseminate it to a defined audience, either on demand or on a regular schedule.
3. Analysis: Use statistics and statistical thinking to investigate questions and reach conclusions. The questions are usually analytical in nature (e.g., “Why?” or “How many will there be?”).
4. Data Mining: Start with data definition and cleansing, followed by automated knowledge extraction from historical data. Finally, analysis and expert review of the results are required.
Body of Knowledge (suggestions)
• Reporting
  – Database query languages, distributed databases, query tools, graphical techniques, OLAP, Six Sigma Green Belt (or Black Belt), Goal-Question-Metric
• Analysis
  – Statistics and statistical thinking, graphical techniques, database query languages, Six Sigma Black Belt, CSQE, CSQA
• Data Mining
  – Data mining, OLAP, data warehousing, statistics
Analysis Decision Tree
The choice among the four types follows a simple decision tree (sketched in code below):
• Enumerative question, one-time need: Ad hoc
• Enumerative question, recurring need: Reporting
• Analytical question, few factors analyzed: Analysis
• Analytical question, many factors analyzed: Data Mining and Analysis
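Read as a lookup, the tree fits in a few lines. Here is a minimal Python sketch; the function name and argument labels are illustrative, not from the original slides:

```python
def choose_analysis(question_type: str, one_time: bool = False,
                    factors: str = "few") -> str:
    """Pick an analysis style from the decision tree above.

    question_type: "enumerative" (how many / how much) or
                   "analytical" (why / what will happen)
    one_time:      for enumerative questions, is this a one-off request?
    factors:       for analytical questions, "few" or "many" factors
    """
    if question_type == "enumerative":
        return "Ad hoc" if one_time else "Reporting"
    if question_type == "analytical":
        return "Analysis" if factors == "few" else "Data Mining and Analysis"
    raise ValueError("question_type must be 'enumerative' or 'analytical'")

print(choose_analysis("enumerative", one_time=True))   # Ad hoc
print(choose_analysis("analytical", factors="many"))   # Data Mining and Analysis
```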
Extreme Programming
Extreme Analysis
• Short deadlines, small releases
• Overall high-level purposes defined up front, prior to analysis start
• Specific questions prioritized prior to analysis start
• Iterative approach with frequent stakeholder reviews to obtain interim feedback and new direction
• Peer synergy: metrics analysts work in pairs
• Advanced query and analysis tools; saved work can be reused in future engagements
• Data warehousing techniques, combining data from multiple sources where possible
• Data cleansing done prior to analysis start (as much as possible)
• Collective ownership of the results
Extreme Analysis Tips
Produce clean graphs and tables displaying important information. These can be used by various people for multiple purposes. Explanations should be clear, and the organization should make it easy to find information of interest. However,
It takes too long to analyze everything: we cannot expect to produce interpretations for every graph we produce. And even when we do, the results are superficial, because we do not have time to dig into everything.
“Special analysis,” where we focus on one topic at a time and study it in depth, is a good idea, both because we can complete it in a reasonable time and because the result should be something of use to the audience.
Therefore, ongoing feedback from the audience is crucial to obtaining useful results.
Agenda
Measuring Process Improvement
“Is there any way that the data can show improvement when things aren’t improving?” (Robert Grady)
Measuring Process Improvement
• Analysis can determine whether a perceived difference could be attributed to random variation
• Inferential techniques are commonly used in other fields; we have used them in software engineering for years
• This is an overview, not a training class
Expand our Set of Techniques
Metrics are used for:
• Benchmarking
• Process improvement
• Prediction and trend analysis
• Business decisions
• …all of which require confidence analysis!
Is This a Meaningful Difference?
[Chart: relative performance (0 to 2.0) plotted against CMM maturity levels 1 to 3]
Pressure to Produce Results
• Why doesn’t the data show improvement?
• “Take another sample!”
• Good inference on bad data is no help
“If you torture the data long enough, it will confess.” (Ronald Coase)
Types of Studies
Anecdote → Case Study → Quasi-Experiment → Experiment
• Anecdote: “I heard it worked once”; cargo cult mentality
• Case Study: some internal validity
• Quasi-Experiment: can demonstrate external validity
• Experiment: can be repeated; needs to be carefully designed and controlled
Attributes of Experiments
Subject → Treatment → Reaction
• Random Assignment
• Blocked and Unblocked
• Single-Factor and Multi-Factor
• Census or Sample
• Double Blind
• When you really have to prove causation (can be expensive)
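Random assignment, the first attribute above, is simple to mechanize. A minimal Python sketch; the subject names are hypothetical:

```python
import random

# Hypothetical subjects (e.g., projects eligible for a new process)
subjects = [f"project-{i}" for i in range(1, 21)]

random.seed(42)          # fixed seed so the assignment is reproducible
random.shuffle(subjects)

# Split the shuffled list into equal treatment and control groups
half = len(subjects) // 2
treatment, control = subjects[:half], subjects[half:]

print("treatment:", treatment)
print("control:  ", control)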
Limitations of Retrospective Studies
• No pretest; we use previous data from similar past projects
• No random assignment possible
• No control group
• Cannot custom-design metrics (have to use what you have)
Quasi-Experimental Designs
• There are many variations
• The common theme is to increase internal validity through reasonable comparisons between groups
• Useful when a formal experiment is not possible
• Can address some limitations of retrospective studies
Causation in Absence of Experiment
• Strength and consistency of the association
• Temporal relationship
• Non-spuriousness
• Theoretical adequacy
What Should We Look For?
Are the Conclusions Warranted?
Some information to accompany claims:
• measure of variation
• sample size
• confidence intervals
• data collection methods used
• sources
• analysis methods
Decision Without Analysis
• Conclusions may be wrong or misleading
• Observed effects tend to be unexplainable
• Statistics allows us to make honest, verifiable conclusions from data
Types of Confidence Analysis
[Diagram: quantitative variables call for correlation; categorical variables call for two-way tables]
Two Techniques We Use Frequently
• Inference for the difference between two means
  – Works for quantitative variables
  – Compute a confidence interval for the difference between the means
• Inference for two-way tables
  – Works for categorical variables
  – Compare actual and expected counts
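A minimal sketch of the first technique, using scipy; the two samples are synthetic stand-ins for productivity data, and the group names and parameters are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic productivity samples for two hypothetical project types
group_a = rng.normal(loc=5.0, scale=1.5, size=40)
group_b = rng.normal(loc=4.2, scale=1.5, size=35)

# Welch's t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

# 95% confidence interval for the difference between the means,
# using the Welch-Satterthwaite degrees of freedom
v_a = group_a.var(ddof=1) / len(group_a)
v_b = group_b.var(ddof=1) / len(group_b)
se = np.sqrt(v_a + v_b)
df = (v_a + v_b) ** 2 / (v_a ** 2 / (len(group_a) - 1) + v_b ** 2 / (len(group_b) - 1))
margin = stats.t.ppf(0.975, df) * se
diff = group_a.mean() - group_b.mean()
print(f"difference = {diff:.2f}, "
      f"95% CI = ({diff - margin:.2f}, {diff + margin:.2f}), p = {p_value:.3f}")
```

If the interval excludes zero, the difference is significant at the 95% level; the next slide applies exactly this kind of comparison.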
Quantitative Variables
Comparing the means of quartiles 2 and 4 yields a p value of 88.2%: not a significant difference at the 95% level.
[Boxplot: project productivity (AFP per hour, 0.0 to 1.0) by quartile of project size, ISBSG release 6; N = 119, 120, 120, 119 for quartiles 1 through 4]
Categorical Variables
The p value is approximately 50%: no significant association.

Effort Variance | Low PM | Medium PM | High PM
Met             |   3    |     6     |   10
Not Met         |   9    |     9     |    7
Categorical Variables
Confidence is greater than 99.9%: a significant association.

Date Variance | Low PM | Medium PM | High PM
Met           |   2    |    10     |    6
Not Met       |  10    |     3     |   13
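A minimal sketch of the second technique, the chi-square test for a two-way table, using scipy. The counts are those reconstructed from the date-variance slide above; the exact arrangement of cells is an assumption:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Date variance vs. PM category, counts as reconstructed from the slide
#                  Low PM  Medium PM  High PM
table = np.array([[ 2,     10,        6],    # Met
                  [10,      3,       13]])   # Not Met

# chi2_contingency compares actual counts with the counts expected
# if rows and columns were independent
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
print("expected counts:\n", expected.round(1))
```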
Expressing the Results “in English”
• “We are 95% certain that the difference in average productivity for these two project types is between 11 and 21 FP/PM.”
• “Some project types have a greater likelihood of cancellation than other types; we would be unlikely to see these results by chance.”
What if...
• Current data is insufficient,
• an experiment cannot be done,
• direct observation or 100% collection cannot be done,
• or lower-level information is needed?
Agenda
Surveys and Samples
In a scientific survey, every person in the population has some known positive probability of being selected.
What is a Survey?
• A way to gather information about a population from a sample of that population
• Varying purposes
• Different ways:
  – telephone
  – mail
  – internet
  – in person
What is a Sample?
• A representative fraction of the population
• Random selection
• Can reliably project to the larger population
What is a Margin of Error?
• An estimate from a survey is unlikely to exactly equal the quantity of interest
• Sampling error means results differ from the target population value due to “luck of the draw”
• The margin of error depends on sample size and sample design (see the sketch below)
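For a simple random sample of a proportion, the margin of error follows directly from the standard formula. A minimal sketch; the sample sizes are illustrative:

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error for an estimated proportion p from a
    simple random sample of size n (worst case at p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 600, 1500):
    print(f"n = {n:5d}: +/- {margin_of_error(n):.1%}")
# n = 1500 gives roughly +/- 2.5%, the basis for the later slide's claim
# that about 1,500 people is reliable enough for the entire U.S.
```

Note that the population size does not appear in the formula; only the sample size and design matter.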
What Makes a Sample Unrepresentative?
• Subjective or arbitrary selection
• Respondents are volunteers
• Questionable intent
How Large Should the Sample Be?
• What do you want to learn?
• How reliable must the result be?
  – The size of the population is not important
  – A sample of 1,500 people is reliable enough for the entire U.S.
• How large CAN it be?
“Dewey Defeats Truman”
• A prominent example of a poorly conceived survey
• 1948 pre-election poll
• Main flaw: a non-representative sample
• 2000 election: methods were not modified to the new situation
Is a Flawed Sample the Only Type of Problem That Happens?
• Non-response
• Measurement difficulties
• Design problems, leading questions
• Analysis problems
Some Remedies
• Stratify the sample (see the sketch after this list)
• Adjust for incomplete coverage
• Maximize response rate
• Test questions for
– clarity
– objectivity
• Train interviewers
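The first remedy, stratification, can be sketched in a few lines of Python; the population and strata below are hypothetical:

```python
import random

random.seed(7)

# Hypothetical population grouped into strata (e.g., by business unit)
population = {
    "unit_a": [f"a{i}" for i in range(200)],
    "unit_b": [f"b{i}" for i in range(50)],
    "unit_c": [f"c{i}" for i in range(750)],
}

def stratified_sample(strata: dict, total: int) -> list:
    """Draw a proportional random sample from each stratum so every
    group is represented in proportion to its size."""
    pop_size = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        k = round(total * len(members) / pop_size)
        sample.extend(random.sample(members, k))
    return sample

picked = stratified_sample(population, total=100)
print(len(picked), picked[:5])
```

Proportional allocation guarantees that a small stratum like unit_b cannot be missed entirely, which is what makes the sample representative.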
Agenda
Organizational Measures
“Whether measurement is intended to motivate or to provide information, or both, turns out to be very important.” (Robert Austin)
Dysfunctional Measures
• Disconnect between measure and goal
  – Can one get worse while the other gets better?
• Is one measure used for two incompatible goals?
• The two general types of measurement are...
Measurement in Organizations
• Motivational Measurements
  – intended to affect the people being measured, to provoke greater expenditure of effort in pursuit of the organization’s goals
• Informational Measurements
  – logistical, status, or research information; provide insight to support short-term management and long-term improvement
Informational Measurements
• Process Refinement Measurements
  – reveal the detailed structure of processes
• Coordination Measurements
  – serve a logistical purpose
Mixed Measurements
The desire to be viewed favorably provides an incentive for people being measured to tailor, supplement, repackage, or censor information that flows upward.
• The “dashboard” concept is incomplete
• We have Gremlins
The Right Kind of Culture
Internal or external motivation?
• Ask yourself what is driving the people around you to do a good job:
  – Do they identify with the organization and fellow team members? (They work hard to avoid letting coworkers down.)
  – Are they only focused on the next performance review and getting a big raise?
Why is this important?
• Each of us makes dozens of small decisions each day
  – Motivational measures influence us
  – These small decisions add up to large impacts
• Are these decisions aligned with the organization’s goals?
Conclusion: It Has Been Done
• There are organizations in which people have given themselves completely to the pursuit of organizational goals
• These people want measurements as a tool that helps get the job done
• If this is your organization, fight hard to keep it
A Few Selected Resources:
• Robert D. Austin, Measuring and Managing Performance in Organizations, 1996.
• Leonard J. Kazmier, Schaum’s Outlines: Business Statistics, 1996.
• International Software Benchmarking Standards Group, http://www.isbsg.org.au
• American Statistical Association, http://www.amstat.org/education/Curriculum_Guidelines.html
• Graphical techniques books by Edward Tufte
• Contact a statistician for help