Statistics, Probability, and Decision Making

Trial   Length
1       25.45
2       25.40
3       25.50
4       25.42
5       25.38
Mean    25.43
Which trial represents the length?
Most feel the mean is the best estimate.
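As a quick check of the arithmetic, the mean of the five trials can be computed directly. This is only a sketch; the variable names `lengths` and `estimate` are illustrative and do not come from the slides.

```python
from statistics import mean

# The five length measurements from the table of trials.
lengths = [25.45, 25.40, 25.50, 25.42, 25.38]

# Round to two decimals, matching the precision of the measurements.
estimate = round(mean(lengths), 2)
print(estimate)
```

Rounding to two decimal places is an assumption made here to match how the measurements are reported.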
How Precise is the Estimate?
You decide that the length is 25.43.
But look at the measurements.
Is 25.50 a misfit?
What about an unexpected value?
• Get rid of it…
• No, you need a statistical reason!
• Only if it was a mistake.
Is it a mistake?
An outlier: a single observation “far away” from the rest.
Q: How far away is “far away”?
A: It depends on whether the value differs from the rest within a “reasonable” range.
Decisions, decisions…
Rejecting Data in a Small Data Set
Trial   Length
1       25.45
2       25.40
3       25.50
4       25.42
5       25.38
Mean    25.43
Run the “Q-test.”
To test 25.50, calculate Q.
Q = |suspect value - value closest to it| ÷ range
where range = largest value - smallest value.
Q = (25.50 - 25.45) ÷ (25.50 - 25.38) = 0.05 ÷ 0.12 ≈ 0.42
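The Q calculation for the suspect value 25.50 can be sketched in a few lines. The function name `q_statistic` is hypothetical, and the sketch assumes the suspect appears in the data exactly once.

```python
def q_statistic(values, suspect):
    """Gap between the suspect and its nearest neighbour, divided by the range."""
    others = list(values)
    others.remove(suspect)  # drop the suspect itself before looking for its neighbour
    gap = min(abs(suspect - v) for v in others)
    data_range = max(values) - min(values)
    return gap / data_range

lengths = [25.45, 25.40, 25.50, 25.42, 25.38]
q_calc = q_statistic(lengths, 25.50)
print(round(q_calc, 2))
```

The nearest neighbour of 25.50 is 25.45 and the range runs from 25.38 to 25.50, so the ratio is 0.05 ÷ 0.12.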
Compare Qcalculated with Qcritical
Qcritical (90% confidence)
Number of trials    Qcritical
3                   0.94
4                   0.76
5                   0.64
6                   0.56
7                   0.51
8                   0.47
9                   0.44
10                  0.41
• If Qcalc > Qcritical, reject.
• If Qcalc < Qcritical, keep.
From the previous example…
Qcalc = 0.42
N = 5, Qcritical = 0.64
• If Qcalc > Qcritical, reject.
• If Qcalc < Qcritical, keep.
Since 0.42 < 0.64, keep 25.50.
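The comparison against the 90% confidence table can be written as a small lookup. The names `Q_CRITICAL_90` and `q_test_decision` are hypothetical. The slides do not say what to do when Qcalc exactly equals Qcritical; this sketch assumes the value is kept in that case.

```python
# Critical Q values at 90% confidence, keyed by number of trials (from the table).
Q_CRITICAL_90 = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56,
                 7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}

def q_test_decision(q_calc, n_trials, table=Q_CRITICAL_90):
    """Return 'reject' if q_calc exceeds the critical value, else 'keep'."""
    q_crit = table[n_trials]  # raises KeyError outside 3..10 trials
    # Assumption: a tie (q_calc == q_crit) is treated as 'keep'.
    return "reject" if q_calc > q_crit else "keep"

print(q_test_decision(0.42, 5))
```

With Qcalc = 0.42 and five trials, the critical value is 0.64, so the suspect value is kept.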
Rejecting data in a large set
Use a Normal Distribution
• Find the confidence interval: µ ± 3σ
About 95% of the data falls within two standard deviations of the mean, and about 99.7% within three.
• Does the measurement fall outside the confidence interval?
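For a normal distribution, the fraction of data within k standard deviations of the mean is erf(k / √2). The sketch below evaluates it for k = 1, 2, 3; the function name `fraction_within` is illustrative only.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(fraction_within(k), 4))
```

This shows why ±3σ is a stricter rejection limit than ±2σ: far fewer legitimate measurements are expected to fall outside it.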
Outliers…
Q: Why worry about them?
A: Values may not be properly distributed.
Q: Where do they come from?
A: Possible sources:
1. Recording and measurement errors
2. Incorrect distribution
3. Unknown data structure
Note: In the accompanying figure, outliers are shown in red.
Managing Outliers
If the data follow a normal distribution:
1. Calculate the mean and the standard deviation.
2. Find the ±3 standard deviation range for imposing limits on the data.
3. Identify outliers (values more than 3 standard deviations from the mean).
4. Get rid of them!
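The four steps can be sketched as follows. The function name `split_outliers` and the sample data are hypothetical, and the sketch assumes the sample standard deviation (`statistics.stdev`) is the intended one, since the slides do not specify.

```python
from statistics import mean, stdev

def split_outliers(data, k=3.0):
    """Split data into values inside and outside mean ± k standard deviations."""
    mu = mean(data)                               # step 1: mean
    sigma = stdev(data)                           # step 1: sample standard deviation (assumption)
    low, high = mu - k * sigma, mu + k * sigma    # step 2: limits
    outliers = [x for x in data if x < low or x > high]   # step 3: identify
    kept = [x for x in data if low <= x <= high]          # step 4: remove
    return kept, outliers

# Hypothetical data: twenty readings near 10 plus one stray value.
data = [9.8, 10.1, 10.0, 10.2, 9.9] * 4 + [50.0]
kept, outliers = split_outliers(data)
print(outliers)
```

Note that the stray value itself inflates the mean and standard deviation, so in a very small set no point can ever lie beyond 3 standard deviations. This is one reason the rule is reserved for large data sets.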