Statistics, Probability, and Decision Making

Trial   Length
1       25.45
2       25.40
3       25.50
4       25.42
5       25.38
Mean    25.43

Which trial represents the length? Most people feel the mean is the best estimate.

How Precise Is the Estimate?
You decide that the length is 25.43. But look at the measurements: is 25.50 a misfit?

What About an Unexpected Value?
• Get rid of it…
• No, you need a statistical reason!
• Only if it was a mistake.

Is It a Mistake?
An outlier is a single observation "far away" from the rest.
Q: How far away is "far away"?
A: It depends on whether the value differs from the rest by more than a "reasonable" amount.

Decisions, decisions…

Rejecting Data in a Small Data Set
Data: 25.45, 25.40, 25.50, 25.42, 25.38 (mean 25.43)
Run the "Q-test." To test 25.50, calculate Q:
Q = (suspect value − value closest to it) ÷ range
Q = (25.50 − 25.45) ÷ (25.50 − 25.38) = 0.05 ÷ 0.12 ≈ 0.42

Compare Qcalculated with Qcritical
Number of trials      3     4     5     6     7     8     9     10
Qcritical (90% conf.) 0.94  0.76  0.64  0.56  0.51  0.47  0.44  0.41
• If Qcalc > Qcritical, reject the suspect value.
• If Qcalc < Qcritical, keep it.

From the Previous Example…
Qcalc = 0.42; N = 5, so Qcritical = 0.64.
• If Qcalc > Qcritical, reject.
• If Qcalc < Qcritical, keep.
Since 0.42 < 0.64, keep 25.50.

Rejecting Data in a Large Set
Use a normal distribution.
• Find the confidence interval µ ± 3σ. About 95% of the data falls within two standard deviations of the mean, and about 99.7% within three.
• Does a measurement fall outside the confidence interval?

Outliers…
Q: Why worry about them?
A: Values may not be properly distributed.
Q: Where do they come from?
A: Possible sources:
1. Recording and measurement errors
2. Incorrect distribution
3. Unknown data structure
[Figure: outliers shown in red]

Managing Outliers
If the data follow a normal distribution:
1. Calculate the mean and the standard deviation.
2. Find the ±3 standard deviation range to set limits on the data.
3. Identify outliers (values more than 3 standard deviations from the mean).
4. Get rid of them!
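Here is a quick check of the mean for the five trial lengths. It is a minimal Python sketch that uses the standard-library `statistics` module. The variable names are illustrative and do not come from the slides.

```python
from statistics import mean

# Lengths from the five trials in the first data table
lengths = [25.45, 25.40, 25.50, 25.42, 25.38]

# The arithmetic mean serves as the best estimate of the true length
best_estimate = round(mean(lengths), 2)
print(best_estimate)  # 25.43
```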
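The Q-test from "Rejecting Data in a Small Data Set" can be sketched in Python together with the Qcritical table. This minimal sketch makes a few assumptions:
• There is a single suspect value at the low or high end of the data.
• Only the 90% confidence table is used.
• `q_test` and `Q_CRIT_90` are hypothetical names.
• The slides do not say what to do when Qcalc equals Qcritical, so the sketch keeps the value in that case.

```python
# Critical Q values at 90% confidence, keyed by number of trials
# (copied from the Qcritical table in the slides)
Q_CRIT_90 = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56, 7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}


def q_test(values, suspect):
    """Q-test at 90% confidence for one suspect value (hypothetical helper).

    Returns (q_calc, q_crit, keep). The suspect must be the smallest
    or largest value in the data set.
    """
    n = len(values)
    if n not in Q_CRIT_90:
        raise ValueError("the Q table only covers 3 to 10 trials")
    data = sorted(values)
    spread = data[-1] - data[0]  # range = largest - smallest
    if spread == 0:
        raise ValueError("all values are identical; Q is undefined")
    if suspect == data[-1]:
        gap = data[-1] - data[-2]  # distance to the nearest value below
    elif suspect == data[0]:
        gap = data[1] - data[0]    # distance to the nearest value above
    else:
        raise ValueError("the suspect must be the smallest or largest value")
    q_calc = gap / spread
    q_crit = Q_CRIT_90[n]
    keep = not (q_calc > q_crit)   # reject only when Q_calc > Q_crit
    return q_calc, q_crit, keep


lengths = [25.45, 25.40, 25.50, 25.42, 25.38]
q_calc, q_crit, keep = q_test(lengths, 25.50)
print(f"Q_calc = {q_calc:.2f}, Q_crit = {q_crit}, keep = {keep}")
# Q_calc = 0.42, Q_crit = 0.64, keep = True
```

The same call with `25.38` as the suspect tests the low end of the data instead.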
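The coverage figures behind the µ ± 3σ interval can be checked with `statistics.NormalDist` from the standard library (Python 3.8 or later). The helper `coverage` is a hypothetical name used only for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)


for k in (1, 2, 3):
    print(f"within ±{k}σ: {coverage(k):.1%}")
```

The loop reports roughly 68%, 95%, and 99.7% for one, two, and three standard deviations.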
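The four steps under "Managing Outliers" can be sketched in Python. This minimal sketch rests on several assumptions:
• The data are roughly normal.
• "Standard deviation" means the sample standard deviation (`statistics.stdev`).
• The screen runs in a single pass.
• `split_outliers` and the readings are hypothetical.

```python
from statistics import mean, stdev


def split_outliers(values, k=3.0):
    """Split values into (kept, outliers) using the mean ± k·s rule (hypothetical helper)."""
    mu = mean(values)                   # step 1: mean
    s = stdev(values)                   # step 1: sample standard deviation (n - 1 denominator)
    low, high = mu - k * s, mu + k * s  # step 2: ±k standard deviation limits
    kept = [x for x in values if low <= x <= high]
    outliers = [x for x in values if not (low <= x <= high)]  # step 3: flag outliers
    return kept, outliers               # step 4: caller drops the outliers


# Hypothetical data: 20 well-behaved readings plus one suspicious value
readings = [9.9, 10.1] * 10 + [15.0]
kept, outliers = split_outliers(readings)
print(outliers)  # [15.0]
```

This screen has two caveats. First, the outlier itself inflates the standard deviation. Second, with ten or fewer values no point can lie more than 3 sample standard deviations from the mean (Samuelson's inequality), so the screen only makes sense for larger data sets.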