Random Variable               Probability Distribution
X = Amt in next bottle        N(μ=10.2, σ=0.16)
X = #G in 4                   B(n=4, p=.5)
X = total of 2 tossed dice    [Figure: probabilities of the totals 2 through 12]

We must admit that we cannot know exactly what value X will take…
…so that we can do the intelligent thing and talk about something we CAN know, the probability distribution of X.

Summary Characteristics: Mean, Median, Mode, Std dev, Variance, Skew

There are summary characteristics of any probability distribution…
But knowing these summary measures does not replace our need to know the probability distribution.

Class 06: Descriptive Statistics
EMBS: 3.1, 3.2, first part of 3.3

Characteristics of probability distributions
• Measures of Location
  – Mean
  – Median
  – Mode
• Measures of Variability
  – Standard Deviation
  – Variance
• Measure of Shape
  – skewness

Descriptive Statistics (for numerical data)
• Measures of Location
  – Sample Mean
  – Sample Median
  – Sample Mode
• Measures of Variability
  – Sample StDev
  – Sample Variance
• Measure of Shape
  – Sample skewness

A positively-skewed pdf
• Mode is the most likely value
• P(X<median) = P(X>median) = 0.5
• Mean is the probability-weighted average
• Skewness > 0
http://dept.econ.yorku.ca/~jbsmith/ec2500_1998/lecture9/Lecture9.html

A negatively-skewed pdf
• Skewness < 0
An exhibit at MOMA invites visitors to mark their heights on a wall. A normal distribution results: Well, not quite. The distribution is actually slightly negatively skewed by the confounding presence of children, who are obviously shorter than adults - you can see this in the great number of names well below the central band which are not mirrored by names higher up. Rest assured, however, that the ex-children distribution is itself Gaussian.
http://www.thisisthegreenroom.com/2009/bell-curves-in-action/

The Normal pdf
• Mean = μ
• Median = μ
• Mode = μ
• Skewness = 0
http://www.comfsm.fm/~dleeling/statistics/notes06.html

Measures of Variability
[Figure: three normal curves with σ = 0.7, σ = 1.0, σ = 1.5]
http://www.comfsm.fm/~dleeling/statistics/notes06.html

Skewed pdfs can also have different standard deviations
Which pdf has the largest σ?

Pdfs can have different means, but identical standard deviations
Which pdf has the largest σ? Which pdf has the largest μ?

Characteristics of probability distributions
• Measures of Location
  – Mean: probability-weighted average
  – Median: 50% point
  – Mode: most likely
• Measures of Variability
  – Standard Deviation
  – Variance: expected squared distance from mean
• Measure of Shape
  – skewness: neg if skewed left, 0 if symmetric, pos if skewed right

Descriptive Statistics (for numerical data)
• Measures of Location
  – Sample Mean
  – Sample Median
  – Sample Mode
• Measures of Variability
  – Sample StDev
  – Sample Variance
• Measure of Shape
  – Sample skewness
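The definitions on this slide (mean as a probability-weighted average, median as the 50% point, mode as the most likely value, variance as the expected squared distance from the mean) can be tried on the X = total of 2 tossed dice example. The following is only a sketch: the variable names are made up for illustration, the dice are assumed fair, the median is taken as the smallest value whose cumulative probability reaches 0.5, and skewness is computed as the third central moment divided by σ cubed.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Probability distribution of X = total of 2 fair dice (assumption: fair dice).
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)

# Mean: probability-weighted average of the possible values.
mean = sum(x * p for x, p in pmf.items())

# Variance: expected squared distance from the mean; std dev is its square root.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
std_dev = sqrt(float(variance))

# Mode: the most likely value.
mode = max(pmf, key=pmf.get)

# Median: smallest value whose cumulative probability reaches 0.5.
cumulative = Fraction(0)
median = None
for x in sorted(pmf):
    cumulative += pmf[x]
    if cumulative >= Fraction(1, 2):
        median = x
        break

# Skewness: third central moment scaled by sigma cubed (0 for a symmetric pmf).
third_moment = sum((x - mean) ** 3 * p for x, p in pmf.items())
skewness = float(third_moment) / std_dev ** 3

print("mean", mean, "median", median, "mode", mode)
print("variance", variance, "std dev", round(std_dev, 3), "skewness", skewness)
```

Because this distribution is symmetric, the mean, median, and mode coincide and the skewness is zero, the same pattern the slides show for the Normal pdf.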
Descriptive Statistics (for numerical data): Excel functions
• Measures of Location
  – Sample Mean        =average()
  – Sample Median      =median()
  – Sample Mode        =mode()
• Measures of Variability
  – Sample StDev       =stdev()
  – Sample Variance    =var()
• Measure of Shape
  – Sample skewness    =skew()
• COUNT
• RANGE

GET THEM ALL USING DATA ANALYSIS, DESCRIPTIVE STATISTICS, SUMMARY STATISTICS

The sample standard deviation

Understanding sample standard deviation
[Figure: dot plots of the data sets below on an axis from 0 to 20]

         Set 1   Set 2
           0      10
           4      14
          10      20
          16      26
          20      30
stdev    8.25    8.25
It measures variability about the mean.

         Set 1   Set 2   Set 3
           0       0       0
           4       6       8
          10      10      10
          16      14      12
          20      20      20
stdev    8.25    7.62    7.21
All the data contribute to the measure.

         Set 1   Set 2
           0       0
           2      10
           4      16
          10      18
          20      20
stdev    8.07    8.07
It measures variability … in either direction.

Our Data

Section   ND ID       Gender (M=1)   HS Stat?   Ht   Value
4         901526453   0              0          67   0
4         901533561   0              1          63   0.06
4         901536075   1              0          70   0
. . .     . . .       . . .          . . .      . . . . . .
5         901636399   1              0          76   0
5         901643915   1              0          72   0.1
5         901643995   0              0          64   0

Data / Data Analysis / Descriptive Statistics / Summary Statistics

                     Section   ND ID         Gender (M=1)   HS Stat?   Ht        Value
Mean                 4.493     901589800.3   0.609          0.217      69.351    0.185
Standard Error       0.061     4992.821      0.059          0.050      0.477     0.056
Median               4         901596170     1              0          70        0
Mode                 4         #N/A          1              0          71        0
Standard Deviation   0.504     41473.487     0.492          0.415      3.959     0.465
Sample Variance      0.254     1720050147    0.242          0.173      15.673    0.216
Kurtosis             -2.060    0.555         -1.847         -0.039     -0.793    6.385
Skewness             0.030     -0.581        -0.455         1.401      -0.307    2.706
Range                1         228090        1              1          16        2
Minimum              4         901444465     0              0          60        0
Maximum              5         901672555     1              1          76        2
Sum                  310       62209696222   42             15         4785.25   12.75
Count                69        69            69             69         69        69

Fill Test Data
Normal(10.2,0.16)?

EXHIBIT 2
LOREX PHARMACEUTICALS
Filling Line Test Results with Target = 10.2

 9.89  10.17  10.29  10.00  10.04  10.35  10.05  10.19  10.21   9.79  10.53  10.47
10.24  10.17  10.15  10.06  10.13  10.04  10.20  10.37  10.24  10.03  10.01  10.20
10.41  10.17  10.35  10.06  10.19  10.17  10.07  10.27  10.13  10.24  10.14   9.84
10.36  10.11  10.42  10.16  10.21  10.01  10.25  10.44  10.44  10.44  10.13  10.29
10.53  10.32  10.16  10.21  10.09  10.02  10.32  10.14  10.11  10.20  10.35   9.96
10.30  10.33  10.36  10.17  10.15  10.14  10.07  10.37  10.40  10.25  10.24  10.03
10.20  10.04  10.16  10.22  10.12  10.36  10.24  10.07  10.40  10.29  10.21  10.10
10.23  10.19  10.19  10.29  10.25  10.18  10.42   9.85  10.45  10.37  10.22  10.19
10.23  10.48  10.17   9.76  10.06  10.17  10.04  10.41  10.27  10.00  10.23  10.11
10.19   9.97  10.05  10.12  10.33  10.18  10.54   9.91  10.28  10.23   9.98   9.99
10.15  10.11  10.19  10.22  10.10   9.99  10.40  10.76  10.20  10.31  10.16  10.23
10.17  10.00  10.11  10.30  10.64  10.10  10.23  10.45  10.17  10.19   9.98  10.13

Fill Test Data: Descriptive Statistics, Summary Statistics

                     Amount
Mean                 10.198
Standard Error       0.014
Median               10.190
Mode                 #N/A
Standard Deviation   0.163
Sample Variance      0.026
Kurtosis             0.771
Skewness             0.245
Range                0.997
Minimum              9.758
Maximum              10.756
Sum                  1468.542
Count                144

Fill Test Data: Histogram
Data / Data Analysis / Histogram (check Chart Output)

Bin       Frequency
9.758     1
9.841     2
9.925     3
10.008    10
10.091    17
10.174    33
10.257    36
10.340    14
10.423    16
10.506    7
10.590    3
10.673    1
More      1

• 1 data point was ≤ 9.758 (the minimum)
• 2 data points were between 9.758 and 9.841
• 1 was above 10.673
[Figure: histogram of Frequency by Bin]

Preview of Coming Attractions
• Class 07
  – Find out how to use these counts to test H0: these data came from N(10.2,.16)
  – Find out how to use the Denmark family counts to test H0: those data came from Binomial(4,.5)
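The three comparisons on the "Understanding sample standard deviation" slides from this class can be checked with a short script. This is a sketch only: the data set names are invented for illustration, and Python's `statistics.stdev` is used here as a stand-in for Excel's `=stdev()` (both divide the sum of squared deviations by n - 1).

```python
from statistics import stdev

# Data sets read off the slides; the names are hypothetical labels.
datasets = {
    "base": [0, 4, 10, 16, 20],
    "shifted": [10, 14, 20, 26, 30],      # base with 10 added to every value
    "tighter": [0, 6, 10, 14, 20],        # middle values pulled toward the mean
    "tightest": [0, 8, 10, 12, 20],       # pulled in further still
    "low_heavy": [0, 2, 4, 10, 20],       # most values at the low end
    "high_heavy": [0, 10, 16, 18, 20],    # mirror image of low_heavy
}

# Sample standard deviation (n - 1 in the denominator), rounded as on the slides.
results = {name: round(stdev(values), 2) for name, values in datasets.items()}

for name, value in results.items():
    print(f"{name:>10}: stdev = {value}")
```

Shifting every value leaves the result unchanged (variability about the mean), moving only the interior points changes it (all the data contribute), and mirroring the data gives the same value (either direction counts equally).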