Lecture 1-3
Descriptive Statistics: Central Tendency

Descriptive Statistics
• Two important characteristics of a population
– Center: measures of central tendency
– Behavior around the center: measures of variability or dispersion
• Examples of measures of central tendency: mean, median, mode
• Examples of measures of dispersion: sample standard deviation, sample range

Sample Mean
• The arithmetic average of the data.
• Denoted by $\bar{x}$
• If we have $n$ items in our sample:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Sample Median
• The “middle” value of the data.
• Denoted $\tilde{x}$
• Requires the data to be ordered from smallest to largest: $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$.
• The median divides the data into two equal parts
• Let $\ell = (n+1)/2$ be the “location” of the middle

Sample Median
• If $n$ is odd, the median is $\tilde{x} = x_{(\ell)}$
• If $n$ is even, it is the average of the two ordered values on either side of location $\ell$: $\tilde{x} = \frac{x_{(n/2)} + x_{(n/2+1)}}{2}$

Sample Mode
• Most frequently observed value
• Has no special notation
• Often not unique
– Multiple modes
– Frequently occurs with small data sets
• Not used often in practice.

Example
• The following data are the results of tests done by the National Bureau of Standards to determine the melting point of biphenyl:
343.0 342.4 343.4 343.1 343.3 343.7 343.5
343.1 343.3 343.4 343.8 343.3 343.3 343.3
• Compute the mean, median, and mode.

Example
• Mean: $\bar{x} = \frac{1}{14} \sum_{i=1}^{14} x_i = \frac{4805.9}{14} \approx 343.28$
• Median: Ordered data set:
342.4, 343.0, 343.1, 343.1, 343.3, 343.3, 343.3,
343.3, 343.3, 343.4, 343.4, 343.5, 343.7, 343.8

Example
• Median - Continued: $n = 14$ is even, so $\ell = 7.5$ and the median is the average of the 7th and 8th ordered values: $\tilde{x} = (343.3 + 343.3)/2 = 343.3$
• Mode: Most popular is 343.3 (observed five times)

Relationships
• For symmetric distributions, mean, median, and mode tend to be very similar.
• For right-skewed distributions, the tendency is Mode < Median < Mean
• For left-skewed distributions, the tendency is Mean < Median < Mode
• Results depend on sample size

Which Measure of Centrality is Best?
• Depends on the data:
– The mean is sensitive to extreme values.
– The median is not sensitive to these values.
– The mode is not always representative of all of the data.
• The median is an example of a “robust” measure of the center.
• It is less sensitive to “outliers”

Descriptive Statistics: Variability or Dispersion

Measures of Variability/Dispersion
• Measures of central tendency give information only about the “typical” value.
• Real data exhibit variability.

Common Measures of Variability
• Sample Range
• Sample Variance
• Sample Standard Deviation
• Interquartile Range

Sample Range
• Difference between the largest and smallest observations
• Note: a single value, not a set of values as the term “range” means in mathematics.
• There is no universally agreed-upon symbol.

Sample Variance
• Deviations from the typical value are the heart of variability.
• Let $x_i - \bar{x}$ be the deviation or residual of the $i$th observation.
• Definition (for now):

Sample Variance
• Consider

Sample Variance
• Now consider

Sample Variance
• Theoretical Formula:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
• Computational Formula:
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$$
• The $n - 1$ reflects the “degrees of freedom.”

Sample Standard Deviation
• Note: the units of the sample variance are the square of the units of the data.
• The sample standard deviation, $s = \sqrt{s^2}$, converts the sample variance into the units of the data.

Interquartile Range (IQR)
• Range of the “middle 50%” of the data.
• Requires calculation of the “quartiles”
– First quartile ($Q_1$): approximately 25% of the data lie below it
– Third quartile ($Q_3$): approximately 25% of the data lie above it
• Conceptually:
– Use the median to split the data into two parts.
– $Q_1$ is the median of the lower half.
– $Q_3$ is the median of the upper half.
– No universal definition other than the concept!
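The variance formulas and the median-of-halves idea for quartiles can be sketched in Python. This is a minimal illustration, not a definitive implementation: it uses the biphenyl melting-point data as listed in the earlier example, the function names are hypothetical, and the handling of an odd sample size (leaving the overall median out of both halves) is only one of several conventions, since the slides note there is no universal definition.

```python
import math

# Biphenyl melting points as listed in the earlier example.
data = [343.0, 342.4, 343.4, 343.1, 343.3, 343.7, 343.5,
        343.1, 343.3, 343.4, 343.8, 343.3, 343.3, 343.3]


def median(values):
    """Middle ordered value; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def variance_definition(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def variance_computational(values):
    """Algebraically equivalent form: (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (total_sq - total ** 2 / n) / (n - 1)


def quartiles_by_halves(values):
    """Q1 and Q3 as the medians of the lower and upper halves.

    Assumption: when n is odd, the overall median is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[n - half:]
    return median(lower), median(upper)


s2 = variance_definition(data)
s = math.sqrt(s2)  # back in the units of the data
q1, q3 = quartiles_by_halves(data)

print(f"s^2 = {s2:.4f} (definition), {variance_computational(data):.4f} (computational)")
print(f"s   = {s:.4f}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1:.1f}")
```

The two variance functions agree up to floating-point rounding. The computational form subtracts two large, nearly equal numbers, so the definitional form is usually the safer choice in software even though the shortcut is convenient by hand.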
IQR
• IQR = $Q_3 - Q_1$
• Robust estimate of dispersion
– Insensitive to outliers
– Concept of leverage:

Measures of Position
• Measures of position:
– Describe the relative position of a specific data value in ranked order
– Common term: percentile
– Important for robust and nonparametric statistics
• Examples:
– Median: estimate of the 50th percentile
– First quartile: estimate of the 25th percentile
– Third quartile: estimate of the 75th percentile

Al Contamination in PET Plastic
• Data:
291 222 125 79 145 119 244 118 182 63 30 140 101
102 87 183 60 191 119 511 120 172 70 30 90 115
• Variance: $n = 26$; $\sum_{i=1}^{n} x_i = 3709$; $\sum_{i=1}^{n} x_i^2 = 770205$

Contamination - Continued
• Variance (continued):
$$s^2 = \frac{\sum x_i^2 - \left(\sum x_i\right)^2 / n}{n - 1} = \frac{770205 - 3709^2 / 26}{25} \approx 9644.08$$
• Standard Deviation: $s = \sqrt{9644.08} \approx 98.20$

Contamination - Continued
• Data in Ascending Order
30 30 60 63 70 79 87 90 101 102 115 118 119
119 120 125 140 145 172 182 183 191 222 244 291 511
• Range: Min = 30; Max = 511
R = Max - Min = 511 - 30 = 481

Contamination - Continued
• IQR
– Lower half and upper half have 13 values each
– Median of each half is the 7th value: $Q_1 = 87$, $Q_3 = 182$
– IQR = $Q_3 - Q_1 = 182 - 87 = 95$

Which Measure of Variability Is Best?
• Statistical theory uses the variance
– Standard deviation is in the original units
– Variance and standard deviation are sensitive to extreme values.
• Range
– Not commonly used outside of SPC
– Very sensitive to extreme values
• IQR
– Useful for exploratory data analysis
– Robust to extreme values
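The sensitivity claims above can be checked numerically on the Al contamination data. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the quartiles use the median-of-halves concept from the slides, and the "perturbed" data set is invented purely for demonstration (the largest reading, 511, is replaced by 5110, as if a digit had been mistyped). It is not part of the original data.

```python
import math

# Al contamination data from the example above.
al = [291, 222, 125, 79, 145, 119, 244, 118, 182, 63, 30, 140, 101,
      102, 87, 183, 60, 191, 119, 511, 120, 172, 70, 30, 90, 115]


def median(values):
    """Middle ordered value; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def spread_summary(values):
    """Range, sample standard deviation, and IQR (median-of-halves quartiles)."""
    ordered = sorted(values)
    n = len(ordered)
    xbar = sum(ordered) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in ordered) / (n - 1))
    half = n // 2
    q1 = median(ordered[:half])        # median of the lower half
    q3 = median(ordered[n - half:])    # median of the upper half
    return {"range": ordered[-1] - ordered[0], "sd": s, "iqr": q3 - q1}


original = spread_summary(al)

# Hypothetical perturbation (NOT in the source data): pretend the largest
# reading, 511, had been recorded as 5110.
perturbed = spread_summary([5110 if x == 511 else x for x in al])

for name in ("range", "sd", "iqr"):
    print(f"{name:>5}: {original[name]:9.2f} -> {perturbed[name]:9.2f}")
```

Under this single changed value, the range and the standard deviation grow by roughly an order of magnitude while the IQR does not move, which is the sense in which the IQR is robust to extreme values.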