E370 2013 Spring
Chapter 3 Summary Statistics

Summarizing - Distribution
Measures of Central Location/Central Tendency: Mean, Median, Mode
Measures of Variability/Dispersion: Range, Standard Deviation, Variance, Coefficient of Variation
Measures of Shape: Skewness (e.g. Pearson’s 2nd Skewness)

Three Measures of Central Tendency

Statistic: Mean
Formula: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Excel formula: =AVERAGE(Data)
Pro: Familiar and uses all the sample information.
Con: Influenced by extreme values.

Statistic: Median
Formula: Middle value in sorted array
Excel formula: =MEDIAN(Data)
Pro: Robust when extreme data values exist.
Con: Ignores extremes and can be affected by gaps in data values.

Variance
The population variance ($\sigma^2$) is defined as the sum of squared deviations around the mean $\mu$ divided by the population size $N$:
$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$
For the sample variance ($s^2$), we divide by $n - 1$ instead of $n$; otherwise $s^2$ would tend to underestimate the unknown population variance $\sigma^2$:
$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Note: the denominator is the sample size ($n$) minus one.
Drawback: variance is expressed in squared units, so it is hard to interpret.

Variance (cont’d)
Excel’s built-in functions are:

Statistic            Excel population formula   Excel sample formula
Variance             =VAR.P(Array)              =VAR.S(Array)
Standard deviation   =STDEV.P(Array)            =STDEV.S(Array)

Coefficient of Variation
The coefficient of variation (CV) of a set of observations is the standard deviation of the observations divided by their mean, that is:
$CV = s / \bar{x}$
• This coefficient provides a unit-free measure of variation. It measures relative dispersion, and is useful for comparing dispersion of variables measured in different units or with different means.

Measure of Skewness
Pearson’s Skewness Coefficients
• First: Sk = (mean - mode) / standard deviation (sample or population)
• Second: Sk = 3(mean - median) / standard deviation (sample or population)
Characteristics of Pearson’s Second Skewness Coefficient:
• Usually lies between -3 and +3
• Zero means symmetric.
• Negative means negative (left) skewness
• Positive means positive (right) skewness

Mean, Median, Mode
If a distribution is right-skewed (positive), it is often true that: MEAN > MEDIAN > MODE
If a distribution is left-skewed (negative), it is often true that: MODE > MEDIAN > MEAN

Excel
=AVERAGE(Array): Returns the arithmetic mean.
=MEDIAN(Array): Returns the median of an array. The data do not need to be sorted first; Excel orders the values internally.
=MODE.SNGL(Array): Returns the most frequent value in an array. If several values tie, it returns the first one found.
=MODE.MULT(Array): Returns multiple modes if they exist in an array.
=MIN(Array): Returns the smallest value in an array.
=MAX(Array): Returns the largest value in an array.
=VAR.P(Array): Returns the population variance of an array.
=VAR.S(Array): Returns the sample variance of an array.
=STDEV.P(Array): Returns the population standard deviation of an array.
=STDEV.S(Array): Returns the sample standard deviation of an array.
Data ==> Data Analysis ==> Descriptive Statistics: Generates a table of statistics for one or more variables.
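
The statistics in this chapter can also be checked outside Excel. The following is a minimal sketch in Python using only the standard library `statistics` module. The helper name `summarize` and the five data values are made up for illustration and are not part of the course material. Each line notes the Excel function it is meant to mirror.

```python
import statistics


def summarize(data):
    """Hypothetical helper: compute the chapter's summary statistics for a sample."""
    mean = statistics.mean(data)          # like =AVERAGE(Array)
    median = statistics.median(data)      # like =MEDIAN(Array); no pre-sorting needed
    sd_sample = statistics.stdev(data)    # like =STDEV.S(Array); divides by n - 1
    return {
        "mean": mean,
        "median": median,
        "modes": statistics.multimode(data),    # like =MODE.MULT(Array)
        "range": max(data) - min(data),         # =MAX(Array) - =MIN(Array)
        "var_p": statistics.pvariance(data),    # like =VAR.P(Array); divides by N
        "var_s": statistics.variance(data),     # like =VAR.S(Array); divides by n - 1
        "sd_p": statistics.pstdev(data),        # like =STDEV.P(Array)
        "sd_s": sd_sample,
        # Coefficient of variation: sample standard deviation over the mean.
        # Assumes the mean is not zero.
        "cv": sd_sample / mean,
        # Pearson's second skewness coefficient: 3(mean - median) / std. dev.
        "skew2": 3 * (mean - median) / sd_sample,
    }


# Made-up sample with one large value to create right skew.
sample = [2.0, 4.0, 4.0, 5.0, 10.0]
stats = summarize(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```

For this made-up sample the mean is 5 and the median and mode are both 4, so the mean sits above the median, as expected for right-skewed data. The sample variance is 9, the sample standard deviation is 3, and Pearson’s second coefficient is 3(5 - 4)/3 = 1, which is positive.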