12/8/2014
AGSC 320 Statistical Methods
Numerical descriptive measures

Data representation
1. Measures of central tendency, e.g., mean, mode, median, midrange
2. Measures of dispersion, e.g., range, variance, standard deviation
3. Measures of distribution shape, e.g., normal, skewed, uniform, random
4. Measures of position, e.g., percentiles, quartiles, standard scores

Data organization
Height of 20 trees: 50, 45, 32, 48, 56, 38, 42, 48, 55, 36, 41, 51, 30, 59, 53, 47, 57, 51, 46, 44
[Figure: histogram of the tree heights; horizontal axis: Height (30 to 60); vertical axis ticks: 0 to 7]

Measures of central tendency
1. Mean or arithmetic average
Definition: sum of the values divided by the total number of observations
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \ \text{(sample)}, \qquad \mu = \frac{\sum_{i=1}^{N} x_i}{N} \ \text{(population)}$$

2. Median
Definition: the midpoint (middle value) of a group of data, i.e., the point that separates the data into two sets with the same number of observations
Steps:
• arrange the data in order
• find the midpoint (for an even number of observations, the average of the two middle values)

3. Mode
Definition: the most frequently occurring value / observation
Notes:
• not always unique
• a data set can also be bimodal or multimodal

4. Midrange
Definition: sum of the lowest and highest values divided by 2

Measures of central tendency: summary
[Table: Statistics / Value, with a row for the mean; values not shown]

Relationship among mean, median, mode
• Depending on the shape of the histogram / frequency distribution, the mean lies at a different position relative to the median and the mode:
  – symmetric distribution: Mean = Median = Mode
  – right-skewed (positively skewed) distribution: Mode < Median < Mean
  – left-skewed (negatively skewed) distribution: Mean < Median < Mode

Measures of dispersion
1. Range
Definition: the difference between the largest and the smallest observation
$$\text{Range} = x_{max} - x_{min}$$
where
$x_{max}$: largest observation
$x_{min}$: smallest observation

2. Variance
• Definition: sum of the squared differences between each observation and the mean, divided by the number of observations ($N$ for a population, $n - 1$ for a sample)
Population: $$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
Sample: $$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Working formulas for variance and standard deviation
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}, \qquad s = \sqrt{s^2}$$
$$\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2 / N}{N}, \qquad \sigma = \sqrt{\sigma^2}$$

3. Standard deviation
Definition: the square root of the variance
• A measure of the spread of the observations, expressed in the original units
Population: $\sigma = \sqrt{\sigma^2}$
Sample: $s = \sqrt{s^2}$

Variance and standard deviation
Using the definition:
Using the working formulas:

Range rule of thumb
A rough estimate of the standard deviation is a quarter of the range:
$$s \approx \frac{\text{range}}{4}$$
Example using tree data: s ≈ ....... / ....... = ...........

4. Coefficient of variation
Ratio between the standard deviation and the mean, expressed as a percentage
Sample: $$CV = \frac{s}{\bar{x}} \times 100$$
Population: $$CV = \frac{\sigma}{\mu} \times 100$$
Example using tree data: CV = ....... / ....... × 100 = ........... [%]

Measures of central tendency: grouped data
Mean or arithmetic average
Definition: sum of the values divided by the total number of observations
Sample data: $$\bar{x} = \frac{\sum_{j=1}^{c} f_j x_j}{\sum_{j=1}^{c} f_j}$$
Population data: $$\mu = \frac{\sum_{j=1}^{c} f_j \mu_j}{\sum_{j=1}^{c} f_j}$$
where
$x_j$, $\mu_j$: value of the $j$-th class midpoint
$c$: number of classes
$f_j$: frequency of the $j$-th class

Measures of dispersion: grouped data
Variance and standard deviation for a frequency distribution
Sample data: $$s^2 = \frac{\sum_{j=1}^{c} f_j (x_j - \bar{x})^2}{\sum_{j=1}^{c} f_j - 1}$$
Population data: $$\sigma^2 = \frac{\sum_{j=1}^{c} f_j (\mu_j - \mu)^2}{\sum_{j=1}^{c} f_j}$$
where
$x_j$, $\mu_j$: value of the $j$-th class midpoint
$c$: number of classes
$f_j$: frequency of the $j$-th class

Example: daily commuting times, in minutes
Calculate the mean, variance, standard deviation, and CV

Daily commuting time    Number of employees
Less than 10 min        4
10 – 20 min             9
20 – 30 min             6
30 – 40 min             4
40 – 50 min             2

• Remember: within a class, all individuals are assumed to have the mid-value of that class
• Mid-value of the class = class mean

Commuting time    # employees (f_j)    Class mean (μ_j)    f_j × μ_j
< 10 min          4                    5                   20
10 – 20 min       9
20 – 30 min       6
30 – 40 min       4
40 – 50 min       2
Total

• Mean commuting time:
• Variance:
$$\sigma^2 = \frac{\sum_{j=1}^{c} f_j (\mu_j - \mu)^2}{\sum_{j=1}^{c} f_j} = \frac{4(5 - \ldots)^2 + 9(15 - \ldots)^2 + \ldots}{4 + 9 + 6 + 4 + 2}$$
• Standard deviation: σ = ……
• Coefficient of variation: CV = σ/μ × 100 = …… / …… × 100 =

Use of standard deviation
• Connects the mean with the standard deviation
• Chebyshev's Theorem: for any $k > 1$, at least $1 - 1/k^2$ of the data lie within $k$ standard deviations of the mean
• Example: if $k = 2$, then $1 - 1/k^2 = 1 - 1/4 = 0.75$, or 75%. This means that at least 75% of the data values lie within two standard deviations of the mean

Measures of distribution shape
• Skewness: a measure of the asymmetry of the frequency distribution
$$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3 / (n - 1)}{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)\right]^{3/2}}$$
• Kurtosis: a measure of the "peakedness" of the frequency distribution
$$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4 / (n - 1)}{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)\right]^{2}} - 3$$

Measures of position
• Locate the relative position of an observation within the data set
PERCENTILES: divide the data set into 100 groups with equal numbers of observations
• indicate the position of an individual in a group
  – Education
  – Health-related industry
  – Life sciences
$$\text{percentile} = \frac{(\#\ \text{observations less than}\ x) + 0.5}{\text{total}\ \#\ \text{observations}} \times 100\ [\%]$$

Percentile charts
[Figure: percentile charts]

Standard scores
• Compare the relative position of observations within their defining data set
• Standard score or z-score:
$$z = \frac{\text{observation's value} - \text{mean}}{\text{standard deviation}} = \frac{x - \bar{x}}{s}$$
• Allows comparison of different data sets or different types of data

Standard score
Example: a student received 92% in Statistics and 75% in English. Was the student's overall performance bad?
Additional info:
• The mean grade was 85 for Statistics and 70 for English
• The variance was 36 for Statistics and 9 for English
Compute the z-scores:
z_Statistics = (x − x̄) / s = ................ = ............
z_English = (x − x̄) / s = ................ = ............
Conclusion:

Population parameters vs. sample statistics
• Various numerical measures can be computed for a population as well as for a sample
  – Mean, median, variance, coefficient of variation
• When the measure is computed for the entire population, it is called a population parameter, or simply a PARAMETER
• When the measure is computed for a portion of the population (namely, a sample), it is called a sample statistic, or simply a STATISTIC
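
The parameter/statistic distinction can be illustrated with the 20 tree heights listed earlier in these notes. The following is a minimal sketch, assuming Python's standard `statistics` module; the variable names are illustrative and not part of the course material. The same 20 values are treated first as an entire population (divisor N) and then as a sample (divisor n − 1).

```python
import statistics

# Heights of the 20 trees listed earlier in these notes
heights = [50, 45, 32, 48, 56, 38, 42, 48, 55, 36,
           41, 51, 30, 59, 53, 47, 57, 51, 46, 44]

# Measures of central tendency
mean = statistics.mean(heights)
median = statistics.median(heights)
modes = statistics.multimode(heights)  # a list, since the mode is not always unique
midrange = (min(heights) + max(heights)) / 2

# Range and the range rule of thumb (s is roughly range / 4)
data_range = max(heights) - min(heights)
rule_of_thumb_sd = data_range / 4

# Treating the 20 trees as the entire population: divide by N (parameters)
pop_var = statistics.pvariance(heights)
pop_sd = statistics.pstdev(heights)

# Treating the 20 trees as a sample: divide by n - 1 (statistics)
sample_var = statistics.variance(heights)
sample_sd = statistics.stdev(heights)

# Coefficient of variation, in percent, for the sample
cv_sample = sample_sd / mean * 100

print("mean:", mean, "median:", median, "modes:", modes, "midrange:", midrange)
print("range:", data_range, "rule-of-thumb s:", rule_of_thumb_sd)
print("population variance:", pop_var, "population sd:", pop_sd)
print("sample variance:", sample_var, "sample sd:", sample_sd)
print("sample CV [%]:", cv_sample)
```

For the same data the sample variance is always larger than the population variance, because the sum of squared deviations is divided by n − 1 instead of N.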