Measures of Central Tendency & Variability

Di Fara (N = 15): 1, 2, 4, 3, 4, 4, 3, 4, 1, 3, 3, 2, 5, 3, 2    $\Sigma x = 44$, mean $= 44/15 = 2.93$
Gino's (N = 15): 2, 3, 2, 3, 2, 2, 4, 4, 4, 3, 2, 4, 3, 2, 4    $\Sigma x = 44$, mean $= 44/15 = 2.93$

So far, we have used very basic characterizations of distributions:
• Number of modes (unimodal, bimodal, multimodal)
• Skew (positive or negative) and symmetry

We need a way to characterize these same distributions quantitatively (using numbers), which allows us to compare distributions. We can describe distributions using two categories of measures:
• Measures of Central Tendency: mean, median, mode
• Measures of Variability: range, standard deviation, variance

Measures of Central Tendency (where all the action is)

Mean – The average of all the scores: the sum of all the scores divided by the number of scores.
Example: x = {1, 3, 4, 8}
$\frac{\Sigma x}{N} = \frac{1 + 3 + 4 + 8}{4} = \frac{16}{4} = 4$

The mean is denoted differently depending on the type of data from which it comes:
Population mean = $\mu$ (pronounced "mew")
Sample mean = $\bar{x}$ (spoken as "x-bar")

Median – The "middle score": the score that falls in the exact middle of the distribution. Half the scores are lower and half are higher than the median.
Example: x = {5, 6, 2, 3, 1, 9, 8, 0, 2, 4, 5}
First, arrange the numbers in ascending order: x = {0, 1, 2, 2, 3, 4, 5, 5, 6, 8, 9}
Then find the number that falls in the middle. With N = 11 scores, the middle (6th) score is 4, so the median is 4.
For an even number of scores, average the two middle numbers.
Example: x = {0, 1, 1, 2, 2, 3, 4, 5, 5, 6, 8, 9}. With N = 12 scores, the two middle scores are 3 and 4, so the median is (3 + 4)/2 = 3.5.

Mode – The score that occurs most frequently, i.e., the score with the highest FREQUENCY.
Example: 1, 3, 1, 5, 2, 1, 1, 8, 2, 3, 1, 1, 1, 0, 1, 3, 2, 1, 1, 1
The score 1 occurs 11 times out of 20, more often than any other score, so the mode is 1.

Relations between the measures of central tendency describe the shape of the score distribution (skewness). When the mean, median, and mode agree, the distribution is symmetrical.
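The three definitions above can be checked on the slide examples with a short Python sketch. The helper names `mean`, `median`, and `modes` are hypothetical and written by hand to mirror the slide definitions; Python's standard `statistics` module offers `mean`, `median`, and `multimode` that should agree with them.

```python
from collections import Counter


def mean(scores):
    """Sum of all the scores divided by the number of scores."""
    return sum(scores) / len(scores)


def median(scores):
    """Middle score after sorting; average the two middle scores when N is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(scores):
    """Every score that shares the highest frequency (more than one means multimodal)."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(score for score, freq in counts.items() if freq == top)


print(mean([1, 3, 4, 8]))                               # 4.0
print(median([5, 6, 2, 3, 1, 9, 8, 0, 2, 4, 5]))        # 4
print(median([0, 1, 1, 2, 2, 3, 4, 5, 5, 6, 8, 9]))     # 3.5
print(modes([1, 3, 1, 5, 2, 1, 1, 8, 2, 3, 1, 1, 1, 0, 1, 3, 2, 1, 1, 1]))  # [1]

# The two pizza score lists from the first slide have the same mean.
di_fara = [1, 2, 4, 3, 4, 4, 3, 4, 1, 3, 3, 2, 5, 3, 2]
ginos = [2, 3, 2, 3, 2, 2, 4, 4, 4, 3, 2, 4, 3, 2, 4]
print(round(mean(di_fara), 2), round(mean(ginos), 2))  # 2.93 2.93
```

Returning a list from `modes` is a design choice so that bimodal and multimodal data are reported honestly instead of silently picking one mode.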
Positive skew: Mean > Median
Negative skew: Mean < Median

Review of Summation
x = {1, 0, 3}    y = {2, 5, 1}

x   y
1   2
0   5
3   1

$\Sigma x = 1 + 0 + 3 = 4$
$\Sigma x^2 = 1 + 0 + 9 = 10$
$(\Sigma x)^2 = (1 + 0 + 3)^2 = 4^2 = 16$
$\Sigma 3x = 3(1) + 3(0) + 3(3) = 3 + 0 + 9 = 12$
$\Sigma xy = 1(2) + 0(5) + 3(1) = 2 + 0 + 3 = 5$
$(\Sigma x)(\Sigma y) = (1 + 0 + 3)(2 + 5 + 1) = (4)(8) = 32$

Measures of Variability (how clustered or spread out the distribution is)
[Figure: The Normal Distribution; relative frequency plotted against X]
Range – The maximum difference in the data (maximum score minus minimum score).
Standard Deviation – The average amount that the scores deviate from the mean (more precisely, the square root of the average squared deviation).
Variance – The square of the standard deviation; it has useful mathematical properties of its own.

The Range: Cannoli Eating Contest
Number of cannoli eaten by each of the 23 contestants (contestants 1 to 23, in order):
4, 5, 6, 6, 7, 8, 8, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12, 14, 14, 14, 16, 16, 21
[Figure: Cannoli Eating Contest histogram; x-axis: # cannoli eaten, y-axis: # contestants]
Minimum = 4, Maximum = 21
Range = Maximum - Minimum = 21 - 4 = 17

Standard Deviation: example (Cannoli Eating Contest)
How much does each score in the sample differ from the average score? The amount by which each score differs from the mean is called its deviation.

Standard Deviation (population): Raw vs. Deviation Scores
How do you suppose we would go about finding the AVERAGE amount by which each score DEVIATES from the mean?
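The summation rules and the range reviewed above can be reproduced in a few lines of Python. The variable names are illustrative; the point is the order of operations, e.g. $\Sigma x^2$ squares each score before adding, while $(\Sigma x)^2$ adds first and then squares.

```python
x = [1, 0, 3]
y = [2, 5, 1]

sum_x = sum(x)                              # Σx: add the scores
sum_x_sq = sum(v ** 2 for v in x)           # Σx²: square each score, then add
sq_sum_x = sum(x) ** 2                      # (Σx)²: add first, then square
sum_3x = sum(3 * v for v in x)              # Σ3x: equals 3·Σx
sum_xy = sum(a * b for a, b in zip(x, y))   # Σxy: multiply pairs, then add
prod_sums = sum(x) * sum(y)                 # (Σx)(Σy): add each list, then multiply

cannoli = [4, 5, 6, 6, 7, 8, 8, 9, 10, 10, 10, 10, 11, 11, 11,
           12, 12, 14, 14, 14, 16, 16, 21]
value_range = max(cannoli) - min(cannoli)   # maximum minus minimum

print(sum_x, sum_x_sq, sq_sum_x, sum_3x, sum_xy, prod_sums)  # 4 10 16 12 5 32
print(value_range)                                           # 17
```

Note that $\Sigma xy = 5$ while $(\Sigma x)(\Sigma y) = 32$: summing products is not the same as multiplying sums.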
Example: x = {1, 2, 3, 2}, $\mu = 2$

x   μ   x - μ   (x - μ)²
1   2   -1      1
2   2    0      0
3   2    1      1
2   2    0      0

$\Sigma(x - \mu)^2 = 2$  (the Sum of Squares, SS)

"Deviation method":
$\sigma = \sqrt{\frac{\Sigma(x - \mu)^2}{N}} = \sqrt{\frac{SS}{N}} = \sqrt{\frac{2}{4}} = .7071 \approx .71$

Standard Deviation (sample)
Example: x = {1, 2, 3, 2}, $\bar{x} = 2$

x   x̄   x - x̄   (x - x̄)²
1   2   -1      1
2   2    0      0
3   2    1      1
2   2    0      0

$\Sigma(x - \bar{x})^2 = 2$  (the Sum of Squares, SS)

"Deviation method":
$s = \sqrt{\frac{\Sigma(x - \bar{x})^2}{N - 1}} = \sqrt{\frac{SS}{N - 1}} = \sqrt{\frac{2}{3}} = .8165 \approx .82$

The "raw scores method" is an easier way to calculate the Sum of Squares (SS).
Remember: $\sigma = \sqrt{\frac{SS}{N}}$ (population) and $s = \sqrt{\frac{SS}{N - 1}}$ (sample).
"Raw scores method": $SS = \Sigma x^2 - \frac{(\Sigma x)^2}{N}$

Finding the standard deviation using the "raw scores method" for the Sum of Squares (SS):

x   x²
1   1
2   4
3   9
2   4

$\Sigma x = 8$, $\Sigma x^2 = 18$
$SS = 18 - \frac{(8)^2}{4} = 18 - 16 = 2$
POPULATION: $\sigma = \sqrt{\frac{2}{4}} = .71$
SAMPLE: $s = \sqrt{\frac{2}{3}} = .82$

Summary Slide for Standard Deviation
POPULATION: $\sigma = \sqrt{\frac{SS}{N}} = \sqrt{\frac{\Sigma(x - \mu)^2}{N}} = \sqrt{\frac{\Sigma x^2 - (\Sigma x)^2/N}{N}}$
SAMPLE: $s = \sqrt{\frac{SS}{N - 1}} = \sqrt{\frac{\Sigma(x - \bar{x})^2}{N - 1}} = \sqrt{\frac{\Sigma x^2 - (\Sigma x)^2/N}{N - 1}}$

A family of statistics to describe populations and samples

                    Population                  Sample
Central Tendency
  Mean              $\mu = \Sigma x / N$        $\bar{x} = \Sigma x / N$
  Median            median                      same
  Mode              mode                        same
Variability
  Range             range                       same
  Std. Dev.         $\sigma = \sqrt{SS/N}$      $s = \sqrt{SS/(N - 1)}$
  Variance          $\sigma^2$                  $s^2$

Revisiting Pizza…
Di Fara: s = 1.16    Gino's: s = .88
Both pizzerias have the same mean (2.93), but Di Fara's scores are more spread out (these are sample standard deviations).

The Normal Distribution and Z-scores
Are all unimodal, symmetrical distributions normal? NO. Such distributions can also differ in kurtosis (roughly, how peaked or heavy-tailed the distribution is).

What did you get on your SATs?
• Prior to 2005, the highest possible score was 1600.
• In 2005, an additional section was added to the SAT, making the highest possible score 2400.
If my score (I took the SATs in 2002) was 1400, and my friend's score (2006) was 1800, did my friend do better than I did or not?
We need a way to compare scores from different distributions; we cannot compare the raw scores directly.
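The standard deviation calculations above (deviation method, raw scores method, population vs. sample, and the pizza comparison) can be double-checked with the Python sketch below. The helper names are hypothetical; the results are cross-checked against `statistics.pstdev` (divides by N) and `statistics.stdev` (divides by N - 1) from Python's standard library.

```python
import math
import statistics


def ss_deviation(scores):
    """Sum of squares via the deviation method: Σ(x - mean)²."""
    m = sum(scores) / len(scores)
    return sum((v - m) ** 2 for v in scores)


def ss_raw(scores):
    """Sum of squares via the raw scores method: Σx² - (Σx)²/N."""
    n = len(scores)
    return sum(v * v for v in scores) - sum(scores) ** 2 / n


def sd_population(scores):
    """σ = √(SS / N)"""
    return math.sqrt(ss_raw(scores) / len(scores))


def sd_sample(scores):
    """s = √(SS / (N - 1))"""
    return math.sqrt(ss_raw(scores) / (len(scores) - 1))


x = [1, 2, 3, 2]
print(ss_deviation(x), ss_raw(x))                           # 2.0 2.0
print(round(sd_population(x), 2), round(sd_sample(x), 2))   # 0.71 0.82

di_fara = [1, 2, 4, 3, 4, 4, 3, 4, 1, 3, 3, 2, 5, 3, 2]
ginos = [2, 3, 2, 3, 2, 2, 4, 4, 4, 3, 2, 4, 3, 2, 4]
print(round(sd_sample(di_fara), 2), round(sd_sample(ginos), 2))  # 1.16 0.88

# Cross-check the hand-written helpers against the standard library.
print(math.isclose(sd_sample(ginos), statistics.stdev(ginos)))        # True
print(math.isclose(sd_population(ginos), statistics.pstdev(ginos)))   # True
```

Both methods give the same SS; the raw scores method just avoids computing every deviation first, which is why it is easier by hand.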
If we know that the variable on which our score was measured is NORMALLY distributed:
• we can specify HOW MANY standard deviations our score is above or below the mean.
• For example: We read on Princeton Review's website that SAT scores are normally distributed. Using the old scale of measurement, the population mean SAT score was 1000, with a standard deviation of 150 points: $\mu = 1000$, $\sigma = 150$.

How many standard deviations away from the mean is a score of 1300?
$\frac{1300 - 1000}{150} = \frac{300}{150} = 2$, so 1300 is 2 standard deviations above the mean.
[Figure: normal distribution of SAT scores; x-axis marked at 600, 800, 1000, 1200, 1400]

What about a score of 1325? How many standard deviations is it from the mean?
$\frac{1325 - 1000}{150} = \frac{325}{150} = 2.166\ldots \approx 2.17$

The Z-score
$z = \frac{x - \mu}{\sigma}$ for a population
$z = \frac{x - \bar{x}}{s}$ for a sample
The z-score measures how extreme or unusual a score is within its distribution, in units of standard deviation (that is, it tells us exactly HOW MANY standard deviations a score is from the mean).

The Z-score: Example
MY SAT score (1400), population of SAT scores (old grading system): $\mu = 1000$ pts, $\sigma = 150$ pts
$z = \frac{1400 - 1000}{150} = \frac{400}{150} = 2.6666\ldots \approx 2.67$ standard deviations above the mean
MY friend's SAT score (1800), population of SAT scores (new grading system): $\mu = 1500$ pts, $\sigma = 200$ pts
$z = \frac{1800 - 1500}{200} = \frac{300}{200} = 1.50$ standard deviations above the mean
So although my friend's raw score is higher, my score lies further above the mean of its own distribution.
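The z-score formula above is a one-liner in code. This Python sketch uses a hypothetical helper `z_score` and plugs in the means and standard deviations stated on the slides (old scale: 1000 and 150; new scale: 1500 and 200).

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


print(z_score(1300, 1000, 150))            # 2.0
print(round(z_score(1325, 1000, 150), 2))  # 2.17

mine = z_score(1400, 1000, 150)     # old SAT scale
friend = z_score(1800, 1500, 200)   # new SAT scale
print(round(mine, 2), round(friend, 2))    # 2.67 1.5
print(mine > friend)                       # True
```

Because both z-scores are in standard-deviation units, they can be compared directly even though the raw scores come from different scales.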