Statistics 1

How long is a name?
• To answer this question, we might collect some data on the lengths of names.
• First we need to establish our target population. Let’s say the students in this mathematics class.
• What names should we use? The names as listed on the class roll.

Averaging data
• An average is also called a measure of central tendency.
• There are three measures we can use: the MEAN, the MEDIAN and the MODE.

Mean
• Usually when we say ‘average’, we are referring to the mean.
• To find the mean, add up all the numbers and divide by how many there are.

Example
• Find the mean of 4, 0, 2, 1, 6.
$$\text{Mean} = \frac{4 + 0 + 2 + 1 + 6}{5} = \frac{13}{5} = 2.6$$
• In Excel we can use the formula =AVERAGE(highlighted cells), for example =AVERAGE(A1:A5).

Data on names

Median
• The median is the middle value when the data is put in order.
• If there is an odd number of values, there is a single middle value.
• If there is an even number of values, we average the two middle values.

Example
• Find the median of 4, 8, 2, 9, 1.
• First put them in order: 1, 2, 4, 8, 9.
• The middle number is 4, so the median is 4.

Example
• Find the median of 4, 8, 2, 9, 1, 6.
• First put them in order: 1, 2, 4, 6, 8, 9.
• The middle numbers are 4 and 6.
• Averaging them gives a median of (4 + 6) ÷ 2 = 5.
• In Excel, sort the data or use the formula =MEDIAN(data).

Mode
• The mode is the most common value.
• A data set can have more than one mode, or no mode at all if no value repeats.

Example
• Find the mode of 6, 4, 3, 7, 8, 6, 7, 2.
• 6 and 7 each appear twice, so there are two modes: 6 and 7.

Using Excel
• Use the formula =MODE(data).
• Be careful: Excel’s MODE gives only one mode, even when there are several.

Which average is the best?
• Generally we use the mean because it uses all the data. However, if there are extreme values, the median is a better measure because a few extreme values cannot pull it far.

Example
• These are the incomes of a group of university students.
• $2400, $1500, $2000, $1800, $22 000
• Find the best ‘average’.

Example
• $2400, $1500, $2000, $1800, $22 000
$$\text{Mean} = \frac{2400 + 1500 + 2000 + 1800 + 22\,000}{5} = \frac{29\,700}{5} = \$5940$$
• Median: in order the incomes are $1500, $1800, $2000, $2400, $22 000, so the median is $2000.
• There is no mode, because no income repeats.
• The mean is not representative, because four of the five students earn less than $2500. The median is representative.

Frequency tables

Length   Tally              Frequency
3        ll                 2
4        lllll              5
5        lllll lllll llll   14
6        lllll ll           7
7        lllll              5
8        ll                 2
Total                       35

• The mode is 5, because it has the highest frequency (14).
• The median is also 5. With 35 names, the median is the 18th value. The cumulative frequencies 2, 7, 21 show that the 8th to 21st values are all 5.
• The mean is 5.4.

Calculating the mean by hand
$$\text{Mean} = \frac{3 \times 2 + 4 \times 5 + 5 \times 14 + 6 \times 7 + 7 \times 5 + 8 \times 2}{2 + 5 + 14 + 7 + 5 + 2} = \frac{189}{35} = 5.4$$

Using the calculator
• Go to STAT mode.
• Enter the lengths in List 1.
• Enter the frequencies in List 2.
• Press CALC, then SET.
• Set 1Var XList to List1 and 1Var Freq to List2.
• Press EXE, then 1VAR.

Measures of spread
• It is not enough to give only the ‘average’.
• The mean, median and mode are all 50 for each of these three data sets:
• 48 49 50 50 51 52
• 40 45 50 50 50 55 60
• 0 0 50 50 50 100 100
• But the data sets are quite different, because their values are spread out by very different amounts.

Measures of spread
• Range = (highest value) - (lowest value)
• For our data set, the first names have a range of 8 - 3 = 5.
• Again, the range uses only the two most extreme values. If there are extreme values, the range can distort the true spread of the data.

5-number summary
• We often summarise ordered data with a 5-number summary: the lowest value, lower quartile, median, upper quartile and highest value.
• The median and the two quartiles split the ordered data into four groups of roughly equal size.

Example 1
• 1 14 29 35 43 48 49 78 82 82 92 95 95 (13 numbers)
• Lowest is 1.
• Median is 49 (the 7th number).
• Highest is 95.
• Lower quartile is 35, which is the median of the lower half 1 14 29 35 43 48 49.
• Upper quartile is 82, which is the median of the upper half 49 78 82 82 92 95 95.
• Here the median 49 is counted in both halves. Textbooks, calculators and spreadsheets use slightly different rules for quartiles, so their answers can differ slightly.

Example 2
• 9 11 17 22 23 28 30 36 (8 numbers)
• Median = (22 + 23) ÷ 2 = 22.5
• Lower quartile = median of 9 11 17 22 = (11 + 17) ÷ 2 = 14
• Upper quartile = median of 23 28 30 36 = (28 + 30) ÷ 2 = 29
• The 5-number summary is 9 14 22.5 29 36.

For first names in our class
• The 5-number summary is 3 5 5 6 8.
• With 35 names, the cumulative frequencies are 2, 7, 21, 28, 33, 35. Both the lower quartile and the median therefore fall among the 5-letter names, and the upper quartile falls among the 6-letter names.
• Lower quartile is 5.
• Upper quartile is 6.
• The interquartile range is the difference between the quartiles: 6 - 5 = 1.

Statistics so far
• Central tendency: mean = 5.4, median = 5, mode = 5.
• Because the mean and median are about the same, we would not expect extreme values.
• Measures of spread: range = 5, interquartile range = 1.
• 5-number summary: 3 5 5 6 8.
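The Mean, Median and Mode examples in these slides can be checked with a short Python sketch. The sketch assumes Python 3.8 or later and uses only the standard library (`statistics.mean`, `statistics.median`, `collections.Counter`). The `modes` helper is a hypothetical illustration, not something from the slides. It returns every most common value, or an empty list when no value repeats, which matches the slides' 'no mode' convention.

```python
from collections import Counter
from statistics import mean, median


def modes(data):
    """Hypothetical helper: every most common value (sorted), or [] if no value repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:          # every value appears once: no mode
        return []
    return sorted(value for value, count in counts.items() if count == top)


print(mean([4, 0, 2, 1, 6]))               # 2.6
print(median([4, 8, 2, 9, 1]))             # 4 (odd count: the middle value)
print(median([4, 8, 2, 9, 1, 6]))          # 5.0 (even count: average of 4 and 6)
print(modes([6, 4, 3, 7, 8, 6, 7, 2]))     # [6, 7]

# Student incomes: one extreme value drags the mean far above the median
incomes = [2400, 1500, 2000, 1800, 22000]
print(mean(incomes), median(incomes), modes(incomes))   # 5940 2000 []
```

The `modes` helper reports both 6 and 7. This is the two-mode case that the slides warn Excel's =MODE does not handle.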
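The frequency-table calculations for name lengths can also be sketched in Python. The sketch works with the length and frequency columns directly, in the same spirit as entering List 1 and List 2 on the calculator, rather than listing all 35 names. The variable names and the `value_at` helper are illustrative assumptions. `value_at` walks the cumulative frequencies to find the value at a given position in the ordered data.

```python
# Frequency table of name lengths from the slides
lengths = [3, 4, 5, 6, 7, 8]     # like "List 1": the values
freqs = [2, 5, 14, 7, 5, 2]      # like "List 2": how often each value occurs

n = sum(freqs)                                         # number of names
total = sum(x * f for x, f in zip(lengths, freqs))     # sum of length x frequency
mean = total / n

# Mode: the value with the highest frequency (picks the first one if there is a tie)
mode = lengths[freqs.index(max(freqs))]


def value_at(position):
    """Hypothetical helper: value at a 1-based position in the ordered data."""
    running = 0
    for x, f in zip(lengths, freqs):
        running += f                 # cumulative frequency so far
        if position <= running:
            return x
    raise IndexError(position)


# Median: the middle position for an odd count, else the average of the two middles
if n % 2 == 1:
    median = value_at((n + 1) // 2)
else:
    median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2

print(n, total, mean, mode, median)   # 35 189 5.4 5 5
```

Expanding the table into 35 separate values and averaging them would give the same 5.4. The frequency form simply avoids the repetition.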
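The 5-number summary, range and interquartile range can be sketched in Python under one assumption: the quartile rule follows the worked examples. Under that rule, each quartile is the median of one half of the ordered data, and for an odd count the middle value goes into both halves. Other rules, such as Excel's QUARTILE functions, can give slightly different quartiles. `five_number_summary` and `spread` are hypothetical helper names.

```python
from statistics import median


def five_number_summary(data):
    """Lowest, lower quartile, median, upper quartile, highest.

    Each quartile is the median of one half of the ordered data. When the count
    is odd, the middle value is included in both halves (as in Example 1).
    """
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2          # size of each half; includes the middle value if n is odd
    lower, upper = xs[:half], xs[n - half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def spread(data):
    """Range and interquartile range (IQR)."""
    low, q1, _, q3, high = five_number_summary(data)
    return high - low, q3 - q1


example_1 = [1, 14, 29, 35, 43, 48, 49, 78, 82, 82, 92, 95, 95]
example_2 = [9, 11, 17, 22, 23, 28, 30, 36]

# Class data rebuilt from the frequency table (length 3 twice, length 4 five times, ...)
lengths = [3, 4, 5, 6, 7, 8]
freqs = [2, 5, 14, 7, 5, 2]
names = [x for x, f in zip(lengths, freqs) for _ in range(f)]

print(five_number_summary(example_1))              # (1, 35, 49, 82, 95)
print(five_number_summary(example_2))              # (9, 14.0, 22.5, 29.0, 36)
print(five_number_summary(names), spread(names))   # (3, 5.0, 5, 6.0, 8) (5, 1.0)

# The three data sets with the same mean, median and mode have very different ranges
spread_sets = ([48, 49, 50, 50, 51, 52], [40, 45, 50, 50, 50, 55, 60], [0, 0, 50, 50, 50, 100, 100])
for data in spread_sets:
    print(spread(data)[0])   # 4, then 20, then 100
```

Swapping in `statistics.quantiles(data, n=4)` would apply a different, interpolating rule, so its quartiles need not match the ones in these slides.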