S1: Chapter 2/3 Data: Measures of Location and Dispersion Dr J Frost (jfrost@tiffin.kingston.sch.uk) www.drfrostmaths.com Last modified: 9th September 2015 For Teacher Use Lesson 1: Concept of variables in statistics. Mean of ungrouped and grouped data. Using a calculator to input data and calculate Σπ₯, Σππ₯, π, π₯. Combined means. Lesson 2: Median from discrete data and continuous data (using linear interpolation) Lesson 3: Quartiles/percentiles using linear interpolation. Lesson 4: Variance/Standard Deviation Lesson 5: Coding. Consolidation. Variables in algebra vs stats π₯ Differences Similarities ο± Just like in algebra, variables in stats represent the value of some quantity, e.g. shoe size, height, colour. ο± Variables can be discrete or continuous. Discrete: Limited to specific values, e.g. shoe size, colour, day born. May be quantitative (i.e. numerical) or qualitative (e.g. favourite colour). Continuous: Can be any real number in some range (time, weight, …) Discrete ? Continuous ? ο± Can be part of further calculations, e.g. if π₯ represents height, then 2π₯ represents twice people’s height. In stats this is known as ‘coding’, which we’ll cover later. ο± Unlike algebra, a variable in stats represents the value of multiple objects. e.g. The heights of all people in a room. ο± Because of this, we can do operations on it as if it was a collection of values: ο± If π₯ represents people’s heights, Σπ₯ gives the sum of everyone’s heights. In algebra this would be meaningless. If π₯ = 4, then Σπ₯ makes no sense! ο± π₯ is the mean of π₯. Notice π₯ is a collection of values whereas π₯ is a single value. Mean of ungrouped data Length of cat π 2.2 2.5 2.6 2.65 2.9 You all know how to find the mean of a list of values. But lets consider the notation, and see how theoretically we could calculate each of the individual components on a calculator. Σπ₯? π₯= π? οWhip out yer Casios… The overbar in stats specifically means ‘the sample mean of’, but don’t worry about this for now. Inputting Data Length of cat π 2.2 2.5 2.6 2.65 2.9 1. Press the ‘MODE’ button and select STAT. 2. The “1-VAR” means “one variable”, so select this. (We’ll also be using A + BX later in the year, which as you might guess, is to do with fitting straight lines of best fit) 3. Enter each value above in the table that appears, and press ‘=‘ after each one. 4. Now press AC to stop entering data so we can start inputting a calculation… Inputting Data Length of cat π 2.2 2.5 2.6 2.65 2.9 In STATS mode, we can still do many calculations as normal (e.g. try 1 + 1 = ) but we can also insert various statistical calculations involving π₯. Use SHIFT -> [1] to access these. Try inputting these expressions: π =π ? Σπ₯ /π = π. ππ ? 3π₯ + 1 = π. ππ ? Σπ₯ = ππ.?ππ π₯ = π. ππ ? Grouped Data Height π of bear (in metres) Estimate of Mean: Frequency 0 ≤ β < 0.5 4 0.5 ≤ β < 1.2 20 1.2 ≤ β < 1.5 5 1.5 ≤ β < 2.5 11 π₯= ππ₯ ?π = 46.75 ? 40 = 1.17π ? What does the variable π₯ represent? The midpoints of each interval. They‘re effectively a sensible single value used to represent each interval. Why is our mean just an estimate? Because we don’t know the exact heights within each group. Grouping data loses information. ? ? Inputting Frequency Tables Height π of bear (in metres) Frequency 0 ≤ β < 0.5 4 0.5 ≤ β < 1.2 20 1.2 ≤ β < 1.5 5 1.5 ≤ β < 2.5 11 We now need to input frequencies. Go to SETUP (SHIFT -> MODE), press down, then select STAT. Select ‘ON’ for frequency. Your calculator will preserve this setting even when switched off. Now go into STATS mode and input your table, using your midpoints as the π₯ values. How on the calculator do we get: Σππ₯ ? Σπ ? This is Σπ₯. This is because your calculator thinks π₯ is the original variable rather than just ? a list of interval midpoints. Just imagine that your calculator is converting your table into a list of values with duplicates. ? = Σπ) This is π. (Since π Bro Exam Tip: In the exam you get a method mark for the division and an accuracy mark for the final answer. Write: 46.75 π₯= = 1.16875 40 You’re not required to show working like “0.25 × 4 + β―” π₯= ππ₯ π Mini-Exercise Use your calculator’s STAT mode to determine the mean (or estimate of the mean). Ensure that you show the division in your working. 1 2 IQ of L6Ms2 (π) Frequency (π) Num children (π) Frequency (π) 0 2 80 < π ≤ 90 7 1 6 90 ≤ π < 100 5 2 1 100 ≤ π < 120 3 3 1 120 ≤ π < 200 1 Σπ 11 π= = ?= 1.1 π 10 3 Time π π₯= Σππ₯ 1560 ? = 97.5 = Σπ 16 Frequency (π) 9.5 < π‘ ≤ 10 32 10 ≤ π‘ < 12 27 12 ≤ π‘ < 15 47 15 ≤ π‘ < 16 11 π₯= Σππ₯ Σπ = 1414 117 = ? 12.1 to 3sf. GCSE RECAP :: Combined Mean The mean maths score of 20 pupils in class A is 62. The mean maths score of 30 pupils in class B is 75. a) What is the overall mean of all the pupils’ marks. b) The teacher realises they mismarked one student’s paper; he should have received 100 instead of 95. Explain the effect on the mean and median. ππ × ππ + ππ × ππ ππππ π΄πππ = = = ππ. π ππ ππ ? The revised score will increase the mean but leave the median unaffected. Test Your Understanding Archie the Archer competes in a competition with 50 rounds. He scored an average of 35 points in the first 10 rounds and an average of 25 in the remaining rounds. What was his average score per round? ππ × ππ + ππ × ππ π΄πππ = = ππ ?ππ Median – which item? You need to be able to find the median of both listed data and of grouped data. Grouped data Listed data π Position of median Median 1,4,7,9,10 5 4,9,10,15 4 ? 2 ? /3 4? 5? /6 ?7 9.5 ? ?7 7.5 ? Items 2,4,5,7,8,9,11 7 1,2,3,5,6,9,9,10,11,12 10 3rd nd rd th th th Can you think of a rule to find the position of the median given π? ! To find the position of the median for π listed data, find 2 : - If a decimal, round?up. - If whole, use halfway between this item and the one after. IQ of L6Ms2 (π) Frequency (π) 80 ≤ π < 90 7 90 ≤ π < 100 5 100 ≤ π < 120 3 120 ≤ π < 200 2 Position to use for median: 8.5? ! To find the median of π grouped data, find 2 , then use linear interpolation. π DO NOT round or adjust it in any way. 2 This is just like at GCSE where, if you had a cumulative frequency graph with 60 items, you’d look across the 30th. We’ll cover linear interpolation in a sec… Quickfire Questions… What position do we use for the median? Lengths: 3cm, 5cm, 6cm, … π = 11 Median position: 6th ? Lengths: 4m, 8m, 12.4m, … π = 24 Median position: 12th/13 ? th Age Freq 10 ≤ π < 20 12 20 ≤ π < 30 5 Median position: 8.5 ? Score Freq 150 ≤ π < 200 3 200 ≤ π < 400 7 th Median position: 18? Volume (ml) Median position: 5 ? Ages: 5, 7, 7, 8, 9, 10, … π = 60 Median position: 30th/31 ? st Score Weights: 1.2kg, 3.3kg, … π = 35 Freq 0 ≤ π£ < 100 5 100 ≤ π£ < 200 6 200 ≤ π£ < 300 2 Median position: 6.5? Freq 150 ≤ π < 200 15 200 ≤ π < 400 6 Median position: 10.5 ? Weights: 4.4kg, 7.6kg, 7.7kg… π = 18 Median position: 9th/10 ? th Bro Exam Note: Pretty much every exam question has dealt with median for grouped data, not listed data. Linear Interpolation Height of tree (m) Freq C.F. 0.55 ≤ β < 0.6 55 55 0.6 ≤ β < 0.65 45 100 0.65 ≤ β < 0.7 30 130 0.7 ≤ β < 0.75 15 145 0.75 ≤ β < 0.8 5 150 At GCSE we draw a suitable line on a cumulative frequency graph. How could we read of this value exactly using suitable calculation? We could find the fraction of the way along the line segment using the frequencies, then go this same fraction along the class interval. Linear Interpolation Height of tree (m) Freq C.F. 0.55 ≤ β < 0.6 55 55 0.6 ≤ β < 0.65 45 100 0.65 ≤ β < 0.7 30 130 0.7 ≤ β < 0.75 15 145 0.75 ≤ β < 0.8 5 150 The 75th item is within the 0.6 ≤ β < 0.65 class interval because 75 is within the first 100 items but not the first 55. Frequency up until this interval 55? ? 75 0.6m ? Med Height at start of interval. Item number we’re interested in. ? 100 Frequency at end of this interval 0.65m ? Height at end of interval. Linear Interpolation Height of tree (m) Freq C.F. 0.55 ≤ β < 0.6 55 55 0.6 ≤ β < 0.65 45 100 0.65 ≤ β < 0.7 30 130 0.7 ≤ β < 0.75 15 145 0.75 ≤ β < 0.8 5 150 Frequency up until this interval What fraction of the way across the interval are we? ππ ? ππ Hence: ππ π΄ππ πππ = π. π + × π. ππ ππ ? 55 75 0.6m Med Item number we’re interested in. Height at start of interval. 100 Frequency at end of this interval 0.65m Height at end of interval. Bro Tip: I like to put the units to avoid getting frequencies confused with values. Bro Tip: To quickly get frequency before and after, just look for the two cumulative frequencies that surround the item number. More Examples Weight of cat (kg) Freq C.F. 1.5 ≤ π€ < 3 10 10 3≤π€<4 8 4≤π€<6 14 Time (s) Freq C.F. 8 ≤ π‘ < 10 4 4 18 10 ≤ π‘ < 12 3 7 32 12 ≤ π‘ < 14 13 20 Median class interval: π≤π<π ? 10? 3kg ? ? 16 7? ? 18 4kg ? Fraction along interval: π π? Median: π π+ × π ? = π. ππππ π ? 20 10 ? 12s ? 14s ? Median: ππ + π × ππ π = ππ. π (to 3sf) ? What’s different about the intervals here? Weight of cat to nearest kg Frequency 10 − 12 7 13 − 15 2 16 − 18 9 19 − 20 4 There are GAPS between intervals! What interval does this actually represent? 10 − 12 9.5 −? 12.5 Lower class boundary Class width = 3 Upper class boundary ? Identify the class width Distance π travelled (in m) … Time π taken (in seconds) 0 ≤ d < 150 0−3 150 ≤ d < 200 π−π πππ ≤ π < πππ 7 − 11 … Lower class boundary = 200 ? Lower class boundary = 3.5? Class width = 10? Class width = 3 ? Weight π in kg … Speed π (in mph) 10 − 20 10 ≤ s < 20 21 − 30 20 ≤ π < 29 ππ − ππ ππ ≤ π¬ < ππ Lower class boundary = 30.5 ? Class width = 10? … Lower class boundary = 29? Class width = 2 ? Linear Interpolation with gaps Jan 2007 Q4 10 29 72 97 105 111 116 119 120 ? 29 19.5 ?miles ? 60 72? 29.5?miles ππ ππππππ = ππ. π + ×? ππ = ππ. πππ ππ Test Your Understanding Questions should be on a printed sheet… Age of relic (years) 0-1000 1001-1500 1501-1700 1701-2000 Frequency 24 29 12 35 26 ππππππ = 1000.5 + × 500 29 ? = ππππ. π years Shark length (cm) 40 ≤ π₯ < 100 100 ≤ π₯ < 300 300 ≤ π₯ < 600 600 ≤ π₯ < 1000 Frequency 17 5 8 10 3 ππππππ = 100 + × 200 5? = 220ππ Exercise 2 1 Questions should be on a printed sheet… ππ π΄ππ πππ = ππ. π + × ?π = ππ. π ππ 2 3 π΄ππ πππ = ππ + ππ × ππ = ππ. π ππ ? π΄ππ πππ = ππ + ππ × ππ = ππ. πππ ππ ? Exercise 2 4 ππππ. π π΄πππ = =?ππ. ππππ πππ ππ π΄ππ πππ = ππ. π + × π = ππ. π ππ ? Quartiles – which item? You need to be able to find the quartiles of both listed data and of grouped data. The rule is exactly the same as for the median. Grouped data Listed data Items 1,4,7,9,10 4,9,10,15 2,4,5,7,8,9,11 IQ of L6Ms2 (π) π Position of LQ & UQ 5 ? 1 /2 ? & 3 /4 2 ? &6 3 and ?8 4 7 1,2,3,5,6,9,9,10,11,12 10 2nd & 4th st nd rd nd th rd th th LQ & UQ ?7 6.5 & ?12.5 4? &9 3& ?10 Can you think of a rule to find the position of the LQ/UQ given π? ! To find the position of the LQ/UQ for 1 3 listed data, find 4 π or 4 π then as before: ? - If a decimal, round up. - If whole, use halfway between this item and the one after. Frequency (π) 80 ≤ π < 90 7 90 ≤ π < 100 5 100 ≤ π < 120 3 120 ≤ π < 200 2 Position to use for LQ: 4.25 ? ! To find the median of 1 3 grouped data, find 4 π and 4 π, then use linear interpolation. Again, DO NOT round this value. Percentiles The LQ, median and UQ give you 25%, 50% and 75% along the data respectively. But we can have any percentage you like. π = 43 Item to use for 57th percentile? 24.51 ? You will always find these for grouped data in an exam, so never round this position. Notation: Lower Quartile: π1? Upper Quartile: π?3 Median: π?2 ? 57th Percentile: π57 Test Your Understanding These are the same as the ‘Test Your Understanding’ questions on your sheet from before. Age of relic (years) 0-1000 1001-1500 1501-1700 1701-2000 Item for π1 : 25th? Frequency 24 29 12 35 1 π1 = 1000.5 + × 500 29 ? = 1017.74 years 10 π3 = 1700.5 + × 300 35 ? = 1786.21 years πΌππ = 768.47 ? Shark length (cm) 40 ≤ π₯ < 100 100 ≤ π₯ < 300 300 ≤ π₯ < 600 600 ≤ π₯ < 1000 Frequency 17 5 8 10 Item to use for P37 : 14.8 ? ππ. π π37 = ππ + ×?ππ = ππ. ππ ππ π. π π78 = πππ + ×?πππ = πππ ππ Exercise 3 ? ? Exercise 3 ππ π1 = ππ + × ππ = ππ. π ππ ππ π3 = ππ + × ππ = ππ. π ππ πΌππ = ππ. π ? π × ππ = ππ. π ππ ππ π3 = ππ. π + × ππ = ππ. π ππ πΌππ = π. π π1 = ππ. π + ππ × ππ = ππ. π ππ ππ π3 = ππ + × ππ = πππ. π ππ π π90 = πππ + × πππ = πππ. π ππ π1 = ππ + ? ? What is variance? Distribution of IQs in L6Ms5 Distribution of IQs in L6Ms4 πΉππππ’ππππ¦ π·πππ ππ‘π¦ πΉππππ’ππππ¦ π·πππ ππ‘π¦ 110 πΌπ Here are the distribution of IQs in two classes. What’s the same, and what’s different? 110 πΌπ Variance Variance is how spread out data is. Variance, by definition, is the average squared distance from the mean. π 2 Σ = π₯−π₯ 2 π Distance from mean… Squared distance from mean… Average squared distance from mean… Simpler formula for variance Variance “The mean of the squares minus the square of the mean (‘msmsm’)” 2 Σπ₯ π2 = ? − π₯2? π Standard Deviation π = ππππππππ The standard deviation can ‘roughly’ be thought of as the average distance from the mean. Examples 1, 3 Variance π 2 = 5 − 2?2 = 1 Standard Deviation π = 1 =?1 So note that that in the case of two items, the standard deviation is indeed the average distance of the values from the mean. 2cm 3cm 3cm 5cm 7cm Variance π 2 = 19.2 − 42?= 3.2cm Standard Deviation ? π = 3.2 = 1.79cm Practice Find the variance and standard deviation of the following sets of data. 2 ? Variance = 2.67 4 6 Standard Deviation = 1.63? 1 2 3 4 5 Variance = 2 ? Standard Deviation = 1.41 ? Extending to frequency/grouped frequency tables We can just mull over our mnemonic again: Variance: “The mean of the squares minus the square of the means (‘msmsm’)” Σππ₯ 2 ππππππππ = ? − π₯ 2 ? Σπ Bro Tip: It’s better to try and memorise the mnemonic than the formula itself – you’ll understand what’s going on better, and the mnemonic will be applicable when we come onto random variables in Chapter 8. Bro Exam Note: In an exam, you will pretty much certainly be asked to find the standard deviation for grouped data, and not listed data. Example May 2013 Q4 We can use our STATS mode to work out the various summations needed. Just input the table as normal. Note that, as the discussion before, Σπ₯ 2 on a calculator actually gives you Σππ₯ 2 because it’s already taking the frequencies into account. ππππ. π ? = ππ. ππππ πππ ππππππ. ππ π = ππ. ππ π2 = − ππ. ππππ ? πππ π = ππ. ππ = π. ππ π₯= ? You ABSOLUTELY must check this with the ππ₯ you can get on the calculator directly. Test Your Understanding May 2013 (R) Q3 ? Most common exam errors ο±Thinking Σππ₯ 2 means Σππ₯ 2 . It means the sum of each value squared! ο±When asked to calculate the mean followed by standard deviation, using a rounded version of the mean in calculating the standard deviation, and hence introducing rounding errors. ο±Forgetting to square root the variance to get the standard deviation. ALL these mistakes can be easily spotted if you check your value against “ππ₯” in STATS mode. Exercises Page 40 Exercise 3C Q1, 2, 4, 6 Page 44 Exercise 3D Q1, 4, 5 Coding What do you reckon is the mean height of people in this room? Now, stand on your chair, as per the instructions below. INSTRUCTIONAL VIDEO Is there an easy way to recalculate the mean based on your new heights? And the variance of your heights? Starter Suppose now after a bout of ‘stretching you to your limits’, you’re now all 3 times your original height. What do you think happens to the standard deviation of your heights? It becomes 3 times larger (i.e. your heights are 3 times as spread out!) ? What do you think happens to the variance of your heights? It becomes 9 times larger ? (Can you prove the latter using the formula for variance?) The point of coding Cost π₯ of diamond ring (£) £1010 £1020 £1030 £1040 £1050 We ‘code’ our variable using the following: π₯ − 1000 π¦= 10 New values π¦: £1 £2 £3 ? £4 Standard deviation of π¦ (ππ¦ ): therefore… Standard deviation of π₯ (ππ₯ ): £5 π ? 10 ? 2 Finding the new mean/variance Old mean π₯ Old ππ₯ Coding New mean π¦ New ππ¦ 36 4 π¦ = π₯ − 20 16 ? 4? 36 ? 8 ? π¦ = 2π₯ 72 16 35 4 π¦ = 3π₯ − 20 85 ? 12 ? 20 3 2 ?7 ?9 40 5 40 ? 3 ? 11 27 300 ? 25 ? π₯ π¦= 2 π₯ + 10 π¦= 3 π₯ − 100 π¦= 5 Example Exam Question Suppose we’ve worked all these out already. f) Coding is ππ = ππ − π Mean decreases by 5. π unaffected. Median decreases by 5. Lower quartile decreases by 5. Interquartile range unaffected. ? Exercises Page 26 Exercise 2E Q3, 4 Page 47 Exercise 3E Q2, 3, 5, 7 (Note: answers in book for Q2 are incorrect: ππ€ = 70.71 ? therefore ? π π€ = 7.071 10 ? ππ€−200 = 70.71 ππ€−10 = 0.354 ? 200 Chapters 2-3 Summary I have a list of 30 heights in the class. What item do I use for: • π1 ? • π2 ? • π3 ? ? 8th Between 15 ?th and 16th 23rd ? For the following grouped frequency table, calculate: Height π of bear (in metres) 0 ≤ β < 0.5 4 0.5 ≤ β < 1.2 20 1.2 ≤ β < 1.5 5 1.5 ≤ β < 2.5 11 a) The estimate mean: β = b) The estimate median: c) The estimate variance: (you’re given Σπβ2 = 67.8125) Frequency 0.25 × 4 + 0.85 × 20 + β― 46.75 ? = 40 = 1.17π π‘π 3π π 40 16 0.5 + × 0.7 = 1.06π ? 20 67.8125 46.75 2 ? π = − 40 40 2 = 0.329 π‘π 3π π Chapters 2-3 Summary What is the standard deviation of the following lengths: 1cm, 2cm, 3cm π2 = 14 2 − 22 = 3 3 ? π= 2 3 The mean of a variable π₯ is 11 and the variance 4. π₯+10 The variable is coded using π¦ = 3 . What is: a) The mean of π¦? b) The variance of π¦? π = π? π πππ = π? A variable π₯ is coded using π¦ = 4π₯ − 5. For this new variable π¦, the mean is 15 and the standard deviation 8. What is: a) The mean of the original data? π = π? b) The standard deviation of the original data? ππ = ? π