Statistics

1. Finding the Mean, Median & Mode
2. Statistical notation (C)
3. Frequency Table and Graph
4. The 5-figure summary (& box plot) (C)
5. Cumulative frequency (table/graph) (C)
6. Making comparisons (C)
7. Standard deviation (C)
8. Probability

Knightswood©Copyright Kayar publishers 2000 Outline Course Notes Statistics

1) Finding the Mean, Median & Mode

The mean, median and mode are all types of averages.

Mean = $\frac{\sum x}{n}$ (all the numbers added together and divided by the number of numbers).
Median = the middle number in a set of ordered numbers (if there is no single middle number then the middle two are added together and halved).
Mode = the most frequently occurring number.

Example: Here are Jane's marks from her last 8 tests (marks are out of 10).

4, 5, 6, 6, 7, 7, 7, 9

Find the mean, median and mode for this data.

Answer:

Mean = $\frac{\sum x}{n} = \frac{4 + 5 + 6 + 6 + 7 + 7 + 7 + 9}{8} = \frac{51}{8} = 6.375$
Median = $\frac{6 + 7}{2} = 6.5$
Mode = 7

Note: The mean is usually considered to be the best average since it takes into account the value of every number in the data set.

2) Statistical Notation (C)

Here are some commonly used symbols in statistics and their meanings (try to find them on your scientific calculator).

Symbol                  Meaning
$x_i$                   Each number in your data set (in the above example, $x_1 = 4$, $x_2 = 5$, etc.)
$n$                     The number of numbers in your data set.
$\bar{x}$               The mean.
$\sum x$                All the numbers in your data set added together.
$\sum x^2$              Square all the numbers in your data set first, then add them all together.
$(\sum x)^2$            Add all the numbers in your data set together, then square the total.
$\sigma_{n-1}$ or $s$   Standard deviation (see later).
$P(a)$                  The probability of the outcome "a" occurring.

3) Frequency Table & Graph (C)

Example: Here are the numbers of goals Inverness Thistle scored over 20 matches during season 2000/2001.

3, 1, 1, 2, 1, 2, 0, 1, 0, 4, 1, 0, 0, 2, 2, 0, 1, 1, 0, 4
Show the information in a frequency table, draw a frequency graph, and find the mean, mode and median.

Answer:

The frequency table.

Number of goals (x)   Tally      Frequency (f)   f × x
0                     IIIII      5               0
1                     IIIII II   7               7
2                     IIII       4               8
3                     II         2               6
4                     II         2               8
5                                0               0
Total                            20              29

The frequency total gives $n = 20$. The f × x total gives $\sum x = 29$.

The frequency graph.

[Frequency graph: "Goals scored by Inverness Thistle"; vertical axis f (frequency), horizontal axis goals (0 to 5)]

Notice that we can easily find the mean from the totals in the table.

$\bar{x} = \frac{\sum x}{n} = \frac{29}{20} = 1.45$

By looking at the graph, you can see that the highest bar is for 1 goal. So the mode is 1.

The median is not so obvious. The middle numbers are $x_{10}$ and $x_{11}$, which are both 1 (you can see this from the graph or by ordering the data set). So the median is $\frac{1 + 1}{2} = 1$.

4) The 5-figure summary (& box plot) (C)

Example: A local newspaper makes a note of its daily sales for the past two weeks (numbers are to the nearest thousand).

11, 9, 8, 7, 8, 10, 12, 13, 10, 6, 7, 7, 9, 10

Find the 5-figure summary for this data.

Answer: We must first order the data set.

6, 7, 7, 7, 8, 8, 9, 9, 10, 10, 10, 11, 12, 13

There are 14 numbers in the data set, which divides into two sets of 7. There is no middle number, so use $x_7$ and $x_8$ for the median (called $Q_2$).

Each set of 7 now divides into two sets of 3 on either side of a clear middle number, at $x_4$ and $x_{11}$.

[Box diagram: 14 splits into 7 | 7, with $Q_2$ between $x_7$ and $x_8$; each 7 splits into 3, middle number, 3, giving $Q_1$ at $x_4$ and $Q_3$ at $x_{11}$]

The 5-figure summary is defined as:

L (the lowest number in the data set) = 6
H (the highest number in the data set) = 13
$Q_1$ (occurring at exactly $x_4$ in this case) = 7
$Q_2$ (the median) = $\frac{x_7 + x_8}{2} = \frac{9 + 9}{2} = 9$
$Q_3$ (occurring at exactly $x_{11}$ in this case) = 10

$Q_1$, $Q_2$ & $Q_3$ are known as the quartiles.

The range of the data set is defined to be H - L = 13 - 6 = 7.

This information can be represented as a box plot to make it easier to see how the data is spread.

[Box plot: axis from 2 to 14; whiskers at L = 6 and H = 13; box from $Q_1 = 7$ to $Q_3 = 10$, with a line at $Q_2 = 9$]

Notice:
- The position of the box tells us that the numbers are mostly between 7 & 10.
- The median, or $Q_2$, is indicated by a line through the box.
- The length of the box (the interquartile range, $Q_3 - Q_1$) gives an indication of the spread of the numbers. You will find out a better measurement of the spread later on.

5) Cumulative Frequency (table/graph) (C)

Example: During an epidemic of blood poisoning, a sample of 24 people was screened for the disease. The number of infected blood cells per ml was counted by machine and put into class intervals of 100.

Diseased cells   0-100   101-200   201-300   301-400   401-500   501-600   601-700   701-800   801-900   901-1000
Frequency        3       2         2         4         6         3         1         1         1         1

i) The cumulative frequency table.

We can construct a cumulative frequency table as shown below. Try to make sure you can see where all the numbers are coming from (help from a teacher may be required in this example).

Diseased cells (per ml)   Frequency   Cumulative frequency
0 - 100                   3           3
101 - 200                 2           5
201 - 300                 2           7
301 - 400                 4           11
401 - 500                 6           17
501 - 600                 3           20
601 - 700                 1           21
701 - 800                 1           22
801 - 900                 1           23
901 - 1000                1           24

- Plot the diseased cells against the cumulative frequency.
- Draw a smooth best-fitting curve.

ii) Cumulative frequency graph.

[Cumulative frequency graph: cumulative frequency (0 to 24) plotted against cells (0 to 1000), with $Q_1$, $Q_2$ and $Q_3$ marked on both axes]

The graph can now be used to estimate the values of $Q_1$, $Q_2$ & $Q_3$.

24 ÷ 4 = 6, so we will use 6, 12 & 18 for the quartiles.

You can now read that, approximately,

$Q_1 = 250$
$Q_2 = 420$
$Q_3 = 530$

iii) The interquartile and semi-interquartile range (a measurement of spread).
The interquartile range can now be calculated as $Q_3 - Q_1 = 530 - 250 = 280$, although the semi-interquartile range, $\frac{1}{2}(Q_3 - Q_1) = 140$, is more commonly used.

iv) Interpreting results

We can see from our graph that most people in our sample had a relatively low count of diseased cells (the graph rises steeply at the beginning, between 250 ($Q_1$) and 530 ($Q_3$)) and very few had a count over 800. Of course, in the real world, we would have to sample many more people to draw any firm conclusions.

6) Making comparisons (C)

Statistics are often used to compare two sets of data which may look similar but are in fact quite different.

Example 1: The number of punishment exercises given out each week by a teacher in Poshwood Academy is shown in the stem-and-leaf diagram below.

0 | 0 0 1 3 5 6 7 7 8 9
1 | 0 0 1 1 1 2 2 2 4 5 7 7 7
2 | 0 3

Here n = 25

Analysis

Count the numbers in the data set. You should find 25. Let us find the 5-figure summary. Use a box diagram to find out where $Q_1$, $Q_2$ & $Q_3$ occur.

[Box diagram: 25 splits into 12, $Q_2$, 12; each 12 splits into 6 | 6, with $Q_1$ between $x_6$ and $x_7$ and $Q_3$ between $x_{19}$ and $x_{20}$]

$Q_2$ is an exact middle number. $Q_1$ & $Q_3$ have to be calculated.

L = 0
H = 23
$Q_1 = \frac{x_6 + x_7}{2} = \frac{6 + 7}{2} = 6.5$
$Q_2 = x_{13} = 10$
$Q_3 = \frac{x_{19} + x_{20}}{2} = \frac{12 + 14}{2} = 13$

The Box Plot for Poshwood Academy

[Box plot: axis from 0 to 30; whiskers at L = 0 and H = 23; box from $Q_1 = 6.5$ to $Q_3 = 13$, with a line at $Q_2 = 10$]

Example 2: Some distance away, in Shadywood Secondary, the number of punishment exercises issued by a teacher was recorded as in the previous example. Here are the results in another stem-and-leaf diagram.

0 | 0 2 6 9 9
1 | 2 7
2 | 0 1 1 3 4 6 9

Key: 1 | 2 = 12

This survey was carried out over a 14 week period (in the last example n = 25). We can still compare the information.

Analysis

This time n = 14. Let us again find the 5-figure summary. Use a box diagram to find out where $Q_1$, $Q_2$ & $Q_3$ occur.
[Box diagram: 14 splits into 7 | 7, with $Q_2$ between $x_7$ and $x_8$; each 7 splits into 3, middle number, 3, giving $Q_1$ at $x_4$ and $Q_3$ at $x_{11}$]

$Q_2$ must be calculated. $Q_1$ & $Q_3$ each have an exact middle number.

L = 0
H = 29
$Q_1 = x_4 = 9$
$Q_2 = \frac{x_7 + x_8}{2} = \frac{17 + 20}{2} = 18.5$
$Q_3 = x_{11} = 23$

The Box Plot for Shadywood Secondary

[Box plot: axis from 0 to 30; whiskers at L = 0 and H = 29; box from $Q_1 = 9$ to $Q_3 = 23$, with a line at $Q_2 = 18.5$]

Compare it with the box plot for Poshwood Academy

[Box plot: axis from 0 to 30; whiskers at L = 0 and H = 23; box from $Q_1 = 6.5$ to $Q_3 = 13$, with a line at $Q_2 = 10$]

By comparing the two box plots:

- There tend to be more punishments given out by the teacher from Shadywood Secondary (the box is further over to the right).
- There is a much greater spread of data in Shadywood Secondary (look at the length of the box).
- The median (a simple average) is lower for Poshwood Academy (look at the position of $Q_2$).

7) Standard Deviation (C)

Standard deviation is a measurement of how the data is spread out from the mean. It is a much better measurement of spread than the semi-interquartile range, since it takes every number in the data set into account. You have a choice of two formulas for calculating the standard deviation.

Example: The lengths of phone calls made in a day from an office were monitored and the results are shown below (in minutes).

3, 12, 12, 5, 9, 8, 9, 21, 4, 6, 2, 5, 5, 7, 8, 2, 10, 9

Find the standard deviation of the data set (working to 2 decimal places).

Method 1

mean = $\bar{x} = \frac{\sum x}{n} = \frac{137}{18} = 7.61$

A table of values is now useful.

$x$    $x - \bar{x}$    $(x - \bar{x})^2$
2      -5.61            31.47
2      -5.61            31.47
3      -4.61            21.25
4      -3.61            13.03
5      -2.61            6.81
5      -2.61            6.81
5      -2.61            6.81
6      -1.61            2.59
7      -0.61            0.37
8      0.39             0.15
8      0.39             0.15
9      1.39             1.93
9      1.39             1.93
9      1.39             1.93
10     2.39             5.71
12     4.39             19.27
12     4.39             19.27
21     13.39            179.29

Totals: $\sum x = 137$ and $\sum (x - \bar{x})^2 = 350.24$

Standard deviation ($\sigma_{n-1}$ or $s$) is given by the formula,

$\sigma_{n-1} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{350.24}{17}} = 4.54$

This is a lengthy method. A quicker way is shown in Method 2.
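As a cross-check on Method 1, the short Python sketch below repeats the same steps (find the mean, square each deviation from it, divide the total by n - 1, take the square root) on the phone call data. It is only an illustration: the function name `sample_standard_deviation` is made up for this example and is not part of the course notes, and the code keeps full precision instead of rounding each row to 2 decimal places.

```python
import math

# Call lengths in minutes, copied from the worked example.
calls = [3, 12, 12, 5, 9, 8, 9, 21, 4, 6, 2, 5, 5, 7, 8, 2, 10, 9]


def sample_standard_deviation(data):
    """Method 1: sum the squared deviations from the mean,
    divide by n - 1, then take the square root."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return math.sqrt(sum(squared_deviations) / (n - 1))


s = sample_standard_deviation(calls)
print(round(s, 2))  # 4.54
```

Because nothing is rounded along the way, the sum of squared deviations differs slightly from a hand-built table, but the answer still rounds to 4.54. Python's built-in `statistics.stdev(calls)` also divides by n - 1 and can serve as a further check.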
Method 2 (the one pass formula)

Although this formula looks more difficult, it is actually quicker to use (especially if you are able to use the statistical mode on your scientific calculator). There is no need to calculate the mean, and it cuts down on the build-up of rounding error.

For our data set, $\sum x = 137$ and $\sum x^2 = 1393$ (obtained by first squaring every number in the data set and then adding them all together, or simply by using the statistical mode on your calculator).

Standard deviation ($\sigma_{n-1}$ or $s$) can now be calculated by this formula,

$\sigma_{n-1} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{1393 - \frac{18769}{18}}{17}} = \sqrt{\frac{350.28}{17}} = 4.54$

In an exam you must show all the working, so it is a good idea to use the statistical mode on your calculator to find $\sum x$ and $\sum x^2$, then substitute these values into their correct places in the formula and do the calculation. Once you have found the answer you can check it on the calculator by using the $\sigma_{n-1}$ key.

Don't try to memorise these formulae; they are given to you in the exam.

8) Probability (C)

Probability is measured on a scale of 0 to 1.
If an event (outcome) is impossible, the probability of it occurring is 0.
If an event (outcome) is certain, the probability of it occurring is 1.

The probability of any event happening is calculated from,

$P(E) = \frac{\text{number of favourable outcomes}}{\text{total number of outcomes}}$

where $P(E)$ is the probability of an event "E" occurring.

Example: There are 52 cards in a standard pack of playing cards.

a) What is the probability of selecting any club (♣) at random?
b) What is the probability of selecting any king at random?
c) What is the probability of selecting a joker at random?

Answer:

a) Probability of selecting a club.
$P(\text{any } \clubsuit) = \frac{13}{52} = 0.25$ (there are 13 cards of each suit)

b) Probability of selecting any king.
$P(\text{any king}) = \frac{4}{52} = 0.077$ (to 3 decimal places)

c) Probability of selecting a joker.
$P(\text{joker}) = \frac{0}{52} = 0$ (there are no jokers in a 52 card pack)
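The three card answers can be checked by counting outcomes directly. The Python sketch below builds a 52 card pack and applies P(E) = favourable outcomes / total outcomes. The names `pack` and `probability` are invented for this illustration, and the pack is assumed to contain no jokers, as in the example.

```python
from fractions import Fraction

# A standard pack: 13 ranks in each of 4 suits, no jokers (assumed).
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
pack = [(rank, suit) for suit in suits for rank in ranks]


def probability(is_favourable, outcomes):
    """Return favourable outcomes / total outcomes as an exact fraction."""
    favourable = sum(1 for outcome in outcomes if is_favourable(outcome))
    return Fraction(favourable, len(outcomes))


p_club = probability(lambda card: card[1] == "clubs", pack)
p_king = probability(lambda card: card[0] == "K", pack)
p_joker = probability(lambda card: card[0] == "Joker", pack)

print(float(p_club))            # 0.25
print(round(float(p_king), 3))  # 0.077
print(float(p_joker))           # 0.0
```

Using `Fraction` keeps each answer exact (for example 4/52 reduces to 1/13) until it is converted to a decimal for display.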