Mathematics: Statistics (Higher)
Support Materials: Teachers notes
Higher Still, Spring 1999

EXPLORATORY DATA ANALYSIS (EDA)

PREVIOUS KNOWLEDGE

Measures of Central Tendency

Students should be aware of the following:

• the sample mean, $\bar{x} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
  advantages - takes account of all the data, easy to handle mathematically
  disadvantages - can be distorted by a single low or high value
• the mode is the most frequently occurring observation in a data set
  advantages - easy to find
  disadvantages - does not take account of all the data, difficult to handle mathematically
• the median (Q2) is the middle observation in a set of data arranged in numerical order; it splits the data into two equal halves
  advantages - not affected by unusually low or high values
  disadvantages - does not take account of all the data, difficult to handle mathematically
• the lower quartile (Q1), the median (Q2) and the upper quartile (Q3) split an ordered data set into four equal quarters.

Example
The number of matches in a box was counted for a sample of 17 boxes. The results were:

  51 52 52 51 48 48 53 49 48 52 50 50 51 48 50 47 46

For the above data find (a) the mean (b) the mode (c) the median (d) the lower and upper quartiles.

Answer
To order the data, we first draw a stem-and-leaf diagram (see later section on EDA).

  4 | 8 8 6 8 9 8 7
  5 | 1 2 3 0 1 0 2 1 2 0

  4 | 8 means 48 matches

followed by an ordered stem-and-leaf diagram

  4 | 6 7 8 8 8 8 9
  5 | 0 0 0 1 1 1 2 2 2 3

  4 | 8 means 48 matches

(a) $\bar{x} = \dfrac{\sum x}{n} = \dfrac{846}{17} = 49.8$

(b) The most frequent observation is 48, so the mode or modal value = 48.

(c) In the above data set the median is the 9th observation. From the stem-and-leaf diagram, the 9th observation is 50, so the median = 50.

NB If there are n observations in a data set, the median is the value in the $\dfrac{n+1}{2}$th position.

(d) A number of different methods exist to find the quartiles of a data set. Two of the most commonly used methods are illustrated below.

METHOD 1
The lower quartile can be defined as the median of the lower half of the ordered data set, with the upper quartile being the median of the upper half.

Ignoring the median, the lower half of the data is 46 47 48 48 48 48 49 50. The lower quartile lies halfway between the 4th and 5th observations, so the lower quartile is $\dfrac{48 + 48}{2} = 48$.

Ignoring the median, the upper half of the data is 50 51 51 51 52 52 52 53. The upper quartile lies halfway between the 13th and 14th observations of the full data set, so the upper quartile is $\dfrac{51 + 52}{2} = 51.5$.

METHOD 2
The lower quartile can be defined as the value in the $\dfrac{n+1}{4}$th position within an ordered data set. Similarly, the upper quartile can be defined as the value in the $\dfrac{3(n+1)}{4}$th position within an ordered data set.

With n = 17, the lower quartile is in the (17 + 1)/4 = 4.5th position, so the lower quartile is $\dfrac{48 + 48}{2} = 48$. The upper quartile is the value in the 3(17 + 1)/4 = 13.5th position, so the upper quartile is $\dfrac{51 + 52}{2} = 51.5$.

NB (a) The above methods give the same answers for the quartiles here. However, this will not always be the case. Students should be encouraged in their written work to be clear about the method they have used for determining the quartiles of a data set.
(b) Almost all scientific and graphic calculators will find the mean. Some graphic calculators will also find the upper and lower quartiles.

Samples and Populations

In a statistical study, the complete set of objects under investigation is called a population. Collecting information on every member of the population is known as a census. A census, however, is regularly rejected as a means of gathering information as it can be very time-consuming and expensive to administer. Instead, a sample or subset of the population is taken.
It is extremely important that the sample is representative of the population, i.e. that its characteristics mirror the characteristics of the population. The sample will be analysed and conclusions made about the population. Samples which are unrepresentative or biased may lead to biased or unjustified conclusions.

The process of using a sample to infer details about the population is known as statistical inference. Sample statistics, such as the sample mean $\bar{x}$, allow the statistician to estimate the corresponding population parameter, in this case µ, the true mean of the population.

For example, in a sample of 100 senior pupils, we might discover the proportion of the 100 pupils in the sample who are vegetarians. We might assume that the proportion in the population of all senior school pupils is similar, but we would not know this population proportion exactly.

Probability sampling methods should be used to avoid sampling bias. Simple random sampling is a method of sampling where each member of the population has an equal chance of being selected for the sample. For example, a sample of n objects could be selected from a population of N objects by putting N tickets (numbered consecutively from 1 to N) in a hat and picking out n tickets at random.

Other sampling methods are available but will not be covered within this course.

Measures of Variability

Students should be aware of the following:

• the range = maximum value - minimum value
• the interquartile range (IQR) = upper quartile - lower quartile = Q3 - Q1,
• the semi-interquartile range (SIQR) = half of the interquartile range = $\frac{1}{2}$(Q3 - Q1).
  advantages - not affected by extreme values
  disadvantages - difficult to handle mathematically
• the sample standard deviation is a measure of the variability of the data about the sample mean,

  $s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$  or  $s = \sqrt{\dfrac{\sum_{i=1}^{n}x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n}x_i\right)^2}{n-1}}$

  advantages - takes account of all the data, easy to handle mathematically
  disadvantages - can be distorted by extreme values

NB (a) $s^2$ is called the sample variance.
(b) The (n - 1) divisor is used in the formula because on average it produces better estimates of the population standard deviation σ.

Example
The number of matches in a box was counted for a sample of 17 boxes. The results were:

  51 52 52 51 48 48 53 49 48 52 50 50 51 48 50 47 46

For the above data find (a) the range (b) the interquartile range (c) the sample standard deviation.

Answer
(a) range = maximum - minimum = 53 - 46 = 7

(b) From the earlier example using the same data, Q3 = 51.5 and Q1 = 48
    ⇒ interquartile range = 51.5 - 48 = 3.5

(c) n = 17, $\sum x = 846$, $\sum x^2 = 42166$

    so $s = \sqrt{\dfrac{42166 - \frac{846^2}{17}}{17 - 1}} = \sqrt{\dfrac{42166 - 42100.941\ldots}{16}} = 2.02$

NB (i) The formula $s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$ is not used since the use of a rounded value of $\bar{x} = 49.7647\ldots$ could lead to rounding error.
(ii) Almost all scientific and graphic calculators will find the sample standard deviation.

Now try Exercise 1 - Average/Variability.

Exploratory Data Analysis (EDA)

Before a detailed analysis of a data set is carried out, an initial impression of the data is normally sought. First impressions can be gained by displaying the data in a simple but convenient form and by calculating simple measures of central tendency and variability. In particular, the student should be able to interpret the following diagrams:

• stem-and-leaf diagrams
• dotplots
• boxplots

Stem-and-leaf diagrams

The stem-and-leaf diagram is a very useful way of organising data and can be used as an alternative to a frequency table or bar chart.
It can also be used to compare two data sets.

Example
The heights of 30 adult males are recorded to the nearest centimetre.

  175 195 168 168 167 169 169 165 176 190
  173 179 180 175 172 188 178 183 161 172
  174 171 173 184 160 184 169 167 171 179

Draw a stem-and-leaf diagram for the above data.

Answer
Initially, an unordered stem-and-leaf diagram is created with the stems being represented by the hundred/ten digits and the leaves by the unit digit (a key which explains this representation is included beneath each diagram). The unordered diagram is then converted to an ordered stem-and-leaf diagram.

unordered

  16 | 8 9 1 0 7 7 5 8 9 9
  17 | 5 1 3 5 8 2 3 1 6 9 2 4 9
  18 | 0 8 4 3 4
  19 | 0 5

  16 | 8 means 168 centimetres

ordered

  16 | 0 1 5 7 7 8 8 9 9 9
  17 | 1 1 2 2 3 3 4 5 5 6 8 9 9
  18 | 0 3 4 4 8
  19 | 0 5

  16 | 8 means 168 centimetres

The stems can also be split to give a more detailed picture of the distribution of the leaves. For each stem, the leaves between 0 and 4 (inclusive) are separated from the leaves between 5 and 9 (inclusive). The ordered stem-and-leaf diagram above can be adjusted to the following:

  16 | 0 1
  16 | 5 7 7 8 8 9 9 9
  17 | 1 1 2 2 3 3 4
  17 | 5 5 6 8 9 9
  18 | 0 3 4 4
  18 | 8
  19 | 0
  19 | 5

  16 | 8 means 168 centimetres

Example
The heights of 30 adult females are recorded to the nearest centimetre.

  155 169 172 160 169 171 190 171 172 156
  166 163 166 170 172 156 170 163 170 172
  169 164 172 169 171 153 169 174 161 175

Draw a back-to-back stem-and-leaf diagram using the above data and the data from the previous example. Comment on any differences between the two sets of data.

Answer
The back-to-back stem-and-leaf diagram allows a simple comparison of the two data sets to be made. Working as before, we place the leaves for the male data on the left of the stems and the leaves for the female data on the right of the stems.
            Males |    | Females
                  | 15 | 3
                  | 15 | 5 6 6
              1 0 | 16 | 0 1 3 3 4
  9 9 9 8 8 7 7 5 | 16 | 6 6 9 9 9 9 9
    4 3 3 2 2 1 1 | 17 | 0 0 0 1 1 1 2 2 2 2 2 4
      9 9 8 6 5 5 | 17 | 5
          4 4 3 0 | 18 |
                8 | 18 |
                0 | 19 | 0
                5 | 19 |

  16 | 8 means 168 centimetres

From the above diagram, males are generally taller than females.

Dotplots

A dotplot is another alternative to the bar chart.

Example
The number of matches in a box was counted for a sample of 17 boxes. The results were:

  51 52 52 51 48 48 53 49 48 52 50 50 51 48 50 47 46

Draw a dotplot for the above data.

Answer
Firstly we might construct an ordered stem-and-leaf diagram.

  4 | 6 7 8 8 8 8 9
  5 | 0 0 0 1 1 1 2 2 2 3

  4 | 8 means 48

The dotplot can then be easily constructed.

          •
          •       •   •   •
          •       •   •   •
  •   •   •   •   •   •   •   •
  46  47  48  49  50  51  52  53
         Number of matches

Now try Exercise 2 - Exploratory Data Analysis (EDA), Questions 1 - 5.

Boxplots

Once an ordered stem-and-leaf diagram has been produced, it can easily be converted into a boxplot (or box and whisker diagram). The boxplot is a graphical representation of the five number summary: minimum, lower quartile, median, upper quartile and maximum. The boxplot is an extremely useful way of comparing two or more data sets.

Example
The heights of 30 adult males are recorded to the nearest centimetre.

  175 195 168 168 167 169 169 165 176 190
  173 179 180 175 172 188 178 183 161 172
  174 171 173 184 160 184 169 167 171 179

Draw a boxplot for the above data.

Answer
Firstly, we construct an ordered stem-and-leaf diagram and then calculate the median and the quartiles.

  16 | 0 1
  16 | 5 7 7 8 8 9 9 9
  17 | 1 1 2 2 3 3 4
  17 | 5 5 6 8 9 9
  18 | 0 3 4 4
  18 | 8
  19 | 0
  19 | 5

  16 | 8 means 168 centimetres

The minimum is 160 and the maximum is 195.

The median is the value in the (30 + 1)/2 = 15.5th position.
⇒ Q2 = (173 + 173)/2 = 173

The lower quartile is the median of the lower half of the data, i.e. the 8th position.
⇒ Q1 = 169

The upper quartile is the median of the upper half of the data, i.e. the 23rd position.
⇒ Q3 = 179

The boxplot for the above data is constructed as follows:

[Boxplot of the male heights on a scale from 160 to 200 centimetres, with min, Q1, Q2, Q3 and max labelled.]

The 'box' represents the middle 50% of the data, the lower 'whisker' represents the lower 25% of the data and the upper 'whisker' represents the upper 25% of the data.

Outliers

From time to time, extreme values may occur within a data set. These values are called outliers, being much smaller or much larger than the rest of the data. They can occur as a result of natural variation or by some error in the data collection.

Outliers are commonly identified by using fences or boundaries within the data set. Any values which lie beyond these fences are considered to be possible outliers. The lower fence is defined as Q1 - 1.5 x IQR and the upper fence as Q3 + 1.5 x IQR, where IQR represents the interquartile range.

For the above example:
  IQR = 179 - 169 = 10
  lower fence = 169 - 1.5 x 10 = 154
  upper fence = 179 + 1.5 x 10 = 194

No value is less than 154, but the maximum, 195, is just greater than 194, so under this rule it would be flagged as a possible outlier. All other values lie within the fences.

When an outlier is identified, we adjust the boxplot of the data by clearly labelling the outlier (usually with an asterisk) and drawing the appropriate whisker to the nearest data value just inside the fence. Consider the following example.

Example
The heights of 30 adult females are recorded to the nearest centimetre.

  155 169 172 160 169 171 190 171 172 156
  166 163 166 170 172 156 170 163 170 172
  169 164 172 169 171 153 169 174 161 175

Draw a boxplot for the above data.

Answer
As before, we construct an ordered stem-and-leaf diagram and then calculate the median and the quartiles.

  15 | 3
  15 | 5 6 6
  16 | 0 1 3 3 4
  16 | 6 6 9 9 9 9 9
  17 | 0 0 0 1 1 1 2 2 2 2 2 4
  17 | 5
  18 |
  18 |
  19 | 0

  16 | 9 means 169 centimetres

The minimum is 153 and the maximum is 190.

The median is the value in the (30 + 1)/2 = 15.5th position.
⇒ Q2 = (169 + 169)/2 = 169

The lower quartile is the 8th value.
⇒ Q1 = 163

The upper quartile is the 23rd value.
⇒ Q3 = 172

The IQR = 172 - 163 = 9, the lower fence = 163 - 1.5 x 9 = 149.5, and the upper fence = 172 + 1.5 x 9 = 185.5. Since 190 is beyond the upper fence it can be considered an outlier, although its occurrence is more likely due to natural variation than to recording error.

The boxplot for the above data is constructed as follows:

[Boxplot of the female heights on a scale from 150 to 200 centimetres, with the outlier marked by an asterisk.]

Now try Exercise 2 - Exploratory Data Analysis (EDA), Questions 6 - 13.

Using Boxplots for Comparisons

Example
Use the boxplots from the previous two examples to compare the relative heights of adult males and adult females.

Answer

[Boxplots of adult male heights and adult female heights drawn on a common scale from 150 to 200 centimetres, with an outlier marked by an asterisk.]

Observations on the above boxplots might include:

• the median of the male heights is greater than the median of the female heights
• the variability within both groups is broadly similar (see range and IQR)
• the median of the female heights is roughly equal to the lower quartile of the male heights, i.e. about 50% of female heights are below 169 whereas about 75% of male heights are greater than 169
• some females were taller than some males, with the tallest man being only 5 centimetres taller than the tallest woman.

From the above observations, there would appear to be some evidence to suggest that adult males are generally taller than adult females.

Now try Exercise 3 - Interpreting an EDA.

PROBABILITY

Simple Probability

Probability is a measure of how likely something is to happen. Some simple definitions are necessary to enable us to discuss probability in an informed manner. These include:

• A random experiment or trial is one in which there are a number of possible outcomes and we have no way of predicting which outcome or outcomes will actually occur.
• The sample space, usually denoted by S, is the set of all possible outcomes of the trial.
• An event is any set of possible outcomes of a trial. An event is therefore a subset of the sample space S.
• The relative frequency is the frequency of an event divided by the total frequency. In experimental situations it is used as an estimate for the probability of that event.

Example
A simple random experiment would be the rolling of an ordinary six-sided die since, before the die is rolled, we are unable to predict the outcome of the trial. The possible outcomes are 1, 2, 3, 4, 5, 6 with the sample space written as S = {1, 2, 3, 4, 5, 6}. Possible events include 'the outcome is a prime number' and 'the outcome is a number greater than 2'.

Probability is measured on a scale of 0 to 1:

• a probability of 0 means that the event can never happen and the closer the probability of an event is to 0 the less likely it is to happen
• a probability of 1 means that the event is certain to happen and the closer the probability of an event is to 1 the more likely it is to happen
• events are often described as impossible, unlikely, possible, likely or certain
• where all outcomes of a trial are equally likely, probability is defined as:
    (number of favourable outcomes) / (total number of outcomes)
• in experimental situations, as the number of trials increases, the probability of an event occurring is given by the limit of the relative frequency of that event
• an event is usually denoted by a capital letter, e.g. A, B, C etc., with the probability of its occurring being denoted by P(A)
• P(A) = n(A)/n(S), where n(A) = the number of outcomes described by the event A and n(S) = the total number of outcomes in the sample space
• 0 ≤ P(A) ≤ 1

Example
A single card is selected from a standard pack of 52 playing cards. Find the probability that the card is (a) a King (b) a Heart (c) the Ace of Spades.
Answer
There are 52 equally likely outcomes of this trial.

(a) P(King) = n(Kings)/n(total) = 4/52 = 1/13
(b) P(Heart) = n(Hearts)/n(total) = 13/52 = 1/4
(c) P(Ace of Spades) = n(Ace of Spades)/n(total) = 1/52

Data could also be presented in tabular form.

Example
A survey of 100 people revealed the following voting intentions.

           Labour   SNP   Conservative   Liberal Democrat   Total
  Women      22      18        6                3             49
  Men        20      22        4                5             51
  Total      42      40       10                8            100

A person is chosen at random from this group. Find the probability that the person (a) is a woman (b) intends to vote SNP (c) is a man intending to vote Liberal Democrat.

Answer
(a) P(woman) = n(women)/n(total) = 49/100 = 0.49
(b) P(SNP) = n(SNP)/n(total) = 40/100 = 0.4
(c) P(male Lib Dem) = n(male Lib Dem)/n(total) = 5/100 = 0.05

Now try Exercise 1 - Simple Probability.

Sample Spaces & Further Simple Probability

When events become more complex, it is extremely important that students can list the members of the sample space. The members of the sample space can be conveniently identified by systematic listing or by using tables or tree diagrams. Consider the following examples.

Example
The menu in a restaurant has 4 choices of main course and 3 choices of dessert.

  Main Course      Desserts
  Chicken (C)      Fruit salad (F)
  Salmon (S)       Ice cream (I)
  Lamb (L)         Gateau (G)
  Pork (P)

How many different combinations could be chosen from the above menu?

Answer
Representing each choice by its first letter, a systematic list could be set out as follows.

  Chicken (C) could be combined with any of the desserts to give   CF  CI  CG
  Similarly, for Salmon (S) we have                                SF  SI  SG
  for Lamb (L) we have                                             LF  LI  LG
  and for Pork (P) we have                                         PF  PI  PG

There are 4 x 3 = 12 different possible combinations.

Example
An unbiased die is rolled and a fair coin is tossed.
(a) List the sample space for this experiment.
(b) Calculate the probability of obtaining a head and an even number.
Answer
(a) The coin can land Heads (H) or Tails (T). The die can show 1, 2, 3, 4, 5 or 6. A systematic list would produce the following:

  H1  H2  H3  H4  H5  H6
  T1  T2  T3  T4  T5  T6

Alternatively, the above list could have been set out in tabular form.

                      Die
            1    2    3    4    5    6
  Coin  H   H1   H2   H3   H4   H5   H6
        T   T1   T2   T3   T4   T5   T6

This method works well but is limited to situations where only 2 choices are to be made.

A further alternative is to represent this situation using a tree diagram.

  Coin   Die                 Outcome
  H      1, 2, 3, 4, 5, 6    H1, H2, H3, H4, H5, H6
  T      1, 2, 3, 4, 5, 6    T1, T2, T3, T4, T5, T6

This is an excellent method, but can become overly complicated if there are too many branches.

The sample space S = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}

(b) P(head and even) = n(head and even)/n(total) = 3/12 = 1/4

Now try Exercise 2 - Sample Spaces & Further Simple Probability.

Mutually Exclusive & Exhaustive Events

Two or more events are said to be mutually exclusive if they cannot occur at the same time.

Two or more events are said to be exhaustive if they combine to form the entire sample space.

Example
Decide if each pair of events X and Y is mutually exclusive and/or exhaustive.

(a) X : selecting a spade from a standard pack of 52 playing cards
    Y : selecting a red card from a standard pack of 52 playing cards
(b) X : obtaining an even number on the roll of a fair six-sided die
    Y : obtaining a number greater than 4 on the roll of a fair six-sided die

Answer
(a) A spade is a black card. A black card and a red card cannot be selected at the same time, so X and Y are mutually exclusive. X and Y are not exhaustive as neither event includes the selection of a club.
(b) Obtaining a six allows X and Y to occur at the same time, so X and Y are not mutually exclusive. X and Y are not exhaustive as neither event includes 1 or 3.
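The two checks in the example above can also be carried out by treating each event as a set of outcomes. The following Python lines are only an illustrative sketch: the function names `mutually_exclusive` and `exhaustive`, and the way the pack of cards is represented, are assumptions made for this illustration and are not part of the course materials.

```python
def mutually_exclusive(x, y):
    """Events are mutually exclusive when they share no outcome."""
    return x.isdisjoint(y)


def exhaustive(x, y, sample_space):
    """Events are exhaustive when together they cover the whole sample space."""
    return (x | y) == sample_space


# (a) A standard pack of 52 cards, each card stored as a (rank, suit) pair.
suits = ["spades", "clubs", "hearts", "diamonds"]
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = {(rank, suit) for suit in suits for rank in ranks}
spade = {card for card in deck if card[1] == "spades"}
red = {card for card in deck if card[1] in ("hearts", "diamonds")}

# (b) One roll of a fair six-sided die.
die = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
greater_than_4 = {5, 6}

print(mutually_exclusive(spade, red), exhaustive(spade, red, deck))
print(mutually_exclusive(even, greater_than_4), exhaustive(even, greater_than_4, die))
# Outcomes covered by neither event in part (b)
print(sorted(die - (even | greater_than_4)))
```

The last line lists the outcomes that neither event in part (b) covers, which is why that pair is not exhaustive.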
Venn Diagrams

A useful way of representing events and their probabilities is to use the Venn diagram. The Venn diagram provides a picture of how simple events are related to each other.

[Figure 1: event A shown as a circle within the sample space S. Figure 2: the event 'not A' within the sample space S.]

The area within the rectangle represents the entire sample space S and the area within the circle represents the situation when event A occurs (see Figure 1 above). The area outwith A, denoted by $\bar{A}$ (read as 'not A'), represents the event that A does not occur (see Figure 2 above).

[Figure 3: A ∩ B ≠ ∅, A and B not mutually exclusive. Figure 4: A ∩ B = ∅, A and B mutually exclusive.]

The event A and B, denoted by A ∩ B, represents the area of overlap between A and B (see Figure 3). For mutually exclusive events there is no overlap between A and B ⇒ P(A ∩ B) = 0 (see Figure 4).

[Figure 5: A ∪ B, A and B not mutually exclusive. Figure 6: A ∪ B, A and B mutually exclusive.]

The event A or B, denoted by A ∪ B, represents the area in A or in B, including any overlap A ∩ B (see Figure 5). The following probability rule can therefore be deduced:

  P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

For mutually exclusive events (see Figure 6), since P(A ∩ B) = 0, the above result simplifies to

  P(A ∪ B) = P(A) + P(B).

This result is known as the Addition Rule for mutually exclusive events.

NB Only the simplified rule is required at this level. Students must verify that events are mutually exclusive before using this rule.

Example
An unbiased six-sided die is thrown. Calculate the probability of scoring a 2 or an odd number.

Answer
The events 'scoring a 2' and 'scoring an odd number' are mutually exclusive.

  P(2 or an odd number) = P(2) + P(odd)
                        = 1/6 + 3/6
                        = 2/3

Example
For events A and B, P(A) = 1/4, P(B) = 1/3 and P(A or B) = 5/12. Are the events A and B mutually exclusive?

Answer
  P(A) + P(B) = 1/4 + 1/3 = 7/12 ≠ P(A or B)

So events A and B are not mutually exclusive.

Now try Exercise 3 - Mutually Exclusive and Exhaustive Events.

Independent Events

Two or more events are said to be independent if the occurrence of any one event does not affect the occurrence of any of the other events.

Example
For each pair of events X and Y listed below, decide whether or not it is likely that the events are independent.

(a) X : I throw a fair die and score a 5.
    Y : I throw the same fair die again and score another 5.
(b) X : I catch measles.
    Y : My brother catches measles.

Answer
(a) Scoring a 5 on the first throw does not affect what happens on the second throw. Therefore, events X and Y are independent.
(b) As measles is an infectious disease, it is likely that event Y is influenced by event X and so events X and Y are not independent.

For independent events A and B we have the following rule:

  P(A and B) = P(A ∩ B) = P(A) x P(B) = P(A).P(B).

This result is known as the Multiplication Rule for independent events.

NB Students must verify that events are independent before using this rule.

Example
Two fair dice, each numbered 1 to 6, are rolled. Events A, B, C and D are defined as follows:

  A : The first die scores 5
  B : The second die scores 5
  C : The total is 6
  D : The total is 7

(a) Find P(A ∩ B)
(b) Find P(A ∩ C) and P(A ∩ D)
(c) Which of the events A, C and D are independent?

Answer
(a) Since events A and B are independent,
    P(A ∩ B) = P(A) x P(B) = 1/6 x 1/6 = 1/36

(b) P(A ∩ C) = n(A ∩ C)/n(total) = 1/36
    P(A ∩ D) = n(A ∩ D)/n(total) = 1/36

(c) P(A) x P(C) = 1/6 x 5/36 = 5/216 ≠ P(A ∩ C)
    So events A and C are not independent.
    P(A) x P(D) = 1/6 x 1/6 = 1/36 = P(A ∩ D)
    So events A and D are independent.

Now try Exercise 4 - Independent Events.

Tree Diagrams - With and Without Replacement

Example
A bag contains 4 green and 5 yellow balls. One ball is selected and then replaced. A second ball is then selected. Find the probability that both balls are green.
Answer
In this situation the first selection is returned to the bag before the second selection is made. We would describe this type of selection as sampling with replacement.

  First ball       Second ball      Outcome
  Green  (4/9)     Green  (4/9)     Green and Green
                   Yellow (5/9)     Green and Yellow
  Yellow (5/9)     Green  (4/9)     Yellow and Green
                   Yellow (5/9)     Yellow and Yellow

  P(both green) = P(green and green)
                = 4/9 x 4/9
                = 16/81

The tree diagram is an extremely useful way of illustrating the outcomes of two or more trials. It provides the student with a simple but clear means of displaying what is happening in a problem.

In general, the outcomes of the first trial are represented by lines extended from a fixed point. From the ends of these lines other lines are extended to represent the outcomes of the second trial. (The diagram can be extended depending on the number of trials involved in the problem.) The probability of each outcome is written above its line. The probability of the event at the end of each branch is found by multiplying the probabilities along that branch.

Example
An electrical system consists of three components A, B and C and will only work when all three components are in working order. The three components are manufactured by three different companies. From past experience the following information is available:

  P(component A is defective) = 0.03
  P(component B is defective) = 0.02
  P(component C is defective) = 0.04

Calculate the probability that the electrical system will not be operational.

Answer
  P(component A defective) = 0.03 ⇒ P(component A non-defective) = 0.97,
  P(component B defective) = 0.02 ⇒ P(component B non-defective) = 0.98,
  P(component C defective) = 0.04 ⇒ P(component C non-defective) = 0.96.

  Component A     Component B     Component C
  defective?      defective?      defective?      Outcome
  Yes (0.03)      Yes (0.02)      Yes (0.04)      YYY
                                  No  (0.96)      YYN
                  No  (0.98)      Yes (0.04)      YNY
                                  No  (0.96)      YNN
  No  (0.97)      Yes (0.02)      Yes (0.04)      NYY
                                  No  (0.96)      NYN
                  No  (0.98)      Yes (0.04)      NNY
                                  No  (0.96)      NNN

  P(electrical system not operational)
    = P(YYY) + P(YYN) + P(YNY) + P(YNN) + P(NYY) + P(NYN) + P(NNY)
    = 1 - P(NNN)
    = 1 - 0.97 x 0.98 x 0.96
    = 0.087424

Now try Exercise 5 - Tree Diagrams (With Replacement).

Example
A bag contains 4 green and 5 yellow balls. One ball is selected and not replaced. A second ball is then selected. Find the probability that both balls are green.

Answer
Sometimes the probabilities of subsequent events may change as a result of earlier events. In this situation, the first selection is not returned to the bag before the second selection is made. We would describe this type of selection as sampling without replacement. The conditions for calculating probabilities have now been changed - the number of green or yellow balls has been reduced by one, as has the total number of balls.

  First ball       Second ball      Outcome
  Green  (4/9)     Green  (3/8)     Green and Green
                   Yellow (5/8)     Green and Yellow
  Yellow (5/9)     Green  (1/2)     Yellow and Green
                   Yellow (1/2)     Yellow and Yellow

  P(both green) = P(green and green)
                = 4/9 x 3/8
                = 1/6

Example
A committee consists of 5 people: 3 women and 2 men. Two members are to be chosen at random to be the Chairperson and the Vice-Chairperson. Find the probability that the two chosen are of opposite sex.

Answer

  Chairperson      Vice-Chairperson   Outcome
  Woman (3/5)      Woman (1/2)        Woman and Woman
                   Man   (1/2)        Woman and Man
  Man   (2/5)      Woman (3/4)        Man and Woman
                   Man   (1/4)        Man and Man

  P(opposite sex) = P(woman and man) + P(man and woman)
                  = 3/5 x 1/2 + 2/5 x 3/4
                  = 3/5

Now try Exercise 6 - Tree Diagrams (Without Replacement).

Combinations

The number of unordered arrangements of r objects selected from a collection of n objects is denoted by $^nC_r$ or $\binom{n}{r}$ (read as 'n c r' or 'n choose r'). Each collection of r selected objects is called a combination.
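To make the idea of an unordered selection concrete, the short Python sketch below lists every combination of 2 objects chosen from a collection of 4 and compares the count with the number of ordered selections. The four-letter collection and the variable names are assumptions chosen purely for illustration; `itertools.combinations` and `itertools.permutations` are standard library functions.

```python
from itertools import combinations, permutations

# An illustrative collection of n = 4 objects (an assumption for this sketch).
objects = ["A", "B", "C", "D"]

# Unordered selections of r = 2 objects: ("A", "B") and ("B", "A") count once.
pairs = list(combinations(objects, 2))

# Ordered selections of 2 objects, shown only for contrast.
ordered_pairs = list(permutations(objects, 2))

print(pairs)
print(len(pairs))          # number of combinations of 2 objects from 4
print(len(ordered_pairs))  # number of ordered selections of 2 objects from 4
```

Each combination corresponds to two ordered selections here, which is why the second count is double the first.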
The general formula for $\binom{n}{r}$ is:

  $\binom{n}{r} = \dfrac{n!}{r!\,(n-r)!} = \dfrac{n(n-1)(n-2)\dots(n-r+1)}{r(r-1)(r-2)\dots 3.2.1}$

where n! = n(n - 1)(n - 2) ... 3.2.1 and 0! = 1.

At some point the notation $\binom{n}{r}$ should be linked to Pascal's triangle.

Example
Evaluate $^8C_3$.

Answer
  $^8C_3 = \dfrac{8!}{3!\,5!} = \dfrac{8.7.6}{3.2.1} = 56$

Example
A school committee of 5 people is to be chosen from 12 volunteers.
(a) In how many ways can the committee be chosen?
(b) The Headteacher selects one of the volunteers to be the chairperson of the committee. In how many ways can the committee now be chosen?

Answer
(a) Number of ways = $\binom{12}{5} = \dfrac{12!}{5!\,7!} = 792$

(b) As one member of the committee has been pre-selected, we now have a choice of 4 people from the remaining 11 volunteers.
    Number of ways = $\binom{11}{4} = \dfrac{11!}{4!\,7!} = 330$

Now try Exercise 7 - Combinations.

Combinations are a useful means of evaluating probabilities. Consider the following example.

Example
From a well shuffled pack of 52 cards a hand of 7 cards is dealt. Find the probability that the hand will contain (a) exactly 3 kings (b) at least 3 kings.

Answer
(a) To select exactly 3 kings within a hand of 7 cards we must select 3 kings (from a total of 4 kings) and any other 4 cards (from a total of 48 cards).

  P(exactly 3 kings) = $\dfrac{\binom{4}{3} \times \binom{48}{4}}{\binom{52}{7}} = \dfrac{4 \times 194580}{133784560} \approx 0.00582$

(b) P(at least 3 kings) = P(3 kings) + P(4 kings)

  = $\dfrac{\binom{4}{3} \times \binom{48}{4}}{\binom{52}{7}} + \dfrac{\binom{4}{4} \times \binom{48}{3}}{\binom{52}{7}} = \dfrac{4 \times 194580}{133784560} + \dfrac{1 \times 17296}{133784560} \approx 0.00595$

Now try Exercise 8 - Combinations (Probability).

Simulation

An alternative to the calculation of probabilities is to simulate the outcomes of a random experiment using random numbers. Random numbers can be produced by tossing coins, rolling dice, drawing numbered balls from a hat etc. Experiments of this kind, however, can become very tedious and time-consuming if large samples of random numbers are required.
Instead, it is possible to make use of pseudo-random numbers which, although not strictly random, have been computer-generated using a mathematical formula. In practice, the prefix 'pseudo' is usually omitted and the numbers are described simply as random numbers.

Random numbers are usually set out in tabular form (see extract below).

  37057  83986  98419  76401  15412  68418
  33724  28633  85953  82213  07827  48740
  43737  15929  19659  52804  72335  25208
  16929  84478  31341  60265  19404  27881
  10131  98571  20877  34585  22353  54505

The starting point and direction (right, left, up, down, diagonal etc.) should be decided before using such a list. The digits within the list can be taken as individuals (3, 7, 0, 5, 7 ...), as pairs of digits (37, 05, 78, 39, 86 ...), as decimals (0.37057, 0.83986, 0.98419 ...) or in whatever manner is convenient.

The 'rand' or 'rand#' function on most scientific and graphic calculators is designed to produce random numbers. For example, different types of random number can be produced on a graphic calculator by using the following simple routines:

  n x rand ENTER             produces random numbers between 0 and n
  n x rand + 1 ENTER         produces random numbers between 1 and n + 1
  int(n x rand + 1) ENTER    produces the whole number part of random numbers between 1 and n + 1 (i.e. the numbers 1, 2, 3, ..., n).

Example
Using the above list of random numbers, simulate the results of tossing a coin 10 times.

Answer
Let Heads be represented by the digits 0, 1, 2, 3 and 4 and Tails by the digits 5, 6, 7, 8 and 9 (or alternatively, let Heads be represented by an even number and Tails by an odd number).

Starting at the sixth digit on the second row and working towards the right we have:

  2  8  6  3  3  8  5  9  5  3
  H  T  T  H  H  T  T  T  T  H

giving 4 Heads and 6 Tails.

Example
Simulate the results of rolling an unbiased die 30 times.
Answer
A calculator produced the following list of random numbers:

0.925 0.312 0.240 0.017 0.118 0.930 0.622 0.817 0.617 0.334 0.043 0.086 0.853 0.012 0.451 0.674 0.881 0.982 0.807 0.455 0.114 0.997 0.374 0.696 0.989 0.798 0.124 0.492 0.773 0.805 0.670 0.198 0.597 0.701 0.700 0.552 0.450 0.404 0.464 0.868 0.985 0.398 0.606 0.882 0.544 0.338 0.467 0.229 0.925 0.257 0.633 0.117 0.077 0.371 0.638 0.219 0.286 0.628 0.624 0.717

Discarding the first zero and the decimal point in each number produces the following table.

925 312 240 017 118 930 622 817 617 334 043 086 853 012 451 674 881 982 807 455 114 997 374 696 989 798 124 492 773 805 670 198 597 701 700 552 450 404 464 868 985 398 606 882 544 338 467 229 925 257 633 117 077 371 638 219 286 628 624 717

Starting with the fourth number on the second row and working towards the right (ignoring the digits 0, 7, 8, 9) we have the following simulated scores:

6 6 1 1 6 4 3 1 5 1 1 1 1 2 5 2 6 4 4 6 4 5 1 6 3 2 3 3 3 3 4 1 6 5 1 4

In total, the above gives 7 sixes, 4 fives, 6 fours, 6 threes, 3 twos and 10 ones.

The results of this simulation do not agree exactly with the theoretical probabilities. Generally, we would expect there to be some variation between theoretical and simulated results, although this variation should reduce considerably as the size of the simulation increases.

Now try Exercise 9 - Simulation.

Random Variables

In statistics, a variable is described as random if its value is the result of a random observation or experiment. There are two types of random variable: discrete and continuous.

A discrete random variable is a variable for which a list of its possible numerical values can be made. Discrete random variables are usually associated with counting.

Discrete random variable                                          Possible values
The number of heads when two fair coins are tossed.               0, 1, 2
The number of sunny days in June.                                 0, 1, 2, ... , 29, 30
The total score when two unbiased dice are rolled.                2, 3, 4, 5, ... , 11, 12
The number of rolls of an unbiased die until a 6 is obtained.     1, 2, 3, 4, ...

A continuous random variable can take any real value within a certain range. It is not possible, however, to make a list of the numerical values of the variable. Continuous random variables are usually associated with measurement.

Continuous random variable                        Possible range of values
The height of an S6 pupil.                        1.3 m to 2.3 m
The true mass of a 2 kg bag of flour.             1.99 kg to 2.01 kg
The lifetime of a dog.                            0 to 15 years
The height of a wave in the North Atlantic.       0.5 m to 12 m

NB Random variables are usually named using upper case letters, e.g. X, Y, Z ... whereas the values of the random variable are denoted by the corresponding lower case letters x, y, z ...

Discrete Probability Distributions

The probability distribution of a discrete random variable X sets out the relationship between the values of the random variable and their associated probabilities. It shows how the total probability of 1 is distributed amongst the possible values of X. A formal definition could be stated as follows:

X is a discrete random variable if:
• for each of its values x, $0 \le P(X = x) \le 1$
• $\sum P(X = x) = 1$

The probability distribution of a discrete random variable X is often set out in tabular form but can also be described using a formula.

Example
A discrete random variable X has probability distribution:

x           1    2    3    4    5
P(X = x)    2k   3k   5k   3k   7k

Find
(a) the value of the constant k
(b) P(1 < X ≤ 4)

Answer
(a) The sum of all the probabilities must be 1.
⇒ 2k + 3k + 5k + 3k + 7k = 1
20k = 1
$k = \frac{1}{20}$

(b) $P(1 < X \le 4) = \frac{3}{20} + \frac{5}{20} + \frac{3}{20} = \frac{11}{20}$

Example
A discrete random variable X has probability function given by:

$P(X = x) = k(x + 2)^2$,  x = 1, 2, 3, 4.

(a) Tabulate the probability distribution of X and find the value of the constant k.
(b) Find P(X < 4).
Answer
(a)
x           1    2     3     4
P(X = x)    9k   16k   25k   36k

9k + 16k + 25k + 36k = 1
86k = 1
$k = \frac{1}{86}$

(b) $P(X < 4) = \frac{9}{86} + \frac{16}{86} + \frac{25}{86} = \frac{50}{86} = \frac{25}{43}$

Example
The random variable H represents the number of Heads obtained when 3 fair coins are tossed. Find the probability distribution of H.

Answer
The random variable H can take the values 0, 1, 2, 3. To evaluate the corresponding probabilities we use a tree diagram.

[Tree diagram: three stages (first coin, second coin, third coin), each branching into Head and Tail with probability 1/2, giving the outcomes HHH = 3 Heads, HHT = 2 Heads, HTH = 2 Heads, HTT = 1 Head, THH = 2 Heads, THT = 1 Head, TTH = 1 Head, TTT = 0 Heads.]

As the results obtained on each coin are independent of each other, each branch of the tree has probability $\frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{8}$.

\[ P(1 \text{ Head}) = P(HTT) + P(THT) + P(TTH) = \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = \frac{3}{8} \]

Similarly, $P(2 \text{ Heads}) = \frac{3}{8}$.

The probability distribution of H can be set out as follows:

h           0     1     2     3
P(H = h)    1/8   3/8   3/8   1/8

Now try Exercise 10 - Discrete Probability Distributions.

Discrete Probability Distributions - Expectation and Variance

The mean or expected value of a random variable X is denoted by E(X) or µ and is given by $\sum x\,P(X = x)$.

Example

x           0     1     2     3
P(X = x)    1/2   1/8   1/4   1/8

Find the expected value of X.

Answer
\[ E(X) = 0 \times \frac{1}{2} + 1 \times \frac{1}{8} + 2 \times \frac{1}{4} + 3 \times \frac{1}{8} = 1 \]

Example
A man buys 20 tickets out of a total of 1000 tickets sold in a raffle. The price of a ticket is 50p and there is only one prize of £100. Calculate the man's expected gain or loss.

Answer
Let X be the man's gain in pounds.

x           -10         90
P(X = x)    980/1000    20/1000

\[ E(X) = -10 \times \frac{980}{1000} + 90 \times \frac{20}{1000} = -8 \]

The man would make an expected loss of £8.

The variance of a random variable X is denoted by Var(X) or $\sigma^2$ and is given by

\[ \mathrm{Var}(X) = E(X^2) - \{E(X)\}^2 \]

where $E(X) = \sum x\,P(X = x)$ and $E(X^2) = \sum x^2\,P(X = x)$.
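The two formulas above can be checked on the raffle example with a few lines of code. This is only an illustrative sketch: the function names `expectation` and `variance` and the dictionary layout are choices made here, not part of the course materials, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction


def expectation(dist):
    """E(X): sum of x * P(X = x) over a {value: probability} mapping."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = E(X^2) - {E(X)}^2."""
    mean = expectation(dist)
    mean_of_squares = sum(x * x * p for x, p in dist.items())
    return mean_of_squares - mean ** 2


# Raffle example: gain in pounds mapped to its probability.
raffle = {-10: Fraction(980, 1000), 90: Fraction(20, 1000)}

# The probabilities of a distribution must total 1.
assert sum(raffle.values()) == 1

print(expectation(raffle))  # -8, i.e. an expected loss of 8 pounds
print(variance(raffle))     # 196, so the standard deviation is 14 pounds
```

The expected value agrees with the answer above; the variance of the gain is not asked for in the example and is included only to show the second formula in use.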
NB
(a) $\sigma^2$ represents the variance of the population whereas $s^2$ represents the variance of a sample taken from the population.
(b) Since probabilities and squared real quantities are never negative we can deduce that Var(X) ≥ 0, and hence $E(X^2) \ge \{E(X)\}^2$.
(c) The standard deviation of X, denoted by SD(X) or σ, is simply the square root of the variance of X.

Example

x           0     1     2     3
P(X = x)    1/2   1/8   1/4   1/8

Find the variance of X.

Answer
\[ E(X) = 0 \times \frac{1}{2} + 1 \times \frac{1}{8} + 2 \times \frac{1}{4} + 3 \times \frac{1}{8} = 1 \]
\[ E(X^2) = 0^2 \times \frac{1}{2} + 1^2 \times \frac{1}{8} + 2^2 \times \frac{1}{4} + 3^2 \times \frac{1}{8} = 2\tfrac{1}{4} \]
\[ \mathrm{Var}(X) = 2\tfrac{1}{4} - 1^2 = 1\tfrac{1}{4} \]

Example
A box contains 2 yellow marbles and 3 green marbles. Two marbles are taken at random without replacement. If G represents the number of green marbles selected, find Var(G).

Answer
Using a tree diagram the following probability distribution can be found:

g           0      1     2
P(G = g)    1/10   3/5   3/10

\[ E(G) = 0 \times \frac{1}{10} + 1 \times \frac{3}{5} + 2 \times \frac{3}{10} = 1.2 \]
\[ E(G^2) = 0^2 \times \frac{1}{10} + 1^2 \times \frac{3}{5} + 2^2 \times \frac{3}{10} = 1.8 \]
\[ \mathrm{Var}(G) = 1.8 - (1.2)^2 = 0.36 \]

Now try Exercise 11 - Discrete Probability Distributions (Expectation and Variance).

Discrete Probability Distributions - Simulation

The results of a random experiment can be modelled by the probability distribution of a suitable discrete random variable. We now consider how results can be simulated from such distributions.

Example
The discrete random variable X represents the number of heads when 3 unbiased coins are tossed. X has the following probability distribution.

x           0     1     2     3
P(X = x)    1/8   3/8   3/8   1/8

Use the following sequence of calculator generated random numbers to simulate the tossing of 3 unbiased coins on 24 occasions.

0.921 0.836 0.255 0.726 0.247 0.101 0.731 0.222 0.594 0.820
0.934 0.492 0.095 0.402 0.646 0.352 0.815 0.729 0.020 0.389
0.367 0.233 0.187 0.235 0.784 0.451 0.331 0.718 0.942 0.730

Answer
Firstly, convert the probabilities within the distribution into decimals.
x           0       1       2       3
P(X = x)    0.125   0.375   0.375   0.125

We can now assign the above random numbers (r) in the following way:

0.001 ≤ r ≤ 0.125 ⇒ x = 0
0.126 ≤ r ≤ 0.500 ⇒ x = 1
0.501 ≤ r ≤ 0.875 ⇒ x = 2
0.876 ≤ r ≤ 0.999 ⇒ x = 3

The 24 simulations are:

r   0.921 0.836 0.255 0.726 0.247 0.101 0.731 0.222 0.594 0.820
x   3     2     1     2     1     0     2     1     2     2

r   0.934 0.492 0.095 0.402 0.646 0.352 0.815 0.729 0.020 0.389
x   3     1     0     1     2     1     2     2     0     1

r   0.367 0.233 0.187 0.235
x   1     1     1     1

giving a total of 3 zeros, 11 ones, 8 twos and 2 threes. Theoretically, we would have expected 3 zeros, 9 ones, 9 twos and 3 threes.

Now try Exercise 12 - Discrete Probability Distributions (Simulation).

Continuous Probability Distributions

A continuous random variable X can take any value within an interval on the real number line. As there are an infinite number of possible values within this interval, it is not possible to assign probabilities to each and every value within this interval. We would say that P(X = x) = 0 for all possible values, x, within this interval. A continuous random variable does not have a probability distribution but is described using the concept of probability density. Consider the following example.

A sample of men's heights is taken and illustrated in the histogram below. The histogram shows relative frequency density against height where

relative frequency density = relative frequency / width of interval.

Thus
relative frequency = relative frequency density x width of interval
                   = height of bar x width of bar
                   = area of bar.

For large samples, the relative frequency becomes the probability. The area of each bar, therefore, represents the probability associated with each interval.

As the number of intervals is increased (i.e.
the width of each interval is being decreased as the accuracy of the measurement is improved), we can see (above) that the overall shape of the distribution of heights is tending towards a continuous curve. This curve is called the probability density function.

For a continuous random variable X, the probability that X lies in a particular interval is represented by an area under the probability density curve and can be found by integrating the probability density curve over the given interval.

[Figure: probability density curve f(x) plotted against x on the interval a ≤ x ≤ b, with the points p and q marked between a and b.]

The probability density function (pdf) of a continuous random variable X is given by the function f(x) such that

\[ P(p \le X \le q) = \int_p^q f(x)\,dx \]

where
• f(x) ≥ 0 for all values of x (probabilities cannot be negative)
• $\int_a^b f(x)\,dx = 1$ (for X defined on the interval a ≤ x ≤ b)

NB
(a) For any continuous random variable X, P(a ≤ X ≤ b) = P(a < X < b)
(b) The mode occurs at the maximum point on the probability density curve.

Example
The continuous random variable X has probability density function given by:

\[ f(x) = \begin{cases} kx(4 - x) & \text{for } 0 \le x \le 4 \\ 0 & \text{elsewhere} \end{cases} \]

(a) Find k and sketch the graph of f(x).
(b) Write down the mode of X.
(c) Calculate P(1 < X < 2).

NB The statement that f(x) = 0 ‘elsewhere’ reminds us that our attention should be solely restricted to the interval 0 ≤ x ≤ 4.

Answer
(a)
\[ \int_0^4 kx(4 - x)\,dx = 1 \]
\[ \int_0^4 (4kx - kx^2)\,dx = 1 \]
\[ \left[ 2kx^2 - \frac{kx^3}{3} \right]_0^4 = 1 \]
\[ 32k - \frac{64k}{3} = 1 \]
\[ \frac{32k}{3} = 1 \]
\[ k = \frac{3}{32} \]

[Sketch of f(x): a parabola through (0, 0) and (4, 0) with maximum value 3/8 at x = 2.]

(b) From the graph of f, we can see that the mode occurs at x = 2. (If the pdf is a more complex function, it may be necessary to find the mode by solving the equation f '(x) = 0.)

(c)
\[ P(1 < X < 2) = \int_1^2 \frac{3}{32}\,x(4 - x)\,dx = \int_1^2 \left( \frac{3}{8}x - \frac{3}{32}x^2 \right) dx = \left[ \frac{3}{16}x^2 - \frac{1}{32}x^3 \right]_1^2 = \frac{11}{32} \]

Now try Exercise 13 - Continuous Probability Distributions.
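As a check on the worked example above, the integrals of a polynomial pdf can be evaluated exactly in code. The sketch below is illustrative only: the helper `integrate_poly` and the coefficient-list representation are assumptions introduced here, not part of the course materials.

```python
from fractions import Fraction


def integrate_poly(coeffs, lo, hi):
    """Exact integral over [lo, hi] of the polynomial sum(coeffs[i] * x**i)."""
    def antiderivative(x):
        # Term-by-term rule: c * x**i integrates to c * x**(i + 1) / (i + 1).
        return sum(c * Fraction(x) ** (i + 1) / (i + 1)
                   for i, c in enumerate(coeffs))
    return antiderivative(hi) - antiderivative(lo)


# x(4 - x) = 4x - x^2, stored as coefficients of x^0, x^1, x^2.
shape = [0, 4, -1]

# Choose k so that the total area under kx(4 - x) on [0, 4] is 1.
k = 1 / integrate_poly(shape, 0, 4)
pdf = [k * c for c in shape]

print(k)                          # 3/32
print(integrate_poly(pdf, 0, 4))  # 1
print(integrate_poly(pdf, 1, 2))  # 11/32
```

The printed values match the answers to parts (a) and (c). The same helper would work for any pdf that is a polynomial on its interval, which covers the examples in this unit.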
Continuous Probability Distributions - Expectation and Variance

The definitions of the expected value and variance of a continuous random variable X, defined on the interval a ≤ x ≤ b, are given below:

\[ E(X) = \int_a^b x\,f(x)\,dx \quad\text{and}\quad \mathrm{Var}(X) = E(X^2) - \{E(X)\}^2 \quad\text{where}\quad E(X^2) = \int_a^b x^2 f(x)\,dx . \]

Example
The lifetime, X years, of an electrical component is a continuous random variable with pdf given by:

\[ f(x) = \begin{cases} \frac{2}{9}\,x(3 - x) & \text{for } 0 \le x \le 3 \\ 0 & \text{elsewhere} \end{cases} \]

Calculate
(a) E(X)
(b) Var(X)
(c) SD(X)

Answer
(a)
\[ E(X) = \int_0^3 x \cdot \frac{2}{9}\,x(3 - x)\,dx = \int_0^3 \frac{2}{9}\,x^2(3 - x)\,dx = \int_0^3 \left( \frac{2}{3}x^2 - \frac{2}{9}x^3 \right) dx = \left[ \frac{2}{9}x^3 - \frac{1}{18}x^4 \right]_0^3 = 1\tfrac{1}{2} \]

(b)
\[ E(X^2) = \int_0^3 x^2 \cdot \frac{2}{9}\,x(3 - x)\,dx = \int_0^3 \frac{2}{9}\,x^3(3 - x)\,dx = \int_0^3 \left( \frac{2}{3}x^3 - \frac{2}{9}x^4 \right) dx = \left[ \frac{1}{6}x^4 - \frac{2}{45}x^5 \right]_0^3 = 2\tfrac{7}{10} \]
\[ \mathrm{Var}(X) = E(X^2) - \{E(X)\}^2 = 2\tfrac{7}{10} - \left( 1\tfrac{1}{2} \right)^2 = \frac{9}{20} \]

(c)
\[ SD(X) = \sqrt{\frac{9}{20}} \approx 0.671 \]

Now try Exercise 14 - Continuous Probability Distributions (Expectation and Variance).

The Cumulative Distribution Function

The pdf of a continuous random variable does not give probabilities directly. Probabilities can only be found indirectly by integrating the pdf over an interval. However, a function which does give probabilities directly is the cumulative distribution function (cdf). It is defined as follows:

If X is a continuous random variable with pdf given by f(x) for a ≤ x ≤ b (and 0 elsewhere), then the cumulative distribution function, F(x), is given by:

\[ F(x) = P(X \le x) = \int_a^x f(t)\,dt \]

(t is a dummy variable since x has been used as the upper limit of integration).

Example
The continuous random variable X has pdf given by:

\[ f(x) = \begin{cases} \frac{6}{125}\,x(5 - x) & \text{for } 0 \le x \le 5 \\ 0 & \text{elsewhere.} \end{cases} \]

Find and sketch the cumulative distribution function F(x).
Answer
\[ F(x) = \int_0^x \frac{6}{125}\,t(5 - t)\,dt = \int_0^x \left( \frac{6}{25}t - \frac{6}{125}t^2 \right) dt = \left[ \frac{3}{25}t^2 - \frac{2}{125}t^3 \right]_0^x = \frac{3}{25}x^2 - \frac{2}{125}x^3 = \frac{1}{125}\,x^2(15 - 2x) \]

The cdf is then written as follows:

\[ F(x) = \begin{cases} 0 & \text{for } x < 0 \\ \frac{1}{125}\,x^2(15 - 2x) & \text{for } 0 \le x \le 5 \\ 1 & \text{for } x > 5 \end{cases} \]

A sketch of the cdf:

[Sketch of F(x): F(x) = 0 for x < 0, increasing from 0 at x = 0 to 1 at x = 5, and F(x) = 1 for x > 5.]

A continuous random variable X has a pdf given by f(x) for a ≤ x ≤ b (and 0 elsewhere). The above definition of the cdf allows us to find the value of the median, m, by solving any one of the equations below:

\[ F(m) = \frac{1}{2} \quad\text{or}\quad P(X \le m) = \int_a^m f(x)\,dx = \frac{1}{2} \quad\text{or}\quad P(X \ge m) = \int_m^b f(x)\,dx = \frac{1}{2} . \]

Quartiles can also be found by solving similar equations:
• Lower quartile - solve $\int_a^l f(x)\,dx = \frac{1}{4}$
• Upper quartile - solve $\int_a^u f(x)\,dx = \frac{3}{4}$.

Example
The continuous random variable X has pdf given by:

\[ f(x) = \begin{cases} \frac{1}{8}(x + 2) & \text{for } 1 \le x \le 3 \\ 0 & \text{elsewhere.} \end{cases} \]

Find
(a) the median value of X
(b) the interquartile range of X.

Answer
(a) The median value is given by
\[ \int_1^m \frac{1}{8}(x + 2)\,dx = \frac{1}{2} \]
\[ \int_1^m \left( \frac{1}{8}x + \frac{1}{4} \right) dx = \frac{1}{2} \]
\[ \left[ \frac{1}{16}x^2 + \frac{1}{4}x \right]_1^m = \frac{1}{2} \]
\[ \left( \frac{1}{16}m^2 + \frac{1}{4}m \right) - \left( \frac{1}{16}\cdot 1^2 + \frac{1}{4}\cdot 1 \right) = \frac{1}{2} \]
\[ \frac{1}{16}m^2 + \frac{1}{4}m - \frac{13}{16} = 0 \]
\[ m^2 + 4m - 13 = 0 \]

Using the quadratic formula, $m = \frac{-4 \pm \sqrt{68}}{2}$ ⇒ m = -6.12 or 2.12

Hence the median value of X is m = 2.12 (since X is defined on the interval 1 ≤ x ≤ 3).

(b) The lower quartile is given by
\[ \int_1^l \frac{1}{8}(x + 2)\,dx = \frac{1}{4} \]
\[ \left[ \frac{1}{16}x^2 + \frac{1}{4}x \right]_1^l = \frac{1}{4} \]
\[ \left( \frac{1}{16}l^2 + \frac{1}{4}l \right) - \left( \frac{1}{16}\cdot 1^2 + \frac{1}{4}\cdot 1 \right) = \frac{1}{4} \]
\[ \frac{1}{16}l^2 + \frac{1}{4}l - \frac{9}{16} = 0 \]
\[ l^2 + 4l - 9 = 0 \]

Using the quadratic formula, $l = \frac{-4 \pm \sqrt{52}}{2}$ ⇒ l = -5.61 or 1.61

Hence the lower quartile of X is l = 1.61 (since X is defined on the interval 1 ≤ x ≤ 3).
The upper quartile is given by
\[ \int_1^u \frac{1}{8}(x + 2)\,dx = \frac{3}{4} \]
\[ \left[ \frac{1}{16}x^2 + \frac{1}{4}x \right]_1^u = \frac{3}{4} \]
\[ \left( \frac{1}{16}u^2 + \frac{1}{4}u \right) - \left( \frac{1}{16}\cdot 1^2 + \frac{1}{4}\cdot 1 \right) = \frac{3}{4} \]
\[ \frac{1}{16}u^2 + \frac{1}{4}u - \frac{17}{16} = 0 \]
\[ u^2 + 4u - 17 = 0 \]

Using the quadratic formula, $u = \frac{-4 \pm \sqrt{84}}{2}$ ⇒ u = -6.58 or 2.58

Hence the upper quartile of X is u = 2.58 (since X is defined on the interval 1 ≤ x ≤ 3).

The interquartile range = 2.58 - 1.61 = 0.97

Now try Exercise 15 - The Cumulative Distribution Function followed by Exercise 16 - Continuous Probability Distributions (Miscellaneous Examples).

CORRELATION & LINEAR REGRESSION

Correlation

When starting to work with bivariate data, i.e. data involving two variables, it is always best to draw a scattergraph. The resulting scattergraph should give some indication of the presence of a linear relationship (in this course, we will be concerned only with linear relationships) between the variables and how strong this linear relationship might be. If two variables are related in this way, they are said to have a linear correlation. Consider the following examples.

[Four scattergraphs. Diagram 1: strong, positive linear correlation. Diagram 2: moderate, negative linear correlation. Diagram 3: zero linear correlation. Diagram 4: zero linear correlation.]

The first diagram illustrates a strong positive linear correlation where both variables increase together. The second diagram illustrates a moderate negative linear correlation where one variable decreases as the other increases. In the third diagram, as one variable increases, there appears to be no clear pattern as to how the other variable behaves - this is an example of a zero linear correlation.
The fourth diagram is also an example of zero linear correlation although there appears to be some non-linear (possibly quadratic) relationship between the variables.

In mathematical terms, the strength of the association between the two variables is measured using a correlation coefficient. The most commonly used correlation coefficient is Pearson's Product Moment Correlation Coefficient, r, and is defined as follows:

\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \quad\text{or}\quad \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} . \]

As the above form of the correlation coefficient can be difficult to calculate, we convert it to the more useful version shown below:

\[ r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left( \sum x^2 - \dfrac{(\sum x)^2}{n} \right)\left( \sum y^2 - \dfrac{(\sum y)^2}{n} \right)}} \]

(Most graphic calculators will calculate the correlation coefficient.)

The correlation coefficient, r, has the following properties:
• -1 ≤ r ≤ 1
• r > 0 positive correlation ($S_{xy}$ positive); r < 0 negative correlation ($S_{xy}$ negative). (Note that $S_{xx}$ and $S_{yy}$ will always be positive.)
• r = 1 perfect positive correlation; r = -1 perfect negative correlation; r = 0 zero correlation

NB
(a) Care needs to be taken when interpreting a correlation coefficient. For instance, a high level of correlation between variables A and B does not imply that A causes B or that B causes A. It may well be that a third variable C causes both A and B. Alternatively, the relationship between the variables may be coincidental - this is said to be an example of spurious correlation.
(b) Any outliers within the data set can have a major effect on the value of the correlation coefficient. If it can be established that these outliers are incorrectly recorded data points then they may be removed from the data set and omitted from subsequent calculations.
(c) Scattergraphs should be closely scrutinised before a correlation coefficient is calculated.
Take care that a single correlation coefficient has not been calculated for data which are clearly separated into two or more distinct groups. Calculation of a single correlation coefficient would be inappropriate in such circumstances.
(d) Similarly, care must also be taken with data which appear as a single data set but, after more careful scrutiny, can be separated into more than one distinct group. For example, different relationships may exist for males and females but these relationships may go undetected if the data are analysed as a single data set.

Example

Student            A    B    C    D    E    F    G    H    I    J
IQ (x)             112  106  127  102  134  128  98   109  115  123
maths score (y)    53   62   75   41   70   68   47   76   63   71

(a) Plot a scattergraph for the above data.
(b) Calculate the correlation coefficient and comment on the relationship between x and y.

Answer
(a) [Scattergraph of maths score (vertical axis, 0 to 80) against IQ (horizontal axis, 95 to 135).]

(b)
          IQ      maths score
          x       y       x²        y²       xy
          112     53      12544     2809     5936
          106     62      11236     3844     6572
          127     75      16129     5625     9525
          102     41      10404     1681     4182
          134     70      17956     4900     9380
          128     68      16384     4624     8704
          98      47      9604      2209     4606
          109     76      11881     5776     8284
          115     63      13225     3969     7245
          123     71      15129     5041     8733
Totals    1154    626     134492    40478    73167

The summary statistics are:
n = 10, ∑x = 1154, ∑y = 626, ∑x² = 134492, ∑y² = 40478, ∑xy = 73167

NB The summary statistics may be given in an examination.

\[ S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 134492 - \frac{1154^2}{10} = 1320.4 \]
\[ S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 40478 - \frac{626^2}{10} = 1290.4 \]
\[ S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 73167 - \frac{1154 \times 626}{10} = 926.6 \]
\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{926.6}{\sqrt{1320.4 \times 1290.4}} = 0.710 \]

This represents a moderately strong positive correlation.

Now try Exercise 17 - Correlation.

Linear Regression

When a scattergraph suggests the presence of a linear correlation, it is useful to know the equation of the best fitting straight line.
An attempt could be made to draw this best fitting line by eye and y = mx + c or y - b = m(x - a) used to determine its equation. Drawing a line by eye, however, can be an unreliable method, particularly if the data are reasonably well scattered. We now consider a method which will produce the equation of the best fitting straight line - the method of least squares.

[Figure: scattergraph of the points (x1, y1), (x2, y2), (x3, y3), ... , (xn, yn) with the best fitting line drawn and the distance of each point from the line marked.]

For a set of bivariate data (x1, y1), (x2, y2), (x3, y3), ... , (xn, yn), the equation of the best fitting line is of the form y = α + βx. The difference between a predicted y-value (using the equation) and its actual y-value is given by $\varepsilon_i = (\alpha + \beta x_i) - y_i$. These $\varepsilon_i$ are called residuals (or errors). To obtain the best fitting straight line the $\varepsilon_i$ must be reduced as much as possible. Since the $\varepsilon_i$ can be positive, negative or zero, we square them and proceed to find values of α and β which minimise $\sum \varepsilon_i^2$.

Let $Z = \sum_{i=1}^{n} (\alpha + \beta x_i - y_i)^2$.

Since Z has to be a minimum with respect to both α and β, we partially differentiate as follows:

Treating β as a constant:
\[ \frac{\partial Z}{\partial \alpha} = 2\sum_{i=1}^{n} (\alpha + \beta x_i - y_i) = 0 \quad\text{when Z is a minimum} \]
\[ n\alpha + \beta \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} y_i = 0 \]
\[ n\alpha + \beta n\bar{x} - n\bar{y} = 0 \]
\[ \alpha + \beta\bar{x} = \bar{y} \quad\text{..... equation (1)} \]

NB This tells us that the point $(\bar{x}, \bar{y})$ always lies on the best fitting line.

Treating α as a constant:
\[ \frac{\partial Z}{\partial \beta} = 2\sum_{i=1}^{n} (\alpha + \beta x_i - y_i)\,x_i = 0 \quad\text{when Z is a minimum} \]
\[ \alpha \sum_{i=1}^{n} x_i + \beta \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} x_i y_i = 0 \]

Dividing through by n gives
\[ \alpha\bar{x} + \beta\,\frac{1}{n}\sum_{i=1}^{n} x_i^2 = \frac{1}{n}\sum_{i=1}^{n} x_i y_i \quad\text{..... equation (2)} \]

Multiplying equation (1) by $\bar{x}$ gives
\[ \alpha\bar{x} + \beta\bar{x}^2 = \bar{x}\,\bar{y} \]

Subtracting this from equation (2) gives
\[ \beta\left( \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 \right) = \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y} \]
\[ \beta\,S_{xx} = S_{xy} \]
\[ \beta = \frac{S_{xy}}{S_{xx}} \]

Substituting for β in equation (1) gives
\[ \alpha = \bar{y} - \beta\bar{x} . \]

NB The above proof is beyond the scope of this course.
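The two conditions used in the derivation (the residuals sum to zero, and the residuals multiplied by their x-values sum to zero) can be checked numerically. The sketch below is illustrative only: it reuses the IQ and maths score data from the correlation example, and the variable names are choices made here rather than notation from the course materials.

```python
# IQ (x) and maths score (y) for the ten students in the correlation example.
xs = [112, 106, 127, 102, 134, 128, 98, 109, 115, 123]
ys = [53, 62, 75, 41, 70, 68, 47, 76, 63, 71]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

beta = s_xy / s_xx            # slope, from beta * S_xx = S_xy
alpha = y_bar - beta * x_bar  # intercept, from equation (1)

# Residuals as defined in the text: predicted value minus actual value.
residuals = [alpha + beta * x - y for x, y in zip(xs, ys)]

print(round(beta, 3), round(alpha, 3))  # 0.702 -18.383
# Both sums should be zero, up to floating point rounding.
print(abs(sum(residuals)) < 1e-9)
print(abs(sum(r * x for r, x in zip(residuals, xs))) < 1e-6)
```

Any other choice of slope or intercept would break at least one of the two conditions, which is another way of seeing why the least squares line is unique for a given data set.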
The equation of the least squares regression line of y on x is given by y = α + βx.

Assuming that we only have a random sample, the values of α and β have to be estimated as follows:

\[ \hat{\beta} = b = \frac{S_{xy}}{S_{xx}} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} , \qquad \hat{\alpha} = a = \bar{y} - b\bar{x} . \]

In line with current technology it is often the practice to use a for $\hat{\alpha}$ and b for $\hat{\beta}$ so that y = a + bx is our estimate of y = α + βx.

NB
(a) a and b are calculated from samples and so are estimates of the population parameters α and β.
(b) Most graphic calculators will calculate estimates a and b.
(c) Care needs to be taken when using the equation of the linear regression line for prediction purposes. Interpolation, prediction within the range of the data, is generally reliable if the correlation is high. However, extrapolation, prediction outwith the range of data, should be avoided as an unjustified assumption is being made that the linear relationship extends outwith the range of data.
(d) Any outliers within a data set can have a considerable effect on the equation of the regression line. If it can be established that the outliers are incorrectly recorded data points then they may be removed from the data set and omitted from subsequent calculations.
(e) Scattergraphs should be closely scrutinised before the equation of a regression line is calculated. Take care that a single regression line is not being used to represent data which are clearly separated into two or more distinct groups. Calculation of a single regression equation would be inappropriate in such circumstances. For example, in data involving both males and females, separate regression lines, one for each of the sexes, may provide more reliable predictions.

Example

Student            A    B    C    D    E    F    G    H    I    J
IQ (x)             112  106  127  102  134  128  98   109  115  123
maths score (y)    53   62   75   41   70   68   47   76   63   71

(a) Find the least squares regression line of y on x and draw it on a scattergraph.
(b) Predict the maths score of a student with an IQ of 100.

Answer
(a) From the correlation example above we have the following summary statistics:
n = 10, ∑x = 1154, ∑y = 626, ∑x² = 134492, ∑y² = 40478, ∑xy = 73167.

\[ b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} = \frac{73167 - \dfrac{1154 \times 626}{10}}{134492 - \dfrac{1154^2}{10}} = \frac{926.6}{1320.4} = 0.702 \]
\[ a = \bar{y} - b\bar{x} = \frac{626}{10} - \frac{926.6}{1320.4} \times \frac{1154}{10} = -18.383 \]

The equation of the regression line of y on x is y = -18.383 + 0.702x.

[Scattergraph of maths score (vertical axis, 0 to 80) against IQ (horizontal axis, 95 to 135) with the regression line drawn.]

(b) A student with an IQ of 100 ⇒ x = 100.
Prediction for his/her maths score is y = -18.383 + 0.702 x 100 = 51.817 ≈ 52%

This should be a reliable prediction as we have been interpolating within the range of the data and the correlation is moderately high.

Now try Exercise 18 - Linear Regression.

STUDENT EXERCISES - PREVIOUS KNOWLEDGE

Exercise 1 - Average/Variability

1. Two sets of workers earn the following weekly wages.

Set A    96   98   100   102   105   108   98
Set B    80   90   100   105   102   110   120

Determine the mean and interquartile range for both groups. Comment on your findings.

2. The scores awarded by two judges, A and B, in an ice-skating competition were as follows:

Judge A    9.7   8.8   9.5   9.0   9.7   9.1   9.8   9.2   9.9   8.8
Judge B    9.6   9.8   9.0   9.4   9.1   9.8   9.1   9.1   9.3   9.3

For each judge find the mean score and the semi-interquartile range. How does the scoring of each judge compare?

3. To check the weight (in grams) of biscuits in a packet, a random sample of 10 packets was weighed. The contents were as follows:

198, 198, 200, 200, 201, 201, 201, 202, 202, 203

Find the mean weight and the standard deviation of the sample.

4. In a physics experiment, a student measured the electrical resistance of a piece of wire. The same experiment is repeated 9 times.
The results were as follows:

64.2 63.7 63.7 65.0 65.0 63.7 64.0 64.5 64.1

Calculate the mean and standard deviation of this sample.

5. The weights (kg) of five workers in a particular office block are given below.

69, 71, 73, 77, 82

(a) Calculate the mean and standard deviation of this sample.
(b) The office lift has a maximum safe load of 1000 kg. Suggest a safe limit for the number of people in the lift.

6. For a random sample of five numbers Σx = 131 and Σx² = 3451. Find the mean and standard deviation of this sample.

7. Given that for a certain random sample of five numbers Σx = 225 and Σx² = 10165, find the mean and standard deviation of this sample.

8. For a set of ten numbers Σx = 306 and Σx² = 9746. Find the mean and sample standard deviation.

9. The mean of 5, 9, 10, y, 14 is 10. Find the standard deviation of this sample of numbers.

10. The eight forwards in a rugby team weigh (kg) 95, 90, 100, 85, y, 82, 102, and 100. If the mean is 93.125 kg what is the standard deviation of this sample of forwards?

Exercise 2 - Exploratory Data Analysis (EDA)

1. The weights (in kilograms) of 11 members of a football team are given below.

59 65 68 68 75 72 81 79 75 75 72

(a) Draw a dot plot of this data.
(b) What are the median and mode of this data?

2. The weights of 20 newly born babies are given below.

2.7 2.2 2.8 2.8 3.0 3.7 3.1 3.5 3.1 4.2 3.2 4.0 3.3 3.6 3.4 3.4 3.4 3.7 3.5 3.9

(a) Draw a dot plot of this data.
(b) Find the median and modal baby weight.

3. The times in seconds for 15 people to complete a jig-saw are:

64 53 76 51 48 83 53 67 68 64 56 55 74 45 60

(a) Draw a stem and leaf diagram using this data.
(b) Find the median time.
(c) What is the range of times?

4. A group of children were given a reading test when they were age 7 and another at age 8. Their scores are given below.
Age 7 Age 8
376 332 341 369 350 332 388 385 326 298 350 356 304 323 361 397 328 337 383 404 310 366 335 415 326 314 392 370 328 315 374 422 342 290 426 400 294 311 381 399

(a) Draw a back to back stem and leaf diagram to illustrate this data.
(b) Find the median, mode and quartiles for each set of data.
(c) What conclusions, if any, can you draw?

5. The heart beat of 28 men and 28 women (at rest) was measured as part of an experiment to measure the effect of exercise. The rates are given below.

Men
64 68 60 76 58 62 62 62 62 76 66 72 66 90 66 60 64 80 74 62 74 92 70 70 84 68 74 68

Women
61 66 80 86 64 84 82 76 94 62 76 82 60 66 87 88 72 78 90 72 58 68 78 86 88 72 68 67

(a) Draw a back to back stem and leaf diagram to illustrate this data.
(b) Find the median, mode and quartiles for each set of data.
(c) What conclusions, if any, can you draw about the heart beat of men and women?

6. The reaction times of a class of students were measured. The results were as follows (in tenths of a second).

         Lower Quartile   Median   Upper Quartile   Min.   Max.
Girls    8                10       15               6      19
Boys     7                10       13               4      16

(a) Draw two box plots to compare the reaction times of the boys and the girls.
(b) Comment on the relative reaction times of the boys and the girls.

7. A random sample of 30 people were asked to record, to the nearest mile, the distances that they travelled by car in one week. The results were as follows:

41 69 88 63 61 38 89 49 39 41 54 85 37 59 60 67 61 61 69 70 80 57 61 45 78 55 64 84 63 72

(a) Construct a stem and leaf diagram to represent this data.
(b) From the stem and leaf diagram identify the median and the quartiles of this data.
(c) Draw a box plot to represent the data.

8. The temperatures (°F) during the month of September were as follows:

96 81 81 77 77 79 79 73 73 73 73 72 72 72 70 70 70 70 66 68 70 68 69 68 56 61 63 64 64 64 63 61

(a) For this data draw a stem and leaf diagram.
(b) Hence obtain a box plot for this distribution.
(c) Use the box plot to identify any ‘outliers’, if any exist.
(d) Comment on your results.

9. A local Health Board compiled the following data relating to the length of stay of patients in hospital. A random sample of 21 patients yielded the following data on length of stay in days.

4 3 10 4 6 13 12 15 5 18 7 7 9 3 1 6 55 23 12 1 9

(a) Determine the interquartile range.
(b) Obtain the five number summary.
(c) Identify any possible outliers.
(d) Construct and interpret a box plot.

10. As part of an evaluation programme, a college gave a sample of students a nonverbal reasoning test. The scores of 25 randomly selected students are given below.

91 102 95 88 96 96 111 129 106 124 105 112 116 115 101 82 97 121 86 104 118 127 102 66 98

(a) Determine the interquartile range.
(b) Obtain the five number summary.
(c) Identify any possible outliers.
(d) Construct and interpret a box plot.

11. As part of an experiment in Chemistry, a class was asked to measure the time, in seconds, taken for a particular reaction to be completed. The results were

51 35 18 45 85 27 43 62 31 97 20 16 22 18 51 23 57 34 49 35 22

(a) Draw the box plot and use it to identify any possible outliers.
(b) Comment on your results.

12. The data below gives the amount, in pounds, spent each day by 20 holiday makers.

43 67 68 39 61 93 65 58 62 72 60 71 49 51 51 63 47 62 70 52

(a) Determine the interquartile range.
(b) Obtain the five number summary.
(c) Identify any possible outliers.
(d) Construct and interpret a box plot.

13. The number of hours spent studying each week by 20 students is given below.

10 12 12 12 20 13 1 28 14 14 15 12 14 10 9 15 13 14 11 14

(a) Determine the interquartile range.
(b) Obtain the five number summary.
(c) Identify any possible outliers.
(d) Construct and interpret a box plot.

Exercise 3 - Interpreting an EDA

1. Researchers who wished to find out if there was a relationship between death rate and age obtained data from fifteen western countries. Their results are summarised below in two boxplots.

    [Figure: two boxplots, labelled Age 20-25 and Age 60-65, on a vertical scale from 5 to 30]

The boxplots show the percentage of the population in the fifteen countries who die between the ages of 20-25 and 60-65 respectively. In what ways do the death rates differ between the two age groups?

2. In an attempt to compare the effectiveness of two different types of soil, identical plants (azaleas) were grown and their heights measured after four weeks. The resulting data was used to construct the boxplots below.

    [Figure: two boxplots, labelled Soil A and Soil B, on a vertical scale from 4 to 12, with one outlier marked *]

(a) From the evidence of the boxplots, is there any difference between the two soils? Give reasons for your answer.
(b) Suggest an explanation for the outlier marked.

3. Twenty five thirteen year old boys and girls took part in an experiment which measured the time taken to complete a sorting task. Each student was blindfolded and asked to place each of four shapes in its correct place in a box. The time taken, in seconds, was recorded. The resulting data is presented below in a stem and leaf diagram.

                       Boys |   | Girls
                          8 | 1 | 3 6 7
      9 9 9 8 6 6 5 3 2 2 1 | 2 | 0 1 2 2 3 5 6 7 7 7 8 8
          9 6 5 5 4 4 3 2 1 | 3 | 0 0 0 0 1 1 3 4 5 7
                        5 1 | 4 |
                        2 0 | 5 |

    3 | 4 means 34 seconds

(a) Find the median and the lower and upper quartiles for each group.
(b) Is there any difference in the performance between the groups? Justify your answer.

4. A major retail company records the number of sales of two different computers over a period of twenty weeks. The sales of Type A and Type B are shown below in a back to back stem and leaf diagram.

    Type A Type B
    9 8 9 7 5 8 6 5 6 5 5 9 5 5 3 5 5 2 2 3 3 4 5 6 7 0 9 0 1 1 6 4 2 3 6 4 4 6 5 8 5 9 8 9 9

    5 | 6 means 56

Comment on any difference in the sales of the two types of computer.
5. A pharmaceutical company compared the weights (grams) of two groups of mice in an experiment into the effectiveness of a weight control drug. The weights of Group A and Group B are shown below in a back to back stem and leaf diagram.

    Group A Group B
    9 7 9 7 9 8 6 5 7 6 6 4 2 6 1 4 5 3 1 2 1 1 0 4 5 6 7 8 9 8 7 1 1 5 2 1 6 2 1 6 5 3 5 3 5 5 5 6 8 7 9 8

    6 | 7 means 67 g

If Group A received the drug while Group B received a placebo, comment on the effectiveness of this drug in reducing weight.

PROBABILITY

Exercise 1 - Simple Probability

1. One card is drawn at random from a standard pack of 52 playing cards. Find the probability of drawing (a) the two of diamonds (b) a Jack (c) a black card (d) a face card.

2. A fair die, numbered 1 to 6, is thrown once. Find the probability of obtaining (a) a three (b) an odd number (c) a number greater than two (d) a one or a four.

3. Unbiased 50p and 20p coins are tossed at the same time. Find the probability of obtaining (a) two tails (b) a head and a tail.

4. A box contains 10 coloured pencils, 6 red and 4 blue.
(a) Find the probability of selecting at random (i) a red pencil (ii) a blue pencil.
(b) One red pencil is removed from the box. Find the new probability of selecting at random (i) a red pencil (ii) a blue pencil.

5. One letter is selected at random from the word MISSISSIPPI. Find the probability of selecting the letter(s) (a) M (b) P (c) I (d) M or S.

6. A game has a regular pentagonal spinner with faces numbered from 1 to 5. When the spinner is spun, what is the probability of obtaining (a) a number 5 (b) a number 1 (c) an even number (d) a number less than 4 (e) a number greater than 4?

7. In a class of 32 children, 18 picked Art as their favourite school subject, 8 picked Science and the rest picked PE. What is the probability that a child, chosen at random from the class, picked PE as their favourite subject?
8. In a game of Scrabble, I have the following eight letters on my rack:

    A E O O D P S V

As only seven letters are required, one of the other players removes one letter without looking at it. What is the probability that she takes (a) the A (b) an O (c) a vowel (d) not a D (e) P, S or V?

9. Ahmed has a bag of 20 marbles. 7 of the marbles are red. He selects a marble at random from the bag. What is the probability that
(a) he gets a red marble?
(b) he gets a marble which is not red?

10. The distribution of pupils by age in a secondary school at the start of a new session is given below.

    Age in years    11    12    13    14    15    16    17    18
    Frequency       73   116   123   128   106    99    90    45

A pupil is chosen at random from the school roll. Find the probability that the pupil is (a) 14 years old (b) less than 16 years old.

11. On a particular day the output of eggs on two poultry farms was compared. Eggs were graded as large, standard or small.

                Farm A   Farm B
    Large          81      129
    Standard      243      215
    Small         126       86

An egg is chosen at random from the entire output. Find the probability that the egg will be (a) graded small (b) from Farm A (c) from Farm B and graded standard.

Exercise 2 - Sample Spaces & Further Simple Probability

1. Ruth goes for a meal with her friends to the local burger bar. To drink she can have either juice (J), milk (M) or tea (T). For eating she can have a chickenburger (C), a beefburger (B) or a veggieburger (V). Make a list of all the possible outcomes for her choice of meal.

2. A shop sells three flavours of ice cream, vanilla (V), strawberry (S) and chocolate (C). Cones are sold with a topping of raspberry (R) or mint sauce (M). Cones are also available without a topping (W). Make a list of all possible types of ice cream cone.

3. Three girls, Ann, Mary and Linda, decide to have a swimming competition. They will do the crawl and then the backstroke.
If Ann wins the crawl and Mary wins the backstroke, record this outcome as AM.
(a) In a similar way, make a list of all the 9 possible outcomes.
(b) If only Ann and Mary take part in the competition there will be fewer possible outcomes. List the outcomes in this case.
(c) If Judith also takes part in the competition, list all the possible outcomes for the four competitors.

4. Packets of crisps contain free football team pictures. There are five different pictures: Arsenal (A), Manchester United (M), Liverpool (L), Blackburn Rovers (B) and Newcastle (N). A mother buys two packets of crisps for her children. List all the possible combinations of the cards when the packets are opened.

5. At a school Prizegiving three different sorts of prizes are given out. One is a Book Token (B), one is a CD voucher (C) and the other a general gift voucher (G). List all the possible outcomes for a girl who wins (a) two prizes (b) three prizes.

6. A fair die is rolled twice. List the sample space for this situation. Find the probability of obtaining:
(a) a total of 6 from the two throws
(b) a total of 10 from the two throws
(c) a total between 7 and 11, inclusive, from the two throws
(d) a number on the second throw which is three times the number on the first throw
(e) a number on the first throw which is double the number on the second throw.

7. A (regular) pentagonal spinner, numbered 1 to 5, is spun and a fair die is thrown at the same time. List the sample space for this situation. Find the probability of obtaining:
(a) a total of 4
(b) a total less than 7
(c) a total greater than 10
(d) the same number on the die and the spinner
(e) a win, if a win occurs when the number on the spinner is greater than or equal to the number on the die.

8. Three unbiased coins are tossed at the same time. List the sample space for this situation.
Find the probability of obtaining: (a) three heads (b) no heads (c) one head and two tails (d) at least one tail.

9. Four unbiased coins are tossed at the same time. List the sample space for this situation. Find the probability of obtaining: (a) two heads and two tails (b) four heads (c) at least one head (d) one head and three tails.

10. (a) How many members of the sample space are there when five unbiased coins are tossed together?
(b) What is the probability of tossing five tails?

Exercise 3 - Mutually Exclusive & Exhaustive Events

1. When a card is selected from a pack of cards, the following outcomes can be used to describe the result.

    P : A red card is obtained
    Q : A black card is obtained
    R : A diamond is obtained
    S : A club is obtained

(a) Write down pairs of outcomes which are mutually exclusive.
(b) Write down pairs of outcomes which are not mutually exclusive.

2. Which of the following pairs of events are mutually exclusive and/or exhaustive:
(a) X : obtaining an even number on the roll of a die
    Y : obtaining an odd number on the roll of a die
(b) X : selecting a heart from a pack of playing cards
    Y : selecting a queen from a pack of playing cards
(c) X : winning a football match
    Y : losing a football match
(d) X : a dog has two pups - both are black
    Y : a dog has two pups - one is female
(e) X : it rains tomorrow
    Y : it is sunny tomorrow

3. When Annie, Fatima and Kate play a computer game, the probability that Annie wins is 1/2 and the probability that Fatima wins is 3/8. What is the probability that they lose?

4. When Motherwell football team play in a league match the probability that they win is 0.3 and the probability that they draw is 0.4. What is the probability that they lose?

5. [Figure: Venn diagram of the sample space S showing the events A, B and C]

The sample space S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, A = {2, 4, 6, 8}, B = {6, 7, 8} and C = {1, 2}.
(a) Find P(A), P(B) and P(C).
(b) Find P(Ā), P(B̄) and P(C̄), the probabilities of the complementary events.
(c) Find P(A ∪ B), P(A ∪ C) and P(B ∪ C).
(d) Which pair of events are mutually exclusive?
(e) Are events A, B and C exhaustive?

6. For events A and B, P(A) = 0.43, P(B) = 0.18 and P(A or B) = 0.5.
(a) Are events A and B mutually exclusive?
(b) Are events A and B exhaustive?

7. An integer is chosen at random from the set of integers from 1 to 25 inclusive. What is the probability that the integer is (a) at least 5 (b) a prime number greater than 11?

8. A large bag of crisps contains the following flavours: 10 cheese and onion, 6 salt and vinegar and 4 plain. A packet of crisps is selected from the bag at random. Find the probability that the packet of crisps is:
(a) cheese and onion
(b) not plain
(c) either cheese and onion or plain
(d) either salt and vinegar or plain
(e) neither cheese and onion nor salt and vinegar.

9. A box contains 100 balls of different colours. The probability of obtaining a ball of a particular colour is given in the table below.

    Colour         Black   White   Red    Blue
    Probability    0.33    0.41    0.19   0.07

What is the probability that a ball taken from the box is: (a) black or white (b) neither red nor blue?

10. A spinner with unequal sectors is numbered 1, 2, 3, 4, 5. The probability of obtaining each number is shown in the table below:

    Number         1      2      3      4      5
    Probability    0.32   0.05   0.2    0.15   0.28

(a) The spinner is spun once. What is the probability of getting
    i) either of the numbers 2 or 4
    ii) a number less than 4?
(b) The spinner is spun 200 times. Approximately how many times would you expect to get
    i) the number 3
    ii) a number greater than 3?

11. Two dice are thrown together. Find the probability of getting (a) a total of 10 or 11 (b) a total less than 6.

12. Two dice are thrown together and the difference between the two numbers is recorded. So if the dice show a '2' and a '6', the difference recorded would be 4.
Find the probability of obtaining a difference of 2 or 4.

13. In an opinion poll, people were asked to state the party for which they were going to vote. The probability that people vote Labour or Liberal Democrat is 0.6. The probability that people vote SNP is 0.3.
(a) What is the probability that people vote for another party?
(b) The probability of voting Labour is twice the probability of voting Liberal Democrat. What is the probability that people vote SNP or Liberal Democrat?

14. Every day Mrs Scott buys only one of the following newspapers: The Daily Herald, The Moon or The Daily Post. The probability of her buying The Daily Herald or The Moon is 5/6. The probability of her buying The Moon or The Daily Post is 1/2. Find the probability of her buying each newspaper.

Exercise 4 - Independent Events

1. For each pair of events X and Y listed below, decide whether or not it is likely that the events are independent.
(a) X : The sun is shining today.
    Y : The sun will be shining tomorrow.
(b) X : It rains on Saturday this week.
    Y : It rains on Saturday next week.
(c) A baby is born.
    X : The baby has blue eyes.
    Y : The baby's mother has blue eyes.
(d) A die is rolled and a coin is tossed.
    X : The die produces a 4.
    Y : The coin produces a tail.
(e) Alice and Karen are sisters.
    X : Alice catches the cold.
    Y : Karen catches the cold.
(f) X : David is good at Art.
    Y : David is good at Physics.

2. (a) For events A and B, P(A) = 0.3 and P(B) = 0.7. If A and B are independent find P(A ∩ B).
(b) For two events A and B, P(A) = 0.4 and P(A ∩ B) = 0.3. Given that A and B are independent, find P(B).

3. Two fair dice, each numbered 1 to 6, are rolled. Events A, B and C are defined as follows:

    A : The first die scores 1
    B : The second die has an even score
    C : Their total is 3

Which of the above events are independent?

4. A coin is tossed and a die is thrown.
A is the event that a tail is obtained on the coin and B is the event that an even number is thrown on the die. Write down the values of P(A), P(B) and P(A ∩ B).

5. A card is drawn from a pack of playing cards and a die is thrown. Events X and Y are as follows:

    X : an ace is drawn from the pack
    Y : a prime number is thrown on the die.

Write down the values of P(X), P(Y) and P(X ∩ Y).

6. A die is thrown twice. Find the probability that (a) two even numbers are obtained (b) the same two numbers are obtained.

7. A pentagonal spinner has five equal sections; three sections are coloured white and two sections are coloured black. B is the event that the spinner points to Black and W is the event that the spinner points to White.
(a) Find the following probabilities: i) P(B) ii) P(W)
(b) The spinner is spun twice. Using your answers to (a) find the probabilities of the following outcomes:
    i) Black is obtained both times.
    ii) A different colour is obtained on each spin.
    iii) The same colour is obtained on each spin.

8. A box contains 8 blue cubes and 2 red cubes. A cube is taken out and replaced. B is the event that a Blue cube is selected. R is the event that a Red cube is selected.
(a) Find the following probabilities: i) P(B) ii) P(R)
(b) If two cubes are taken in turn, use your answers to (a) to find the probabilities of the following outcomes:
    i) they are both blue
    ii) they are different colours
    iii) they are the same colour.

9. In a school 5% of the children have red hair and 25% wear glasses. If a child is selected at random, what are the probabilities that they:
(a) have red hair and wear glasses
(b) have red hair but do not wear glasses.

10. Five questions in an examination are multiple choice in format with five possible answers for each question. What is the probability of guessing the correct answers to all five questions?
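The multiplication rule for independent events, P(A ∩ B) = P(A) × P(B), can be checked by counting outcomes in a sample space. The sketch below is illustrative only: the two events (a head on the coin, a score greater than 4 on the die) and the helper names are hypothetical and are not taken from any question above.

```python
from fractions import Fraction
from itertools import product

# Sample space for one coin toss and one die roll: 12 equally likely outcomes.
space = list(product("HT", range(1, 7)))

def prob(event):
    """Probability of an event (a true/false test on an outcome) by counting."""
    favourable = sum(1 for outcome in space if event(outcome))
    return Fraction(favourable, len(space))

def is_head(outcome):
    return outcome[0] == "H"

def over_four(outcome):
    return outcome[1] > 4

p_a = prob(is_head)
p_b = prob(over_four)
p_both = prob(lambda o: is_head(o) and over_four(o))

# For independent events the product of the two probabilities equals P(A and B).
print(p_a, p_b, p_both, p_both == p_a * p_b)
```

Counting in this way also shows when two events are not independent: the final comparison is then False.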

Exercise 5 - Tree Diagrams (With Replacement)

1. Ben and Julie are playing a game. Before they can start they must select a red card from a standard pack of playing cards. After a card is selected it is returned to the pack and the pack shuffled before the next player selects their card. Ben starts first.
(a) Copy and complete the tree diagram by adding the appropriate probabilities to each branch.
(b) Calculate the probability of each outcome shown on the tree diagram.

    Ben's turn    Julie's turn    Outcome
    Red           Red             Red and Red
                  Black           Red and Black
    Black         Red             Black and Red
                  Black           Black and Black

(c) Find the probability that:
    i) both children start the game on their first selection
    ii) only one of them starts the game on their first selection
    iii) neither of them starts the game on their first selection.

2. A bag contains 10 marbles, 7 are red and 3 blue. A marble is selected, and then replaced. A second marble is then selected.
(a) Copy and complete the tree diagram by adding the appropriate probabilities to each branch.
(b) Calculate the probability of each outcome shown on the tree diagram.

    First marble    Second marble    Outcome
    Red             Red              Red and Red
                    Blue             Red and Blue
    Blue            Red              Blue and Red
                    Blue             Blue and Blue

(c) Find the probability of the following: i) both marbles are blue ii) both marbles are red.

3. To pass your driving test you must pass a theory test and a practical driving test. The probability of passing the theory test is 0.85 and the probability of passing the practical test is 0.65.
(a) Copy and complete the tree diagram below.

    Theory    Practical    Outcome
    Pass      Pass         Pass and Pass
              Fail         Pass and Fail
    Fail      Pass         Fail and Pass
              Fail         Fail and Fail

(b) What is the probability that someone:
    i) passes both tests
    ii) fails both tests
    iii) passes only one of the tests?

4. A bag contains 8 red balls and 4 green balls. A ball is drawn and then replaced before a second ball is drawn.
Draw a tree diagram to show all the possible outcomes. Find the probability that:
(a) two green balls are drawn
(b) the first ball is red and the second is green.

5. Draw a tree diagram to show the possible outcomes when two coins are tossed. Include the probabilities on your tree diagram. Find the probability of obtaining: (a) two heads (b) no heads (c) only one head.

6. The probability of a footballer being injured before a match is 0.2. Brian has two important games in one week. Find the probability that he is able to play (a) in both games (b) in only one game (c) in neither of the games.

7. Tariq and Ahmed play two sets of tennis together. The probability that Tariq wins a set is 0.55. Find the probability that: (a) Tariq wins two sets (b) Ahmed wins two sets (c) they win one set each.

8. (a) Draw a tree diagram to show the possible outcomes when a coin is tossed three times.
(b) Find the probability of obtaining: i) 3 tails ii) at least 2 heads iii) exactly one head.

9. A fair die is thrown three times. What is the probability of throwing: (a) 3 sixes (b) exactly two sixes (c) at least two sixes?

10. When a seed is planted the probability that it will grow is 0.6. Three seeds are planted. Find the probability that (a) all three grow (b) one of them grows.

11. A bag contains 7 black and 3 white marbles. A marble is drawn at random and then replaced. Two further draws are made, again with replacement. Find the probability of drawing:
(a) three black marbles
(b) two white marbles
(c) at least two black marbles
(d) at least one white marble.

12. A card is drawn at random from a pack of 52 playing cards. The card is replaced and a second card is drawn. This card is replaced and a third card is drawn. What is the probability of drawing: (a) three spades (b) at least two spades (c) exactly one spade?
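A tree diagram with replacement can be mirrored in code: the probability of each path is the product of the probabilities along its branches, and the probability of an event is the sum over the paths that make it up. The sketch below is a minimal illustration under stated assumptions: the bag of 3 red and 2 blue discs is hypothetical (it does not appear in the exercise) and `tree_outcomes` is an illustrative helper name.

```python
from fractions import Fraction
from itertools import product

def tree_outcomes(branch_probs, stages):
    """Map each path through the tree to its probability.

    With replacement the branch probabilities are the same at every stage,
    so the probability of a path is the product along its branches.
    """
    outcomes = {}
    for path in product(branch_probs, repeat=stages):
        p = Fraction(1)
        for label in path:
            p *= branch_probs[label]
        outcomes[path] = p
    return outcomes

# Hypothetical bag: 3 red and 2 blue discs, two draws with replacement.
bag = {"R": Fraction(3, 5), "B": Fraction(2, 5)}
tree = tree_outcomes(bag, 2)
for path, p in tree.items():
    print(" and ".join(path), p)

# An event made of several paths: add the path probabilities.
same_colour = tree[("R", "R")] + tree[("B", "B")]
print("same colour:", same_colour)
```

The path probabilities always add up to 1, which is a useful check on a completed tree diagram. Changing `stages` to 3 gives the eight paths of a three-stage tree.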

Exercise 6 - Tree Diagrams (Without Replacement)

1. A bag contains 4 red discs and 4 green discs. Two discs are taken out of the bag. Draw a tree diagram to illustrate the probabilities.
(a) Calculate the probability that both discs are green.
(b) Calculate the probability that the discs are different colours.

2. Bill's hobby is walking. If it is a sunny day, the probability that he goes for a walk is 0.95. If the day is not sunny, the probability that he goes for a walk is 0.75. The probability that tomorrow will be a sunny day is 0.7.
(a) Draw a tree diagram to illustrate these probabilities.
(b) Calculate the probability that Bill will go for a walk tomorrow.

3. Jack and Arnold play three rounds of golf. The probability that Jack wins the first round is 0.5. If Jack wins a round, the probability of his winning the next round is 0.8. If Arnold wins a round, the probability of his winning the next round is 0.7.
(a) Draw a tree diagram to illustrate the above probabilities.
(b) Calculate the probability that Arnold wins all three rounds.
(c) Calculate the probability that Jack wins fewer than two rounds.

4. The probability that Lee will pass his driving test at the first attempt is 0.45. If he fails his first test then the probability that he will pass the test on any subsequent attempt is 0.8.
(a) Draw a tree diagram to illustrate these probabilities.
(b) Calculate the probability that Lee passes on his second attempt.
(c) Calculate the probability that he passes the test after three attempts.

5. A card is drawn at random from a standard pack of 52 playing cards. It is not replaced. A second card is then drawn from the same pack. Find the probability that (a) both cards are hearts (b) only one card is a spade (c) both cards are aces (d) neither card is a jack.

6. The probability that a tennis player wins her next match is 0.3 if she won her previous match, but 0.8 if she lost her previous match.
Find the probability that, if a match is lost, the next two will be won.

7. A bag contains 10 pairs of socks - 5 blue pairs and 5 green pairs. Two socks are taken from the bag at random. What is the probability that a colour match is obtained?

8. A box contains 4 blue marbles, 3 green marbles and 3 yellow marbles. Two marbles are chosen at random from the box. What is the probability that: (a) both marbles are green (b) both marbles are the same colour (c) both marbles are different colours?

9. There are 5 boys and 10 girls in a Higher Statistics class. Two pupils are chosen at random. What is the probability that: (a) both are boys (b) both are girls (c) one is a boy and one is a girl?

10. In a box of 200 electrical components 5 are known to be defective. Two components are chosen at random. What is the probability that: (a) both are defective (b) neither is defective (c) just one is defective?

11. As part of a card trick a magician asks a member of his audience to select two cards at random from a pack of 32 cards. His pack consists of 8 black cards, 8 white cards, 8 blue cards and 8 green cards. Find the probability that the audience member selects 2 cards of the same colour.

12. In a batch of 500 packets of cereal 50 are known to be underweight. Three packets of cereal are chosen at random. What is the probability that: (a) all three are underweight (b) none are underweight?

13. A box of chocolates contains 8 soft centres, 5 hard centres and 7 nutty centres. 3 chocolates are chosen at random. Find the probability that:
(a) no hard centres were chosen
(b) the 3 chocolates had the same centres
(c) one of each kind was chosen.

14. A bag contains 3 black cubes, 4 yellow cubes and 3 white cubes. Three cubes are chosen without replacement.
Find the probability that the three cubes chosen are: (a) all yellow (b) all white (c) one of each colour.
If this selection process were repeated 6000 times, how often would you expect to choose three black cubes?

15. There are 6 boys and 9 girls in a class. Three children are chosen at random. What is the probability that: (a) all three are boys (b) all three are girls (c) one is a girl and two are boys?

16. On a snooker table pocket X contains 3 colours and 6 reds. Pocket Y contains 2 colours and 4 reds. A ball is taken at random from pocket X and placed in pocket Y. A ball is then chosen from pocket Y. What is the probability that the ball taken from Y is red?

17. A student has only managed to revise 50% of the topics for a multiple choice exam. If a question appears on a topic she has revised she will get that question right. If she has not revised the topic then she will make a guess at one of the five possible answers. The paper has 40 questions. What mark do you expect her to get?

18. A box of assorted chocolates contains p dark chocolates and q white chocolates. Two chocolates are selected at random. Find, in terms of p and q, the probability of choosing: (a) two dark chocolates (b) two white chocolates (c) one of each sort.

Exercise 7 - Combinations

1. Evaluate (a) ⁷C₃ (b) ⁶C₄ (c) ⁸C₁ (d) ⁵C₀

2. Verify that $\binom{n}{r} = \binom{n}{n-r}$ for the cases (a) n = 9, r = 3 (b) n = 7, r = 4.

3. A shop stocks 8 different kinds of cereal. In how many ways can 3 packets of cereal, each of a different variety, be chosen?

4. How many different combinations of five letters can be chosen from the letters A, B, C, D, E, F, G, H if each letter is chosen only once?

5.
In how many ways can
(a) 3 stamps be chosen from a book of 10 different stamps
(b) a team of 14 players be selected from a pool of 17 footballers
(c) 6 representatives be chosen from 30 students
(d) a hand of 5 cards be dealt from a standard pack of 52 cards?

6. Find the number of different combinations of two letters which can be made from the letters of the word INTEGRAL. How many of these selections do not contain a vowel?

7. How many different hands of seven cards can be dealt from a suit of thirteen cards? If one of the cards dealt is the ace, how many different hands of seven cards are there?

8. A team of five children is to be selected from a class of thirty children to compete in an inter-school quiz competition. In how many ways can the team be chosen if
(a) any five children can be chosen
(b) the five chosen must include the oldest in the class?

9. A debating team of 4 players is to be selected from 12 pupils. In how many ways can the team be chosen if
(a) the best debater is to be included
(b) the best debater and the oldest pupil are to be included?

10. A shop stocks eight different kinds of chocolate biscuits. In how many ways can a shopper buy four packets of chocolate biscuits if
(a) each packet is a different kind
(b) two packets are the same kind?

11. A large box of chocolates contains nine different varieties. In how many ways can four chocolates be chosen if
(a) all four are different varieties
(b) two are the same and the others different
(c) three are the same and the fourth is different?

12. A committee of 8 is to be formed from 12 men and 8 women. In how many ways can the committee be selected given that
(a) it must consist of 5 men and 3 women
(b) it must have at least one member of each sex?

Exercise 8 - Combinations (Probability)

1. There are only 3 girls in a group of 8 pupils. A group of 5 pupils is to be selected.
Find the probability that all three girls in the group are selected.

2. A bag contains four black discs and one white disc. If two discs are removed at random, what is the probability that the white disc is not removed?

3. An exam consists of selecting 4 questions from a choice of 8 questions. The questions are numbered 1, 2, 3, 4, 5, 6, 7 and 8. Assuming the questions are selected at random, find the probability that a pupil's selection will include two even numbered questions.

4. A box contains 10 cubes of which 4 are green and 6 are yellow. If 4 cubes are selected at random, find the probability that 2 green and 2 yellow cubes are selected.

5. Four letters are chosen at random from the word COMPLEX. Find the probability that both vowels are in the group chosen.

6. A random sample of five children is chosen from a class of 8 girls and 12 boys. What is the probability that the sample contains (a) all boys (b) at least one girl?

7. A bag contains 8 red sweets, 5 yellow sweets and 3 green sweets. Two sweets are selected. What is the probability that two sweets chosen at random are both red?

8. Four cards are chosen at random from an ordinary pack of 52 playing cards. What is the probability that the four cards (a) are all black (b) are all kings (c) contain at least one king?

9. There are 5 green, 4 yellow and 3 blue discs in a bag from which 4 discs are chosen at random. Find the probability that the 4 discs selected will contain (a) exactly 3 blue discs (b) exactly 3 yellow discs (c) at least one green disc.

10. From a well shuffled pack of 52 cards a hand of 7 cards is dealt. Find the probability that the hand will contain (a) 4 aces (b) exactly 3 aces (c) at least 3 aces.

11. A hand of 6 cards is dealt from a shuffled pack of 52 cards. Find the probability that the hand will contain (a) all black cards (b) exactly 5 black cards (c) at least 5 black cards.

12.
Three cards are dealt from a well shuffled pack of eight cards. The cards are numbered 1, 2, 3, 4, 5, 6, 7, 8. Find the probability that
(a) the three cards are all even
(b) the product of the numbers drawn is odd.

Exercise 9 - Simulation

In questions 1 - 4 use the list of random numbers below.

37057 33724 43737 16929 10131 83986 28633 15929 84478 98571 98419 85953 19659 31341 20877 76401 82213 52804 60265 34585 15412 07827 72335 19404 22353 68418 48740 25208 27881 54505

1. Simulate the results of tossing a coin 20 times. Start at the eleventh number on the first row and work to the right.

2. Simulate the results of rolling an unbiased die 10 times. Start at the first number on the third row and work to the right.

3. Simulate the selection of six numbers, from the numbers 1 - 49, for the national lottery. Start at the sixteenth number on the first row and work to the right.

4. In a school there are 78 pupils in S5. Simulate the selection of 6 S5 pupils. Start at the first number on the second row and work to the right.

In questions 5 - 8 use the calculator generated list of random numbers below.

    0.925 0.312 0.240 0.017 0.118 0.930 0.622 0.817 0.617 0.334
    0.043 0.086 0.853 0.012 0.451 0.674 0.881 0.982 0.807 0.455
    0.114 0.997 0.374 0.696 0.989 0.798 0.124 0.492 0.773 0.805
    0.670 0.198 0.597 0.701 0.700 0.552 0.450 0.404 0.464 0.868
    0.985 0.398 0.606 0.882 0.544 0.338 0.467 0.229 0.925 0.257
    0.633 0.117 0.077 0.371 0.638 0.219 0.286 0.628 0.624 0.717

5. In a card game, a hand of 7 cards is dealt from a standard pack of 52 cards. Simulate the 7 cards dealt from the pack.

6. Simulate the selection of 10 dates from any non-leap year.

7. A driver, approaching a roundabout from the North, is equally likely to go South, East or West. Simulate the directions taken by 10 drivers.

8. Simulate the results of rolling a biased die where P(3) = 0.5.
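A simulation of this kind needs a stated rule for turning random numbers into outcomes. The sketch below shows one possible set of rules, all of which are assumptions rather than the course's prescribed method: digits 0-4 count as heads and 5-9 as tails, die scores use only the digits 1-6 and skip the rest, and a calculator number r between 0 and 1 becomes a die score by taking the whole-number part of 6r and adding 1. The function names are hypothetical, and the sketch starts at the beginning of each list rather than at the positions the questions specify.

```python
def coin_tosses(digits):
    """Map each random digit to a toss: 0-4 -> 'H', 5-9 -> 'T' (assumed rule)."""
    return ["H" if int(d) < 5 else "T" for d in digits if d.isdigit()]

def die_rolls(digits, n):
    """Take the first n digits in the range 1-6 as die scores, skipping 0, 7, 8, 9."""
    rolls = [int(d) for d in digits if d in "123456"]
    return rolls[:n]

def die_from_uniform(r):
    """Convert a calculator random number 0 <= r < 1 to a score from 1 to 6."""
    return int(6 * r) + 1

digits = "37057 33724"   # first two groups of the random number list
print(coin_tosses(digits))
print(die_rolls(digits, 4))
print([die_from_uniform(r) for r in (0.925, 0.312, 0.240, 0.017)])
```

Whatever rule is chosen, it should give each outcome the required probability and should be written down before the numbers are read, so that the simulation can be repeated.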

Exercise 10 - Discrete Probability Distributions

1. Which of the following could describe discrete probability distributions? Find the value of k when a probability distribution is defined.

(a) u            0      1      2      3      4
    P(U = u)     1/3    1/6    k      1/6    1/4

(b) v            -1     0      1
    P(V = v)     0.35   k      0.55

(c) w            2      3      4      5
    P(W = w)     k      2/3    1/4    1/6

2. A discrete random variable X has probability distribution:

    x            1      2      3      4      5
    P(X = x)     k      2k     3k     4k     5k

Find (a) the value of constant k (b) P(X ≤ 4)

3. A discrete random variable Y has probability distribution:

    y            1      2      3      4      5
    P(Y = y)     0.22   0.35   k      0.07   0.29

Find (a) the value of constant k (b) P(3 ≤ Y ≤ 5) (c) P(Y ≥ 2)

4. A discrete random variable S has probability distribution:

    s            1      2      3      4
    P(S = s)     1/3    1/4    k      1/6

Find (a) the value of constant k (b) P(2 ≤ S < 4) (c) P(S < 3)

5. A discrete random variable T has probability distribution:

    t            0      1      2      3
    P(T = t)     k/6    k/4    k/3    k/2

Find (a) the value of constant k (b) P(T ≥ 2) (c) P(0 < T < 4).

6. The probability function of a discrete random variable is given by:
    P(X = x) = kx , x = 1, 2, 3, 4.
(a) Tabulate the probability distribution of X and find the value of the constant k.
(b) Find P(X < 3).

7. The probability function of a discrete random variable is given by:
    P(Y = y) = (1/5)ky , y = 1, 2, 3, 4, 5.
(a) Tabulate the probability distribution of Y and find the value of the constant k.
(b) Find P(2 < Y ≤ 5).

8. The discrete random variable S has the probability function given by:
    P(S = s) = k(7 - s) , s = 0, 1, 2, 3, 4.
Find (a) the value of the constant k (b) P(1 < S ≤ 4).

9. The random variable X has the following probability distribution:

    x            2      6      10
    P(X = x)     p      0.25   q

where p and q are constants. Given that P(X < 5) = P(X > 5) and P(X ≤ 6) = 3P(X > 6) find the values of p and q.

10.
Find the probability distribution for each of the following random variables:
(a) H, the number of heads obtained when two fair coins are tossed.
(b) S, the number of sixes obtained when two normal dice are rolled.
(c) 30% of a population have blue eyes. Two people are selected at random. Find the probability distribution of B, the number of people with blue eyes.
(d) T, the sum of the scores when two normal dice are rolled.
(e) D, the difference of the scores when two normal dice are rolled.
(f) G, the number of girls in a family of three children.

11. In a game, a fair die is rolled and a fair coin is tossed. If a head occurs then S is the score on the die minus one. If a tail occurs then S is twice the score on the die. Find the probability distribution of S.

12. Two tetrahedral dice each numbered 1, 2, 3, 4 are rolled together. Let S = the sum of the two scores and let D = the difference between the two scores.
(a) Show that P(S = 7) = 1/8.
(b) Find the probability distribution of the random variable S.
(c) Show that P(D = 0) = 1/4.
(d) Find the probability distribution of the random variable D.

13. In a gambling game using a normal pack of playing cards, a heart wins 40p, a diamond wins 20p, a spade loses 10p and a club loses 40p. Two playing cards are selected at random, with replacement. The random variable X represents the profit, in pence, made after each selection.
(a) Show that (i) P(X = 10) = 1/8 (ii) P(X = -50) = 1/8.
(b) Find the probability distribution of X.

14. A fair die is rolled repeatedly until a six appears or 3 rolls of the die have been made. The random variable R represents the number of rolls of the die.
(a) Show that P(R = 2) = 5/36.
(b) Find the probability distribution of R.
(c) The random variable S represents the number of sixes. Find the probability distribution of S.

15. A box contains three blue and two red pens. Three pens are taken at random from the box. The random variable R is the number of red pens obtained.
Find the probability distribution of R.

16. Three committee members are to be selected from 5 men and 4 women. The random variable M is the number of men appointed to the committee, assuming the selection is done at random. Find the probability distribution of M.

Exercise 11 - Discrete Probability Distributions (Expectation and Variance)

1. Find the expected value of X for each of the following probability distributions.

(a)  x          1     2     3     4
     P(X = x)   0.2   0.4   0.3   0.1

(b)  x          1      2     3     4
     P(X = x)   1/12   1/6   1/3   5/12

(c)  x          -2     -1     0      1     2
     P(X = x)   0.15   0.05   0.27   0.3   0.23

2. Y is a random variable with probability distribution given in the table below.

     y          2     3     5     p     10
     P(Y = y)   0.1   0.4   0.2   0.1   0.2

The expected value of Y is 5.2. Find p.

3. Z is a random variable with probability distribution given in the table below.

     z          1      2   3     4   5
     P(Z = z)   0.15   x   0.1   y   0.25

The expected value of Z is 2.9. Find x and y.

4. The probability function of a discrete random variable T is given by P(T = t) = 2kt, t = 1, 2, 3, 4, where k is a constant. Find k, then E(T).

5. A man buys 10 tickets from a total of 500 tickets in a raffle where there is only one prize of £40. The price of a ticket is 20p. If all the tickets are sold, calculate his expected loss.

6. The School Fair runs a stall offering a £25 prize for a £2 stake to anyone who can roll a total of eleven or more on two dice. Calculate the expected gain or loss made by the school if 240 people take part.

7. In a multiple-choice examination, a candidate is awarded three marks for a correct answer but loses one mark for an incorrect answer. Each question has 5 alternative answers. Assuming a candidate selects answers at random, find the expected marks gained or lost per question.

8. An unbiased tetrahedral die has four faces labelled 1, 2, 3, 4. If the die lands on the face marked 1, the player has to pay 40p.
If it lands on a face marked 2 or 3 the player wins 20p, and if it lands on the face labelled 4 then the player wins 10p.
(a) What is the expected profit or loss of the player on each roll of the die?
(b) A fair game is one where the expected profit is zero on each roll of the die. To ensure that the above game is fair, what should be the stake for each roll of the die?

9. Two cards are selected at random from a normal set of 52 playing cards (the first card being replaced before the second card is selected). £1 is paid for selecting a heart, 50p for selecting a diamond and nothing for selecting a club or a spade. The entrance stake is also lost regardless of a win or loss. If the game is to be fair, how much should the entrance stake be?

10. Find the variance of X for each of the following probability distributions:

(a)  x          0      1     2      3
     P(X = x)   0.15   0.3   0.35   0.2

(b)  x          -2    -1    0     1     2
     P(X = x)   0.3   0.3   0.2   0.1   0.1

(c)  x          1     2     3
     P(X = x)   1/3   1/2   1/6

(d)  x          -10   0      10    20
     P(X = x)   1/5   3/10   2/5   1/10

11. Y is a random variable with probability distribution given in the table below.

     y          1     a
     P(Y = y)   1/4   3/4

The variance of Y is 6.75. Find a, given that it is positive.

12. The random variable Z has the following probability distribution:

     z          3   4   5
     P(Z = z)   b   a   b

where a and b are constants.
(a) Write down E(Z).
(b) Given that Var(Z) = 0.8, find the values of a and b.

13. Birds of a particular species lay either 0, 1, 2, or 3 eggs in their nests. The random variable N, the number of eggs laid, has the following probability distribution:

     n          0      1      2     3
     P(N = n)   0.25   0.35   0.3   0.1

Calculate the expectation and variance of N.

14. The number, X, of people queuing at a bus stop has the following probability function:
P(X = x) = k(7 - x)(x + 1), x = 1, 2, 3, 4, 5, 6, where k is a constant.
(a) Find k.
(b) Find the expected value and variance of X.

15. X represents the score when a single unbiased cubical die is rolled. Find E(X) and Var(X).
16. A fair coin is tossed twice and the random variable T represents the number of tails recorded. Calculate E(T) and Var(T).

17. Two fair cubical dice are rolled. The random variable S is the sum of their scores and the random variable D is the difference between their scores.
(a) Calculate E(S) and Var(S).
(b) Calculate E(D) and Var(D).

18. A fair cubical die is rolled repeatedly until a six appears or three rolls of the die have been made. The random variable R represents the number of rolls of the die. Calculate E(R) and Var(R).

19. A debating team of two has to be chosen from two boys and three girls. The number of boys in the team is the random variable N. Find E(N) and Var(N).

20. Two counters are drawn without replacement from a box containing three blue and five red counters. The random variable R represents the number of red counters selected. Find E(R) and Var(R).

21. A committee of three has to be selected from three men and four women. The number of women on the committee is the random variable W. Find E(W) and Var(W).

Exercise 12 - Discrete Probability Distributions (Simulation)

In the questions below use the following calculator generated list of random numbers.

0.921 0.836 0.255 0.726 0.247 0.101 0.731 0.222 0.594 0.820
0.934 0.492 0.095 0.402 0.646 0.352 0.815 0.729 0.020 0.389
0.367 0.233 0.187 0.235 0.784 0.451 0.331 0.718 0.942 0.730

1. For each of the following probability distributions simulate 10 observations.

(a)  x          0     1     2     3
     P(X = x)   0.3   0.4   0.2   0.1

(b)  x          1     2     3     4     5
     P(X = x)   0.2   0.2   0.4   0.1   0.1

(c)  x          1     2     3     4
     P(X = x)   1/5   1/4   1/4   3/10

(d)  x          3      4      5      6
     P(X = x)   1/10   7/20   3/10   1/4

2. The number of days taken by a builder to complete the construction of a garage is represented by the discrete random variable N.
N has the following probability distribution:

     n          5     6     7     8     9
     P(N = n)   0.2   0.2   0.4   0.1   0.1

Simulate the times taken to complete 10 such constructions.

3. An Advanced Higher Maths class has 5 students. The number of students who attend class on a Friday is represented by the discrete random variable S. S has the following probability distribution:

     s          1      2      3      4      5
     P(S = s)   0.05   0.12   0.31   0.45   0.07

Simulate the attendance at the class on 10 such Fridays.

4. The number of consecutive days spent in a hotel by business executives is represented by the discrete random variable D. D has the following probability distribution:

     d          1       2     3     4      5      6+
     P(D = d)   21/50   1/4   1/5   2/25   1/20   0

Simulate the length of stay at the hotel by 10 business executives.

Exercise 13 - Continuous Probability Distributions

1. (a) Sketch the graph of f(x) and verify that it is a probability density function.
f(x) = x/2 for 0 ≤ x ≤ 2, f(x) = 0 elsewhere
(b) Find P(X > 1).

2. (a) Sketch the graph of f(x) and verify that it is a probability density function.
f(x) = (1/5)(4x + 3) for 0 ≤ x ≤ 1, f(x) = 0 elsewhere
(b) Find P(0 ≤ X ≤ 0.75).

3. (a) Sketch the graph of f(x) and verify that it is a probability density function.
f(x) = (3/4)x(2 - x) for 0 ≤ x ≤ 2, f(x) = 0 elsewhere
(b) Find P(1 < X < 2).

4. The continuous random variable X has probability density function given by:
f(x) = kx² for -2 ≤ x ≤ 2, f(x) = 0 elsewhere
(a) Find k and sketch the graph of f(x).
(b) Calculate P(-1 < X < 0).

5. The continuous random variable X has probability density function given by:
f(x) = (1/3)x for 1 ≤ x ≤ k, f(x) = 0 elsewhere
(a) Find k and sketch the graph of f(x).
(b) Calculate i) P(3/2 < X < 2) ii) P(2 < X < 3).

6. The lifetime, X years, of a light bulb has a continuous probability distribution with the following probability density function:
f(x) = kx(4 - x) for 0 ≤ x ≤ 4, f(x) = 0 elsewhere
(a) Find the value of the constant k.
(b) Find the probability that the light bulb will last for less than one year.

7. The length, X metres, of a certain species of snake has a continuous probability distribution with the following probability density function:
f(x) = kx²(3 - x) for 0 ≤ x ≤ 3, f(x) = 0 elsewhere
(a) Find the value of the constant k.
(b) i) Find the proportion of snakes under 2 metres in length.
ii) Find the proportion of snakes between 1 metre and 3 metres in length.

8. When a boy throws a discus, the distance, X metres, it travels has a continuous probability distribution with probability density function given by:
f(x) = k(900 - x²) for 0 ≤ x ≤ 30, f(x) = 0 elsewhere
(a) Show that k = 1/18000 and sketch the graph of f(x).
(b) Find the probability that he throws the discus further than twenty metres.

9. At a garage the volume of weekly sales, X, in thousands of gallons, has a continuous probability distribution with probability density function given by:
f(x) = kx(2 - x)² for 0 ≤ x ≤ 2, f(x) = 0 elsewhere
(a) Find the value of the constant k.
(b) Find the probability that less than 1500 gallons are sold.

10. For each of the following random variables, sketch the probability density function and calculate the modal value of X.
(a) f(x) = 1/3 - x/18 for 0 ≤ x ≤ 6, f(x) = 0 elsewhere
(b) f(x) = (2/9)x(3 - x) for 0 ≤ x ≤ 3, f(x) = 0 elsewhere
(c) f(x) = 12x²(1 - x) for 0 ≤ x ≤ 1, f(x) = 0 elsewhere
(d) f(x) = 0.0064x³(5 - x) for 0 ≤ x ≤ 5, f(x) = 0 elsewhere

Exercise 14 - Continuous Probability Distributions (Expectation and Variance)

1. A continuous random variable X has probability density function given by:
f(x) = kx² for 0 ≤ x ≤ 3, f(x) = 0 elsewhere
where k is a positive constant. Find the values of (a) k (b) E(X) (c) Var(X).

2. A continuous random variable X has probability density function given by:
f(x) = (2/3)(x + 1) for 0 ≤ x ≤ 1, f(x) = 0 elsewhere
Calculate (a) E(X) (b) Var(X).

3.
A continuous random variable X has probability density function given by:
f(x) = (3/4)(1 - x²) for -1 ≤ x ≤ 1, f(x) = 0 elsewhere
Calculate (a) E(X) (b) Var(X).

4. A continuous random variable X has probability density function given by:
f(x) = 1 for -1/2 ≤ x ≤ 1/2, f(x) = 0 elsewhere
Calculate (a) E(X) (b) Var(X).

5. A continuous random variable X has probability density function given by:
f(x) = kx³ for 2 ≤ x ≤ 3, f(x) = 0 elsewhere
where k is a positive constant. Find the values of (a) k (b) E(X) (c) Var(X).

6. A continuous random variable X has probability density function given by:
f(x) = k/x⁴ for 1 ≤ x ≤ 2, f(x) = 0 elsewhere
where k is a positive constant. Find the values of (a) k (b) E(X) (c) Var(X).

7. The incubation period, X days, for a particular disease is a continuous random variable with probability density function given by:
f(x) = k(25 - x²) for 0 ≤ x ≤ 5, f(x) = 0 elsewhere
where k is a positive constant.
(a) Show that k = 3/250.
(c) Calculate
i) the expected incubation time
ii) the probability that a particular individual will catch the disease during the third day.

8. The lifetime of an electrical component is X years, where X is a continuous random variable with probability density function given by:
f(x) = kx²(6 - x) for 0 ≤ x ≤ 6, f(x) = 0 elsewhere
where k is a positive constant. Find the values of (a) k (b) the expected lifetime µ (c) P(X < µ).

9. In a Greek holiday resort the number, X hours, of sunshine per day from 7.00 a.m. to 7.00 p.m. is a continuous random variable with probability density function given by:
f(x) = (1/300)[(x - 3)² + k] for 0 ≤ x ≤ 12, f(x) = 0 elsewhere
where k is a positive constant. Calculate (a) k (b) E(X) (c) Var(X) (d) the probability that, on any randomly chosen day, there will be more than ten hours of sunshine.

Exercise 15 - Cumulative Distribution Function

1.
For each of the following, find the cumulative distribution function, F(x), of the random variable X with probability density function given by:
(a) f(x) = 1 for 2 ≤ x ≤ 3, f(x) = 0 elsewhere
(b) f(x) = (1/2)(2 - x) for 0 ≤ x ≤ 2, f(x) = 0 elsewhere
(c) f(x) = (3/2)x² for -1 ≤ x ≤ 1, f(x) = 0 elsewhere
(d) f(x) = 6x - 1 - 3x² for 0 ≤ x ≤ 1, f(x) = 0 elsewhere
(e) f(x) = (3/4)x(2 - x) for 0 ≤ x ≤ 2, f(x) = 0 elsewhere
(f) f(x) = 20x³(1 - x) for 0 ≤ x ≤ 1, f(x) = 0 elsewhere

2. For each of the following, calculate the median of the random variable X with probability density function given by:
(a) f(x) = (1/72)x for 0 ≤ x ≤ 12, f(x) = 0 elsewhere
(b) f(x) = x/8 + 1/4 for 1 ≤ x ≤ 3, f(x) = 0 elsewhere
(c) f(x) = (3/2)x² for -1 ≤ x ≤ 1, f(x) = 0 elsewhere
(d) f(x) = (2/5)(x + 2) for 0 ≤ x ≤ 1, f(x) = 0 elsewhere
(e) f(x) = (3/4)x(2 - x) for 0 ≤ x ≤ 2, f(x) = 0 elsewhere
(f) f(x) = 4x(1 - x²) for 0 ≤ x ≤ 1, f(x) = 0 elsewhere

3. The probability density function of the random variable X is given by:
f(x) = 2(1 - x) for 0 ≤ x ≤ 1, f(x) = 0 elsewhere
Find (a) the median value of X (b) the interquartile range of X.

4. The continuous random variable X has probability density function given by:
f(x) = a + bx for 0 ≤ x ≤ 1, f(x) = 0 elsewhere
(a) Given that F(0.2) = 0.6, find the values of a and b.
(b) Calculate the median value of X.

5. For each of the following, find and sketch the probability density function of the random variable X:
(a) F(x) = 0 for x < 0; F(x) = x² for 0 ≤ x ≤ 1; F(x) = 1 for x > 1
(b) F(x) = 0 for x < 4; F(x) = (1/4)(x - 4) for 4 ≤ x ≤ 8; F(x) = 1 for x > 8
(c) F(x) = 0 for x < 1; F(x) = (1/8)(x² - 1) for 1 ≤ x ≤ 3; F(x) = 1 for x > 3
(d) F(x) = 0 for x < 0; F(x) = 2x - x² for 0 ≤ x ≤ 1; F(x) = 1 for x > 1

Exercise 16 - Continuous Probability Distributions (Miscellaneous)

1. The random variable X has probability density function given by:
f(x) = 0.5 for 1 ≤ x ≤ 3, f(x) = 0 elsewhere
(a) Sketch the probability density function of X.
(b) Find P(1.5 ≤ X ≤ 2.5).
(c) Find the cumulative distribution function of X.
(d) Find the mean and median of X.

2. The random variable X has probability density function given by:
f(x) = k - 10 + x for 10 ≤ x ≤ 11, f(x) = 0 elsewhere
where k is a positive constant.
(a) Find k and sketch the probability density function of X.
(b) Find the cumulative distribution function of X.
(c) Find the mode and median of X.

3. The random variable X has probability density function given by:
f(x) = x²/21 for 1 ≤ x ≤ 4, f(x) = 0 elsewhere
(a) Sketch the probability density function of X.
(b) Find E(X) and Var(X).
(c) Find the cumulative distribution function of X.
(d) Find the mode and median of X.

4. The random variable X has probability density function given by:
f(x) = (3/16)(4 - x²) for 0 ≤ x ≤ 2, f(x) = 0 elsewhere
(a) Sketch the probability density function of X.
(b) Find E(X) and Var(X).
(c) Find the cumulative distribution function of X.
(d) Find the mode and median of X.

5. The random variable X has probability density function given by:
f(x) = (3x² + 1)/4 for -1 ≤ x ≤ 1, f(x) = 0 elsewhere
(a) Sketch the probability density function of X.
(b) Find E(X) and Var(X).
(c) Find the cumulative distribution function of X.
(d) Find the mode and median of X.

6. The height, X metres, of a particular type of tree is a continuous random variable with probability density function given by:
f(x) = kx²(6 - x) for 0 ≤ x ≤ 6, f(x) = 0 elsewhere
(a) Find the value of the constant k.
(b) Find the expected height.
(c) Calculate the probability that any tree chosen at random will be less than 4 metres high.
(d) Find the cumulative distribution function of X.
(e) Find the mode of X.

7. The age, X years, to which a newborn infant will live is a continuous random variable with a probability density function given by:
f(x) = kx³(100 - x) for 0 ≤ x ≤ 100, f(x) = 0 elsewhere
(a) Find the value of the constant k.
(b) Find the expected lifespan of the infant.
(c) Calculate the probability that any infant chosen at random will live longer than 80 years.
(d) Find the cumulative distribution function of X.
(e) Find the mode of X.

8. The cumulative distribution function of a continuous random variable X is given by:
F(x) = 0 for x < 0; F(x) = (x/12)(4 + x) for 0 ≤ x ≤ 2; F(x) = 1 for x > 2
(a) Find the probability density function of X.
(b) Find the mode of X.
(c) Find the median of X.
(d) Find the lower quartile of X.

9. The cumulative distribution function of a continuous random variable X is given by:
F(x) = 0 for x < 0; F(x) = 2x² - x⁴ for 0 ≤ x ≤ 1; F(x) = 1 for x > 1
(a) Find the probability density function of X.
(b) Find the mode of X.
(c) Find P(0.3 < X < 0.6).

10. On any day the amount of time, X hours, that a person spends watching television is a continuous random variable with cumulative distribution function given by:
F(x) = 0 for x < 0; F(x) = (20x - x²)/100 for 0 ≤ x ≤ 10; F(x) = 1 for x > 10
(a) Find the probability density function of X.
(b) Find the probability that on any day a person chosen at random will spend between 6 and 8 hours watching television.
(c) Find the mode, mean and median of X.
(d) Find the variance of X.
(e) Find the interquartile range of X.

CORRELATION AND LINEAR REGRESSION

Exercise 1 - Correlation

Note that in the following questions "correlation coefficient" refers to Pearson's Product Moment Correlation Coefficient.

1. For each of the following sets of data:
(a) Plot a scattergraph.
(b) Calculate the correlation coefficient.
(c) Comment on the relationship between x and y.

A    x   1    2    3    4    5    6
     y   4    11   11   18   20   24

B    x   1    2    3    4    5    6
     y   20   12   16   12   8    8

C    x   1    2    3    4    5    6
     y   2    5    9    10   4    1

D    x   1    2    3    4    5    6
     y   0    9    4    2    7    3

2. The marks for 9 students in a maths and a physics test are given below.

     Maths mark     55   66   42   73   81   57   64   74   37
     Physics mark   70   48   60   73   79   72   69   81   70

(a) Plot a scattergraph for this data and calculate the correlation coefficient.
(b) Comment on the significance of both.

3. The data below relates to the systolic blood pressure and percentage body fat of six heart patients.

     Blood Pressure   110   120   135   135   140   150
     Body Fat         16    9     20    22    25    23

(a) Plot a scattergraph for this data and calculate the correlation coefficient.
(b) Comment on the significance of both.

4. The heights of 8 sons and their fathers were measured in an attempt to establish a link.

     Fathers' height (cm)   162   170   176   180   187   190   192   192
     Sons' height (cm)      180   182   179   182   187   185   199   175

(a) Plot a scattergraph for this data and calculate the correlation coefficient.
(b) Comment on the significance of both.

Exercise 2 - Linear Regression

1. (a) Find the regression line, using the method of least squares, for the following data.

     x   1   1   5   5
     y   1   3   2   4

(b) Use the regression equation to predict y when x = 3.

2. (a) Find the least squares regression line for the following data points.

     x   0   2   2   5    6
     y   4   2   0   -4   -4

(b) Plot a scattergraph for these data points and the regression line.

3. A scientist is investigating the effectiveness of a particular weed killer. He uses different concentrations of weed killer and counts the number of surviving weeds in a fixed area. The results were as follows:

     Concentration (mg/litre)    1    3    5    7    9    11   13   15
     No. of Weeds (per 10 m²)    30   24   22   19   16   13   10   6

(a) Plot a scattergraph for this data.
(b) Obtain the least squares regression equation and plot it on your graph.
(c) Use the regression equation to predict the number of weeds when the concentration of weed killer is 10 mg/litre.

4. Maximum heart rate decreases with age and is used as a guide for exercising safely. The data below was obtained from a treadmill experiment.

     Age (x)   26   40   43   44   41   27   40   39   40   26
Max.
Heart Rate (y)   192   178   172   175   173   191   173   175   179   191

(a) Plot a scattergraph of these data and comment on any relationship.
(b) Determine the least squares regression equation.
(Σx = 366, Σy = 1799, Σx² = 13868, Σxy = 65329)
(c) Predict the maximum heart rate of a person who is 35 years old.

5. A government survey obtained the following data about the money spent annually by a family of four on food.

     Income (x) (£1000s)               22   20   24   27   16   24   19   25
     Expenditure on food (y) (£100s)   50   45   42   44   37   26   39   43

(a) Determine the least squares regression equation for the data.
(Σx = 177, Σy = 326, Σx² = 4007, Σxy = 7228)
(b) Predict the annual food expenditure of a family whose (disposable) income is £21 000.

6. A Maths class has two tests per session, Christmas and summer. The marks for ten randomly selected pupils are given below.

     Christmas (x)   69   42   43   40   100   80   90   77   47   68
     Summer (y)      75   66   63   63   78    73   73   68   62   65

(a) Plot a scattergraph of this data. Comment on the relationship between the marks in both tests.
(b) Determine the equation of the least squares regression line.
(Σx = 656, Σy = 686, Σx² = 47236, Σxy = 45956)
(c) Predict what a pupil who scored 50 at Christmas would get in the summer test.

7. A factory manager records the cost of production and the number of units produced over a 6 month period.

     Units produced (x) (1000s)        13.1   13.2   13.3   14.1   14.2   15.0
     Cost of Production (y) (£1000s)   29.3   29.2   29.4   30.4   31.0   32.4

(a) Plot a scattergraph of this data and comment upon the nature of the relationship between the number of units produced and the production costs.
(b) Obtain the least squares regression equation.
(Σx = 82.9, Σy = 181.7, Σx² = 1148.19, Σxy = 2315.13)
(c) Suggest a practical interpretation of the slope estimate.
(d) Estimate the production costs of 14 000 units.

8. When a metal bar is heated it expands.
The amount by which it expands and the increase in temperature of such a bar are given below.

     Temperature rise (x) (°C)   50     100    150    200    250    300
     Expansion (y) (cm)          0.35   0.85   1.20   1.54   1.92   2.32

(a) Plot a scattergraph for this data.
(b) Obtain the least squares regression equation.
(Σx = 1050, Σy = 8.18, Σx² = 227500, Σxy = 1766.5)
(c) Use this equation to predict the expansion of the bar when its temperature is increased by 175 °C.

9. The stopping distances and velocities of six cars are given below.

     Velocity (m/s) (x)          9    14   18   23   27   32
     Stopping distance (m) (y)   15   22   37   50   77   93

(a) Plot a scattergraph for this data and comment on the nature of the relationship.
(b) Calculate the least squares regression equation.
(Σx = 123, Σy = 294, Σx² = 2883, Σxy = 7314)
(c) Comment on the practical significance of your answer.
(d) Can you predict the stopping distance of a car travelling at 120 m/s?

10. In a particular chemical reaction the volume of gas produced and the concentration of acid used were measured.

     Acid concentration (moles/litre) (x)   0.10   0.12   0.15   0.17   0.20
     Volume of gas (cm³) (y)                200    270    320    391    440

(a) Plot a scattergraph for this data.
(b) Obtain the least squares regression equation.
(Σx = 0.74, Σy = 1621, Σx² = 0.1158, Σxy = 254.87)
(c) Predict the volume of gas produced when the acid concentration is 0.16 moles/litre.

11. The height and weight of 10 randomly selected male students are as follows:

     Height (cm)   164   170   180   180   167   190   170   177   180   175
     Weight (kg)   80    60    84    74    57    90    70    72    69    70

(a) Determine the least squares regression equation for the data.
(Σx = 1753, Σy = 726, Σx² = 307839, Σxy = 127693)
(b) Plot the data points and the regression line.
(c) Use the regression equation to predict the weight of a student who is: (i) 172 cm (ii) 185 cm.
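Several questions in this exercise quote the sums Σx, Σy, Σx² and Σxy so that the regression line can be found without re-entering the raw data. The sketch below shows one way of doing that arithmetic in Python. It assumes the standard least squares formulas b = Sxy/Sxx and a = ȳ - b x̄, with Sxx = Σx² - (Σx)²/n and Sxy = Σxy - ΣxΣy/n; the function name and the four data points are hypothetical and are not taken from any question.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (intercept a, slope b) of the line y = a + bx from summary sums."""
    s_xx = sum_x2 - sum_x ** 2 / n       # corrected sum of squares of x
    s_xy = sum_xy - sum_x * sum_y / n    # corrected sum of products
    slope = s_xy / s_xx
    intercept = sum_y / n - slope * sum_x / n   # a = mean of y - b * mean of x
    return intercept, slope


# Hypothetical data points, chosen only to illustrate the arithmetic.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

a, b = least_squares_from_sums(n, sum_x, sum_y, sum_x2, sum_xy)

# Predicted y when x = 5, using the fitted line y = a + bx.
prediction = a + b * 5

print(f"y = {a:.3f} + {b:.3f}x")
print(f"predicted y at x = 5: {prediction:.3f}")
```

For these made-up points the sums are Σx = 10, Σy = 16, Σx² = 30 and Σxy = 47, giving Sxx = 5 and Sxy = 7, so the slope is 1.4 and the intercept 0.5. Recomputing the sums from the raw data in this way is also a quick check on any quoted totals.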
ANSWERS - PREVIOUS KNOWLEDGE

Exercise 1 - Average/Variability

1 Set A: mean = 101; Q1 = 98; Q3 = 105; IQR = 7.
Set B: mean = 101; Q1 = 90; Q3 = 110; IQR = 20.
The means of both groups are very similar. However, the spread of B is much greater.

2 Judge A: mean = 9.47; Q1 = 9.1; Q3 = 9.7; IQR = 0.6.
Judge B: mean = 9.23; Q1 = 9.0; Q3 = 9.4; IQR = 1.0.
Judge A has a higher mean, but his scores cover a smaller range of values.

3 n = 10; Σx = 2006; Σx² = 402428; mean = 200.6
sample standard deviation, s = √[(1/(n - 1))(Σx² - (Σx)²/n)] = √[(1/9)(402428 - (2006)²/10)] = 1.65

4 n = 9; Σx = 577.9; Σx² = 37109.77; mean = 64.21
sample standard deviation = √[(1/(n - 1))(Σx² - (Σx)²/n)] = √[(1/8)(37109.77 - (577.9)²/9)] = 0.521

5 (a) mean = 74.4; sample standard deviation = 5.18
(b) 1000/74.4 = 13.4 so a minimum of 13 people but 12 would be safer

6 n = 5; Σx = 131; Σx² = 3451; mean = 26.2
sample standard deviation = √[(1/(n - 1))(Σx² - (Σx)²/n)] = √[(1/4)(3451 - (131)²/5)] = 2.17

7 mean = 45; sample standard deviation = 3.16
8 mean = 30.6; sample standard deviation = 3.53
9 ȳ = 12; sample standard deviation = 3.39
10 ȳ = 91; sample standard deviation = 7.38

Exercise 2 - Exploratory Data Analysis

1 [Dot plot; values marked on the axis: 59 65 68 72 75 79 81]
(b) median = 72 kg; mode = 75 kg

2 [Dot plot; axis marked from 2.2 to 3.9 in steps of 0.1]
median = 3.4; mode = 3.4

3 [Stem-and-leaf diagram; key: 6 | 4 means 64 seconds]
median = 60 seconds; range = 83 - 35 = 48 seconds.

4 (a)
Age 7 8 4 0 4 5 4 1 0 8 8 6 6 3 7 2 2 2 0 6 6 Age 8 29 30 31 32 33 5 34 1 35 0 36 1 37 0 38 1 39 2 40 0 41 5 42 2 6 9 4 3 5 8 7 9 4 6 356 means 356 (b) Age 7 median = 326 mode = 326 lower quartile = 310.75 upper quartile = 333.25 Age 8 median = 384 mode = none lower quartile = 367 upper quartile = 399.25 (c) The higher median and lower and upper quartiles suggest an improvement in reading from age 7 to age 8. Women 5 8 4 8 8 8 4 8 2 7 2 8 2 7 1 6 2 6 2 6 4 8 0 6 2 6 0 6 0 Men 5 5 6 6 7 7 8 8 9 9 8 0 6 0 6 0 0 6 0 6 4 0 2 Mathematics: Statistics (Higher) Student Exercise Answers 2 6 2 2 8 4 2 8 4 2 8 4 2 4 4 2 72 means 72 Women 79 72 72 86 median mode Q1 Q3As Men 70 66 66 74.5 The data suggests that the men’s heartbeats are, on average, slower than those of women and have less variability 6. Girls Boys 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Although the medians are the same, the boys appear to be quicker than the girls for whom the maximum, minimum, and interquartile range are all smaller. 7 3 4 5 6 7 8 7 1 4 0 0 0 median mode Q1 Q3 8 1 5 1 2 4 9 5 7 1 8 5 9 9 1 1 8 9 3 3 4 7 9 3 3 9 65.5 61 54 70 54 8 5 5 6 6 1 1 3 3 6 6 8 8 8 7 0 0 0 0 7 7 7 9 9 8 1 1 9 9 6 72 means 72°F 4 9 2 70 4 4 2 3 3 Mathematics: Statistics (Higher) Student Exercise Answers 3 median mode Q1 Q3 72.5 70 65 73 As 96> Q3 .5×IQR , this is an outlier. This could be an instrument error or an error in recording. * 65 73 The boxplot shows a distribution skewed towards the lower end of the temperature range. The interquartile range is 8 indicating that fifty percent of the temperatures lie in this range. 9 (a) Interquartile range = 7 days (b) min Q1 median mode Q3 max 1 4 7 1 11 55 (c) 23 and 55 days are both outliers. (d) * * 1 2 3 4 5 6 7 8 9 10 This is a fairly symmetrical distribution with two significant outliers. These could be patients with unusual conditions or, for example, elderly patients needing nursing care but none is available. 
10 (a) IQR = 19
(b) Min = 66; Q1 = 95.5; Median = 102; Q3 = 115.5; Max = 129
(c) There are no outliers.
(d) [Boxplot; values marked: 66, 95.5, 115, 129]

11 Min = 16, Q1 = 22, Median = 35, Q3 = 51, Max = 97
[Boxplot; values marked: 16, 22, 35, 51; one outlier marked *]
There is one outlier at 97 seconds. The boxplot shows a distribution skewed towards the lower end. There is probably some explanation other than experimental/human error.

12 (a) The interquartile range is 16.5
(b) Min = 39; Q1 = 51; Median = 61.5; Q3 = 67.5; Max = 93
(c) The value 93 is an outlier.
(d) [Boxplot; values marked: 39, 51, 67.5; one outlier marked *]
This is a distribution skewed towards the high end with an outlier at 93.

13 (a) The interquartile range is 2.5
(b) Min = 1; Q1 = 11.5; Median = 13; Q3 = 14; Max = 28
(c) The values 1 and 28 represent outliers.
(d) [Boxplot; values marked: 11.5, 14; two outliers marked *]
There are a variety of possible explanations for these two outliers.

Exercise 3 - Statistical Graphs

1 The median for the age group 20-25 is lower than for the older age group, suggesting that for the 15 countries the death rate is lower among this age group. The range and interquartile range for the older age group are higher than for the 20-25 group. This suggests that although the death rate in this group is higher, there is a wider variation between the 15 countries for this age group. This could be because of poverty, variation in welfare provision, exposure to disease, environmental effects, etc.

2 The plants grown in soil A appear to be taller than those grown in soil B. Both soils show similar distributions, with similar ranges and interquartile ranges, but with soil A having a higher median. The outlier shown could arise from a measurement error or the plant may have grown in a favourable position, receiving, for example, more sunlight, fertiliser, etc., than the others.

3 (a) For the boys the median is 31, and the quartiles are 25.5 and 35.5. The corresponding statistics for the girls are 27, 22 and 30.5.
(b) The girls completed this task more quickly.
Their median time was lower and the spread (interquartile range) was smaller. 4 There appears to be little difference between the sales of Type A or Type B. The median and IQR of both is very similar. 5 The median of group A is lower than that of Group B, suggesting that the drug has made a difference. However there is a greater spread of weights in this group shown by the larger interquartile range. Mathematics: Statistics (Higher) Student Exercise Answers 6 ANSWERS - PROBABILITY Exercise 1 - Simple Probability 1 1 3 (c) (d) 13 2 13 1 2 1 2 (b) (c) (d) 2 3 3 1 3 (b) 2 5 4 4 (a) (i) 0.6 (ii) 0.4 (b) (i) (ii) 9 9 5 4 5 (a) (i) 0.6 (ii) 0.4 (b) (i) (ii) 9 9 1 1 2 3 1 6 (a) (b) (c) (d) (e) 5 5 5 5 5 3 7 16 1 1 1 7 5 8 (a) (b) (c) (d) (e) 8 4 2 8 8 7 13 9 (a) (b) 20 20 32 7 10 (a) (b) 195 10 53 45 43 11 (a) (b) (c) 220 88 176 1 1 52 1 (a) 6 1 (a) 4 (a) (b) Mathematics: Statistics (Higher) Student Exercise Answers 7 Exercise 2 - Sample Spaces & Further Simple Probability 1 JC JB JV MC MB MV TC TB TV 2 VR SR VM SM VW SW CR CM CW 3 (a) AA MM LL AM MA LM AL ML LA 4 AA MA LA BA NA AM MM LM BM NM 5 (a) BB BC BG CB CC CG GB GC GG 6 11 21 31 41 51 61 (a) 7 11 21 31 41 51 (a) 8 12 22 32 42 52 62 5 36 12 22 32 42 52 1 10 AL ML LL BL NL (b) AA MA AM MM AB MB LB BB NB (b) 14 24 34 44 54 64 1 12 13 23 33 43 53 (b) (c) 14 24 34 44 54 1 2 (c) MA LA MM LM ML LA MJ LJ JA JM JL JJ AN MN LN BN NN (b) BBB BBC BBG BCB BCC BCG BGB BGC BGG 13 23 33 43 53 63 (c) AA AM AL AJ 15 25 35 45 55 65 5 9 CBB CBC CBG GBB GBC GBG CCB CCC CCG GCB GCC GCG CGB CGC CGG GGB GGC GGG 16 26 36 46 56 66 (d) 1 18 15 25 35 45 55 16 26 36 46 56 1 30 (d) (e) 1 6 1 12 (e) 1 2 HHH HHT HTH THH HTT THT TTH TTT 1 1 3 7 (a) (b) (c) (d) 8 8 8 8 Mathematics: Statistics (Higher) Student Exercise Answers 8 9 HHHH HTHH THHH TTHH 3 (a) 8 HHHT HHTH HHTT HTHT HTTH HTTT THHT THTH THTT TTHT TTTH TTTT 1 15 1 (b) (c) (d) 16 16 4 10 (a) 32 (b) 1 32 Exercise 3 - Mutually Exclusive and Exhaustive Events 1 (a) PQ , PS , QR , RS (b) PR , 
QS 2 (a) (b) (c) (d) (e) 3 4 5 mutually exclusive yes no yes no no exhaustive yes no no no no 1 8 0.3 2 3 1 , , 5 10 5 3 7 4 (b) , , 5 10 5 1 1 1 ( c) , , 2 2 2 (d) B and C are mutually exclusive (a) (e) 3 , 5 , 9 and 10 are in S but not in either of A , B or C so A , B and C are not exhaustive . 6 A and B are neither mutually exclusive nor exhaustive . 7 (a) 8 21 25 1 (a) 2 4 25 4 (b) 5 (b) 9 (a) 0.74 10 (i) (a) 0.2 (ii) (a) 40 (c) 7 10 (d) 1 2 (e) 1 5 (b) 0.74 (b) 0.57 (b) 86 Mathematics: Statistics (Higher) Student Exercise Answers 9 11 (a) 12 1 3 13 14 5 5 ( b) 36 18 (a) 0.1 (b) 0.5 P(Post) = 1 1 1 , P(Herald) = , P(Moon) = 2 3 6 Exercise 4 - Independent Events 1 (a) unlikely (d) definitely (b) likely (e) unlikely 2 (a) 0.21 (b) 0.75 3 A & B and B & C are pairs of independent events . 4 5 1 1 1 , , 2 2 4 1 1 1 , , 13 2 26 6 (a) 1 4 (b) 1 6 7 (i) (a) 2 5 (b) 8 (i) 9 10 (c) unlikely (f) likely 4 5 16 (ii) (a) 25 (a) 3 5 1 5 8 (b) 25 (b) (c) 17 25 (a) 0.0125 (b) 0.0375 1 = 0.00032 3125 Mathematics: Statistics (Higher) Student Exercise Answers 10 Exercise 5 - Tree Diagrams (With Replacement) 1 (a) Ben's turn 1 2 Julie's turn Red Outcome Red and Red P(RR) = 0.25 Black Red and Black P(RB) = 0.25 Red Black and Red P(BR) = 0.25 Black Black and Black P(BB) = 0.25 Red 1 2 1 2 1 1 2 2 Black 1 2 (b) (i) 0.25 2 (a) 0.09 3 (a) Theory 0.15 0.65 Practical Pass Outcome Pass and Pass P(PP) = 0.5525 0.35 Fail Pass and Fail P(PF) = 0.2975 0.65 Pass Fail and Pass P(FP) = 0.0975 0.35 Fail Fail and Fail P(FF) = 0.0525 Pass Fail (i) 0.5525 (ii) 0.0525 (iii) 0.395 4 First ball Second ball Red Outcome Red and Red 1 3 Green Red and Green 2 3 Red Green and Red Green Green and Green 2 3 Red 2 3 1 3 Green 1 3 (a) (iii) 0.25 (b) 0.49 0.85 (b) (ii) 0.5 1 9 (b) 2 9 Mathematics: Statistics (Higher) Student Exercise Answers 11 5 First coin Second coin Head Outcome Head and Head 1 2 Tail Head and Tail 1 2 Head Tail and Head Tail Tail and Tail 1 2 Head 1 2 1 2 Tail 1 2 (a) 1 4 (b) 1 4 (c) 1 2 6 
(a) 0.64 7 (a) 0.3025 (b) 0.2025 (c) 0.495 8 (a) (b) 0.32 (c) 0.04 First coin Second coin 1 2 1 2 Third Coin Head Outcome HHH 1 2 Tail Head HHT HTH Tail Head HTT THH Tail Head THT TTH Tail TTT Head Head 1 2 1 2 Tail 1 2 1 2 1 2 1 2 1 2 Head Tail 1 2 1 2 1 2 Tail 1 2 (b) (i) 1 8 1 216 (ii) 1 2 (iii) 3 8 5 72 (c) 2 27 9 (a) 10 (a) 0.216 (b) 0.288 11 (a) 0.343 (b) 0.189 (c) 0.784 (d) 0.657 12 (a) 1 64 (b) (b) 5 32 (c) 27 64 Mathematics: Statistics (Higher) Student Exercise Answers 12 Exercise 6 - Tree Diagrams (Without Replacement) 1 (a) First disc Second disc Red Outcome Red and Red Green Red and Green Red Green and Red Green Green and Green Goes for walk ? Yes Outcome Yes and Yes No Yes and No 0.75 Yes No and Yes 0.25 No No and No 3 7 Red 1 2 4 7 4 7 1 2 Green 3 7 (b) 2 3 14 (a) 0.7 0.3 (c) 4 7 Sunny day ? 0.95 Yes 0.05 No (b) 0.89 3 (a) First round 0.7 Arnold 0.5 0.3 0.5 0.2 Jack 0.8 (b) 0.245 Second round 0.7 Arnold 0.3 0.2 Jack 0.8 0.7 Arnold 0.3 0.2 Jack 0.8 Third Round Arnold Outcome AAA Jack Arnold AAJ AJA Jack Arnold AJJ JAA Jack Arnold JAJ JJA Jack JJJ (c) 0.45 Mathematics: Statistics (Higher) Student Exercise Answers 13 4 (a) 1st attempt 2nd attempt Pass .................................................. Pass at 2nd attempt 0.8 0.55 Fail 0.2 13 34 (c) 1 221 (a) 1 15 (b) 4 15 (c) 11 15 (a) 2 21 (b) 3 7 6 0.24 7 9 19 8 9 Pass ......................... Pass at 3rd attempt 0.2 Fail .......................... Fail at 3rd attempt (c) 0.088 (b) (a) 0.8 Fail 3 51 5 (c) (d) 188 221 10 21 1 ≈ 0.0005 (b) 1990 3783 ≈ 0.951 3980 10 (a) 11 7 31 12 (a) 0.00095 13 (a) 75 101 ≈ 0.493 (b) ≈ 0.0886 152 1140 14 (a) 1 30 (b) 1 120 15 (a) 4 91 (b) 12 65 (c) 39 ≈ 0.049 796 (b) 0.729 17 2 3 24 correct out of 40 18 (a) 16 Outcome Pass .......................................................................... 
Pass at 1st attempt 0.45 (b) 0.44 3rd attempt (c) (c) p(p - 1) (p + q)(p + q - 1) (c) 14 ≈ 0.246 57 3 ; on 50 occasions 10 27 91 (b) q(q - 1) (p + q)(p + q - 1) Mathematics: Statistics (Higher) Student Exercise Answers (c) 2pq (p + q)(p + q - 1) 14 Exercise 7 - Combinations 1 2 3 4 5 6 7 8 9 10 11 12 (a) 35 9 (a) = 6 56 56 (a) 120 28 , 10 1716 , 924 (a) 142506 (a) 165 (a) 70 (a) 126 (a) 44352 (b) 15 (c) 8 (d) 1 9 7 7 = 84 (b) = = 35 3 3 4 (b) 680 (b) (b) (b) (b) (c) 593775 (d) 2598960 23751 45 (assuming best debater and oldest pupil are different) 168 252 (c) 72 (b) 125474 Exercise 8 - Combinations (Probability) 1 2 3 4 5 5 28 3 5 18 35 3 7 2 5 6 (a) 7 7 30 8 (a) 9 (a) 10 (a) 11 (a) 12 (a) 33 646 46 833 1 55 1 7735 253 22372 1 14 (b) 613 646 (b) (b) (b) (b) (b) 1 270725 32 495 9 1547 3289 39151 1 2 15229 54145 92 (c) 99 46 (c) 7735 14927 (c) 156604 (c) Mathematics: Statistics (Higher) Student Exercise Answers 15 Exercise 9 - Simulation 1 Let Heads be represented by the digits 0 , 1 , 2 , 3 and 4 and Tails by the digits 5 , 6 , 7 , 8 and 9 . 9 T 8 T 4 H 1 H 9 T 7 T 6 T 4 H 0 H 1 H 1 H 5 T 4 H 1 H 2 H 6 T 8 T 4 H 1 H 8 T This simulation produced 11 Heads and 9 Tails . 2 Ignoring the digits 0 , 7 , 8 , 9 , the 10 rolls of the unbiased die are simulated as follows : 4 3 3 1 5 2 1 6 5 5 . After dividing the data into pairs and ignoring two digit numbers of 50 and above (and any duplication which occurs) , the following selection is obtained : 40 4 3 11 12 41 37 24 . Again, dividing the data into pairs and ignoring two digit numbers of 79 and above (and any duplication which occurs) , the following selection is obtained pupils numbered 33 72 42 38 22 13 Mathematics: Statistics (Higher) Student Exercise Answers . 
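The digit-to-outcome rules used in Exercise 9 can be sketched in code. This is only a minimal illustration in Python: the function names `simulate_coin` and `simulate_die` are hypothetical, and the digit list is the one quoted in the answer to question 1. The random number table behind question 2 is not reproduced here, so the die rule is shown only as a filter.

```python
from collections import Counter

# Random digits quoted in the answer to question 1 of Exercise 9.
DIGITS = [9, 8, 4, 1, 9, 7, 6, 4, 0, 1, 1, 5, 4, 1, 2, 6, 8, 4, 1, 8]


def simulate_coin(digits):
    """Map digits 0-4 to Heads ('H') and digits 5-9 to Tails ('T')."""
    return ["H" if d <= 4 else "T" for d in digits]


def simulate_die(digits, rolls):
    """Ignore the digits 0, 7, 8, 9 and keep the first `rolls` digits from 1 to 6."""
    kept = [d for d in digits if 1 <= d <= 6]
    return kept[:rolls]


tosses = simulate_coin(DIGITS)
counts = Counter(tosses)
print(counts["H"], "Heads and", counts["T"], "Tails")  # 11 Heads and 9 Tails
```

Each digit is equally likely, so splitting the ten digits 5 and 5 gives a fair coin, and discarding four of the ten digits leaves six equally likely faces for the die.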
16 Exercise 10 - Discrete Probability Distributions 1 1 12 (b) probability distribution with k = 0.1 (a) probability distribution with k = (c) not a probability distribution since 2 3 4 5 6 1 15 (a) 0.07 1 (a) 4 4 (a) 5 2 3 (b) 0.43 1 (b) 2 2 (b) 3 x 1 (a) (c) 0.78 7 (c) 12 13 (c) 15 2 3 9 , k = k 2k 1 k P(Y = y) 5 1 12 (a) (b) 25 25 p = 0.5 , q = 0.25 3k 4k 2 2k 5 3 3k 54 0 1 2 1 4 0 1 2 1 1 4 25 36 10 36 (a) h 10 > 1 4 (a) y 8 i (b) P(X = x) 7 ∑p 1 10 4 4k 5 (b) 3 10 5 , k = k 1 3 (b) 4 5 (a) P(H = h) s 2 (b) P(S = s) 1 36 b 0 1 2 P(B = b) 0.49 0.42 0.09 (c) (d) t P(T= t) 2 1 36 3 1 18 4 1 12 t P(T= t) 10 1 12 11 1 18 12 1 36 5 1 9 6 5 36 Mathematics: Statistics (Higher) Student Exercise Answers 7 1 6 8 5 36 9 1 9 17 d 0 1 2 3 4 5 P(D = d) 1 6 5 18 2 9 1 6 1 9 1 18 g 0 1 2 3 P(G = g) 1 8 3 8 3 8 1 8 1 12 4 1 6 (e) (f) 11 s P(S= s) 0 1 12 1 1 12 S P(S= s) 10 1 12 12 12 2 1 6 3 5 6 8 1 12 1 12 1 12 1 12 s 2 3 4 5 6 7 8 P(S = s) 1 16 1 8 3 16 1 4 3 16 1 8 1 16 d 0 1 2 3 P(D = d) 1 4 3 8 1 4 1 8 (b) (d) s -80 - 50 - 20 0 10 30 40 60 80 P(S = s) 1 16 1 8 3 16 1 8 1 8 1 8 1 16 1 8 1 16 13 14 r 1 2 3 P(R = r) 1 6 5 36 25 36 (b) s 0 1 P(S = s) 125 216 91 216 (c) Mathematics: Statistics (Higher) Student Exercise Answers 18 r 0 1 2 P(R = r) 1 10 6 10 3 10 15 m 0 1 2 3 P(M = m) 1 21 5 14 10 21 5 42 16 Exercise 11 - Discrete Probability Distributions (Expectation and Variance) 1 (a) 2.3 2 p=8 x = 0.4 , y = 0.1 1 k= , E(T) = 3 20 a loss of £1.20 3 4 5 6 7 8 1 (b) 3 12 (c) 0.41 15 a loss of £20 a loss per question of 0.2 marks (a) 2.5p (b) 2.5p 75p 17 (a) 0.94 (b) 1.64 (c) (d) 84 36 a = 7 (a) E(Z) = 4a + 8b (b) a = 0.2 , b = 0.4 E(N) = 1.25 , Var(N) = 0.8875 1 3 46 (a) k = (b) E(X) = 3 11 , Var(X) = 2 121 77 11 E(X) = 3 12 , Var(X) = 2 12 16 E(T) = 1 , Var(T) = 12 17 (a) E(S) = 7 , Var(S) = 5 56 9 10 11 12 13 14 17 (b) E(D) = 1 17 , Var(D) = 2 324 18 Mathematics: Statistics (Higher) Student Exercise Answers 19 19 755 18 E(R) = 2 36 , Var(R) = 1296 19 E(N) = 0.8 , Var(N) = 0.36 
20 45 E(R) = 1 14 , Var(R) = 112 21 E(W) = 1 75 , Var(W) = 24 49 Exercise 12 - Discrete Probability Distributions (Simulation) (Random numbers have been selected from the top left hand corner and then to the right using the first digit of each triple.) 1 (a) (b) (c) (d) 2 Simulations are 9 , 8 , 6 , 7 , 6 , 5 , 7 , 6 , 7 , 8 . 3 Simulations are 4 , 4 , 3 , 4 , 3 , 2 , 4 , 3 , 4 , 4 . 4 Simulations are 4 , 3 , 1 , 3 , 1 , 1 , 3 , 1 , 2 , 3 . Simulations are 3 , 2 , 0 , 2 , 0 , 0 , 2 , 0 , 1 , 2 Simulations are 5 , 4 , 2 , 3 , 2 , 1 , 3 , 2 , 3 , 4 Simulations are 4 , 4 , 2 , 4 , 2 , 1 , 4 , 2 , 3 , 4 Simulations are 6 , 6 , 4 , 5 , 4 , 4 , 5 , 4 , 5 , 6 . . . . Exercise 13 - Continuous Probability Distributions 1 (a) f(x) (b) 3 4 (b) 0.675 1 0 ∫ 2 (a) 2 2 0 x f(x) dx = 1 f(x) 1.4 0 ∫ 1 1 0 x f(x) dx = 1 Mathematics: Statistics (Higher) Student Exercise Answers 20 3 (a) (b) ∫ 2 0 4 f(x) dx = 1 (a) (b) k = 5 (a) 0.5 1 16 3 16 f(x) (b) (i) 7 24 (ii) 5 6 7 3 1 3 6 7 8 0 1 k = 7 3 32 4 (a) k = 27 (a) k = (a) (b) x 7 5 32 (b) (i) 16 27 (ii) 8 9 f(x) (b) 13 8 20 0 9 (a) k = 3 4 15 (b) 30 4 27 x 243 256 Mathematics: Statistics (Higher) Student Exercise Answers 21 10 (a) f(x) (b) f(x) 1 3 1 2 0 6 x Modal value is 0 . (d) f(x) 16 9 27 64 2 3 Modal value is 23 3 x Modal value is 1.5 . (c) f(x) 0 1.5 0 3 1 x 0 . Modal value is 3 34 . 
3 4 5 x Exercise 14 - Continuous Probability Distributions (Expectation and Variance) 1 9 1 1 (a) k = 2 (a) E(X) = 3 (a) E(X) = 0 4 (a) E(X) = 0 5 (a) k = 6 (a) k = 3 7 (b) E(X) = 1 7 7 (b) (i) 1.875 (ii) 0.224 8 (a) k = 9 (a) k = 4 (b) E(X) = 2 4 5 9 4 65 3 1 108 (c) Var(X) = 27 80 13 162 1 (b) Var(X) = 5 1 (b) Var(X) = 12 (b) Var(X) = (b) E(X) = 2 325 ≈ 2.6 194 2 (c) Var(X) = (c) Var(X) = ≈ 0.0765 3 49 µ3 (8 - µ ) 432 (b) µ = 3.6 (c) (b) E(X) = 8.88 (c) Var(X) = 8.3136 Mathematics: Statistics (Higher) Student Exercise Answers 24242 316875 (d) 41 90 22 Exercise 15 - Cumulative Distribution Function 1 0 (a) F(x) = x 1 for x < 2 for 2 ≤ x ≤ 3 for x > 3 0 1 (b) F(x) = 4 x(4 - x) 1 0 1 (c) F(x) = 2 x 3 1 for x < 0 for 0 ≤ x ≤ 2 for x > 2 for x < -1 for -1 ≤ x ≤ 1 for x > 1 0 2 (d) F(x) = 3x - x - x 3 1 0 1 2 (e) F(x) = 4 x (3 - x) 1 0 4 (f) F(x) = x (5 - 4x) 1 2 3 (a) 8.49 (a) 0.29 4 (a) a = 7 , b = - 5 (b) 0.16 2 5 (a) (b) 2.12 (b) 0.37 for x < 0 for 0 ≤ x ≤ 1 for x > 1 for x < 0 for 0 ≤ x ≤ 2 for x > 2 for x < 0 for 0 ≤ x ≤ 1 for x > 1 (c) 0 (d) 0.55 for 0 ≤ x ≤ 1 elsewhere 2x f(x) = 0 f(x) (e) 1 1 (b) f(x) = 4 0 f(x) for 4 ≤ x ≤ 8 elsewhere 1 4 2 0 (f) 0.54 1 x Mathematics: Statistics (Higher) Student Exercise Answers 0 4 8 x 23 1 4 x (c) f(x) = 0 5 for 1 ≤ x ≤ 3 elsewhere for 0 ≤ x ≤ 1 2 - 2x (d) f(x) = 0 f(x) elsewhere f(x) 2 3 4 1 4 1 0 3 x 0 1 x Exercise 16 Continuous Probability Distributions (Miscellaneous) 1 (a) f(x) (b) 0.5 0 (c) F(x) = 05 .x 1 1 2 0 2 1 3 x 1 k = 2 (a) f(x) 3 2 for x< 1 fo r 11≤ x ≤ 3 for for x > 3 mode 0.5 , median 2 (d) mean (b) 0 1 F(x) = 2 x(x -19) 1 (c) mode 11 , median 10.62 for x < 10 for 10 ≤ x ≤ 11 for x > 11 1 2 0 3 10 11 x (b) E(X) = 3 1 , Var(X) = 2067 ≈ 0.527 (a) f(x) 28 16 21 1 21 0 1 4 x (c) 0 1 3 F(x) = x 63 1 (d) mode 4 , median 3.19 Mathematics: Statistics (Higher) Student Exercise Answers 3920 for x < 1 for 1 ≤ x ≤ 4 for x > 4 24 4 19 3 , Var(X) = 80 4 0 1 (c) F(x) = 16 x(12 - x 2 ) 1 (a) f(x) (b) E(X) = 3 4 2 0 5 (d) 
mode 0 , median 0.69
x f(x) (a) (b) E(X) = 0 , Var(X) = 3 4 1 4 1 7 15 0 1 2 (c) F(x) = 4 x(x + 1) 1 -1 for x < 0 for 0 ≤ x ≤ 2 for x > 2 for x < -1 for -1 ≤ x ≤ 1 for x > 1 (d) mode -1 or 1 , median 0 x

6 (a) $k = \frac{1}{108}$ (b) E(X) = 3.6 (c) $\frac{16}{27}$
(d) $F(x) = \begin{cases} 0 & \text{for } x < 0 \\ \frac{1}{432} x^3 (8 - x) & \text{for } 0 \le x \le 6 \\ 1 & \text{for } x > 6 \end{cases}$
(e) mode is 4

7 (a) $k = \frac{1}{500\,000\,000}$ (b) E(X) = $66\frac{2}{3}$ (c) 0.26272
(d) $F(x) = \begin{cases} 0 & \text{for } x < 0 \\ \frac{1}{2\,500\,000\,000} x^4 (125 - x) & \text{for } 0 \le x \le 100 \\ 1 & \text{for } x > 100 \end{cases}$
(e) mode is 75

8 (a) $f(x) = \begin{cases} \frac{1}{6}(2 + x) & \text{for } 0 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$
(b) mode is 2 (c) 1.16 (d) 0.65

9 (a) $f(x) = \begin{cases} 4x(1 - x^2) & \text{for } 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$
(b) mode is $\frac{1}{\sqrt{3}}$ (c) 0.4185

10 (a) $f(x) = \begin{cases} \frac{1}{50}(10 - x) & \text{for } 0 \le x \le 10 \\ 0 & \text{elsewhere} \end{cases}$
(b) 0.12 (c) mode is 0 , mean is $3\frac{1}{3}$ , median is 2.93 (d) $5\frac{5}{9}$ (e) IQR is 3.66

ANSWERS - CORRELATION AND LINEAR REGRESSION

Exercise 1 - Correlation

1 (a) Strong linear relationship with narrow spread or scatter; the data are highly correlated.
$S_{XX} = 91 - \frac{1}{6}(21)^2 = 17.5$
$S_{YY} = 1558 - \frac{1}{6}(88)^2 = 267.3$
$S_{XY} = 375 - \frac{1}{6}(21)(88) = 67$
$r = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}} = \frac{67}{\sqrt{(17.5)(267.33)}} = 0.980$
(b) Data appear to be linearly related with negative slope. A linear model seems appropriate.
$S_{XX} = 91 - \frac{1}{6}(21)^2 = 17.5$
$S_{YY} = 1072 - \frac{1}{6}(76)^2 = 109.3$
$S_{XY} = 228 - \frac{1}{6}(21)(76) = -38$
$r = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}} = \frac{-38}{\sqrt{(17.5)(109.33)}} = -0.869$
i.e. a strong negative correlation.
(c) Scattergraph shows a curvilinear relationship. Data are not independent although r is close to zero (r = -0.102). There is a quadratic relationship in this example.
(d) No obvious relationship between the data, and the coefficient is close to zero (r = 0.113). No relationship.

2 (a) Graph shows a weak positive relationship with a fair degree of scatter. r = 0.365
(b) Both suggest that there is little evidence for claiming a linear relationship between the maths and physics marks.
3 (a) Graph looks roughly linear; r = 0.758
(b) There is some evidence for a linear relationship between body fat and high blood pressure.

4 (a) r = 0.389
(b) There is evidence of a linear relationship between father and son’s height up to 190 cm.

Exercise 2 - Linear Regression

1
x     1    1    5    5     Σx = 12
y     1    3    2    4     Σy = 10
xy    1    3   10   20     Σxy = 34
x²    1    1   25   25     Σx² = 52
y²    1    9    4   16     Σy² = 30

$S_{xx} = \sum x^2 - \frac{1}{n}(\sum x)^2 = 52 - \frac{1}{4}(12)^2 = 16$
$S_{xy} = \sum xy - \frac{1}{n}(\sum x)(\sum y) = 34 - \frac{1}{4}(12)(10) = 4$
The least squares regression line is $y = \alpha + \beta x$ where
β is estimated by $b = \frac{S_{xy}}{S_{xx}} = \frac{4}{16} = 0.25$
α is estimated by $a = \bar{y} - b\bar{x} = 2.5 - 0.25 \times 3 = 1.75$
Thus the fitted least squares model is $\hat{y} = 0.25x + 1.75$
(b) $0.25 \times 3 + 1.75 = 2.5$

2
x     0    2    2    5    6     Σx = 15
y     4    2    0   -4   -4     Σy = -2
xy    0    4    0  -20  -24     Σxy = -40
x²    0    4    4   25   36     Σx² = 69
y²   16    4    0   16   16     Σy² = 52

$S_{xx} = \sum x^2 - \frac{1}{n}(\sum x)^2 = 69 - \frac{1}{5}(15)^2 = 24$
$S_{xy} = \sum xy - \frac{1}{n}(\sum x)(\sum y) = -40 - \frac{1}{5}(15)(-2) = -34$
$b = \frac{-34}{24} = -1.42$ and $a = -0.4 - \left(\frac{-34}{24}\right)(3) = 3.85$
Thus the fitted least squares model is $y = 3.85 - 1.42x$

3
x      1    3    5    7    9   11   13   15     Σx = 64
y     30   24   22   19   16   13   10    6     Σy = 140
xy    30   72  110  133  144  143  130   90     Σxy = 852
x²     1    9   25   49   81  121  169  225     Σx² = 680
y²   900  576  484  361  256  169  100   36     Σy² = 2882

$S_{xx} = \sum x^2 - \frac{1}{n}(\sum x)^2 = 680 - \frac{1}{8}(64)^2 = 168$
$S_{xy} = \sum xy - \frac{1}{n}(\sum x)(\sum y) = 852 - \frac{1}{8}(64)(140) = -268$
$b = \frac{-268}{168} = -1.60$ and $a = 17.5 - (-1.60)(8) = 30.3$
Thus the fitted least squares model is $y = 30.3 - 1.60x$
(c) The predicted number of plants is 14.3.

4 (a) There appears to be a linear relationship between Maximum Heart Rate and Age.
(b) $S_{XX} = 13868 - \frac{1}{10}(366)^2 = 472.4$
$S_{XY} = 65329 - \frac{1}{10}(366)(1799) = -514.4$
$b = \frac{-514.4}{472.4} = -1.09$
$a = 179.9 - (-1.09)(36.6) = 220$
Equation is $y = 220 - 1.09x$
(c) Max.
Heart Rate of a 35 year old = 182

5 (a) y = 0.17 x + 37.0
(b) Family with income of £21 000 spends £4057 annually on food.

6 (a) There appears to be a positive linear relationship.
(b) y = 0.23 x + 54
(c) Pupil scores 66 in summer test.

7 (a) There appears to be a strong positive linear relationship.
(b) y = 1.66 x + 7.28
(c) cost per unit
(d) Production costs = £30 500

8 (b) y = 0.008 x + 0.023
(c) Expansion is 1.42 mm.

9 (a) There appears to be a strong positive linear relationship.
(b) y = 3.56 x - 24.0
(c) Stopping distance can be predicted for speeds between 9 and 32 m/s.
(d) No, speed is too great (> 32 m/s).

10 (b) y = 2380 x - 28.4
(c) Volume of gas produced is 352 cm³.

11 (a) y = 0.790 x - 65.9
(b) weight is 69.98 kg ; weight is 80.25 kg.
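
The hand calculations in Exercise 1 (question 1) and Exercise 2 (questions 1 to 3) can be checked with a short script. The sketch below is an illustration in Python rather than part of the original answers; the helper names `corrected_sums`, `least_squares_line` and `r_from_totals` are hypothetical. It uses the same formulas as the worked answers: S_xx = Σx² − (Σx)²/n, S_xy = Σxy − (Σx)(Σy)/n, b = S_xy / S_xx, a = ȳ − b x̄ and r = S_xy / √(S_xx S_yy).

```python
from math import sqrt


def corrected_sums(xs, ys):
    """Return (S_xx, S_yy, S_xy) for paired data."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxx, syy, sxy


def least_squares_line(xs, ys):
    """Return (a, b) for the fitted line y = a + b x."""
    n = len(xs)
    sxx, _, sxy = corrected_sums(xs, ys)
    b = sxy / sxx
    a = sum(ys) / n - b * sum(xs) / n
    return a, b


def r_from_totals(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Correlation coefficient computed from the column totals only."""
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    return sxy / sqrt(sxx * syy)


# Exercise 2, questions 1 to 3 (data taken from the tables in the answers).
a1, b1 = least_squares_line([1, 1, 5, 5], [1, 3, 2, 4])
a2, b2 = least_squares_line([0, 2, 2, 5, 6], [4, 2, 0, -4, -4])
a3, b3 = least_squares_line([1, 3, 5, 7, 9, 11, 13, 15],
                            [30, 24, 22, 19, 16, 13, 10, 6])

# Exercise 1, question 1 (a) and (b), using the totals quoted in the answers.
r_a = r_from_totals(6, 21, 88, 91, 1558, 375)
r_b = r_from_totals(6, 21, 76, 91, 1072, 228)

print(round(a1, 2), round(b1, 2))    # 1.75 0.25
print(round(a2, 2), round(b2, 2))    # 3.85 -1.42
print(round(a3, 1), round(b3, 2))    # 30.3 -1.6
print(round(r_a, 3), round(r_b, 3))  # 0.98 -0.869
```

In question 3 the script carries the unrounded slope through to the intercept, whereas the written answer uses the rounded value -1.60; both routes give 30.3 to one decimal place.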