F73PS1 - Term 1 - 2006-7
Tutorial 1 – Basic concepts, probabilities and frequencies
Solutions

1. Consider the following probabilities and discuss which of them can be interpreted as frequencies. If so, describe the repeatable experiment to which the frequency refers. Otherwise, give a reason why the probability has no frequentist interpretation.
i) the probability that you pass this module;
ii) the probability that a person randomly selected from the population scores 20 with a single throw of a dart;
iii) the probability that you score 20 with a single throw of a dart (what might happen if you tried to repeat the experiment?);
iv) the probability of scoring a double 6 with a pair of dice;
v) the probability that the human race will become extinct within 1000 years due to global warming.

Solution:
i) There is no repeatable experiment here, since the proposition refers to a set of circumstances which is not repeatable. Therefore this probability has no frequentist interpretation.
ii) There is a repeatable experiment here. We could repeatedly sample a person from the population and ask them to throw a dart, so the probability has an interpretation as a frequency.
iii) Arguably this experiment is not repeatable under identical conditions because each time you throw the dart, your competence may be expected to increase due to practice effects.
iv) Yes - there is a clear interpretation as a frequency, since you could repeatedly throw the dice under identical conditions.
v) This is definitely a one-off event and there is no interpretation of the probability as a frequency.

2. Let S be the sample space for an experiment and let P() define a probability function on the subsets of S. Use the axioms of probability to convince yourself that:
i) P(E) ≤ 1, for any event E.
ii) If E = {s1, s2, s3, ..., sk} then P(E) = P(s1) + P(s2) + ... + P(sk).
iii) P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

Solution: The axioms (see notes) state that:
A) P(E) ≥ 0 for all events E;
B) P(S) = 1;
C) if E and F have no outcomes in common then P(E ∪ F) = P(E) + P(F).
This leads to the following solutions for i)-iii) above.
i) Let E be any event (subset of the sample space S) and let F be the set of all outcomes that are not in E. (Notation: F = S\E or E^c.) Then E and F are disjoint and E ∪ F = S. It follows from B and C above that P(E) + P(F) = 1, so that P(E) = 1 - P(F). Moreover, P(F) ≥ 0 by A, so that P(E) ≤ 1.
ii) We can write E = {s1} ∪ {s2} ∪ {s3} ∪ ... ∪ {sk}, a union of disjoint events. Applying axiom C repeatedly, it follows that P(E) = P(s1) + P(s2) + ... + P(sk).
iii) Let B\A denote the set of elements that are in B but not in A and let A\B denote the set of elements in A but not in B. Then A ∪ B = (A\B) ∪ (A ∩ B) ∪ (B\A) and these three sets are disjoint from each other. From C we have that
P(A) = P(A\B) + P(A ∩ B),
P(B) = P(B\A) + P(A ∩ B),
P(A ∪ B) = P(A\B) + P(B\A) + P(A ∩ B).
Substituting P(A\B) = P(A) - P(A ∩ B) and P(B\A) = P(B) - P(A ∩ B) into the third line gives P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

3. Random sampling from a finite population. Suppose that we have n objects of which r share a particular characteristic. Suppose we select an object at random in such a way that each of the objects is equally likely to be chosen.
i) Let A denote the event that the selected object has the characteristic. Show from the axioms that P(A) = r/n.
Solution: There are n possible outcomes from this experiment, corresponding to the n different objects that can be drawn. Now r of these, {s1, ..., sr} say, form the event A. All n outcomes are equally likely and, by axiom B and question 2 ii), their probabilities sum to P(S) = 1, so P(s) = 1/n for any outcome s. It follows that P(A) = P(s1) + P(s2) + ... + P(sr) = r/n.
ii) Now suppose that the first object is not replaced and a second object is then drawn. Let B denote the event that the second object drawn has the characteristic. Discuss whether the events A and B are independent.
Solution: If A occurs then only (r-1) objects with the characteristic are left among the remaining (n-1), and the probability that B occurs in this case is (r-1)/(n-1). If A doesn't occur then after the removal of the first object there are still r objects with the characteristic, in which case the probability that B occurs is r/(n-1). The probability that B occurs therefore differs depending on whether A has occurred or not, so A and B are not independent.
iii) Discuss the circumstances under which we might reasonably claim that A and B are independent. How do they depend on the values of n and r?
Solution: Suppose n and r are large, e.g. n = 1000 and r = 200. Then the probability that B occurs is 199/999 if A has occurred, or 200/999 if A hasn't occurred. Both of these probabilities are very close to 0.2, so the probability that B occurs is not significantly affected by whether A has or hasn't occurred, and the two events can be treated as approximately independent. In general the two probabilities differ by 1/(n-1), which is a fraction 1/r of r/(n-1), so the approximation is good when both n and r are large.

4. 1000 randomly selected school pupils from Edinburgh and 1000 randomly selected school pupils from Glasgow take a test of mathematical ability. Out of the 10 highest-scoring students, 2 are from Edinburgh and 8 are from Glasgow. Suppose that the variation in ability to undertake the test in pupils is identical in the two cities.
i) Under this assumption what is the probability that the best student is from Glasgow?
Solution: If the variation in ability is identical then the best student can be considered to be a random draw from the set of all 2000 pupils and, by the result of question 3, the probability is 1000/2000 = 1/2.
ii) What is the probability that the 2nd best student is from Glasgow:
a) when the best student is from Edinburgh?
b) when the best student is from Glasgow?
Solution: Again, under the assumption of identical variation in ability in the two groups, the second best student is equally likely to be any of the 1999 pupils who remain after the best student is identified. Therefore in case a) this is 1000/1999, and in case b) it is 999/1999.
Both of these probabilities are approximately 1/2.
iii) Explain qualitatively why (roughly speaking) the number of students from Glasgow in the top 10 has the same probability function as the number of heads obtained when a fair coin is tossed ten times.
Solution: Repeating the logic of ii), we can see that the probability that the next best student is from Glasgow is only weakly affected by the city of origin of the students above them. For the 10th student, the probability that they are from Glasgow ranges from 991/1991 (≈ 0.498) to 1000/1991 (≈ 0.502), depending on how many students from Glasgow are placed above them. Therefore we can think of the origin of the person in the ith position, i = 1, 2, ..., 10, as being selected with equal probability from the set {Edinburgh, Glasgow} regardless of who is above them. This is the same as ten independent tosses of a fair coin.
iv) Calculate a p-value to quantify the strength of evidence against the hypothesis that there is no difference in mathematical ability of pupils in the two cities.
Solution: Our outcome of 8 from Glasgow in the top 10 looks a little extreme (perhaps) under the hypothesis that the variation is identical in the two cities. Under this hypothesis, the number of Glasgow students in the top 10, which we call X, follows approximately a Binomial(10, 0.5) distribution by the logic of part iii). To get a p-value we need to calculate the probability that we obtain an observation at least as extreme as the current one. This is P(X ≤ 2) + P(X ≥ 8) = 2P(X ≤ 2) (by symmetry). From tables this probability is 2 × 0.0547 = 0.109. When the variation is identical in the two cities we would get an observation at least as extreme around 11% of the time. This does not represent strong evidence against the hypothesis of equal ability.

5. The Binomial(n, p) distribution describes the distribution of the number of successes X that are recorded from n independent repetitions of a trial when the probability of success on any trial is p.
Consider the following experiments on individuals, each of which involves counting the number of successes:
i) A randomly chosen individual is asked to take 10 ‘shots’ at a basketball goal and the number of ‘baskets’ is counted.
ii) In a parapsychology experiment on telepathy, a randomly chosen person is asked to identify the (hidden) score on a fair die on 10 successive rolls, and the number of correct guesses is counted.
iii) Out of 20 identical boxes, 10 contain cash prizes and the remainder are empty. A contestant is asked to select 10 different boxes and the number of prizes is counted.
Discuss the extent to which the outcome of each of these experiments could be modelled with a Binomial(10, p) distribution for some suitable p. If not, give reasons, and suggest how and why the distribution of outcomes might deviate from a binomial distribution, for example by having too many extreme values.

Solution: The assumptions underlying the binomial distribution are that it counts the number of successes X out of n independent trials where the probability of success is constant for each trial. We need to see whether these assumptions are valid for the three cases.
i) Probably not binomial, since the probability of success p would vary between subjects, depending on ability at ball games, and for a given subject may tend to increase with the number of trials already performed (a practice effect). Over many subjects you would expect to see more very high scores and very low scores than a single binomial distribution would predict.
ii) It seems plausible that the outcome of this experiment would follow a binomial distribution (particularly if you're sceptical about telepathy as a real phenomenon). For any subject and any trial the probability that they guess correctly is, arguably, 1/6, so the scores out of 10 would follow a Binomial(10, 1/6) distribution.
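iii) The key thing to notice is that since they must select 10 different boxes, each time they select a box with a prize, it becomes harder to choose one of the remaining prizes. (See the discussion in question 3.) Therefore we can't think of the outcomes of the 10 trials as being independent with fixed probabilities, so the binomial distribution will not be a good representation. Because each success makes the next one less likely, and each failure makes it more likely, very high and very low counts would be rarer here than under a Binomial(10, 1/2) distribution.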
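
Two of the numerical claims above can be checked with a short script. The sketch below is not part of the original tutorial: the function names are hypothetical, and the without-replacement model used for question 5 iii) (the hypergeometric distribution, which the tutorial does not name) is an assumption about how the box experiment would be modelled. It recomputes the two-sided p-value from question 4 iv) and compares the box experiment with a Binomial(10, 0.5) distribution.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeometric_pmf(k, population, successes, draws):
    """P(X = k) successes when `draws` objects are taken without
    replacement from `population` objects, `successes` of which
    have the characteristic of interest."""
    return (
        comb(successes, k)
        * comb(population - successes, draws - k)
        / comb(population, draws)
    )


def two_sided_p_value(observed, n):
    """Two-sided p-value under Binomial(n, 0.5), doubling the smaller
    tail as in question 4 iv) (valid because the distribution is
    symmetric about n/2)."""
    tail = min(observed, n - observed)
    lower = sum(binomial_pmf(k, n, 0.5) for k in range(tail + 1))
    return min(1.0, 2 * lower)


# Question 4 iv): 8 of the top 10 pupils are from Glasgow.
print(f"p-value for 8 out of 10: {two_sided_p_value(8, 10):.4f}")

# Question 5 iii): 20 boxes, 10 prizes, 10 different boxes chosen,
# compared with 10 independent trials with success probability 0.5.
print(" k  without replacement  Binomial(10, 0.5)")
for k in range(11):
    h = hypergeometric_pmf(k, 20, 10, 10)
    b = binomial_pmf(k, 10, 0.5)
    print(f"{k:2d}  {h:19.6f}  {b:17.6f}")
```

The p-value comes out at about 0.109, in agreement with the value read from tables. In the comparison, the without-replacement probabilities are more concentrated around 5 prizes: for example, choosing no prizes at all has probability 1/184756 without replacement against 1/1024 under the binomial model.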