SADC Course in Statistics
Module H1 Practical 10

Review of probability concepts and distributions

1. Probability Concepts

During an epidemic of a disease, a doctor sees 110 people who have symptoms commonly associated with the disease. Of these, 45 are women, of whom 20 actually have the disease. 15 of the men also have the disease. Suppose a person is selected at random from those with symptoms seen by the doctor. Define the events:

    W: the selected person is a woman
    D: the selected person has the disease

(a) Working in small groups of 3-4 persons per group, draw a Venn diagram for this problem in the box below.

(b) Describe in words the events W, W ∩ D, W ∪ D, and W|D, and compute the probabilities associated with each of these events.

(c) If three people are selected at random, what is the probability that
    (i) all three of them have the disease
    (ii) exactly one of them has the disease?

(d) Of people with the disease, 95% react positively to a diagnostic test, as do 8% of people without the disease. What is the probability of a person selected at random
    (i) reacting positively
    (ii) having the disease given that he or she reacted positively?

2. Joint Probability Distributions

Two discrete random variables X and Y have the joint probability distribution given in the table below:

    X Y   0   1   2
    0     c   0.05   0.10
    1 2   0.10 0.15 0.15 0.10 0.15 0.10

(i) Find the value of c.
(ii) Find the marginal distributions of X and Y, and use these to find the expectations and variances of X and Y.
(iii) State, giving a reason, whether X and Y are independent.

3. Probabilities associated with the Binomial distribution

Consider families with 3 children. Suppose that the proportion of girls in the whole population is 0.488.
Assume that the outcome of each birth in the family is independent of previous birth outcomes.

(a) Draw a tree diagram to show all possible outcomes. Use this tree diagram to find the probability that in a given family
    (i) there will be exactly 3 boys;
    (ii) there will be 1 boy and 2 girls;
    (iii) there will be fewer than 2 girls.

(b) Define an appropriate binomially distributed random variable concerning the birth outcomes in a family, and use it to determine the same set of probabilities requested in (a) above. Check that you get the same answers.

(c) Discuss in small groups of 3 or 4 persons whether the above probabilities are likely to be the same if they are calculated using the frequentist approach to probability, i.e. based on examining the records of a very large number of families experiencing three live births. If you think the probabilities will be different, can you explain why? How does this relate to the assumptions under which the binomial model is valid?

4. Probabilities associated with the Poisson distribution

(a) The number of text messages received by a student in an hour follows a Poisson distribution with mean 2. Find the probability that in a given hour the student will receive
    (i) no text messages
    (ii) exactly 2 text messages
    (iii) at least one text message.

(b) Often, as in the example above, some sort of interval of time, distance, or other measure is specified when dealing with the Poisson distribution. There is one further property that is useful. Suppose, for example, that 4 flaws are expected in 10 metres of fabric; then we would "expect" 2 flaws in 5 metres, 40 flaws in 100 metres, etc. A special property of the Poisson distribution is that the number of flaws in different lengths of the fabric will also have a Poisson distribution with the appropriate new mean (2, 40, etc.).
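The rescaling property described above can be sketched in a few lines of Python, using the fabric example (4 flaws expected in 10 metres). This is only an illustration: the function names rescaled_mean and poisson_pmf are choices made for this sketch and are not part of the course materials, and only the Python standard library is assumed.

```python
import math


def rescaled_mean(base_mean, base_size, new_size):
    """Mean count for an interval of new_size, given base_mean per base_size."""
    return base_mean * new_size / base_size


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Fabric example from the text: 4 flaws expected in 10 metres.
for metres in (5, 10, 100):
    mean = rescaled_mean(4, 10, metres)
    # The count in this length is again Poisson, now with the rescaled mean.
    print(f"{metres} m: mean = {mean:g}, P(no flaws) = {poisson_pmf(0, mean):.4g}")
```

The same two functions apply to intervals of time: pass the mean per hour with base_size=1 and give new_size as the length of the new period in hours.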
With this knowledge, what is the probability that the student will receive no text messages in
    (i) a half-hour period?
    (ii) a five-hour period?

5. Probabilities associated with the Normal distribution

Try either (a) or (b) by hand, then check your answer using an appropriate Excel function. In either case, start by identifying and writing down a description of a suitable random variable X. Then proceed to determine the answer.

(a) A blood bank provides on average 144 pints of blood per day, with a standard deviation of 10 pints. The capacity of the blood bank has been worked out assuming that not more than 160 pints per day will be requested. On what percentage of days will this 160-pint limit be exceeded?

(b) It is known that a dry spell of more than 10 days, during the period of establishment of a certain crop, will result in crop failure. The crop is to be grown at a new site where past rainfall records show that the number of continuous dry days during the crop establishment period has a normal distribution with mean 6.5 days and standard deviation 2.2 days. What is the chance that the crop will fail if grown at this site?

6. Identifying the appropriate distribution

Open the file H1_data.xls and move to the sheet named Mothers. This sheet contains a subset of data from a small survey carried out in one region of a southern African country to collect information on mothers who have had a child in the previous 12 months. A description of the data appears at the end of this practical.

Look at the data set and discuss in small groups of 3 or 4 persons which variables could be identified as random variables having binomial, Poisson or Normal distributions. If you have time, you could produce frequency tables or graphs of these variables to check the extent to which your answers are appropriate.
Note down any comments you have below.

Listing of data from survey of mothers

------------------------------------------------------------------------------
                storage  display   value
variable name   type     format    label     variable label
------------------------------------------------------------------------------
area            byte     %8.0g     arealab   Area of dwelling (1=Rural, 2=Urban)
pov             byte     %9.0g     poorlab   Level of poverty in HH (1=not poor, 2=poor, 3=very poor)
age             byte     %8.0g               Age of mother (years)
length          byte     %8.0g               Length of pregnancy period (approx. weeks)
labour          byte     %8.0g               Length of labour (hours)
prebirths       byte     %8.0g               Number of previous live births
hlthcare        byte     %8.0g     yesno     Had access to health advice during pregnancy
hhead           byte     %8.0g     yesno     Is mother head of household (1=Yes, 2=No)
sex             byte     %8.0g     sexlab    Sex of child (1=F, 2=M)
weight          float    %9.0g               Birth weight of child (kg)
------------------------------------------------------------------------------

Note: These data come from a survey carried out many years ago and have been modified to suit the case study work here. They should not be regarded as showing real usable results, but for the purpose of the exercise, you should assume the data are real.
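For the optional check in question 6, one possible approach is to tabulate a variable and set its observed relative frequencies beside the probabilities of a candidate distribution. The Python sketch below does this for a Poisson candidate with the same mean as the sample. It rests on assumptions: the list counts is made-up placeholder data, not values from H1_data.xls, and the helper names frequency_table and poisson_pmf are hypothetical. In practice the values would be copied or read from whichever column of the Mothers sheet your group wants to check.

```python
import math
from collections import Counter


def frequency_table(values):
    """Return {value: relative frequency}, ordered by value."""
    n = len(values)
    tally = Counter(values)
    return {v: tally[v] / n for v in sorted(tally)}


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Placeholder counts (NOT from the survey); replace with a column of the sheet.
counts = [0, 1, 1, 2, 0, 3, 1, 2, 4, 1, 0, 2]
mean = sum(counts) / len(counts)

print("value  observed  Poisson")
for value, observed in frequency_table(counts).items():
    print(f"{value:5d}  {observed:8.3f}  {poisson_pmf(value, mean):7.3f}")
```

Large gaps between the two columns would suggest that the Poisson model is a poor description of that variable; with a small sample, some disagreement is to be expected.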