Review of probability concepts and distributions

Module H1 Practical 10
1. Probability Concepts
During an epidemic of a disease, a doctor sees 110 people who have symptoms commonly
associated with the disease. Of these, 45 are women, of whom 20 actually have the disease.
Fifteen of the men also have the disease. Suppose a person is selected at random from those
with symptoms seen by the doctor. Define events:
W: the selected person is a woman
D: the selected person has the disease
(a) Working in small groups of 3-4 persons per group, draw a Venn diagram for this
problem in the box below.
(b) Describe in words the events W, W ∩ D, W ∪ D, and W|D, and compute the probabilities
associated with each of these events.
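The probabilities in (b) can be checked with exact fractions. This is an illustrative sketch rather than part of the exercise; it takes the first event listed as W itself, and also prints P(not W) in case the complement of W was intended.

```python
from fractions import Fraction

# Counts taken from the problem statement
total = 110          # people with symptoms seen by the doctor
women = 45           # people in event W
women_diseased = 20  # people in W ∩ D
men_diseased = 15

diseased = women_diseased + men_diseased  # 35 people in D

p_w = Fraction(women, total)                 # P(W)
p_not_w = 1 - p_w                            # P(not W), the complement of W
p_d = Fraction(diseased, total)              # P(D)
p_w_and_d = Fraction(women_diseased, total)  # P(W ∩ D)
p_w_or_d = p_w + p_d - p_w_and_d             # P(W ∪ D) by the addition rule
p_w_given_d = p_w_and_d / p_d                # P(W | D) = P(W ∩ D) / P(D)

for name, value in [("P(W)", p_w), ("P(not W)", p_not_w), ("P(W ∩ D)", p_w_and_d),
                    ("P(W ∪ D)", p_w_or_d), ("P(W | D)", p_w_given_d)]:
    print(f"{name:10s} = {value} ≈ {float(value):.4f}")
```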
SADC Course in Statistics
Module H1 Practical 10 – Page 1
(c) If three people are selected at random, what is the probability that
(i) all three of them have the disease
(ii) exactly one of them has the disease?
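Part (c) does not say whether the three people are drawn with or without replacement. The sketch below computes both readings: without replacement the number of diseased people among the three is hypergeometric, and with replacement it is binomial with p = 35/110. The function names are illustrative.

```python
from fractions import Fraction
from math import comb

N = 110  # people with symptoms
K = 35   # of whom have the disease
n = 3    # people selected

def p_without_replacement(k):
    """Hypergeometric: choose k diseased from K and n - k healthy from N - K."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

def p_with_replacement(k):
    """Binomial with p = K/N: each selection is independent of the others."""
    p = Fraction(K, N)
    return comb(n, k) * p**k * (1 - p)**(n - k)

for label, k in [("all three diseased", 3), ("exactly one diseased", 1)]:
    a, b = p_without_replacement(k), p_with_replacement(k)
    print(f"{label}: without replacement {float(a):.4f}, with replacement {float(b):.4f}")
```

With only 3 people drawn from 110, the two readings give similar answers, which is why the binomial is often used as an approximation here.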
(d) Of people with the disease, 95% react positively to a diagnostic test, as do 8% of
people without the disease. What is the probability that a person selected at random
(i) reacts positively;
(ii) has the disease, given that he or she reacted positively?
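Part (d) applies the law of total probability and then Bayes' theorem. A minimal sketch, assuming the person is drawn from the same 110 symptomatic people, so that P(D) = 35/110:

```python
from fractions import Fraction

p_d = Fraction(35, 110)               # P(D), from the 110 people seen by the doctor
p_pos_given_d = Fraction(95, 100)     # P(+ | D)
p_pos_given_not_d = Fraction(8, 100)  # P(+ | not D), the false-positive rate

# (i) Law of total probability
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)
# (ii) Bayes' theorem: P(D | +) = P(+ | D) P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_d / p_pos

print(f"P(+)     = {p_pos} ≈ {float(p_pos):.4f}")
print(f"P(D | +) = {p_d_given_pos} ≈ {float(p_d_given_pos):.4f}")
```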
2. Joint Probability Distributions
Two discrete random variables X and Y have the joint probability distribution given in the
table below:
         X = 0   X = 1   X = 2
Y = 0      c     0.05    0.10
Y = 1    0.10    0.15    0.15
Y = 2    0.10    0.15    0.10
(i) Find the value of c.
(ii) Find the marginal distributions of X and Y, and use these to find the expectations and
variances of X and Y.
(iii) State, giving a reason, whether X and Y are independent.
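All three parts can be checked with a short script using exact fractions. It assumes the table has columns indexed by X and rows indexed by Y, with rows Y = 0: c, 0.05, 0.10; Y = 1: 0.10, 0.15, 0.15; Y = 2: 0.10, 0.15, 0.10. If your copy of the table differs, edit the joint array accordingly.

```python
from fractions import Fraction as F

# joint[y][x] = P(X = x, Y = y); None marks the unknown cell c (assumed layout)
joint = [
    [None,      F("0.05"), F("0.10")],  # Y = 0
    [F("0.10"), F("0.15"), F("0.15")],  # Y = 1
    [F("0.10"), F("0.15"), F("0.10")],  # Y = 2
]

# (i) All probabilities must sum to 1
known = sum(p for row in joint for p in row if p is not None)
c = 1 - known
joint[0][0] = c

values = [0, 1, 2]
# (ii) Marginal distributions: sum over the other variable
p_x = [sum(joint[y][x] for y in range(3)) for x in range(3)]
p_y = [sum(joint[y][x] for x in range(3)) for y in range(3)]

def mean_var(xs, probs):
    """E(V) and Var(V) = E(V^2) - E(V)^2 for a discrete distribution."""
    mean = sum(v * p for v, p in zip(xs, probs))
    var = sum(v * v * p for v, p in zip(xs, probs)) - mean ** 2
    return mean, var

ex, vx = mean_var(values, p_x)
ey, vy = mean_var(values, p_y)

# (iii) Independence needs P(X=x, Y=y) = P(X=x) P(Y=y) in every cell
independent = all(joint[y][x] == p_x[x] * p_y[y] for x in range(3) for y in range(3))

print("c =", c)
print("P(X):", [str(p) for p in p_x], "E(X) =", float(ex), "Var(X) =", float(vx))
print("P(Y):", [str(p) for p in p_y], "E(Y) =", float(ey), "Var(Y) =", float(vy))
print("independent:", independent)
```

A single cell where the joint probability differs from the product of the marginals is enough to show that X and Y are not independent.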
3. Probabilities associated with the Binomial distribution
Consider families with 3 children. Suppose that the proportion of girls in the whole
population is 0.488. Assume that the outcome of each birth in the family is independent of
previous birth outcomes. By
(a) Draw a tree diagram to show all possible outcomes. Use this tree diagram to find the
probability that in a given family
(i) there will be exactly 3 boys;
(ii) there will be 1 boy and 2 girls;
(iii) there will be fewer than 2 girls.
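The tree diagram in (a) has eight paths, one for each ordered sequence of three births. The sketch below enumerates them and adds up the path probabilities for each event; it relies on the independence assumption stated in the question, and the B/G labels and helper name are illustrative.

```python
from itertools import product

p_girl = 0.488
p_boy = 1 - p_girl  # 0.512

# Each path through the tree is one sequence of three births, e.g. "BGG".
# Independence means a path's probability is the product of its branch probabilities.
paths = {}
for path in product("BG", repeat=3):
    prob = 1.0
    for child in path:
        prob *= p_girl if child == "G" else p_boy
    paths["".join(path)] = prob

def prob_where(condition):
    """Add up the probabilities of all paths for which condition(path) is true."""
    return sum(p for path, p in paths.items() if condition(path))

p_three_boys = prob_where(lambda s: s.count("B") == 3)
p_one_boy_two_girls = prob_where(lambda s: s.count("B") == 1)
p_fewer_than_two_girls = prob_where(lambda s: s.count("G") < 2)

for path, p in paths.items():
    print(path, round(p, 6))
print("(i)", round(p_three_boys, 4), "(ii)", round(p_one_boy_two_girls, 4),
      "(iii)", round(p_fewer_than_two_girls, 4))
```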
(b) Define an appropriate binomially distributed random variable concerning the birth
outcomes in a family, and use it to determine the same set of probabilities requested in (a)
above. Check that you get the same answers.
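For (b), one natural choice is X = number of girls in a family of three, so that X ~ Binomial(3, 0.488); counting boys with p = 0.512 would work equally well. The helper binom_pmf is written out here for transparency rather than taken from a library.

```python
from math import comb

n, p = 3, 0.488  # X = number of girls in a family of 3 children; P(girl) = 0.488

def binom_pmf(k, n, p):
    """P(X = k) = C(n, k) p^k (1 - p)^(n - k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_three_boys = binom_pmf(0, n, p)          # 3 boys means 0 girls
p_one_boy_two_girls = binom_pmf(2, n, p)   # 1 boy means 2 girls
p_fewer_than_two_girls = binom_pmf(0, n, p) + binom_pmf(1, n, p)

print(round(p_three_boys, 4), round(p_one_boy_two_girls, 4), round(p_fewer_than_two_girls, 4))
```

The C(3, k) factor counts the tree paths with k girls, which is why the binomial answers match those read off the tree.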
(c) Discuss in small groups of 3 or 4 persons whether you would expect to obtain the above
probabilities if they were calculated using the frequentist approach to probability, i.e. based
on examining the records of a very large number of families with three live births. If you
think the probabilities would be different, can you explain why? How does this relate to the
assumptions under which the binomial model is valid?
4. Probabilities associated with the Poisson distribution
(a) The number of text messages received by a student in an hour follows a Poisson
distribution with mean 2. Find the probability that in a given hour the student will receive
(i) no text messages
(ii) exactly 2 text messages
(iii) at least one text message.
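The Poisson probability function P(X = k) = λ^k e^(-λ) / k! with λ = 2 gives all three answers in (a); "at least one" is most easily found as 1 - P(X = 0). A minimal sketch:

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean)."""
    return mean**k * exp(-mean) / factorial(k)

mean = 2  # text messages per hour
p_none = poisson_pmf(0, mean)
p_exactly_two = poisson_pmf(2, mean)
p_at_least_one = 1 - p_none  # complement of "no messages"

print(round(p_none, 4), round(p_exactly_two, 4), round(p_at_least_one, 4))
```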
(b) Often, as in the example above, an interval of time, distance or some other quantity is
specified when dealing with the Poisson distribution. One further property is useful here.
Suppose, for example, that we expect 4 flaws in 10 metres of fabric; then we would “expect”
2 flaws in 5 metres, 40 flaws in 100 metres, and so on. A special property of the Poisson
distribution is that the number of flaws in a different length of the fabric will also have a
Poisson distribution, with the mean scaled in proportion to the length (2, 40, etc.).
With this knowledge, what is the probability that the student will receive no text messages
in
(i) a half-hour period?
(ii) a five-hour period?
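Under the scaling property just described, the number of messages in a period of t hours has a Poisson distribution with mean 2t, so P(no messages) = e^(-2t). This sketch assumes the hourly mean of 2 from part (a) applies uniformly over time; the helper name is illustrative.

```python
from math import exp

rate_per_hour = 2  # mean number of messages per hour, from part (a)

def p_no_messages(hours):
    """P(no messages in a period of `hours` hours), Poisson with mean rate * length."""
    mean = rate_per_hour * hours
    return exp(-mean)  # Poisson P(X = 0) = e^(-mean)

p_half_hour = p_no_messages(0.5)  # mean 1
p_five_hours = p_no_messages(5)   # mean 10
print(round(p_half_hour, 4), p_five_hours)
```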
5. Probabilities associated with the Normal distribution
Try either (a) or (b) by "hand", then check your answer using an appropriate Excel
function. In either case, start by identifying and writing down a description for a suitable
random variable X. Then proceed to determine the answer.
(a) A blood bank provides on average 144 pints of blood per day, with a standard
deviation of 10 pints. The capacity of the blood bank has been worked out
assuming that not more than 160 pints per day will be requested. On what
percentage of days will this 160-pint limit be exceeded?
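As a check on the hand calculation for (a), take X as the number of pints requested on a given day and assume X ~ Normal(144, 10²). Here Python's statistics.NormalDist stands in for the Excel check; in Excel the corresponding call would be NORM.DIST(160, 144, 10, TRUE), which returns P(X ≤ 160).

```python
from statistics import NormalDist

# X = pints requested on a day, assumed X ~ Normal(mean 144, sd 10)
demand = NormalDist(mu=144, sigma=10)
limit = 160

z = (limit - demand.mean) / demand.stdev  # standardised value
p_exceed = 1 - demand.cdf(limit)          # P(X > 160) = 1 - P(X <= 160)

print(f"z = {z:.2f}, P(X > {limit}) = {p_exceed:.4f} ({100 * p_exceed:.1f}% of days)")
```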
(b) It is known that a dry spell of more than 10 days during the establishment period
of a certain crop will result in crop failure. The crop is to be grown at a new site
where past rainfall records show that the number of consecutive dry days during the
crop establishment period has a normal distribution with mean 6.5 days and standard
deviation 2.2 days. What is the chance that the crop will fail if grown at this site?
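The same approach applies to (b), with X the number of consecutive dry days during the establishment period and X ~ Normal(6.5, 2.2²). This sketch treats the count as continuous, as the question does, and applies no continuity correction.

```python
from statistics import NormalDist

# X = consecutive dry days in the establishment period, X ~ Normal(mean 6.5, sd 2.2)
dry_days = NormalDist(mu=6.5, sigma=2.2)
threshold = 10  # a dry spell of more than 10 days means crop failure

z = (threshold - dry_days.mean) / dry_days.stdev
p_fail = 1 - dry_days.cdf(threshold)  # P(X > 10)

print(f"z = {z:.3f}, P(crop fails) = {p_fail:.4f}")
```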
6. Identifying the appropriate distribution
Open the file H1_data.xls and move to the sheet named Mothers. These data are a
subset of the data from a small survey carried out in one region of a southern African country
to collect information on mothers who have had a child in the previous 12 months. A
description of the data appears on page 8.
Look at the data set and discuss in small groups of 3 or 4 persons which variables could
reasonably be treated as random variables with binomial, Poisson or Normal distributions. If
you have time, produce frequency tables or graphs of these variables to check how
appropriate your answers are.
Note down any comments you have below.
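One way to check whether a count variable such as prebirths is plausibly Poisson is to compare its observed frequencies with those expected under a Poisson distribution whose mean equals the sample mean. The function below is a hypothetical helper, not part of the course materials. The values would first need to be loaded from the Mothers sheet, for example with pandas.read_excel("H1_data.xls", sheet_name="Mothers") (reading .xls files with pandas requires the xlrd package); the list in the usage lines is made up purely to show the call.

```python
from collections import Counter
from math import exp, factorial

def poisson_check(counts):
    """Compare observed frequencies of a count variable with Poisson expectations.

    The Poisson mean is estimated by the sample mean. Returns a list of
    (k, observed frequency, expected frequency) for k = 0 .. max(counts).
    """
    n = len(counts)
    mean = sum(counts) / n
    observed = Counter(counts)
    rows = []
    for k in range(max(counts) + 1):
        expected = n * mean**k * exp(-mean) / factorial(k)
        rows.append((k, observed[k], expected))
    return rows

# Illustrative values only, not taken from the survey data
for k, obs, expected in poisson_check([0, 1, 1, 2, 0, 3]):
    print(f"k = {k}: observed {obs}, expected {expected:.2f}")
```

Expected frequencies for values above the largest observed count are not listed, so the expected column sums to slightly less than the number of mothers. Large, systematic differences between the two columns suggest the Poisson model is a poor fit.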
Listing of data from survey of mothers
-------------------------------------------------------------------------------
                storage  display    value
variable name   type     format     label      variable label
-------------------------------------------------------------------------------
area            byte     %8.0g      arealab    Area of dwelling (1=Rural, 2=Urban)
pov             byte     %9.0g      poorlab    Level of poverty in HH (1=not poor, 2=poor, 3=very poor)
age             byte     %8.0g                 Age of mother (years)
length          byte     %8.0g                 Length of pregnancy period (approx. weeks)
labour          byte     %8.0g                 Length of labour (hours)
prebirths       byte     %8.0g                 Number of previous live births
hlthcare        byte     %8.0g      yesno      Had access to health advice during pregnancy
hhead           byte     %8.0g      yesno      Is mother head of household (1=Yes, 2=No)
sex             byte     %8.0g      sexlab     Sex of child (1=F, 2=M)
weight          float    %9.0g                 Birth weight of child (kg)
-------------------------------------------------------------------------------
Note: These data come from a survey carried out many years ago and have been modified to
suit the case study work here. They should not be regarded as giving real, usable results, but
for the purpose of the exercise you should assume that the data are real.