1180:Lab9
James Moore
March 26th, 2012
1 Diagnosing Bloaty Head
We’ve already discussed how to make a differential diagnosis using Bayes’ theorem. In the original
example, every test could return only two results: positive or negative. Here the test result is a
continuous measurement instead.
Let’s look at the key diagnostic tool for analyzing bloaty head (BH): head circumference. The mean
head circumference is 55 cm in healthy individuals versus 56.6 cm in BH patients. Both populations
are normally distributed with a standard deviation of 1.2 cm. Suppose 75% of individuals are normal
and 25% have BH. We can generate a joint probability distribution to represent these facts. One
random variable represents whether or not a patient has BH and is either 0 or 1. The other random
variable is head circumference. We can’t list the probability for every value of head circumference
because it is a continuous random variable. Instead we will plot the probability density.
First, plot the distribution for the ‘normal’ group. Note that we multiply by .75 to reflect that
75% of the population is normal.
head_circlist=seq(51,61,.1)
plot(head_circlist,.75*dnorm(head_circlist,mean=55,sd=1.2),
     xlab='Head Circumference',type='l')
Then we add on the distribution for the ‘Bloaty Head’ group.
lines(head_circlist,.25*dnorm(head_circlist,mean=56.6,sd=1.2))
Save this plot.
Next, we calculate the marginal distribution of head circumference. We do this
simply by adding together the distributions from the two groups.
headcirc_mpf=.75*dnorm(head_circlist,mean=55,sd=1.2)+
  .25*dnorm(head_circlist,mean=56.6,sd=1.2)
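As a quick sanity check, the marginal density should integrate to 1, because the two weighted group densities carry probabilities .75 and .25. The following is a minimal sketch; the helper name headcirc_density and the integration limits are assumptions made for illustration and are not part of the lab.

```r
# Hypothetical helper (not from the lab): marginal density of head circumference.
headcirc_density <- function(x) {
  .75 * dnorm(x, mean = 55, sd = 1.2) +
    .25 * dnorm(x, mean = 56.6, sd = 1.2)
}

# Integrate over a range more than 8 standard deviations wide on each side;
# the tails outside 45..67 are negligible, so the area should be very close to 1.
total_area <- integrate(headcirc_density, lower = 45, upper = 67)$value
print(total_area)

# A Riemann sum over the plotted grid (step .1) comes out slightly below 1
# because the grid stops at 51 and 61.
grid <- seq(51, 61, .1)
grid_area <- sum(headcirc_density(grid)) * .1
print(grid_area)
```

If the area were noticeably different from 1, the group weights or the way the two terms are added would be worth rechecking.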
Now we can use this marginal distribution to calculate the conditional probability that someone
has bloaty head given their head circumference. Note that the formula below is exactly Bayes’
theorem. It’s also the same as taking a single column (or row) of a joint probability table and
dividing by its total.
plot(head_circlist,.25*dnorm(head_circlist,mean=56.6,sd=1.2)/headcirc_mpf,
xlab="Head Circumference",ylab="Probability of Bloaty Head")
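The same Bayes’ theorem calculation can be wrapped in a function, so that the probability can be read off at a single circumference rather than from the graph. This is a minimal sketch; the name posterior_bh and its arguments are illustrative assumptions, not part of the lab.

```r
# Hypothetical helper: P(BH | head circumference = x) via Bayes' theorem.
posterior_bh <- function(x, prior_bh = .25, mean_normal = 55,
                         mean_bh = 56.6, sd = 1.2) {
  # Joint density of (BH, x) and (normal, x): prior times group density.
  joint_bh     <- prior_bh * dnorm(x, mean = mean_bh, sd = sd)
  joint_normal <- (1 - prior_bh) * dnorm(x, mean = mean_normal, sd = sd)
  # Divide the BH part by the marginal density of x.
  joint_bh / (joint_bh + joint_normal)
}

# Posterior at a few example circumferences (in cm).
example_posteriors <- posterior_bh(c(55, 56.6, 58))
print(example_posteriors)
```

Because the two groups share the same standard deviation, the probability returned by this function increases steadily with circumference.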
We want to know when we are 95% sure that someone has bloaty head. We put a line at .95 to
help read the graph.
lines(head_circlist,.95+0*head_circlist)
Save this plot and record what circumference is needed to be 95% sure.
We can use the pnorm function (which is just the cdf) to figure out what percentage of BH positive
people have an undetectable disease. Suppose we determine the threshold to be 58. It’s not that, but
suppose that it were. The call below evaluates the cumulative distribution function of the BH group
at 58. In other words, it tells us what percentage of BH patients have a head circumference below 58.
pnorm(58,mean=56.6,sd=1.2)
[1] 0.8783275
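To see that pnorm really is the integral of the density, the same number can be recovered by integrating dnorm numerically. This is a minimal sketch using the hypothetical threshold of 58 from the text; the variable names and the lower integration limit are illustrative assumptions.

```r
# Fraction of BH patients below the (hypothetical) threshold of 58 cm.
below_cdf <- pnorm(58, mean = 56.6, sd = 1.2)

# The same quantity as an integral of the BH density. The lower limit 45 is
# more than 9 standard deviations below the BH mean, so it stands in for -Inf.
bh_density <- function(x) dnorm(x, mean = 56.6, sd = 1.2)
below_integral <- integrate(bh_density, lower = 45, upper = 58)$value

print(c(below_cdf, below_integral))

# Fraction of BH patients at or above the threshold.
above_cdf <- 1 - below_cdf
print(above_cdf)
```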
Record what percentage of BH positive people won’t be detectable with 95% confidence.
2 Adding a test for Height
The test in the previous section kind of sucked. This is because of the large amount of variability
in head circumference. Much of this variability can actually be explained by height. As we don’t
yet have much (any) experience with multivariate statistics, we’ll do all of this with simulation.
First, we assign patients to be either BH positive or not. Their heights are then drawn from a
predetermined distribution. The head circumference depends on both height and whether the patient
is BH positive. There is also some randomness in that measurement. We collect all variables into a
data frame for convenience.
N=10000
BH_pos=rbinom(N,size=1,prob=.25)
height=rnorm(N,mean=165,sd=12)
hcirc=rnorm(N,mean=.1*height+38.5+1.6*BH_pos,sd=.3)
Patients=data.frame(BH_pos,height,hcirc)
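Because the simulation uses random draws, the numbers differ from run to run. The following is a minimal sketch of making a run repeatable and checking it against the figures from Section 1; the seed value and the _sim variable names are arbitrary choices for illustration, not part of the lab.

```r
# Fix the random seed so the run is repeatable (the value 1180 is arbitrary).
set.seed(1180)

N_sim <- 10000
BH_sim <- rbinom(N_sim, size = 1, prob = .25)
height_sim <- rnorm(N_sim, mean = 165, sd = 12)
hcirc_sim <- rnorm(N_sim, mean = .1 * height_sim + 38.5 + 1.6 * BH_sim, sd = .3)

# Group means should be near 55 (normal) and 56.6 (BH), as in Section 1.
group_means <- tapply(hcirc_sim, BH_sim, mean)
print(group_means)

# Within-group sd should be near sqrt((.1 * 12)^2 + .3^2), about 1.24.
group_sds <- tapply(hcirc_sim, BH_sim, sd)
print(group_sds)

# After removing the height effect, only the sd = .3 measurement noise remains.
resid_sds <- tapply(hcirc_sim - .1 * height_sim, BH_sim, sd)
print(resid_sds)
```

The gap between the last two sets of standard deviations shows how much of the spread in head circumference comes from height.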
We have used the summary command before. Now, we can see that this actually gives us an
estimate of the marginal probability distributions for all of our random variables.
> summary(Patients)
     BH_pos           height          hcirc
 Min.   :0.0000   Min.   :122.2   Min.   :50.61
 1st Qu.:0.0000   1st Qu.:156.9   1st Qu.:54.43
 Median :0.0000   Median :165.1   Median :55.38
 Mean   :0.2484   Mean   :165.1   Mean   :55.41
 3rd Qu.:0.0000   3rd Qu.:173.4   3rd Qu.:56.36
 Max.   :1.0000   Max.   :222.9   Max.   :61.20
This is not very useful by itself, but it’s good practice to always do this with a data frame. In
addition, we should always take a look at what we’ve actually computed with the head command.
head(Patients)
  BH_pos   height    hcirc
1      0 157.3137 54.70650
2      1 175.2231 57.99186
3      1 146.6770 54.41488
4      0 162.3073 55.03319
5      0 151.8542 53.38065
6      1 165.7490 57.28193
Another nice thing about data frames is that R can ‘intuitively understand’ them. So we can give
the basic plot command, with only one extra option to make the plotted points a bit smaller.
plot(Patients,pch='.')
Save the resulting plot. Explain what is plotted in each box. Answer this question
later if it’s unclear immediately.
Although this plot is a good start, we’d like to visualize the relationship between all three
variables. First we’ll plot the joint distribution of height and head circumference in the normies.
plot(height~hcirc,Patients,subset=BH_pos==0,pch='.')
Add the joint distribution of bloaty headed patients to this plot as red dots. Then
save the plot.
3 More Questions
1. Create histograms to show the marginal distributions of height and head circumference. Save
both plots. (Hint: use the function hist.)
2. To make a diagnosis, which conditional probability distribution will we need to calculate?
3. Use the cov and cor R functions on your data frame. Record the results. Analyze what each
result in the cor table means. Why is the table symmetric?
4. How does the addition of the height measurement help? Justify your answer using the joint
probability distributions with and without the extra measurement.