Lab Assignment 1

Math 58B - Introduction to Biostatistics
Spring 2015
Jo Hardin
Lab Goals:
There are two goals for this lab.
1. To familiarize yourself with using R.
2. To understand how to compute binomial probabilities using R.
Data from class:
http://pages.pomona.edu/~jsh04747/courses/math58/TimFace.txt
TimFace <- read.delim("http://pages.pomona.edu/~jsh04747/courses/math58/TimFace.txt")
TimFace
##    TimLft TimRt
## 1       1     0
## 2       1     0
## 3       1     0
## 4       1     0
## 5       0     1
## 6       0     1
## 7       0     1
## 8       0     1
## 9       1     0
## 10      1     0
## 11      0     1
## 12      0     1
## 13      1     0
## 14      0     1
## 15      1     0
## 16      1     0
## 17      1     0
## 18      0     1
## 19      1     0
## 20      1     0
## 21      0     1
## 22      1     0
## 23      0     1
## 24      0     1
attach(TimFace)
load(url("http://www.rossmanchance.com/iscam2/ISCAM.RData"))
library(mosaic)
In class
During the lab, go through sections (a) through (n) of Investigation 1.2. Make sure you
know how to create a barplot (also, try using bargraph from mosaic!) and use the Binomial
probability calculator. (See page 19 in your book.)
Test the binomial probability calculators in two different ways.
Using ISCAM binomial probabilities:
iscambinomprob(k=7, n=24, prob=.5, lower.tail=FALSE)
## Probability 7 and above = 0.9886721
## [1] 0.9886721
iscambinomprob(k=7, n=24, prob=.5, lower.tail=TRUE)
## Probability 7 and below = 0.03195733
## [1] 0.03195733
iscambinomprob(k=7, n=24, prob=.25, lower.tail=FALSE)
## Probability 7 and above = 0.3925877
## [1] 0.3925877
iscambinomprob(k=18, n=24, prob=.5, lower.tail=FALSE)
## Probability 18 and above = 0.01132792
## [1] 0.01132792
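Using mosaic (plotDist) and base R (pbinom) binomial probabilities. Note that pbinom finds
probabilities to the left by default (i.e., the lower tail, P(X ≤ q)); its lower.tail=FALSE option
gives P(X > q), which excludes q itself. The iscambinomprob function has an option to find
probabilities to the left or right, and it includes k in both directions.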
# note X
plotDist("binom", params=list(24,.30), groups= x < 3)
# note Y
plotDist( "binom", params=list(24,.25), groups= y < .05 )
pbinom(q = 7, size = 24, prob = 0.5)
## [1] 0.03195733
pbinom(q = 7, size = 24, prob = 0.25)
## [1] 0.7662042
pbinom(q = 18, size = 24, prob = 0.5)
## [1] 0.9966946
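Because pbinom returns P(X ≤ q), an "and above" probability needs one extra step. The sketch below shows three equivalent ways to get P(X ≥ 7) for n = 24 and probability 0.5 using only base R; the variable names are illustrative and not part of the lab.

```r
n <- 24
p <- 0.5

# P(X >= 7) = 1 - P(X <= 6): subtract the lower tail that stops one below 7
upper_a <- 1 - pbinom(6, size = n, prob = p)

# lower.tail = FALSE returns P(X > 6), which is the same event as X >= 7
upper_b <- pbinom(6, size = n, prob = p, lower.tail = FALSE)

# Or add up the individual probabilities P(X = 7), ..., P(X = 24)
upper_c <- sum(dbinom(7:n, size = n, prob = p))

# Each value matches the 0.9886721 reported by iscambinomprob above
c(upper_a, upper_b, upper_c)
```

The common mistake is to write 1 - pbinom(7, ...), which drops P(X = 7) and gives P(X ≥ 8) instead.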
[This part is just for practice. There is nothing to turn in until the section below labeled "To turn in".]
Now let's investigate what happens as n, the number of trials (here a trial is a student
choosing a face), gets bigger. Let p̂ = # successes / # trials. For each of n = 24, 240, 2400,
find
1. P( p̂ = 0.5 )
2. P( 0.45 ≤ p̂ ≤ 0.55 )
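One possible way to set up these two calculations is sketched below. It assumes the probability of success is 0.5 (the lab does not state a value here, so treat pi0 as an assumption), and it converts each statement about p̂ into a statement about the count X = n × p̂. The names pi0, ns, and res are illustrative.

```r
pi0 <- 0.5                 # ASSUMPTION: success probability on each trial
ns  <- c(24, 240, 2400)    # numbers of trials from the lab

res <- t(sapply(ns, function(n) {
  mid <- n / 2                    # p-hat = 0.5   <=>  X = n/2
  lo  <- ceiling(45 * n / 100)    # p-hat >= 0.45 <=>  X >= lo
  hi  <- floor(55 * n / 100)      # p-hat <= 0.55 <=>  X <= hi
  c(n        = n,
    p_exact  = dbinom(mid, size = n, prob = pi0),
    p_within = pbinom(hi, size = n, prob = pi0) -
               pbinom(lo - 1, size = n, prob = pi0))
}))

res
```

As n grows, the chance of landing exactly on 0.5 shrinks while the chance of landing within 0.05 of it grows, which is the pattern this part of the lab asks you to notice.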
Also, be sure you know how to label your plots (xlab for the label on the x-axis, ylab for the
label on the y-axis).
summary(TimFace)
##      TimLft           TimRt
##  Min.   :0.0000   Min.   :0.0000
##  1st Qu.:0.0000   1st Qu.:0.0000
##  Median :1.0000   Median :0.0000
##  Mean   :0.5417   Mean   :0.4583
##  3rd Qu.:1.0000   3rd Qu.:1.0000
##  Max.   :1.0000   Max.   :1.0000
table(TimFace)
##       TimRt
## TimLft  0  1
##      0  0 11
##      1 13  0
barplot(table(TimFace), xlab="side", ylab="frequency", names.arg=c("TimLft",
"TimRt"))
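An alternative sketch of the same plot is given below. It counts each column directly instead of plotting the two-way table, so each bar is simply the number of students who put Tim on that side. To keep the example self-contained, the 24 responses are re-entered by hand from the TimFace printout earlier in this lab; the object names and the axis label wording are illustrative.

```r
# Class responses copied from the TimFace printout (1 = Tim placed on the left)
TimLft <- c(1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0,
            1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0)
TimRt  <- 1 - TimLft   # every student chose exactly one side

# One count per side; the names become the bar labels
counts <- c(TimLft = sum(TimLft), TimRt = sum(TimRt))
counts

# xlab and ylab set the axis labels
barplot(counts, xlab = "side", ylab = "frequency")
```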
To turn in
Turn in answers to the following questions in the setting of Practice Problem 1.2.
a. State the null and alternative hypotheses for testing whether Marine is more likely to
   pick the cancer patient than if he were just randomly guessing between the five
   patients.
b. Assume Marine was correct in 30 of the 33 attempts. Determine the p-value using
   BOTH the function iscambinomtest and pbinom (you should get the same answer!)
   for Marine and provide a detailed interpretation of the p-value you find. (Your answer
   should start "This p-value represents the probability that ... given that ...").
c. Consider a situation where you have one third the amount of data (10 correct out of 11
   attempts). Determine the p-value for the new setting. Again, give a detailed
   interpretation of the p-value you find.
d. Using R, make a barplot (= bargraph) of your data. Remember to label your axes
   appropriately. Can you make a conclusion (about your hypotheses) based on the
   barplot only? Why or why not? (Note that in this setting we are not testing 𝜋 = 0.5.)
e. Summarize the conclusions you would draw from this study. Do you think Marine got
   lucky or do you think something other than random chance was at play? How strong is
   the evidence?
f. Why does the same proportion of correct identifications give such different evidence
   against the null when the sample size changes from 33 to 11? (i.e., why is the p-value
   much smaller when you have 33 trials?)
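As a warm-up for parts (b), (c), and (f), here is a minimal sketch of the "two routes, same p-value" check using the in-class Tim data (13 of 24 students chose the left face) rather than the Marine data. The one-sided alternative 𝜋 > 0.5 is an assumption made only for illustration. Base R's binom.test stands in for iscambinomtest so the example runs without the ISCAM workspace, and the n = 240 case is hypothetical.

```r
# In-class data: 13 of 24 students put Tim on the left
# ASSUMPTION (illustration only): H0: pi = 0.5 versus Ha: pi > 0.5
obs <- 13
n   <- 24
pi0 <- 0.5

# Route 1: upper tail from pbinom, P(X >= 13) = 1 - P(X <= 12)
p_tail <- 1 - pbinom(obs - 1, size = n, prob = pi0)

# Route 2: exact binomial test from base R (a stand-in for iscambinomtest)
p_test <- binom.test(obs, n, p = pi0, alternative = "greater")$p.value

c(p_tail, p_test)   # the two routes should agree

# HYPOTHETICAL: the same sample proportion (130/240 = 13/24) with 10x the data
p_big <- 1 - pbinom(130 - 1, size = 240, prob = pi0)
c(n24 = p_tail, n240 = p_big)
```

With the same sample proportion, the larger sample gives the smaller p-value, because p̂ varies less around 𝜋 when n is large.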
Notes on write-ups
You are welcome to answer questions in the enumerated order above. However, please use
complete sentences and complete explanations. Single numbers will never get full credit.