Math 58B - Introduction to Biostatistics
Spring 2015
Jo Hardin

Lab Assignment 1

Lab Goals: There are two goals for this lab.
1. To familiarize yourself with using R.
2. To understand how to compute binomial probabilities using R.

Data from class: http://pages.pomona.edu/~jsh04747/courses/math58/TimFace.txt

TimFace <- read.delim("http://pages.pomona.edu/~jsh04747/courses/math58/TimFace.txt")
TimFace
##    TimLft TimRt
## 1       1     0
## 2       1     0
## 3       1     0
## 4       1     0
## 5       0     1
## 6       0     1
## 7       0     1
## 8       0     1
## 9       1     0
## 10      1     0
## 11      0     1
## 12      0     1
## 13      1     0
## 14      0     1
## 15      1     0
## 16      1     0
## 17      1     0
## 18      0     1
## 19      1     0
## 20      1     0
## 21      0     1
## 22      1     0
## 23      0     1
## 24      0     1

attach(TimFace)
load(url("http://www.rossmanchance.com/iscam2/ISCAM.RData"))
library(mosaic)

In class

During the lab, go through sections (a) through (n) of Investigation 1.2. Make sure you know how to create a barplot (also, try using bargraph from mosaic!) and use the Binomial probability calculator. (See page 19 in your book.)

Test the binomial probability calculators in two different ways.

Using ISCAM binomial probabilities:

iscambinomprob(k=7, n=24, prob=.5, lower.tail=FALSE)
## Probability 7 and above = 0.9886721
## [1] 0.9886721
iscambinomprob(k=7, n=24, prob=.5, lower.tail=TRUE)
## Probability 7 and below = 0.03195733
## [1] 0.03195733
iscambinomprob(k=7, n=24, prob=.25, lower.tail=FALSE)
## Probability 7 and above = 0.3925877
## [1] 0.3925877
iscambinomprob(k=18, n=24, prob=.5, lower.tail=FALSE)
## Probability 18 and above = 0.01132792
## [1] 0.01132792

Using mosaic binomial probabilities. Note that pbinom, as called here, ALWAYS finds probabilities to the left (i.e., the lower tail, P(X ≤ q)); this is its default behavior. The iscambinomprob function has an option (lower.tail) to find probabilities to the left or right.
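Because pbinom returns the lower tail by default, an upper-tail probability such as "18 and above" has to be obtained indirectly. The following is a minimal sketch, assuming only base R (pbinom and dbinom from the stats package). The variable names (upper_complement, upper_direct, upper_sum) are illustrative and not part of the lab. It computes P(X ≥ 18) for n = 24 and probability 0.5 in three ways that should agree with each other and with the iscambinomprob(k=18, n=24, prob=.5, lower.tail=FALSE) result of 0.01132792 shown above.

```r
# Upper-tail binomial probability P(X >= k), computed three ways in base R.
# n, p, k mirror the lab's example; the result names are illustrative only.
n <- 24
p <- 0.5
k <- 18

# pbinom(k - 1, ...) is P(X <= k - 1), so its complement is P(X >= k).
upper_complement <- 1 - pbinom(q = k - 1, size = n, prob = p)

# With lower.tail = FALSE, pbinom returns P(X > q), so q = k - 1 is still needed.
upper_direct <- pbinom(q = k - 1, size = n, prob = p, lower.tail = FALSE)

# Cross-check: add up the individual probabilities P(X = k), ..., P(X = n).
upper_sum <- sum(dbinom(k:n, size = n, prob = p))

print(c(upper_complement, upper_direct, upper_sum))
```

The step most likely to go wrong is the shift by one: passing q = k instead of q = k - 1 drops P(X = k) from the upper tail, which matters for a discrete distribution like the binomial.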
# note X
plotDist("binom", params=list(24,.30), groups= x < 3)
# note Y
plotDist("binom", params=list(24,.25), groups= y < .05)

pbinom(q = 7, size = 24, prob = 0.5)
## [1] 0.03195733
pbinom(q = 7, size = 24, prob = 0.25)
## [1] 0.7662042
pbinom(q = 18, size = 24, prob = 0.5)
## [1] 0.9966946

[This part is just to play around with. Nothing to turn in until the section below labeled "To turn in".]

Now let's investigate what happens as n, the number of trials (here a trial is a student choosing a face), gets bigger. Let p̂ = # successes / # trials. For each of n = 24, 240, 2400, find

1. P(p̂ = 0.5)
2. P(0.45 ≤ p̂ ≤ 0.55)

Also, be sure you know how to label your plots (xlab for the label on the x-axis, ylab for the label on the y-axis).

summary(TimFace)
##      TimLft           TimRt
##  Min.   :0.0000   Min.   :0.0000
##  1st Qu.:0.0000   1st Qu.:0.0000
##  Median :1.0000   Median :0.0000
##  Mean   :0.5417   Mean   :0.4583
##  3rd Qu.:1.0000   3rd Qu.:1.0000
##  Max.   :1.0000   Max.   :1.0000

table(TimFace)
##       TimRt
## TimLft  0  1
##      0  0 11
##      1 13  0

barplot(table(TimFace), xlab="side", ylab="frequency", names.arg=c("TimLft", "TimRt"))

To turn in

Turn in answers to the following questions in the setting of Practice Problem 1.2.

a. State the null and alternative hypotheses for testing whether Marine is more likely to pick the cancer patient than if he were just randomly guessing among the five patients.

b. Assume Marine was correct in 30 of the 33 attempts. Determine the p-value for Marine using BOTH the function iscambinomtest and pbinom (you should get the same answer!), and provide a detailed interpretation of the p-value you find. (Your answer should start "This p-value represents the probability that ... given that ...")

c. Consider a situation where you have one third the amount of data (10 correct out of 11 attempts). Determine the p-value for the new setting. Again, give a detailed interpretation of the p-value you find.

d. Using R, make a barplot (= bargraph) of your data.
Remember to label your axes appropriately. Can you make a conclusion (about your hypotheses) based on the barplot only? Why or why not? (Note that in this setting we are not testing 𝜋 = 0.5.)

e. Summarize the conclusions you would draw from this study. Do you think Marine got lucky or do you think something other than random chance was at play? How strong is the evidence?

f. Why does the same proportion of correct identifications give such different evidence against the null when the sample size changes from 33 to 11? (i.e., why is the p-value much smaller when you have 33 trials?)

Notes on write-ups

You are welcome to answer questions in the enumerated order above. However, please use complete sentences and complete explanations. Single numbers will never get full credit.