NAME ____________________ STUDENT # ____________________

SC/BIOL 2060 3.0 STATISTICS FOR BIOLOGISTS
TEST #1
Oct 10, 2014
(TOTAL PAGES = 7) (TOTAL MARKS = 45) (TIME = 60 MINS)

INSTRUCTIONS:
1) WRITE YOUR NAME AND STUDENT NUMBER ON THE TEST PAPER
2) PLEASE KEEP YOUR TEST PAPER TO YOURSELF!
3) ANSWER QUESTIONS DIRECTLY ON THE TEST PAPER USING THE SPACE PROVIDED
4) READ QUESTIONS CAREFULLY, AND THINK CAREFULLY BEFORE ANSWERING, PROVIDING THE BEST ANSWER. Be concise. Do not pad your answers with extra words or marks may be deducted.
5) YOU MAY USE A non-programmable CALCULATOR. You cannot use any statistical functions on your Calculator.
6) BUDGET YOUR TIME APPROPRIATELY.

1. For each of the following, state the type of variable that has been explored in the experiment. Where more than one variable has been studied, indicate which if any, is the response versus the explanatory variable. Provide only short answers, full sentences are not necessary. (9 marks)
(Example answers:
Variable femur length – numerical-continuous, response variable
Variable eye colour – categorical-nominal, explanatory variable)

a) You sample randomly 50 earthworms from the soil and measure their lengths in mm.
Earthworm length: numerical (½) continuous (½)

b) You count the numbers of spines on 10 randomly sampled porcupines.
Number of spines (or quills): numerical (½) discrete (½)

c) You wish to determine whether wearing certain colours of clothing causes children to be bullied at recess. You randomly sample 400 children recording the colour of their shirts and the size of the largest bruise on their right leg (in mm).
Clothing colour: categorical nominal (½), explanatory (½)
Bruise size: numerical continuous (½), response (½)

d) You wish to determine whether the gene, alpha-dioxygenase, has a role in defense in plants. You expose 10 randomly selected plants to a grasshopper which eats parts of their leaves, and the other 10 plants serve as controls that are not eaten.
You measure the amount of alpha-dioxygenase (in picograms) produced by each plant.
Eaten or not: categorical nominal (½), explanatory (½)
Amount of alpha-dioxygenase: numerical continuous (½), response (½)

e) You wish to explore various factors that might enhance the rate of growth of captive salmon. You randomly assign 10 salmon to each of the following treatment combinations:
10 salmon receive antibiotic, and are raised at 10 degrees Celsius
10 salmon receive no antibiotic, and are raised at 10 degrees Celsius
10 salmon receive antibiotic, and are raised at 20 degrees Celsius
10 salmon receive no antibiotic, and are raised at 20 degrees Celsius
After 12 months, you measure the growth rate of salmon in grams/month.
Growth rate: numerical continuous (½), response (½)
Antibiotic treatment: categorical nominal (½), explanatory (½)
Temperature: categorical nominal (½) (or discrete ok), explanatory (½)

2. Estimate the mean, median, 1st and 3rd quartiles and standard error of the mean for the following data sets. Identify all extreme values, if there are any: (8 marks)

a) Data: 10, 8, 19, 1, 5, 7
Sorted: 1, 5, 7, 8, 10, 19 (n = 6)
Mean = __8.3__ (½)
Median = __7.5__ given by (7 + 8)/2 (½)
1st Quartile = __5__ (middle number of 1, 5, 7) (½)
3rd Quartile = __10__ (middle number of 8, 10, 19) (½)
Standard error of mean = __2.5__ (1) (note variance is 36.67, s = 6.06)
List extreme values, if any, stating in 1 short sentence why they are extreme
There is one extreme value, 19 (½). Since 1.5 x IQR = 1.5 x (10 – 5) = 7.5, extreme values would have to exceed 10 + 7.5 = 17.5 or be less than 5 – 7.5 = -2.5. (½)

b) Data: 1.6, -2.1, 1.1, 1.5, 1.3
Sorted: -2.1, 1.1, 1.3, 1.5, 1.6
Mean = __0.68__ (½)
Median = __1.3__ (½) It's just the middle number since n is odd (n = 5)
1st Quartile = __-0.5__ (½) given by (-2.1 + 1.1)/2
3rd Quartile = __1.55__ (½) given by (1.5 + 1.6)/2
Standard error of mean = __0.70__ (1) (variance = 2.452, s = 1.566)
List extreme values, if any, stating in 1 short sentence why they are extreme
There are no extreme values (½).
Since 1.5 x IQR = 1.5 x (1.55 – (-0.5)) = 3.075, extreme values would have to exceed 1.55 + 3.075 = 4.625 or be less than -0.5 – 3.075 = -3.575. (½)
________________________________________________________________________
________________________________________________________________________

3. A biologist obtains a random sample of passion flower plants and measures the diameter of one flower on each plant (in cm). The data are presented below in a frequency table. Estimate the mean and standard error of the mean from these data. (3 marks)

Frequency    Flower diameter
110          3.0
60           5.0
20           6.0
10           8.0

N = 110 + 60 + 20 + 10 = 200
Sum of X = 110 x 3 + 60 x 5 + 20 x 6 + 10 x 8 = 830
Sum of X squared = 110 x 3² + 60 x 5² + 20 x 6² + 10 x 8² = 3850
Mean = 830 / 200 = 4.15
Variance, s² = {3850 – 830²/200} / (200 – 1) = 2.0377
s = 1.427
Mean = 4.15 (1)
SEmean = 1.427 / √200 = 0.10 (2)

4. A biologist obtains a random sample of n = 9 aardvarks and finds that the standard error of the mean of their mass in kilograms is 10. Estimate the variance of aardvark mass in the original population. (2 marks)
Standard error of the mean is given by s/√n = 10, so squaring both sides gives s²/n = 100, where n = 9. Solving for s², we get s² = 100 x 9 = 900. So the estimated variance in the original population is 900. (2)

5. If you roll three dice, what is the probability that all three dice show the number 2, or all three show the number 5? (2 marks)
Assuming independence:
Pr(all three are 2) = Pr(2) x Pr(2) x Pr(2) = 1/6 x 1/6 x 1/6 = 1/216 = 0.00463 (½)
Pr(all three are 5) = Pr(5) x Pr(5) x Pr(5) = 1/6 x 1/6 x 1/6 = 1/216 = 0.00463 (½)
Since the two outcomes are mutually exclusive, the probability that all show 2 or all show 5 is the sum:
Pr(all three are 2) + Pr(all three are 5) = 1/216 + 1/216 = 1/108 or 0.00926 (1) (result as a fraction is fine too)

6. You are told that cheetahs have a running speed of 100 km/hour. You are skeptical of this claim so you obtain the running speeds of 5 randomly sampled cheetahs in km/hour (below).
Estimate the approximate 95% confidence interval for running speed. Does the 95% confidence interval support the claim that the running speed is 100 km/hour? Explain briefly in one sentence why it does or does not. (3 marks)
Running speed data in km/hour: 95, 105, 93, 100, 90
Mean = 96.6 (½)
Standard error of mean = 2.657 (½)
Upper 95% confidence limit = 96.6 + 2 x 2.657 = 101.9 (½)
Lower 95% confidence limit = 96.6 – 2 x 2.657 = 91.3 (½)
We are 95% confident that the true mean speed falls between 91.3 and 101.9. Because this interval includes 100, we have no reason to doubt that cheetahs run at 100 km/hour. (1)

7. Remarkably, normal healthy couples have only a 25% chance (0.25 probability) of pregnancy in any given month. What is the probability that after 10 months, a couple still has not achieved a pregnancy? (2 marks)
Probability of no pregnancy in a particular month is 1 – 0.25 = 0.75 (1)
Assuming independence, the probability of no pregnancy after 10 months is 0.75¹⁰ = 0.056 (1) (or a 5.6% chance of not being pregnant after 10 months)

8. The proportion of smokers in Canada is 0.20 (or 20%) while 0.05 (5%) of Canadians have heart disease. From this information predict the proportion of Canadians who both smoke and have heart disease. What assumption did you make in your prediction? If smoking tends to cause heart disease, is your estimate too low or too high? (2 marks)
Assuming smoking and heart disease are independent (½)
Probability of smoking and having heart disease is 0.2 x 0.05 = 0.01 (1)
If smoking causes heart disease, our estimate is too low. (½)

9. Describe briefly in words how you would generate the sampling distribution of the median. (2 marks)
Take a random sample of fixed size n from a population, and estimate the median. (1)
Repeat the sampling process above infinitely many times and plot a histogram of these medians. (1)

10. For each of the following, state the null (Ho) and alternative (Ha) hypotheses, clearly indicating if Ha is one-sided or two-sided.
(6 marks)

a) You wish to determine whether the consumption of sugary drinks when young causes type II diabetes. Two hundred baby rats consume excess sugar-water in their diet when young, while 200 are given plain water with the same diet as above. You count numbers of rats with or without diabetes when they reach adulthood.
Ho: proportion of sugar-fed rats with diabetes = proportion of water-fed rats with diabetes (1)
Ha: proportion of sugar-fed rats with diabetes > proportion of water-fed rats with diabetes, 1-sided (1)

b) You believe that oregano oil eliminates the fungus from infected toenails. You obtain a random sample of 40 people with infected toenails and 20 people apply oregano oil to their toenails while 20 others apply olive oil (as a control). After 6 months you count the numbers in each treatment group that have a fungal infection.
Ho: proportion fungus-infected with oregano oil = proportion fungus-infected without oregano oil (1)
Ha: proportion fungus-infected with oregano oil < proportion fungus-infected without oregano oil, 1-sided (1)

c) A biologist believes that addition of phosphorus to a lake will change the abundance of daphnia (a small crustacean) in the lake. You add phosphorus to 10 randomly selected lakes, and randomly choose 10 other lakes as controls (that don't receive added phosphorus). You measure the abundance of daphnia in each lake 6 months after the phosphorus additions.
Ho: mean abundance with phosphorus = mean abundance without phosphorus (1)
Ha: mean abundance with phosphorus ≠ mean abundance without phosphorus, 2-sided (1)

11) Write a single SAS program to determine the median, 1st and 3rd quartiles, and coefficient of variation for each of the two species of turtles below. The data are as follows: You measure the shell length of two species of turtles by randomly sampling individuals from a pond. The data are in centimeters but you should have SAS take the natural log of each measurement in your program. Enter the data for each turtle species in the order given below.
Note that the first number in each row indicates the species of turtle, where 1 indicates a snapping turtle and 2 indicates a musk turtle. The second number in each row is the shell length in cm. Have your SAS program convert the species identifier from a number (1 versus 2) to the species names (snapping vs musk). (6 marks)

DATA TURTLES;                                (½)
INPUT SPECIES $ SHELL;                       (½) (note you can have different names for variables)
LNSHELL = LOG(SHELL);                        (1)
IF SPECIES = '1' THEN SPNAME = 'SNAPPING';   (½)
IF SPECIES = '2' THEN SPNAME = 'MUSK';       (½) single quotes are vital
CARDS; (OR DATALINES;)                       (½)
1 34
2 45
1 36
2 54
1 41
1 54
2 60                                         (½ for rewriting all the data)
PROC SORT;                                   (½)
BY SPNAME;                                   (½) (must sort by either spname or species)
PROC UNIVARIATE;                             (½)
BY SPNAME;                                   (½) (by must be the same as used in proc sort)
RUN;
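
As a cross-check on the kind of output the SAS program in Question 11 is asked to produce, the following is a minimal Python sketch (not SAS, and not part of the test) that computes the median, quartiles and coefficient of variation of ln(shell length) for each species. It assumes the quartile convention used in Question 2 (medians of the lower and upper halves of the sorted data), which may not match the percentile definition that PROC UNIVARIATE applies by default. All function and variable names here are illustrative only.

```python
import math
import statistics

# Turtle records from Question 11: (species code, shell length in cm).
RECORDS = [(1, 34), (2, 45), (1, 36), (2, 54), (1, 41), (1, 54), (2, 60)]
SPECIES_NAMES = {1: "SNAPPING", 2: "MUSK"}


def quartiles(values):
    """Return (Q1, median, Q3) using the convention from Question 2.

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; the middle value is left out of both halves when n is odd.
    Assumes at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


def summarize_by_species(records):
    """Group ln(shell length) by species name and summarize each group."""
    groups = {}
    for code, length in records:
        groups.setdefault(SPECIES_NAMES[code], []).append(math.log(length))

    summary = {}
    for name, logs in groups.items():
        q1, median, q3 = quartiles(logs)
        summary[name] = {
            "n": len(logs),
            "q1": q1,
            "median": median,
            "q3": q3,
            "cv": coefficient_of_variation(logs),
        }
    return summary


if __name__ == "__main__":
    for name, stats in sorted(summarize_by_species(RECORDS).items()):
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

The same `quartiles` helper can be applied to the data sets in Question 2 to confirm the quartiles given there.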