Answers for test 1

NAME
____________________
STUDENT #____________________
SC/BIOL 2060 3.0 STATISTICS FOR BIOLOGISTS
TEST #1
Oct 10, 2014
(TOTAL PAGES = 7)
(TOTAL MARKS = 45 )
(TIME = 60 MINS)
INSTRUCTIONS:
1) WRITE YOUR NAME AND STUDENT NUMBER ON THE TEST
PAPER
2) PLEASE KEEP YOUR TEST PAPER TO YOURSELF!
3) ANSWER QUESTIONS DIRECTLY ON THE
TEST PAPER USING THE SPACE PROVIDED
4) READ QUESTIONS CAREFULLY, AND THINK CAREFULLY
BEFORE ANSWERING, PROVIDING THE BEST ANSWER.
Be concise. Do not pad your answers with extra
words or marks may be deducted.
5) YOU MAY USE A NON-PROGRAMMABLE CALCULATOR.
You cannot use any statistical functions on your
calculator.
6) BUDGET YOUR TIME APPROPRIATELY.
1. For each of the following, state the type of variable that has been explored in the
experiment. Where more than one variable has been studied, indicate which, if any, is the
response and which is the explanatory variable. Provide only short answers; full sentences are
not necessary. (9 marks)
(Example answers: Variable femur length – numerical-continuous, response variable
Variable eye colour – categorical-nominal, explanatory variable).
a) You sample randomly 50 earthworms from the soil and measure their lengths in mm.
Earthworm length: numerical ½ continuous ½
b) You count the numbers of spines on 10 randomly sampled porcupines.
Number of spines (or quills): numerical ½ discrete ½
c) You wish to determine whether wearing certain colours of clothing causes children to
be bullied at recess. You randomly sample 400 children recording the colour of their
shirts and the size of the largest bruise on their right leg (in mm).
clothing colour: categorical nominal ½ explanatory ½
bruise size: numerical continuous ½ response ½
d) You wish to determine whether the gene, alpha-dioxygenase, has a role in defense in
plants. You expose 10 randomly selected plants to a grasshopper which eats parts of their
leaves, and the other 10 plants serve as controls that are not eaten. You measure the
amount of alpha-dioxygenase (in picograms) produced by each plant.
Eaten or not: categorical nominal ½ explanatory ½
Amount of alpha-dioxygenase: numerical continuous ½ response ½
e) You wish to explore various factors that might enhance the rate of growth of captive
salmon. You randomly assign 10 salmon to each of the following treatment combinations:
10 salmon receive antibiotic, and are raised at 10 degrees Celsius
10 salmon receive no antibiotic, and are raised at 10 degrees Celsius
10 salmon receive antibiotic, and are raised at 20 degrees Celsius
10 salmon receive no antibiotic, and are raised at 20 degrees Celsius.
After 12 months, you measure the growth rate of salmon in grams/month.
Growth rate: numerical continuous ½ response ½
Antibiotic treatment: categorical nominal ½ explanatory ½
Temperature: categorical nominal ½ (or discrete ok) explanatory ½.
2. Estimate the mean, median, 1st and 3rd quartiles and standard error of the mean for the
following data sets. Identify all extreme values, if there are any:
(8 marks)
a) Data: 10, 8, 19, 1, 5, 7 (sorted: 1, 5, 7, 8, 10, 19; n = 6)
Mean = 8.3 ½
Median = 7.5 ½, given by (7 + 8)/2
1st Quartile = 5 ½ (middle number of 1, 5, 7)
3rd Quartile = 10 ½ (middle number of 8, 10, 19)
Standard error of mean = 2.5 1, given by s/√n = 6.06/√6 (note variance is 36.67, s = 6.06)
List extreme values, if any, stating in 1 short sentence why they are extreme
There is one extreme value, 19 ½. Since 1.5 x IQR = 1.5 x (10 – 5) = 7.5, extreme values
would have to exceed 10 + 7.5 = 17.5 or be less than 5 – 7.5 = -2.5. ½
b) Data: 1.6, -2.1, 1.1, 1.5, 1.3 (sorted: -2.1, 1.1, 1.3, 1.5, 1.6; n = 5)
Mean = 0.68 ½
Median = 1.3 ½ (just the middle number, since n is odd)
1st Quartile = -0.5 ½, given by (-2.1 + 1.1)/2
3rd Quartile = 1.55 ½, given by (1.5 + 1.6)/2
Standard error of mean = 0.70 1, given by s/√n = 1.566/√5 (note variance is 2.452, s = 1.566)
List extreme values, if any, stating in 1 short sentence why they are extreme
There are no extreme values ½. Since 1.5 x IQR = 1.5 x (1.55 – (-0.5)) = 3.075, extreme
values would have to exceed 1.55 + 3.075 = 4.625 or be less than -0.5 – 3.075 = -3.575. ½
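The arithmetic in question 2 can be checked with a short Python sketch. It assumes the quartile convention used in the answers above (the median of the lower and of the upper half of the sorted data, leaving out the middle value when n is odd) and the 1.5 x IQR rule for extreme values. The function name `summarize` is made up for illustration.

```python
import math
import statistics


def summarize(data):
    """Summary statistics using the median-of-halves quartile convention."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]        # values below the median position
    upper = xs[n - half:]    # values above the median position
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "q1": q1,
        "q3": q3,
        "sem": statistics.stdev(xs) / math.sqrt(n),  # s / sqrt(n)
        "extreme": [x for x in xs if x < low_fence or x > high_fence],
    }


part_a = summarize([10, 8, 19, 1, 5, 7])
part_b = summarize([1.6, -2.1, 1.1, 1.5, 1.3])
print(part_a)
print(part_b)
```

Other quartile conventions (including the defaults in some statistical packages) can give slightly different quartiles for small samples, so the fences may shift accordingly.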
________________________________________________________________________
________________________________________________________________________
3. A biologist obtains a random sample of passion flower plants and measures the
diameter of one flower on each plant (in cm). The data are presented below in a
frequency table. Estimate the mean and standard error of the mean from these data (3
marks).
Frequency    Flower diameter (cm)
110          3.0
60           5.0
20           6.0
10           8.0
N = 110 + 60 + 20 + 10 = 200
Sum of X = 110 x 3 + 60 x 5 + 20 x 6 + 10 x 8 = 830
Sum of X^2 = 110 x 3^2 + 60 x 5^2 + 20 x 6^2 + 10 x 8^2 = 3850
Mean = 830 / 200 = 4.15
Variance, s^2 = {3850 – 830^2/200} / (200 – 1) = 2.0377
s = 1.427
Mean = 4.15 1
SEmean = 1.427 / 200^0.5 = 0.10 2
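The frequency-table calculation in question 3 can be reproduced with the Python sketch below. It uses the same shortcut formula for the variance as the answer; the variable names are illustrative only.

```python
import math

# Frequency table from question 3: (frequency, flower diameter in cm)
table = [(110, 3.0), (60, 5.0), (20, 6.0), (10, 8.0)]

n = sum(f for f, _ in table)
sum_x = sum(f * x for f, x in table)
sum_x2 = sum(f * x ** 2 for f, x in table)

mean = sum_x / n
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)  # shortcut formula for s^2
sem = math.sqrt(variance) / math.sqrt(n)        # s / sqrt(n)

print(n, sum_x, sum_x2)
print(round(mean, 2), round(variance, 4), round(sem, 2))
```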
4. A biologist obtains a random sample of n = 9 aardvarks and finds that the standard
error of the mean of their mass in kilograms is 10. Estimate the variance of aardvark mass
in the original population. (2 marks)
Standard error of the mean is given by s/√n = 10, so squaring both sides gives
s^2/n = 100, where n = 9. Solving for s^2, we get s^2 = 100 x 9 = 900. So the estimated variance in
the original population is 900. 2
5. If you roll three dice, what is the probability that all three dice show the number 2, or
all three show the number 5? (2 marks)
Assuming independence
Pr(all three are 2) = Pr(2) x Pr(2) x Pr(2) = 1/6 x 1/6 x 1/6 = 1/216 = 0.00463 ½
Pr(all three are 5) = Pr(5) x Pr(5) x Pr(5) = 1/6 x 1/6 x 1/6 = 1/216 = 0.00463 ½
The two outcomes are mutually exclusive, so the probability that all show 2 or all show 5 is the sum:
Pr(all three are 2) + Pr(all three are 5) = 1/216 + 1/216 = 1/108 or 0.00926 1 (result as a fraction is fine too)
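As a cross-check on question 5, this Python sketch lists every outcome of three fair, independent dice and counts the favourable ones directly, rather than using the multiplication and addition rules.

```python
from fractions import Fraction
from itertools import product

# All 6 x 6 x 6 = 216 equally likely outcomes for three fair dice
rolls = list(product(range(1, 7), repeat=3))

# Outcomes where all three dice show 2, or all three show 5
favourable = [r for r in rolls if r == (2, 2, 2) or r == (5, 5, 5)]

prob = Fraction(len(favourable), len(rolls))
print(prob, round(float(prob), 5))
```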
6. You are told that cheetahs have a running speed of 100 km/hour. You are skeptical of
this claim so you obtain the running speeds of 5 randomly sampled cheetahs in km/hour
(below). Estimate the approximate 95% confidence interval for running speed.
Does the 95% confidence interval support the claim that the running speed is 100
km/hour? Explain briefly in one sentence why it does or does not. (3 marks)
Running speed data in km/hour: 95, 105, 93, 100, 90
Mean = 96.6 ½
Standard Error mean = 2.657 ½
Upper 95% Confidence limit = 96.6 + 2 x 2.657 = 101.9 ½
Lower 95% Confidence limit = 96.6 - 2 x 2.657 = 91.3 ½
We are 95% confident that the true mean speed falls between 91.3 and 101.9; since this
interval includes 100, we have no reason to doubt that cheetahs run at 100 km/hour. 1
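The interval in question 6 can be recomputed with the Python sketch below. It follows the answer's approximation of mean plus or minus 2 standard errors; the variable names are illustrative.

```python
import math
import statistics

speeds = [95, 105, 93, 100, 90]  # running speeds in km/hour

mean = statistics.mean(speeds)
sem = statistics.stdev(speeds) / math.sqrt(len(speeds))  # s / sqrt(n)

# Approximate 95% confidence limits using the 2 x SE rule from the answer
lower = mean - 2 * sem
upper = mean + 2 * sem
supports_claim = lower <= 100 <= upper

print(round(mean, 1), round(sem, 3))
print(round(lower, 1), round(upper, 1), supports_claim)
```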
7. Remarkably, normal healthy couples have only a 25% chance (0.25 probability) of
pregnancy in any given month. What is the probability that after 10 months, a couple still
has not achieved a pregnancy? (2 marks)
Probability of not being pregnant in a particular month is 1 – 0.25 = 0.75 1
Assuming independence, the probability of no pregnancy after 10 months is:
0.75^10 = 0.056 1 (or a 5.6% chance of not being pregnant after 10 months)
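A quick Python check of the power calculation in question 7, assuming (as the answer does) that months are independent:

```python
p_pregnant = 0.25  # probability of pregnancy in any given month
months = 10

# Probability of no pregnancy in every one of the 10 independent months
p_none = (1 - p_pregnant) ** months
print(round(p_none, 3))
```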
8. The proportion of smokers in Canada is 0.20 (or 20%) while 0.05 (5%) of Canadians
have heart disease. From this information predict the proportion of Canadians who both
smoke and have heart disease. What assumption did you make in your prediction? If
smoking tends to cause heart disease, is your estimate too low or too high?
(2 marks)
Assuming smoking and heart disease are independent ½
Probability of smoking and having heart disease is: 0.2 x 0.05 = 0.01 1
If smoking causes heart disease, smokers have heart disease more often than the overall 5%, so our estimate is too low. ½
9. Describe briefly in words how you would generate the sampling distribution of the
median. (2 marks)
Take a random sample of fixed size n from a population, and estimate the median. 1
Repeat the sampling process above infinitely many times and plot a histogram of these
medians 1
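The procedure in question 9 can be imitated by simulation. The Python sketch below is only an illustration: the population (normal, mean 50, standard deviation 10), the sample size n = 25, and the 10,000 repeats standing in for "infinitely many" are all assumptions, not part of the question.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 values from a normal distribution
population = [random.gauss(50, 10) for _ in range(100_000)]

n = 25            # fixed sample size (assumed)
repeats = 10_000  # finite stand-in for "infinitely many" samples

# Draw a random sample of size n, record its median, and repeat
medians = [statistics.median(random.sample(population, n)) for _ in range(repeats)]

# Crude text histogram of the sample medians, in bins 2 units wide
counts = {}
for m in medians:
    bin_start = 2 * int(m // 2)
    counts[bin_start] = counts.get(bin_start, 0) + 1
for bin_start in sorted(counts):
    print(f"{bin_start:>3}-{bin_start + 2:<3} {'#' * (counts[bin_start] // 100)}")
```

The printed histogram approximates the sampling distribution of the median for this assumed population and sample size; more repeats give a smoother picture.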
10. For each of the following, state the null (Ho) and alternative (Ha) hypotheses clearly
indicating if Ha is one-sided or two-sided. (6 marks)
a) You wish to determine whether the consumption of sugary drinks when young causes
type II diabetes. Two hundred baby rats consume excess sugar-water in their diet when
young, while 200 are given plain water with the same diet as above. You count numbers
of rats with or without diabetes when they reach adulthood.
Ho: proportion sugar-eating diabetic rats = proportion non-sugar-eating diabetic rats 1
Ha: proportion sugar-eating diabetic rats > proportion non-sugar-eating diabetic rats
1-sided 1
b) You believe that oregano oil eliminates the fungus from infected toenails. You obtain a
random sample of 40 people with infected toenails and 20 people apply oregano oil to
their toenails while 20 others apply olive oil (as a control). After 6 months you count the
number in each treatment group that have a fungal infection.
Ho: prop. fungus infected with oregano treat = prop. fungus infected without oregano 1
Ha: prop. fungus infected with oregano treat < prop. fungus infected without oregano,
1-sided 1
c) A biologist believes that addition of phosphorus to a lake will change the abundance of
daphnia (a small crustacean) in the lake. You add phosphorus to 10 randomly selected
lakes, and randomly choose 10 other lakes as controls (that don’t receive added
phosphorus). You measure the abundance of daphnia in each lake 6 months after the
phosphorus additions.
Ho: mean abundance with phosphorus = mean abundance without phosphorus 1
Ha: mean abundance with phosphorus ≠ mean abundance without phosphorus,
2-sided 1
11. Write a single SAS program to determine the median, 1st and 3rd quartiles, and
coefficient of variation for each of the two species of turtles below. The data are as
follows: You measure the shell length of two species of turtles randomly sampling
individuals from a pond. The data are in centimeters but you should have SAS take the
natural log of each measurement in your program. Enter the data for each turtle species in
the order given below. Note that the first number in each row indicates the species of
turtle where 1 indicates snapping turtle, while a 2 indicates a musk turtle. The second
number in each row is the shell length in cm. Have your SAS program convert the
species identifier from a number (1 versus 2) to the species names (snapping vs musk).
(6 marks)
DATA TURTLES; ½
INPUT SPECIES $ SHELL; ½ (note you can have different names for variables)
LNSHELL = LOG(SHELL); 1
IF SPECIES = '1' THEN SPNAME = 'SNAPPING'; ½
IF SPECIES = '2' THEN SPNAME = 'MUSK'; ½ single quotes are vital
CARDS; (OR DATALINES;) ½
1 34
2 45
1 36
2 54
1 41
1 54
2 60
½ for rewriting all the data
PROC SORT; ½
BY SPNAME; ½ (must sort by either spname or species)
PROC UNIVARIATE; ½
BY SPNAME; ½ (BY must be the same as used in PROC SORT)
RUN;