Biometry 333
Final Exam
Spring 2005
Show all work for full credit. Ask the instructor if any question is unclear. You are
allowed one page of notes.
(1)(5pts) Which best describes a 95% confidence interval?
(i) An interval calculated from sampled data such that it contains 95% of the sampled
data.
(ii) An interval calculated from sampled data such that it contains 95% of the population
values.
(iii) An interval calculated from the data such that if the process were done repeatedly,
about 95% of the intervals would capture the parameter’s (population’s) true value.
(iv) An interval calculated from the data such that if the process were done repeatedly,
about 95% of them would reject the null hypothesis.
(2)(5pts) Which best describes the key part of the central limit theorem?
(i) As the sample size increases, the population becomes approximately normal.
(ii) As the sample size increases, the distribution of the sample means becomes more
nearly normal.
(iii) As the sample size increases, the distribution of the sample becomes approximately
normal.
(iv) As the sample size increases, the population variance decreases.
(3)(5pts) P-value is short for “probability value”. Which statement best describes what
probability the p-value is describing?
(i) It is the probability of the null hypothesis being true.
(ii) It is the probability of getting a test statistic as extreme or more extreme than the test
statistic obtained from the data if the null hypothesis were true.
(iii) It is the probability of getting the test statistic that we got from the data.
(iv) It is the probability of a type I error.
(4) Let X be a normal random variable with mean 50 and standard deviation 5. Suppose
16 independent values of X are averaged to get $\bar{X} = \frac{1}{16}\sum_{i=1}^{16} X_i$. Show all work or
describe the Minitab/R commands used.
(4a. 2pts) Find $P(45 < X < 47.5)$.
(4b. 2pts) Find the 75th percentile of X; i.e., $P(X \le\ ?) = 0.75$.
(4c. 2pts) Find P( X  47.5)
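As an illustration of the kind of commands question (4) asks for, here is a minimal Python sketch using only the standard library. It assumes that (4a) and (4b) refer to a single observation X and that (4c) refers to the mean of the 16 values; the inequality in (4c) is not legible in the source, so the sketch reports the lower tail only (the upper tail is one minus that value). The variable names are illustrative.

```python
from statistics import NormalDist

# Distribution of a single observation X, as stated in question (4).
x = NormalDist(mu=50, sigma=5)

# Probability that one observation falls between 45 and 47.5.
p_between = x.cdf(47.5) - x.cdf(45)

# 75th percentile of a single observation.
q75 = x.inv_cdf(0.75)

# ASSUMPTION: (4c) concerns the mean of n = 16 independent values.
# The mean has the same center and standard deviation sigma / sqrt(n).
n = 16
xbar = NormalDist(mu=50, sigma=5 / n ** 0.5)
p_mean_below = xbar.cdf(47.5)

print(round(p_between, 4), round(q75, 2), round(p_mean_below, 4))
```

The same quantities are available in R through `pnorm` and `qnorm`.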
(5) Below is an investigation of the student data to see if gender and handedness (left or
right) are independent.
Chi-Square Test: left, right

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

           left    right    Total
female        4       29       33
           BBBB    xxxxx
          0.174    0.031

male          6      AAA       34
           yyyy    28.93
          0.169    0.030

Total        10       57       67

Chi-Sq = 0.403, DF = 1, P-Value = ????
1 cells with expected counts less than 5.
(5a. 2pts) What was the observed number of right-handed males (AAA)?
(5b. 2pts) What was the expected number of left-handed females (BBBB)?
(5c. 2pts) Below “BBBB” is a 0.174 which is the chi-square contribution for that cell.
Show how 0.174 was calculated.
(5d. 2pts) Use the chi-square statistic of 0.403 to find the p-value (????).
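The calculations behind question (5) can be sketched as follows, using only the Python standard library. The counts are read from the table above: row totals 33 and 34, column totals 10 and 57, grand total 67. The variable names are illustrative, and the `erfc` shortcut for the p-value is valid only for 1 degree of freedom.

```python
import math

# Margins read from the table (rows: female, male; columns: left, right).
row_totals = {"female": 33, "male": 34}
col_totals = {"left": 10, "right": 57}
grand_total = 67

# A missing observed count follows from a margin by subtraction.
observed_female_right = 29
observed_male_right = col_totals["right"] - observed_female_right

# Expected count under independence: row total * column total / grand total.
observed_female_left = 4
expected_female_left = row_totals["female"] * col_totals["left"] / grand_total

# Each cell contributes (observed - expected)^2 / expected to the statistic.
contribution = (observed_female_left - expected_female_left) ** 2 / expected_female_left

# With 1 degree of freedom the chi-square statistic is a squared standard
# normal, so its upper-tail probability is erfc(sqrt(x / 2)).
chi_sq = 0.403
p_value = math.erfc(math.sqrt(chi_sq / 2))

print(observed_male_right, round(expected_female_left, 3),
      round(contribution, 3), round(p_value, 3))
```

In R, the p-value corresponds to `pchisq(0.403, df = 1, lower.tail = FALSE)`.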
(6. 6pts) Six right-handed men lifted weights (pounds) with their left and right arms.
Perform a statistical test to see if the right arm is any stronger than the left arm. Assume
normality. (Hint: This is Biom109 review material.) Show all work or describe
Minitab/R commands. Give a test statistic, p-value, and conclusion as your answer.
left:    38   55   37   25   70   61
right:   41   63   41   25   68   71
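One way to carry out the test in question (6) is a paired t-test on the right-minus-left differences. The sketch below makes two assumptions: the two measurements in each position belong to the same man, and the alternative is one-sided (right arm stronger). The standard library has no t distribution, so the helper functions `t_pdf` and `t_upper_tail` are illustrative and integrate the density numerically.

```python
import math
from statistics import mean, stdev

left = [38, 55, 37, 25, 70, 61]
right = [41, 63, 41, 25, 68, 71]

# Paired design: analyze the within-man differences (right minus left).
diffs = [r - l for r, l in zip(right, left)]
n = len(diffs)
df = n - 1

# Test statistic: mean difference divided by its standard error.
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# One-sided p-value for the alternative "right arm is stronger".
p_one_sided = t_upper_tail(t_stat, df)

print(round(t_stat, 3), df, round(p_one_sided, 3))
```

In R, the corresponding call would be `t.test(right, left, paired = TRUE, alternative = "greater")`.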
(7) The following is a residual plot for the model y = b0 + b1 x.
The regression equation is y = - 6.24 + 10.9 x
[Figure: residual plot titled "Residuals Versus the Fitted Values (response is y)". Vertical axis: Residual, ticks from -15 to 25. Horizontal axis: Fitted Value, ticks from 0 to 50.]
(7a. 4pts) What is the fitted value and squared residual for the (x,y) point (4.45, 63.11)?
(7b. 4pts) What model would you suggest be tried next? Why?
(7c. 4pts) Name 3 assumptions about the distribution of the errors (residuals?)
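The arithmetic for a fitted value and squared residual, as asked in (7a), can be sketched in a few lines of Python. The coefficients come from the regression equation stated in question (7); the variable names are illustrative.

```python
# Coefficients from the stated regression equation y = -6.24 + 10.9 x.
b0, b1 = -6.24, 10.9

# The data point named in (7a).
x, y = 4.45, 63.11

# Fitted value: the model's prediction at x.
fitted = b0 + b1 * x

# Residual: observed minus fitted; the question asks for its square.
residual = y - fitted
squared_residual = residual ** 2

print(round(fitted, 3), round(residual, 3), round(squared_residual, 2))
```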
(8) The continuous variables x1, x2, and x3 were investigated for their association with
the y response variable. Use the following Minitab ANOVA table for the general linear
model y=b0 + b1x1 + b2x2 + b3x3 to answer the questions.
General Linear Model: y versus

Factor  Type  Levels  Values

Analysis of Variance for y, using Adjusted SS for Tests

Source   DF    Seq SS    Adj SS   Adj MS        F        P
x1        1    112.89    229.83   229.83     XXXX     YYYY
x2        1     19.77      4.49     4.49  -------  -------
x3        1    705.17    705.17   705.17
Error    37   1853.01   1853.01    50.08
Total    40   2690.84

S = ZZZZZZ   R-Sq = 31.14%   R-Sq(adj) = 25.55%
(8a. 8pts) Use the above table to calculate the residual sum of squares for the following
four models. Show your work or explain what part of the table gave you the result.
(Hint: this is the reverse of the quiz problem; use the sequential sums of squares.)
Model                              residual SS
y = b0                             ? =
y = b0 + b1x1                      ? =
y = b0 + b1x1 + b2x2               ? =
y = b0 + b1x1 + b2x2 + b3x3        ? =
(8b. 4pts) Calculate the F-statistic (XXXX) and p-value (YYYY) for the x1 variable.
Use the adjusted sum of squares information. Show your work and explain which commands
gave you the p-value result.
XXXX=
YYYY=
(8c. 4pts) Which variable is responsible for the adjusted R-square being smaller than the
R-square statistic? Explain why you chose that variable.
(8d. 4pts) Estimate the standard deviation for the noise term. ZZZZZZ=?
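The relationships the table in question (8) relies on can be sketched in Python. Each sequential sum of squares is the drop in residual SS when that term is added in the listed order, the F-statistic is a mean square divided by the error mean square, and S is the square root of the error mean square. The variable names are illustrative, and the p-value is not computed here because the standard library has no F distribution.

```python
import math

# Values read from the ANOVA table in question (8).
seq_ss = {"x1": 112.89, "x2": 19.77, "x3": 705.17}
total_ss = 2690.84
adj_ms_x1 = 229.83
error_ms = 50.08

# Residual SS for the nested models, starting from the intercept-only
# model (whose residual SS is the total SS) and adding terms in order.
residual_ss = [total_ss]
for term in ("x1", "x2", "x3"):
    residual_ss.append(residual_ss[-1] - seq_ss[term])

# F-statistic for x1 based on the adjusted mean square (1 and 37 df).
f_x1 = adj_ms_x1 / error_ms

# Estimated standard deviation of the noise term.
s = math.sqrt(error_ms)

print([round(v, 2) for v in residual_ss], round(f_x1, 2), round(s, 3))
```

In R, the upper-tail probability for the F-statistic would come from `pf(f, 1, 37, lower.tail = FALSE)`.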
(9)The student data (from Mark Rizzardi’s data web page) were analyzed to explain
height using sex and shoe size. Shoe size was treated as a continuous variable and an
interaction was included in the model.
General Linear Model: height versus sex

Factor  Type   Levels  Values
sex     fixed       2  female, male

Analysis of Variance for height, using Sequential SS for Tests

Source     DF    Seq SS   Adj SS   Seq MS        F      P
sex         1    508.55     4.04   508.55   105.77  0.000
shoe        1    281.66   277.35   281.66    58.58  0.000
sex*shoe    1     11.85    11.85    11.85     2.47  0.121
Error      65    312.54   312.54     4.81
Total      68   1114.61

S = 2.19278   R-Sq = 71.96%   R-Sq(adj) = 70.67%

Term           Coef  SE Coef      T      P
Constant     55.441    1.722  32.19  0.000
sex
  female      1.578    1.722   0.92  0.363
shoe         1.3633   0.1795   XXXX   YYYY
shoe*sex
  female    -0.2818   0.1795  -1.57  0.121
(9a. 4pts) Why would it not be sensible to use the adjusted sum of squares to calculate the
F-statistics for the individual variables?
(9b. 4pts) Is the interaction between shoe size and sex statistically significant? Explain
how you reached your conclusion.
(9c. 4pts) Write the equation for the line which describes the expected height of a female
for a given shoe size. Show your work and use the equation form: y= a + bx.
(9d. 3pts) Write the equation for the line which describes the expected height of a male
for a given shoe size. Show your work and use the equation form: y= a + bx.
(9e. 4pts) Calculate the t-statistic and p-value for the shoe main effect variable.
XXXX=
YYYY=
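A sketch of how the coefficient table in question (9) translates into one line per sex follows. It rests on an assumption that should be checked against the software's documentation: that the two-level factor uses sum-to-zero coding, with female coded +1 and male coded -1, so the male effects are the negatives of the printed female effects. The variable names are illustrative.

```python
# Coefficients read from the table in question (9).
constant = 55.441
sex_female = 1.578
shoe = 1.3633
shoe_by_female = -0.2818
shoe_se = 0.1795

# ASSUMPTION: sum-to-zero coding (female = +1, male = -1), so the
# unprinted male effects are minus the printed female effects.
female_intercept = constant + sex_female
female_slope = shoe + shoe_by_female
male_intercept = constant - sex_female
male_slope = shoe - shoe_by_female

# t-statistic for the shoe main effect: coefficient over its standard error.
t_shoe = shoe / shoe_se

print(f"female: height = {female_intercept:.3f} + {female_slope:.4f} * shoe")
print(f"male:   height = {male_intercept:.3f} + {male_slope:.4f} * shoe")
print(f"t for shoe = {t_shoe:.2f} on 65 error degrees of freedom")
```

Under a different coding, such as treatment coding with one sex as the reference level, the same table would imply different lines.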
(10. 5pts) Suppose you are going to model the thickness of fur (mm) of gray foxes using
the variables height (continuous), sex (2 level factor), and season (4 level factor). You
will not fit any interactions and you will assume normally distributed errors. Using
Greek letters (alphas, betas, gammas, epsilons, etc.) along with “fur” (response),
“height”, “sex”, and “season”, write a model describing fur thickness. Include an error
term. Use [ ] brackets for the factors. (Nothing tricky here! You should know how to do
this from seeing this done frequently on the white board to describe a model.)
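For reference, one possible way to write an additive model of this kind in the bracket notation is shown below. The choice of Greek letters and the index i for the individual fox are assumptions; any consistent labeling would serve.

```latex
\text{fur}_i = \mu + \beta\,\text{height}_i + \alpha_{[\text{sex}_i]} + \gamma_{[\text{season}_i]} + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2) \text{ independently}
```

Here $\beta$ is the slope for the continuous variable, $\alpha_{[\cdot]}$ takes one value per sex, and $\gamma_{[\cdot]}$ takes one value per season.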
(11) An experiment was carried out where 20 amateur male runners were randomly
selected from a list of 1000 runners who had registered to race in a 10 kilometer running
event. The 20 runners were broken down into two groups of 10. One group was given
VitaminX to take for the next year and the other group given VitaminZ. The runners then
were to record their times during six 10k races on a track over the course of a year. The
data were then analyzed to determine which vitamin was better.
(11a. 2pts) What is the fixed factor, if any, for this experiment? Explain why it is fixed.
(11b. 3pts) What is the random factor, if any, for this experiment? Explain why it is
random.
(12. 4pts) Suppose your response variables are either proportions or yes/no outcomes.
You are interested in how some explanatory variables (x1, x2, etc) are associated with the
outcomes. Specifically suggest a type of regression that would be very appropriate for
such data.