Exam 4/Final

Dr. Kelly Bradley
Final Exam
Summer 2015
{2 points} Name
You MUST work alone – no tutors; no help from classmates. Email me or see me with
questions. You will receive a score of 0 if this rule is violated. Exam is scored out of 100 points.
EPE/EDP 660 Exam 4
{3 points} Minitab (or other approved software) output must be included. It must be clearly
labeled, with all answers clearly identified. In addition, you must include a copy of your session
window. Do NOT include a copy of the worksheet. Read each question before responding. In order
to receive partial credit, work must be shown.
PART A: (23 points) Fill in each blank on the answer sheet with the best choice. {1 point each blank}
(1) The two branches of statistics are _________________ and _________________.
(2) The _________________ measures the direction and strength of the linear association
between two variables.
(3) _________________ is the idea that simpler models are easier to understand and appreciate,
and therefore have a "beauty" that their more complicated counterparts often lack.
(4) If H0 is true and we reject it, we have made a _________________ error.
(5) In a(n) _________________ design, the total sum of squares is made up of the treatment sum
of squares and the error sum of squares.
(6) Predicting y when the x values are outside the range of experimentation is _________________.
(7) We refer to _________________ as the error term ε having constant variance σ² for all levels of
the independent variables.
(8) If the effect of a 1-unit change in one independent variable depends on the level of the other
independent variable, we have a(n) _________________.
(9) In _________________, the β parameter is interpreted as the percentage change in odds for every
1-unit increase in xi holding all other x’s fixed.
(10) In a hypothesis test, if the p-value = .04 and you have set alpha at .05, you would
_________________ the null hypothesis.
(11) _________________ occurs when two (or more) independent variables in a regression are
related; they measure essentially the same thing.
(12) Variance can be separated into two major components _________________, variability in particular
groups and ___________________, variability depending on group.
True/False: Determine the correctness of each statement by assigning the best choice,
A. True or B. False.
Using the following table, answer items 13 – 15.
ID  Age  Score [0-100%]  Sex  Disease [Relapse or Remission]
1   24   89              F    Relapse
2   32   74              F    Remission
3   36   77              F    Remission
4   28   92              M    Relapse
(13) ID is an ordinal measure.
A. True
B. False
(14) Sex could be classified as a categorical variable.
A. True
B. False
(15) Score is a ratio measure.
A. True
B. False
(16) When choosing a measure of central tendency, if the data set has extreme values, the
median would be the best measure.
A. True
B. False
(17) Range, IQR, and standard deviation are measures of variability.
A. True
B. False
(18) To test if all of the slope parameters are zero, we use an F-test.
A. True
B. False
(19) The value of SST does not change with the model, as it depends only on the values of the
dependent variable, y.
A. True
B. False
(20) Once an interaction has been deemed important in a model, we cannot remove any associated
first-order terms in the model.
A. True
B. False
(21) In a completely randomized experimental design with 4 factors and 4 levels, 8 treatments exist.
A. True
B. False
PART B: Short Answer (30 POINTS) Answer the questions below. {5 points each}
(1) In hypothesis testing, does rejecting the null hypothesis prove that the research hypothesis is
correct? Specifically, can we accept the alternative? Explain.
(2) A colleague conducts a study and finds a positive correlation between income and health. She
concludes that higher income causes better health. Is this a suitable conclusion? Explain.
(3) Explain when we might use stepwise regression, and note at least one reason we would need to use
caution in drawing inferences from a stepwise model.
(4) In an experimental design, what is the purpose of blocking? Explain.
(5) Consider the assumption of equal population variances in ANOVA. Why is this important? Explain.
(6) In an ANOVA, why is it preferable to use a follow-up analysis such as Tukey’s Multiple
Comparisons of Means as opposed to multiple t-tests?
PART C: Data Analysis (42 points)
*** (Use α= .05) for testing purposes ***
Consider the following data set (posted on the website as Exam 4 Data).
The High School and Beyond data set includes the following variables: Sex (1=Male, 2=Female),
SES (Socioeconomic Status: 1=Low, 2=Middle, 3=Upper), School Type (1=Public, 2=Private), Type
of High School Program (1=General, 2=Academic or 3=Vocational), Self-concept Scores, and
Motivation Level Scores, in addition to Test Scores on an Achievement Test in Writing. The data
are posted as Exam 4 Data under Exams on the website.
1. Descriptive statistics were produced for all the continuous variables, including a correlation matrix.
Using the output below, describe the distribution of each variable and the relationships among
the variables. Be sure to discuss strength and direction. {5 points}
Descriptive Statistics: self concept, motivation, WRTG

Variable        N    N*  Mean    StDev   Minimum  Q1       Median  Q3
self concept    600  0   0.0049  0.7055  -2.6200  -0.3000  0.0300  0.4400
Motivation      600  0   0.6608  0.3427   0.0000   0.3300  0.6700  1.0000
WRTG            600  0   52.385  9.726    25.500   44.300  54.100  59.900

Variable        Maximum  Skewness  Kurtosis
self concept    1.1900   -0.90      1.56
Motivation      1.0000   -0.59     -0.88
WRTG            67.100   -0.47     -0.70

Correlations: self concept, motivation, WRTG

             self concept  motivation
Motivation   0.289
WRTG         0.019         0.254
2. A multiple regression equation was computed to explain the variation in Self-Concept, with a
summary residual analysis. Using the output below,
A. Write the regression model in population format. Label each component, i.e., main effect,
error, etc. {4 points}
B. Determine if the model has utility. Report your p-value and explain the decision. {3 points}
C. Test the significance of the variables included (report p-values). Interpret the results. {3 points}
D. Do you feel the assumptions of regression have held in this analysis? Be specific, outline each
assumption. Explain. {4 points}
Regression Analysis: self concept versus motivation, SEX, motivation*SEX

The regression equation is
self concept = 0.209 + 0.195 motivation - 0.403 SEX + 0.279 motivation*SEX

Predictor        Coef     SE Coef  T      P
Constant          0.2094  0.1875    1.12  0.265
motivation        0.1953  0.2595    0.75  0.452
SEX              -0.4034  0.1185   -3.41  0.001
motivation*SEX    0.2792  0.1602    1.74  0.082

S = 0.666568   R-Sq = 11.2%   R-Sq(adj) = 10.7%

Analysis of Variance

Source          DF   SS       MS      F      P
Regression      3    33.341   11.114  25.01  0.000
Residual Error  596  264.810  0.444
Total           599  298.151
[Figure: Residual Plots for self concept (4 in 1 plot). Panels: Normal Probability Plot (Percent
vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual),
Versus Order (Residual vs. Observation Order).]
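As a quick arithmetic check on the regression output above, the T ratios, the overall F statistic, and R-Sq can be recomputed from the printed coefficients and sums of squares. The Python sketch below is only an illustration and is not part of the Minitab session; the variable names are assumptions, and the numbers are copied from the output.

```python
# Values copied from the regression output; the names are illustrative.
coefs = {
    "Constant": (0.2094, 0.1875),
    "motivation": (0.1953, 0.2595),
    "SEX": (-0.4034, 0.1185),
    "motivation*SEX": (0.2792, 0.1602),
}

# Each T ratio is the coefficient divided by its standard error.
t_ratios = {name: round(coef / se, 2) for name, (coef, se) in coefs.items()}

ss_regression, df_regression = 33.341, 3
ss_error, df_error = 264.810, 596
ss_total = 298.151

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# Overall F is MS(Regression) / MS(Error); R-Sq is SS(Regression) / SS(Total).
f_stat = ms_regression / ms_error
r_sq = ss_regression / ss_total

print(t_ratios)
print(round(f_stat, 2), round(100 * r_sq, 1))
```

The recomputed values should agree with the printed T, F, and R-Sq entries up to rounding.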
3. Using an ANOVA approach
A. Conduct an analysis to determine if there is a significant difference between the self-concept of
students by SES (1 = Low, 2 = Average, 3 = High). {4 points}
i. Produce the 4 in 1 plot. {1 point}
ii. Produce the comparative boxplots. {2 points}
iii. Make sure to run Tukey’s post hoc. {2 points}
B. Based on your results, is there sufficient evidence of a difference between the self-concept of
students for different SES levels? Report the test-statistic and p-value. Explain. {3 points}
C. If you found an overall difference, where did the individual differences lie? Justify your answer.
{2 points}
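For anyone working in approved software other than Minitab, the F statistic for a one-way ANOVA can be sketched as below. This is a minimal Python illustration under stated assumptions: the function name `one_way_anova_f` is hypothetical, and the three small groups are made-up placeholder values, not the Exam 4 Data. It does not produce the required plots or Tukey comparisons.

```python
def one_way_anova_f(groups):
    """Return (F, df_treatment, df_error) for a list of samples."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    # Treatment (between-group) sum of squares.
    ss_treatment = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Error (within-group) sum of squares.
    ss_error = sum(
        (x - sum(g) / len(g)) ** 2 for g in groups for x in g
    )

    df_treatment = len(groups) - 1
    df_error = n_total - len(groups)
    f_stat = (ss_treatment / df_treatment) / (ss_error / df_error)
    return f_stat, df_treatment, df_error


# Placeholder data only; replace with self-concept scores grouped by SES.
low = [1.0, 2.0, 3.0]
middle = [2.0, 3.0, 4.0]
high = [4.0, 5.0, 6.0]
f_stat, df1, df2 = one_way_anova_f([low, middle, high])
print(f_stat, df1, df2)
```

The resulting F would then be compared against the F distribution with (df1, df2) degrees of freedom at α = .05.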
4. Researchers decided to block on School Type to attempt to control for variation.
A. List the explained and unexplained components of the model. List the random effect(s).
{4 points}
B. Using the output below, determine if the blocking was useful. Explain. {3 points}
General Linear Model: self concept versus SES, School Type

Factor       Type    Levels  Values
SES          fixed   3       1, 2, 3
School Type  random  2       1, 2

Analysis of Variance for self concept, using Adjusted SS for Tests

Source       DF   Seq SS    Adj SS    Adj MS  F     P
SES          2    4.5017    4.2549    2.1274  4.32  0.014
School Type  1    0.0521    0.0521    0.0521  0.11  0.745
Error        596  293.5972  293.5972  0.4926
Total        599  298.1510

S = 0.701864   R-Sq = 1.53%   R-Sq(adj) = 1.03%
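The F ratios and R-Sq in the General Linear Model output above can be reproduced from the adjusted mean squares and the error and total sums of squares. The Python sketch below is only an arithmetic check on the printed values; the variable names are assumptions.

```python
# Values copied from the General Linear Model output; names are illustrative.
adj_ms = {"SES": 2.1274, "School Type": 0.0521}
ms_error = 0.4926

# Each F ratio is the term's adjusted mean square over the error mean square.
f_ratios = {term: round(ms / ms_error, 2) for term, ms in adj_ms.items()}

# R-Sq is the share of total variation not left in the error term.
ss_error, ss_total = 293.5972, 298.1510
r_sq = 1 - ss_error / ss_total

print(f_ratios)
print(round(100 * r_sq, 2))
```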
C. Plot the potential interaction between SES and school type. {2 points}
When you are finished, submit your exam and celebrate.
You have just completed 660 in the 4-week summer session!