Formative Exercise Multiple Regression

EDF 802
Dr. Jeffrey Oescher
Formative Assessment - Multiple Regression
I. Statistical Hypotheses
I assumed two related research questions. The first was "Can the reactions of faculty be
predicted from the independent variables age, climate, years of experience, teaching
effectiveness, and number of publications?" The second was "What is the most parsimonious set
of predictors?" The latter suggests I analyze the data using a STEPWISE approach. The
following hypotheses were developed for the analysis of the MRFA data.
Research Question 1
H0: ρ = 0
H1: ρ ≠ 0
Research Question 2
H0: βi = 0
H1: βi ≠ 0
where i designates one of the five independent variables (i.e., i = 1, 2, 3, 4, or 5).
II. Descriptive Information for the Sample
No information was given with which to interpret the variables reaction, climate, and teaching
effectiveness. Ages for the subjects ranged from 29 to 60 years, with a median age of 42 years.
The median for years of experience was 6.00, with a range from 1 year to 12 years.
Publications ranged from none to four per year, with a median of 2.00.
Why have I chosen to report these as medians and ranges rather than means and standard
deviations? Please report the latter in your performance assessment.
III. Test Statistics and Sampling Distributions
The test statistic for Research Question 1 is an F. The sampling distribution is F(1, 13).
There are several test statistics and sampling distributions for Research Question 2. The test
statistics are t-statistics, but the sampling distributions differ depending on which model you
examine. A stepwise regression was used to analyze the data, and two variables, years of
experience and climate, were found to be the most parsimonious set of predictors. Thus, my focus
is on Model 2, ignoring Model 1 for all intents and purposes. The test statistics for testing years
of experience and climate are t-statistics, and the sampling distribution is t(12). The degrees of
freedom reflect two variables in the equation and were calculated as 15 - 2 - 1 = 12.
I have reported the non-significant variables (i.e., age, teaching effectiveness, and publications)
separately from the two significant ones. If you examine the output from SPSS, you will find the
t-statistics associated with each of these reported in the EXCLUDED VARIABLES table. Each of
the t-statistics was analyzed using a sampling distribution of t(11). The degrees of freedom reflect
three variables in the equation (i.e., 15 - 3 - 1 = 11). These variables were not a part of the solution.
This is tricky. In general, the sampling distribution for regression coefficients is t(n - k - 1),
where n is the sample size and k is the number of predictors in the equation. The sampling
distribution for Model 1 is t(13) because the only variable entered into the equation was years of
experience. The degrees of freedom were calculated as 15 - 1 - 1 = 13. The sampling distribution for
Model 2 is t(12) because two variables were in the equation: years of experience and climate. The
degrees of freedom were calculated as 15 - 2 - 1 = 12. The sampling distribution for the
non-significant predictors was t(11) because three variables were being considered in each test: the
two already in the equation plus the candidate predictor. The degrees of freedom were calculated as
15 - 3 - 1 = 11.
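The degrees-of-freedom arithmetic above can be checked with a short sketch. The function name `coefficient_df` is hypothetical and is not part of the SPSS output; the sample size of 15 and the predictor counts are taken from the text.

```python
def coefficient_df(n, k):
    """Degrees of freedom for a regression coefficient's t-test: n - k - 1."""
    df = n - k - 1
    if df <= 0:
        raise ValueError("need more cases than predictors plus one")
    return df

N = 15  # sample size used in the text

# Model 1 (one predictor), Model 2 (two predictors), and the
# excluded-variable tests (three predictors considered at once).
for label, k in [("Model 1", 1), ("Model 2", 2), ("Excluded variables", 3)]:
    print(f"{label}: t({coefficient_df(N, k)})")
```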
IV. Statistical Results
The most parsimonious solution involved two predictors, years of experience and climate. The
regression equation was Y’ = -0.41(X1) + 0.71(X2) + 14.49, where X1 represents years of
experience and X2 represents the climate of the faculty member’s workplace.
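As an illustration of how the regression equation is used, the sketch below computes a predicted reaction score. The function name `predict_reaction` and the input values (6 years of experience, a climate score of 20) are hypothetical; the handout gives no scale for climate, so the inputs are placeholders rather than real cases.

```python
def predict_reaction(years_experience, climate):
    """Predicted reaction Y' from the two-predictor equation in the text."""
    return -0.41 * years_experience + 0.71 * climate + 14.49

# Hypothetical faculty member: 6 years of experience, climate score of 20.
print(round(predict_reaction(6, 20), 2))
```

Because the coefficient for years of experience is negative, the predicted reaction falls as experience increases when climate is held constant.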
The analysis for Research Question 1 resulted in R = .98, suggesting approximately 94% of the
variance in faculty reactions (i.e., R²) could be explained by their years of experience and the
climate in which they worked. The multiple correlation coefficient was significant, F(1, 13) = 93.22,
p < .001.
The analysis of Research Question 2 indicated years of experience and climate were both
significant predictors (t(12) = -4.62, p = .001, and t(12) = 4.34, p = .001, respectively). No other
predictors contributed significantly to the prediction of faculty reactions (i.e., age, t(11) = -0.05,
p = .959; teaching effectiveness, t(11) = 0.31, p = .765; and publications, t(11) = 0.72, p = .487).
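The reported p-values can be approximately reproduced from the t-statistics and their degrees of freedom. The sketch below is one way to do this with only the Python standard library, integrating the t density numerically with Simpson's rule; the function names are hypothetical, and this is not how SPSS computes its values. Because the t-statistics in the text are rounded to two decimals, the results should agree with the reported p-values only approximately.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value, using Simpson's rule on the interval [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * area

# t-statistics and degrees of freedom as reported in the text.
tests = [
    ("years of experience", -4.62, 12),
    ("climate", 4.34, 12),
    ("age", -0.05, 11),
    ("teaching effectiveness", 0.31, 11),
    ("publications", 0.72, 11),
]
for name, t, df in tests:
    print(f"{name}: t({df}) = {t:.2f}, p = {two_tailed_p(t, df):.3f}")
```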
The information for the two-variable solution, that is, the regression equation and the multiple
correlation coefficient R, should be readily apparent. The overall prediction equation reflects the
two-variable solution and its significance. The predictors themselves are grouped into those that
are significant and those that are not. The significant predictors are reported with
information from the two-variable solution. The non-significant predictors reflect the next iteration
of the analysis (i.e., three predictor variables) and the appropriate t-statistics and degrees of
freedom associated with it.
V. Statistical Conclusions
The null hypothesis for Research Question 1 was rejected; the multiple correlation between faculty
reactions and the predictors years of experience and climate is significantly different from 0.00.
The null hypotheses for Research Question 2 for the variables years of experience and climate were
rejected; both variables significantly contribute to the predictive process. The null hypotheses for
the variables age, teaching effectiveness, and publications were retained; these variables do not
contribute significantly to the predictive process.
VI. Research Conclusions
It is possible to predict faculty reactions quite effectively on the basis of years of experience and
climate. This information can be helpful in determining the overall reaction of faculty to the
proposed tenure and promotion policy changes.
When working in this prediction context, the conclusions usually reflect the significance of the
predictive process and its use in future decisions.
VII. Inferential Logic
Research Question 1: The null and alternative hypotheses were developed. Alpha was set at .05,
an acceptable level for protecting against a Type I error. The null hypothesis H0: ρ = 0 was
assumed to be true, allowing the researcher to generate a sampling distribution of F with 1 and 13
degrees of freedom. The observed statistic, F(1, 13) = 93.22, was mapped into the underlying
sampling distribution of F and found to be atypical of the other values of F in this distribution
(p < .001). Therefore, the null hypothesis was rejected; the alternative hypothesis was accepted.
Research Question 2: A set of null and alternative hypotheses was developed for each of the
predictor variables. Alpha was set at .05, an acceptable level for protecting against a Type
I error. A set of iterative analyses was used to assess the effects of the predictor variables.
Assuming the null hypotheses for years of experience and climate were true (i.e., H0: βi = 0 where
i = 1 and 2 for years of experience and climate respectively), a sampling distribution of t-statistics
with 12 degrees of freedom was generated. The observed t-statistics for years of experience and
climate were mapped into this distribution and found to be atypical of the other t-statistics in it (p =
.001 and p = .001 for years of experience and climate respectively). Thus, the null hypotheses for
years of experience and climate were rejected in favor of accepting the alternative hypotheses.
The null hypotheses for all other predictor variables were assumed to be true, allowing the
researcher to generate a sampling distribution of t-statistics with 11 degrees of freedom. The
observed t-statistics for age (t(11) = -0.05), teaching effectiveness (t(11) = 0.31), and publications
(t(11) = 0.72) were typical of the t-statistics in this sampling distribution (p = .959, p = .765, and p =
.487 respectively). Therefore, the null hypotheses were retained.
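The decision rule used throughout this section (compare each p-value with alpha) can be summarized in a brief sketch. The p-values and the alpha of .05 are those reported in the text; the function and variable names are hypothetical.

```python
ALPHA = 0.05  # significance level stated in the text

# p-values as reported in the text for each coefficient test.
reported_p = {
    "years of experience": 0.001,
    "climate": 0.001,
    "age": 0.959,
    "teaching effectiveness": 0.765,
    "publications": 0.487,
}

def decide(p, alpha=ALPHA):
    """Reject H0 when the observed statistic is atypical (p < alpha)."""
    return "reject H0" if p < alpha else "do not reject H0"

decisions = {name: decide(p) for name, p in reported_p.items()}
for name, decision in decisions.items():
    print(f"{name}: p = {reported_p[name]:.3f} -> {decision}")
```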
General Comments
The description of a stepwise multiple regression analysis is quite cumbersome because of the
iterative nature of the analysis. That is, the process involves several stages at which specific tests
are being performed. Depending on the outcome of each test, the process proceeds or stops. The
problem comes in determining how to describe this process yet also attend to the ultimate
solution. I have considered the significant and non-significant variables as two categories and
treated all of the variables within each category similarly. Nevertheless, the description of the
process of analyzing the data is awkward.
When you report the results of a multiple regression analysis in your research, you will focus only
on the overall prediction (i.e., is the multiple correlation coefficient R different from 0.00) and the
variables found to be significant in the prediction (e.g., years of experience and climate for this
problem).