EDF 802
Dr. Jeffrey Oescher
Formative Assessment - Multiple Regression

I. Statistical Hypotheses

I assumed two related research questions. The first was "Can the reactions of faculty be predicted from the independent variables age, climate, years of experience, teaching effectiveness, and number of publications?" The second was "What is the most parsimonious set of predictors?" The latter suggests I analyze the data using a STEPWISE approach. The following hypotheses were developed for the analysis of the MRFA data.

Research Question 1
H0: ρ = 0
H1: ρ ≠ 0

Research Question 2
H0: βi = 0
H1: βi ≠ 0
where i designates one of the five independent variables (i.e., 1, 2, 3, 4, or 5).

II. Descriptive Information for the Sample

No information was given upon which the variables reaction, climate, and effective teaching could be interpreted. Ages for the subjects ranged from 29 to 60 years with a median age of 42 years. The median for years of experience was 6.00, although this ranged from a single year to 12 years. Publications ranged from none to four per year, with a median of 2.00. Why have I chosen to report these as medians and ranges rather than means and standard deviations? Please report the latter in your performance assessment.

III. Test Statistics and Sampling Distributions

The test statistic for Research Question 1 is an F. The sampling distribution is F(1, 13).

There are several test statistics and sampling distributions for Research Question 2. The test statistics are t-statistics, but the sampling distributions differ depending on which model you examine. A stepwise regression was used to analyze the data, and two variables, years of experience and climate, were found to be the set of parsimonious predictors. Thus, my focus is on Model 2, ignoring Model 1 for all intents and purposes. The test statistics for testing years of experience and climate are t-statistics, and the sampling distribution is t(12).
The degrees of freedom reflect two variables in the equation and were calculated as 15 - 2 - 1 = 12.

I have reported the non-significant variables (i.e., age, teaching effectiveness, and publications) separately from the two significant ones. If you examine the output from SPSS, you will find the t-statistics associated with each of these reported in the EXCLUDED VARIABLES table. Each of the t-statistics was analyzed using a sampling distribution of t(11). The degrees of freedom reflect three variables in the equation (i.e., 15 - 3 - 1 = 11). These variables were not a part of the solution.

This is tricky. In general, the sampling distribution for regression coefficients is t(n - k - 1), where n is the number of subjects and k is the number of predictors in the equation. The sampling distribution for Model 1 is t(13) because the only variable entered into the equation was years of experience. The degrees of freedom were calculated as 15 - 1 - 1 = 13. The sampling distribution for Model 2 is t(12) because two variables were in the equation: years of experience and climate. The degrees of freedom were calculated as 15 - 2 - 1 = 12. The sampling distribution for the non-significant predictors was t(11) because there were three variables being considered. The degrees of freedom were calculated as 15 - 3 - 1 = 11.

IV. Statistical Results

The most parsimonious solution involved two predictors, years of experience and climate. The regression equation was Y' = -0.41(X1) + 0.71(X2) + 14.49, where X1 represents years of experience and X2 represents the climate of the faculty member's workplace.

The analysis for Research Question 1 resulted in R = .98, suggesting approximately 94% of the variance in faculty reactions could be explained by their years of experience and the climate in which they worked (i.e., R²). The multiple correlation coefficient was significant (F(1, 13) = 93.22, p = .000).

The analysis of Research Question 2 indicated years of experience and climate were both significant predictors (t(12) = -4.62, p = .001; t(12) = 4.34, p = .001).
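
The degrees-of-freedom rule (n - k - 1) and the regression equation reported above can be checked with a few lines of Python. This is only a sketch: the function names are illustrative and do not come from the SPSS output, and the climate score used in the example is a hypothetical value, since no information was given about how climate was scaled.

```python
# Sketch only: names are illustrative, not taken from the SPSS output.

N_SUBJECTS = 15  # sample size implied by the calculations 15 - k - 1 in the text


def residual_df(n: int, k: int) -> int:
    """Degrees of freedom for the t-test of a regression coefficient: n - k - 1."""
    df = n - k - 1
    if df <= 0:
        raise ValueError("need more subjects than predictors plus one")
    return df


def predict_reaction(years_experience: float, climate: float) -> float:
    """Apply the reported equation Y' = -0.41(X1) + 0.71(X2) + 14.49."""
    return -0.41 * years_experience + 0.71 * climate + 14.49


# Degrees of freedom at each stage of the stepwise analysis.
for label, k in [("Model 1", 1), ("Model 2", 2), ("Excluded variables", 3)]:
    print(f"{label}: k = {k}, df = {residual_df(N_SUBJECTS, k)}")

# Predicted reaction for the median years of experience (6) and a
# hypothetical climate score of 10 (assumed value, not from the data).
print(round(predict_reaction(6, 10), 2))
```

The negative weight on years of experience means the predicted reaction falls as experience rises, while a higher climate score raises it.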
No other predictors contributed significantly to the prediction of faculty reactions (i.e., age, t(11) = -0.05, p = .959; teaching effectiveness, t(11) = 0.31, p = .765; and publications, t(11) = 0.72, p = .487).

The information for the two-variable solution, that is, the regression equation and the multiple correlation coefficient R, should be readily apparent. The overall prediction equation reflects the two-variable solution and its significance. The predictors themselves are grouped into those that are significant and those that are not. The significance of the significant predictors is reported with information from the two-variable solution. The non-significant predictors reflect the next iteration of the analysis (i.e., three predictor variables) and the appropriate t-statistics and degrees of freedom associated with it.

V. Statistical Conclusions

The null hypothesis for Research Question 1 was rejected; the prediction of faculty reactions based on years of experience and climate is significantly different from 0.00. The null hypotheses for Research Question 2 for the variables years of experience and climate were rejected; both variables significantly contribute to the predictive process. The null hypotheses for the variables age, teaching effectiveness, and publications were accepted; these variables do not contribute to the predictive process.

VI. Research Conclusions

It is possible to predict faculty reactions quite effectively on the basis of years of experience and climate. This information can be helpful in determining the overall reaction of faculty to the proposed tenure and promotion policy changes.

When working in this prediction context, the conclusions usually reflect the significance of the predictive process and its use in future decisions.

VII. Inferential Logic

Research Question 1: The null and alternative hypotheses were developed. Alpha was set at .05, an acceptable level for protecting against a Type I error.
The null hypothesis H0: ρ = 0 was assumed to be true, allowing the researcher to generate a sampling distribution of F with 1 and 13 degrees of freedom. The observed statistic, F(1, 13) = 93.22, was mapped into the underlying sampling distribution of F and found to be atypical of the other values of F in this distribution (p = .000). Therefore, the null hypothesis was rejected; the alternative hypothesis was accepted.

Research Question 2: A set of null and alternative hypotheses for each of the predictor variables was developed. Alpha was set at .05, an acceptable level for protecting against a Type I error. A set of iterative analyses was used to assess the effects of the predictor variables. Assuming the null hypotheses for years of experience and climate were true (i.e., H0: βi = 0 where i = 1 and 2 for years of experience and climate respectively), a sampling distribution of t-statistics with 12 degrees of freedom was generated. The observed t-statistics for years of experience and climate were mapped into this distribution and found to be atypical of the other t-statistics in it (p = .001 and p = .001 for years of experience and climate respectively). Thus, the null hypotheses for years of experience and climate were rejected in favor of accepting the alternative hypotheses.

The null hypotheses for all other predictor variables were assumed to be true, allowing the researcher to generate a sampling distribution of t-statistics with 11 degrees of freedom. The observed t-statistics for age (t(11) = -0.05), teaching effectiveness (t(11) = 0.31), and publications (t(11) = 0.72) were typical of the t-statistics in this sampling distribution (p = .959, p = .765, and p = .487 respectively). Therefore, the null hypotheses were accepted.

General Comments

The description of a stepwise multiple regression analysis is quite cumbersome because of the iterative nature of the analysis. That is, the process involves several stages at which specific tests are being performed.
Depending on the outcome of each test, the process proceeds or stops. The problem comes in determining how to describe this process yet also attend to the ultimate solution. I have considered the significant and non-significant variables as two categories and treated all of the variables within each category similarly. Nevertheless, the description of the process of analyzing the data is awkward.

When you report the results of a multiple regression analysis in your research, you will focus only on the overall prediction (i.e., is the multiple correlation coefficient R different from 0.00) and the variables found to be significant in the prediction (e.g., years of experience and climate for this problem).
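
The two-category treatment described above (significant versus non-significant predictors at alpha = .05) can be sketched in Python as follows. The p-values are the ones reported in this handout; the dictionary layout and the function name are assumptions made for illustration and are not part of the SPSS output.

```python
ALPHA = 0.05  # alpha level stated in the Inferential Logic section

# p-values as reported in this handout; the dictionary is an illustrative structure.
REPORTED_P = {
    "years of experience": 0.001,    # tested with 12 degrees of freedom
    "climate": 0.001,                # tested with 12 degrees of freedom
    "age": 0.959,                    # tested with 11 degrees of freedom
    "teaching effectiveness": 0.765, # tested with 11 degrees of freedom
    "publications": 0.487,           # tested with 11 degrees of freedom
}


def group_predictors(p_values, alpha=ALPHA):
    """Split predictors into those with p < alpha and those with p >= alpha."""
    significant = [name for name, p in p_values.items() if p < alpha]
    non_significant = [name for name, p in p_values.items() if p >= alpha]
    return significant, non_significant


significant, non_significant = group_predictors(REPORTED_P)
print("Null hypothesis rejected:", ", ".join(significant))
print("Null hypothesis accepted:", ", ".join(non_significant))
```

Only the first group would appear in a written report, alongside the overall test of R.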