Dr. Kelly Bradley    Final Exam    Summer 2015
EPE/EDP 660 Exam 4

Name: ______________________ {2 points}

You MUST work alone – no tutors; no help from classmates. Email me or see me with questions. You will receive a score of 0 if this rule is violated. Exam is scored out of 100 points.

{3 points} Minitab (or other approved software) output must be included. It must be clearly labeled, with all answers clearly identified. In addition, you must include a copy of your session window. Do NOT include a copy of the worksheet.

Read each question before responding. In order to receive partial credit, work must be shown.

PART A: (23 points)
Fill in each blank on the answer sheet with the best choice. {1 point each blank}
(1) The two branches of statistics are _________________ and _________________.
(2) The _________________ measures the direction and strength of the linear association between two variables.
(3) _________________ is the idea that simpler models are easier to understand and appreciate, and therefore have a "beauty" that their more complicated counterparts often lack.
(4) If H0 is true and we reject it, we have made a _________________ error.
(5) In a(n) _________________ design, the total sum of squares is made up of the treatment sum of squares and the error sum of squares.
(6) Predicting y when the x values are outside the range of experimentation is _________________.
(7) We refer to _________________ as the error term ε having constant variance σ² for all levels of the independent variables.
(8) If the effect of a 1-unit change in one independent variable depends on the level of the other independent variable, we have a(n) _________________.
(9) In _________________, the β parameter is interpreted as the percentage change in odds for every 1-unit increase in x_i holding all other x’s fixed.
(10) In a hypothesis test, if the p-value = .04 and you have set alpha at .05, you would _________________ the null hypothesis.
(11) _________________ occurs when two (or more) independent variables in a regression are related; they measure essentially the same thing.
(12) Variance can be separated into two major components: _________________, variability in particular groups, and _________________, variability depending on group.

True/False: Determine the correctness of each statement by assigning the best choice, A. True or B. False.

Using the following table, answer items 13 – 15.

ID  Age  Score [0-100%]  Sex  Disease [Relapse or Remission]
1   24   89              F    Relapse
2   32   74              F    Remission
3   36   77              F    Remission
4   28   92              M    Relapse

(13) ID is an ordinal measure. A. True B. False
(14) Sex could be classified as a categorical variable. A. True B. False
(15) Score is a ratio measure. A. True B. False
(16) When choosing a measure of central tendency, if the data set has extreme values, the median would be the best measure. A. True B. False
(17) Range, IQR, and standard deviation are measures of variability. A. True B. False
(18) To test if all of the slope parameters are zero, we use an F-test. A. True B. False
(19) The value of SST does not change with the model, as it depends only on the values of the dependent variable, y. A. True B. False
(20) Once an interaction has been deemed important in a model, we cannot remove any associated first-order terms in the model. A. True B. False
(21) In a completely randomized experimental design with 4 factors and 4 levels, 8 treatments exist. A. True B. False

PART B: Short Answer (30 POINTS)
Answer the questions below. {5 points each}
(1) In hypothesis testing, does rejecting the null hypothesis prove that the research hypothesis is correct? Specifically, can we accept the alternative? Explain.
(2) A colleague conducts a study and finds a positive correlation between income and health. She concludes that higher income causes better health. Is this a suitable conclusion? Explain.
(3) Explain when we might use stepwise regression, and note at least one reason we would need to use caution in drawing inferences from a stepwise model.
(4) In an experimental design, what is the purpose of blocking? Explain.
(5) Consider the assumption of equal population variances in ANOVA. Why is this important? Explain.
(6) In an ANOVA, why is it preferable to use a follow-up analysis such as Tukey’s Multiple Comparisons of Means as opposed to multiple t-tests?

PART C: Data Analysis (42 points)
*** Use α = .05 for testing purposes ***

Consider the following data set (posted on the website as Exam 4 Data). The High School and Beyond data set includes the following variables: Sex (1=Male, 2=Female), SES (Socioeconomic Status: 1=Low, 2=Middle, 3=Upper), School Type (1=Public, 2=Private), Type of High School Program (1=General, 2=Academic or 3=Vocational), Self-concept Scores, and Motivation Level Scores, in addition to Test Scores on an Achievement Test in Writing. The data are posted as Exam 4 Data under Exams on the website.

1. Descriptive statistics, including a correlation matrix, were produced for all the continuous variables. Using the output below, describe the distribution of each variable and their relationship with one another. Be sure to discuss strength and direction. {5 points}

Descriptive Statistics: self concept, motivation, WRTG

Variable        N   N*    Mean   StDev  Minimum       Q1   Median      Q3
self concept  600    0  0.0049  0.7055  -2.6200  -0.3000   0.0300  0.4400
Motivation    600    0  0.6608  0.3427   0.0000   0.3300   0.6700  1.0000
WRTG          600    0  52.385   9.726   25.500   44.300   54.100  59.900

Variable      Maximum  Skewness  Kurtosis
self concept   1.1900     -0.90      1.56
Motivation     1.0000     -0.59     -0.88
WRTG           67.100     -0.47     -0.70

Correlations: self concept, motivation, WRTG

            self concept  motivation
Motivation         0.289
WRTG               0.019       0.254

2.
A multiple regression equation was computed to explain the variation in Self-Concept, with a summary residual analysis. Using the output below:
   A. Write the regression model in population format. Label each component, i.e., main effect, error, etc. {4 points}
   B. Determine if the model has utility. Report your p-value and explain the decision. {3 points}
   C. Test the significance of the variables included (report p-values). Interpret the results. {3 points}
   D. Do you feel the assumptions of regression have held in this analysis? Be specific; outline each assumption. Explain. {4 points}

Regression Analysis: self concept versus motivation, SEX, motivation*SEX

The regression equation is
self concept = 0.209 + 0.195 motivation - 0.403 SEX + 0.279 motivation*SEX

Predictor          Coef  SE Coef      T      P
Constant         0.2094   0.1875   1.12  0.265
motivation       0.1953   0.2595   0.75  0.452
SEX             -0.4034   0.1185  -3.41  0.001
motivation*SEX   0.2792   0.1602   1.74  0.082

S = 0.666568   R-Sq = 11.2%   R-Sq(adj) = 10.7%

Analysis of Variance

Source           DF       SS      MS      F      P
Regression        3   33.341  11.114  25.01  0.000
Residual Error  596  264.810   0.444
Total           599  298.151

[Figure: Residual Plots for self concept. Four panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).]

3. Using an ANOVA approach
   A. Conduct an analysis to determine if there is a significant difference between the self-concept of students by SES (1 = Low, 2 = Average, 3 = High). {4 points}
      i. Produce the 4 in 1 plot. {1 point}
      ii. Produce the comparative boxplots. {2 points}
      iii. Make sure to run Tukey’s post hoc. {2 points}
   B. Based on your results, is there sufficient evidence of a difference between the self-concept of students for different SES levels?
      Report the test statistic and p-value. Explain. {3 points}
   C. If you found an overall difference, where did the individual differences lie? Justify your answer. {2 points}

4. Researchers decided to block on School Type to attempt to control for variation.
   A. List the explained and unexplained components of the model. List the random effect(s). {4 points}
   B. Using the output below, determine if the blocking was useful. Explain. {3 points}

General Linear Model: self concept versus SES, School Type

Factor       Type    Levels  Values
SES          fixed        3  1, 2, 3
School Type  random       2  1, 2

Analysis of Variance for self concept, using Adjusted SS for Tests

Source        DF    Seq SS    Adj SS  Adj MS     F      P
SES            2    4.5017    4.2549  2.1274  4.32  0.014
School Type    1    0.0521    0.0521  0.0521  0.11  0.745
Error        596  293.5972  293.5972  0.4926
Total        599  298.1510

S = 0.701864   R-Sq = 1.53%   R-Sq(adj) = 1.03%

   C. Plot the potential interaction between SES and school type. {2 points}

When you are finished, submit your exam and celebrate. You have just completed 660 in the 4-week summer session!
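
A note on reading the Part C output: each F statistic in the printed ANOVA tables is a ratio of mean squares, and S, R-Sq, and R-Sq(adj) follow from the same sums of squares and degrees of freedom. The sketch below is only an arithmetic check in Python and is not part of the required Minitab workflow. The function names `anova_summary` and `f_ratio` are illustrative assumptions, and the numbers are copied from the output printed in questions 2 and 4.

```python
import math


def anova_summary(ss_model, df_model, ss_error, df_error):
    """Recompute summary quantities for a regression ANOVA table.

    Assumes the model and error sums of squares add up to the total,
    as in the regression table of Part C, question 2.
    """
    ss_total = ss_model + ss_error
    df_total = df_model + df_error
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return {
        "MS_model": ms_model,
        "MS_error": ms_error,
        "F": ms_model / ms_error,             # overall model utility test
        "S": math.sqrt(ms_error),             # residual standard deviation
        "R_sq": ss_model / ss_total,
        "R_sq_adj": 1 - ms_error / (ss_total / df_total),
    }


def f_ratio(ss_term, df_term, ss_error, df_error):
    """F statistic for a single term: (SS_term / df_term) / (SS_error / df_error)."""
    return (ss_term / df_term) / (ss_error / df_error)


# Question 2: regression of self concept on motivation, SEX, motivation*SEX.
reg = anova_summary(ss_model=33.341, df_model=3, ss_error=264.810, df_error=596)
print({name: round(value, 3) for name, value in reg.items()})

# Question 4: General Linear Model, using the Adjusted SS column.
f_ses = f_ratio(ss_term=4.2549, df_term=2, ss_error=293.5972, df_error=596)
f_school = f_ratio(ss_term=0.0521, df_term=1, ss_error=293.5972, df_error=596)
print(round(f_ses, 2), round(f_school, 2))
```

The recomputed values can be compared against the printed tables (F = 25.01, S = 0.666568, R-Sq = 11.2%, R-Sq(adj) = 10.7% for the regression; F = 4.32 and 0.11 for SES and School Type). The p-values are not recomputed here because they require an F distribution function, which the Python standard library does not provide.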