Stat 301 A -- Fall 2015 -- Midterm exam 2
3 November 2015
Information and JMP output

Problem 1: Intelligence and future salary

These data were analyzed in an influential (and contentious) book on intelligence. The observations in this data set are a subset of the individuals in a nationwide random sample of youth. The subset consists of those individuals who were 16 or 17 years old in 1978, took the Armed Forces Qualifying Test, and were still alive in 2005. The study goal is to evaluate associations between the subject's characteristics (including socioeconomic status and intelligence) and their income in 2005 (INCOME), and then to develop a model to predict 2005 income.

The X variables used in this problem are:
  MotherEd: number of years of schooling for the mother (e.g. 12 means finished high school)
  FatherEd: number of years of schooling for the father
  FamilyIncome78: family income in 1978, in dollars
  AFQT: subject's score on the Armed Forces Qualifying Test, no units

The AFQT score was treated as a measure of intelligence; higher values are considered to be more intelligent. The other three variables are measures of socioeconomic status; higher values are considered to be higher status. The response variable, INCOME, is the subject's income in 2005, in dollars.

The output includes:
1. Correlation coefficients between each pair of variables
2. Summary of Fit, ANOVA table, and Parameter Estimates from Fit Model with MotherEd, FatherEd, FamilyIncome78, and AFQT.
3. Summary of Fit, ANOVA table, and Parameter Estimates from Fit Model with 6 variables: MotherEd, FatherEd, FamilyIncome78, AFQT, one squared term, and FamilyIncome78*AFQT.

1. Correlation coefficients for each pair of variables

                  MotherEd   FatherEd   FamilyIncome78   AFQT     Income
  MotherEd        1.0000     0.6148     0.3229           0.4409   0.1648
  FatherEd        0.6148     1.0000     0.3526           0.4495   0.1904
  FamilyIncome78  0.3229     0.3526     1.0000           0.3061   0.1743
  AFQT            0.4409     0.4495     0.3061           1.0000   0.2981
  Income          0.1648     0.1904     0.1743           0.2981   1.0000

2.
Response Income, Model: MotherEd, FatherEd, FamilyIncome78, and AFQT

Summary of Fit
  Root Mean Square Error        9254.748
  Mean of Response              104379.8
  Observations (or Sum Wgts)    2584

Analysis of Variance
  Source      DF     Sum of Squares   Mean Square   F Ratio    Prob > F
  Model          4   2.4071e+10       6.0178e+9     70.2601    <.0001*
  Error       2579   2.2089e+11       85650352
  C. Total    2583   2.4496e+11

Parameter Estimates
  Term             Estimate     Std Error   t Ratio   Prob>|t|   Std Beta    VIF
  Intercept        96878.474    844.7635    114.68    <.0001     0           .
  MotherEd         -13.83038    91.1411     -0.15     0.8794     -0.00372    1.7201567
  FatherEd         139.91682    68.0730     2.06      0.0399     0.051069    1.7655996
  FamilyIncome78   0.05642      0.0143      3.93      <.0001     0.080278    1.192387
  AFQT             88.45966     7.6366      11.58     <.0001     0.252173    1.3554371

3. Response Income, Model: MotherEd, FatherEd, FamilyIncome78, AFQT, 1 squared term, and AFQT*FamilyIncome78

Summary of Fit
  Root Mean Square Error        9262.602
  Mean of Response              104382.4
  Observations (or Sum Wgts)    2584

Analysis of Variance
  Source      DF     Sum of Squares   Mean Square   F Ratio    Prob > F
  Model          6   2.4714e+10       4.1191e+9     48.0102    <.0001*
  Error       2577   2.211e+11        85795801
  C. Total    2583   2.4581e+11

Parameter Estimates
  Term                            Estimate      Std Error     t Ratio   Prob>|t|
  Intercept                       95834.972     995.0404      96.31     <.0001
  MotherEd                        -26.6895      91.241        -0.29     0.77
  FatherEd                        142.1196      68.076        2.09      0.037
  FamilyIncome78                  0.1115        0.043         2.59      0.0096
  AFQT                            112.6681      12.768        8.82      <.0001
  FamilyIncome78*FamilyIncome78   0.00000038    0.00000061    0.62      0.54
  AFQT*FamilyIncome78             -0.0012       0.0005        -2.38     0.018

Problem 2: Longevity of mammal species

Investigators compiled information about the average body mass (in kg), average metabolic rate (units unknown), and typical longevity (in years) for 95 species of mammals. These can be assumed to be a simple random sample of all mammal species. There is one row of data for each species. The investigators want to model longevity as a function of body mass and metabolic rate. Preliminary inspection of the data indicates that it is necessary to log transform longevity.
All analyses use log longevity as the Y variable.

The output includes:
4. Scatterplot matrix of mass, metabolic rate, log mass, log metabolic rate, and log longevity
5. Fit Model output for log longevity as response and model effects of mass and metabolic rate
6. Fit Model output for log longevity as response and model effects of log mass and log metabolic rate
7. Information about predictions at selected combinations of mass and metabolic rate using output 6.

4. Scatterplot Matrix of mass, metabolic rate, log mass, log metab, and log longevity

5. Response: log Longevity, model effects: mass, metabolic rate

Summary of Fit
  Root Mean Square Error        0.87127
  Mean of Response              2.022222
  Observations (or Sum Wgts)    95

Analysis of Variance
  Source      DF   Sum of Squares   Mean Square   F Ratio    Prob > F
  Model        2   33.88288         16.9414       22.3175    <.0001*
  Error       92   69.83819         0.7591
  C. Total    94   103.72107

Parameter Estimates
  Term        Estimate     Std Error   t Ratio   Prob>|t|
  Intercept   1.7364868    0.099417    17.47     <.0001*
  Mass        -0.008884    0.001875    -4.74     <.0001*
  Metab       0.0001748    0.000032    5.37      <.0001*

6. Response: log Longevity, model effects: log mass, log metabolic rate

Summary of Fit
  Root Mean Square Error        0.377345
  Mean of Response              2.022222
  Observations (or Sum Wgts)    95

Analysis of Variance
  Source      DF   Sum of Squares   Mean Square   F Ratio     Prob > F
  Model        2   90.62126         45.3106       318.2166    <.0001*
  Error       92   13.09981         0.1424
  C. Total    94   103.72107

Parameter Estimates
  Term        Estimate     Std Error   t Ratio   Prob>|t|   Std Beta    VIF
  Intercept   3.7193428    0.484075    7.68      <.0001*    0           .
  log Mass    0.5346157    0.064361    8.31      <.0001*    1.641772    28.455897
  log Metab   -0.316104    0.085577    -3.69     0.0004*    -0.73007    28.455897

[Figure: Residual by Predicted Plot]

7. Predictions at selected combinations of mass and metabolic rate using output 6.

Note: The row labelled Mean is the prediction at the average mass and metabolic rate.
  Common Name    Mass     Metab    log Mass   log Metab   Predicted log longevity   StdErr Predicted Y
  Sloth          3.79     331      1.33       5.80        2.60                      0.0812
  Camel          407      23600    6.01       10.07       3.75                      0.0792
  Cat            3        546      1.10       6.30        2.31                      0.0419
  Beluga whale   170      23000    5.14       10.04       3.29                      0.0875
  Bat            0.022    15.6     -3.82      2.75        0.81                      0.0631
                 65       7560     4.17       8.93                                  0.0634
  Mean           1.32     346      0.28       5.85        2.02                      0.0387
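
The predictions in output 7 can be checked by hand from the Parameter Estimates in output 6. The short Python sketch below is an illustration only and is not part of the JMP output. It assumes natural logarithms (the tabulated log Mass and log Metab values are consistent with this), and the function and variable names are made up for the example.

```python
import math

# Coefficients copied from the Parameter Estimates table in output 6.
INTERCEPT = 3.7193428
B_LOG_MASS = 0.5346157
B_LOG_METAB = -0.316104


def predict_log_longevity(mass, metab):
    """Predicted log longevity from the output 6 model.

    Assumption: 'log' means natural log, which matches the tabulated
    log Mass and log Metab columns in output 7.
    """
    return INTERCEPT + B_LOG_MASS * math.log(mass) + B_LOG_METAB * math.log(metab)


# (mass in kg, metabolic rate) for the named species in output 7.
species = {
    "Sloth": (3.79, 331),
    "Camel": (407, 23600),
    "Cat": (3, 546),
    "Beluga whale": (170, 23000),
    "Bat": (0.022, 15.6),
}

for name, (mass, metab) in species.items():
    log_pred = predict_log_longevity(mass, metab)
    # exp() back-transforms the prediction to the original scale (years).
    print(f"{name}: predicted log longevity = {log_pred:.2f}, "
          f"back-transformed = {math.exp(log_pred):.1f} years")
```

Each computed value should agree with the Predicted log longevity column to two decimal places. Exponentiating a prediction on the log scale returns it to years.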