Overview of our study of the multiple linear regression model
Regression models with more than one slope parameter

Example 1: Are brain and body size predictive of intelligence?
• Sample of n = 38 college students
• Response (y): intelligence based on PIQ (performance) scores from the (revised) Wechsler Adult Intelligence Scale
• Potential predictor (x1): brain size based on MRI scans (given as count/10,000)
• Potential predictor (x2): height in inches
• Potential predictor (x3): weight in pounds

Example 1: Scatter matrix plot
[Scatter matrix plot of PIQ, Brain, Height, and Weight]

Scatter matrix plot
• Illustrates the marginal relationship between each pair of variables, without regard to the other variables.
• The challenge is how the response y relates to all three predictors simultaneously.

Example 1: A multiple linear regression model with three quantitative predictors

yi = β0 + β1xi1 + β2xi2 + β3xi3 + εi

where …
• yi is intelligence (PIQ) of student i
• xi1 is brain size (MRI) of student i
• xi2 is height (Height) of student i
• xi3 is weight (Weight) of student i
and … the independent error terms εi follow a normal distribution with mean 0 and equal variance σ².

Example 1: Some research questions
• Which predictors – brain size, height, or weight – explain some of the variation in PIQ?
• What is the effect of brain size on PIQ, after taking into account height and weight?
• What is the PIQ of an individual with a given brain size, height, and weight?

Example 1: Regression output

The regression equation is
PIQ = 111 + 2.06 Brain - 2.73 Height + 0.001 Weight

Predictor      Coef  SE Coef      T      P
Constant     111.35    62.97   1.77  0.086
Brain        2.0604   0.5634   3.66  0.001
Height       -2.732    1.229  -2.22  0.033
Weight       0.0006   0.1971   0.00  0.998

S = 19.79   R-Sq = 29.5%   R-Sq(adj) = 23.3%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       3   5572.7  1857.6  4.74  0.007
Residual Error  34  13321.8   391.8
Total           37  18894.6

Source  DF  Seq SS
Brain    1  2697.1
Height   1  2875.6
Weight   1     0.0

Example 2: Baby bird breathing habits in burrows
• Experiment with n = 120 nestling bank swallows
• Response (y): percentage increase in "minute ventilation" (Vent), i.e., the total volume of air breathed per minute
• Potential predictor (x1): percentage of oxygen (O2) in the air the baby birds breathe
• Potential predictor (x2): percentage of carbon dioxide (CO2) in the air the baby birds breathe

Example 2: Scatter matrix plot
[Scatter matrix plot of Vent, O2, and CO2]

Example 2: Three-dimensional scatter plot
[Three-dimensional scatter plot of Vent against O2 and CO2]

Example 2: A first order model with two quantitative predictors

yi = β0 + β1xi1 + β2xi2 + εi

where …
• yi is percentage increase in minute ventilation
• xi1 is percentage of oxygen
• xi2 is percentage of carbon dioxide
and … the independent error terms εi follow a normal distribution with mean 0 and equal variance σ².

Example 2: Some research questions
• Is oxygen related to minute ventilation, after taking into account carbon dioxide?
• Is carbon dioxide related to minute ventilation, after taking into account oxygen?
• What is the mean minute ventilation of all nestling bank swallows whose breathing air is comprised of 15% oxygen and 5% carbon dioxide?
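Aside (not part of the original slides): a minimal sketch of how the first order model just specified could be fit in Python with statsmodels. The file name babybirds.csv and the column names Vent, O2, CO2 are assumptions for illustration; substitute however the data are actually stored.

```python
# Sketch: fit Vent = b0 + b1*O2 + b2*CO2 + error with statsmodels.
# The file name and column names below are assumed, not from the slides.
import pandas as pd
import statsmodels.formula.api as smf

birds = pd.read_csv("babybirds.csv")           # assumed data file

model = smf.ols("Vent ~ O2 + CO2", data=birds).fit()
print(model.summary())                         # coefficients, t-tests, F-test, R-sq

# Estimated mean minute ventilation when the air is 15% oxygen and 5% CO2.
new_air = pd.DataFrame({"O2": [15], "CO2": [5]})
print(model.get_prediction(new_air).summary_frame(alpha=0.05))
```

The summary_frame() call returns both a confidence interval for the mean response and a prediction interval for a new observation at 15% oxygen and 5% carbon dioxide, which addresses the last research question above.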
Example 2: Regression output

The regression equation is
Vent = 86 - 5.33 O2 + 31.1 CO2

Predictor     Coef  SE Coef      T      P
Constant      85.9    106.0   0.81  0.419
O2          -5.330    6.425  -0.83  0.408
CO2         31.103    4.789   6.50  0.000

S = 157.4   R-Sq = 26.8%   R-Sq(adj) = 25.6%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        2  1061819  530909  21.44  0.000
Residual Error  117  2897566   24766
Total           119  3959385

Source  DF   Seq SS
O2       1    17045
CO2      1  1044773

Example 3: Is a baby's birth weight related to smoking during pregnancy?
• Sample of n = 32 births
• Response (y): birth weight of the baby in grams
• Potential predictor (x1): smoking status of the mother (yes or no)
• Potential predictor (x2): length of gestation in weeks

Example 3: Scatter matrix plot
[Scatter matrix plot of Weight, Gest, and Smoking]

Example 3: A first order model with one binary predictor

yi = β0 + β1xi1 + β2xi2 + εi

where …
• yi is birth weight of baby i
• xi1 is length of gestation of baby i
• xi2 = 1 if the mother smokes and xi2 = 0 if not
and … the independent error terms εi follow a normal distribution with mean 0 and equal variance σ².

Example 3: Estimated first order model with one binary predictor

The regression equation is
Weight = -2390 + 143 Gest - 245 Smoking

[Plot of Weight (grams) against Gestation (weeks), showing two parallel fitted lines: one for smokers (Smoking = 1) and one for non-smokers (Smoking = 0)]

Example 3: Some research questions
• Is a baby's birth weight related to smoking during pregnancy?
• How is birth weight related to gestation, after taking into account smoking status?

Example 3: Regression output

The regression equation is
Weight = -2390 + 143 Gest - 245 Smoking

Predictor      Coef  SE Coef      T      P
Constant    -2389.6    349.2  -6.84  0.000
Gest        143.100    9.128  15.68  0.000
Smoking     -244.54    41.98  -5.83  0.000

S = 115.5   R-Sq = 89.6%   R-Sq(adj) = 88.9%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       2  3348720  1674360  125.45  0.000
Residual Error  29   387070    13347
Total           31  3735789

Source   DF   Seq SS
Gest      1  2895838
Smoking   1   452881

Example 4: Compare three treatments (A, B, C) for severe depression
• Random sample of n = 36 severely depressed individuals
• y = measure of treatment effectiveness
• x1 = age (in years)
• x2 = 1 if patient received treatment A and 0 if not
• x3 = 1 if patient received treatment B and 0 if not

Example 4: Compare three treatments (A, B, C) for severe depression
[Scatter plot of y against age, with points labeled by treatment A, B, or C]

Example 4: A second order model with one quantitative predictor, a three-group qualitative variable, and interactions

yi = β0 + β1xi1 + β2xi2 + β3xi3 + β12xi1xi2 + β13xi1xi3 + εi

where …
• yi is treatment effectiveness for patient i
• xi1 is age of patient i
• xi2 = 1 if treatment A and xi2 = 0 if not
• xi3 = 1 if treatment B and xi3 = 0 if not

Example 4: The estimated regression function

The regression equation is
y = 6.21 + 1.03 age + 41.3 x2 + 22.7 x3 - 0.703 agex2 - 0.510 agex3

[Scatter plot of y against age with three fitted lines:
 treatment A: y = 47.5 + 0.33 age
 treatment B: y = 28.9 + 0.52 age
 treatment C: y = 6.21 + 1.03 age]

Example 4: Potential research questions
• Does the effectiveness of the treatment depend on age?
• Is one treatment superior to the others for all ages?
• What is the effect of age on the effectiveness of the treatment?
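Aside (not part of the original slides): a sketch of how the Example 4 indicator variables and interaction terms could be coded by hand and fit in Python with statsmodels. The file name depression.csv and the columns y, age, treatment are assumed names; only the model form comes from the slide above.

```python
# Sketch: fit y = b0 + b1*age + b2*x2 + b3*x3 + b12*age*x2 + b13*age*x3 + error.
# File name and column names (y, age, treatment with values "A"/"B"/"C") are assumed.
import pandas as pd
import statsmodels.formula.api as smf

depression = pd.read_csv("depression.csv")     # assumed data file

depression["x2"] = (depression["treatment"] == "A").astype(int)   # 1 if treatment A
depression["x3"] = (depression["treatment"] == "B").astype(int)   # 1 if treatment B
depression["agex2"] = depression["age"] * depression["x2"]        # age-by-A interaction
depression["agex3"] = depression["age"] * depression["x3"]        # age-by-B interaction

fit = smf.ols("y ~ age + x2 + x3 + agex2 + agex3", data=depression).fit()
print(fit.params)    # should mirror the Coef column of the output that follows

# Equivalent shortcut: let the formula build indicators and interactions itself.
# Note: C() picks its own baseline level (here "A"), so the coefficients are
# parameterized differently, but the fitted lines are identical.
fit2 = smf.ols("y ~ age * C(treatment)", data=depression).fit()
```

Coding the indicators by hand mirrors the slide's parameterization, where treatment C is the baseline group absorbed into the intercept and slope for age.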
Example 4: Regression output

The regression equation is
y = 6.21 + 1.03 age + 41.3 x2 + 22.7 x3 - 0.703 agex2 - 0.510 agex3

Predictor      Coef  SE Coef      T      P
Constant      6.211    3.350   1.85  0.074
age         1.03339  0.07233  14.29  0.000
x2           41.304    5.085   8.12  0.000
x3           22.707    5.091   4.46  0.000
agex2       -0.7029   0.1090  -6.45  0.000
agex3       -0.5097   0.1104  -4.62  0.000

S = 3.925   R-Sq = 91.4%   R-Sq(adj) = 90.0%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       5  4932.85  986.57  64.04  0.000
Residual Error  30   462.15   15.40
Total           35  5395.00

Source  DF   Seq SS
age      1  3424.43
x2       1   803.80
x3       1     1.19
agex2    1   375.00
agex3    1   328.42

Example 5: How is the length of a bluegill fish related to its age?
• In 1981, n = 78 bluegills were randomly sampled from Lake Mary in Minnesota.
• y = length (in mm)
• x1 = age (in years)

Example 5: Scatter plot
[Scatter plot of length (mm) against age (years)]

Example 5: A second order polynomial model with one quantitative predictor

yi = β0 + β1xi + β11xi² + εi

where …
• yi is length of bluegill (fish) i (in mm)
• xi is age of bluegill (fish) i (in years)
and … the independent error terms εi follow a normal distribution with mean 0 and equal variance σ².

Example 5: Estimated regression function

length = 13.6224 + 54.0493 age - 4.71866 age²

S = 10.9061   R-Sq = 80.1%   R-Sq(adj) = 79.6%

[Regression plot of length against age with the fitted quadratic curve]

Example 5: Potential research questions
• How is the length of a bluegill fish related to its age?
• What is the length of a randomly selected five-year-old bluegill fish?

Example 5: Regression output (centered age)
Here c_age is age centered at its sample mean and c_agesq is the square of c_age; centering leaves the quadratic coefficient unchanged but reduces the correlation between the linear and quadratic terms. The new observation below corresponds to a five-year-old bluegill.

The regression equation is
length = 148 + 19.8 c_age - 4.72 c_agesq

Predictor      Coef  SE Coef       T      P
Constant    147.604    1.472  100.26  0.000
c_age        19.811    1.431   13.85  0.000
c_agesq     -4.7187   0.9440   -5.00  0.000

S = 10.91   R-Sq = 80.1%   R-Sq(adj) = 79.6%

Analysis of Variance
Source          DF     SS     MS       F      P
Regression       2  35938  17969  151.07  0.000
Residual Error  75   8921    119
Total           77  44859
...
Predicted Values for New Observations
New     Fit  SE Fit          95.0% CI          95.0% PI
1    165.90    2.77  (160.39, 171.42)  (143.49, 188.32)

Values of Predictors for New Observations
New  c_age  c_agesq
1     1.37     1.88

The good news!
• Everything you learned about the simple linear regression model extends, with at most minor modification, to the multiple linear regression model:
  – same assumptions, same model checking
  – (adjusted) R²
  – t-tests and t-intervals for a single slope
  – prediction (confidence) intervals for a (mean) response

New things we need to learn!
• The above research scenarios (models), and a few more
• The "general linear test," which helps answer many research questions
• F-tests for more than one slope
• Interactions between two or more predictor variables
• Identifying influential data points

New things we need to learn!
• Detection of correlated predictors ("multicollinearity"), using variance inflation factors, and the limitations they cause
• Selection of variables from a large set of variables for inclusion in a model ("stepwise regression" and "best subsets regression")
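As a preview of the multicollinearity topic in the last list, the sketch below shows one common way to compute variance inflation factors in Python, using the Example 1 predictors. The file name iqsize.csv and the column names Brain, Height, Weight are assumptions for illustration.

```python
# Sketch: variance inflation factors (VIFs) for the Example 1 predictors.
# File name and column names below are assumed, not taken from the slides.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

iq = pd.read_csv("iqsize.csv")                            # assumed data file

X = sm.add_constant(iq[["Brain", "Height", "Weight"]])    # design matrix with intercept
vifs = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vifs)   # ignore the "const" row; a common rule of thumb flags VIFs above about 4
```

Large VIFs signal predictors that are highly correlated with the other predictors, which inflates the standard errors of their estimated slopes; this is taken up in detail later in the course.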