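Bivariate Normal Distribution and Regression
Application to Galton’s Heights of Adult Children and Parents

Sources:
Galton, Francis (1889). Natural Inheritance, MacMillan, London.
Galton, F.; J.D. Hamilton Dickson (1886). “Family Likeness in Stature”, Proceedings of the Royal Society of London, Vol. 40, pp. 42-73.

Data – Heights of Adult Children and Parents
• Adult children’s heights are reported by inch, grouped so that the median of the grouped values is used for each group (Galton reports 62.2”, …, 73.2”).
  – He adjusts female heights by a multiple of 1.08
  – We use 61.2” for his “Below”
  – We use 74.2” for his “Above”
• Mid-parent heights are the average of the two parents’ heights (after the female adjustment). Grouped values are at the median (64.5”, …, 72.5” by Galton).
  – We use 63.5” for “Below”
  – We use 73.5” for “Above”

[Figure: Adult Child vs Mid-Parent Height. Vertical axis: Adult Child (60 to 75); horizontal axis: Mid-Parent (63 to 73).]

[Figure: Mid-Parent Height. Frequency by Height (63.5 to 72.5).]

[Figure: Adult Child Heights. Frequency by Height (61.2 to 74.2).]

Joint Density Function

$$f(y_1,y_2) = \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2(1-\rho^2)}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{y_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{y_1-\mu_1}{\sigma_1}\right)\left(\frac{y_2-\mu_2}{\sigma_2}\right) + \left(\frac{y_2-\mu_2}{\sigma_2}\right)^2\right]\right\}, \qquad -\infty < y_1, y_2 < \infty$$

where:

$$\mu_1 = E(Y_1) \qquad \sigma_1^2 = V(Y_1) \qquad \mu_2 = E(Y_2) \qquad \sigma_2^2 = V(Y_2) \qquad \rho = \frac{E\left[(Y_1-\mu_1)(Y_2-\mu_2)\right]}{\sigma_1\sigma_2}$$

[Figure: Bivariate Normal Density. Surface plot over x1 from -3 to 3, with density bands 0-0.05, 0.05-0.1, 0.1-0.15, 0.15-0.2.]

Marginal Distribution of Y1 (P. 1)

$$f_1(y_1) = \int_{-\infty}^{\infty} f(y_1,y_2)\,dy_2 = \int_{-\infty}^{\infty} \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2(1-\rho^2)}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{y_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{y_1-\mu_1}{\sigma_1}\right)\left(\frac{y_2-\mu_2}{\sigma_2}\right) + \left(\frac{y_2-\mu_2}{\sigma_2}\right)^2\right]\right\} dy_2$$

Bringing out the constant (temporarily) and forming a common denominator in the exponent:

$$= \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2(1-\rho^2)}} \int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2(1-\rho^2)\sigma_1^2\sigma_2^2}\left[(y_1-\mu_1)^2\sigma_2^2 - 2\rho(y_1-\mu_1)(y_2-\mu_2)\sigma_1\sigma_2 + (y_2-\mu_2)^2\sigma_1^2\right]\right\} dy_2$$

Completing the square in the exponent by adding and subtracting $(y_1-\mu_1)^2\sigma_2^2\rho^2$ in the square brackets:

$$= \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2(1-\rho^2)}} \int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2(1-\rho^2)\sigma_1^2\sigma_2^2}\left[(y_1-\mu_1)^2\sigma_2^2 - 2\rho(y_1-\mu_1)(y_2-\mu_2)\sigma_1\sigma_2 + (y_2-\mu_2)^2\sigma_1^2 + (y_1-\mu_1)^2\sigma_2^2\rho^2 - (y_1-\mu_1)^2\sigma_2^2\rho^2\right]\right\} dy_2$$

$$= \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2(1-\rho^2)}} \int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2(1-\rho^2)\sigma_1^2\sigma_2^2}\left[(y_2-\mu_2)^2\sigma_1^2 - 2\rho(y_1-\mu_1)(y_2-\mu_2)\sigma_1\sigma_2 + \rho^2(y_1-\mu_1)^2\sigma_2^2 + (y_1-\mu_1)^2\sigma_2^2(1-\rho^2)\right]\right\} dy_2$$

Marginal Distribution of Y1 (P. 2)

Pulling out the term not involving $y_2$ and cleaning up exponents:

$$f_1(y_1) = \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2(1-\rho^2)}} \exp\left\{-\frac{(y_1-\mu_1)^2\sigma_2^2(1-\rho^2)}{2(1-\rho^2)\sigma_1^2\sigma_2^2}\right\} \int_{-\infty}^{\infty} \exp\left\{-\frac{\left[(y_2-\mu_2)\sigma_1 - \rho(y_1-\mu_1)\sigma_2\right]^2}{2(1-\rho^2)\sigma_1^2\sigma_2^2}\right\} dy_2$$

$$= \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2(1-\rho^2)}} \exp\left\{-\frac{(y_1-\mu_1)^2}{2\sigma_1^2}\right\} \int_{-\infty}^{\infty} \exp\left\{-\frac{\left[(y_2-\mu_2) - \rho\frac{\sigma_2}{\sigma_1}(y_1-\mu_1)\right]^2}{2(1-\rho^2)\sigma_2^2}\right\} dy_2$$

$$= \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2(1-\rho^2)}} \exp\left\{-\frac{(y_1-\mu_1)^2}{2\sigma_1^2}\right\} \int_{-\infty}^{\infty} \exp\left\{-\frac{\left[y_2 - \left(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(y_1-\mu_1)\right)\right]^2}{2\sigma_2^2(1-\rho^2)}\right\} dy_2$$

The integrand is proportional to a normal density with:

$$E(Y_2) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(y_1-\mu_1) \qquad V(Y_2) = \sigma_2^2(1-\rho^2)$$

Taking the normalizing constant from the constant in front gives us:

$$f_1(y_1) = \frac{1}{\sqrt{2\pi\sigma_1^2}} \exp\left\{-\frac{(y_1-\mu_1)^2}{2\sigma_1^2}\right\} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma_2^2(1-\rho^2)}} \exp\left\{-\frac{\left[y_2 - \left(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(y_1-\mu_1)\right)\right]^2}{2\sigma_2^2(1-\rho^2)}\right\} dy_2 = \frac{1}{\sqrt{2\pi\sigma_1^2}} \exp\left\{-\frac{(y_1-\mu_1)^2}{2\sigma_1^2}\right\}(1)$$

Conditional Distribution of Y2 Given Y1=y1 (P. 1)

$$f(y_2 \mid y_1) = \frac{f(y_1,y_2)}{f_1(y_1)} = \frac{\dfrac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2(1-\rho^2)}} \exp\left\{-\dfrac{1}{2(1-\rho^2)}\left[\left(\dfrac{y_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\dfrac{y_1-\mu_1}{\sigma_1}\right)\left(\dfrac{y_2-\mu_2}{\sigma_2}\right) + \left(\dfrac{y_2-\mu_2}{\sigma_2}\right)^2\right]\right\}}{\dfrac{1}{\sqrt{2\pi\sigma_1^2}} \exp\left\{-\dfrac{1}{2}\left(\dfrac{y_1-\mu_1}{\sigma_1}\right)^2\right\}}$$

$$= \frac{1}{\sqrt{2\pi\sigma_2^2(1-\rho^2)}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{y_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{y_1-\mu_1}{\sigma_1}\right)\left(\frac{y_2-\mu_2}{\sigma_2}\right) + \left(\frac{y_2-\mu_2}{\sigma_2}\right)^2\right] + \frac{1}{2}\left(\frac{y_1-\mu_1}{\sigma_1}\right)^2\right\}$$

Putting the terms involving $\left(\frac{y_1-\mu_1}{\sigma_1}\right)^2$ together by multiplying and dividing the last term by $1-\rho^2$:

$$= \frac{1}{\sqrt{2\pi\sigma_2^2(1-\rho^2)}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{y_1-\mu_1}{\sigma_1}\right)^2\left(1-(1-\rho^2)\right) - 2\rho\left(\frac{y_1-\mu_1}{\sigma_1}\right)\left(\frac{y_2-\mu_2}{\sigma_2}\right) + \left(\frac{y_2-\mu_2}{\sigma_2}\right)^2\right]\right\}$$

$$= \frac{1}{\sqrt{2\pi\sigma_2^2(1-\rho^2)}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{y_2-\mu_2}{\sigma_2}\right)^2 - 2\rho\left(\frac{y_1-\mu_1}{\sigma_1}\right)\left(\frac{y_2-\mu_2}{\sigma_2}\right) + \rho^2\left(\frac{y_1-\mu_1}{\sigma_1}\right)^2\right]\right\}$$

Conditional Distribution of Y2 Given Y1=y1 (P. 2)

Pulling out $\sigma_2^2$ in the denominator of the exponent, then forming the “perfect square”, then a function of $y_2$:

$$f(y_2 \mid y_1) = \frac{1}{\sqrt{2\pi\sigma_2^2(1-\rho^2)}} \exp\left\{-\frac{1}{2\sigma_2^2(1-\rho^2)}\left[(y_2-\mu_2)^2 - 2\rho\frac{\sigma_2}{\sigma_1}(y_1-\mu_1)(y_2-\mu_2) + \rho^2\frac{\sigma_2^2}{\sigma_1^2}(y_1-\mu_1)^2\right]\right\}$$

$$= \frac{1}{\sqrt{2\pi\sigma_2^2(1-\rho^2)}} \exp\left\{-\frac{1}{2\sigma_2^2(1-\rho^2)}\left[(y_2-\mu_2) - \rho\frac{\sigma_2}{\sigma_1}(y_1-\mu_1)\right]^2\right\}$$

$$= \frac{1}{\sqrt{2\pi\sigma_2^2(1-\rho^2)}} \exp\left\{-\frac{1}{2\sigma_2^2(1-\rho^2)}\left[y_2 - \left(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(y_1-\mu_1)\right)\right]^2\right\}$$

$$\Rightarrow \quad Y_2 \mid Y_1 = y_1 \sim N\left(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(y_1-\mu_1),\; \sigma_2^2(1-\rho^2)\right)$$

This is referred to as the REGRESSION of Y2 on Y1.

Summary of Results

Joint Distribution:

$$f(y_1,y_2) = \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2(1-\rho^2)}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{y_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{y_1-\mu_1}{\sigma_1}\right)\left(\frac{y_2-\mu_2}{\sigma_2}\right) + \left(\frac{y_2-\mu_2}{\sigma_2}\right)^2\right]\right\}, \qquad -\infty < y_1, y_2 < \infty$$

Marginal (aka Unconditional) Distributions:

$$f_1(y_1) = \frac{1}{\sqrt{2\pi\sigma_1^2}} \exp\left\{-\frac{(y_1-\mu_1)^2}{2\sigma_1^2}\right\}, \quad -\infty < y_1 < \infty \qquad Y_1 \sim N(\mu_1, \sigma_1^2)$$

$$f_2(y_2) = \frac{1}{\sqrt{2\pi\sigma_2^2}} \exp\left\{-\frac{(y_2-\mu_2)^2}{2\sigma_2^2}\right\}, \quad -\infty < y_2 < \infty \qquad Y_2 \sim N(\mu_2, \sigma_2^2)$$

Conditional Distributions:

$$f(y_2 \mid y_1) = \frac{1}{\sqrt{2\pi\sigma_2^2(1-\rho^2)}} \exp\left\{-\frac{\left[y_2 - \left(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(y_1-\mu_1)\right)\right]^2}{2\sigma_2^2(1-\rho^2)}\right\}, \quad -\infty < y_2 < \infty \qquad Y_2 \mid Y_1 = y_1 \sim N\left(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(y_1-\mu_1),\; \sigma_2^2(1-\rho^2)\right)$$

$$f(y_1 \mid y_2) = \frac{1}{\sqrt{2\pi\sigma_1^2(1-\rho^2)}} \exp\left\{-\frac{\left[y_1 - \left(\mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y_2-\mu_2)\right)\right]^2}{2\sigma_1^2(1-\rho^2)}\right\}, \quad -\infty < y_1 < \infty \qquad Y_1 \mid Y_2 = y_2 \sim N\left(\mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y_2-\mu_2),\; \sigma_1^2(1-\rho^2)\right)$$

Heights of Adult Children and Parents
• Empirical Data Based on 924 pairs (F. Galton)
• Y2 = Adult Child’s Height
  – $Y_2 \sim N(68.1, 6.39)$, $\sigma_2 = 2.53$
• Y1 = Mid-Parent’s Height
  – $Y_1 \sim N(68.3, 3.18)$, $\sigma_1 = 1.78$
• $COV(Y_1,Y_2) = 2.02 \Rightarrow \rho = 0.45,\ \rho^2 = 0.20$
• $Y_2 \mid Y_1 = y_1$ is Normal with conditional mean and variance:

$$E[Y_2 \mid Y_1 = y_1] = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(y_1-\mu_1) = 68.1 + 0.45\sqrt{\frac{6.39}{3.18}}\,(y_1 - 68.3) = 68.1 + 0.638\,y_1 - 43.6 = 24.5 + 0.638\,y_1$$

$$V[Y_2 \mid Y_1 = y_1] = \sigma_2^2(1-\rho^2) = 6.39(1 - 0.20) = 5.11 \qquad \sigma_{Y_2 \mid y_1} = \sqrt{5.11} = 2.26$$

y1            Unconditional   63.5   66.5   69.5   72.5
E[Y2|y1]      68.1            65.0   66.9   68.8   70.8
σ(Y2|y1)      2.53            2.26   2.26   2.26   2.26

[Figure: Joint Density Function (two views). Surface plots with the y1 axis running from 62.96 to 72.928 and density bands from 0-0.005 up to 0.035-0.04.]

[Figure: Distributions of Heights of Adult Children. f(y2) against y2 (59.5 to 76.5) for five curves: uncond, y1=63.5, y1=66.5, y1=69.5, y1=72.5.]

[Figure: Regression to the Mean. Three lines plotted against y1 (63.5 to 72.5):
  – Galton’s Finding: E(Y2|y1)=24.5+.638y1
  – E(Child)=Parent+constant: E(Y2|y1)=0.21+y1
  – E(Child) independent of parent: E(Y2|y1)=E(Y2)]

Expectations and Variances
• E(Y1) = 68.3
• V(Y1) = 3.18
• E(Y2) = 68.1
• V(Y2) = 6.39
• $E(Y_2 \mid Y_1 = y_1) = 24.5 + 0.638\,y_1 \Rightarrow E_{Y_1}[E(Y_2 \mid Y_1 = y_1)] = E_{Y_1}[24.5 + 0.638\,Y_1] = 24.5 + 0.638(68.3) = 68.1 = E(Y_2)$
• $V(Y_2 \mid Y_1 = y_1) = 5.11 \Rightarrow E_{Y_1}[V(Y_2 \mid Y_1 = y_1)] = 5.11$
• $V_{Y_1}[E(Y_2 \mid Y_1 = y_1)] = V_{Y_1}[24.5 + 0.638\,Y_1] = (0.638)^2\,V(Y_1) = (0.407)(3.18) = 1.29$
• $E_{Y_1}[V(Y_2 \mid Y_1 = y_1)] + V_{Y_1}[E(Y_2 \mid Y_1 = y_1)] = 5.11 + 1.29 = 6.40 = V(Y_2)$ (with round-off)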
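
The arithmetic in the height example can be checked numerically. The following is a minimal Python sketch, not part of the original slides, that recomputes the correlation, the regression line, the conditional variance, and the variance decomposition from the summary statistics quoted above (means 68.3 and 68.1, variances 3.18 and 6.39, covariance 2.02). The variable and function names are illustrative assumptions. The sketch carries ρ at full precision rather than rounding it to 0.45, so the slope and intercept differ from the slide values in the third significant digit.

```python
import math

# Summary statistics quoted in the slides (already rounded there)
mu1, var1 = 68.3, 3.18    # Y1 = mid-parent height
mu2, var2 = 68.1, 6.39    # Y2 = adult child height
cov12 = 2.02              # COV(Y1, Y2)

# Correlation: rho = COV(Y1, Y2) / (sigma1 * sigma2)
rho = cov12 / math.sqrt(var1 * var2)

# Regression of Y2 on Y1: E[Y2 | y1] = mu2 + rho * (sigma2 / sigma1) * (y1 - mu1)
slope = rho * math.sqrt(var2 / var1)
intercept = mu2 - slope * mu1

# Conditional variance: V[Y2 | y1] = sigma2^2 * (1 - rho^2); it does not depend on y1
cond_var = var2 * (1 - rho ** 2)


def cond_mean(y1):
    """Conditional mean of adult child height given mid-parent height y1."""
    return intercept + slope * y1


# Variance decomposition: V(Y2) = E[V(Y2 | Y1)] + V[E(Y2 | Y1)]
explained = slope ** 2 * var1      # V[E(Y2 | Y1)]
total = cond_var + explained       # should reproduce var2

print(f"rho = {rho:.3f}, slope = {slope:.3f}, intercept = {intercept:.1f}")
print(f"V[Y2 | y1] = {cond_var:.2f}, sd = {math.sqrt(cond_var):.2f}")
for y1 in (63.5, 66.5, 69.5, 72.5):
    print(f"E[Y2 | y1 = {y1}] = {cond_mean(y1):.1f}")
print(f"E[V] + V[E] = {cond_var:.2f} + {explained:.2f} = {total:.2f}")
```

Because the conditional variance and the variance of the conditional mean are both computed from the same unrounded ρ, their sum reproduces V(Y2) without the round-off seen in the slide arithmetic.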