251solngr2-051 4/12/05 (Open this document in 'Page Layout' view!)

Graded Assignment 2

Name:
Class days and time:
Student number:

Modify the data below as follows: add the last digit of your student number to the 7 in problem 1; add the second-to-last digit to the 9 in problem 2.

1) For the following joint probability table, (i) check for independence, (ii) compute $E(x)$ and $\mathrm{Var}(x)$, (iii) compute $\mathrm{Cov}(x,y)$ or $\sigma_{xy}$ and $\mathrm{Corr}(x,y)$ or $\rho_{xy}$, (iv) compute $E(x+y)$ and $\mathrm{Var}(x+y)$ from the results in (ii) and (iii), (v) compute $\mathrm{Cov}(5x+3,\,y)$ and $\mathrm{Corr}(5x+3,\,y)$ using the formulas in section K4 of 251v2out or section C1 of 251var2. Note that $y = 1y + 0$.

                x
             3      5      7
    y   1   .11    .08    .16
        3   .10    .14    .10
        9   .12    .08    .11

2) For the following sample, (i) compute the sample mean and variance of $x$, (ii) compute $\mathrm{Cov}(x,y)$ or $s_{xy}$ and $\mathrm{Corr}(x,y)$ or $r_{xy}$, (iii) compute the sample mean and variance of $x+y$ from the results in (i) and (ii), (iv) compute $\mathrm{Cov}(5x+3,\,2y)$ and $\mathrm{Corr}(5x+3,\,2y)$ using the formulas in section K4 of 251v2out or section C1 of 251var2. Note that $2y = 2y + 0$.

     x     y
     9     2
     4     4
     6     2
     2     5
     1     7
     1     6
     3     5
    10    -3

Solution: Assume that Seymour Butz's number is 555555. The table becomes

                x
             3      5     12
    y   1   .11    .08    .16
        3   .10    .14    .10
        9   .12    .08    .11

1) (i) Check for independence: First you need to find $P(x)$ and $P(y)$. The column totals give $P(x=3)=.33$, $P(x=5)=.30$, $P(x=12)=.37$, and the row totals give $P(y=1)=.35$, $P(y=3)=.34$, $P(y=9)=.31$. Look at the upper left hand probability. Its value is .11 and it represents $P(x=3 \cap y=1)$. If $x$ and $y$ were independent, we would have $P(x=3 \cap y=1) = P(x=3)\,P(y=1) = .33(.35) = .1155$. Since this is not true, $x$ and $y$ cannot be independent. Even one place where the joint probability is not the product of the marginal probabilities is enough. If this one is not enough to convince you, how about $P(x=12 \cap y=3) = .10 \ne P(x=12)\,P(y=3) = .37(.34) = .1258$? Notice that the second row is not proportional to the first row. A zero covariance or correlation would be a consequence of independence, but it is not true that a zero correlation or covariance would prove independence. We have already seen one example where there is a zero correlation, but no independence.
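The independence check in (i) can be sketched in a few lines of Python. This is only an illustration, not part of the assignment: the names `joint`, `p_x`, `p_y` and `is_independent` are made up here, and the table is the modified one with x = 3, 5, 12.

```python
# Joint probability table from the solution (x = 3, 5, 12; y = 1, 3, 9).
# Variable and function names are illustrative assumptions.
x_vals = [3, 5, 12]
y_vals = [1, 3, 9]
joint = [
    [0.11, 0.08, 0.16],  # y = 1
    [0.10, 0.14, 0.10],  # y = 3
    [0.12, 0.08, 0.11],  # y = 9
]

# Marginals: column sums give P(x), row sums give P(y).
p_x = [sum(row[j] for row in joint) for j in range(len(x_vals))]
p_y = [sum(row) for row in joint]


def is_independent(joint, p_x, p_y, tol=1e-9):
    """Return True only if every cell equals the product of its marginals."""
    return all(
        abs(joint[i][j] - p_y[i] * p_x[j]) <= tol
        for i in range(len(p_y))
        for j in range(len(p_x))
    )


independent = is_independent(joint, p_x, p_y)
print([round(p, 2) for p in p_x], [round(p, 2) for p in p_y], independent)
```

A single cell that fails the product test is enough to make the function return False, which mirrors the argument in the text.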
(ii) Compute $E(x)$ and $\mathrm{Var}(x)$: Looking below, we find $E(x) = 6.93$ and $\mathrm{Var}(x) = 15.7251$.

(iii) Compute $\mathrm{Cov}(x,y)$ and $\mathrm{Corr}(x,y)$. To summarize:

                     x
                3       5      12     P(y)    yP(y)   y²P(y)
    y    1     .11     .08     .16     .35     0.35     0.35
         3     .10     .14     .10     .34     1.02     3.06
         9     .12     .08     .11     .31     2.79    25.11
    P(x)       .33     .30     .37    1.00     4.16    28.52
    xP(x)     0.99    1.50    4.44    6.93
    x²P(x)    2.97    7.50   53.28   63.75

So $\sum P(x) = 1$, $\sum P(y) = 1$, $\mu_x = E(x) = \sum xP(x) = 6.93$, $E(x^2) = \sum x^2P(x) = 63.75$, $\mu_y = E(y) = \sum yP(y) = 4.16$ and $E(y^2) = \sum y^2P(y) = 28.52$.

$E(xy) = \sum\sum xy\,P(x,y)$:

      (.11)(3)(1) + (.08)(5)(1) + (.16)(12)(1)       0.33 + 0.40 +  1.92
    + (.10)(3)(3) + (.14)(5)(3) + (.10)(12)(3)   = + 0.90 + 2.10 +  3.60   = 27.97
    + (.12)(3)(9) + (.08)(5)(9) + (.11)(12)(9)     + 3.24 + 3.60 + 11.88

$\sigma_{xy} = \mathrm{Cov}(x,y) = E(xy) - \mu_x\mu_y = 27.97 - (6.93)(4.16) = -0.8588$, $\sigma_x^2 = E(x^2) - \mu_x^2 = 63.75 - (6.93)^2 = 15.7251$ and $\sigma_y^2 = E(y^2) - \mu_y^2 = 28.52 - (4.16)^2 = 11.2144$. So that

$\rho_{xy} = \mathrm{Corr}(x,y) = \dfrac{\sigma_{xy}}{\sigma_x\sigma_y} = \dfrac{-0.8588}{\sqrt{15.7251}\,\sqrt{11.2144}} = -\sqrt{\dfrac{(0.8588)^2}{(15.7251)(11.2144)}} = -\sqrt{.0041823} = -0.06467$.

($\sigma_x = 3.9655$, $\sigma_y = 3.3488$.) The correlation and covariance are negative, indicating a tendency of $y$ to fall when $x$ rises. $\rho_{xy}^2 = .0041823$ hardly exists on a zero to one scale, indicating that the relationship is barely there. Note that $-1 \le \rho_{xy} \le 1$ always!

(iv) Compute $E(x+y)$ and $\mathrm{Var}(x+y)$. $E(x+y) = E(x) + E(y) = \mu_x + \mu_y = 6.93 + 4.16 = 11.09$ and $\mathrm{Var}(x+y) = \sigma_x^2 + \sigma_y^2 + 2\sigma_{xy} = \mathrm{Var}(x) + \mathrm{Var}(y) + 2\,\mathrm{Cov}(x,y) = 15.7251 + 11.2144 + 2(-0.8588) = 25.2219$.

To check this do the computations below. If we run down the columns of the table:

    x         3     3     3     5     5     5    12    12    12
    y         1     3     9     1     3     9     1     3     9
    x+y       4     6    12     6     8    14    13    15    21
    P(x,y)   .11   .10   .12   .08   .14   .08   .16   .10   .11

Now collect probabilities that belong to the same value. For example, $P(x+y=6) = .10 + .08 = .18$.

    x+y     P(x+y)   (x+y)P(x+y)   (x+y)²P(x+y)
      4      .11        0.44           1.76
      6      .18        1.08           6.48
      8      .14        1.12           8.96
     12      .12        1.44          17.28
     13      .16        2.08          27.04
     14      .08        1.12          15.68
     15      .10        1.50          22.50
     21      .11        2.31          48.51
    sum     1.00       11.09         148.21

$E(x+y) = \sum (x+y)P(x+y) = 11.09$ and $\mathrm{Var}(x+y) = \sigma_{x+y}^2 = E\big[(x+y)^2\big] - \mu_{x+y}^2 = 148.21 - (11.09)^2 = 25.2219$.

(v) Compute $\mathrm{Cov}(5x+3,\,y)$ and $\mathrm{Corr}(5x+3,\,y)$ using the formulas in section K4 of 251v2out or section C1 of 251var2. Note that $y = 1y + 0$.
Solution: 251v2out says $\mathrm{Cov}(ax+b,\,cy+d) = ac\,\mathrm{Cov}(x,y)$ and $\mathrm{Corr}(ax+b,\,cy+d) = \mathrm{sign}(ac)\,\mathrm{Corr}(x,y)$, where $\mathrm{sign}(ac)$ has the value $+1$ or $-1$ depending on whether the product of $a$ and $c$ is positive or negative. Here $a = 5$ and $c = 1$, with $\sigma_{xy} = \mathrm{Cov}(x,y) = -0.8588$ and $\rho_{xy} = \mathrm{Corr}(x,y) = -0.06467$.

$\mathrm{Cov}(5x+3,\,1y+0) = 5(1)\,\mathrm{Cov}(x,y) = 5(-0.8588) = -4.2940$

$\mathrm{Corr}(5x+3,\,1y+0) = \mathrm{sign}(5 \cdot 1)\,\mathrm{Corr}(x,y) = (+1)(-0.06467) = -0.06467$.

2) For the following sample, (i) compute the sample mean and variance of $x$, (ii) compute $\mathrm{Cov}(x,y)$ or $s_{xy}$ and $\mathrm{Corr}(x,y)$ or $r_{xy}$, (iii) compute the sample mean and variance of $x+y$ from the results in (i) and (ii), (iv) compute $\mathrm{Cov}(5x+3,\,2y)$ and $\mathrm{Corr}(5x+3,\,2y)$ using the formulas in section K4 of 251v2out or section C1 of 251var2. Note that $2y = 2y + 0$.

The original data                Becomes

     x     y                      x     y
     9     2                     14     2
     4     4                      4     4
     6     2                      6     2
     2     5                      2     5
     1     7                      1     7
     1     6                      1     6
     3     5                      3     5
    10    -3                     10    -3

The entire table is below with required computations.

    Row      x     x²      y     y²     xy
      1     14    196      2      4     28
      2      4     16      4     16     16
      3      6     36      2      4     12
      4      2      4      5     25     10
      5      1      1      7     49      7
      6      1      1      6     36      6
      7      3      9      5     25     15
      8     10    100     -3      9    -30
    sum     41    363     28    168     64

So $n = 8$, $\sum x = 41$, $\sum x^2 = 363$, $\sum y = 28$, $\sum y^2 = 168$, and $\sum xy = 64$. Then $\bar{x} = \dfrac{\sum x}{n} = \dfrac{41}{8} = 5.125$ and $\bar{y} = \dfrac{\sum y}{n} = \dfrac{28}{8} = 3.500$.

(i) $s_x^2 = \dfrac{\sum x^2 - n\bar{x}^2}{n-1} = \dfrac{363 - 8(5.125)^2}{7} = 21.8393$, $s_y^2 = \dfrac{\sum y^2 - n\bar{y}^2}{n-1} = \dfrac{168 - 8(3.500)^2}{7} = 10.0000$.

($s_x = \sqrt{21.8393} = 4.67325$ and $s_y = \sqrt{10.0000} = 3.16228$.)

(ii) $s_{xy} = \mathrm{Cov}(x,y) = \dfrac{\sum xy - n\bar{x}\bar{y}}{n-1} = \dfrac{64 - 8(5.125)(3.500)}{8-1} = -11.3571$ and $r_{xy} = \mathrm{Corr}(x,y) = \dfrac{s_{xy}}{s_x s_y} = \dfrac{-11.3571}{\sqrt{21.8393}\,\sqrt{10.0000}} = -0.7685$. The correlation and covariance are negative, indicating a tendency of $y$ to fall when $x$ rises. $r_{xy}^2 = .5906$ is fairly large on a zero to one scale, indicating that the relationship is moderately strong. Note that $-1 \le r_{xy} \le 1$ always!

(iii) $\overline{x+y} = \bar{x} + \bar{y} = \dfrac{41}{8} + \dfrac{28}{8} = \dfrac{69}{8} = 8.625$. $s_{x+y}^2 = s_x^2 + s_y^2 + 2s_{xy} = 21.8393 + 10.0000 + 2(-11.3571) = 9.125$.

To check this do the computations below.

      x     y    x+y   (x+y)²
     14     2     16     256
      4     4      8      64
      6     2      8      64
      2     5      7      49
      1     7      8      64
      1     6      7      49
      3     5      8      64
     10    -3      7      49
    sum           69     659

So $\overline{x+y} = \dfrac{\sum (x+y)}{n} = \dfrac{69}{8} = 8.625$ as above and $s_{x+y}^2 = \dfrac{\sum (x+y)^2 - n\,\overline{(x+y)}^{\,2}}{n-1} = \dfrac{659 - 8\left(\frac{69}{8}\right)^2}{7} = 9.125$.
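The check of $s_{x+y}^2$ can also be sketched in Python. This is a minimal illustration using the computational formulas from (i) and (ii); the helper names `mean` and `sample_cov` are invented for this sketch and do not come from the course materials.

```python
# Sample data after the modification (the 9 becomes 14).
x = [14, 4, 6, 2, 1, 1, 3, 10]
y = [2, 4, 2, 5, 7, 6, 5, -3]


def mean(v):
    """Arithmetic mean of a list of numbers."""
    return sum(v) / len(v)


def sample_cov(u, v):
    """Sample covariance, (sum(uv) - n*ubar*vbar) / (n - 1).

    Calling it with the same list twice gives the sample variance.
    """
    n = len(u)
    return (sum(a * b for a, b in zip(u, v)) - n * mean(u) * mean(v)) / (n - 1)


var_x = sample_cov(x, x)
var_y = sample_cov(y, y)
cov_xy = sample_cov(x, y)

# Variance of x + y two ways: directly, and from the sum formula.
total = [a + b for a, b in zip(x, y)]
var_sum_direct = sample_cov(total, total)
var_sum_formula = var_x + var_y + 2 * cov_xy

print(var_x, var_y, cov_xy, var_sum_direct, var_sum_formula)
```

Both routes should agree up to floating-point rounding, which is the point of the hand check above.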
Notice how much larger the variation is in $x$ and $y$ individually than in $x+y$. This is reflected in the small variance of $x+y$.

Computer results follow. N* means missing measurements.

    Variable   N  N*  Mean  SE Mean  StDev  Minimum    Q1  Median    Q3  Maximum
    x          8   0  5.13     1.65   4.67     1.00  1.25    3.50  9.00    14.00
    y          8   0  3.50     1.12   3.16    -3.00  2.00    4.50  5.75     7.00
    x+y        8   0  8.63     1.07   3.02     7.00  7.00    8.00  8.00    16.00

(iv) Compute $\mathrm{Cov}(5x+3,\,2y)$ and $\mathrm{Corr}(5x+3,\,2y)$ using the formulas in section K4 of 251v2out or section C1 of 251var2. Note that $2y = 2y + 0$.

Solution: 251v2out says $\mathrm{Cov}(ax+b,\,cy+d) = ac\,\mathrm{Cov}(x,y)$ and $\mathrm{Corr}(ax+b,\,cy+d) = \mathrm{sign}(ac)\,\mathrm{Corr}(x,y)$, where $\mathrm{sign}(ac)$ has the value $+1$ or $-1$ depending on whether the product of $a$ and $c$ is positive or negative. Here $a = 5$ and $c = 2$, with $s_{xy} = \mathrm{Cov}(x,y) = -11.3571$ and $r_{xy} = \mathrm{Corr}(x,y) = -0.7685$.

$\mathrm{Cov}(5x+3,\,2y+0) = 5(2)\,\mathrm{Cov}(x,y) = 10(-11.3571) = -113.571$

$\mathrm{Corr}(5x+3,\,2y+0) = \mathrm{sign}(5 \cdot 2)\,\mathrm{Corr}(x,y) = (+1)(-0.7685) = -0.7685$.

Appendix: Minitab Computations

(This is mostly a reminder to me of how I checked my work, but there is enough info here to run the routines and I'm happy to give them to anyone who wants them.)

Population Correlation Problem

---------- 4/12/2005 9:12:27 PM ----------
Welcome to Minitab, press F1 for help.
Results for: 1gr2-051aa.MTW
MTB > WSave "C:\Documents and Settings\rbove\My Documents\Minitab\1gr2-051aa.MTW";
SUBC> Replace.
Saving file as: 'C:\Documents and Settings\rbove\My Documents\Minitab\1gr2-051aa.MTW'
MTB > echo
MTB > Execute "C:\Documents and Settings\rbove\My Documents\Minitab\251popcorr.mtb" 1.
Executing from file: C:\Documents and Settings\rbove\My Documents\Minitab\251popcorr.mtb
MTB > #251popcorr
MTB > # Computes population covariance and correlation
MTB > # Put x in C15, y in c17
MTB > # Put a joint probability table in c10 - c14
MTB > # Fill table with zeros to make it 5 by 5.
MTB > name k10 'varx'
MTB > name k11 'vary'
MTB > name k12 'Exy'
MTB > name k13 'covxy'
MTB > name k14 'corr'
MTB > name k15 'sdx'
MTB > name k16 'sumpx'
MTB > name k17 'sdy'
MTB > name k18 'sumpy'
MTB > name k25 'Ex'
MTB > name k26 'Ex2'
MTB > name k27 'Ey'
MTB > name k28 'Ey2'
MTB > execute 'marg973'
Executing from file: marg973.MTB
(Input was Probabilities in c10-c14, x in c15, y in c17)
MTB > #marg973.mtb
MTB > #part of 251popcorr Computes marginal probabilities
MTB > let c18=c10+c11+c12+c13+c14
MTB > let c16(1)=sum(c10)
MTB > let c16(2)=sum(c11)
MTB > let c16(3)=sum(c12)
MTB > let c16(4)=sum(c13)
MTB > let c16(5)=sum(c14)
MTB > let k16=sum(c16) #These are sums of x and y
MTB > let k18=sum(c18) #probabilities and should be 1.
MTB > print c10-c18

Data Display

    Row   C10   C11   C12  C13  C14  C15   C16  C17   C18
      1  0.11  0.08  0.16    0    0    3  0.33    1  0.35
      2  0.10  0.14  0.10    0    0    5  0.30    3  0.34
      3  0.12  0.08  0.11    0    0   12  0.37    9  0.31
      4  0.00  0.00  0.00    0    0    0  0.00    0  0.00
      5  0.00  0.00  0.00    0    0    0  0.00    0  0.00

MTB > end (End means the end of an exec and a return to Popcorr)
MTB > execute 'meany973'
Executing from file: meany973.MTB
MTB > #meany.973.mtb part of 251popcorr
MTB > print k16,k18

Data Display

    sumpx   1.00000
    sumpy   1.00000
    (A check to see if probabilities add to 1)

MTB > let c20=c10*c17 (c20-c24 are products for E(xy))
MTB > let c20=c20*c15(1)
MTB > let c21=c11*c17
MTB > let c21=c21*c15(2)
MTB > let c22=c12*c17
MTB > let c22=c22*c15(3)
MTB > let c23=c13*c17
MTB > let c23=c23*c15(4)
MTB > let c24=c14*c17
MTB > let c24=c24*c15(5)
MTB > let c25=c15*c16 # xP(x)
MTB > let k25=sum(c25) # E(x)
MTB > let c27=c17*c17
MTB > let k27=sum(c27) # E(y)?
MTB > end
MTB > execute 'meanz973'
Executing from file: meanz973.MTB
MTB > #meanz973.mtb part of 251popcorr
MTB > let c27=c17*c18
MTB > let k27=sum(c27) #E(y)
MTB > end
MTB > execute 'exysq973'
Executing from file: exysq973.MTB
MTB > #exysq973.mtb part of 251popcorr
MTB > let c26=c15*c25 # xsqP(x)
MTB > let c28=c17*c27 # ysqP(y)
MTB > let k26=sum(c26) # E(xsq)
MTB > let k28=sum(c28) # E(ysq)
MTB > print c20-c28

Data Display

    Row   C20  C21    C22  C23  C24   C25    C26   C27    C28
      1  0.33  0.4   1.92    0    0  0.99   2.97  0.35   0.35
      2  0.90  2.1   3.60    0    0  1.50   7.50  1.02   3.06
      3  3.24  3.6  11.88    0    0  4.44  53.28  2.79  25.11
      4  0.00  0.0   0.00    0    0  0.00   0.00  0.00   0.00
      5  0.00  0.0   0.00    0    0  0.00   0.00  0.00   0.00

MTB > print k25-k28

Data Display

    Ex    6.93000
    Ex2   63.7500
    Ey    4.16000
    Ey2   28.5200

MTB > end
MTB > execute 'xyvar973'
Executing from file: xyvar973.MTB
MTB > #xyvar973.mtb part of popcorr
MTB > let k10=k26-k25*k25 #variance of x
MTB > let k11=k28-k27*k27 #variance of y
MTB > print k10 k11

Data Display

    varx   15.7251
    vary   11.2144

MTB > let k20=sum(c20) (Column sums)
MTB > let k21=sum(c21)
MTB > let k22=sum(c22)
MTB > let k23=sum(c23)
MTB > let k24=sum(c24)
MTB > let k12 = k20+k21+k22+k23+k24 # E(xy)
MTB > print k20-k24, k12

Data Display

    K20   4.47000
    K21   6.10000
    K22   17.4000
    K23   0
    K24   0
    Exy   27.9700

MTB > end
MTB > execute 'cov973'
Executing from file: cov973.MTB
MTB > #cov973.mtb part of popcorr
MTB > let k13=k12-k25*k27 #Covariance
MTB > let k15=sqrt(k10) #St. dev of x
MTB > let k17=sqrt(k11) #St. dev of y
MTB > let k14=k13/k15
MTB > let k14=k14/k17 #Corr(x, y)
MTB > print k13-k18

Data Display

    covxy   -0.858800    (Covariance)
    corr    -0.0646707   (Correlation)
    sdx     3.96549      (Std. deviation of x)
    sumpx   1.00000      (Sum of x probabilities)
    sdy     3.34879      (Std. deviation of y)
    sumpy   1.00000      (Sum of y probabilities)

MTB > end
MTB > execute 'tb2973'
Executing from file: tb2973.MTB
MTB > #tb2973.mtb Part of 251popcorr
MTB > let k30=30
MTB > let k31=10
MTB > let k32=1
MTB > execute 'tb2s973' 5
Executing from file: tb2s973.MTB
(The subroutine below is echoed five times, once per pass.)
MTB > #tb2s973.mtb Subroutine of tb2973
MTB > # Part of popcorr
MTB > let ck30=ck31
MTB > let ck30(6)=c16(k32)
MTB > let ck30(7)=c25(k32)
MTB > let ck30(8)=c26(k32)
MTB > let k30=k30+1
MTB > let k31=k31+1
MTB > let k32=k32+1
MTB > end
MTB > let c35=c18
MTB > let c35(6)=k18
MTB > let c35(7)=k25
MTB > let c35(8)=k26
MTB > let c36=c27
MTB > let c36(6)=k27
MTB > let c37=c28
MTB > let c37(6)=k28
MTB > print c30-c37

Data Display (The final table)

    Row   C30   C31    C32  C33  C34    C35   C36    C37
      1  0.11  0.08   0.16    0    0   0.35  0.35   0.35
      2  0.10  0.14   0.10    0    0   0.34  1.02   3.06
      3  0.12  0.08   0.11    0    0   0.31  2.79  25.11
      4  0.00  0.00   0.00    0    0   0.00  0.00   0.00
      5  0.00  0.00   0.00    0    0   0.00  0.00   0.00
      6  0.33  0.30   0.37    0    0   1.00  4.16  28.52
      7  0.99  1.50   4.44    0    0   6.93
      8  2.97  7.50  53.28    0    0  63.75

MTB > write c30-c37.

Data Display (WRITE)

    0.11  0.08   0.16  0  0   0.35  0.35   0.35
    0.10  0.14   0.10  0  0   0.34  1.02   3.06
    0.12  0.08   0.11  0  0   0.31  2.79  25.11
    0.00  0.00   0.00  0  0   0.00  0.00   0.00
    0.00  0.00   0.00  0  0   0.00  0.00   0.00
    0.33  0.30   0.37  0  0   1.00  4.16  28.52
    0.99  1.50   4.44  0  0   6.93     *      *
    2.97  7.50  53.28  0  0  63.75     *      *

* NOTE * Column lengths not equal.
MTB > end.
MTB > execute 'tb3973'
Executing from file: tb3973.MTB
MTB > #tb3973.mtb final printout for 252popcorr
MTB > write c20-c24;
SUBC> replace.

Data Display (WRITE) (Products for E(xy) again)

    0.33  0.4   1.92  0  0
    0.90  2.1   3.60  0  0
    3.24  3.6  11.88  0  0
    0.00  0.0   0.00  0  0
    0.00  0.0   0.00  0  0

MTB > end
MTB > end

Sample Correlation Problem

MTB > exec '251samcov'
Executing from file: 251samcov.MTB
MTB > #251samcov Computes sample variances
MTB > # and covariances using 'var973'
MTB > #Input is x column in c40, y column in c42
MTB > # Example in Covex
MTB > name c40 'x'
MTB > name c41 'xsq'
MTB > name c42 'y'
MTB > name c43 'ysq'
MTB > name c44 'xy'
MTB > name k40 'sumx'
MTB > name k41 'sumx2'
MTB > name k42 'sumy'
MTB > name k43 'sumy2'
MTB > name k44 'sumxy'
MTB > name k45 'n'
MTB > name k46 'xbar'
MTB > name k47 'ybar'
MTB > name k48 'svarx'
MTB > name k49 'svary'
MTB > name k50 'scovxy'
MTB > name k51 'sx'
MTB > name k52 'sy'
MTB > name k53 'rxy'
MTB > name k54 'rxy2'
MTB > let c1=c40 (Input for sample stats in c40, c42)
MTB > execute 'var973'
Executing from file: var973.MTB
(An exec for computing variance is used for both x and y.)
MTB > #var973.mtb
MTB > #computes sample variance of data in C1.
MTB > let k1=sum(c1)
MTB > let c2=c1 * c1
MTB > let k2=sum(c2)
MTB > let k3=count(c1)
MTB > let k4=k1/k3 #mean
MTB > let k5=k3*k4*k4
MTB > let k5=k2-k5
MTB > print k5

Data Display

    svar   152.875

MTB > name k1 'sum'
MTB > name k2 'sumsq'
MTB > name k3 'count'
MTB > name k4 'smean'
MTB > name k5 'svar'
MTB > name k6 'sdev'
MTB > name k7 'DF'
MTB > name k8 'sterr'
MTB > let k7 = k3 - 1 (DF = n - 1)
MTB > let k5 = k5/k7
MTB > let k6 = sqrt(k5)
MTB > let k8 = k5/k3
MTB > let k8 = sqrt(k8) (Standard error)
MTB > describe c1

Descriptive Statistics: C1

    Variable  N  N*  Mean  SE Mean  StDev  Minimum    Q1  Median    Q3  Maximum
    C1        8   0  5.13     1.65   4.67     1.00  1.25    3.50  9.00    14.00

MTB > print c1-c2

Data Display (x and x squared)

    Row  C1   C2
      1  14  196
      2   4   16
      3   6   36
      4   2    4
      5   1    1
      6   1    1
      7   3    9
      8  10  100

MTB > print k1-k8

Data Display

    sum     41.0000
    sumsq   363.000
    count   8.00000
    smean   5.12500
    svar    21.8393    (A sample variance for x)
    sdev    4.67325
    DF      7.00000
    sterr   1.65224

MTB > end
MTB > let C41=c2
MTB > let k40=k1
MTB > let k41=k2
MTB > let k46=k4
MTB > let k48=k5
MTB > let k51=k6
MTB > let c1=c42 (Input for sample stats in c40, c42)
MTB > execute 'var973'
Executing from file: var973.MTB
(An exec for computing variance is used for both x and y.)
MTB > #var973.mtb
MTB > #computes sample variance of data in C1.
MTB > let k1=sum(c1)
MTB > let c2=c1 * c1
MTB > let k2=sum(c2)
MTB > let k3=count(c1)
MTB > let k4=k1/k3 #mean
MTB > let k5=k3*k4*k4
MTB > let k5=k2-k5
MTB > print k5

Data Display

    svar   70.0000

MTB > name k1 'sum'
MTB > name k2 'sumsq'
MTB > name k3 'count'
MTB > name k4 'smean'
MTB > name k5 'svar'
MTB > name k6 'sdev'
MTB > name k7 'DF'
MTB > name k8 'sterr'
MTB > let k7 = k3 - 1
MTB > let k5 = k5/k7
MTB > let k6 = sqrt(k5)
MTB > let k8 = k5/k3
MTB > let k8 = sqrt(k8)
MTB > describe c1

Descriptive Statistics: C1

    Variable  N  N*  Mean  SE Mean  StDev  Minimum    Q1  Median    Q3  Maximum
    C1        8   0  3.50     1.12   3.16    -3.00  2.00    4.50  5.75     7.00

MTB > print c1-c2

Data Display (y and y squared)

    Row  C1  C2
      1   2   4
      2   4  16
      3   2   4
      4   5  25
      5   7  49
      6   6  36
      7   5  25
      8  -3   9

MTB > print k1-k8

Data Display

    sum     28.0000
    sumsq   168.000
    count   8.00000
    smean   3.50000
    svar    10.0000    (A sample variance for y)
    sdev    3.16228
    DF      7.00000
    sterr   1.11803

MTB > end
MTB > let C43=c2
MTB > let k42=k1
MTB > let k43=k2
MTB > let k45=k3
MTB > let k47=k4
MTB > let k49=k5
MTB > let k52=k6
MTB > let c44=c40*c42
MTB > let k44=sum(c44)
MTB > let k50=k45*k46*k47
MTB > let k50=k44-k50
MTB > let k50=k50/k7
MTB > let k53=k51*k52
MTB > let k53=k50/k53
MTB > let k54=k53*k53
MTB > Print c40-c44

Data Display

    Row   x  xsq   y  ysq   xy
      1  14  196   2    4   28
      2   4   16   4   16   16
      3   6   36   2    4   12
      4   2    4   5   25   10
      5   1    1   7   49    7
      6   1    1   6   36    6
      7   3    9   5   25   15
      8  10  100  -3    9  -30

MTB > Print k40-k54

Data Display

    sumx     41.0000     (Sum of x)
    sumx2    363.000     (Sum of x²)
    sumy     28.0000     (Sum of y)
    sumy2    168.000     (Sum of y²)
    sumxy    64.0000     (Sum of xy)
    n        8.00000     (Sample size)
    xbar     5.12500     (Sample mean of x)
    ybar     3.50000     (Sample mean of y)
    svarx    21.8393     (Sample variance of x)
    svary    10.0000     (Sample variance of y)
    scovxy   -11.3571    (Sample covariance)
    sx       4.67325     (Std deviation of x)
    sy       3.16228     (Std deviation of y)
    rxy      -0.768511   (Sample correlation)
    rxy2     0.590609    (Sample correlation squared)
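
As a cross-check that does not depend on Minitab, the rule $\mathrm{Cov}(ax+b,\,cy+d) = ac\,\mathrm{Cov}(x,y)$ and $\mathrm{Corr}(ax+b,\,cy+d) = \mathrm{sign}(ac)\,\mathrm{Corr}(x,y)$ can be verified numerically on the sample data. The sketch below is illustrative only: it assumes the constants are $a = 5$, $b = 3$, $c = 2$, $d = 0$, and the function names are invented here.

```python
import math

# Sample data after the modification (the 9 becomes 14).
x = [14, 4, 6, 2, 1, 1, 3, 10]
y = [2, 4, 2, 5, 7, 6, 5, -3]


def sample_cov(u, v):
    """Sample covariance with n - 1 in the denominator."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    return (sum(p * q for p, q in zip(u, v)) - n * u_bar * v_bar) / (n - 1)


def sample_corr(u, v):
    """Sample correlation: covariance over the product of std deviations."""
    return sample_cov(u, v) / math.sqrt(sample_cov(u, u) * sample_cov(v, v))


# Assumed constants for the transformation (5x + 3, 2y + 0).
a, b, c, d = 5, 3, 2, 0
u = [a * xi + b for xi in x]
w = [c * yi + d for yi in y]

cov_uw = sample_cov(u, w)    # computed from the transformed data
corr_uw = sample_corr(u, w)
cov_rule = a * c * sample_cov(x, y)                        # rule for covariance
corr_rule = math.copysign(1, a * c) * sample_corr(x, y)    # rule for correlation

print(cov_uw, cov_rule, corr_uw, corr_rule)
```

The additive constants $b$ and $d$ drop out entirely, and the positive scale factors change the covariance but leave the correlation untouched.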