251progscor 3/25/06

Finding a Sample Correlation. Doing Old Computational Problem 1b on Minitab.

Downing and Clark (formerly pg. 348, now posted at end of 251hwkadd), Old Computational Problem 1: This is obviously sample data, so we compute only a sample covariance and correlation.

b) Compute $\mathrm{Cov}(x, y)$ and $\mathrm{Corr}(x, y)$.

x   34  26   9  30  47  10  34  34  45  10  47  32  47   8  45
y    6  57  89  60  95  42  31  28  90  25  45  23  52  95  48

The program Samcov will compute the sample covariance and correlation for a set of x and y points. The setup is as below. x is in C40 and y is in C42; C41 is left empty for the program to fill.

Column Number   C40   C42
Column Label     x     y
Row 1            34     6
Row 2            26    57
Row 3             9    89
Row 4            30    60
Row 5            47    95
Row 6            10    42
Row 7            34    31
Row 8            34    28
Row 9            45    90
Row 10           10    25
Row 11           47    45
Row 12           32    23
Row 13           47    52
Row 14            8    95
Row 15           45    48

The space allocation is as follows for computation of the covariance and correlation.

C40   $x$ (Column)
C41   $x^2$ (Column)
C42   $y$ (Column)
C43   $y^2$ (Column)
C44   $xy$ (Column)
K40   $\sum x$
K41   $\sum x^2$
K42   $\sum y$
K43   $\sum y^2$
K44   $\sum xy$
K45   $n$
K46   $\bar{x}$
K47   $\bar{y}$
K48   $s_x^2$
K49   $s_y^2$
K50   $s_{xy}$
K51   $s_x$
K52   $s_y$
K53   $r_{xy}$
K54   $r_{xy}^2$

The space allocation is as follows for computation of the variances. The input in C1 is set up by Samcov.

C1    $x$ (Column)
C2    $x^2$ (Column)
K1    $\sum x$
K2    $\sum x^2$
K3    $n$
K4    $\bar{x}$
K5    $s^2$
K6    $s$
K7    $n - 1$
K8    $s_{\bar{x}} = s/\sqrt{n}$

Set up a storage area for the program modules, called ‘exec’s by the current version of Minitab.

Load the following modules into the storage area that you have set up for the execs. Each of these is a separate document in ‘txt’ format. Note that # introduces a comment. The first line gives the name of the file.
#251samcov.txt Computes sample variances
# and covariances using 'var973'
# Input is x column in C40
#          y column in C42
name c40 'x'
name c41 'xsq'
name c42 'y'
name c43 'ysq'
name c44 'xy'
name k40 'sumx'
name k41 'sumx2'
name k42 'sumy'
name k43 'sumy2'
name k44 'sumxy'
name k45 'n'
name k46 'xbar'
name k47 'ybar'
name k48 'svarx'
name k49 'svary'
name k50 'scovxy'
name k51 'sx'
name k52 'sy'
name k53 'rxy'
name k54 'rxy2'
# First pass: variance of x
let c1=c40
execute 'var973.txt'
let C41=c2
let k40=k1
let k41=k2
let k46=k4
let k48=k5
let k51=k6
# Second pass: variance of y
let c1=c42
execute 'var973.txt'
let C43=c2
let k42=k1
let k43=k2
let k45=k3
let k47=k4
let k49=k5
let k52=k6
# Covariance: (sum of xy - n*xbar*ybar)/(n-1); k7 still holds n-1
let c44=c40*c42
let k44=sum(c44)
let k50=k45*k46*k47
let k50=k44-k50
let k50=k50/k7
# Correlation: scovxy/(sx*sy)
let k53=k51*k52
let k53=k50/k53
let k54=k53*k53
Print c40-c44
Print k40-k54
end

#var973.txt
#computes sample variance of data in C1.
let k1=sum(c1)
let c2=c1 * c1
let k2=sum(c2)
let k3=count(c1)
let k4=k1/k3 #mean
let k5=k3*k4*k4
let k5=k2-k5 #sum of squares minus n times squared mean
print k5
name k1 'sum'
name k2 'sumsq'
name k3 'count'
name k4 'smean'
name k5 'svar'
name k6 'sdev'
name k7 'DF'
name k8 'sterr'
let k7 = k3 - 1
let k5 = k5/k7
let k6 = sqrt(k5)
let k8 = k5/k3
let k8 = sqrt(k8)
describe c1
print c1-c2
print k1-k8
end

To use these execs, click on the command area of the screen, open the ‘editor’ pull-down menu and select ‘enable commands.’ Then open the ‘file’ pull-down menu, select ‘other files,’ then ‘run an exec’. Leave the number of times to execute at one and click on ‘select file.’ Locate samcov (you may have to type it in) and hit ‘open.’ This will start the execs running.

My results follow with comments. The information given at the beginning of the document was entered by hand before the execs were run. Explanations are indented.

————— 3/25/2006 1:32:15 AM ————————————————————
Welcome to Minitab, press F1 for help.
Executing from file: C:\Documents and Settings\rbove\My Documents\Minitab\251samcov.txt
Executing from file: var973.txt

Data Display
K5    3105.73

    var973 is called twice by the main program to compute the sample variances. The display above is printed before the division by $n - 1$, so it is $\sum x^2 - n\bar{x}^2$, the numerator of the sample variance of x, not the variance itself.

Descriptive Statistics: C1
Variable   N   N*   Mean    SE Mean   StDev   Minimum   Q1      Median   Q3      Maximum
C1         15  0    30.53   3.85      14.89   8.00      10.00   34.00    45.00   47.00

    var973 uses the command ‘describe C1’ to get statistics on x. The display above gives $n$, the number of missing values (N*), the mean, the standard error of the mean $s_{\bar{x}} = s_x/\sqrt{n}$, the standard deviation, and the 5-number summary.

Data Display
Row   C1    C2
  1   34   1156
  2   26    676
  3    9     81
  4   30    900
  5   47   2209
  6   10    100
  7   34   1156
  8   34   1156
  9   45   2025
 10   10    100
 11   47   2209
 12   32   1024
 13   47   2209
 14    8     64
 15   45   2025

    The display above consists of the $x$ and $x^2$ columns used in the variance computation.

Data Display
sum      458.000
sumsq    17090.0
count    15.0000
smean    30.5333
svar     221.838
sdev     14.8942
DF       14.0000
sterr    3.84567

    The display above consists of $\sum x$, $\sum x^2$, $n$, $\bar{x}$, $s_x^2$, $s_x$, $n - 1$ and $s_{\bar{x}} = s_x/\sqrt{n}$.

Executing from file: var973.txt

Data Display
svar    11465.6

    The display above is $\sum y^2 - n\bar{y}^2$, the numerator of the sample variance of y. It is labeled svar only because K5 kept its name from the first pass.

Descriptive Statistics: C1
Variable   N   N*   Mean    SE Mean   StDev   Minimum   Q1      Median   Q3      Maximum
C1         15  0    52.40   7.39      28.62   6.00      28.00   48.00    89.00   95.00

    var973 uses the command ‘describe C1’ to get statistics on y. The display above gives $n$, the number of missing values (N*), the mean, the standard error of the mean $s_{\bar{y}} = s_y/\sqrt{n}$, the standard deviation, and the 5-number summary.

Data Display
Row   C1    C2
  1    6     36
  2   57   3249
  3   89   7921
  4   60   3600
  5   95   9025
  6   42   1764
  7   31    961
  8   28    784
  9   90   8100
 10   25    625
 11   45   2025
 12   23    529
 13   52   2704
 14   95   9025
 15   48   2304

    The display above consists of the $y$ and $y^2$ columns used in the variance computation.
Data Display
sum      786.000
sumsq    52652.0
count    15.0000
smean    52.4000
svar     818.971
sdev     28.6177
DF       14.0000
sterr    7.38905

    The display above consists of $\sum y$, $\sum y^2$, $n$, $\bar{y}$, $s_y^2$, $s_y$, $n - 1$ and $s_{\bar{y}} = s_y/\sqrt{n}$.

Data Display
Row    x    xsq    y    ysq     xy
  1   34   1156    6     36    204
  2   26    676   57   3249   1482
  3    9     81   89   7921    801
  4   30    900   60   3600   1800
  5   47   2209   95   9025   4465
  6   10    100   42   1764    420
  7   34   1156   31    961   1054
  8   34   1156   28    784    952
  9   45   2025   90   8100   4050
 10   10    100   25    625    250
 11   47   2209   45   2025   2115
 12   32   1024   23    529    736
 13   47   2209   52   2704   2444
 14    8     64   95   9025    760
 15   45   2025   48   2304   2160

    The display above consists of the $x$, $x^2$, $y$, $y^2$ and $xy$ columns.

Data Display
sumx      458.000
sumx2     17090.0
sumy      786.000
sumy2     52652.0
sumxy     23693.0
n         15.0000
xbar      30.5333
ybar      52.4000
svarx     221.838
svary     818.971
scovxy    -21.8714
sx        14.8942
sy        28.6177
rxy       -0.0513127
rxy2      0.00263299

    The final display above consists of $\sum x$, $\sum x^2$, $\sum y$, $\sum y^2$, $\sum xy$, $n$, $\bar{x}$, $\bar{y}$, $s_x^2$, $s_y^2$, $s_{xy}$, $s_x$, $s_y$, $r_{xy}$ and $r_{xy}^2$.

We now have all the information we need to fake the following hand computation.

obs      x     y    x^2     y^2     xy
  1     34     6   1156      36    204
  2     26    57    676    3249   1482
  3      9    89     81    7921    801
  4     30    60    900    3600   1800
  5     47    95   2209    9025   4465
  6     10    42    100    1764    420
  7     34    31   1156     961   1054
  8     34    28   1156     784    952
  9     45    90   2025    8100   4050
 10     10    25    100     625    250
 11     47    45   2209    2025   2115
 12     32    23   1369     529    736
 13     47    52   2209    2704   2444
 14      8    95     64    9025    760
 15     45    48   2025    2304   2160
Total  463   786  17435   52652  23808

Row 12 is inconsistent as printed: $x^2 = 1369$ and the totals $\sum x = 463$, $\sum x^2 = 17435$ and $\sum xy = 23808$ correspond to $x = 37$ (with $xy = 851$), not to the $x = 32$ and $xy = 736$ shown. The Minitab run above used $x = 32$, which is why its results ($s_{xy} = -21.8714$, $r_{xy} = -0.0513$) differ from the hand results below.

$\sum x = 463$, $\sum x^2 = 17435$, $\sum y = 786$, $\sum y^2 = 52652$ and $\sum xy = 23808$.

$\bar{x} = \frac{\sum x}{n} = \frac{463}{15} = 30.867$

$\bar{y} = \frac{\sum y}{n} = \frac{786}{15} = 52.400$

$s_x^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{17435 - 15(30.867)^2}{14} = 224.5524$

$s_y^2 = \frac{\sum y^2 - n\bar{y}^2}{n - 1} = \frac{52652 - 15(52.400)^2}{14} = 818.9714$

$s_{xy} = \frac{\sum xy - n\bar{x}\bar{y}}{n - 1} = \frac{23808 - 15(30.867)(52.400)}{14} = \frac{-453.2}{14} = -32.3714$

$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-32.3714}{\sqrt{224.5524(818.9714)}} = -0.07548$
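The Minitab run and the hand computation above report different totals (for example $\sum x = 458$ against $\sum x = 463$) and different correlations. As a cross-check, here is a minimal Python sketch of the same shortcut formulas the execs use. It is not part of the Minitab execs; the names `sample_cov_corr` and `x_hand` are hypothetical, and the value 37 for observation 12 in `x_hand` is an inference from the printed $x^2 = 1369$ and the hand totals, not something the handout states.

```python
from math import sqrt

def sample_cov_corr(x, y):
    """Sample variances, covariance and correlation via the shortcut formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    xbar, ybar = sum_x / n, sum_y / n
    # (sum of squares - n * mean^2) / (n - 1), as in var973
    var_x = (sum_x2 - n * xbar * xbar) / (n - 1)
    var_y = (sum_y2 - n * ybar * ybar) / (n - 1)
    # (sum of cross products - n * xbar * ybar) / (n - 1), as in samcov
    cov_xy = (sum_xy - n * xbar * ybar) / (n - 1)
    r_xy = cov_xy / sqrt(var_x * var_y)
    return {"sum_x": sum_x, "sum_x2": sum_x2, "sum_y": sum_y,
            "sum_y2": sum_y2, "sum_xy": sum_xy, "var_x": var_x,
            "var_y": var_y, "cov_xy": cov_xy, "r_xy": r_xy}

# Data as entered in C40 and C42 for the Minitab run.
x = [34, 26, 9, 30, 47, 10, 34, 34, 45, 10, 47, 32, 47, 8, 45]
y = [6, 57, 89, 60, 95, 42, 31, 28, 90, 25, 45, 23, 52, 95, 48]

# Assumption: the hand table used x = 37 for observation 12 (37^2 = 1369).
x_hand = list(x)
x_hand[11] = 37

minitab = sample_cov_corr(x, y)
hand = sample_cov_corr(x_hand, y)

for label, res in (("Minitab data", minitab), ("hand table", hand)):
    print(label, res["sum_x"], res["sum_x2"], res["sum_xy"],
          round(res["cov_xy"], 4), round(res["r_xy"], 5))
```

With the data as entered, the sketch should agree with Minitab's sumx, sumx2, sumxy, scovxy and rxy; with observation 12 changed to 37 it should agree with the hand totals and with the hand values of $s_x^2$, $s_{xy}$ and $r_{xy}$.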