FORMULAS FOR THE 4TH EXAM One-sided (One-tailed) test: Lower tailed (Left-sided) Upper tailed (Right-sided) H0: population characteristics claimed constant value H0: population characteristics claimed constant value Ha: population characteristics < claimed constant value Ha: population characteristics > claimed constant value Two-sided (Two-tailed) test: H0: population characteristics = claimed constant value Ha: population characteristics claimed constant value Hypothesis testing for: A. Independent Samples I. Population characteristics: Difference between two population means, 1-2. 0 is the claimed constant. 1 and 2 _ sample means for X's and Y's, respectively. m and n are the sample sizes for X's and Y's, respectively. the population variances for X's and Y's, respectively. Assumption both popn. distributions are normal and are known 12 , 22 , 2 2 both popn. distributions are normal and at least one sample size is small where unknown , 2 1 2 2 are assumed to be different ( ) and degrees of 2 1 2 2 2 s12 s 22 m n freedom, v 2 2 s12 / m s 22 / n m 1 n 1 are 100(1-)% confidence Interval _ _ x y 0 z 12 22 _ _ x y z /2 m n 22 n x y 0 z 2 s1 s 22 m n _ _ x y 0 t 2 s1 s 22 m n s2 s2 _ _ x y z / 2 1 2 m n _ _ x y 0 , t 1 1 sp m n (m 1) s12 (n 1) s 22 2 sp mn2 1 1 _ _ x y t / 2 ;v s p m n _ _ s12 s 22 _ _ x y t / 2 ; v m n both popn. distributions are normal and at least one sample size is small where unknown 12 , 22 are assumed to be the same ( 1 = 2 ) and degrees of 2 12 and 22 Test statistics m 2 1 y are the s12 and s 22 are the sample variances for X's and Y's, respectively. 12 large sample size (m>40 and n>40) and are unknown _ are the population means for X's and Y's, respectively. x and 2 freedom, v m n 2 is used to look up the critical values. II. Population characteristics: Difference between two population proportions, p1-p2. ^ ^ p0 is the claimed constant. p 1 and p 2 are the sample proportions for X's and Y's, respectively. p1 and p 2 are the population proportions for X's and Y's, respectively. m and n are the large sample sizes for X's and Y's, respectively. The ^ estimator for p is p X Y m ^ n ^ p1 p2 . mn mn mn ^ ^ Test statistics: 100(1-)% large sample confidence Interval: p1 p 2 z / 2 z ^ ^ p p 1 2 p0 , ^ ^ 1 1 p1 p m n ^ ^ ^ ^ p1 1 p1 p 2 1 p 2 m n 12 / 22 or standard deviations, 1 / 2 . 12 and 22 are the population variances for X's and Y's, III. Population characteristics: Ratio of the two population variances, X and Y's are random sample from a normal distribution. 2 respectively. s1 and Y's, respectively. Test statistics: B. F s 22 are the sample variances for X's and Y's, respectively. m and n are the sample sizes for X's and s12 / s 22 12 s12 / s 22 s12 2 2 . 100(1-)% confidence Interval for : / 1 2 F / 2;m 1,n 1 22 F1 / 2;m 1,n 1 s 22 Dependent Samples- Paired Data Population characteristics: Difference between two population means, D =1-2. 0 is the claimed constant. Assumption: the difference distribution should be normal. _ Test statistics: t d 0 sD / n _ where D=X-Y and d and s D are the corresponding sample average and the standard deviation of D. Both X and Y must have n observations. The degrees of freedom to look at the table is v=n-1 _ 100(1-)% confidence Intervals with the same assumptions: d t / 2;n 1 sD n Decision can be made in one of the two ways in Parts I, II, and III for comparing two populations: a. Let z* or t* be the computed test statistic values. if test statistics is z if test statistics is t Lower tailed test P-value = P(z<z*) P-value = P(t<t*) Upper tailed test P-value = P(z>z*) P-value = P(t>t*) Two-tailed test P-value = 2P(z>|z*|) P-value = 2P(t > |t*| ) In each case, you can reject H0 if P-value and fail to reject H0 (accept H0) if P-value > Rejection region for level test: test statistics is z test statistics is t Lower tailed test z -z t -t;v Upper tailed test z z t t;v Two- tailed test z -z/2 or z z/2 t -t/2;v or t t/2;v Do not forget that, F1-/2;m-1,n-1 = 1 / F/2;n-1,m-1 for the F-table b. Model: X ij i ij treatment). X ij : observations , or X ij i ij i : ith treatment mean, test statistics is F F F1-;m-1,n-1 F F;m-1,n-1 F F1-/2;m-1,n-1 or F F/2;m-1,n-1 Single Factor ANOVA : where i=1,...,I (number of treatments), j=1,...,J (number of observations in each i i : ith treatment effect. ij : errors which are normally distributed with mean, 0 and the constant variance, 2 . Assumptions: X ij 's are independent ( ij 's are independent). ij 's are normally distributed with mean, 0 and the constant variance, 2 . X ij 's are normally distributed with mean, i Hypothesis: Or H 0 : 1 ... I 1 I H0 :i 0 for all i versus versus and the constant variance, 2. H a : at least one i j for i j where i .and j 's are treatment means. H a : i 0 for at least one i where i is the ith treatment effect. I Analysis of Variance Table: n Ji and j=1,…,Ji . n is used when different number of samples obtained from each population. i 1 Source Treatments Error df SS MS F Prob > F I-1 SSTr MSTr = SStr / dftreatment MSTr / MSE P-value I(J-1) SSE MSE = SSE / dferror or n-I Total IJ-1 or SSTotal n-I where df is the degrees of freedom, SS is the sum of squares, MS is the mean square. Reject H 0 if the P-value or if the test statistics F > F;I-1,error df. If you reject the null hypothesis, you need to use multiple comparison test such as Tukey-Kramer Confidence Interval for ci i : c MSE ci2 _ i x i t / 2; I ( J 1) J i i Simple Linear Regression and Correlation PEARSON’S CORRELATION COEFFICIENT (r) :measures X and Y must be numerical variables, H 0 : XY 0 (the true correlation is zero) versus The formal test for the correlation has the test statistics the strength & direction of the linear relationship between X and Y. H a : XY 0 t r n2 1 r (the true correlation is not zero). with n-2 degrees of freedom. 2 Minitab gives you the following output for simple linear regression: parameters 0 and Predictor (y) Coef SE Coef T p-value for Ha : 0 0 p-value for H a : 1 0 Constant b0 sb0 Independent(x) b1 s b1 b1 s b1 Analysis of Variance Source DF Residual Error Total where n observations included, the 1 are constants whose "true" values are unknown and must be estimated from the data. b0 s b0 Regression Y 0 1 x e , SS P MS F 1 SSR MSR=SSR/1 n-2 n-1 SSE SST MSE=SSE/(n-2) MSR/MSE P p-value for H a : 1 0 Coefficient of Determination, R2 : Measure what percent of Y's variation is explained by the X variables via the regression model. It tells us the proportion of SST that is explained by the fitted equation. Note that SSE is the proportion of SST that is not explained by the model. SSR SSE 1 . SST SST H 0 : 1 10 R2 H a : 1 10 Only in simple linear regression, R 2 r 2 where r is the Pearson’s correlation coefficient. Test statistics: t b1 10 s b1 where 10 is the value slope is compared with. Decision making can be done by using error degrees of freedom for any other t test we have discussed before (either using the P-value or the rejection region method). The 100(1-)% confidence interval for 1 is b1 t / 2;df sb 1