Introductory Biostatistics Equation Sheet

Sample Mean
$\bar{x} = \frac{\sum x}{n}$

Population Mean
$\mu = \frac{\sum x}{N}$

Range
$\text{Range} = x_L - x_S$
where $x_L$ and $x_S$ are the largest and smallest scores in the data set, respectively.

Variance Parameter
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

Variance Statistic
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Computational Form
$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$

Standard Deviation
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$

Convert x values to Z scores
$Z = \frac{x - \mu}{\sigma}$

Standard Error for sample mean
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
where $\sigma$ is the population standard deviation and n is the sample size.

Sampling Distribution Z Score
$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
where $\bar{x}$ is the sample mean, $\mu$ is the population mean, $\sigma$ is the population standard deviation, and n is the sample size.

Binomial Probability
$P(y \text{ successes}) = \frac{n!}{y!\,(n - y)!}\,\pi^y (1 - \pi)^{n - y}$
where n is the number of times the process is replicated, $\pi$ is P(success), y is the number of successes of interest, and "!" means factorial.

Normal Curve Approximation for Binomial Distribution
$Z = \frac{\hat{p} - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$
where $\hat{p}$ is the sample proportion of successes, $\pi$ is the population proportion, and n is the sample size.

One Sample Z test
$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
where $\bar{x}$ is the sample mean, $\mu_0$ is the hypothesized mean of the population, $\sigma$ is the population standard deviation, and n is the sample size.

One Sample t test
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \quad df = n - 1$
where $\bar{x}$ is the sample mean, $\mu_0$ is the hypothesized mean of the population, $s$ is the sample standard deviation, and n is the sample size.

One Sample Z test for Proportions (Approximate Test)
$z = \frac{\hat{p} - \pi_0}{\sqrt{\frac{\pi_0(1 - \pi_0)}{n}}}$
where $\hat{p}$ is the sample proportion of successes, $\pi_0$ is the hypothesized population proportion, and n is the sample size.

Two-sided Confidence Interval for a mean when sigma is known
$\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$
where $\bar{x}$ is the sample mean, Z is the appropriate Z value for the specified interval, $\sigma$ is the population standard deviation, and n is the sample size.
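The definitional and computational forms of the sample variance above must give identical results, and the one-sample t statistic follows directly from them. A minimal Python sketch (function names and data values are illustrative, not part of the sheet):

```python
import math

def sample_variance(xs):
    """Definitional form: s^2 = sum((x - xbar)^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_computational(xs):
    """Computational form: s^2 = (sum(x^2) - (sum x)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

def one_sample_t(xs, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)), with df = n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sample_variance(xs))
    return (xbar - mu0) / (s / math.sqrt(n))

data = [4.0, 5.0, 6.0, 7.0, 8.0]  # illustrative sample, mean 6, s^2 = 2.5
print(sample_variance(data))                  # definitional form
print(sample_variance_computational(data))    # same value, raw-sums form
print(one_sample_t(data, 5.0))                # t for H0: mu = 5
```

The computational form exists only to avoid computing deviations twice by hand; numerically the two agree, so either may be used.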
Two-sided Confidence Interval for a mean when sigma is not known
$\bar{x} \pm t \frac{s}{\sqrt{n}}$
where $\bar{x}$ is the sample mean, t is the appropriate value from the t table with n − 1 degrees of freedom, $s$ is the sample standard deviation, and n is the sample size.

Two-sided Confidence Interval for a proportion (Approximate Method)
$\hat{p} \pm Z \sqrt{\frac{\hat{p}\hat{q}}{n}}$
where $\hat{p}$ is the proportion of successes in the sample, $\hat{q}$ is the proportion of failures in the sample, n is the sample size, and Z is the appropriate Z value for the specified interval.

Paired t-Test
$t = \frac{\bar{d} - \mu_{d_0}}{s_d / \sqrt{n}}, \quad df = n - 1$
where $\bar{d}$ is the sample mean of the difference scores, $\mu_{d_0}$ is the hypothesized value of $\mu_d$ (usually but not always zero), $s_d$ is the sample standard deviation of the difference scores, and n is the number of paired observations (or difference scores).

Confidence Interval for Paired Mean Difference
$\bar{d} \pm t \frac{s_d}{\sqrt{n}}$
where $\bar{d}$ is the sample mean of the difference scores, $s_d$ is the sample standard deviation of the difference scores, and t is the appropriate value from the t table with n − 1 degrees of freedom.

McNemar's Test (Approximate Method)
$Z = \frac{\hat{p} - 0.5}{0.5 / \sqrt{n}}$
where $\hat{p}$ is the proportion of pairs in which the designated treatment has the advantage and n is the number of pairs utilized in the analysis.

Independent Samples t-Test
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad df = n_1 + n_2 - 2$
where $\bar{x}_1$ and $\bar{x}_2$ represent the means of samples one and two, $n_1$ and $n_2$ represent the number of observations in each of the two samples, and $s_p^2$ is an estimate of the population variance based on an averaging or pooling of the information in the two samples.

Calculation of $s_p^2$
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
where $s_1^2$ and $s_2^2$ represent the variances of the first and second samples, respectively, and $n_1$ and $n_2$ represent the number of observations in each of the two samples.
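The pooled-variance calculation and the independent samples t statistic can be sketched together in Python (function names and the sample data are illustrative):

```python
import math

def sample_variance(xs):
    """s^2 = sum((x - xbar)^2) / (n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pooled_variance(x1, x2):
    """sp^2 = ((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2)."""
    n1, n2 = len(x1), len(x2)
    return ((n1 - 1) * sample_variance(x1)
            + (n2 - 1) * sample_variance(x2)) / (n1 + n2 - 2)

def independent_t(x1, x2):
    """t = (xbar1 - xbar2) / sqrt(sp^2 (1/n1 + 1/n2)), df = n1 + n2 - 2."""
    n1, n2 = len(x1), len(x2)
    se = math.sqrt(pooled_variance(x1, x2) * (1 / n1 + 1 / n2))
    return (sum(x1) / n1 - sum(x2) / n2) / se

g1 = [1.0, 2.0, 3.0, 4.0, 5.0]  # mean 3, s^2 = 2.5
g2 = [2.0, 4.0, 6.0]            # mean 4, s^2 = 4.0
print(pooled_variance(g1, g2))  # weighted average of the two variances
print(independent_t(g1, g2))
```

Note that the pooled variance is a weighted average of the two sample variances, so it always falls between them.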
Alternate Calculation of $s_p^2$
$s_p^2 = \frac{\left(\sum x_1^2 - \frac{(\sum x_1)^2}{n_1}\right) + \left(\sum x_2^2 - \frac{(\sum x_2)^2}{n_2}\right)}{n_1 + n_2 - 2}$

The Confidence Interval for the difference between means
$(\bar{x}_1 - \bar{x}_2) \pm t \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
where t is the appropriate t value with $n_1 + n_2 - 2$ degrees of freedom, $\bar{x}_1$ and $\bar{x}_2$ represent the means of samples one and two, $n_1$ and $n_2$ represent the number of observations in each of the two samples, and $s_p^2$ is an estimate of the population variance based on an averaging or pooling of the information in the two samples.

Independent Samples Z Test for Proportions (Approximate Test)
$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
where $\hat{p}_1$ and $\hat{p}_2$ represent the proportion of successes in the first and second samples, respectively, $\bar{p}$ represents the overall proportion of successes, and $n_1$ and $n_2$ represent the two sample sizes.

Confidence Interval for the Difference Between Proportions (Approximate Method)
$(\hat{p}_1 - \hat{p}_2) \pm Z \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$
where $\hat{p}_1$ and $\hat{p}_2$ are the proportions of successes in the two samples, $\hat{q}_1$ and $\hat{q}_2$ are the proportions of failures in the two samples (i.e., $\hat{q}_1 = 1 - \hat{p}_1$, $\hat{q}_2 = 1 - \hat{p}_2$), $n_1$ and $n_2$ are the respective sample sizes, and Z is the appropriate Z value for the specified interval.

Chi Square Test
$\chi^2 = \sum \frac{(O - E)^2}{E}, \quad df = (j - 1)(k - 1)$
where O and E refer respectively to the observed and expected frequencies, and j and k are the number of rows and columns in the table.

Expected Frequency
$E = \frac{(N_R)(N_C)}{N}$
where $N_R$ is the row total for the cell whose expected frequency is being calculated and $N_C$ is the column total for the same cell.

Chi Square Goodness of Fit Test
$\chi^2 = \sum \frac{(O - E)^2}{E}, \quad df = k - 1$
where O and E refer respectively to the observed and expected frequencies, and k is the number of categories.

Expected Frequency
$E = N \pi_{k_0}$
where N is the total number in the sample and $\pi_{k_0}$ is the hypothesized proportion for each category.

Analysis of Variance (ANOVA)
F statistic
$F = \frac{MSB}{MSW}$

Mean Square Within (MSW) or Mean Square Error (MSE)
$MSW = \frac{SSW}{N - k}$
where SSW is the sum of squares within groups, N is the total number of observations, and k is the number of groups.
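The chi-square test and its expected-frequency rule can be sketched in Python for a j × k contingency table (the function name and the example table are illustrative):

```python
def chi_square(table):
    """Pearson chi-square for a j x k contingency table.

    E = (row total)(column total) / N for each cell,
    chi2 = sum((O - E)^2 / E), df = (j - 1)(k - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Illustrative 2 x 2 table: every expected count works out to 15.
observed = [[10, 20],
            [20, 10]]
chi2, df = chi_square(observed)
print(chi2, df)
```

Each expected count comes straight from $E = N_R N_C / N$, so the expected table always reproduces the observed row and column totals.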
The quantity N − k is termed the denominator degrees of freedom.

Sums of Squares Within (SSW) or Sums of Squares Error (SSE)
$SSW = SS_1 + SS_2 + \cdots + SS_k$
where k is the number of groups. The sum of squares for a given group can be calculated by
$SS_k = \sum (x - \bar{x}_k)^2$
or equivalently
$SS_k = \sum x^2 - \frac{(\sum x)^2}{n}$

Mean Square Between (MSB)
$MSB = \frac{SSB}{k - 1}$
where SSB is the sum of squares between and k is the number of groups. The quantity k − 1 is termed the numerator degrees of freedom.

The Sum of Squares Between (SSB)
$SSB = \sum n_k (\bar{X}_k - \bar{X})^2$
or equivalently (shown here for three groups)
$SSB = \frac{\left(\sum_{i=1}^{n_1} x_{i1}\right)^2}{n_1} + \frac{\left(\sum_{i=1}^{n_2} x_{i2}\right)^2}{n_2} + \frac{\left(\sum_{i=1}^{n_3} x_{i3}\right)^2}{n_3} - \frac{(\sum x)^2}{N}$

If you are given only the group means, you can find the overall mean to use in the SSB calculation with
$\bar{X} = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2 + \cdots + \bar{X}_k n_k}{N}$

Pearson Correlation Coefficient
The conceptual equation
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$
where r is the sample correlation coefficient, x and y are the two variables to be correlated, and n is the number of paired observations.

The computational equation
$r = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sqrt{\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]\left[\sum y^2 - \frac{(\sum y)^2}{n}\right]}}$

Hypothesis Test of $H_0: \rho = 0$
$t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$
where r is the Pearson correlation coefficient and n is the number of pairs of observations. The degrees of freedom for the test critical value is n − 2.

Linear Regression
Simple Regression Prediction Equation
$\hat{Y} = a + bX$
where $\hat{Y}$ is the predicted value of Y, a is the intercept, b is the slope, and X is the independent variable.

Calculating Residual
$\text{Residual} = Y - \hat{Y}$
where Y is the observed value of Y and $\hat{Y}$ is the predicted value of Y.

Multiple Regression Prediction Equation
$\hat{Y} = a + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$
where $\hat{Y}$ is the predicted value of Y, a is the intercept, $b_k$ is the slope for the kth variable, and $X_k$ is the kth independent variable.
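The ANOVA pieces above (SSW, SSB, MSW, MSB, F) fit together in a few lines of Python; this sketch uses the definitional forms of the sums of squares, with illustrative names and data:

```python
def one_way_anova_f(groups):
    """F = MSB / MSW for a list of k groups of observations.

    SSW = sum over groups of sum((x - group mean)^2)
    SSB = sum over groups of n_k * (group mean - grand mean)^2
    MSW = SSW / (N - k), MSB = SSB / (k - 1)
    """
    total_n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / total_n
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    msw = ssw / (total_n - k)
    msb = ssb / (k - 1)
    return msb / msw

# Illustrative groups with means 2, 3, and 6.
samples = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_anova_f(samples))
```

A large F means the variability between group means (MSB) is large relative to the variability within groups (MSW).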
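The conceptual and computational forms of the Pearson correlation coefficient must agree numerically, which is easy to confirm in a short Python sketch (function names and data are illustrative):

```python
import math

def pearson_r(xs, ys):
    """Conceptual form: deviations from the means."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    ssx = sum((x - xbar) ** 2 for x in xs)
    ssy = sum((y - ybar) ** 2 for y in ys)
    return num / math.sqrt(ssx * ssy)

def pearson_r_computational(xs, ys):
    """Computational form: raw sums only, no deviations needed."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (sxy - sx * sy / n) / math.sqrt(
        (sxx - sx ** 2 / n) * (syy - sy ** 2 / n))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # a perfect linear relationship, so r = 1
print(pearson_r(xs, ys))
print(pearson_r_computational(xs, ys))
```

Because y here is an exact linear function of x, both forms return r = 1; less extreme data would give some value between −1 and 1.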