# Introductory Biostatistics Equation Sheet ```Introductory Biostatistics Equation Sheet
Sample Mean
𝑥̅ =
∑𝑥
𝑛
𝜇=
∑𝑥
𝑁
Population Mean
Range
Range = xL-xS
where xL and xS are the largest and smallest scores in the data set respectively.
Variance Parameter
σ2
∑(x-μ)2
=
N
Variance Statistic
s
2
 x  x 

2
n 1
Computational Form
s2 =
( ∑ x)2
n
𝑛−1
∑ x2 -
Standard Deviation
s
 (x  x)
2
n 1
( ∑ x)2
2
√∑ x −
n
s=
n−1
Convert x values to Z scores
Z
x

Standard Error for sample mean
σx =
σ
√n
where σ is the population standard deviation and n is the sample size.
Sampling Distribution Z Score
Z
x μ
σ n
where 𝑋̅ is the sample mean, &micro; is the population mean, σ is the population standard deviation,
and n is the sample size.
Binomial Probability
P(y successes) 
n!
 y (1   ) n  y
y!(n  y)!
Where n is the number of times process is replicated, p is the P(success), y is the number of successes of
interest, and ‘!’ means factorial.
Normal Curve Approximation for Binomial Distribution
𝑝̂ − 𝜋
𝑍=
√𝜋(1 − 𝜋)
𝑛
where 𝑝̂ is the sample proportion of successes, p is the population proportion and n is the
sample size.
One Sample Z test
Z=
̅ − 𝜇0
𝑋
𝜎/√𝑛
where 𝑥̅ is the sample mean, 𝜇0 is the hypothesized mean of the population, 𝜎 is the population
standard deviation, and n is the sample size.
One Sample t test
t=
̅-μ0
X
df=n-1
s/√n
where 𝑥̅ is the sample mean, μ0 is the hypothesized mean of the population, 𝑠 is the sample standard
deviation, and n is the sample size
One Sample Z test for Proportions (Approximate Test)
p̂- π0
𝑧=
√π0(1-π0)
n
where 𝑝̂ is the sample proportion of successes, π0 is the hypothesized population proportion,
and n is the sample size.
Two-sided Confidence Interval for a mean when sigma is known
𝑥̅ &plusmn; 𝑍
𝜎
√𝑛
where 𝑥̅ is the sample mean, Z is the appropriate Z value for the specified interval, 𝜎 is the population
standard deviation, and n is the sample size.
Two-sided Confidence Interval for a mean when sigma is not known
𝑥̅ &plusmn; 𝑡
𝑠
√𝑛
where 𝑥̅ is the sample mean, t is the appropriate value from the t table with n − 1 degrees of
freedom, 𝑠 is the sample standard deviation, and n is the sample size
Two-sided Confidence Interval for a proportion (Approximate Method)
p̂  Z
p̂(q̂)
n
Where 𝑝̂ is the proportion of successes in the sample, 𝑞̂ is the proportion of failures in the sample, n is
the sample size and Z is the appropriate Z value for the specified interval.
Paired t-Test
𝑡=
d̅ - μd0
Df=n-1
sd/√n
where 𝑑̅ is the sample mean of the difference scores, μd0 is the hypothesized value of μd (usually but not
always zero), sd the sample standard deviation of the difference scores, and n the number of paired
observations (or difference scores).
Confidence Interval for Paired Mean Difference
𝑑̅ &plusmn; 𝑡
sd
√n
where 𝑑̅ is the sample mean of the difference scores, sd is the sample standard deviation of the
difference scores and t is the appropriate value from the t table with n − 1 degrees of freedom.
McNemar’s Test (Approximate Method)
𝑍=
𝑝̂ − 0.5
0.5/√𝑛
where 𝑝̂ is the proportion of pairs in which the designated treatment has the advantage and n
is the number of pairs utilized in the analysis.
Independent Samples t-Test
t=
x̅1 - x̅2
1 1
√𝑆p2( + )
Df = n1 + n2 – 2
n1 n2
where 𝑥̅ 1 and 𝑥̅ 2 represent the means of samples one and two, n1 and n2 represent the number
of observations in each of the two samples, and Sp2 is an estimate of the population variance
based on an averaging or pooling of the information in the two samples.
Calculation of sp2
Sp2
(n1 -1)s12+(n2 -1)s22
=
n1+n2 -2
Where s12 and s22 represent the variances of the first and second samples respectively and , n1
and n2 represent the number of observations in each of the two samples.
Alternate Calculation of sp2
(∑ x12 Sp2 =
( ∑ x1)2
( ∑ x2)2
)
+
(∑
x
22 n1
n2 )
n1+n2 -2
The Confidence Interval for the difference between means
(x̅1 - x̅2) &plusmn; t√sp2(
1
1
+ )
n1 n2
Where t is the appropriate t value with n1 + n2 − 2 degrees of freedom, 𝑥̅ 1 and 𝑥̅ 2 represent the
means of samples one and two, n1 and n2 represent the number of observations in each of the
two samples, and Sp2 is an estimate of the population variance based on an averaging or
pooling of the information in the two samples.
Independent Samples Z Test for Proportions (Approximate Test)
p̂1 - p̂2
𝑍=
√p̂(1 − p̂)( 1 + 1 )
n1 n2
Where 𝑝̂ 1 and 𝑝̂ 2 represent the proportion of successes in the first and second samples respectively, 𝑝̂
represents the overall proportion of successes, and n1 and n2 represent the two sample sizes.
Where 𝑝̂ 1 and 𝑝̂ 2 are the proportions of successes in the two samples, 𝑞̂1 and 𝑞̂2 are the proportions of
failures in the two samples (i.e., 𝑞̂1 = 1 − 𝑝̂ 1, 𝑞̂2 = 1 − 𝑝̂ 2), n1 and n2, are the respective sample sizes and Z
is the appropriate Z value for the specified interval.
Chi Square Test
χ2 = ∑
(O-E)2
df=(j - 1)(k − 1)
E
where O and E are referred respectively to the observed and expected frequencies and j and k are the
number of rows and columns in the table.
Expected Frequency
𝐸=
(𝑁𝑅)(𝑁𝐶)
𝑁
where NR is the row total for the cell whose expected frequency is being calculated and NC is the column
total for the same cell.
Chi Square Goodness of Fit Test
χ2 = ∑
(O-E)2
E
df=(k − 1)
where O and E are referred respectively to the observed and expected frequencies and k is the number
categories.
Expected Frequency
E = N*πk0
where N is the total number in sample and πk0 is the hypothesized proportion for each category.
Analysis of Variance (ANOVA)
F statistic
F=
𝑀𝑆𝐵
𝑀𝑆𝑊
Mean Square Within (MSW) or Mean Square Error (MSE)
MSW =
𝑆𝑆𝑊
𝑁−𝑘
Where SSW is the sum of squares within groups, N is the total number of observations, and k is the
number of groups. The quantity N − k is termed the denominator degrees of freedom.
Sums of Squares Within (SSW) or Sums of Squares Error (SSE)
SSW = SS1 + SS2 + &middot; &middot; &middot; + SSk, where k is the number of groups
The sum of squares for a given group can be calculated by
SSk = ∑(x-x̅k)2
or equivalently
𝑆𝑆𝑊 = ∑ x2 −
( ∑ x)2
n
Mean Square Between (MSB)
MSB =
𝑆𝑆𝐵
𝑘−1
where SSB is the sum of squares between and k is the number of groups. The quantity k − 1 is termed
the numerator degrees of freedom.
The Sum of Squares Between (SSB)
̅k-X
̅ )2
SSB = ∑ nk(X
Or equivalently
𝑆𝑆𝐵 =
1
( ∑ni=1
xi1)2
n1
+
2
( ∑ni=1
xi2)2
n2
+
3
( ∑ni=1
xi3)2
n3
-
( ∑ x)2
N
If you are only given group means, you can find the overall mean to use in the SSB calculation using the
following equation
𝑋̅ =
X̅1(n1)+X̅2(n2)+…+X̅k(nk)
𝑁
Pearson Correlation Coefficient
The conceptual equation
𝑟=
∑(𝑥 − 𝑥̅ )(𝑦 − 𝑦̅)
√[∑(𝑥 − 𝑥̅ )2 ][∑(𝑦 − 𝑦̅)2 ]
where r is the sample correlation coefficient, x and y are the two variables to be correlated and n is the
number of paired observations.
The computational equation
∑ 𝑥𝑦 −
𝑟=
(∑ 𝑥)(∑ 𝑦)
𝑛
2
2
(∑ 𝑥)
(∑ 𝑦)
√[∑ 𝑥2 −
∑ 𝑦2 −
]
[
𝑛
𝑛 ]
Hypothesis Test of H0 : ρ = 0
𝑡=
r
√1-r
n-2
2
where r is the Pearson correlation coefficient and n is the number of pairs of observations. The degrees
of freedom for the test critical value is n − 2.
Linear Regression
Simple Regression Prediction Equation
̂ = a + bX
Y
̂ is the predicted value of Y, a is the intercept, b is the slope, and X is the independent variable.
Where Y
Calculating Residual
̂
Residual = Y - Y
̂ is the predicted value of Y.
Where Y is the observed value of Y and Y
Multiple Regression Prediction Equation
̂ = a+b1X1+b2X2 + …+bkXk
Y
̂ is the predicted value of Y, a is the intercept, bk is the slope for the kth variable, and Xk is the
Where Y
kth independent variable.
```