Statistics 50 Exam 1 Formulas

Statistics 1 Formulas
Mean of numbers is the sum of the numbers divided by the number of numbers, or in
formula form: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Standard Deviation of numbers is given by the following formula: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
The Median is the center number of a set of numbers arranged in order (for an even number of values, it is halfway between the two center numbers).
The pth percentile (p between 0 and 100) of a set of numbers is a number with the
properties that at least p% of the numbers are less than or equal to the pth percentile and at
least (100-p)% of the numbers are greater than or equal to the pth percentile.
For n data points, one can find the pth percentile using the following steps:
1. Write down the data in order from smallest to largest.
2. Find the desired percentage of the number of values n, that is, find k = p% of n.
If k is a positive integer (that is, equals exactly 1 or 2 or ... or n), then the pth percentile is
halfway between the kth and (k+1)st smallest values.
If k is not a positive integer, change k to the positive integer that is just above it; then the pth
percentile is the kth smallest value.
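The percentile steps above can be sketched in Python as follows. The function name `percentile` and the sample data are assumptions for illustration, and the sketch only accepts p strictly between 0 and 100 so that the (k+1)st value always exists.

```python
import math

def percentile(values, p):
    """pth percentile following the sheet's steps (0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)       # write the data from smallest to largest
    n = len(ordered)
    k = p * n / 100                # k = p% of n
    if k == int(k):
        k = int(k)
        # k is a whole number: go halfway between the kth and (k+1)st
        # smallest values (lists are 0-based, hence k - 1 and k)
        return (ordered[k - 1] + ordered[k]) / 2
    k = math.ceil(k)               # otherwise move k up to the next integer
    return ordered[k - 1]          # and take the kth smallest value

data = [7, 1, 5, 3, 8, 2, 6, 4]    # illustrative data, not from the sheet
print(percentile(data, 25))        # k = 2, halfway between 2 and 3
print(percentile(data, 50))        # k = 4, halfway between 4 and 5
```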
The First Quartile (denoted by Q1) is the 25th Percentile.
The Third Quartile (denoted by Q3) is the 75th Percentile.
The Interquartile Range is Q3 - Q1
The Range is the largest number minus the smallest number
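Pearson Correlation Coefficient Formula: $r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$
where $\bar{x}$ and $\bar{y}$ are the means of the x-values and the y-values.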
                  Response R1   Response R2   Total
Explanatory E1    A             B             A+B
Explanatory E2    C             D             C+D
Total             A+C           B+D           N
Risk of R1 for E1=A/(A+B)
Risk of R1 for E2=C/(C+D)
Relative Risk of R1 for E1 compared to E2 = $\frac{A/(A+B)}{C/(C+D)} = \frac{A(C+D)}{C(A+B)}$
Odds of R1 to R2 for E1=A to B
Odds of R1 to R2 for E2=C to D
Odds Ratio of R1 to R2, for E1 compared to E2 = $\frac{A/B}{C/D} = \frac{AD}{BC}$
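The risk, relative risk, and odds ratio formulas can be tried out with a short Python sketch. The function name and the cell counts are hypothetical; a, b, c, d stand for the cells A, B, C, D of the two-way table.

```python
def two_by_two_summary(a, b, c, d):
    """Risks, relative risk, and odds ratio for a 2x2 table.

    a, b = counts of R1 and R2 for E1; c, d = counts of R1 and R2 for E2.
    """
    risk_e1 = a / (a + b)              # Risk of R1 for E1
    risk_e2 = c / (c + d)              # Risk of R1 for E2
    relative_risk = risk_e1 / risk_e2  # equals A(C+D) / (C(A+B))
    odds_ratio = (a * d) / (b * c)     # equals (A/B) / (C/D)
    return risk_e1, risk_e2, relative_risk, odds_ratio

# Hypothetical counts chosen only to illustrate the arithmetic.
risk_e1, risk_e2, rr, odds = two_by_two_summary(20, 80, 10, 90)
print("risks:", risk_e1, risk_e2)
print("relative risk:", rr)
print("odds ratio:", odds)
```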
P(A or B) = P(A) + P(B) - P(A and B) for any two events
P(Not A) = 1- P(A) or P(A) = 1- P(Not A)
P(A and B) = P(A|B) P(B) or P(A and B) = P(B|A) P(A) for any two events
P(A and B) = P(A) P(B) if and only if A and B are independent events
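The four probability rules can be checked by counting outcomes. The sketch below assumes one roll of a fair six-sided die; the events A and B and the helper names are chosen only for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (assumed example)

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

def cond_prob(event, given):
    """P(event | given), counted inside the outcomes where 'given' occurs."""
    return Fraction(len(event & given), len(given))

A = {2, 4, 6}   # the roll is even
B = {1, 2}      # the roll is at most 2

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
print(prob(A | B) == prob(A) + prob(B) - prob(A & B))
# Complement rule: P(Not A) = 1 - P(A)
print(prob(sample_space - A) == 1 - prob(A))
# Multiplication rule: P(A and B) = P(A|B) P(B)
print(prob(A & B) == cond_prob(A, B) * prob(B))
# Independence: P(A and B) = P(A) P(B) holds for this choice of A and B
print(prob(A & B) == prob(A) * prob(B))
```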
Formula for the Mean of any discrete random variable: $\mu = \sum x \, P(x)$
Formula for the Variance of a discrete random variable: $\sigma^2 = \sum (x - \mu)^2 P(x)$
Standard Deviation is the Square root of the Variance
The mean of a Binomial Random Variable is n p and the Variance is n p (1-p)
Binomial formula:
$P(X = k) = \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k}$
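A minimal Python sketch of the binomial formula follows. It also computes the mean and variance from the probabilities, assuming the usual definitions for a discrete random variable (mean = sum of x P(x), variance = sum of (x - mean)^2 P(x)), so the results can be compared with n p and n p (1-p). The values n = 10 and p = 0.3 are arbitrary.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) = n! / (k! (n-k)!) * p^k * (1-p)^(n-k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3  # arbitrary illustrative values
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the probabilities.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print("total probability:", sum(pmf))
print("mean:", mean, "compared with n p =", n * p)
print("variance:", variance, "compared with n p (1-p) =", n * p * (1 - p))
```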
Standardized score: $Z = \frac{X - \mu}{\sigma}$
Value with a known z-score: $X = Z\sigma + \mu$
Central Limit Theorem for Means:
The population of sample means is approximately normally distributed for large
n, has a mean equal to the population mean, $\mu$, and a standard deviation equal to
the population standard deviation divided by the square root of the sample size,
$\sigma/\sqrt{n}$. If $\sigma$ is unknown, it can be replaced by s, the sample standard deviation.
Central Limit Theorem for Sums:
The population of sample sums for sums of n sample values taken from a
population with mean $\mu$ and standard deviation $\sigma$ is approximately normally
distributed for large n, has mean $n\mu$, and standard deviation $\sqrt{n}\,\sigma$.
Central Limit Theorem for Proportions:
The population of sample proportions for samples of size n is approximately
normally distributed for large n, has a mean equal to the population proportion, p,
and a standard deviation equal to $\sqrt{\frac{p(1-p)}{n}}$. If p is unknown, it can be replaced
by $\hat{p}$, the sample proportion.
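The three standard deviations from the Central Limit Theorem statements can be computed with a short Python sketch. The function names and input values are assumptions for illustration only.

```python
import math

def sd_of_sample_mean(sigma, n):
    """Standard deviation of sample means: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def sd_of_sample_sum(sigma, n):
    """Standard deviation of sample sums: sqrt(n) * sigma."""
    return math.sqrt(n) * sigma

def sd_of_sample_proportion(p, n):
    """Standard deviation of sample proportions: sqrt(p (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Illustrative inputs: population sd 12 with n = 36, and p = 0.5 with n = 100.
print(sd_of_sample_mean(12, 36))
print(sd_of_sample_sum(12, 36))
print(sd_of_sample_proportion(0.5, 100))
```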
General form for a Confidence Interval: Sample Estimate ± Margin of Error
Confidence Interval for a Population Mean:
$\bar{x} \pm z \frac{\sigma}{\sqrt{n}}$ where z is the standard normal value appropriate for the confidence
level. If $\sigma$ is unknown and the sample is large, it can be replaced by s, the sample
standard deviation. If $\sigma$ is unknown and the sample is small (<30) and the
population is normally distributed, the z in the formula must be replaced by a t-value, where the t has n-1 degrees of freedom.
Confidence Interval for a Population Proportion:
$\hat{p} \pm z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ where z is the standard normal value appropriate for the
confidence level and $\hat{p}$ is the sample proportion.
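The two z-based confidence intervals can be sketched in Python as follows. The function names and inputs are illustrative assumptions; the z value is looked up with `statistics.NormalDist` from the standard library (Python 3.8 or later). The t-based interval is not covered here because the standard library has no t distribution.

```python
import math
from statistics import NormalDist

def z_for_confidence(level):
    """Standard normal value for a two-sided confidence level, e.g. 0.95."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def mean_ci(x_bar, sigma, n, level=0.95):
    """Sample mean plus or minus z * sigma / sqrt(n)."""
    margin = z_for_confidence(level) * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

def proportion_ci(p_hat, n, level=0.95):
    """Sample proportion plus or minus z * sqrt(p_hat (1 - p_hat) / n)."""
    margin = z_for_confidence(level) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Illustrative inputs only.
print(mean_ci(x_bar=50, sigma=10, n=100))
print(proportion_ci(p_hat=0.4, n=600))
```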
Standard Error of the Mean is $\frac{s}{\sqrt{n}}$ if the population standard deviation is unknown and is
$\frac{\sigma}{\sqrt{n}}$ if the population standard deviation is known.
z-statistic for testing hypotheses about population means:
z = (Sample Mean - Hypothesized Mean)/($\frac{\sigma}{\sqrt{n}}$) where $\sigma$ is known
t-statistic for testing hypotheses about population means:
t = (Sample Mean - Hypothesized Mean)/($\frac{s}{\sqrt{n}}$) where t has n-1 degrees of freedom. This t-statistic
should be used if the population from which the sample is selected is normally
distributed and the sample
p-Value: P[|Normal|>|z-statistic|] or P[|t with n-1 D.F.|>|t-statistic|]
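The test statistics and the normal p-value can be sketched in Python as follows. The function names and the numbers are illustrative assumptions. Only the normal p-value is computed, since the Python standard library has no t distribution; a t table or a statistics package would be needed for the t case.

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean, hypothesized_mean, sigma, n):
    """(Sample Mean - Hypothesized Mean) / (sigma / sqrt(n)), sigma known."""
    return (sample_mean - hypothesized_mean) / (sigma / math.sqrt(n))

def t_statistic(sample_mean, hypothesized_mean, s, n):
    """(Sample Mean - Hypothesized Mean) / (s / sqrt(n)), n - 1 D.F."""
    return (sample_mean - hypothesized_mean) / (s / math.sqrt(n))

def p_value_normal(z):
    """P[|Normal| > |z-statistic|], the two-sided p-value."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Illustrative numbers only.
z = z_statistic(sample_mean=52, hypothesized_mean=50, sigma=10, n=100)
print("z-statistic:", z)
print("p-value:", p_value_normal(z))
print("t-statistic:", t_statistic(sample_mean=52, hypothesized_mean=50, s=8, n=16))
```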
Chi-Square Statistic for Two-Way Tables: $\chi^2 = \sum \frac{(O - E)^2}{E}$
D.F. for $\chi^2$ in a Two-Way Table: (#rows-1)(#columns-1)
Expected Number in a Cell of a Two-Way Table: $E = \frac{(\text{Row Total})(\text{Column Total})}{\text{Grand Total}}$
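The chi-square statistic, its degrees of freedom, and the expected counts can be computed together. Below is a minimal Python sketch; the function name and the observed counts are hypothetical.

```python
def chi_square_two_way(observed):
    """Chi-square statistic and degrees of freedom for a two-way table.

    observed is a list of rows, each row a list of observed counts O.
    """
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # E = (Row Total)(Column Total) / Grand Total
            e = row_totals[i] * column_totals[j] / grand_total
            chi_square += (o - e) ** 2 / e

    degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_square, degrees_of_freedom

# Hypothetical observed counts, chosen only to illustrate the arithmetic.
chi_square, df = chi_square_two_way([[20, 80], [10, 90]])
print("chi-square:", chi_square)
print("degrees of freedom:", df)
```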
Linear Equation (intercept and slope) : Y=(Intercept)+(Slope)X
Linear Equation: Y=a+bX
Linear Model (generic): Y=a+bX+E where E is an error or residual term