
List of Formulas

e = 2.7183
pth Percentiles
• Sort the data in ascending order
• Find the index point: i = (p / 100) ∗ n, where n is the number of observations
• If i is not a whole number, round up to the next whole number; the data value in that position is the pth percentile
• If i is a whole number, the average of the values in the ith and (i + 1)th positions is the pth percentile
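A minimal Python sketch of this procedure (the data values and the helper name pth_percentile are illustrative, not part of the formula sheet):

```python
import math

def pth_percentile(data, p):
    """Compute the pth percentile using the sort / index-point rule above."""
    values = sorted(data)              # sort in ascending order
    n = len(values)
    i = (p / 100) * n                  # index point
    if i != int(i):                    # not a whole number: round up
        return values[math.ceil(i) - 1]           # value in that position (1-based)
    i = int(i)
    return (values[i - 1] + values[i]) / 2        # average of ith and (i+1)th values

# Example: the 80th percentile of ten observations
print(pth_percentile([15, 20, 25, 25, 27, 28, 30, 34, 36, 50], 80))  # 35.0
```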
Coefficient of Variation
• s: sample standard deviation
• x̄: sample mean
• Coefficient of variation: CV = s / x̄
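A quick illustration in Python, assuming a small made-up sample:

```python
import statistics

sample = [10, 12, 9, 14, 11]                              # illustrative data
cv = statistics.stdev(sample) / statistics.mean(sample)   # CV = s / x̄
print(f"CV = {cv:.3f} ({cv * 100:.1f}%)")                 # ≈ 0.172 (17.2%)
```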
Skewness
• Skewness for sample data (textbook version) = [n / ((n − 1)(n − 2))] ∗ Σ((xi − x̄) / s)³, where n is the sample size
• Pearson’s Second Skewness = 3 ∗ (mean − median) / standard deviation
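A sketch of both measures in Python (the data are illustrative):

```python
import statistics

data = [2, 3, 5, 8, 9, 11, 30]                   # illustrative, right-skewed sample
n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)                       # sample standard deviation

# Textbook sample skewness: n / ((n-1)(n-2)) * sum(((xi - xbar)/s)^3)
skew = (n / ((n - 1) * (n - 2))) * sum(((x - xbar) / s) ** 3 for x in data)

# Pearson's second skewness: 3 * (mean - median) / s
pearson2 = 3 * (xbar - statistics.median(data)) / s

print(skew, pearson2)
```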
Chebyshev’s Theorem
For any k > 1, at least (1 − 1/k²) of the items in any data set will be within k standard deviations of the mean.
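For example, k = 2 gives a lower bound of 1 − 1/2² = 0.75, so at least 75% of the data lie within two standard deviations of the mean; a minimal sketch:

```python
# Chebyshev lower bound on the fraction of items within k standard deviations
def chebyshev_bound(k):
    return 1 - 1 / k ** 2          # valid for k > 1

for k in (2, 3, 4):
    print(k, chebyshev_bound(k))   # 0.75, 0.888..., 0.9375
```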
Empirical Rule
• 68.26% of the values of a normal random variable are within ±1 standard deviation of its mean.
• 95.44% of the values of a normal random variable are within ±2 standard deviations of its mean.
• 99.72% of the values of a normal random variable are within ±3 standard deviations of its mean.
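These percentages can be reproduced from the standard normal CDF; a minimal check, assuming SciPy is available:

```python
from scipy.stats import norm

for k in (1, 2, 3):
    pct = norm.cdf(k) - norm.cdf(-k)        # P(−k < Z < k) for a standard normal Z
    print(f"within ±{k} sd: {pct:.2%}")     # ≈ 68.3%, 95.4%, 99.7%
```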
Independent Random Variables
Definition: Two random variables X and Y are independent if, for any two sets A and B of real numbers,
P(X ∈ A & Y ∈ B) = P(X ∈ A)P(Y ∈ B).
Properties: If random variables X and Y are independent,
• for any real numbers x and y, it must be true that P(X ≤ x & Y ≤ y) = P(X ≤ x)P(Y ≤ y);
• E(XY) = E(X)E(Y).
Median of a Random Variable
Definition: For any random variable X, a median of the distribution of X is defined to be a point m such
that P(X ≤ m) ≥ 0.5 and P(X ≥ m) ≥ 0.5.
Binomial distribution
• X = number of successes occurring in the n trials
• p = probability of success in each trial
• n: the number of trials
• Probability mass function: f(x) = C(n, x) p^x (1 − p)^(n−x), where C(n, x) = n! / (x! (n − x)!)
• Mean: E(X) = np
• Variance: Var(X) = np(1 − p)
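A quick check with scipy.stats.binom (the parameter values are illustrative), assuming SciPy is available:

```python
from scipy.stats import binom

n, p, x = 10, 0.3, 4                 # 10 trials, success probability 0.3
print(binom.pmf(x, n, p))            # P(X = 4) = C(10, 4) * 0.3^4 * 0.7^6 ≈ 0.200
print(binom.mean(n, p))              # E(X) = np = 3.0
print(binom.var(n, p))               # Var(X) = np(1 − p) = 2.1
```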
Geometric distribution
• X = number of trials needed to obtain the first success
• p = probability of success in each trial
• Probability mass function: f(x) = (1 − p)^(x−1) p
• Cumulative distribution function: F(x) = 1 − (1 − p)^x
• Mean: E(X) = 1 / p
• Variance: Var(X) = (1 − p) / p²
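A quick check with scipy.stats.geom (illustrative values), assuming SciPy is available:

```python
from scipy.stats import geom

p, x = 0.2, 3
print(geom.pmf(x, p))                # (1 − p)^(x−1) p = 0.8² · 0.2 = 0.128
print(geom.cdf(x, p))                # 1 − (1 − p)^x = 1 − 0.8³ = 0.488
print(geom.mean(p), geom.var(p))     # 1/p = 5.0, (1 − p)/p² = 20.0
```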
Hypergeometric distribution
• X = number of successes in the sample
• n = number of trials
• r = number of elements in the population labeled success
• N = number of elements in the population
• Mean: E(X) = n ∗ (r / N)
• Variance: Var(X) = n ∗ (r / N)(1 − r / N)((N − n) / (N − 1))
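A quick check with scipy.stats.hypergeom, assuming SciPy is available; note that SciPy's argument order is (population size, successes in the population, sample size), and the numbers are illustrative:

```python
from scipy.stats import hypergeom

N, r, n = 50, 10, 5                  # population size, successes in population, sample size
rv = hypergeom(N, r, n)              # SciPy order: (N, r, n)
print(rv.pmf(2))                     # P(exactly 2 successes in the sample)
print(rv.mean())                     # n * (r/N) = 1.0
print(rv.var())                      # n * (r/N)(1 − r/N)((N − n)/(N − 1)) ≈ 0.735
```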
Poisson distribution
• X = the number of occurrences in an interval
• λ = mean number of occurrences in an interval, rate of event
• Probability mass function: f(x | λ) = λ^x e^(−λ) / x!
• Mean: E(X) = λ
• Variance: Var(X) = λ
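A quick check with scipy.stats.poisson (illustrative rate), assuming SciPy is available:

```python
from scipy.stats import poisson

lam, x = 3.2, 5                              # mean occurrences per interval, value of interest
print(poisson.pmf(x, lam))                   # e^(−λ) λ^x / x!
print(poisson.mean(lam), poisson.var(lam))   # both equal λ
```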
Normal distribution
µ = mean
σ = standard deviation
π = 3.1416
Probability density function: f(x) = (1 / (σ√(2π))) e^(−(x − µ)² / (2σ²))
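A sketch evaluating the density by hand and via SciPy (µ and σ are illustrative):

```python
import math
from scipy.stats import norm

mu, sigma, x = 100, 15, 120                   # illustrative mean, standard deviation, point
pdf_manual = (1 / (sigma * math.sqrt(2 * math.pi))) * math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
print(pdf_manual, norm.pdf(x, loc=mu, scale=sigma))   # the two values agree
```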
Finite Population Correction Factor
When sampling without replacement from more than 5% of a finite population, the standard error needs to be adjusted by the finite population correction factor √((N − n) / (N − 1)), where N is the population size and n is the sample size.
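A minimal sketch of the corrected standard error of the sample mean (all numbers are illustrative):

```python
import math

N, n, sigma = 1000, 100, 12            # population size, sample size, population std. dev.
fpc = math.sqrt((N - n) / (N - 1))     # finite population correction factor
se = (sigma / math.sqrt(n)) * fpc      # corrected standard error of x̄
print(fpc, se)                         # n/N = 10% > 5%, so the correction applies
```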
Estimating the Proportion
The standard error of p̄ is σp̄ = √(p(1 − p) / n). Because the true proportion p is not known, the estimated standard error is σ̂p̄ = √(p̄(1 − p̄) / n).
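A minimal sketch of the estimated standard error (p̄ and n are illustrative):

```python
import math

p_bar, n = 0.42, 500                          # sample proportion and sample size
se_hat = math.sqrt(p_bar * (1 - p_bar) / n)   # estimated standard error of p̄
print(se_hat)                                 # ≈ 0.0221
```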
Sampling Distribution of X̄1 − X̄2
• When the population standard deviations σ1 and σ2 are known: the test statistic follows a normal distribution with standard error
σX̄1−X̄2 = √(σ1²/n1 + σ2²/n2)
• When the population standard deviations are unknown and σ1 = σ2: the test statistic follows a t-distribution with n1 + n2 − 2 degrees of freedom. Use the pooled variance to estimate the standard error of the sampling distribution of X̄1 − X̄2:
σX̄1−X̄2 = √(Sp²(1/n1 + 1/n2)),
where Sp² = [(n1 − 1)S1² + (n2 − 1)S2²] / [(n1 − 1) + (n2 − 1)] is the pooled variance
• When the population standard deviations are unknown and σ1 ≠ σ2: the test statistic follows a t-distribution with degrees of freedom
df = (S1²/n1 + S2²/n2)² / [(S1²/n1)²/(n1 − 1) + (S2²/n2)²/(n2 − 1)].
Always round down the degrees of freedom in the calculation. The estimated standard error of the sampling distribution of X̄1 − X̄2 is
σX̄1−X̄2 = √(S1²/n1 + S2²/n2)
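A sketch computing both standard errors and degrees of freedom directly from the formulas above (the summary statistics are illustrative):

```python
import math

# Illustrative summary statistics for two independent samples
n1, xbar1, s1 = 30, 25.4, 4.0
n2, xbar2, s2 = 35, 23.1, 5.2

# Case σ1 = σ2: pooled variance, standard error, and df = n1 + n2 − 2
sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Case σ1 ≠ σ2: Welch standard error and degrees of freedom (rounded down)
se_welch = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
df_welch = math.floor((s1 ** 2 / n1 + s2 ** 2 / n2) ** 2 /
                      ((s1 ** 2 / n1) ** 2 / (n1 - 1) + (s2 ** 2 / n2) ** 2 / (n2 - 1)))

print((xbar1 - xbar2) / se_pooled, n1 + n2 - 2)   # pooled t statistic and its df
print((xbar1 - xbar2) / se_welch, df_welch)       # Welch t statistic and its df
```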
ANOVA
k: number of groups (treatments)
nj: sample size of group j
nT: total sample size of all groups together
• Mean square due to treatment (MSTR) = [Σ nj(x̄j − x̿)²] / (k − 1), where the sum runs over groups j = 1, …, k and x̿ is the overall (grand) mean
• Mean square error (MSE) = [Σ (nj − 1)Sj²] / (nT − k)
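A minimal MSTR/MSE computation from these formulas (the group data are illustrative); the ratio MSTR/MSE is the ANOVA F statistic:

```python
groups = [
    [18, 21, 20, 23],        # group 1
    [25, 27, 24, 28, 26],    # group 2
    [20, 22, 19],            # group 3
]

k = len(groups)
nT = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / nT

def mean(g):
    return sum(g) / len(g)

def sample_var(g):                   # sample variance with n − 1 in the denominator
    m = mean(g)
    return sum((x - m) ** 2 for x in g) / (len(g) - 1)

mstr = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
mse = sum((len(g) - 1) * sample_var(g) for g in groups) / (nT - k)
print(mstr, mse, mstr / mse)
```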
Hypothesis Test to Determine the Significance of the Correlation Coefficient
H0: ρ = 0
H1: ρ ≠ 0
Test statistic t = r / √((1 − r²) / (n − 2)) follows a t-distribution with n − 2 degrees of freedom
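A minimal sketch of the test, assuming SciPy for the t-distribution (r and n are illustrative):

```python
import math
from scipy.stats import t

r, n = 0.45, 27                                   # illustrative sample correlation and size
t_stat = r / math.sqrt((1 - r ** 2) / (n - 2))    # test statistic from the formula above
p_value = 2 * (1 - t.cdf(abs(t_stat), n - 2))     # two-tailed p-value for H1: ρ ≠ 0
print(t_stat, p_value)
```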
Simple Linear Regression
• Least squares estimates:
b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = [n Σ xi yi − (Σ xi)(Σ yi)] / [n Σ xi² − (Σ xi)²]
b0 = ȳ − b1 x̄
• Sampling distribution of b1: the estimated standard error is
Sb1 = Se / √(Σ(xi − x̄)²) = Se / √(Σ xi² − n(x̄)²),
where Se = √(SSE / (n − 2))
• Confidence interval estimation for an average value of y:
CI = ŷ ± tα/2 Se √(1/n + (xp − x̄)² / (Σ xi² − (Σ xi)²/n)),
where xp is the given value of the independent variable
• Prediction interval estimation for an individual value of y:
PI = ŷ ± tα/2 Se √(1 + 1/n + (xp − x̄)² / (Σ xi² − (Σ xi)²/n))
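A sketch that computes the least-squares estimates, the standard error of b1, and both intervals directly from the formulas above (the data and xp are illustrative), assuming SciPy for the t critical value:

```python
import math
from scipy.stats import t

x = [1, 2, 3, 4, 5, 6, 7, 8]                       # illustrative data
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.8, 15.2, 16.9]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxx = sum((xi - xbar) ** 2 for xi in x)            # Σ(xi − x̄)² = Σxi² − (Σxi)²/n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx                                     # slope
b0 = ybar - b1 * xbar                              # intercept

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se = math.sqrt(sse / (n - 2))                      # Se
sb1 = se / math.sqrt(sxx)                          # estimated standard error of b1

xp = 5                                             # given value of the independent variable
y_hat = b0 + b1 * xp
t_crit = t.ppf(0.975, n - 2)                       # for 95% intervals
ci_half = t_crit * se * math.sqrt(1 / n + (xp - xbar) ** 2 / sxx)
pi_half = t_crit * se * math.sqrt(1 + 1 / n + (xp - xbar) ** 2 / sxx)
print(b1, b0, sb1)
print((y_hat - ci_half, y_hat + ci_half))          # CI for the average value of y at xp
print((y_hat - pi_half, y_hat + pi_half))          # PI for an individual value of y at xp
```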
Table A: Standard Normal Distribution
Table B: t-Distribution Critical Values
Table C: F -Distribution Critical Values