List of Formulas

e = 2.7183

pth Percentiles
• Sort the data in ascending order.
• Find the index point: i = (p/100) · n, where n is the number of observations.
• If i is not a whole number, round up to the next whole number; the data value in that position is the pth percentile.
• If i is a whole number, the average of the values in the ith and (i+1)th positions is the pth percentile.

Coefficient of Variation
• s: sample standard deviation
• x̄: sample mean
• Coefficient of variation: CV = s / x̄

Skewness
• Skewness for sample data (textbook version) = [n / ((n − 1)(n − 2))] · Σ((xᵢ − x̄)/s)³, where n is the sample size
• Pearson's second skewness coefficient = 3(mean − median) / standard deviation

Chebyshev's Theorem
For any k > 1, at least (1 − 1/k²) of the items in any data set will be within k standard deviations of the mean.

Empirical Rule
• 68.26% of the values of a normal random variable are within ±1 standard deviation of its mean.
• 95.44% of the values of a normal random variable are within ±2 standard deviations of its mean.
• 99.72% of the values of a normal random variable are within ±3 standard deviations of its mean.

Independent Random Variables
Definition: Two random variables X and Y are independent if, for any two sets A and B of real numbers, P(X ∈ A and Y ∈ B) = P(X ∈ A) · P(Y ∈ B).
Properties: If random variables X and Y are independent,
• for any real numbers x and y, it must be true that P(X ≤ x and Y ≤ y) = P(X ≤ x) · P(Y ≤ y);
• E(XY) = E(X) · E(Y).

Median of a Random Variable
Definition: For any random variable X, a median of the distribution of X is a point m such that P(X ≤ m) ≥ 0.5 and P(X ≥ m) ≥ 0.5.
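The percentile procedure above can be sketched in Python (a minimal illustration; the function name and sample data are ours, not from the source):

```python
import math

def pth_percentile(data, p):
    """pth percentile by the sort/index rule described above."""
    xs = sorted(data)            # 1. sort in ascending order
    n = len(xs)
    i = p / 100 * n              # 2. index point i = (p/100) * n
    if i != int(i):
        # 3. i is not whole: round up; that position holds the percentile
        return xs[math.ceil(i) - 1]
    # 4. i is whole: average the ith and (i+1)th values
    i = int(i)
    return (xs[i - 1] + xs[i]) / 2

# For the data 1..10: the 50th percentile averages the 5th and 6th values,
# while the 25th percentile rounds i = 2.5 up to position 3.
median = pth_percentile(range(1, 11), 50)
q1 = pth_percentile(range(1, 11), 25)
```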
Binomial Distribution
• X = number of successes occurring in the n trials
• p = probability of success in each trial
• n: the number of trials
• Probability mass function: f(x) = C(n, x) pˣ (1 − p)ⁿ⁻ˣ, where C(n, x) = n! / (x!(n − x)!)
• Mean: E(X) = np
• Variance: Var(X) = np(1 − p)

Geometric Distribution
• X = number of trials needed to achieve the first success
• p = probability of success in each trial
• Probability mass function: f(x) = (1 − p)ˣ⁻¹ p
• Cumulative distribution function: F(x) = 1 − (1 − p)ˣ
• Mean: E(X) = 1/p
• Variance: Var(X) = (1 − p)/p²

Hypergeometric Distribution
• n = number of trials
• r = number of elements in the population labeled success
• N = number of elements in the population
• Mean: E(X) = n(r/N)
• Variance: Var(X) = n(r/N)(1 − r/N)((N − n)/(N − 1))

Poisson Distribution
• X = the number of occurrences in an interval
• λ = mean number of occurrences in an interval (the rate of the event)
• Probability mass function: f(x | λ) = e⁻ᴧ λˣ / x!, where λ is written ᴧ in the exponent
• Mean: E(X) = λ
• Variance: Var(X) = λ

Normal Distribution
µ = mean
σ = standard deviation
π = 3.1416
Probability density function: f(x) = (1 / (σ√(2π))) e^(−(x − µ)² / (2σ²))

Finite Population Correction Factor
When sampling without replacement for more than 5% of a finite population, adjust the standard error by the finite population correction factor √((N − n)/(N − 1)), where N is the population size and n is the sample size.

Estimating the Proportion
The standard error of p̄ is σ_p̄ = √(p(1 − p)/n). Because the true proportion p is not known, the estimated standard error is σ̂_p̄ = √(p̄(1 − p̄)/n).

Sampling Distribution of X̄₁ − X̄₂
• When the population standard deviations σ₁ and σ₂ are known: the test statistic follows a normal distribution with standard error
  σ_(X̄₁−X̄₂) = √(σ₁²/n₁ + σ₂²/n₂)
• When the population standard deviations are unknown and σ₁ = σ₂: the test statistic follows a t-distribution with n₁ + n₂ − 2 degrees of freedom.
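The binomial and Poisson probability mass functions above can be written directly from their formulas, and the stated means E(X) = np and E(X) = λ can be confirmed numerically (an illustrative sketch; function names and parameter values are ours):

```python
import math

def binom_pmf(x, n, p):
    # f(x) = C(n, x) * p^x * (1 - p)^(n - x)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    # f(x | lambda) = e^(-lambda) * lambda^x / x!
    return math.exp(-lam) * lam**x / math.factorial(x)

# Numerically confirm the stated means.
n, p = 10, 0.3
binom_mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))  # should equal np = 3.0
lam = 2.5
# Truncating the Poisson sum at 60 leaves a negligible tail for lambda = 2.5.
pois_mean = sum(x * poisson_pmf(x, lam) for x in range(60))     # should equal lambda = 2.5
```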
Use the pooled variance to estimate the standard error of the sampling distribution of X̄₁ − X̄₂:
  σ_(X̄₁−X̄₂) = √(S_p²(1/n₁ + 1/n₂)),
where S_p² = [(n₁ − 1)S₁² + (n₂ − 1)S₂²] / [(n₁ − 1) + (n₂ − 1)] is the pooled variance.
• When the population standard deviations are unknown and σ₁ ≠ σ₂: the test statistic follows a t-distribution with degrees of freedom
  df = (S₁²/n₁ + S₂²/n₂)² / [(S₁²/n₁)²/(n₁ − 1) + (S₂²/n₂)²/(n₂ − 1)].
  Always round down the degrees of freedom in the calculation. The estimated standard error of the sampling distribution of X̄₁ − X̄₂ is
  σ_(X̄₁−X̄₂) = √(S₁²/n₁ + S₂²/n₂)

ANOVA
k: number of groups (treatments)
n_j: sample size of group j
n_T: total sample size of all groups together
• Mean square due to treatments (MSTR) = Σⱼ₌₁ᵏ n_j(x̄_j − x̿)² / (k − 1), where x̿ is the overall (grand) mean
• Mean square error (MSE) = Σⱼ₌₁ᵏ (n_j − 1)S_j² / (n_T − k)

Hypothesis Test to Determine the Significance of the Correlation Coefficient
H₀: ρ = 0
H₁: ρ ≠ 0
Test statistic t = r / √((1 − r²)/(n − 2)) follows a t-distribution with n − 2 degrees of freedom.

Simple Linear Regression
• Least squares estimates:
  b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² = [nΣxᵢyᵢ − (Σxᵢ)(Σyᵢ)] / [nΣxᵢ² − (Σxᵢ)²]
  b₀ = ȳ − b₁x̄
• Sampling distribution of b₁: the estimated standard error is
  S_b₁ = S_e / √(Σ(xᵢ − x̄)²) = S_e / √(Σxᵢ² − n(x̄)²), where S_e = √(SSE/(n − 2))
• Confidence interval estimate for an average value of y:
  CI = ŷ ± t_(α/2) S_e √(1/n + (x_p − x̄)² / (Σxᵢ² − (Σxᵢ)²/n)),
  where x_p is the given value of the independent variable
• Prediction interval estimate for an individual value of y:
  PI = ŷ ± t_(α/2) S_e √(1 + 1/n + (x_p − x̄)² / (Σxᵢ² − (Σxᵢ)²/n))

Table A: Standard Normal Distribution
Table B: t-Distribution Critical Values
Table C: F-Distribution Critical Values
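The least-squares formulas in the Simple Linear Regression section can be sketched in Python (illustrative only; the function name and sample points are ours). Points lying exactly on a line recover its intercept and slope:

```python
def least_squares(xs, ys):
    # b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    # b0 = y_bar - b1 * x_bar
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# The points (1, 3), (2, 5), (3, 7), (4, 9) lie on y = 1 + 2x,
# so the fit recovers b0 = 1 and b1 = 2.
b0, b1 = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
```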