Harold's Statistics "Cheat Sheet"
26 December 2015
Copyright © 2015 by Harold Toomey, WyzAnt Tutor

Descriptive
Each row lists the description, the population and sample forms, and what it is used for.

Data: population: parameters; sample: statistics. Used for describing and predicting.
Random Variable: population $X, Y$; sample $x, y$. The random value from the evaluated population.
Size: population $N$; sample $n$. Number of observations in the population/sample.

Measures of Center (Measures of central tendency): indicate which value is typical for the data set.

Mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ (population); $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (sample). Measure of center; includes the entire population. Average. Used when each X has the same probability. Answers "Where is the center of the data located?"
Mean with f Table: $\mu = \frac{1}{N}\sum x_i f_i$ (population); $\bar{x} = \frac{1}{n}\sum x_i f_i$ (sample), where $f_i$ is the frequency of $x_i$. Measure of center for a frequency distribution.
Median: the middle element in order of rank. $Md = x_{(n+1)/2}$ if $n$ is odd; the average of the two middle elements if $n$ is even. More useful than the mean when data are skewed.
Mode: the most frequent value in a data set. Appropriate for categorical data.
Mid-Range: $\text{Mid-Range} = \frac{\max + \min}{2}$. Not often used, easy to compute. Highly sensitive to unusual values.

Measures of Variation (Measures of dispersion): reflect the variability of the data (e.g., how different the values are from each other).

Variance: $\sigma^2 = \frac{1}{N}\sum (x - \mu)^2 = \frac{1}{N}\left(\sum_{i=1}^{N} x_i^2 - N\mu^2\right)$ (population); $s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$ (sample). Not often used directly; see standard deviation. Special case of covariance when the two variables are identical.
Covariance: $\sigma(X,Y) = \frac{1}{N}\sum (x - \mu_x)(y - \mu_y) = \frac{1}{N}\sum x_i y_i - \mu_x \mu_y$ (population); $s(X,Y) = \frac{1}{n-1}\sum (x - \bar{x})(y - \bar{y}) = \frac{1}{n-1}\left(\sum x_i y_i - n\bar{x}\bar{y}\right)$ (sample). A measure of how much two random variables change together; a measure of "linear dependence". If X and Y are independent, then their covariance is zero (0).
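The center and variation formulas above can be checked numerically. Below is a minimal Python sketch that assumes plain lists of numbers. The helper names `mid_range`, `sample_variance_shortcut`, and `covariance` are hypothetical and were written only for this illustration. The functions `mean`, `median`, `mode`, `pvariance`, and `variance` come from Python's standard `statistics` module.

```python
# Hypothetical helper names for illustration; not part of the cheat sheet.
from statistics import mean, median, mode, pvariance, variance


def mid_range(xs):
    """Mid-Range = (max + min) / 2."""
    return (max(xs) + min(xs)) / 2


def sample_variance_shortcut(xs):
    """Computational form s^2 = (sum of x_i^2 - n * xbar^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return (sum(x * x for x in xs) - n * xbar ** 2) / (n - 1)


def covariance(xs, ys, sample=True):
    """Covariance: divide by n - 1 for a sample, by N for a population."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    total = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print("mean:", mean(data), "median:", median(data), "mode:", mode(data))
print("mid-range:", mid_range(data))
# Population (divide by N) vs. sample (divide by n - 1) variance.
print("sigma^2:", pvariance(data), "s^2:", variance(data))
# The shortcut formula and Cov(X, X) both reproduce the sample variance.
print(sample_variance_shortcut(data), covariance(data, data))
```

Passing the same list twice to `covariance` returns the variance. This matches the note that variance is a special case of covariance.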
Standard Deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{\frac{\sum x^2}{N} - \mu^2}$ (population); $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} = \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n-1}}$ (sample). Measure of variation: a typical (root-mean-square) distance from the mean, in the same units as the mean. Answers "How spread out is the data?"
Pooled Standard Deviation: $\sigma_p = \sqrt{\frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2}}$ (population); $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}}$ (sample). Used for inferences about two population means.
Interquartile Range (IQR): $IQR = Q_3 - Q_1$. Less sensitive to extreme values.
Range: $\text{Range} = \max - \min$. Not often used, easy to compute. Highly sensitive to unusual values.

Measures of Relative Standing (Measures of relative position): indicate how a particular value compares to the others in the same data set.

Percentile: data divided into 100 equal parts by rank. Important in normal distributions.
Quartile: data divided into 4 equal parts by rank. Used to compute the IQR.
Z-Score / Standard Score / Normal Score: $z = \frac{x - \mu}{\sigma}$, so $x = \mu + z\sigma$ (population); $z = \frac{x - \bar{x}}{s}$, so $x = \bar{x} + zs$ (sample). The z variable measures how many standard deviations the value is away from the mean. TI-84: [2nd][VARS][2] normalcdf(-1E99, z) gives the CDF value, which is the area under the standard normal curve to the left of z.
[Figure: normal distribution PDF and CDF]

Regression and Correlation

Response Variable: $Y$ (output).
Covariate / Predictor Variable: $X$ (input).
Least-Squares Regression Line: $\hat{y} = b_0 + b_1 x$, where $b_1$ is the slope and $b_0$ is the y-intercept. The point $(\bar{x}, \bar{y})$ is always on the line.
Regression Coefficient (Slope): $b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = r\frac{s_y}{s_x}$.
Regression Intercept: $b_0 = \bar{y} - b_1\bar{x}$.
Linear Correlation Coefficient (Sample): $r = \frac{1}{n-1}\sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right) = \frac{s(X,Y)}{s_x s_y}$. Measures the strength and direction of the linear relationship between x and y.
Interpreting r:
• $r = \pm 1$: perfect correlation
• $r = +0.9$: positive linear relationship
• $r = -0.9$: negative linear relationship
• $r \approx 0$: no linear relationship
• $|r| \ge 0.8$: strong correlation
• $|r| \le 0.5$: weak correlation
Correlation DOES NOT imply causation.
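The z-score, pooled standard deviation, and least-squares formulas above can be tried out in a short Python sketch. The sketch assumes small in-memory lists. The helper names `z_score`, `pooled_sd`, and `least_squares` are hypothetical. `statistics.NormalDist().cdf` is used here as a standard-library stand-in for the TI-84 `normalcdf(-1E99, z)` keystrokes. It is not the calculator's own routine.

```python
# Hypothetical helper names for illustration; not part of the cheat sheet.
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_score(x, center, spread):
    """z = (x - mean) / standard deviation."""
    return (x - center) / spread


def pooled_sd(s1, n1, s2, n2):
    """Sample pooled SD: sqrt(((n1-1)s1^2 + (n2-1)s2^2) / ((n1-1) + (n2-1)))."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / ((n1 - 1) + (n2 - 1)))


def least_squares(xs, ys):
    """Return (b0, b1, r) for the line y-hat = b0 + b1 * x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx             # slope
    b0 = ybar - b1 * xbar      # intercept, so (xbar, ybar) lies on the line
    r = sxy / sqrt(sxx * syy)  # same value as the standardized-product formula
    return b0, b1, r


data = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_score(9, mean(data), stdev(data))
# Area to the left of z, the same quantity as TI-84 normalcdf(-1E99, z).
print("z:", z, "P(Z <= z):", NormalDist().cdf(z))
print("pooled s:", pooled_sd(2.0, 10, 3.0, 10))

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
b0, b1, r = least_squares(xs, ys)
print("y-hat = %.2f + %.2f x, r = %.4f" % (b0, b1, r))
```

The correlation is computed as $s_{xy}/\sqrt{s_{xx}s_{yy}}$ because the $n - 1$ factors in the standardized-product formula cancel. This form also makes it easy to confirm the identity $b_1 = r\,s_y/s_x$.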
Residual: $\hat{e}_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_i)$, i.e., Residual = Observed − Predicted. For the least-squares line, $\sum \hat{e}_i = \sum (y_i - \hat{y}_i) = 0$.
Standard Error of Regression Slope: $s_{b_1} = \frac{\sqrt{\frac{\sum \hat{e}_i^2}{n-2}}}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{\sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}}}{\sqrt{\sum (x_i - \bar{x})^2}}$. Measures how precisely the slope is estimated; its numerator measures how well the line fits the data.
Coefficient of Determination: $r^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$. The proportion of the variation in y that is explained by the least-squares line on x. Indicates how certain we can be in making predictions.

Proportions

Proportion: $p = \pi = \frac{x}{N}$ (population); $\hat{p} = \frac{x}{n}$ (sample). Probability of success: the proportion of elements that have a particular attribute (x).
Complement: $q = 1 - p$ (population); $\hat{q} = 1 - \hat{p}$ (sample). Probability of failure: the proportion of elements in the population that do not have the specified attribute.
Variance of the Sample Proportion: $\sigma^2 = \frac{pq}{n} = \frac{p(1-p)}{n}$ (population); $s^2 = \frac{\hat{p}\hat{q}}{n-1} = \frac{\hat{p}(1-\hat{p})}{n-1}$ (sample). The sample version is an unbiased estimate of the true variance of the sample proportion.
Pooled Proportion: population NA; $\hat{p}_p = \frac{x_1 + x_2}{n_1 + n_2} = \frac{\hat{p}_1 n_1 + \hat{p}_2 n_2}{n_1 + n_2}$, where $x = \hat{p}n$ is the frequency, or number of members in the sample that have the specified attribute.

Discrete Random Variables

Random Variable: $X$. Derived from a probability experiment with different probabilities for each X. Used in discrete or finite PDFs.
Expected Value of X: $E(X) = \mu_x = \sum x_i p_i = \sum x P(x)$. E(X) is the same as the mean. X takes some countable number of specific values (discrete).
Variance of X: $Var(X) = \sigma_x^2 = \sum p_i (x_i - \mu_x)^2 = \sum (x - E(X))^2 P(x) = \sum x^2 P(x) - E(X)^2 = E(X^2) - E(X)^2$. Calculate variances with proportions or expected values.
Standard Deviation of X: $SD(X) = \sigma_x = \sqrt{Var(X)} = \sqrt{\sigma_x^2}$. Calculate standard deviations with proportions.
Sum of Probabilities: $\sum_{i=1}^{n} p_i = 1$. If every outcome has the same probability, then $p_i = \frac{1}{n}$.
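The residual, slope standard error, $r^2$, pooled proportion, and discrete random variable formulas above can be checked together in one short Python sketch. It assumes small lists of data. The names `fit_line`, `regression_diagnostics`, `expected_value`, `variance_rv`, and `pooled_proportion` are hypothetical helpers introduced only for this illustration.

```python
# Hypothetical helper names for illustration; not part of the cheat sheet.
from math import sqrt


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def regression_diagnostics(xs, ys):
    """Residuals, standard error of the slope, and r^2."""
    n = len(xs)
    b0, b1 = fit_line(xs, ys)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # observed - predicted
    sse = sum(e ** 2 for e in residuals)
    se_b1 = sqrt(sse / (n - 2)) / sqrt(sum((x - xbar) ** 2 for x in xs))
    r_squared = 1 - sse / sum((y - ybar) ** 2 for y in ys)
    return residuals, se_b1, r_squared


def expected_value(values, probs):
    """E(X) = sum of x * P(x)."""
    return sum(x * p for x, p in zip(values, probs))


def variance_rv(values, probs):
    """Var(X) = sum of P(x) * (x - E(X))^2."""
    mu = expected_value(values, probs)
    return sum(p * (x - mu) ** 2 for x, p in zip(values, probs))


def pooled_proportion(x1, n1, x2, n2):
    """p-hat_p = (x1 + x2) / (n1 + n2)."""
    return (x1 + x2) / (n1 + n2)


residuals, se_b1, r2 = regression_diagnostics([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print("sum of residuals:", sum(residuals), "s_b1:", se_b1, "r^2:", r2)

die = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6  # equal probabilities, p_i = 1/n
print("E(X):", expected_value(die, probs), "Var(X):", variance_rv(die, probs))
print("pooled p-hat:", pooled_proportion(45, 100, 30, 50))
```

For simple linear regression, `r_squared` computed this way equals the square of the correlation coefficient $r$. The die example can also be checked with the shortcut $E(X^2) - E(X)^2$.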
Statistical Inference

Sampling Distribution: the probability distribution of a statistic; a "statistic of a statistic".
Central Limit Theorem (CLT): many $\bar{x}$'s form a bell curve approximating the normal distribution, $\sqrt{n}(\bar{x} - \mu) \approx \mathcal{N}(0, \sigma^2)$, regardless of the shape of the distribution of the individual $x_i$'s.
Sample Mean: $\mu_{\bar{x}} = \mu$; $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ (2x accuracy needs 4x n).
Sample Mean Rule of Thumb: use if $n \ge 30$ or if the population distribution is normal.
Sample Proportion: $\mu_{\hat{p}} = p$; $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1-p)}{n}}$.
Sample Proportion Rule of Thumb: Large Counts Condition: use if $np \ge 10$ and $n(1-p) \ge 10$. 10 Percent Condition: use if $N \ge 10n$.
Difference of Sample Means: $E(\bar{x}_1 - \bar{x}_2) = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2$; $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$. Special case when $\sigma_1 = \sigma_2 = \sigma$: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$.
Difference of Sample Proportions: $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$; $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$. Special case when $p_1 = p_2 = p$: $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{pq}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{p(1-p)}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$.
Bias: caused by non-random samples.
Variability: caused by too small a sample ($n < 30$).

Confidence Intervals for One Population Mean

Standardized Test Statistic (of the variable $\bar{x}$): $z = \frac{\text{estimate} - \text{parameter}}{\text{standard deviation of estimate}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.
Confidence Interval (C) for $\mu$ / z-interval ($\sigma$ known, normal population or large sample): statistic ± (critical value) × (standard deviation of statistic), i.e., $\bar{x} \pm E = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.
Margin of Error / Standard Error (SE) (for the estimate of $\mu$): $ME(\bar{x}) = E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$; $SE(\bar{x}) = \frac{s}{\sqrt{n}}$.
Sample Size (for estimating $\mu$, rounded up): $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$.
Critical Value: $z_{\alpha/2}$, the z-score that cuts off an upper-tail area of $\alpha/2$, where $\frac{\alpha}{2} = \frac{100 - C}{2}$ percent. The critical value (and the significance level α it comes from) is always set ahead of time.
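The sampling-distribution and z-interval formulas above can be illustrated with a small Python sketch. It assumes σ is known and sets the confidence level as a fraction (0.95 rather than 95). The helper names `z_critical`, `z_interval`, `sample_size`, `sd_p_hat`, `large_counts_ok`, and `sd_of_sample_means` are hypothetical. The skewed population is made-up example data. The simulation draws with replacement, so the draws are independent, and it compares the spread of many $\bar{x}$'s with $\sigma/\sqrt{n}$.

```python
# Hypothetical helper names and made-up data for illustration only.
import random
from math import ceil, sqrt
from statistics import NormalDist, pstdev


def z_critical(confidence):
    """z_{alpha/2} for a confidence level C given as a fraction (e.g. 0.95)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_interval(xbar, sigma, n, confidence=0.95):
    """x-bar +/- E, with margin of error E = z_{alpha/2} * sigma / sqrt(n)."""
    e = z_critical(confidence) * sigma / sqrt(n)
    return xbar - e, xbar + e


def sample_size(sigma, e, confidence=0.95):
    """n = (z_{alpha/2} * sigma / E)^2, rounded up."""
    return ceil((z_critical(confidence) * sigma / e) ** 2)


def sd_p_hat(p, n):
    """Standard deviation of the sample proportion, sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)


def large_counts_ok(p, n):
    """Large Counts Condition: np >= 10 and n(1-p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10


def sd_of_sample_means(population, n, reps, seed=1):
    """Simulate the sampling distribution of x-bar (sampling with replacement)."""
    rng = random.Random(seed)
    means = [sum(rng.choices(population, k=n)) / n for _ in range(reps)]
    return pstdev(means)


# A deliberately skewed population: 70 ones and 30 tens.
population = [1] * 70 + [10] * 30
sigma, n = pstdev(population), 36
print("simulated SD of x-bar:", sd_of_sample_means(population, n, 2000))
print("sigma / sqrt(n):      ", sigma / sqrt(n))
print("95% z-interval:", z_interval(50, 10, 25))
print("n needed for E = 5 when sigma = 15:", sample_size(15, 5))
```

The two printed spreads should be close even though the population is far from normal. That agreement is the CLT behavior described above. `math.ceil` performs the "rounded up" step in the sample-size formula.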
Hypothesis Testing

Significance level α: usually set at a threshold value of 0.05 (5%) or 0.01 (1%).
Null Hypothesis: $H_0$. Is assumed true for the purpose of carrying out the hypothesis test.
• Always contains "="
• The null value implies a specific sampling distribution for the test statistic
• Can be rejected, or not rejected, but NEVER supported
Alternative Hypothesis: $H_a$ or $H_1$. Is supported only by carrying out the test, if the null hypothesis can be rejected.
• Always contains ">" (right-tailed), "<" (left-tailed), or "≠" (two-tailed) [tail selection is important]
• Without any specific value for the parameter of interest, the sampling distribution is unknown
• Can be supported (by rejecting the null), or not supported (by failing to reject the null), but NEVER rejected
Steps:
1. Formulate the null and alternative hypotheses.
2. If using the traditional approach, observe sample data.
3. Compute a test statistic from the sample data.
4. If using the p-value approach, compute the p-value from the test statistic.
5. Reject the null hypothesis (supporting the alternative):
 a. p-value approach: at significance level α, if the p-value ≤ α;
 b. Traditional approach: if the test statistic falls in the rejection region;
 otherwise, fail to reject the null hypothesis.

Test Statistics

Population/Sample Proportion: $z = \frac{\hat{p} - p_0}{SE(\hat{p})} = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}}$. Standard normal Z under $H_0$. Assumes $np \ge 15$ and $nq \ge 15$.
Population/Sample Mean, variance unknown: $t = \frac{\bar{x} - \mu_0}{SE(\bar{x})} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$. t distribution with $df = n - 1$ under $H_0$.
Population/Sample Mean, variance known: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$. Assumes the data are normally distributed or $n \ge 30$, since t approaches the standard normal Z when n is sufficiently large due to the CLT.
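The five steps above, using the p-value approach, can be sketched in Python for the two z statistics. The helper names `p_value`, `one_proportion_z`, `one_mean_z`, and `decide` are hypothetical, and the sample numbers are invented. The t test is left out because Python's standard library has no t distribution, so this sketch covers only the standard normal cases.

```python
# Hypothetical helper names and invented numbers for illustration only.
from math import sqrt
from statistics import NormalDist


def p_value(z, tail):
    """Standard normal p-value; tail is 'left' (<), 'right' (>), or 'two' (!=)."""
    phi = NormalDist().cdf
    if tail == "left":
        return phi(z)
    if tail == "right":
        return 1 - phi(z)
    if tail == "two":
        return 2 * (1 - phi(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")


def one_proportion_z(x, n, p0):
    """z = (p-hat - p0) / sqrt(p0 * q0 / n)."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def one_mean_z(xbar, mu0, sigma, n):
    """z = (x-bar - mu0) / (sigma / sqrt(n)), variance known."""
    return (xbar - mu0) / (sigma / sqrt(n))


def decide(p, alpha=0.05):
    """Step 5, p-value approach: reject H0 when p <= alpha."""
    return "reject H0" if p <= alpha else "fail to reject H0"


# H0: p = 0.5 vs. Ha: p != 0.5 with 60 successes in 100 trials.
z = one_proportion_z(60, 100, 0.5)  # n*p0 = n*q0 = 50, so the >= 15 check passes
p = p_value(z, "two")
print("z =", z, "p-value =", p, "->", decide(p))

# H0: mu = 50 vs. Ha: mu > 50, sigma = 10 known, x-bar = 52, n = 25.
z = one_mean_z(52, 50, 10, 25)
p = p_value(z, "right")
print("z =", z, "p-value =", p, "->", decide(p))
```

The `tail` argument mirrors the "≠ / < / >" choice in the alternative hypothesis. Choosing the wrong tail changes the p-value, which is why tail selection matters.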
Goodness-of-Fit Test – Chi-Square

Expected Frequencies for a Chi-Square: $E = np$, where p = proportion and n = sample size.
Chi-Square Test Statistic: $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$. (For a test about a single population variance: $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$.)
Degrees of Freedom: $df = k - 1$, where k = number of possible values (categories) for the variable under consideration.
Large $\chi^2$ values are evidence against the null hypothesis, which states that the observed and expected percentages match (i.e., any differences are attributed to chance).

Independence Test – Chi-Square

Expected Frequencies for a Chi-Square: $E = \frac{(\text{row total})(\text{column total})}{n}$, where n is the total count for the whole table.
Chi-Square Test Statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$ (see above).
Degrees of Freedom: $df = (r - 1)(c - 1)$, where r = # of rows and c = # of columns, i.e., the number of possible values for each of the two variables under consideration.

Formulating Hypotheses
If the claim consists of …, then the hypothesis test is …, and the claim is represented by …
"…is not equal to…" → two-tailed (≠) → $H_a$
"…is less than…" → left-tailed (<) → $H_a$
"…is greater than…" → right-tailed (>) → $H_a$
"…is equal to…" or "…is exactly…" → two-tailed (=) → $H_0$
"…is at least…" → left-tailed ($H_0$ uses ≥, $H_a$ uses <) → $H_0$
"…is at most…" → right-tailed ($H_0$ uses ≤, $H_a$ uses >) → $H_0$
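Both chi-square tests above reduce to "compute expected counts, sum $(O - E)^2/E$, find df". Below is a minimal Python sketch of that. The helper names `chi_square_gof` and `chi_square_independence` are hypothetical, and the counts are invented. The sketch does not compute a p-value, since the standard library has no chi-square distribution. Compare the statistic with a table critical value instead.

```python
# Hypothetical helper names and invented counts for illustration only.


def chi_square_gof(observed, proportions):
    """Goodness of fit: E = n * p, df = k - 1."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1, expected


def chi_square_independence(table):
    """Independence: E = row total * column total / n, df = (r - 1)(c - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


# 60 rolls of a die that is claimed to be fair (each face p = 1/6).
stat, df, expected = chi_square_gof([8, 9, 12, 11, 6, 14], [1 / 6] * 6)
print("GOF chi-square:", stat, "df:", df)

# A 2 x 2 contingency table.
stat2, df2, expected2 = chi_square_independence([[10, 20], [20, 10]])
print("independence chi-square:", stat2, "df:", df2, "expected:", expected2)
```

At α = 0.05, the upper-tail critical values are 11.070 for df = 5 and 3.841 for df = 1. The die statistic (4.2) therefore does not reject the null hypothesis. The 2 × 2 statistic (about 6.67) does.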