Math 10 - Statistics, Winter 2021
Summary of Material
Lauri Papay, March 12, 2021

Frequency Tables and Relative Frequency Tables

Relative Frequency = $\dfrac{\text{frequency of outcome}}{\text{number of observations}}$
Lower Class Limit = lowest data value allowed in a class
Upper Class Limit = highest data value allowed in a class
Lower Class Boundary = lower class limit - 0.5
Upper Class Boundary = upper class limit + 0.5
Class Width is the difference between consecutive lower class limits.

To calculate the class width:
Class Width = $\dfrac{\text{largest data value} - \text{smallest data value}}{\text{desired number of classes}}$ (add 1 if the result is an integer; otherwise round up)

To calculate the class midpoint:
Class Midpoint = $\dfrac{\text{lower class limit} + \text{lower class limit of next class}}{2}$

Frequency distributions can be: Symmetric, Uniform, Skewed left, Skewed right, Unimodal, Bimodal

Sample Data: [ n = sample size ]
Sample Mean: $\bar{x} = \dfrac{\sum x}{n}$ (round to one more decimal place than the data)
Sample Variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$, or equivalently $s^2 = \dfrac{1}{n - 1}\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right]$
Standard Deviation: $s = \sqrt{s^2}$

Population Data: [ N = population size ]
Population Mean: $\mu = \dfrac{\sum x}{N}$ (round to one more decimal place than the data)
Population Variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
Standard Deviation: $\sigma = \sqrt{\sigma^2}$

Measures of position

5-Number Summary: (smallest data value, Q1, Q2, Q3, largest data value)
pth Percentile: the data value for which p% of the values lie to its left and (100 - p)% lie to its right.
Q1 = 25th Percentile, Q2 = 50th Percentile = median, Q3 = 75th Percentile

To find the pth percentile in a sorted list of n elements:
1) Multiply n by p (in decimal form).
2) If n·p is an integer i, then the pth percentile is the average of the ith and (i + 1)st elements in the list. If n·p is not an integer, round up to the next higher integer to find the position of the pth percentile.
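
The sample formulas and the two-step percentile rule above can be sketched in a few lines of Python. This is only an illustration: the data values and the helper name `percentile` are made up for the example, and p is passed as a percentage strictly between 0 and 100.

```python
import math

def percentile(sorted_data, p):
    """p-th percentile (0 < p < 100) of an already sorted list, using the two-step rule."""
    n = len(sorted_data)
    pos = n * p / 100                     # step 1: n times p in decimal form
    if pos == int(pos):                   # step 2, integer case: average elements i and i+1
        i = int(pos)
        return (sorted_data[i - 1] + sorted_data[i]) / 2
    return sorted_data[math.ceil(pos) - 1]  # step 2, non-integer case: round position up

# Hypothetical sample, sorted before use
data = sorted([7, 2, 15, 4, 9, 4, 12, 5, 10, 8])
n = len(data)

x_bar = sum(data) / n                                         # sample mean
var_definition = sum((x - x_bar) ** 2 for x in data) / (n - 1)
var_shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)
s = math.sqrt(var_definition)                                 # sample standard deviation

five_number = (data[0], percentile(data, 25), percentile(data, 50),
               percentile(data, 75), data[-1])
print(x_bar, round(var_definition, 4), round(s, 4), five_number)
```

Both variance formulas give the same value here, which is a quick way to check hand calculations done with the shortcut form.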
Data Range: largest data value - smallest data value
Interquartile Range (IQR): Q3 - Q1
Lower Outlier Boundary = Q1 - (1.5)IQR
Upper Outlier Boundary = Q3 + (1.5)IQR

Empirical Rule for Symmetrical Mound Shaped Distributions:
Empirical Guidelines for a symmetric distribution with mean $\mu$ and standard deviation $\sigma$:
Approximately 68% of the data lies within $\mu \pm \sigma$
Approximately 95% of the data lies within $\mu \pm 2\sigma$
Almost all of the data lies within $\mu \pm 3\sigma$

Z-Score: $z = \dfrac{x - \mu}{\sigma}$ = the number of standard deviations the data value x is from the mean.

Correlation and Regression:

x = Explanatory Variable
y = Response Variable
To test a set of (x, y) data for linearity, we calculate the correlation coefficient.
Correlation Coefficient: $r = \dfrac{1}{n - 1} \sum \left(\dfrac{x - \bar{x}}{s_x}\right)\left(\dfrac{y - \bar{y}}{s_y}\right)$
If r is close to 1, there is a positive linear relationship.
If r is close to 0, there is no linear relationship.
If r is close to -1, there is a negative linear relationship.

Least squares linear regression line: $\hat{y} = b_0 + b_1 x$; $b_1 = r\,\dfrac{s_y}{s_x}$; $b_0 = \bar{y} - b_1 \bar{x}$
Coefficient of determination = $r^2$ = the fraction of the total variation in y that can be explained by the linear model $\hat{y} = b_0 + b_1 x$.
Total deviation in y: $y - \bar{y}$; Explained Variation: $\hat{y} - \bar{y}$; Unexplained Variation: $y - \hat{y}$
Residual: $y - \hat{y}$ for each (x, y) pair
Influential Point: any point that strongly affects the least squares regression line
Interpolation: the use of the least squares regression line to predict the output of a data point within the range of known data points
Extrapolation: the use of the least squares regression line to predict the output of a data point that is outside the range of known data points
Correlation versus causation: even when two variables are highly correlated, it is not necessarily the case that changing the value of one of them will cause a change in the other.
Residual plot: when a residual plot has no apparent pattern, a linear model is appropriate.
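
As a rough illustration of the correlation and regression formulas above, the following sketch computes r, the slope, the intercept, and the residuals for a tiny made-up data set. The data and the helper names (`mean`, `sample_sd`) are assumptions for the example only.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation with the n - 1 denominator."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

# Hypothetical (x, y) data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = sample_sd(xs), sample_sd(ys)

# Correlation coefficient: average product of z-scores, dividing by n - 1
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

b1 = r * s_y / s_x        # slope
b0 = y_bar - b1 * x_bar   # intercept

# Residual = observed y minus predicted y for each pair
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(round(r, 4), round(b1, 4), round(b0, 4), round(r ** 2, 4))
```

For a least squares line the residuals sum to zero (up to rounding), which makes a handy sanity check before drawing a residual plot.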
Probabilities:

Sample space = S = all possible outcomes of an experiment.
Event A = a subset of the sample space.
If S contains equally likely outcomes, and A is a subset of S, then $P(A) = \dfrac{n(A)}{n(S)}$
Event $A^c$ = the subset of elements in the sample space that are not in A.
P(A) = probability that an outcome will be in A.
P($A^c$) = probability that the outcome of an experiment will not be in event A.
P(A) + P($A^c$) = 1

Conditional Probability: "The probability of A given B", denoted P(A | B)
$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$; $P(B) \neq 0$ (probability of A occurring given that B has occurred)
Multiplication Rule: $P(A \mid B)\,P(B) = P(A \text{ and } B)$
A and B are independent if $P(A \mid B) = P(A)$, which is equivalent to $P(A \text{ and } B) = P(A) \cdot P(B)$
A and B are mutually exclusive if $P(A \text{ and } B) = 0$
Addition Rules:
$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$ [for all events A and B]
$P(A \text{ or } B) = P(A) + P(B)$ [if A and B are mutually exclusive events]

RANDOM VARIABLES

Discrete Random Variable Properties:
Properties of a probability distribution f(x) that describes the distribution of a discrete random variable X:
1) $0 \le f(x) \le 1$ for all x
2) $\sum f(x) = 1$
Expected Value = E[X] = $\mu = \sum x\, f(x)$ [ f(x) = probability that x will occur ]
Variance = Var(X) = $\sigma^2 = \sum (x - \mu)^2 f(x)$
Standard Deviation = $\sigma = \sqrt{\mathrm{Var}(X)}$

Binomial Random Variables
A binomial trial is an activity for which:
1) There are only 2 possible outcomes, labeled Success and Failure.
2) The probabilities of Success = p and Failure = 1 - p remain constant.
3) The trials are independent.

Binomial Random Variable Properties:
A binomial random variable X is one that is assigned the number of successes from n binomial trials. X takes on the values 0, 1, …, n, which are all the possible numbers of successes for n trials.
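
One way to see how the discrete-distribution properties and the definition of a binomial random variable fit together is to build the distribution by brute force. The sketch below enumerates every outcome of n independent trials, tabulates f(x) for X = number of successes, and then applies the expected value and variance definitions. The values n = 3 and p = 0.3 are arbitrary choices for illustration, and the comparison against the closed-form binomial probability is included only as a cross-check.

```python
from itertools import product
from math import comb, sqrt

n, p = 3, 0.3   # hypothetical number of trials and success probability

# Tabulate f(x) by listing every sequence of successes (1) and failures (0)
f = {x: 0.0 for x in range(n + 1)}
for outcome in product([1, 0], repeat=n):
    prob = 1.0
    for trial in outcome:
        prob *= p if trial == 1 else (1 - p)   # independent trials multiply
    f[sum(outcome)] += prob                    # X = number of successes

total = sum(f.values())                                   # property 2: should be 1
expected = sum(x * fx for x, fx in f.items())             # E[X] = sum of x f(x)
variance = sum((x - expected) ** 2 * fx for x, fx in f.items())
std_dev = sqrt(variance)

# Cross-check against the closed-form binomial probability
closed_form = {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}
print(f, round(expected, 4), round(variance, 4))
```

Enumeration grows as 2^n, so it is only practical for small n; it is shown here to make the definitions concrete, not as a recommended way to compute binomial probabilities.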
P[x] = probability that there are exactly x successes out of n trials
$P[x] = {}_nC_x\, p^x (1 - p)^{n - x}$ for any x = 0, 1, …, n
n = number of trials, p = P[success], 1 - p = P[failure]
Expected Value = E[X] = $\mu = np$
Variance = Var(X) = $\sigma^2 = np(1 - p)$
Standard Deviation = $\sigma = \sqrt{np(1 - p)}$
$P[a \le x \le b]$ = probability that x is between a and b, inclusive.
$P[a < x < b]$ = probability that x is between a and b, not including a or b.

On the TI-84 silver plus (and possibly other similar models):
binompdf(n, p, x) = probability of exactly x successes
binomcdf(n, p, x) = probability of at most x successes

Continuous Random Variable Properties:
Properties of a probability distribution f(x) that describes the distribution of a continuous random variable X:
1) The total area under the curve f(x) is 1.
2) $P[a \le x \le b]$ = the total area under the curve between a and b.
3) $f(x) \ge 0$ for all x.

Population Distribution: x-Distribution

Standard Normal Distribution: a bell-shaped density distribution with mean $\mu = 0$ and standard deviation $\sigma = 1$.

Empirical Guidelines for a symmetric bell-shaped distribution state that approximately:
68% of the data lies within $\mu \pm \sigma$
95% of the data lies within $\mu \pm 2\sigma$
99.7% of the data lies within $\mu \pm 3\sigma$

Probabilities for a Normal Population Distribution
$P(a \le x \le b)$ can be interpreted in two ways:
1) It is the proportion of data between the values a and b.
2) It is the probability that a randomly chosen data value will be between the values a and b.
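
The area interpretation of $P(a \le x \le b)$ and the empirical guidelines above can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `normal_area` and the example mean and standard deviation (100 and 15) are assumptions made for illustration.

```python
from statistics import NormalDist

def normal_area(a, b, mu=0.0, sigma=1.0):
    """Area under the normal curve between a and b (proportion of data in [a, b])."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)

# Empirical guidelines: area within 1, 2, and 3 standard deviations of the mean
within = {k: normal_area(-k, k) for k in (1, 2, 3)}

# Same probability computed from data values and from z-scores
mu, sigma = 100, 15                      # hypothetical population parameters
from_data = normal_area(85, 115, mu, sigma)
from_z = normal_area((85 - mu) / sigma, (115 - mu) / sigma)

print({k: round(v, 4) for k, v in within.items()})
print(round(from_data, 4), round(from_z, 4))
```

The two results on the last line agree, which is the point of standardizing: any normal probability can be read off the standard normal curve.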
Standardized variable z for a population distribution: $z = \dfrac{x - \mu}{\sigma}$
$P(a \le x \le b) = P\left(\dfrac{a - \mu}{\sigma} \le z \le \dfrac{b - \mu}{\sigma}\right)$

On the TI-84 silver plus (and possibly other similar models):
Let a and b represent data values from a distribution.
If working with z-scores: normcdf($z_a$, $z_b$, 0, 1) = the proportion of data between z-scores $z_a$ and $z_b$
If working with data values: normcdf(a, b, $\mu$, $\sigma$) = the proportion of data between the data values a and b, where the mean is $\mu$ and the standard deviation is $\sigma$.
To find the z-score for which p% of the data lies to the left, use invNorm(p, 0, 1), with p entered in decimal form.
To find the data value for which p% of the data lies to the left, use invNorm(p, $\mu$, $\sigma$), with p entered in decimal form.

SAMPLING DISTRIBUTIONS

Sampling Distribution for the mean $\mu$:
Let $x = \{x_1, x_2, \ldots, x_n\}$ be a simple random sample of the population X. If the population X is not a normal distribution, then it is required that n > 30.
$\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$; $\mu_{\bar{x}} = \mu$

Probabilities given a sampling distribution for the mean ($\sigma$ known): [ n = sample size ]
Standard deviation of the sampling distribution: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
Standardized variable z for the sampling distribution: $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
$P(a \le \bar{x} \le b) = P\left(\dfrac{a - \mu}{\sigma / \sqrt{n}} \le z \le \dfrac{b - \mu}{\sigma / \sqrt{n}}\right)$

Probabilities given a sampling distribution for the mean ($\sigma$ unknown): [ n = sample size ]
Standard deviation of the sampling distribution: $s_{\bar{x}} = \dfrac{s}{\sqrt{n}}$
Standardized variable t for the sampling distribution: $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$ [Use Student's t Distribution]
Degrees of freedom (d.f.) = n - 1
$P(a \le \bar{x} \le b) = P\left(\dfrac{a - \mu}{s / \sqrt{n}} \le t \le \dfrac{b - \mu}{s / \sqrt{n}}\right)$

Sampling Distribution for the proportion p:
p = population proportion = proportion of a population having some characteristic
n = size of sample; x = number in the sample that have the characteristic
$\hat{p} = \dfrac{x}{n}$ is the sample proportion
If $np \ge 10$ and $n(1 - p) \ge 10$, then $\hat{p}$ can be approximated by a normal random variable.

Probabilities for a sampling distribution of a proportion p:
$\mu_{\hat{p}} = p$; $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}}$; $z = \dfrac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}}$
$P(a \le \hat{p} \le b) = P\left(\dfrac{a - \mu_{\hat{p}}}{\sigma_{\hat{p}}} \le z \le \dfrac{b - \mu_{\hat{p}}}{\sigma_{\hat{p}}}\right)$

CONFIDENCE INTERVALS

Assumptions for Confidence Intervals for a mean $\mu$
1. We have a simple random sample.
2. The sample size is large (n > 30), or the population is approximately normal.

Confidence Intervals for $\mu$ ($\sigma$ known): [ NOTE $z_c = z_{\alpha/2}$ ]
If $\bar{x}$ is the point estimator for $\mu$, c is the confidence level ($\alpha = 1 - c$), and n = sample size, then:
Margin of Error $E = z_c \cdot \dfrac{\sigma}{\sqrt{n}}$
Resulting in the C% confidence interval: $(\bar{x} - E,\ \bar{x} + E)$

Confidence Intervals for $\mu$ ($\sigma$ unknown): [ NOTE $t_c = t_{\alpha/2}$ ]
If $\bar{x}$ is the point estimator for $\mu$, c is the confidence level ($\alpha = 1 - c$), and n = sample size, then:
Margin of Error $E = t_c \cdot \dfrac{s}{\sqrt{n}}$ [Use Student's t Distribution, d.f. = n - 1]
Resulting in the C% confidence interval: $(\bar{x} - E,\ \bar{x} + E)$

Assumptions for Confidence Intervals for a proportion p
1. We have a simple random sample.
2. The population is at least 20 times as large as the sample.
3. The items in the population are divided into two categories.
4. The sample must contain at least 10 individuals in each category.

Confidence Intervals for a proportion p: [ NOTE $z_c = z_{\alpha/2}$ ]
If $\hat{p} = \dfrac{x}{n}$, then $E = z_c \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$ (c = level of confidence)
Resulting in the C% confidence interval: $(\hat{p} - E,\ \hat{p} + E)$

Assumptions for Confidence Intervals for a standard deviation $\sigma$
1. We have a simple random sample.
2. The population must have a normal distribution.
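
The margin-of-error formulas for a mean with $\sigma$ known and for a proportion can be sketched as follows. This is a minimal illustration using `statistics.NormalDist` from the Python standard library to obtain $z_{\alpha/2}$; the function names and the sample numbers are assumptions for the example. The t-based interval is left out because the standard library has no Student's t quantile function.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(c):
    """Critical value z_(alpha/2) for confidence level c, where alpha = 1 - c."""
    alpha = 1 - c
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_interval_sigma_known(x_bar, sigma, n, c):
    """Confidence interval for mu when the population sigma is known."""
    E = z_critical(c) * sigma / sqrt(n)          # margin of error
    return x_bar - E, x_bar + E

def proportion_interval(x, n, c):
    """Confidence interval for a population proportion p."""
    p_hat = x / n
    E = z_critical(c) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - E, p_hat + E

# Hypothetical inputs: sample mean 50, sigma 8, n = 64; 120 successes out of 300
mean_lo, mean_hi = mean_interval_sigma_known(50, 8, 64, 0.95)
prop_lo, prop_hi = proportion_interval(120, 300, 0.95)
print(round(mean_lo, 2), round(mean_hi, 2))
print(round(prop_lo, 4), round(prop_hi, 4))
```

Checking the assumptions listed above (simple random sample, sample size, category counts) is still up to the user; the functions only carry out the arithmetic.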
Confidence Intervals for a standard deviation $\sigma$:
$\dfrac{(n - 1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \dfrac{(n - 1)s^2}{\chi^2_{1 - \alpha/2}}$
Resulting in the confidence interval: $\left(\sqrt{\dfrac{(n - 1)s^2}{\chi^2_{\alpha/2}}},\ \sqrt{\dfrac{(n - 1)s^2}{\chi^2_{1 - \alpha/2}}}\right)$

HYPOTHESIS TESTING

If H1 contains the inequality symbol <, a left-tail test is performed.
If H1 contains the inequality symbol >, a right-tail test is performed.
If H1 contains the inequality symbol ≠, a two-tail test is performed.

HYPOTHESIS TESTS FOR A MEAN

Assumptions for Hypothesis test of a mean $\mu$
1. We have a simple random sample.
2. The sample size is large (n > 30), or the population is approximately normal.

HYPOTHESIS TEST (P-value Method): (Given significance level $\alpha$)
1) State the Null Hypothesis, denoted H0 (usually the status quo).
   State the Alternative Hypothesis, denoted H1 (the proposed change).
2) Calculate the test statistic for $\bar{x}$:
   $z_{\bar{x}} = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$; or $t_{\bar{x}} = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$ (d.f. = n - 1)
3) Calculate the p-value using either the Standard Normal Distribution or the Student's t Distribution.
4) Reject or fail to reject the null hypothesis based on the significance level $\alpha$.
5) Interpret your conclusion: summarize your decision within the context of the problem.

HYPOTHESIS TEST (Critical value Method): (Given significance level $\alpha$)
1) State the Null Hypothesis, denoted H0 (usually the status quo).
   State the Alternative Hypothesis, denoted H1 (the proposed change).
2) Calculate the test statistic for $\bar{x}$:
   $z_{\bar{x}} = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$; or $t_{\bar{x}} = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$ (d.f. = n - 1)
3) Find the critical value associated with significance level $\alpha$:
   $-z_{\alpha}$ for a left-tail test, $z_{\alpha}$ for a right-tail test, $-z_{\alpha/2}$ and $z_{\alpha/2}$ for a two-tail test; or
   $-t_{\alpha}$ for a left-tail test, $t_{\alpha}$ for a right-tail test, $-t_{\alpha/2}$ and $t_{\alpha/2}$ for a two-tail test
4) Reject or fail to reject the null hypothesis based on the critical value.
5) Interpret your conclusion: summarize your decision within the context of the problem.

HYPOTHESIS TESTS FOR A PROPORTION

Assumptions for testing a proportion p
1. We have a simple random sample.
2. The population is at least 20 times as large as the sample.
3. The items in the population are divided into two categories.
4. The sample must contain at least 10 individuals in each category.

HYPOTHESIS TEST (P-value Method): (Given significance level $\alpha$)
1) State the Null Hypothesis, denoted H0 (usually the status quo).
   State the Alternative Hypothesis, denoted H1 (the proposed change).
2) Calculate the test statistic for $\hat{p}$:
   $z_{\hat{p}} = \dfrac{\hat{p} - p}{\sqrt{pq / n}}$, where q = 1 - p
3) Calculate the p-value using the Standard Normal Distribution.
4) Reject or fail to reject the null hypothesis based on the significance level $\alpha$:
   If p-value ≤ $\alpha$, reject the null hypothesis.
   If p-value > $\alpha$, fail to reject the null hypothesis.
5) Interpret your conclusion: summarize your decision within the context of the problem.

HYPOTHESIS TEST (Critical value Method): (Given significance level $\alpha$)
1) State the Null Hypothesis, denoted H0 (usually the status quo).
   State the Alternative Hypothesis, denoted H1 (the proposed change).
2) Calculate the test statistic for $\hat{p}$:
   $z_{\hat{p}} = \dfrac{\hat{p} - p}{\sqrt{pq / n}}$, where q = 1 - p
3) Find the critical value associated with significance level $\alpha$:
   $-z_{\alpha}$ for a left-tail test, $z_{\alpha}$ for a right-tail test, or $-z_{\alpha/2}$ and $z_{\alpha/2}$ for a two-tail test
4) Reject or fail to reject the null hypothesis based on the critical value.
5) Interpret your conclusion: summarize your decision within the context of the problem.
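
The P-value method for a proportion can be sketched in Python as below. This is an illustrative sketch only: the function name `proportion_z_test`, the `tail` labels, and the sample numbers are assumptions for the example, and `p0` stands for the proportion stated in the null hypothesis.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(x, n, p0, tail, alpha):
    """One-proportion z test by the P-value method.

    tail is "left", "right", or "two", matching the symbol <, >, or != in H1.
    Returns the test statistic, the p-value, and whether H0 is rejected.
    """
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # step 2: test statistic
    std_normal = NormalDist()                    # mean 0, standard deviation 1
    if tail == "left":                           # step 3: p-value
        p_value = std_normal.cdf(z)
    elif tail == "right":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    reject = p_value <= alpha                    # step 4: decision rule
    return z, p_value, reject

# Hypothetical example: H0: p = 0.5, H1: p > 0.5, 58 successes in 100 trials
z, p_value, reject = proportion_z_test(58, 100, 0.5, "right", 0.05)
print(round(z, 2), round(p_value, 4), "reject H0" if reject else "fail to reject H0")
```

Step 5, interpreting the decision in the context of the problem, cannot be automated; in this made-up example the data would not give enough evidence at the 0.05 level that the proportion exceeds 0.5.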