Math 12 Elementary Statistics Fall 2013 Review for Final Exam: Chapters 1 – 14

Reminder: You may bring one page (8.5” x 11”) of your own handwritten notes (both sides). No photocopied material from any source may be on your sheet of notes. As always, your calculator can help you with many of the calculations.

Start by reviewing each of your exams, quizzes, and the exam reviews. The lecture notes are a great resource as well, since they contain the information and examples that I find the most important. I also list some additional review problems from the textbook below. These problems are for your benefit only and will not be collected. The final exam is worth 200 points, so you can expect it to be about twice as long as a regular exam.

Chapter 1
• Given a research objective, be able to identify the target population(s) and variable(s) that need to be studied to answer that question.
• Be able to explain why we select samples to study, rather than study the whole population (census).
• Given a study, be able to identify whether random sampling, judgment sampling, convenience sampling, or volunteer sampling was used to select the sample. Be able to identify random vs. non-random sampling.
• Quantitative vs. qualitative and continuous vs. discrete variables.
• Observational study vs. randomized experiment, and the benefit of the latter. Know what a double-blind experiment is.
• Understand the basic idea behind a biased vs. unbiased study. Know the meaning of voluntary bias, self-interest bias, social acceptability bias, and leading question bias.
• Know what a confounder (also called a lurking variable) is. Understand that a correlation between two variables doesn’t automatically mean that one variable causes an increase or decrease in the other; there could be one or several other variables that are the actual cause. (Ch. 4)
Ch Quiz p.30: 2, 5, 9, 14
Review Exercises p.31: 1, 3, 6, 8, 9, 10, 11

Chapter 2
• Given an ungrouped quantitative data set, be able to construct a frequency distribution table with classes of equal width, containing frequency, relative frequency, percent, and midpoint columns.
• Given a frequency, relative frequency, or percentage table for a quantitative variable, be able to construct a histogram, bar graph, and polygon. Conversely, given a graph, be able to reconstruct the frequency, relative frequency, or percentage distribution table that gives you the graph.
• Be able to construct and interpret a stem-and-leaf plot and a dotplot.
Review Exercises p.83: 1, 6, 7a, 8, 9

Chapter 3
• Understand and be able to calculate the mean, median, mode, standard deviation, variance, and range for an ungrouped data set using correct notation (s vs. σ) and correct units. (STAT -> CALC -> 1-Var Stats)
• Skewness and outliers. Know which parameters are more sensitive to outliers.
• Be able to calculate the mean, standard deviation, and variance for grouped data (from a frequency, relative frequency, or percentage distribution table) by using the midpoints and frequencies of the various classes.
• Be able to calculate the quartiles, the interquartile range, the kth percentile, or the percentile rank of a particular data value for a data set using correct notation and units.
• Calculate and interpret z-scores.
• Given the mean, the standard deviation, and an interval of data values $\bar{x} - ks$ to $\bar{x} + ks$, be able to determine k (the number of standard deviations above and below the mean) and use Chebyshev’s Theorem to calculate the minimum percentage $\left(1 - \frac{1}{k^2}\right) \cdot 100\%$ of data values within k standard deviations of the mean.
• Conversely, given a minimum percentage of data values, be able to determine k using Chebyshev’s Theorem. Then find the interval of data values using k, the mean, and the standard deviation: $\bar{x} - ks$ to $\bar{x} + ks$.
• Be able to do calculations similar to the previous two items for approximately bell-shaped data using approximate percentages and the Empirical Rule.
• Be able to construct a boxplot using your calculator and interpret it (is it skewed, what is the median, etc.).
Ch Quiz p.142: All except 11 and 14
Review Exercises p.143: 7, 13

Chapter 4
• Given paired data between two variables, be able to construct a scatter plot for the data.
• Given paired data between two variables, be able to calculate the equation of the least squares regression line (a.k.a. the line of best fit) for the paired data using correct notation in your equation. Be able to do this on the calculator [STAT -> CALC -> LinReg(a+bx)].
• Be able to identify the slope and y-intercept in the equation of the regression line and be able to explain, in detail, the rate of change that the slope predicts for us about the dependent and independent variables.
• Be able to graph the regression line on the same graph as the scatter diagram. Be able to do this by hand.
• Be able to calculate the linear correlation coefficient on the calculator [STAT -> CALC -> LinReg(a+bx)]. Explain whether the linear correlation is positive or negative, and whether it indicates a weak, medium, or strong linear relationship between the dependent and independent variables.
• Be able to use the regression equation to make predictions.
• Be able to explain the dangers of using linear regression for making predictions outside of the domain, proving causality (that one variable causes a certain behavior of the other variable), or modeling nonlinear data.
Ch Quiz p.187: 1, 2, 5, 6, 7, 9
Review Exercises p.189: 14

Chapter 5
• Be able to draw a tree diagram with probabilities for an experiment.
• Given a two-way classification table, be able to calculate both marginal and conditional probabilities either directly from the table or using the appropriate general formula:
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$
• Be able to explain what the intersection and union of two events are, and given a two-way classification table or tree diagram, be able to calculate the probabilities for the intersection or union of two events either directly from the table or using the following general formulas:
$P(A \text{ and } B) = P(A) \cdot P(B \mid A)$
$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
• Be able to explain what the complement of an event is, and be able to use its formula:
$P(A^C) = 1 - P(A)$
• Be able to explain what mutually exclusive events are, and how this relates to the probability of the intersection and union of two events:
For mutually exclusive events A and B, $P(A \text{ and } B) = 0$, so $P(A \text{ or } B) = P(A) + P(B)$.
• Be able to explain what independent events are, how to test if two events are independent, and how this relates to the probability of the intersection of two events:
Test: If $P(A) = P(A \mid B)$, then A and B are independent. If $P(A) \neq P(A \mid B)$, then A and B are dependent.
For independent events A and B, $P(A \text{ and } B) = P(A) \cdot P(B)$.
Ch Quiz p.235: 1, 9, 10, 11, 12
Review Exercises p.237: 1, 2, 3, 7, 10

Chapter 6
• Given a random variable, be able to explain whether it is qualitative or quantitative, and be able to classify a quantitative random variable as discrete or continuous.
• Given a small population and an experiment, be able to find the probability distribution table for a discrete random variable using the given information or by making a tree diagram (and labeling the tree diagram with the marginal, conditional, and joint probabilities, using sampling with or without replacement).
• Using the properties of a probability, be able to identify when a table is the probability distribution for a discrete random variable.
• Using a table or tree diagram, be able to calculate the probability that a discrete random variable is a single value or within an interval of values.
• Be able to find the mean and standard deviation of a discrete probability distribution. Be able to interpret the mean as the expected value of the variable.
• Be able to explain what the criteria of a binomial experiment are, and apply these criteria to specific situations to determine if an experiment is binomial.
• Be able to determine if an experiment is a binomial experiment by looking at its tree diagram.
• Be able to explain what a binomial random variable is: a discrete random variable that counts the number of successes x out of n trials in a binomial experiment.
• Be able to recognize a binomial problem. It is usually a probability problem where we are given a certain percentage or probability p of selecting an element having a certain characteristic (which does not change after each selection), and we need to calculate the probability of selecting x out of n elements having that characteristic.
• Calculate the probability using the appropriate formula or program [PRGM -> BINOML83].
• Be able to calculate the mean and standard deviation of a binomial distribution using the appropriate formulas. Be able to interpret the mean as the expected number of successes out of n trials.
Ch Quiz p.271: 1, 2, 3, 4, 6, 9, 10, 15

Chapter 7
• Be able to explain what the mean and standard deviation tell us about the center and spread of a normal distribution curve.
• Be able to explain what the standard normal distribution is (µ = 0, σ = 1).
• Given x, calculate its corresponding z-score, or vice versa. Remember, z tells us the number of standard deviations σ that x is from the mean µ:
$z = \frac{x - \mu}{\sigma}$, $x = \mu + z\sigma$, or on the calculator using PRGM -> NORMAL83 or INVNOR83
• Be able to calculate the probability that a normally distributed variable x is over a certain interval. Draw a picture with correctly labeled areas and axis.
• Be able to calculate the appropriate z or x value given the probability or percentage of data values in either the tail or over an interval under a normal distribution.

Chapter 7
• Be able to explain the difference between a population distribution and a sampling distribution.
• Be able to explain how we obtain the sampling distribution of the sample means $\bar{x}$, and clearly explain what the axes in such a sampling distribution represent.
• The sampling distribution of $\bar{x}$ will be approximately normally distributed when…
1. The population distribution is already normally distributed (regardless of sample size).
2. The sample sizes taken are large (n ≥ 30), regardless of the shape of the population distribution (this is the Central Limit Theorem for Means).
In either case, $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
• Be able to recognize a probability problem that uses the sampling distribution of all sample means: you are asked to calculate the probability of selecting a simple random sample of a certain size that has a sample mean over a certain interval, or you are asked to calculate the percentage of simple random samples of a certain size that have means over a certain interval.
• Be able to calculate the probability that the sample mean $\bar{x}$ is over a certain interval.
• Be able to explain how we obtain the sampling distribution of the sample proportions p̂.
• The sampling distribution of p̂ will be approximately normally distributed when np > 5 and nq > 5; then $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$.
• Be able to recognize a probability problem that uses the sampling distribution of all sample proportions: it looks similar to a binomial problem, but rather than finding the probability of a certain number of successes as we do in a binomial problem, we are asked to find the probability of selecting a sample in which a certain interval of proportions or percentages of them have a certain characteristic.
• Calculate the probability that the sample proportion p̂ is over a certain interval.
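
As an illustration of the sampling-distribution items in this chapter, here is a minimal Python sketch of both kinds of probability calculation. It is not part of the course materials, which use the TI-83 programs; the function names and the numbers in the note that follows are made up for the example, and the sketch assumes the normality conditions listed above are met.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_between(mu, sigma, n, low, high):
    """P(low < x-bar < high), treating x-bar as normal with
    mean mu and standard deviation sigma / sqrt(n)."""
    sampling_dist = NormalDist(mu, sigma / sqrt(n))
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)


def prob_sample_prop_above(p, n, cutoff):
    """P(p-hat > cutoff), treating p-hat as normal with
    mean p and standard deviation sqrt(p * q / n)."""
    q = 1 - p
    # Check the conditions np > 5 and nq > 5 before using the normal model.
    if not (n * p > 5 and n * q > 5):
        raise ValueError("need np > 5 and nq > 5 for the normal approximation")
    sampling_dist = NormalDist(p, sqrt(p * q / n))
    return 1 - sampling_dist.cdf(cutoff)
```

For a hypothetical population with µ = 100 and σ = 15, `prob_sample_mean_between(100, 15, 36, 97, 103)` comes out to about 0.77, because the standard deviation of the sample means is 15/√36 = 2.5 and the interval runs from z = −1.2 to z = 1.2.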
Chapters 8 and 10 Confidence Intervals
• Point estimates:
A sample mean $\bar{x}$ is a point estimate of a population mean µ.
A sample standard deviation s is a point estimate of a population standard deviation σ.
A sample proportion p̂ is a point estimate of a population proportion p.
A difference of sample means $\bar{x}_1 - \bar{x}_2$ is a point estimate of a difference of population means $\mu_1 - \mu_2$.
A sample mean of differences $\bar{d}$ is a point estimate of a population mean difference $\mu_d$.
A difference of sample proportions $\hat{p}_1 - \hat{p}_2$ is a point estimate of a difference of population proportions $p_1 - p_2$.
• We construct a confidence interval using a sample statistic (point estimate), together with some error, whenever we want to estimate an unknown population parameter:
point estimate ± margin of error
• Be able to explain what the confidence level tells us: the percentage of samples of the same size n that will make confidence intervals that actually contain the true population parameter. Thus, a certain percentage of confidence intervals will not contain the population parameter, and we usually do not know whether our sample’s interval contains the population parameter or not.
• Be able to use the appropriate formula to estimate the sample size needed to construct a confidence interval for a population mean µ with the desired error and confidence level.
• Be able to use the appropriate formula to estimate the sample size needed to construct a confidence interval for a population proportion p with the desired error and confidence level. For the most conservative estimate (or when we don’t have any p̂ available), use p̂ = 0.5.
• To decrease the error in a confidence interval estimate:
1. Increase the sample size (preferable, but not always economical or possible).
2. Decrease the confidence level (not preferable).
• Be able to calculate confidence intervals for one population mean µ, one population proportion p, the difference between two population means $\mu_1 - \mu_2$, the difference between two population proportions $p_1 - p_2$, or the population mean difference $\mu_d$, using the formulas and/or the calculator, and write your answer in the form of a detailed sentence as we did in class (ex. “we are 95% confident that the true population mean of ... is between ... and ...”). Remember, when writing conclusions for the confidence intervals for two populations, make a comparison between the population parameters of the first and second populations instead of using the words “different” or “difference”. You should also avoid using negative numbers in your final conclusion.
• Use the appropriate inverse program to get the z or the t when using the formulas:

| Desired Estimate | Assumptions | Distribution | Formula | Program |
|---|---|---|---|---|
| Conf. Int. for µ | 1. SRS 2. n > 30 or population is normal 3. σ known | z distribution | $\bar{x} \pm z \frac{\sigma}{\sqrt{n}}$ | ZInterval |
| Conf. Int. for µ | 1. SRS 2. n > 30 or population is normal 3. σ unknown, s known | t distribution, df = n − 1 | $\bar{x} \pm t \frac{s}{\sqrt{n}}$ | TInterval |
| Conf. Int. for p | 1. SRS 2. $n\hat{p} > 10$ and $n(1 - \hat{p}) > 10$ | z distribution | $\hat{p} \pm z \sqrt{\frac{\hat{p}\hat{q}}{n}}$ | 1-PropZInt |
| Conf. Int. for $\mu_1 - \mu_2$ | 1. Independent SRSs 2. $n_1, n_2 > 30$ or pops. normal 3. $\sigma_1, \sigma_2$ unknown, and $s_1, s_2$ known | t distribution, df = $n_{smallest} - 1$ | $(\bar{x}_1 - \bar{x}_2) \pm t \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ | 2-SampTInt (not pooled) |
| Conf. Int. for $\mu_d$ | 1. Paired SRSs 2. n ≥ 30 or pop. of diff. normal 3. $\sigma_d$ unknown, $s_d$ known | t distribution, df = n − 1 | $\bar{d} \pm t \frac{s_d}{\sqrt{n}}$ | TInterval |
| Conf. Int. for $p_1 - p_2$ | 1. Independent SRSs 2. Each pop. size ≥ 20 · n 3. Two categories with at least 10 in each | z distribution | $(\hat{p}_1 - \hat{p}_2) \pm z \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$ | 2-PropZInt |

Note: SRS = Simple Random Sample

Chapter 9, 11, 12, 13
• A hypothesis test is a procedure that helps us make a decision regarding statements made about the characteristics of a population.
• Be able to perform hypothesis tests about a single population mean µ, a single population proportion p, the difference between two population means $\mu_1 - \mu_2$, the difference between two population proportions $p_1 - p_2$, the population mean difference $\mu_d$, a goodness of fit test, and an analysis of variance test, using the eight-step procedure:

8-Step Procedure for Performing a Hypothesis Test:
1) State the null and alternative hypotheses of the test.
2) Choose and/or state the significance level α.
3) State the type of test and the standardized sampling distribution that should be used, and check that all of the required assumptions for using that distribution are satisfied.
4) Compute the test statistic.
5) Draw a picture of the standardized sampling distribution you are using. Label the test statistic.
6) Calculate the P-value.
7) Interpret the P-value and make a decision.
If P-value < α, then we reject the null hypothesis, and we have sufficient evidence for the alternative hypothesis.
If P-value > α, then we do NOT reject the null hypothesis, and we do NOT have sufficient evidence for the alternative hypothesis.
8) State a conclusion in the form of a detailed sentence that addresses the alternative hypothesis.
When we Reject $H_0$, we say “there is sufficient evidence to show that $H_1$”, where $H_1$ is stated in words.
When we Fail to Reject $H_0$, we say “there is not sufficient evidence to show that $H_1$”, where $H_1$ is stated in words.

| Type of Test | Assumptions | Distribution | Details |
|---|---|---|---|
| Test about µ. $H_0$: µ = value. $H_1$: µ ≠ value, µ < value, or µ > value | 1. SRS 2. n > 30 or pop. is normal 3. σ known | z distribution | Test Stat: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$. P-Value: NORMAL83, or ZTest |
| Test about µ. $H_0$: µ = value. $H_1$: µ ≠ value, µ < value, or µ > value | 1. SRS 2. n > 30 or pop. is normal 3. σ not known, s known | t distribution, df = n − 1 | Test Stat: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$. P-Value: T83, or TTest |
| Test about p. $H_0$: p = value. $H_1$: p ≠ value, p < value, or p > value | 1. SRS 2. np > 10 and n(1 − p) > 10 | z distribution | Test Stat: $z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$. P-Value: T83, or 1-PropZTest |
| Test about $\mu_1 - \mu_2$. $H_0$: $\mu_1 - \mu_2 = 0$. $H_1$: $\mu_1 - \mu_2 \neq 0$, $\mu_1 - \mu_2 < 0$, or $\mu_1 - \mu_2 > 0$ | 1. Independent SRSs 2. $n_1, n_2 > 30$ or pops. normal 3. $\sigma_1, \sigma_2$ unknown, $s_1, s_2$ known | t distribution, $df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$ | Test Stat: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$. P-Value: T83, or 2-SampTTest (not pooled) |
| Test about $\mu_d$. $H_0$: $\mu_d = 0$. $H_1$: $\mu_d \neq 0$, $\mu_d < 0$, or $\mu_d > 0$ | 1. Paired SRSs 2. n > 30 or pop. of diff. normal 3. $\sigma_d$ unknown, $s_d$ known | t distribution, df = n − 1 | Test Stat: $t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$. P-Value: TTest. Critical Value(s): TINVRS83 |
| Test about $p_1 - p_2$. $H_0$: $p_1 - p_2 = 0$. $H_1$: $p_1 - p_2 \neq 0$, $p_1 - p_2 < 0$, or $p_1 - p_2 > 0$ | 1. Independent SRSs 2. $n_1\hat{p}_1 > 5$, $n_1\hat{q}_1 > 5$, $n_2\hat{p}_2 > 5$, $n_2\hat{q}_2 > 5$ | z distribution | Test Stat: $z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$. P-Value: INVNOR83, or 2-PropZTest. Critical Value(s): INVNOR83 |
| Goodness of Fit Test. $H_0$: Pop. fits expected distr. $H_1$: Pop. does not fit expected distr. | 1. SRS 2. All expected frequencies ≥ 5 | $\chi^2$ distribution, df = k − 1 | Test Stat: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where each E = np. P-Value: CHI83, or GOODFT83, or $\chi^2$ GOF-Test |
| Test of Independence. $H_0$: Two characteristics of a population are independent. $H_1$: Two characteristics of a population are dependent. | 1. SRS 2. All expected frequencies ≥ 5 | $\chi^2$ distribution, df = (R − 1)(C − 1) | Test Stat: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where each $E = \frac{(\text{Row Total})(\text{Column Total})}{\text{Sample Size}}$. P-Value: CHI83, or CHITST83, or $\chi^2$-Test |
| Analysis of Variance (ANOVA). $H_0$: 3+ pop. means are all equal. $H_1$: 3+ pop. means are not all equal. | 1. Independent SRSs 2. Pops. normal 3. σ’s are all equal | F distribution | P-Value and Test Stat: ANOVA |

• Be able to recognize when two samples are selected independently, and when paired samples are selected dependently.
• Notice that many of the test statistics for means and proportions measure the number of standard deviations in the sampling distribution that the sample statistic is from the null hypothesis value, which measures the evidence against $H_0$:
$\text{test stat} = \frac{(\text{sample stat}) - (\text{null hypothesis value})}{(\text{stdev of sampling distribution})}$
• Since we use a random sample in a hypothesis test, there is always a chance that we make the wrong decision in any hypothesis test we perform:
Type I error: Deciding to reject $H_0$ when $H_0$ is actually true.
Type II error: Deciding to fail to reject $H_0$ when $H_0$ is actually false (when $H_1$ is actually true).
• Based on your conclusion to a hypothesis test, be able to identify whether a type I or type II error could have been made.
• Be able to explain what the significance level (α) measures. Remember, it is the probability of making a type I error. In other words, assuming the null hypothesis is true, it is the percentage of all simple random samples that could have been selected that would have led us to make the type I error of rejecting the null hypothesis when it is true.
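
The test-statistic pattern and the decision rule from step 7 can be sketched in Python for the simplest case, a z test about one mean with σ known. This is only an illustration under that assumption: the function names `z_test_mean` and `possible_error` and the sample numbers in the note below are hypothetical, and in class the same work is done with the calculator programs.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Return (z, p_value, reject) for a z test about one mean.

    tail is "left" (H1: mu < mu0), "right" (H1: mu > mu0),
    or "two" (H1: mu != mu0).
    """
    # test stat = (sample stat - null value) / (stdev of sampling distribution)
    z = (x_bar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # standard normal: mean 0, stdev 1
    if tail == "left":
        p_value = std_normal.cdf(z)
    elif tail == "right":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    # Step 7: reject H0 exactly when the P-value is below alpha.
    reject = p_value < alpha
    return z, p_value, reject


def possible_error(reject):
    """Name the only kind of error the decision could have been."""
    return "Type I" if reject else "Type II"
```

With a hypothetical sample of n = 36 having a sample mean of 52, σ = 6, and a null hypothesis value of 50, the test statistic is z = 2.0 and the two-tailed P-value is about 0.0455. At α = 0.05 we reject the null hypothesis, so the only error we could have made is a type I error.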