Math 210 Review for Final

The final exam is cumulative. You may see output from applets, and you may need to use them. Look over your old tests and quizzes. Also look through the explorations we completed this semester. Many of them contain concepts that will be tested on the final. Our final exam will take place in our classroom, 3031 of the Schaap Science Center, on Wednesday, April 30 from 3:00 to 5:00 p.m.

Here is a summary of the data types needed to run each test.

o Single proportion: One categorical variable with two categories.
o Single mean: One quantitative variable.
o Matched pairs test: Dependent samples; subtract to obtain one quantitative variable for the differences.
o Comparing means: Response is quantitative; explanatory is categorical with two independent categories.
o Comparing proportions: Both response and explanatory are categorical with two categories.
o Chi-square test for independence (association): Response and explanatory are categorical; either one can have more than two categories.
o ANOVA: Response is quantitative and explanatory is categorical with more than two categories.
o Correlation and regression: Response and explanatory are quantitative.

Preliminaries.

We introduced the six-step process of statistical investigation in this section. You should understand this overall process. You should know the difference between anecdotal evidence and data. We also looked at how variability is described using standard deviation as well as the shape of a graph. You should know what probability is and how it is estimated. You should know how to describe the observational units and variables (categorical and quantitative) that are involved in a study.

Section 1.1: Introduction to Chance Models.

We put the statistical investigative process to use on some specific examples in this section. We introduced the coin flipping applet to help us arrive at a conclusion.
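The chance-model simulation described above can be sketched in a few lines of Python. This is only an illustration of what the coin flipping applet does, not the applet's actual code, and the function names here are made up: under the chance model (e.g., heads probability 0.5), repeatedly flip n coins, record each sample proportion as one dot in the null distribution, and count how often chance alone produces a result at least as extreme as the one observed.

```python
import random

# Illustrative sketch of the coin-flipping applet (names are made up).
# Under the chance model pi = 0.5, flip n coins many times and record
# the proportion of heads from each set of flips.
def simulate_null(n, reps=1000, pi=0.5, seed=1):
    random.seed(seed)
    return [sum(random.random() < pi for _ in range(n)) / n
            for _ in range(reps)]

# Each value above is one dot in the applet's dotplot: a "could-have-been"
# sample proportion if chance alone were at work.  The estimated p-value
# is the fraction of simulated statistics at least as extreme as observed.
def p_value(null_props, observed, direction="greater"):
    if direction == "greater":
        count = sum(p >= observed for p in null_props)
    else:
        count = sum(p <= observed for p in null_props)
    return count / len(null_props)
```

The whole dotplot of simulated proportions is the null distribution; a small p-value means the observed result would rarely happen by chance alone.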
You should be able to use this applet to help you arrive at a conclusion to a significance test. You should also understand what each point in the dotplots produced represents and what the entire distribution represents. We also introduced the three S strategy for measuring strength of evidence to help formalize this process.

Section 1.2: Measuring Strength of Evidence.

In this section we allowed for chance models other than 50-50. We also formalized the inferential process further and added the terms null and alternative hypotheses, p-value, and null distribution. We also introduced symbols such as 𝑝̂, 𝜋, H0, and Ha. You should know all of these as well as the guidelines for strength of evidence and how to write appropriate hypotheses and conclusions.

Section 1.3: Alternative Measure of Strength of Evidence.

Here we introduced the standardized statistic as an alternative to the p-value as a measure of the strength of evidence. You should know how it is used, how to calculate it, and what it means.

Section 1.4: Impacting Strength of Evidence.

We introduced two-sided tests in this section and saw how they impact the strength of evidence in a test. You should be able to use an applet appropriately to do this. You should also be able to express a two-sided alternative hypothesis appropriately. Besides understanding how two-sided tests impact the strength of evidence, you should also know how the distance between the value under the null hypothesis and the sample proportion, as well as the sample size, impacts the strength of evidence of a test.

Section 1.5: Normal Approximation.

You should know how a normal distribution can model the null distributions we have seen in our applets and, more broadly, how a theory-based test differs from a simulation-based test. You should know how to use the test of significance calculator we used in this section.
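The standardized statistic and the normal approximation above can be sketched together. This is a rough illustration of what the theory-based calculator computes, not the applet itself; the function name and example numbers are made up for illustration.

```python
import math

# Illustrative sketch of a theory-based (one-proportion z) test.
# The standardized statistic measures how many standard deviations the
# sample proportion p_hat lies from the hypothesized value pi_0.
def one_proportion_z(p_hat, pi_0, n):
    sd = math.sqrt(pi_0 * (1 - pi_0) / n)   # predicted SD of the null distribution
    z = (p_hat - pi_0) / sd
    # Two-sided p-value from the standard normal distribution,
    # using the error function for the normal CDF.
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value

# Made-up example: 26 heads in 40 flips, tested against pi_0 = 0.5.
z, p = one_proportion_z(26 / 40, 0.5, 40)   # z is about 1.9
```

A standardized statistic beyond about 2 (in absolute value) corresponds to the usual guideline for strong evidence against the null hypothesis.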
You should also know when the theory-based method is appropriate for a test of a single proportion.

Section 2.1: Sampling from a Finite Population.

You should know the difference between a population and a sample as well as between a parameter and a statistic. You should know that simple random samples are unbiased and that convenience samples can often be biased. A biased sample is one that systematically over-represents some parts of a population and under-represents other parts. You should also know the effects of sample size on the p-value in a study.

Section 2.2: Inference for a Single Quantitative Variable.

You should know that the median is a resistant measure of center while the mean is not. You should know the relationship between the mean and median of a distribution that is skewed. You should know the validity conditions for a theory-based test for a single mean. You should be able to carry out and understand a theory-based test for a mean.

Section 2.3: Errors and Significance.

You should know what type I and type II errors are and be able to describe them in the context of a study. You should also know that the significance level is the probability of making a type I error. You should know that the probability of a type II error can be very high, particularly with small sample sizes.

Section 3.1: Statistical Inference: Confidence Intervals.

You should be able to use repeated tests of significance to find a range of plausible values for the population proportion. You should know how the width of a confidence interval changes depending on the confidence level.

Section 3.2: 2SD and Theory-Based Confidence Intervals.

Using the standard deviation of a null distribution is another way to approximate a 95% confidence interval, via the formula 𝑝̂ ± 2(SD). Just as in the previous chapter, we saw that the standard deviation of a null distribution can be predicted.
Because of this, confidence intervals can be easily found using the Theory-Based Inference applet (which uses a normal distribution). You should be able to find these intervals and also understand what the intervals represent and how the confidence level affects the width. You should know what the margin of error is and how to calculate it given the endpoints of a confidence interval. Given a p-value for a test, you should know whether the value under the null is contained in a certain confidence interval. Also, given a confidence interval, you should be able to determine whether or not the results of a test will be significant.

Section 3.3: Confidence Intervals for a Single Mean.

You should be able to determine a theory-based confidence interval for a population mean using the Theory-Based Inference applet and understand what the confidence interval is estimating.

Section 3.4: Factors That Affect the Width of a Confidence Interval.

You should know how level of confidence, sample size, sample proportion, and sample mean influence the width of a confidence interval. You should also know that if you repeatedly sample from a population and compute a 95% confidence interval each time, 95% of the resulting intervals will contain the population parameter in the long run.

Section 4.1: Association Between Variables.

You should know what explanatory and response variables are and how to set up and read a two-way table. You should know how to calculate appropriate conditional proportions from a two-way table. You should know that while observational studies can show an association, they cannot be used to show causation because of potential confounding variables.

Section 4.2: Observational Studies versus Experiments.

You should know that unlike observational studies, experiments control for confounding variables through random assignment. Random assignment allows potential confounding variables to be evenly distributed between the treatment groups.
This allows us to make cause-and-effect conclusions from experiments.

Section 5.1: Comparing Two Groups: Categorical Response.

We again looked at how categorical data can be summarized in two-way tables and how conditional proportions should be used to compare the groups. We also saw that when variables show no association, their conditional proportions are all the same.

Section 5.2: Comparing Two Proportions: Simulation-Based Methods.

In this section, we used cards and an applet to investigate a test of significance involving the difference in two proportions. You should understand this process of developing a null distribution. You should be able to use the Inference for 2X2 Tables applet and also understand what it is doing to develop a null distribution. You should also be able to interpret the results from such a simulation. You should also be able to determine a 95% confidence interval using the 2SD method with the standard deviation from a null distribution and understand what the confidence interval is describing.

Section 5.3: Comparing Two Proportions: Theory-Based Methods.

You should know how to conduct a theory-based test to compare two proportions using the theory-based applet and the validity conditions needed to obtain accurate results. You should know how strength of evidence is affected by the difference in the proportions and the sample size. You should know how to find and interpret a confidence interval for the difference in two proportions and how these are related to the p-value of the significance test.

Section 6.1: Exploring Quantitative Data.

You should know what the mean and median are and how each is affected by skewed data and outliers. You should also be able to compare parallel dotplots in terms of both center and variability.

Section 6.2: Comparing Two Averages: Simulation-Based Methods.
You should understand how a simulation-based test for comparing two means is conducted using cards and through the Simulation-Based Test for a Quantitative Response applet. Just as with proportions, you should know how to find a 95% confidence interval for the difference in means using the 2SD method.

Section 6.3: Comparing Two Averages: Theory-Based Methods.

You should know how to conduct a test of significance for the difference in two means using theory-based methods and the validity conditions needed to obtain accurate results. You should know how strength of evidence is related to standard deviation, difference in means, and sample size. Just as with proportions, you should know how confidence intervals for differences in means are related to the results from a test of significance.

Section 4.3: Paired Designs.

With a paired design, the response variable becomes the difference in measurements between the two treatments. This can often lead to less variability than the original measurements and thus can lead to stronger evidence against the null hypothesis.

Section 7.1: Simulation-Based Approach for Analyzing Paired Data.

You should know the difference between independent samples and dependent samples and how a simulation-based paired data test is run. In this type of test, randomly chosen pairs of data are switched. This switching makes each switched pair's difference the negative of what it originally was. We again used the 2SD method of finding confidence intervals for the mean difference in the population.

Section 7.2: Theory-Based Approach for Analyzing Data from Paired Samples.

You should know how a theory-based test is run and that it is basically a test for a single mean where the null hypothesis has the mean difference equal to 0. To run this test, we should have a sample size of at least 20 or fairly symmetric data in our sample.

Section 8.1: Simulation-Based Method to Compare Multi-Category Categorical Variables.
You should understand how categorical data are summarized in two-way tables and be able to compute appropriate conditional proportions from these tables. You should also be able to compute the MAD statistic and understand what it is measuring. Know that no association between the explanatory and response variables means that the population proportions are all the same. Also remember that the null distribution in this section starts at 0 and is skewed to the right.

Section 8.2: Theory-Based Method to Compare Multi-Category Categorical Variables.

In this section we saw that a chi-square statistic can also be used to determine how far apart proportions are from each other. Using this statistic, we can use theory-based methods to predict what the null distribution will look like and easily find a p-value. If the results are found significant, we can follow up by finding pairwise confidence intervals for differences in proportions to find out exactly which proportions differ significantly from others.

Section 9.1: Simulation-Based Approach for Comparing More Than Two Groups with a Quantitative Response.

You should know how multiple means can be compared using simulation-based methods and how this overall type of test can be used to control the probability of a type I error. Again we used the MAD statistic (though this time with averages). This again gave us a null distribution that starts at 0 and is skewed to the right.

Section 9.2: Theory-Based Approach for Comparing More Than Two Groups with a Quantitative Response.

We saw how the F-statistic can also be used to measure how far apart means are and how its null distribution is easily modeled with a theoretical distribution (the F-distribution). This test is called analysis of variance (ANOVA). You should know that an ANOVA test compares multiple means. You should know what two numbers make up the F-statistic.
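The two numbers that make up the F-statistic can be sketched directly. This is only an illustration (the function name is made up, and the applet does this for you): the numerator measures variability between the group means, and the denominator measures variability within the groups.

```python
# Illustrative sketch of the F-statistic in ANOVA: the ratio of
# between-group variability to within-group variability.
def f_statistic(groups):
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total sample size
    grand = sum(sum(g) for g in groups) / n
    # Numerator: mean square between groups -- how spread out the
    # group means are around the grand mean.
    ms_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2
                     for g in groups) / (k - 1)
    # Denominator: mean square within groups -- how spread out the
    # values are around their own group's mean.
    ms_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups) / (n - k)
    return ms_between / ms_within
```

When all the group means are equal, the numerator (and so F) is 0; the farther apart the means are relative to the within-group variability, the larger F gets and the stronger the evidence.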
You should know how strength of evidence is related to standard deviation, difference in means, and sample size. As in the last chapter, we can follow up a significant result with pairwise confidence intervals to find which means are significantly different from other means.

Sections 10.1-2: Inference for Correlation: Simulation-Based Approach.

You should know how to interpret information given in a scatterplot. You should know what correlation is measuring and have an idea of what the correlation value is for a given scatterplot. You should know how a simulation-based test of significance can be used for correlation and how it is similar to what has been done in past chapters.

Section 10.3: Least Squares Regression.

You should know what a least squares regression line is and the basic idea of why it is the line of best fit (it minimizes the sum of the squared errors). You should be able to describe what the slope of a regression line means in the context of the variables.

Section 10.4: Inference for Regression: Simulation-Based Approach.

You should know that the slope of the regression line can be used as the statistic to determine whether there is an association between two quantitative variables. You should know how a simulation-based test of significance is performed for the slope of a regression line and how it is similar to what we did when we used correlation as a statistic.

Section 10.5: Inference for Regression Slope: Theory-Based Approach.

The theory-based test for association between two quantitative variables uses slope as the statistic. Given output from the applet, you should be able to determine the p-value for the test. You should realize the p-value in the applet is two-sided. (To determine a one-sided p-value you just need to divide by two, assuming the direction is correct.) You should know the technical conditions needed in order for the theory-based test to give valid results.
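The Chapter 10 ideas can be sketched together: fit the least squares slope, then shuffle the response values to build the null distribution of the slope under "no association". This is a hedged illustration of what the applet does, not its actual code, and the function names are made up.

```python
import random

# Least squares slope: the line that minimizes the sum of squared errors.
def slope(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Simulation-based test for the slope: re-pairing each x with a shuffled
# y breaks any real association, so the shuffled slopes form a null
# distribution centered near 0.
def shuffle_test(xs, ys, reps=1000, seed=1):
    random.seed(seed)
    observed = slope(xs, ys)
    count = 0
    for _ in range(reps):
        shuffled = ys[:]
        random.shuffle(shuffled)
        if abs(slope(xs, shuffled)) >= abs(observed):
            count += 1
    return observed, count / reps   # two-sided simulated p-value
```

Using correlation instead of slope as the statistic follows the same shuffling logic, which is why the two simulation-based tests give similar p-values.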