Final Exam Review Topics

Math 210
Review for Final
The final exam is cumulative. You may see output from applets, and you may need to use them. Look
over your old tests and quizzes. Also look through the explorations we completed this semester. Many
of them will contain concepts that will be tested on the final.
Our final exam will take place in our classroom, 3031 of the Schaap Science Center, on Wednesday, April 30 from 3:00 to 5:00 p.m.

Here is a summary of the data types needed to run each test.
o Single proportion: One categorical variable with two categories.
o Single mean: One quantitative variable.
o Matched pairs test: Dependent samples, subtract to obtain one quantitative variable
for the difference.
o Comparing means: Response is quantitative, explanatory is categorical with two
independent categories.
o Comparing proportions: Both response and explanatory are categorical with two
categories.
o Chi-square test for independence (association): Response and explanatory are
categorical. Either one can have more than two categories.
o ANOVA: Response is quantitative and explanatory is categorical with more than two
categories.
o Correlation and Regression: Response and explanatory are quantitative.
Preliminaries. We introduced the six-step process of statistical investigation in this section. You should understand this overall process. You should know the difference between anecdotal evidence and data. We also looked at how variability is described using standard deviation, as well as how the shape of a distribution is described.
You should know what probability is and how it is estimated. You should know how to describe the
observational units and variables (categorical and quantitative) that are involved in a study.
Section 1.1: Introduction to Chance Models. We put the statistical investigative process to use on some
specific examples in this section. We introduced the coin flipping applet to help us arrive at some sort of
a conclusion. You should be able to use this applet to help you arrive at a conclusion to a significance
test. You should also understand what each point in the dotplots produced represents and what the
entire distribution represents. We also introduced the three S strategy for measuring strength of
evidence to help formalize this process.
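As a rough sketch (with made-up numbers, assuming 15 successes in 20 trials under a 50-50 chance model), the Python snippet below mimics what the coin-flipping applet does: each pass through the loop produces one simulated statistic, i.e., one dot in the applet's dotplot, and the full collection is the null distribution.

import random

n, observed = 20, 15            # hypothetical sample size and observed number of successes
reps = 10000                    # number of simulated "could-have-been" statistics
null_counts = []
for _ in range(reps):
    # Simulate: number of heads in n flips of a fair coin (the 50-50 chance model)
    heads = sum(random.random() < 0.5 for _ in range(n))
    null_counts.append(heads)

# Strength of evidence: proportion of simulated statistics at least as extreme as observed
p_value = sum(c >= observed for c in null_counts) / reps
print(p_value)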
Section 1.2: Measuring Strength of Evidence. In this section we allowed for things other than a 50-50
chance model. We also formalized the inferential process further in this section and added the terms
null and alternative hypotheses, p-value, and null distribution. We also introduced several symbols here, such as 𝑝̂, 𝜋, H0, and Ha. You should know all of these, as well as the guidelines for strength of evidence
and how to write appropriate hypotheses and conclusions.
Section 1.3: Alternative Measure of Strength of Evidence. Here we introduced the standardized
statistic as an alternative to the p-value as a measure of the strength of evidence. You should know how
this is used, how to calculate the standardized statistic and what it means.
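For example (made-up numbers), the standardized statistic simply measures how many standard deviations the observed sample proportion lies from the hypothesized value, using the standard deviation of the null distribution.

p_hat = 0.75       # observed sample proportion (hypothetical)
pi_0 = 0.50        # parameter value under the null hypothesis
sd_null = 0.11     # standard deviation of the null distribution, e.g., read from an applet (hypothetical)

z = (p_hat - pi_0) / sd_null
print(z)           # about 2.3 here; values beyond roughly 2 are usually taken as strong evidence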
Section 1.4: Impacting Strength of Evidence. We introduced two-sided tests in this section and saw
how they impact the strength of evidence in a test. You should be able to use an applet appropriately to
do this. You should also be able to express an alternative hypothesis appropriately when it is two-sided.
Besides understanding how two-sided tests impact the strength of evidence, you should also know how
the distance between the value under the null hypothesis and the sample proportion as well as the
sample size impacts the strength of evidence of a test.
Section 1.5: Normal Approximation. You should know how a normal distribution can model the null
distributions we have seen in our applets and more broadly how a theory-based test differs from a
simulation-based test. You should know how to use the test of significance calculator we used in this
section. You should also know when the theory-based method is appropriate for a test of a single
proportion.
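As a rough sketch of the theory-based calculation (made-up numbers), the normal approximation predicts the standard deviation of the null distribution from the null value and the sample size; recall that the usual guideline for using it with a single proportion is at least 10 successes and at least 10 failures in the sample.

from math import sqrt
from statistics import NormalDist

n, p_hat, pi_0 = 124, 0.65, 0.5            # hypothetical sample size, sample proportion, null value
sd = sqrt(pi_0 * (1 - pi_0) / n)           # predicted SD of the null distribution
z = (p_hat - pi_0) / sd                    # standardized statistic
p_value = 1 - NormalDist().cdf(z)          # one-sided (greater than) p-value from the normal curve
print(z, p_value)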
Section 2.1: Sampling from a finite population. You should know the difference between a population
and a sample as well as between a parameter and a statistic. You should know that simple random
samples are unbiased and that convenience samples can often be biased. A biased sample is one that
systematically over-represents some parts of a population and under-represents other parts. You
should also know the effects of sample size on the p-value in a study.
Section 2.2: Inference for a Single Quantitative Variable. You should know that median is a resistant
measure of center while mean is not. You should know the relationship between the mean and median
of a distribution that is skewed. You should know the validity conditions for a theory-based test for a
single mean. You should be able to carry out and understand a theory-based test for a mean.
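A minimal sketch of the standardized statistic for a single mean (made-up numbers); the theory-based test compares this t-statistic to a t-distribution with n - 1 degrees of freedom.

from math import sqrt

x_bar, mu_0 = 98.26, 98.6      # hypothetical sample mean and hypothesized population mean
s, n = 0.76, 130               # hypothetical sample standard deviation and sample size

t = (x_bar - mu_0) / (s / sqrt(n))
print(t)                       # about -5.1 here, very strong evidence against the null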
Section 2.3: Errors and Significance. You should know what type I and type II errors are and be able to
describe them in the context of a study. You should also know that the significance level is the
probability of making a type I error. You should know that the probability of a type II error can be very
high particularly with small sample sizes.
Section 3.1: Statistical Inference — Confidence intervals. You should be able to use repeated tests of
significance to find a range of plausible values for the population proportion. You should know how the
width of a confidence interval changes depending on the confidence level.
Section 3.2: 2SD and Theory-Based Confidence Intervals. Using the standard deviation of a null distribution is another way to approximate a 95% confidence interval, via the formula 𝑝̂ ± 2(𝑆𝐷).
Just as in the previous chapter, we saw that the standard deviation of a null distribution can be
predicted. Because of this, confidence intervals can be easily found using the Theory-Based Inference
applet (that uses a normal distribution). You should be able to find these intervals and also understand
what the intervals represent and how the confidence level affects the width. You should know what the
margin of error is and how to calculate it given the endpoints of a confidence interval. Given a p-value
for a test, you should know if the value under the null is contained in a certain confidence interval. Also,
given a confidence interval, you should be able to determine whether or not the results of a test will be
significant.
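For example (made-up numbers), the 2SD interval and the margin of error recovered from the endpoints of a reported interval look like this:

p_hat = 0.62          # observed sample proportion (hypothetical)
sd_null = 0.05        # standard deviation of the null distribution (hypothetical)

margin_of_error = 2 * sd_null
interval = (p_hat - margin_of_error, p_hat + margin_of_error)
print(interval)       # roughly (0.52, 0.72)

# Margin of error recovered from an interval's endpoints: half the width
lower, upper = 0.52, 0.72
print((upper - lower) / 2)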
Section 3.3: Confidence Intervals for a Single Mean. You should be able to determine a
theory-based confidence interval for a population mean using the Theory-Based Inference Applet and
understand what the confidence interval is estimating.
Section 3.4: Factors that affect the width of a confidence interval. You should know how level of
confidence, sample size, sample proportion, and sample mean influence the width of a confidence
interval. You should also know that if you repeatedly resample from a population and compute 95%
confidence intervals each time, 95% of the resulting intervals will contain the population parameter in
the long run.
Section 4.1: Association between Variables. You should know what explanatory and response variables
are and how to set up and read a two-way table. You should know how to calculate appropriate
conditional proportions from a two-way table. You should know that while observational studies can
show an association, they cannot be used to show causation because of potential confounding variables.
Section 4.2: Observational Studies versus Experiments. You should know that unlike observational
studies, experiments control for confounding variables through random assignment. Random
assignment allows potential confounding variables to be evenly distributed between the treatment
groups. This allows us to make cause and effect conclusions from experiments.
Section 5.1: Comparing two groups: Categorical response. We again looked at how categorical data
can be summarized in two-way tables and how conditional proportions should be used to compare the
groups. We also saw that when variables show no association, their conditional proportions are all the
same.
Section 5.2: Comparing two proportions: Simulation-based methods. In this section, we used cards
and an applet to investigate a test of significance involving the difference in two proportions. You
should be able to understand this process of developing a null distribution. You should be able to use
the Inference for 2X2 Tables applet and also understand what it is doing to develop a null distribution.
You should also be able to interpret the results from such a simulation. You should also be able to
determine a 95% confidence interval using the 2SD method with the standard deviation from a null
distribution and understand what the confidence interval is describing.
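A rough Python sketch of the card-shuffling idea behind that applet, with a made-up 2x2 table (18 of 30 successes in group A, 10 of 30 in group B): shuffling the pooled outcomes and re-dealing them to the two groups mimics the null hypothesis of no association.

import random

outcomes = [1] * 18 + [0] * 12 + [1] * 10 + [0] * 20   # 1 = success, 0 = failure (hypothetical data)
n_A = 30
observed_diff = 18/30 - 10/30

reps, null_diffs = 5000, []
for _ in range(reps):
    random.shuffle(outcomes)                            # re-deal the "cards" to the two groups
    group_A, group_B = outcomes[:n_A], outcomes[n_A:]
    null_diffs.append(sum(group_A)/n_A - sum(group_B)/len(group_B))

p_value = sum(d >= observed_diff for d in null_diffs) / reps
print(p_value)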
Section 5.3: Comparing two proportions: Theory-based methods. You should know how to conduct a
theory-based test to compare two proportions using the theory-based applet and the validity conditions
needed to obtain accurate results. You should know how strength of evidence is affected by the
difference in the proportions and the sample size. You should know how to find and interpret a
confidence interval for the difference in two proportions and how these are related to the p-value of the
significance test.
Section 6.1: Exploring quantitative data. You should know what mean and median are and how each is
affected by skewed data and outliers. You should also be able to compare parallel dotplots both in
terms of center and variability.
Section 6.2: Comparing two averages: Simulation-based methods. You should understand how a
simulation-based test for comparing two means is conducted using cards and through the Simulation-Based Test for a Quantitative Response applet. Just as with proportions, you should know how to find a
95% confidence interval for the difference in means using the 2SD method.
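The same shuffling idea, sketched with made-up quantitative data, along with a 2SD interval for the difference in means built from the simulated statistics:

import random
from statistics import mean, stdev

group_1 = [12.1, 9.8, 11.5, 13.0, 10.2, 12.7]   # hypothetical responses for group 1
group_2 = [9.0, 8.4, 10.1, 9.6, 8.9, 10.5]      # hypothetical responses for group 2
observed_diff = mean(group_1) - mean(group_2)

pooled, n1 = group_1 + group_2, len(group_1)
null_diffs = []
for _ in range(5000):
    random.shuffle(pooled)                       # re-randomize which values go in which group
    null_diffs.append(mean(pooled[:n1]) - mean(pooled[n1:]))

p_value = sum(abs(d) >= abs(observed_diff) for d in null_diffs) / 5000
interval = (observed_diff - 2 * stdev(null_diffs), observed_diff + 2 * stdev(null_diffs))
print(p_value, interval)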
Section 6.3: Comparing two averages: Theory-based methods. You should know how to conduct a test
of significance for the difference in two means using theory-based methods and the validity conditions
needed to obtain accurate results. You should know how strength of evidence is related to standard
deviation, difference in means, and sample size. Just as in proportions, you should know how
confidence intervals for differences in means are related to the results from a test of significance.
Section 4.3: Paired designs. With a paired design, the response variable becomes the difference in
measurements between the two treatments. This can often lead to less variability than the original
measurements and thus can lead to stronger evidence against the null hypothesis.
Section 7.1: Simulation-Based Approach for Analyzing Paired Data. You should know the difference
between independent samples and dependent samples and how a simulation-based paired data test is
run. In this type of test, randomly chosen pairs of data are switched, which makes each switched pair's difference the negative of what it originally was. We again used the 2SD method of finding confidence intervals for the mean difference in the population in this section.
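A rough sketch of that sign-switching simulation with made-up paired differences:

import random
from statistics import mean

diffs = [2.1, -0.4, 1.8, 0.9, 2.5, -0.2, 1.1, 0.7]   # hypothetical differences, one per pair
observed_mean = mean(diffs)

reps, null_means = 5000, []
for _ in range(reps):
    # Randomly switch the order within each pair, which flips the sign of its difference
    flipped = [d * random.choice([1, -1]) for d in diffs]
    null_means.append(mean(flipped))

p_value = sum(abs(m) >= abs(observed_mean) for m in null_means) / reps
print(p_value)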
Section 7.2: Theory-Based Approach for Analyzing Data from Paired Samples. You should know how a
theory-based test is run and that it is basically a test for a single mean where the null hypothesis has the
mean difference equal to 0. To run this test, we should have a sample size of at least 20 or fairly
symmetric data in our sample.
Section 8.1: Simulation-Based Method to Compare Multi-Category Categorical Variables. You should
understand how categorical data is summarized in two-way tables and be able to compute appropriate
conditional proportions from these tables. You should also be able to compute the MAD statistic and
understand what it is measuring. Know that no association between the explanatory and response
variables means that the population proportions are all the same. Also remember that the null
distribution in this section started at 0 and was skewed to the right.
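For instance, with made-up conditional proportions for three groups, the MAD statistic is just the average of the pairwise absolute differences between the proportions:

from itertools import combinations
from statistics import mean

props = [0.45, 0.60, 0.30]      # hypothetical conditional proportions of success for three groups
mad = mean(abs(a - b) for a, b in combinations(props, 2))
print(mad)                      # larger values indicate the groups differ more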
Section 8.2: Theory-Based Method to compare multi-category categorical variables. In this section
we saw that a chi-square statistic could also be used to determine how far apart proportions are from
each other. Using this statistic, we can use theory-based methods to predict what the null distribution
would look like and easily find a p-value. If the results are found significant, we can follow this up by
finding pairwise confidence intervals for differences in proportions to find out exactly which proportions
differ significantly from others.
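As a sketch with a made-up two-way table, the chi-square statistic adds up (observed - expected)^2 / expected over every cell, where each expected count is (row total)(column total)/(grand total):

observed = [[30, 20, 10],       # hypothetical counts; rows and columns are the
            [20, 25, 15]]       # categories of the two variables

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (count - expected) ** 2 / expected
print(chi_sq)   # compared to a chi-square distribution to get the theory-based p-value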
Section 9.1: Simulation-based approach for comparing more than two groups with a quantitative
response. You should know how multiple means can be compared using simulation-based methods and
how this overall type of test can be used to control the probability of a type I error. Again we used the
MAD statistic (though this time with averages). This again gave us a null distribution that starts at 0 and
is skewed to the right.
Section 9.2: Theory-based approach for comparing more than two groups with a quantitative response. We saw how the F-statistic can also be used to measure how far apart the means are and how its null distribution is easily modeled with a theoretical distribution (the F-distribution). This test is called analysis of
variance (ANOVA). You should know that an ANOVA test compares multiple means. You should know
what two numbers make up the F-statistic. You should know how strength of evidence is related to
standard deviation, difference in means and sample size. Like the last chapter, we again can follow up a
significant result with pairwise confidence intervals to find which means are significantly different from other means.
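The two numbers that make up the F-statistic are the between-group variability and the within-group variability; a sketch with made-up data for three groups:

from statistics import mean

groups = [[4.2, 5.1, 3.8, 4.9], [6.0, 5.5, 6.3, 5.8], [4.8, 5.0, 4.4, 5.2]]   # hypothetical data
grand_mean = mean(x for g in groups for x in g)
k, N = len(groups), sum(len(g) for g in groups)

ms_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
ms_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups) / (N - k)

F = ms_between / ms_within
print(F)        # compared to an F-distribution with (k - 1, N - k) degrees of freedom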
Sections 10.1-2: Inference for Correlation: Simulation-Based Approach. You should know how to interpret information given in a scatterplot. You should know what correlation is measuring and have an idea of what the correlation value would be given a scatterplot. You should know how a simulation-based test of
significance can be used for correlation and how it is similar to what has been done in past chapters.
Section 10.3: Least Squares Regression. You should know what a least squares regression line is and the basic idea of why it is the line of best fit (it minimizes the sum of the squared errors). You should be able
to describe what the slope of a regression line means in the context of the variables.
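A sketch of the least squares calculations with made-up data; the slope is interpreted as the predicted change in the response for a one-unit increase in the explanatory variable.

from statistics import mean

x = [1, 2, 3, 4, 5, 6]                  # hypothetical explanatory values
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]      # hypothetical response values

x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar        # this pair minimizes the sum of squared errors
print(slope, intercept)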
Sections 10.4: Inference for Regression: Simulation-Based Approach. You should know that the slope
of the regression line can be used as the statistic to determine if there is an association between two
quantitative variables. You should know how a simulation-based test of significance is performed for
the slope of a regression line and how it is similar to what we did when we used correlation as a statistic.
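A rough sketch of that shuffle test (same made-up data as above): shuffling the response values breaks any association with the explanatory variable, and the shuffled slopes form the null distribution.

import random
from statistics import mean

x = [1, 2, 3, 4, 5, 6]                  # hypothetical explanatory values
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]      # hypothetical response values

def ls_slope(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    return sxy / sum((a - x_bar) ** 2 for a in xs)

observed_slope = ls_slope(x, y)
shuffled, null_slopes = y[:], []
for _ in range(5000):
    random.shuffle(shuffled)             # break the pairing between x and y
    null_slopes.append(ls_slope(x, shuffled))

p_value = sum(abs(s) >= abs(observed_slope) for s in null_slopes) / 5000   # two-sided p-value
print(p_value)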
Section 10.5: Inference for Regression Slope: Theory-Based Approach. The theory-based test for
association between two quantitative variables used slope as the statistic. Given output from the
applet, you should be able to determine the p-value for the test. You should realize the p-value in the
applet is two-sided. (To determine a one-sided p-value, you just need to divide by two, assuming the direction of the observed statistic matches the alternative hypothesis.) You should know the technical conditions needed in order for the theory-based test to give valid results.