
Name: _______________________________________________

This guide is provided as a brief overview of the course topics. Please be sure to spend additional time looking over your notes for any units that you need to review further. Unit review packets and tests should provide a nice overview as well. Do not hesitate to ask questions!!!

Graphics:

*Categorical*

Bar graph

*Quantitative*

Histogram

Pie Chart

Contingency Table

U:

S:

C:

S:

Marginal Distributions

Conditional Distributions

Parameter vs. Statistic

**Normal Models:**

Z-scores:

Stem and Leaf Plot

Dot plot

Box and Whisker Plot

Four things to look for in a scatterplot:

*Correlation coefficient:*

a numerical measurement (between -1 and 1) of the strength of the linear relationship between two quantitative variables.
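As a quick illustration of the definition above, here is a minimal Python sketch that computes the correlation coefficient as the average product of z-scores. The function name `correlation` and the data (`hours`, `scores`) are made up for this example.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # r is the sum of the products of z-scores, divided by n - 1.
    return sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
               for x, y in zip(xs, ys)) / (n - 1)

hours = [1, 2, 3, 4, 5]        # made-up data
scores = [52, 60, 61, 70, 79]  # made-up data
r = correlation(hours, scores)
print(round(r, 3))
```

A value of r near 1 or -1 indicates a strong linear relationship; a value near 0 indicates a weak one.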

Conditions to check:

CORRELATION ≠ _________________________________

*Lurking Variable:*

a hidden variable that stands behind a relationship and determines it by simultaneously affecting the other two variables.

*Linear Regression:*

Be able to interpret the slope and the intercept.

*Coefficient of Determination: R²*
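To practice interpreting the slope, intercept, and R², the following sketch fits a least-squares line by hand. The function name `least_squares` and the data are invented for illustration; on a test you would normally read these values from calculator or computer output.

```python
def least_squares(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    r_squared = sxy ** 2 / (sxx * syy)    # square of the correlation coefficient
    return slope, intercept, r_squared

hours = [1, 2, 3, 4, 5]        # made-up data: hours studied
scores = [52, 60, 61, 70, 79]  # made-up data: test scores
slope, intercept, r_squared = least_squares(hours, scores)
print(slope, intercept, round(r_squared, 3))
```

For this made-up data the slope is 6.4 and the intercept is 45.2: each additional hour studied is associated with a predicted score about 6.4 points higher, and the model predicts a score of about 45.2 for 0 hours. R² is the fraction of the variation in scores accounted for by the linear model.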

Regression Assumptions/Conditions:

Extrapolation: predicting *y*-values for “new” *x*-values.

Influential Points:

Means vary less than individual values.

If the scatterplot is not straight, try _______________________________________.

**Steps to running a simulation:**

1. Identify the __________________________ to be repeated.

2. Explain how you will model the component’s ____________________.

3. Explain how you will combine the components to model a ____________________.

4. State clearly what the _______________________________________ is.

5. Run several trials.

6. Collect and summarize the results of the trials (CUSS).

7. State your conclusion, _____ ___________________!
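The steps above can be sketched in Python. The scenario is hypothetical: a cereal company puts one of 3 equally likely prizes in each box, and we estimate how many boxes it takes to collect all 3. The function name and the number of trials are choices made for this example.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

def boxes_until_full_set(num_prizes=3):
    """One trial: open boxes until every prize has appeared at least once."""
    seen = set()
    boxes = 0
    while len(seen) < num_prizes:
        # One repeated piece: open a box; each prize is equally likely.
        seen.add(random.randrange(num_prizes))
        boxes += 1
    return boxes  # what we record for each trial

# Run many trials and summarize the results.
trials = [boxes_until_full_set() for _ in range(10_000)]
average = sum(trials) / len(trials)
print(f"Estimated mean number of boxes: {average:.2f}")
```

The conclusion would then be stated in context, for example: "Based on this simulation, we estimate that it takes about 5 or 6 boxes on average to collect all three prizes."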

**Sampling Methods**

*Simple Random Sample (SRS):*

Every possible sample of the size we plan to draw has an equal chance to be selected. Each individual and each combination of people has an equal chance of being selected.

*Stratified Sampling:*

Slice the population in __________________________________ (strata), then SRS within each stratum.

*Cluster Sampling:*

Split the population into similar parts or clusters. Select a few clusters and perform a

__________________ in each.

*Multistage Sampling:*

Combining several methods to sample.

*Systematic Sampling:*

Select individuals systematically. Start from a randomly selected individual. Use a pattern from there. (If order of the list can be associated with the responses sought, then this is not an appropriate sampling method.)
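A short Python sketch of three of these methods, using a hypothetical roster of 100 ID numbers. The strata names and sizes are invented for illustration only.

```python
import random

random.seed(2)  # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical roster: ID numbers 1..100

# Simple random sample: every set of 10 IDs is equally likely.
srs = random.sample(population, 10)

# Stratified sample: an SRS within each stratum (made-up strata).
strata = {"juniors": population[:60], "seniors": population[60:]}
stratified = random.sample(strata["juniors"], 6) + random.sample(strata["seniors"], 4)

# Systematic sample: random starting point, then every k-th individual.
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

print(sorted(srs))
print(sorted(stratified))
print(systematic)
```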

*Convenience Sampling:*

Members of the population are chosen based on the convenience of including them.

*Voluntary Response:*

Members of the population are invited to respond and all who do are included in the sample.

**Forms of Bias**

Voluntary Response bias

Undercoverage

Nonresponse bias

Response bias

**Observational Study:**

Researchers don’t assign treatments, but simply observe what is going on.

Valuable for discovering trends.

*Not*

possible to demonstrate a __________________________ _________________________.

Retrospective Study:

Prospective Study:

**Experiments:**

A study design that allows us to prove a cause-and-effect relationship. An experiment:

Manipulates factor levels to create _______________________________.

________________________ ____________________ subjects to these treatment levels.

Compares the responses of the subject groups across treatment levels.

Four Principles of Experimental Design:

Blinding:

Placebo:

Confounding Variables: when the levels of one factor are associated with the levels of another factor.

The best experiments are:

The *Law of Large Numbers* says that as the number of independent trials increases, the long-run relative frequency of repeated events gets closer and closer to a single value.
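A simple way to see the Law of Large Numbers is to simulate fair coin flips and watch the proportion of heads settle down. This is only a sketch; the seed and the checkpoints are arbitrary choices.

```python
import random

random.seed(3)  # fixed seed so the run is repeatable

heads = 0
checkpoints = {}
for flips in range(1, 100_001):
    if random.random() < 0.5:   # one fair coin flip
        heads += 1
    if flips in (10, 100, 1_000, 10_000, 100_000):
        checkpoints[flips] = heads / flips  # relative frequency so far

for flips, proportion in checkpoints.items():
    print(f"{flips:>7} flips: proportion of heads = {proportion:.4f}")
```

The proportions for small numbers of flips can be far from 0.5, but the long-run proportion gets close to it.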

Probability Rules:

Be able to calculate probabilities from tree diagrams, Venn diagrams, and tables.

**Random Variables**

A random variable assumes a value based on the outcome of a random event. Discrete random variables can take one of a countable number of distinct outcomes. A continuous random variable can take any numeric value within a range of values.

*Expected Value:*

*Standard deviation:*

Adding or subtracting a constant from data shifts the mean but doesn’t change the variance or standard deviation: $E(X \pm c) = E(X) \pm c$ and $\mathrm{Var}(X \pm c) = \mathrm{Var}(X)$.

In general, multiplying each value of a random variable by a constant multiplies the mean by that constant and the variance by the *square* of the constant: $E(aX) = aE(X)$ and $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$.

In general, the mean of the sum of two random variables is the sum of the means, and the mean of the difference of two random variables is the difference of the means.

If the random variables are *independent*, the variance of their sum *or* difference is *always* the **sum** of the variances.
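These rules can be checked numerically. The sketch below uses two small made-up probability models, `X` and `Y`, assumed independent, and computes means and variances exactly from the probabilities. All names and numbers are invented for this example.

```python
from itertools import product

def mean(dist):
    """Expected value of a discrete model given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def combine(dist_x, dist_y, op):
    """Model for op(X, Y) when X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        value = op(x, y)
        out[value] = out.get(value, 0) + px * py  # independence: multiply probabilities
    return out

X = {0: 0.5, 1: 0.3, 2: 0.2}   # made-up model
Y = {1: 0.4, 3: 0.6}           # made-up model

shifted = {x + 5: p for x, p in X.items()}      # X + 5
scaled = {3 * x: p for x, p in X.items()}       # 3X
total = combine(X, Y, lambda x, y: x + y)       # X + Y
difference = combine(X, Y, lambda x, y: x - y)  # X - Y

print(variance(X), variance(shifted))           # shifting leaves the variance alone
print(variance(scaled), 9 * variance(X))        # scaling by 3 multiplies variance by 9
print(variance(total), variance(difference), variance(X) + variance(Y))
```

Note that the variance of the difference matches the *sum* of the variances, not the difference.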

**Probability Models**

*Bernoulli Trials:*

3 conditions… o o o

*Geometric Probability Model:*

Geom(*p*)

This model tells us the probability for a random variable that counts the number of Bernoulli trials until the first success.

Model:

Expected Value:
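For checking hand calculations, here is a small sketch of the standard Geometric model, where the probability that the first success occurs on trial k is q^(k-1)·p with q = 1 - p, and the expected value is 1/p. The value p = 0.2 is just an example.

```python
def geom_pmf(k, p):
    """P(X = k): first success on trial k, i.e. k - 1 failures then a success."""
    return (1 - p) ** (k - 1) * p

p = 0.2  # example probability of success

print(geom_pmf(3, p))   # two failures, then a success: 0.8 * 0.8 * 0.2

# The expected value 1/p can be checked by summing k * P(X = k) over many k.
approx_mean = sum(k * geom_pmf(k, p) for k in range(1, 2000))
print(approx_mean, 1 / p)
```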

*The 10% Condition:*

Bernoulli trials must be independent. If that assumption is violated, it is still okay to proceed as long as the sample is smaller than 10% of the population.

*Binomial Probability Model:*

Binom(*n*, *p*)

This model tells us the probability for a random variable that counts the number of successes in a fixed number of Bernoulli trials.

Model:

Mean: Standard Deviation:
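Similarly, a sketch of the standard Binomial model: P(X = k) = C(n, k)·p^k·q^(n-k), with mean np and standard deviation √(npq). The values n = 10 and p = 0.3 are only an example.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k): exactly k successes in n Bernoulli trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3  # example values

print(round(binom_pmf(3, n, p), 4))    # P(exactly 3 successes)

mean = n * p                           # np
sd = math.sqrt(n * p * (1 - p))        # sqrt(npq)
print(mean, round(sd, 4))

# Check: the probabilities add to 1 and the weighted average matches np.
total_probability = sum(binom_pmf(k, n, p) for k in range(n + 1))
mean_from_pmf = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
print(total_probability, mean_from_pmf)
```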

When dealing with a large number of trials in a Binomial situation, we can apply the Normal model as an approximation of the Binomial model if the Success/Failure Condition is satisfied.

*The Success/Failure Condition:*

A Binomial model is approximately Normal if we expect at least 10 successes and 10 failures.
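The sketch below checks the Success/Failure Condition and compares an exact Binomial probability with its Normal approximation. The helper name `success_failure_ok` and the values n = 100, p = 0.4 are assumptions for this example, and no continuity correction is applied.

```python
import math
from statistics import NormalDist

def success_failure_ok(n, p):
    """True if we expect at least 10 successes and at least 10 failures."""
    return n * p >= 10 and n * (1 - p) >= 10

n, p = 100, 0.4  # example values
print(success_failure_ok(n, p))

# Exact Binomial probability P(X <= 45).
exact = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(46))

# Normal approximation with mean np and standard deviation sqrt(npq).
model = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
approx = model.cdf(45)

print(round(exact, 3), round(approx, 3))
```

The two answers are close but not identical; the approximation generally improves as the expected numbers of successes and failures grow.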

**Sampling Distributions**

*Sampling Distribution Model for Proportions:*

Assumptions and Conditions

1. The Independence Assumption: The sampled values must be independent of each other.

Randomization Condition:

2. The Sample Size Assumption: The sample size must be large enough.

10% Condition:

Success/Failure Condition:

*Sampling Distribution Model for Means:*

Same assumptions and conditions hold (randomization and 10% condition, but no success/failure condition). The sample needs to be large enough—think about the context of the situation and decide whether you believe the condition has been met.

**Central Limit Theorem:**

The sampling distribution model of the sample mean from a random sample is approximately Normal for large “n”, regardless of the distribution of the population, as long as the observations are independent. The larger the sample, the better the approximation will be.
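A simulation can make the Central Limit Theorem concrete. This sketch draws many samples of size 40 from a strongly right-skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen only as an example) and looks at the sample means.

```python
import random
import statistics

random.seed(4)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of one random sample of size n from a skewed population."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

n = 40
means = [sample_mean(n) for _ in range(5_000)]

center = statistics.fmean(means)   # should be near the population mean, 1
spread = statistics.stdev(means)   # should be near 1 / sqrt(n)
print(round(center, 3), round(spread, 3), round(1 / n ** 0.5, 3))
```

Even though individual values are skewed, a histogram of `means` would look roughly symmetric and unimodal, and its spread is much smaller than the population standard deviation: means vary less than individual values.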

Use your two summary sheets for all tests and confidence intervals, assumptions and conditions, df, etc.

General format:

1. Hypotheses

2. Conditions

3. Name the Test!!

4. Mechanics (Show test statistic, p-value, and picture of the distribution)

5. Conclusion: in context!
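As one worked instance of this format, here is a sketch of the mechanics for a one-proportion z-test. The scenario and numbers are made up: we test H0: p = 0.5 against HA: p > 0.5 after observing 58 successes in a random sample of 100.

```python
import math
from statistics import NormalDist

# Hypotheses (made-up example): H0: p = 0.5, HA: p > 0.5
p0 = 0.5
n = 100
successes = 58

# Conditions: Success/Failure check uses the null value p0.
# (Randomization and the 10% Condition must be judged from the context.)
conditions_ok = n * p0 >= 10 and n * (1 - p0) >= 10

# Mechanics for a one-proportion z-test.
p_hat = successes / n
standard_deviation = math.sqrt(p0 * (1 - p0) / n)  # based on the null value p0
z = (p_hat - p0) / standard_deviation
p_value = 1 - NormalDist().cdf(z)                  # upper-tail test

alpha = 0.05
reject_null = p_value < alpha
print(round(z, 2), round(p_value, 4), reject_null)
```

Here z = 1.6 and the p-value is about 0.055, so at α = 0.05 we fail to reject the null hypothesis. The conclusion would still need to be written in the context of the problem.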

P-value: The probability of getting results at least as unusual as the observed statistic, given that:

Significance level (α): Threshold for *p*-value.

Type 1 Error:

Type 2 Error:

Power of a test: The probability that the test correctly rejects a false null hypothesis.
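To get a feel for power, the sketch below computes it for an upper-tail one-proportion z-test using Normal models. The function name `power_upper_tail` and every number used (null value 0.5, true proportion 0.6, sample sizes 100 and 400) are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def power_upper_tail(n, p0, p_true, alpha=0.05):
    """Power of the test H0: p = p0 vs HA: p > p0 when the truth is p_true."""
    z = NormalDist()
    sd_null = math.sqrt(p0 * (1 - p0) / n)
    # Smallest sample proportion that leads to rejecting H0.
    cutoff = p0 + z.inv_cdf(1 - alpha) * sd_null
    sd_true = math.sqrt(p_true * (1 - p_true) / n)
    # Probability the sample proportion lands beyond the cutoff under the truth.
    return 1 - z.cdf((cutoff - p_true) / sd_true)

print(round(power_upper_tail(100, 0.5, 0.6), 3))
print(round(power_upper_tail(400, 0.5, 0.6), 3))
print(round(power_upper_tail(100, 0.5, 0.6, alpha=0.10), 3))
```

Comparing the three results shows how the power changes when the sample size or the significance level changes while everything else stays the same.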

Factors affecting power:

Know the difference between and when to apply *z*-models, *t*-models, and χ²-models.