Exam Hints from Jean Swaynos - Bowling Green Independent Schools

Matching Histograms, Box Plots, and Normality Plots

WHAT IF YOU DO NOT KNOW THE STANDARD DEVIATION OF THE
POPULATION?

What are t-distributions?
When you do not know the standard deviation of the population, you use the standard deviation of
the sample in its place. Most of the time you do not know the standard deviation of the population, so
on the MEAN side you will almost always be using a t-distribution, and on the proportion side you
will be using a z score. The t distribution only goes with mean problems because there you are
estimating a standard deviation from the data. On the proportion side you are CREATING the
standard deviation from the sample proportion and the sample size.
When dealing with a t-distribution there are two values that have variation, the mean and
the standard deviation. Every sample will most likely have a different mean and different
standard deviation based on the sample.
The CLT tells us that the sampling distribution of sample means approaches a Normal
model as n increases. That's based on the assumption that we know sigma. In the real
world, that's exceedingly rare. Not knowing sigma forces us to use the sample's s, and
that means we're no longer playing by CLT rules. We have to take this extra uncertainty
into account, and Gosset's t does exactly that. The critical issue in deciding between z and
t is the fact that we don't (and indeed can't) know sigma. In the real world, inference for
means ALWAYS requires t.
The rule of thumb suggesting that one should use z if n > 30 is a pre-technology
approximation that's no longer necessary. No matter how large n is, we use t. Before we
had stats software or calculators, finding values of t for any number of df would have
required many pages of tables. The easy out was to print only one page of the t-table
(usually up to about 30 df) and then tell people to switch over to z. That's not because the
sampling distribution miraculously changed, it's because the t distribution approaches a
Normal as n increases, and 30 was a convenient time to say "okay, it's close enough
now".
The book suggesting this approach is out of date. We no longer have to resort to this
approximation, because the calculator can work with any number of df and that lets us
always (and properly) use t for inference for means. Suggesting that we should
sometimes use t and sometimes z confuses kids. Don't go there. We won't know sigma. t
is for means, z is for proportions. Keep it simple.
Jeane Swaynos
AP Workshop July 2008
There is a family of graphs for the t-distributions and each one depends on the size of the
sample.
T distributions
[Figure: the NORMAL distribution compared with the t-distribution with 2 degrees of freedom. Notice that the tails are thicker on the t-distribution.]
[Figure: a t-distribution with 10 degrees of freedom.]
As the degrees of freedom increase, the t-distribution gets closer to a normal distribution. The
degrees of freedom are found by taking your sample size - 1. For example, if your sample size is
15, your degrees of freedom would be 14, so you would look at the t-distribution for df = 14.
Look at the chart in your book and find the t-distribution table. Do you notice that
there are values along the left side? These represent the degrees of freedom.
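The claim that t gets closer to the Normal as the degrees of freedom grow can be checked numerically. The sketch below is only an illustration: the t* values are two-sided 95% critical values copied from a standard t-table (the dictionary name `T_STAR_95` is made up for this example), and z* is computed with Python's standard library.

```python
from statistics import NormalDist

# Two-sided 95% critical values t*, copied from a standard t-table.
# (Hypothetical name; the keys are degrees of freedom.)
T_STAR_95 = {2: 4.303, 5: 2.571, 10: 2.228, 14: 2.145,
             30: 2.042, 100: 1.984, 1000: 1.962}

# z* for 95% confidence: the point with area .975 to its left.
z_star = NormalDist().inv_cdf(0.975)

for df, t_star in T_STAR_95.items():
    # The gap between t* and z* shrinks as df increases.
    print(f"df = {df:>4}: t* = {t_star:.3f}, t* - z* = {t_star - z_star:.3f}")
print(f"Normal:    z* = {z_star:.3f}")
```

Every t* in the table is larger than z*, which is the "thicker tails" idea in numbers: with few degrees of freedom the interval has to be wider.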
These are additional assumptions necessary when using the “t” distribution. If you
are given the data points, you must graph the points and discuss what you see on
the graph. If you are not given the data points, then you will have to
assume that these conditions hold. Here are the conditions:
1. n < 15: distribution approximately Normal
With a “t” distribution you have additional assumptions. If your sample size is
less than 15, then you have to assume that the population of interest is
NORMAL. Think about this: it would be difficult to show normality with only a
small sample. If the data is provided, the reader still expects you to display
the data and comment on the histogram.
2. 15 <= n < 30
If your sample size is at least 15 but less than 30, your data does not necessarily have to be
normal, but you cannot have skewness or outliers. Again, if the data is given,
show a histogram and describe what you see.
3. n >= 30
If your sample size is 30 or more, then the data does not have to be approximately
normal and it can also have some skewness in either direction. You will still
have to be cautious of outliers, so you will want to state this as part of your
assumptions.
You will choose one of the conditions above based on your sample size. You will
still need to address the other two assumptions:
• You assume this is an SRS from your population of interest.
• You assume that your sample is independent, which means that it is less than
10% of the entire population.
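The choice of condition by sample size can be written out as a small lookup. This is a minimal sketch of the rule of thumb in these notes, not an official AP rubric: the function names `t_condition` and `ten_percent_ok` are made up for illustration, and treating n = 30 as a "large" sample is an assumption.

```python
def t_condition(n):
    """Return the shape condition from the notes that applies to sample size n."""
    if n < 15:
        return "assume the population is Normal"
    if n < 30:
        return "no skewness or outliers in the data"
    # Assumption: n = 30 is grouped with the large samples.
    return "some skewness is fine, but watch for outliers"


def ten_percent_ok(n, population_size):
    """Independence check: the sample is less than 10% of the population."""
    return n < 0.10 * population_size


print(t_condition(11))            # the gas price example has n = 11
print(ten_percent_ok(11, 500))    # hypothetical population of 500 stations
```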
The assumptions on the AP test are graded as right or wrong. If one little piece is missing, for
example if you do not define the population in the SRS or you do not state the
correct information involving your sample size, then it is wrong; there is no partial
credit.
One sample T Confidence Interval
Mrs. Swaynos took a random sample of gas prices around the area and found the
price per gallon of regular gasoline was as follows:
3.12  3.21  3.34  3.67  3.78  3.10  3.12  3.32  3.45  3.56  3.78
Make a 95% confidence interval for the data and interpret the results.
Show all parts
Define the statistic because that is what you are using in the formula
$\bar{x}$ = the average price of regular gasoline per gallon FROM MY SAMPLE
Assumptions
1. I assume this is a simple random sample of locations.
2. I assume that the 11 prices are independent of each other.
3. Since n < 15 and I am using a t distribution, I will assume the population of interest is
normal. Below is a normality plot to show the 11 data points, and the histogram
shows a slightly skewed distribution. It is difficult to show data is normal with such
a small sample size. I will, therefore, assume that the population of all gas prices is
normally distributed. Be sure to describe your graph.
You could make either a normality plot or a histogram to check whether the data look normal.
Mechanics (Name or Formula with substitution, df, and the specific interval)
Note at the start of the year I do not give the students a choice, they must give the formula
with the correct substitutions. If they do not have t-inverse on their calculator have them
use the chart.
$$\text{95\% CI} = \bar{x} \pm t^* \cdot \frac{sd}{\sqrt{n}}$$
$$\text{95\% CI} = 3.4 \pm 2.228 \cdot \frac{.26}{\sqrt{11}}$$
(3.23 to 3.58)
Conclusion
Interpretation of the INTERVAL
I am 95% confident that the true mean price of regular gasoline in the area is between $3.23 and $3.58
per gallon.
Interpretation of the LEVEL
If I did this process again and again, approximately 95% of the resulting intervals
would capture the true mean of gas prices.
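The mechanics of the gas price interval can be reproduced from the 11 data points. The following is a minimal sketch using only Python's standard library; the critical value 2.228 is taken from the t-table for df = 10, as in the worked example, rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# The 11 sampled prices per gallon from the example.
prices = [3.12, 3.21, 3.34, 3.67, 3.78, 3.10, 3.12, 3.32, 3.45, 3.56, 3.78]

n = len(prices)
x_bar = mean(prices)     # sample mean
s = stdev(prices)        # sample standard deviation (divides by n - 1)
df = n - 1               # degrees of freedom
t_star = 2.228           # from the t-table: df = 10, 95% confidence

margin = t_star * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"x-bar = {x_bar:.2f}, sd = {s:.2f}, df = {df}")
print(f"95% CI = ({lower:.2f} to {upper:.2f})")
```

The result agrees with the interval (3.23 to 3.58) worked out by hand above.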
Two Sample T-Test
A teacher wants to know if the method of instruction affects how well students learn. Using two
classes of the same level of statistics, she teaches one class using lecture only and the other class
using lecture and group work. She measures the level of learning by giving both classes the same
test. Assuming that the two classes are representative of all statistics students, what type of
inference procedures should be used? State the hypotheses for the appropriate test and
identify the inference procedure you would use. Justify your response and include
comments on the design of the study.
Solution
Procedure type: Two-sample t-test
$H_0: \mu_L = \mu_G$
$H_a: \mu_L \neq \mu_G$
where $\mu_L$ represents the mean score of the tests in the class where lecture
only was used and $\mu_G$ represents the mean score of the tests in the class
where lecture and group work were both used.
Notes
The response variable, scores of individual students, is numerical, and there are two independent
groups, classes with lecture only and classes with both lecture and group work. This leads us to
conclude it is a difference of means two-sample t problem. The teacher is looking to see if one
method is different from the other which would indicate a two-tailed test.
Matched Pair T- Test
Situation
Having done poorly on their math final exam in June, six students repeat the course in summer school, and
then take another exam in August. If we consider these students representative of all students who might
attend this summer school in other years, do these results provide evidence that the program is worthwhile?
Show all parts
June   54   49   68   66   62   62
Aug    50   65   74   64   68   72
This is a matched pair because we are taking two measurements from one
experimental unit. The treatment is the summer course remediation. Any pre- and
post-test is a Matched Pair design, and we look at the difference. This is really a one-sample
t-test on the differences. We call it a matched pair because each
difference comes from two data points. These data points, however, are NOT
independent: the same person is taking the test in June and in August.
What to look for when making a decision about whether to use a “Matched Pair” test for
the difference or a “Two sample” test:
• Data must be paired for a matched pair test. Pairing is not a problem, it is an
opportunity. The independence assumption is violated, but we can actually do
much better than the two-sample t-test. After all, we should be focusing on the
changes.
• You make a decision about whether the data are paired from understanding how
they were collected and what they mean. There is no test to determine whether
the data are paired. This comes from reading the problem.
• Mechanically, a matched pair t-test is just a one-sample t-test for the mean of
these paired differences. The sample size is the number of pairs.
Things to remember
• Don’t use a two-sample t-test for paired data.
• Don’t use a paired-t method when the samples aren’t paired.
When two groups do not have the same number of values, it’s pretty easy to see
that they can’t be paired. But just because two groups have the same number of
observations doesn’t mean they can be paired, even if they are shown side-by-side
in a table. We might have 25 men and 25 women in the study, but they might be
completely independent of one another.
• There is most often less variability in the matched pair design than in the two-sample
t-test.
• Matching pairs generally removes so much extra variation that it more than
compensates for having only half the degrees of freedom.
Decide whether the following situations are paired or two samples.
Define Parameter
$\mu_D$ = the AVERAGE DIFFERENCE between the test score in August and the test score in
June (August - June).
Mean of the differences!
Null and Alternative in terms of the PARAMETER
$H_0: \mu_D = 0$
$H_A: \mu_D > 0$
Assumptions
1. This is a simple random sample of students.
2. The six students are independent of each other.
3. Since I am using a t-distribution with only 6 points and I have the data points, I MUST
SHOW A GRAPH OF THE DATA. I will assume the population is NORMAL.
OR
The Normality plot shows an outlier, and the histogram does not appear normal because
there is a gap. It is difficult to show normality with a sample this small. That is why we
assume that the POPULATION of interest is normal.
Many students did not include a normality plot or a histogram.
Test statistic, type of test by name or formula, p-value, degrees of freedom
Matched Pair t-test: t score is 1.75
degrees of freedom is 5
P(t > 1.75) = .0699
$$t = \frac{5.33 - 0}{7.44 / \sqrt{6}} \approx 1.75$$
Conclusion
There is approximately a .07 probability that a mean DIFFERENCE of 5.33 or larger would happen by
chance alone if the true mean difference in scores (August - June) were 0. I will NOT reject the
Null at α = .05. There is not sufficient evidence that the summer school program raises scores.
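The matched pair mechanics can be checked from the six pairs of scores. The sketch below assumes the scores pair up student by student as June (54, 49, 68, 66, 62, 62) and August (50, 65, 74, 64, 68, 72), which reproduces the mean difference of 5.33 used above. Because Python's standard library has no t-distribution, the one-sided p-value is approximated by numerically integrating the t density with Simpson's rule; the helper names `t_pdf` and `t_upper_tail` are made up for this example.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

june = [54, 49, 68, 66, 62, 62]
august = [50, 65, 74, 64, 68, 72]

# A matched pair test is a one-sample t-test on the differences.
diffs = [a - j for j, a in zip(june, august)]
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)
t = (d_bar - 0) / (s_d / sqrt(n))
df = n - 1


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t_value, df, steps=2000):
    """P(T > t_value) for t_value >= 0, using Simpson's rule on [0, t_value]."""
    h = t_value / steps
    total = t_pdf(0, df) + t_pdf(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


p_value = t_upper_tail(t, df)
print(f"mean diff = {d_bar:.2f}, sd = {s_d:.2f}, t = {t:.2f}, df = {df}")
print(f"P(t > {t:.2f}) = {p_value:.4f}")
```

The printed values agree with the calculator results quoted above (t about 1.75, p-value about .07).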
Proportion Problem (Two Sample)
Notes about the standard error for the two-proportion test and the two-proportion confidence interval
The standard error for a two-proportion test is
$$SE = \sqrt{p_C (1 - p_C)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$p_C$ is the combination of the proportions. The book does not use this notation,
but I use it to say it is a combination of both proportions. For example, if
$p_1 = \frac{12}{30}$ and $p_2 = \frac{15}{40}$, then $p_C = \frac{27}{70}$.
When you are doing a TEST you are stating that the two proportions are equal so
you can pool the standard error. When you pool the proportions you combine both
the values and create a proportion based on both samples.
Confidence Interval
When you are doing a confidence interval this is not the case. In this situation the
standard error is
$$SE = \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}$$
So the formula for a two-proportion confidence interval is
$$\text{\_\_\_\_\% CI} = (p_1 - p_2) \pm z^* \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}$$
The z* has to match the confidence level. Remember how to find z*: for confidence level c,
compute (1 - c) / 2. This is the area under the curve in each tail. Use InvNorm to
find the z* matching the confidence level.
Remember for a confidence interval you will be defining the p from your sample.
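The difference between the pooled standard error (for a test) and the unpooled standard error (for a confidence interval) is easy to see with the small example above, 12 out of 30 and 15 out of 40. This is a sketch only; the helper name `z_star` is made up here, and it plays the role of InvNorm on the calculator.

```python
from math import sqrt
from statistics import NormalDist


def z_star(confidence):
    """Critical value z* with area (1 - c) / 2 in each tail."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)


x1, n1 = 12, 30
x2, n2 = 15, 40
p1, p2 = x1 / n1, x2 / n2
p_c = (x1 + x2) / (n1 + n2)      # combined (pooled) proportion, 27/70

# Test: pool, because the Null says the two proportions are equal.
se_test = sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))

# Confidence interval: do not pool.
se_ci = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z = z_star(0.95)
diff = p1 - p2
ci = (diff - z * se_ci, diff + z * se_ci)

print(f"p_C = {p_c:.4f}, SE (test) = {se_test:.4f}, SE (CI) = {se_ci:.4f}")
print(f"z* = {z:.3f}, 95% CI for p1 - p2 = ({ci[0]:.3f} to {ci[1]:.3f})")
```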
Situation
A Gallup Poll taken in May 2000 asked the question: “In general, do you feel that the
laws covering the sale of firearms should be made: more strict, less strict, or kept as they
are now?” Of the n = 493 men who responded, 52% said “more strict,” while of the n =
538 women who responded, 72% said “more strict.” Assuming these respondents
constitute random samples of U.S. men and women, is there sufficient evidence to
conclude that a higher proportion of women than men in the population think these laws
should be made stricter? Justify your answer.
Procedure type: Difference of two proportions z test
$H_0: p_w - p_m = 0$ OR $H_0: p_w = p_m$
$H_a: p_w - p_m > 0$ OR $H_a: p_w > p_m$
where $p_m$ and $p_w$ represent the proportion of men and women respectively who support
“more strict” laws in the sale of firearms.
Notes
Performing a large-sample difference of two proportions test solves this problem.
This is evident because there are two populations that we are studying, men and
women. The problem also asks students to find evidence of a higher percentage or
proportion of women than men who think that the sale of firearms should be
stricter, indicating a one-tailed test. Students may need to be reminded that, for the
hypothesis test, the test statistic must be calculated with a pooled estimate
of the proportion. In a hypothesis test we are assuming the null hypothesis is true,
and the null hypothesis says the population proportions for men and women are
equal.
Students need to check the condition that the sample sizes are large enough. One rule
for checking this would be: $n_w p_w \geq 10$, $n_w (1 - p_w) \geq 10$ and $n_m p_m \geq 10$,
$n_m (1 - p_m) \geq 10$, where $n_m$ and $n_w$ represent the number of men and women
respectively. Some texts use: $n_w \hat{p} \geq 10$, $n_w (1 - \hat{p}) \geq 10$ and $n_m \hat{p} \geq 10$,
$n_m (1 - \hat{p}) \geq 10$, where $\hat{p}$ represents the pooled estimate of p. Other texts use 5 in
place of 10.
Possible incorrect solutions would include difference of two means or Chi-Square.
Difference of means cannot be correct because we have proportions as opposed to
average percentages. If students are confused on this issue, ask them what the
original data must look like. Is it numerical or categorical? For answers of
categorical, tests of proportion are correct. For answers of numerical, t-tests are
appropriate. In this case the raw data must be in the form, “more strict,” “less
strict,” or “kept as they are now.” This is categorical data. If our original data
were presented as a list of percentages for many different samples (which is
numerical), a t-test would be used. Students could also attempt a solution using
Chi-Square. Since Chi-Square is always two-tailed and we are doing a one-tailed
test this would not be appropriate. Chi-Square could work as an alternative
solution if this example was not one-tailed.
It is important that you define the Null and Alternative Hypothesis in terms of the
PARAMETER
State Null and Alternative in terms of parameters
HO: Pm = Pf
HA: Pm < Pf
Define Parameters in context
Pm = The proportion of males who think that the laws about gun control should be
MORE STRICT
Pf = The proportion of females who think the laws about gun control should be
made more strict
Give Assumptions for both groups Individually
Males
1. I assume this is a simple random sample of males from the population.
2. I assume that the 493 males are independent of each other.
3. np > 10: .52(493) ≈ 256 > 10, and n(1 - p) > 10: .48(493) ≈ 237 > 10
Show numerical values
Females
1. I assume that this is a simple random sample of females from the population.
2. I assume that the 538 females in the sample are independent of each other.
3. np > 10: .72(538) ≈ 387 > 10, and n(1 - p) > 10: .28(538) ≈ 151 > 10
Show numerical values
4. I assume that the 493 males sampled are independent of the 538 females.
Give formula with test statistic, z-score and p-value
All the values you need for the formula are on the calculator screen.
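$$z = \frac{\hat{p}_m - \hat{p}_f}{\sqrt{p_C (1 - p_C)\left(\frac{1}{n_f} + \frac{1}{n_m}\right)}} = \frac{.52 - .72 - 0}{\sqrt{.62(.38)\left(\frac{1}{538} + \frac{1}{493}\right)}} = -6.623$$
(The value -6.623 is the calculator result; $p_C \approx .62$ is shown rounded in the formula.)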
Remember the p-value will never be exactly zero even
though that is what the calculator gives. There is
always some tiny area in the tail, so report the p-value as
APPROXIMATELY equal to zero.
If you had done females - males, the z value would be
+6.623 and this would represent the area of the right
tail. The p-value would be the same, approximately
zero.
Conclusion (pvalue, difference, context, reject or not reject, alpha level)
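There is approximately a 0% probability that a DIFFERENCE of .20 or larger would
happen by chance alone if the true difference between the proportions of males and females who prefer a
stricter gun control law were 0. I will reject the Null at α = .05. There is sufficient evidence
that a higher proportion of women than men think these laws should be made stricter.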
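The z score and the "approximately zero" p-value for the Gallup example can be reproduced as follows. This is a minimal sketch working from the reported percentages (52% of 493 men, 72% of 538 women), so the last decimal can differ slightly from a calculator that uses whole-number counts. The difference is taken as females - males, so z comes out positive and the p-value is the right-tail area.

```python
from math import erfc, sqrt

p_hat_m, n_m = 0.52, 493   # men who said "more strict"
p_hat_f, n_f = 0.72, 538   # women who said "more strict"

# Pooled proportion: the Null assumes the two population proportions are equal.
p_c = (p_hat_m * n_m + p_hat_f * n_f) / (n_m + n_f)
se = sqrt(p_c * (1 - p_c) * (1 / n_m + 1 / n_f))

z = (p_hat_f - p_hat_m - 0) / se

# Right-tail area of the standard Normal, P(Z > z).
p_value = 0.5 * erfc(z / sqrt(2))

print(f"pooled p = {p_c:.3f}, z = {z:.2f}, p-value = {p_value:.2e}")
```

The p-value is tiny but not exactly zero, which is why the conclusion says "approximately".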
Chi Square Test
Chi Square test is used when you have categorical data and are comparing a number of
proportions. There are three basic types of Chi Square Test:
• Test of Goodness of Fit
• Test for Independence
• Test for Homogeneity
The Goodness of Fit test checks whether a particular distribution is as stated. This is
comparing many different proportions. Only one of the proportions has to be statistically
off, not all of the proportions. These types of problems only have one row and 2 or more
columns.
The Test for Independence, also known as the Test for Association, tests whether there
is a relationship between two categorical variables, for example gender and grades. The problem
will have more than one row.
(Frog Problem 2009 #3)
What does the Chi Square graph look like?
Chi Square has a different graph for each degree of freedom.
As the degrees of freedom increase, the graph becomes less skewed and is not as tall. Because
the shape changes, the Chi Square value needed for significance depends on the degrees of
freedom. You will only be using a one-sided (right tail) test for Chi Square.
The Goodness of Fit test has one row, and the Null states that the distribution is as the
manufacturer states, or perhaps as the newspaper states.
Example Problem
Arnold Palmer states that the percentages of hair color for newborn
babies are as follows:
10% Red, 20% Black, 20% Brown, and 50% Bald
You take a random sample of 200 babies and find the following results:
25 Red, 60 Black, 50 Brown, 65 Bald
You could do a Chi Square Test for Goodness of Fit
HO: The distribution of newborn hair color is as Arnold Palmer states: 10% Red,
20% Black, 20% Brown, and 50% Bald
HA: The distribution of newborn hair color is NOT as Arnold Palmer states: 10% Red,
20% Black, 20% Brown, and 50% Bald
Assumptions
1. The data are COUNTS
2. We assume this is a simple random sample of 200 newborn babies
3. We assume the 200 babies are independent of each other
4. All expected counts are greater than 5. (20, 40, 40, 100)
YOU MUST SHOW THE EXPECTED COUNTS.
There are two ways to look at assumption 4. Alternatively, we can say that all expected counts are
greater than 1 and no more than 20% of the expected counts are less than 5.
This is about EXPECTED counts, not the actual data.
The degrees of freedom is the number of columns you have minus 1: columns - 1. Here that is 4 - 1 = 3.
List the data in a table.
            Red             Black           Brown           Bald
Observed    25              60              50              65
Expected    .1(200) = 20    .2(200) = 40    .2(200) = 40    .5(200) = 100
Chi Square is found by taking the total of (Observed - Expected)² / Expected over all the categories.
This is the symbol for Chi Square:
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(25 - 20)^2}{20} + \frac{(60 - 40)^2}{40} + \frac{(50 - 40)^2}{40} + \frac{(65 - 100)^2}{100} = 26$$
1.25 + 10 + 2.5 + 12.25 = 26
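The goodness of fit arithmetic for the hair color example can be checked with a few lines of code. This is a sketch, not a calculator procedure: the p-value helper `chi2_upper_tail_df3` is a made-up name and uses a closed-form expression that is valid only for 3 degrees of freedom.

```python
from math import erfc, exp, pi, sqrt

claimed = {"Red": 0.10, "Black": 0.20, "Brown": 0.20, "Bald": 0.50}
observed = {"Red": 25, "Black": 60, "Brown": 50, "Bald": 65}

n = sum(observed.values())
expected = {color: p * n for color, p in claimed.items()}

# Sum of (Observed - Expected)^2 / Expected over the categories.
chi_square = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = len(observed) - 1


def chi2_upper_tail_df3(x):
    """P(chi-square > x) for exactly 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


p_value = chi2_upper_tail_df3(chi_square)

print("expected counts:", expected)
print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.6f}")
```

With a Chi Square value of 26 on 3 degrees of freedom the p-value is far below .05, so the Null would be rejected.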
Parts of the Chi Square Test


• State the Null and Alternative (if this is written in words then you have already
defined the variables)
• Give the assumptions
1. The data are counts
2. SRS of ______ from _____
3. The samples are independent
4. All expected counts are greater than 5 (show expected counts)
• Chi Square Value, degrees of freedom and P Value
• Conclusion
There is a ____% probability that a chi square value of ____ or larger
would happen by chance alone if the true distribution were _____. I
______ reject the Null at α = .05 (Be sure to connect the conclusion to the
context of the problem)
OR
There is a ___% probability that I would get this Chi Square value or larger
by chance alone. Therefore, I _____ reject the Null at alpha = .05.
Therefore it appears that ---- and ---- are or are not independent.
M&M Activity (on your own)
Show all parts for this activity, including the table with observed and expected counts.
Show all the mechanical steps to find your Chi Square value
M&M Plain
Brown 13%
Yellow 14%
Red
13%
Blue 24%
Orange 20%
Green 16%
M&M Peanut
Both Types
Brown 12%
Yellow 15%
Red
12%
Blue 23%
Orange 23%
Green 15%
Peanut Butter
10 Brown
20 Yellow
10 Red
20 Blue
20 Orange
20 Green
Dark Choc. Uniform Distribution
Review of Chi Square
There are several procedures to compare the distribution of categorical data. When a
categorical variable has multiple categories, when there are two categorical variables under
consideration, or when there are multiple populations under study, a Chi Square test is
used.
Goodness of Fit: One row. One categorical variable with multiple categories from ONE
population.
Test for Homogeneity: More than one row. One categorical variable with multiple
categories from two or more populations. The test compares the distribution of sample
counts with the counts expected if the populations have identical distributions.
Test of Independence: More than one row. Two categorical variables with multiple
categories from ONE population. The test compares the distribution of sample counts
with the counts expected if the two variables are independent.
Test for Homogeneity
Situation:
A certain brand of bite-size candies comes in three varieties: creamy, crispy, and chewy.
The manufacturer is interested in whether preference for the types of candies differs among three
school age groups: elementary, middle, and high school. Random samples are taken at three local
schools, one for each age group, and the sample data are compiled in the table below.
                      Population
Variety     Elementary    Middle    High School
Creamy          33          21          16
Crispy          14          16          12
Chewy           19          17          32
*Do you notice that I have three different samples here, so I am looking at three different
populations? This is what tells me it is a test for Homogeneity and not Association.
HO: P creamy in elementary = P creamy in middle school = P creamy in high school
P crispy in elementary = P crispy in middle school = P crispy in high school
P chewy in elementary = P chewy in middle school = P chewy in high school
HA: At least one of the statements in HO is not true
Chi Square test for Homogeneity
Assumptions IN CONTEXT
1. Data comes from an independent simple random sample: OK – each group of
students was randomly selected from their respective schools.
2. The samples are independent: the size of each sample is less than 10% of its
population size. OK as long as there are at least 660 elementary, 540 middle, and
600 high school students in the respective populations.
3. All expected counts are at least 5 (see the expected counts below). It is important
that you indicate exactly where to find the expected counts.
Calculate the Chi Square test statistic
The expected counts are found by taking (Row Total)(Column Total) and dividing this by
the (Grand Total).
The degrees of freedom is (# of rows - 1)(# of columns - 1). Here that is (3 - 1)(3 - 1) = 4.
Observed Counts
            Elementary    Middle    High School    TOTAL
Creamy          33          21          16           70
Crispy          14          16          12           42
Chewy           19          17          32           68
TOTAL           66          54          60          180
Expected Counts
            Elementary      Middle          High School
Creamy      (66)(70)/180    (54)(70)/180    (60)(70)/180
Crispy      (66)(42)/180    (54)(42)/180    (60)(42)/180
Chewy       (66)(68)/180    (54)(68)/180    (60)(68)/180
Chi Square is found by taking the total of (Observed - Expected)² / Expected over all the cells.
This is the symbol for Chi Square:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
The calculator will automatically store the expected counts in Matrix B. This will only
work when you have more than one row.
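For readers who want to see what the calculator is doing with the candy table, here is a minimal sketch that builds the expected counts from (Row Total)(Column Total) / (Grand Total) and then totals (O - E)² / E. The helper `chi2_upper_tail_even_df` is a made-up name and uses a closed form that works only when the degrees of freedom are even, as they are here (df = 4).

```python
from math import exp

# Rows: Creamy, Crispy, Chewy. Columns: Elementary, Middle, High School.
observed = [[33, 21, 16],
            [14, 16, 12],
            [19, 17, 32]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = (row total)(column total) / (grand total).
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x) when df is an even number."""
    term = total = exp(-x / 2)
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return total


p_value = chi2_upper_tail_even_df(chi_square, df)

for name, row in zip(["Creamy", "Crispy", "Chewy"], expected):
    print(name, [round(e, 1) for e in row])
print(f"chi-square = {chi_square:.3f}, df = {df}, p-value = {p_value:.3f}")
```

The smallest expected count is 12.6, so the expected count condition is met, and the p-value comes out at about 2%.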
The conclusion
There is approximately a 2% probability that a chi-square value of 11.552 or larger
would happen by chance alone if all the proportions of
creamy, crispy and chewy candy were the same for the
three types of schools. I will reject the Null at α = .05
OR
There is approximately a 2% probability that, if the preference for candies were
distributed in the same proportion among all three age groups of students, a chi-square
statistic of 11.552 or larger would occur. Reject the Null at α = .05
Test for Independence
The Chi Square test of independence is used to compare the distribution of sample counts
of two categorical variables from a SINGLE population to see if there is an
association between the variables. For instance, parents of incoming freshmen in
a large school district were asked if they supported school uniforms. Parents were
classified by whether or not they favored uniforms, and by the type of uniform
policy.
Situation
According to the Orlando Sentinel the proportion of schools that allow cell phones depends on
the population of students at the school. This is the data they have collected. Complete an
appropriate significance test. Show all Parts
                     Cell Phone
Population         Yes      No
Less than 500       12      23
500-1000            11       5
1001-1500           16      34
1501-2000           15      17
2001-2500           18      24
2501-3000           12      32
Over 3000           34      30
Solution
This is a Chi Square test for Independence. There is a subtle difference in this problem. We are
assuming this is all from one sample of schools, which were then classified by size. If
the data had been taken from separate samples for each size, then we would do a test for Homogeneity and be
comparing the proportions. Both sets of Null and Alternative hypotheses will be accepted, because the stem of the
problem was not clear about how the sample was gathered.
Null and Alternative
Ho: Whether a school allows cell phones is independent of the size of the school
HA: Whether a school allows cell phones is not independent of the size of the school
OR
Ho: The proportion of schools that allow cell phones is the same for each of the 7 different size schools
HA: The proportion of schools that allow cell phones is NOT the same for at least one of the 7 different size
schools.
Assumptions
1. I assume this is a simple random sample of schools.
2. I assume that the schools in the sample are independent of each other.
3. All expected counts are greater than 5. The approximate value of each is as follows:
14.6, 20.4, 6.7, 9.3, 20.8, 29.2, 13.3, 18.7, 17.5, 24.5, 18.3, 25.7, 26.7, 37.3
Show the expected counts. You get the expected counts by looking at Matrix B. You do
not have to do anything except put the values in Matrix A. The calculator will do the rest.
You need to remember how the expected values are calculated because this could be asked on
a free response. They could also show you a computer printout of the values.
Define Chi Square test, give test statistic, degrees of freedom and p-value
Chi Square test for independence
Chi Square is 15.12
Degrees of freedom: (2 - 1)(7 - 1) = 6
P($\chi^2$ > 15.12) = .019
Conclusion
There is a .019 probability that I would get a Chi Square value of 15.12 or larger by chance
alone if cell phone policy were independent of school size. I will
reject the Null at α = .05. It appears that whether a school allows cell phones is not
independent of the size of the school.
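The numbers in the cell phone solution (expected counts, Chi Square of 15.12, 6 degrees of freedom, p-value of .019) can be verified from the table. This is a sketch under the same one-sample assumption as the solution above; `chi2_upper_tail_even_df` is a made-up helper name and its closed form only applies because df = 6 is even.

```python
from math import exp

sizes = ["Less than 500", "500-1000", "1001-1500", "1501-2000",
         "2001-2500", "2501-3000", "Over 3000"]
yes = [12, 11, 16, 15, 18, 12, 34]   # schools that allow cell phones
no = [23, 5, 34, 17, 24, 32, 30]     # schools that do not

total_yes, total_no = sum(yes), sum(no)
grand_total = total_yes + total_no

chi_square = 0.0
smallest_expected = float("inf")
for size, y, n_ in zip(sizes, yes, no):
    size_total = y + n_
    # Expected count = (row total)(column total) / (grand total).
    e_yes = size_total * total_yes / grand_total
    e_no = size_total * total_no / grand_total
    smallest_expected = min(smallest_expected, e_yes, e_no)
    chi_square += (y - e_yes) ** 2 / e_yes + (n_ - e_no) ** 2 / e_no
    print(f"{size:<14} expected yes = {e_yes:5.1f}, expected no = {e_no:5.1f}")

df = (2 - 1) * (len(sizes) - 1)


def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x) when df is an even number."""
    term = total = exp(-x / 2)
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return total


p_value = chi2_upper_tail_even_df(chi_square, df)
print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.3f}")
```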
Test for Goodness of Fit
Example
According to the USA Today 20% of children are blond, 40% have brown hair, 10% have
red hair, and 30% have black hair.
Mrs. Swaynos has taken a random sample of children and found the following results
32 Blond
48 Brown
15 Red
40 Black
Is there evidence to think USA Today is not correct?
Parts to Chi Square for Goodness of Fit
1. Define Null and Alternative in words
2. Give the assumptions
• SRS of _____ from ____
• Data are counts
• n < 10% of the population
• All expected counts are greater than 1
• Not more than 20% of expected counts are less than 5
3. Give the Chi Square value showing the work
4. Give the p-value
5. Give the conclusion
There is a _______% probability that I would get a result of ____ OR LARGER by
chance alone if the true proportions were as stated. I _______ reject the Null at alpha =
______. Make a statement that connects to the problem.
Notes about Chi Square
There are three types of Chi Square problems: Goodness of Fit, Test for Independence, and
Test for Homogeneity.
The difference between Independence and Homogeneity is that Independence comes
from one sample from one population. Homogeneity comes from two or more samples. The way the sample is
taken and the question you are trying to answer dictate the type of test.
A test for Goodness of Fit only has ONE row. A test for Independence or
Homogeneity has at least two rows. That type of test can be done on the calculator. The
goodness of fit must be done by hand.
Example for Goodness of Fit
According to USA Today, Skittles colors are equally distributed. Mrs. Swaynos has opened a
package of Skittles and found the following results: 12 yellow, 14 red, 22 green, and
28 purple. Do a statistical test for this.
Chi Square Test for Independence
According to Ms. Michalik the number of juniors and seniors that attend prom is as
follows:
                   Males    Females
Spend 0-100          21        4
Spend 101-250        32       32
Spend 251-400        12       23
Spend over 400        8       30
Chi Square Test for Homogeneity
Ms. Michalik took a sample from Seminole, Oviedo, and Lake Mary and found out how
much people spend on prom
                   Oviedo    Seminole    Lake Mary
Spend 0-100          34         12           5
Spend 101-250        56         25          34
Spend 251-400       123        231         129
Spend over 400      200        129         321
Assumptions
1. SRS of _______ from _______
2. n < 10% of the population
3. All expected counts are > 1
4. No more than 20% of expected counts can be less than 5
5. Make sure the data are counts.
Degrees of Freedom: (r-1)(c-1) for independence and homogeneity, where r is the number
of rows and c is the number of columns. For a goodness of fit it is (c - 1), where c is
the number of categories.
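Here is one way to check the expected-count conditions and the degrees of freedom for the three-school prom table. It is a sketch with my own variable names. Note that the conditions are checked on the EXPECTED counts, not the observed counts.

```python
# Counts from the three-school prom table: rows are spending levels,
# columns are (Oviedo, Seminole, Lake Mary).
observed = [
    [34, 12, 5],
    [56, 25, 34],
    [123, 231, 129],
    [200, 129, 321],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Flat list of all expected counts: (row total * column total) / grand total
expected = [r * c / grand_total for r in row_totals for c in col_totals]

all_above_one = all(e > 1 for e in expected)
share_below_five = sum(e < 5 for e in expected) / len(expected)
conditions_met = all_above_one and share_below_five <= 0.20

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (r-1)(c-1)
print(conditions_met, df, round(min(expected), 2))
```

The observed count of 5 for Lake Mary is not a problem, because the smallest expected count is well above 5.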
Conclusion
There is a __(p-value)__% probability that a result of __(result from your sample)__ or
larger would happen by chance alone if the true __(parameter in the Null)__ were _______.
I __(will or will not)__ reject the Null at alpha = .05.
The conclusion must always connect to the context of the problem.
Hypothesis Test for Slope
The hypothesis test for slope allows us to determine if there is a useful linear relationship
between x and y in the population. That is, does the slope of the population model differ
from 0 – does y tend to change linearly with changes in x? If there is a linear relationship
between the two variables, the slope should not equal 0. The Null will state:
H0: β = 0
Here β is used as the symbol for the slope of the population regression line. This is the
same symbol that is often used for the probability of a Type 2 error, so be careful and
make sure you define the variable of interest.
The Null will most often say β = 0. The Alternative will most often say β ≠ 0, but it could
also say β < 0 or β > 0. The formula to compute the test statistic (t score) is
$$t = \frac{b_1}{SE_{b_1}}$$
where b1 is the slope of the sample and SEb1 is the standard error of the slope.
The number of degrees of freedom is n-2 because two values (the slope and the intercept)
are estimated from the data. The P-value will be computed using the t-distribution.
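To see where the t score and the degrees of freedom come from, here is a small sketch. The five data points are hypothetical, made up only to show the arithmetic. On the exam these numbers normally come straight from the computer printout.

```python
import math

# Hypothetical (x, y) data, made up only to show the arithmetic.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx                 # sample slope
b0 = y_bar - b1 * x_bar        # sample intercept

# s = standard deviation of the residuals, using n - 2 in the denominator
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

se_b1 = s / math.sqrt(sxx)     # standard error of the slope
t = b1 / se_b1                 # test statistic when the Null says slope = 0
df = n - 2

print(f"b1 = {b1:.3f}, SE_b1 = {se_b1:.4f}, t = {t:.3f}, df = {df}")
```

The P-value would then come from the t-distribution with n-2 degrees of freedom, using the table or the calculator.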
The assumptions
1. SRS of ____ from _____
2. The scatter plot looks linear
3. The residuals do not show a pattern
4. The residuals follow a NORMAL distribution
Conclusion
There is a _____% probability I would get a slope of ______ (or more extreme) by chance
alone given the true slope is 0. I ______ reject the Null at α = .05. (Include a
statement that ties this in with the CONTEXT of the problem.)
Key Words in Advanced Placement Statistics Questions

If it says: Describe the distribution (numerical)
Then you must address:
• Center
• Shape
• Spread
• Gaps
• Outliers
(G-SOCS)

If it says: Describe the distribution (Categorical)
Then you must address:
• Make sure the graph is appropriate

If it says: Compare the distributions
Then you must address:
• Compare each of the sets (2009 #1)
• Label and Scale
• If the data sets are not the same size then you must use relative frequency as the
y axis
• Use words like bigger, smaller, wider
• You must compare the center, shape and spread
• When comparing the shape just list the types of distribution
• Be very careful using the word NORMAL. Not all symmetric distributions are NORMAL,
but all NORMAL distributions are symmetric.

If it says: Design an experiment
Then you must address:
• Random allocation of experimental units to treatments (RAT)
• Define what tool you will use to randomize
• Same size groups
• Clearly define groups
• Replication: enough experimental units
• Control: control for lurking variables (often by blocking)
• Draw picture
• Define what you will compare and give units
• Write in sentences

If it says: What additional information…
Then you must address: You must state NEW information, not what is previously stated.

If it says: Based on the diagram above…
Then you must address: You must use the graph or diagram that is provided.

If it says: Based on parts a and b…
Then you must address: You must use your answers from the previous parts.

If it says: Give appropriate statistical evidence / Provide evidence / Conduct an
appropriate analysis
Then you must address: You must perform a test of significance. Hypothesis test choices:
o One sample t
o Two sample t
o One sample proportion z test
o Two sample proportion z test
o Chi Square
o Linear Regression
You must show all parts (Null & Alternative, name of test, assumptions, test statistic,
p-value, df, conclusion).

If it says: In context of the question, drawing, table, graph, etc.
Then you must address: You must connect back to the stem of the problem.

If it says: Explain the slope
Then you must address: ___ change in y is PREDICTED for every 1 unit change in x.
CONTEXT. A fudge factor or wiggle word must be there.

If it says: Explain the coefficient of determination
Then you must address: R²% of the variation in Y is explained by the LSRL of Y on X.
CONTEXT

If it says: Explain "r"
Then you must address: Strength, direction, context

If it says: At most
Then you must address: Means x or less

If it says: No more than
Then you must address: Means x or less

If it says: At least
Then you must address: Means x or greater

If it says: Conclusion/linkage
Then you must address: P-value, alpha level, context, reject or not reject, conditional
probability statement
• Never accept the alternative
• Never reject the alternative
• Never accept the Null

If it says: Predict
Then you must address: Use a linear regression

If it says: Confidence Interval
Then you must address: Do not make your interval special. You are ____% confident that
the true ____ is between ___ and ____

If it says: Confidence Level
Then you must address:
• This is referring to the process and all the other intervals that could be obtained.
• If I did this process again and again I would capture the TRUE ____ approximately
___% of the time in the various INTERVALS.

If it says: Underlined words
Then you must address: Few words are underlined, so pay attention to what words the
author chooses to underline. They direct you to the answer they are looking for.

If it says: Describe scatterplots
Then you must address: Strength, direction, placement of the data, outliers, CONTEXT
Assumptions and Conditions

Proportions (z)

One Sample
Assumptions:
1. Individuals are independent
2. Sample is sufficiently large
Conditions that support the assumptions:
1. SRS of ____ from _____ and n < 10% of the population
2. np > 10 and n(1-p) > 10. This is the same as saying there are at least 10 successes
and 10 failures in the sample.

Two Samples
Assumptions:
1. Samples are independent
2. Data in each sample are independent
3. Both samples are sufficiently large
Conditions that support the assumptions:
1. (Think about how the data was collected)
2. Both samples are SRS of ___ from ____ and n < 10% of the population OR Random
Allocation of Treatment
3. np > 10 and n(1-p) > 10 for each sample, or there are at least 10 successes and 10
failures in each of the two samples

Means (t)

One sample (df = n-1)
Assumptions:
1. Individuals are independent
2. Population distribution is normal
• If data is given you must show a normality plot or histogram and check the data
Conditions that support the assumptions:
1. SRS of ___ from ___ and n < 10% of the population
2. For n < 15 the data comes from a normal population.
For 15 < n < 30 the data has no outliers or skewness.
For n > 30 the sample is considered sufficiently large and outliers could be a problem.
The data does not have to be normal.

Matched Pair (df = n-1)
Assumptions:
1. Data are matched
2. Individuals are independent
3. Population of differences is Normal
• If data is given you must show a normality plot or histogram and check the data of
the differences
Conditions that support the assumptions:
1. (Think about how the data was collected)
2. SRS of ___ from ___ and n < 10% of the population OR Random Allocation of Treatment
3. For n < 15 the data comes from a normal population.
For 15 < n < 30 the data has no outliers or skewness.
For n > 30 the sample is considered sufficiently large and outliers could be a problem.
The data does not have to be normal.

Two independent samples (df from technology)
Assumptions:
1. Samples are independent
2. Data in each sample are independent
3. Both populations have a Normal distribution
4. Data is independent of each other
Conditions that support the assumptions:
1. (Think about the design)
2. SRS of ___ from ___ and n < 10% of the population OR Random Allocation of Treatment
for each of the samples
3. For n < 15 the data for each sample is approximately normal.
For 15 < n < 30 both data sets have no outliers or skewness.
For n > 30 both data sets are considered sufficiently large and outliers could be a
problem. The data does not have to be normal.
4. (Think about how the data was collected)
AP Statistics
Writing Conclusions and Interpretations for Statistical Inference
Note: All conclusions must be connected to the context of the problem. They must include the p-value, alpha level, reject or not reject, and must be stated given the Null.
Interpretation of R-square
______% of the variation in ______ is explained by the least squares regression line of
___ on _____ (fill in y, y, x)
Interpretation of SLOPE
The ______ will change by APPROXIMATELY _____ as ____ increases by 1.
Example:
ŷ = 3 + 14x
y = mileage, x = gas
The mileage will increase by approximately 14 as the gas increases by 1 gallon.
You may also describe the change as PREDICTED instead of approximate.
Know how to read a computer printout. Know how to find the standard deviation of the
residuals from the printout and how to interpret this in context to the problem
Interpretation of R
The correlation describes a LINEAR relationship and has strength and direction.
You must address all three of these issues.
Residual
Residual = Observed – Predicted (the vertical distance from the point to the line).
Points above the line are an UNDER prediction and points below the line are an OVER
prediction.
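A quick sketch of the slope and residual ideas, using the mileage line from the slope example (predicted mileage = 3 + 14 times gas). The observed value of 80 miles is hypothetical, made up only for illustration.

```python
def predicted_mileage(gallons):
    """Prediction from the example line: y-hat = 3 + 14x."""
    return 3 + 14 * gallons

# Slope: going from 2 to 3 gallons changes the PREDICTED mileage by 14.
slope_check = predicted_mileage(3) - predicted_mileage(2)

# Hypothetical observation (made up): 5 gallons of gas, 80 miles actually driven.
observed = 80
residual = observed - predicted_mileage(5)   # Observed - Predicted

if residual > 0:
    verdict = "point is above the line: the line UNDER predicted"
elif residual < 0:
    verdict = "point is below the line: the line OVER predicted"
else:
    verdict = "point is on the line"

print(slope_check, residual, verdict)
```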
Confidence Interval
I am _____% confident that the true ______ is between ____ and ______
Confidence Level
If I repeated this process again and again I would capture the true _____ ___% of the
time in the various intervals.
Conclusion for a Linear Regression Slope
With a p-value of ______ I will or will not reject the Null at the ___% level. I would get
this t score of ____ OR LARGER, SMALLER OR BOTH DEPENDING ON THE
PROBLEM by chance alone _____% of the time given that the TRUE slope was 0.
You could also say given that the x and y are independent.
Conclusion for one sample t test or z test
There is a _____ probability that I would get a result of ____ or (larger, smaller, or both)
by chance alone given the true mean of ________(context)________ is ________. I will
or will not reject the Null at the ___% level.
CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT
Conclusion for a two sample t test or z test
There is a ______ probability that I would get a difference of _______ or (larger, smaller,
or both) by chance alone given the true difference in ________(context) is 0. I will or will
not reject the Null at alpha equal to .05.
CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT
Conclusion for one sample proportion
There is a _____ probability that I would get a proportion of ___________ or (larger,
smaller, or both) by chance alone given the true proportion of __________(context)
____________ is _____. I will or will not reject the Null at alpha equal to .05.
CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT
Conclusion for two sample proportion
There is a _______ probability that I would get this proportion difference of _____ or
LARGER by chance alone given the true difference _________________(context)
___________is 0. I will or will not reject the Null at the ___% level.
CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT
Conclusion for Chi Square TEST FOR INDEPENDENCE (You have one sample and are
comparing two different categorical variables within the one sample)
There is a _____ probability I would get a Chi Square value of ___ or larger by chance
alone. I will or will not reject the Null at alpha equal to .05. There is a _____
probability I would get (give observed values here and context) by chance alone if the
___ and ___ were independent.
You could also say: With a p-value of ____ I will or will not reject the Null at
the __% level. I would get a chi square value of ___ or larger by chance alone ___% of
the time if the ___ and ___ were independent (or I could say if the ___ and ___ had no
association).
CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT CONTEXT
Conclusion for Chi-Square for TEST OF GOODNESS OF FIT (you have one sample and
are comparing the distribution to something that the manufacturer claims)
There is a ____ probability that I would get this Chi Square value ______or larger by
chance alone. I will or will not reject the Null at alpha equal to .05. There is a ____
probability I would get ( give the observed values and the context here) by chance alone
given the true proportions were (give expected here with context.)
Conclusion for Chi-Square for TEST OF HOMOGENEITY (you have two or more samples and are
comparing the distributions of each)
With a p-value of _____ I will or will not reject the Null at the ___% level. I would get a
chi square value of ___ or larger by chance alone ___% of the time given that the two
(or more) distributions were the same. CONTEXT
Here are a few pointers and reminders to help you do well on the AP Statistics Exam.
The Exam
The AP Stat exam has 2 sections that take 90 minutes each. The first section is 40
multiple choice questions, and the second section is 6 (technically, 4 to 7, but it’s always
been 6) free response questions. Each section counts for half of the overall score. The last
free response question counts for 25% of the Section II score. You are allowed to use
your calculator(s) throughout the
exam, and a standard set of formulas and tables is printed right in the test booklet for your
use.
General tips for writing free response answers
Understand your obligation as a test taker
You are being evaluated not only on the correctness of your answers, but also on your
ability to communicate the methods you used to reach them. The answer is everything you write
down, not just the last line or number at the end. Convince the reader that you understand
the key concepts in the question. Don’t just give them the numbers and hope they will
assume you understand the concepts.
Be smart about multi-part questions
Most AP Stat questions have several parts. Read all the parts before you start answering
and think about how they might be related (sometimes they aren’t). If the last part asks
you to answer a question based on your results to the previous parts, be sure to actually
use your prior results to answer. If you couldn’t do one of the previous parts, make up an
answer and explain what you would have done.
Answer the question you are asked
The test writers spend over a year writing these questions. They word them carefully and
specifically. Spend more time reading and less time writing to make sure you really
understand what is being asked. When you have answered the question asked, stop
writing. They give you much more space than you need. Don’t panic because you haven’t
used all the space provided.
Answer in context
Most, if not all, AP Stat problems will have a real life context. Make sure your answers
include the context. This is especially important when defining symbols/variables and
writing conclusions.
Use vocabulary carefully
This isn’t English class. There’s no poetic license here. Terms like normal, independent,
and sampling distribution have specific meanings. Don’t say “normal” if you mean
“approximately normal” and don’t mix up populations and samples in either words or symbols.
Leave enough time for the last question
The last free response question counts for more points and is designed to take 20 to 30
minutes. At least read it first, and if you feel OK about it, go ahead and answer. If it looks
hard, you can save it for the end, but no matter what, when there are 30 minutes left in the
test, stop and go to the last question.
Relax
Having met many of the people who write the exam and grading standards, I can assure
you they are not out to trick you. They write challenging but straightforward questions
designed to give you an opportunity to demonstrate what you have learned. Seize the
opportunity and do your best. Keep in mind that you only need to earn roughly 65 to 70%
(it varies from year to year) of the points on the exam to get a 5.
Collecting Data
There are 2 broad areas of data collection we cover in AP Stat, Experiments and
Sampling. You are expected to know some general concepts and specific techniques
related to each area.
Experiments vs. Samples
Many students confuse experimentation with sampling or try to incorporate ideas from
one into the other. This is not totally off-base since some concepts appear in both areas,
but it is important to keep them straight.
The purpose of sampling is to estimate a population parameter by measuring a
representative subset of the population. We try to create a representative sample by
selecting subjects randomly using an appropriate technique.
The purpose of an experiment is to demonstrate a cause and effect relationship by
controlling extraneous factors. Experiments are rarely performed on random samples
because both ethics and practicality make it impossible to do so. For this reason, there is
always a concern of how far we can generalize the results of an experiment. Generalizing
results to a population unlike the subjects in the experiment is very dangerous.
Blocking vs. Stratifying
Students (and teachers) often ask, "What is the difference between blocking and
stratifying?" The simple answer is that blocking is done in experiments and stratifying is
done with samples. There are similarities between the two, namely the dividing up of
subjects before random assignment or selection, but the words are definitely not
interchangeable.
Blocking
In blocking we divide our subjects up in advance based on some factor we know or
believe is relevant to the study and then randomly assign treatments within each block.
The key things to remember:
1. You don't just block for the heck of it. You block based on some factor that you
think will impact the response to the treatment
2. The blocking is not random. The randomization occurs within each block
essentially creating 2 or more miniature experiments.
3. Blocks should be homogeneous (i.e. alike) with respect to the blocking factor.
For example, I want to find out if playing classical music during tests will result in higher
mean scores. I could randomly assign half my students to the room with the music and
the other half to the normal room, but I know that my juniors consistently score higher
than my seniors, and I want to account for this source of variation in the results. I block
according to grade by separating the juniors and seniors first and then randomly assigning
half the juniors to the music room and the other half to the normal room. I do the same
with the seniors. For this design to be valid, I have to expect that each grade will respond
to the music similarly. In other words, I know that juniors will score higher, but I expect
to see a similar improvement or decline in both groups as a result of having the music. At
the end of my study I can subtract out the effect of grade level to reduce the unaccounted
for variation in the results.
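The music example could be randomized along these lines. This is only a sketch: the roster is hypothetical, and the computer's random number generator is just one possible randomizing tool. The point is that the blocks (grades) are NOT random, but the assignment within each block is.

```python
import random

random.seed(2008)  # fixed seed so the sketch is repeatable

# Hypothetical roster, invented for illustration.
students = {
    "juniors": ["J1", "J2", "J3", "J4", "J5", "J6"],
    "seniors": ["S1", "S2", "S3", "S4"],
}

assignment = {}
for grade, block in students.items():
    shuffled = random.sample(block, len(block))   # randomize WITHIN the block
    half = len(shuffled) // 2
    for name in shuffled[:half]:
        assignment[name] = "music room"
    for name in shuffled[half:]:
        assignment[name] = "normal room"

for grade, block in students.items():
    in_music = [name for name in block if assignment[name] == "music room"]
    print(grade, "-> music room:", sorted(in_music))
```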
You have learned how to analyze the results of one special type of blocked design,
namely, matched pairs. In matched pairs you subtract each pair of values which
eliminates the variation due to the subject. Similar techniques are available for fancier
blocked designs.
Stratified Sampling vs. Cluster Sampling
Many students confuse stratified and cluster sampling since both of them involve groups
of subjects. There are 2 key differences between them. First, in stratified sampling we
divide up the population based on some factor we believe is important, but in cluster
sampling the groups are naturally occurring (I picture schools of fish). Second, in
stratified sampling we randomly select subjects from each stratum, but in cluster
sampling we randomly select one or more clusters and measure every subject in each
selected cluster. (Note: There are more advanced techniques in which samples are taken
within the cluster(s))
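The two differences can be seen side by side in a small sketch. The population here is hypothetical (four homerooms of six students each), with homerooms as the naturally occurring clusters and grade level as the stratifying factor.

```python
import random

random.seed(31)  # fixed seed so the sketch is repeatable

# Hypothetical population: 4 homerooms (natural clusters) of 6 students each.
# Each student is (name, grade); grade is the factor used for strata.
homerooms = {
    f"Room {r}": [(f"R{r}-S{i}", "junior" if i % 2 == 0 else "senior")
                  for i in range(6)]
    for r in range(1, 5)
}
population = [student for room in homerooms.values() for student in room]

# Stratified: split by grade, then randomly select from EVERY stratum.
strata = {}
for student in population:
    strata.setdefault(student[1], []).append(student)
stratified_sample = [s for group in strata.values()
                     for s in random.sample(group, 3)]

# Cluster: randomly select whole homerooms, then measure EVERYONE in them.
chosen_rooms = random.sample(list(homerooms), 2)
cluster_sample = [s for room in chosen_rooms for s in homerooms[room]]

print(len(stratified_sample), len(cluster_sample), chosen_rooms)
```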
Final Thoughts
It is especially important to stay focused when answering questions about design. Too
many students get caught up in minor details but miss the big ideas of randomization and
control. Always remember that your mission in responding to questions is to demonstrate
your understanding of the major concepts of the course.
Describing Data
IQR is a number
Many students write things like "The IQR goes from 15 to 32". Every AP grader knows
exactly what you mean, namely, "The box in my boxplot goes from 15 to 32", but this
statement is not correct. The IQR is defined as Q3 - Q1, which gives a single value.
Writing the statement above is like saying "17 goes from 15 to 32." It just doesn't make
sense.
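In numbers, using the quartiles from the statement above. The 1.5 x IQR outlier fences are included as an assumption, since that is the usual boxplot rule. The text here only defines the IQR itself.

```python
q1, q3 = 15, 32          # quartiles from the example statement
iqr = q3 - q1            # a single number, not a range

# Common 1.5 x IQR rule for flagging outliers on a boxplot
# (an assumption here; the text only defines the IQR).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR = {iqr}; the box goes from {q1} to {q3}")
print(f"outlier fences: {lower_fence} and {upper_fence}")
```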
Be able to construct graphs by hand
You may be asked to draw boxplots (including outliers), stemplots, histograms, or other
graphs by hand. The test writers have become very clever and present problems in such a
way that you cannot depend on your calculator to graph for you.
Label, Label, Label
Any graph you are asked to draw should have clearly labeled axes with appropriate
scales. If you are asked to draw side-by-side boxplots, be sure to label which boxplot is
which.
Refer to graphs explicitly
When answering questions based on a graph(s), you need to be specific. Don't just say,
"The female times are clearly higher than the male times.", instead say, "The median
female time is higher than the first quartile of the male times." You can back up your
statements by marking on the graph. The graders look at everything you write, and, often,
marks on the graph make the difference between 2 scores.
Look at all aspects of data
When given a set of data or summaries of data, be sure to consider the Center, Spread,
Shape, and Outliers/Unusual Features. Often a question will focus on one or two of these
areas. Be sure to focus your answer to match.
It's skewed which way?
A distribution is skewed in the direction that the tail goes, not in the direction where the
peak is. This sounds backwards to most people, so be careful.
Slow down
The describing data questions appear easy, so many students dive in and start answering
without making sure they know what the problem is about. Make sure you know what
variable(s) are being measured and read the labels on graphs carefully. You may be given
a type of graph that you have never seen before.
Inference
Not every problem involves inference
You have spent most if not all of this semester on inference procedures. This leads many
students to try to make every problem an inference problem. Be careful not to turn
straightforward probability or normal distribution questions into full-blown hypothesis
tests.
Hypotheses are about populations
The point of a hypothesis test is to reach a conclusion about a population based on a
sample from it. We don't need to make hypotheses about the sample. When writing
hypotheses, conclusions, and formulas, be careful with your wording and symbols so that
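you do not get the population and sample mixed up. For example, don't write "Ho: x̄ =
12" or "µ = mean heart rate of study participants". The first makes a hypothesis about a
sample statistic, and the second defines µ using the sample instead of the population.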
Check Assumptions/Conditions
Checking assumptions/conditions is not the same thing as stating them. Checking means
actually showing that the assumptions are met by the information given in the problem.
For example, don't just write "np>10". Write "np=150(.32)=48>10". Everyone knows
you can do the math in your head or on your calculator, but writing it down makes it very
clear to the reader that you're tying the assumption to the problem rather than just writing
a list of things you memorized.
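The same check written out as a sketch, using n = 150 and p = .32 from the example above. Notice it checks BOTH np and n(1-p), which is what the full condition requires.

```python
n = 150       # sample size from the example above
p = 0.32      # proportion from the example above

expected_successes = n * p          # np
expected_failures = n * (1 - p)     # n(1-p)

for label, value in [("np", expected_successes), ("n(1-p)", expected_failures)]:
    status = "OK" if value > 10 else "NOT met"
    print(f"{label} = {value:.0f} > 10? {status}")
```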
Confidence intervals have assumptions too
Confidence intervals have the same assumptions as their matching tests, and you need to
check them just as carefully.
Link conclusions to your numbers
Don't just say "I reject Ho and conclude that the mean heart rate for males is greater than
78." This sentence doesn't tell us why you rejected Ho. Instead, say "Since the p-value of
.0034 is less than .05, I reject Ho and ...”
Be consistent
Make sure your hypotheses and conclusion match. If you find an error in your
computations, change your conclusion if necessary. Even if your numbers are wrong, you
will normally get credit for a conclusion that is correct for your numbers. If you get
totally stuck and can't come up with a test statistic or p-value, make them up and say
what you would conclude from them.
Interpreting a confidence interval is different than interpreting the confidence level
Interpreting the confidence interval usually goes something like, "I am 95% confident
that the proportion of AP Statistics students who are highly intelligent is between 88%
and 93%" or "The superintendent should give seniors Fridays off since we are 99%
confident that between 72% and 81% of parents support this plan."
Interpreting a confidence level usually goes something like "If this procedure were
repeated many times, approximately 95% of the intervals produced would contain the
true proportion of parents who support the plan."
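The "repeated many times" idea can be seen in a simulation. This is a sketch under made-up assumptions: the true proportion (0.75) and sample size (200) are hypothetical, and the interval is the usual one-sample z interval for a proportion.

```python
import math
import random

random.seed(95)  # fixed seed so the sketch is repeatable

TRUE_P = 0.75      # hypothetical true proportion of parents who support the plan
N = 200            # hypothetical sample size
Z_STAR = 1.96      # critical value for 95% confidence
REPEATS = 2000

captured = 0
for _ in range(REPEATS):
    successes = sum(random.random() < TRUE_P for _ in range(N))
    p_hat = successes / N
    margin = Z_STAR * math.sqrt(p_hat * (1 - p_hat) / N)
    if p_hat - margin <= TRUE_P <= p_hat + margin:
        captured += 1

coverage = captured / REPEATS
print(f"{coverage:.1%} of the {REPEATS} intervals captured the true proportion")
```

Each individual interval either captures the true proportion or it does not. The 95% describes the process, not any one interval.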
Regression
Graph First, Calculate Later
The most important part of the regression process is looking at plots. Regression
questions will frequently provide a scatterplot of the original data along with a plot of
residuals from a linear regression. Look at these plots before answering any part of the
question and make sure you understand the scales used.
Is it linear?
Remember that an r value is only useful for data we have already decided is linear.
Therefore, an r value does not help you decide if data is linear. To determine if data is
linear, look at a scatterplot of the original data and the residuals from a linear regression.
If a line is an appropriate model, the residuals should appear to be randomly scattered.
Computer Output
It is very likely that you will be given computer output for a linear regression. If you can
read the output correctly, these questions are normally easy. You should be able to write
the regression equation using the coefficients in the output and also be able to find the
values of r and r². Most software packages provide the value of r². If you are asked for
the value of r, you will need to take the square root and look at the slope to determine if r
should be positive or negative.
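As a sketch, with hypothetical printout values (made up for illustration). If the printout reports R-Sq as a percent, divide by 100 first.

```python
import math

# Hypothetical values read off a regression printout (made up for illustration).
r_squared = 0.64
slope = -2.5

r = math.copysign(math.sqrt(r_squared), slope)   # r takes the sign of the slope
print(f"r = {r:.2f}")
```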
Interpreting r
If asked to interpret an r value, be sure to include strength, direction, type, and the
context. A good interpretation will be something like, “There is a weak positive linear
relationship between the number of math classes a person has taken and yearly income.”
After you make a 5, be sure to take more statistics in college.