Algebra 1 Summer Institute 2014 The Table Categorization

Summary
Methods for analyzing categorical data are developed in this lesson. Participants also work
with a random sample and build on their understanding of a random sample.

Goals
• Distinguish between categorical data and numerical data.
• Summarize data on two categorical variables collected from a sample using a two-way
  frequency table.
• Given a two-way frequency table, construct a relative frequency table and interpret
  relative frequencies.
• Calculate and interpret conditional relative frequencies from two-way frequency tables.
• Explain why association does not imply causation.

Participant Handouts
1. The Table Categorization

Materials
Paper
Colored Pencils

Technology
LCD Projector
Facilitator Laptop
Excel
GeoGebra
i-Clickers

Source
Engageny.org
Stattrek.com

Estimated Time
120 minutes
Mathematics Standards
Common Core State Standards for Mathematics
MAFS.8.SP.1: Investigate patterns of association in bivariate data
1.4: Understand that patterns of association can also be seen in bivariate categorical
data by displaying frequencies and relative frequencies in a two-way table.
Construct and interpret a two-way table summarizing data on two categorical
variables collected from the same subjects. Use relative frequencies calculated for
rows or columns to describe possible association between the two variables. For
example, collect data from students in your class on whether or not they have a
curfew on school nights and whether or not they have assigned chores at home. Is
there evidence that those who have a curfew also tend to have chores?
MAFS.912.S-ID.2: Summarize, represent, and interpret data on two categorical and
quantitative variables
2.5: Summarize categorical data for two categories in two-way frequency tables.
Interpret relative frequencies in the context of the data (including joint, marginal,
and conditional relative frequencies). Recognize possible associations and trends
in the data.
Standards for Mathematical Practice
1. Make sense of problems and persevere in solving them
2. Reason abstractly and quantitatively
3. Construct viable arguments and critique the reasoning of others
4. Model with mathematics
5. Use appropriate tools strategically
Instructional Plan
Categorical data are often summarized in the media, research studies, or general
discussions. However, categorical data are summarized differently than numerical data.
There is no mean or median that answers the question “What is your favorite soft drink?”
Methods for analyzing categorical data are developed in this activity.
The two-way frequency table is used to develop a relative frequency table that will allow
participants to compare the responses of males and females. However, the statistical
question is still not clearly answered. As participants complete the exercises in this
activity, they begin to see the need for conditional relative frequencies. Participants also
begin to understand how conditional summaries will be used to answer the statistical
question.
One-way Table
A one-way table refers to one variable. A one-way table is the tabular equivalent of a bar
chart. Like a bar chart, a one-way table displays categorical data in the form of frequency
counts and/or relative frequencies.
1. Example: using the i-clickers, ask participants to choose their favorite color from
the following: (Slide 2)
   a. Red
   b. Orange
   c. Yellow
   d. Green
   e. Blue
2. Display the results as a bar chart using GeoGebra or Excel. (Slide 3)
3. Transfer the results to a one-way frequency table. Compare the table to the bar
chart.
Choice       Red   Orange   Yellow   Green   Blue
Frequency
4. When a one-way table shows relative frequencies (i.e., percentages or
proportions) for particular categories of a categorical variable, it is called a
relative frequency table. (Slide 4)
Convert the frequencies to proportions and then to percentages.
5. What statistical question could be answered with this data?
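The conversion in step 4 can be sketched in a few lines of Python. The counts below are hypothetical placeholders, not results from the source; replace them with the i-clicker counts collected in the session.

```python
# Hypothetical clicker counts (placeholders); replace with the session's results.
counts = {"Red": 6, "Orange": 2, "Yellow": 3, "Green": 4, "Blue": 5}

# Total number of responses.
total = sum(counts.values())

# Relative frequency as a proportion, then as a percentage.
proportions = {color: n / total for color, n in counts.items()}
percentages = {color: round(100 * p, 1) for color, p in proportions.items()}

# Print one row per color: frequency, proportion, percentage.
for color in counts:
    print(f"{color:<7}{counts[color]:>4}{proportions[color]:>8.2f}{percentages[color]:>7.1f}%")
```

The proportions always sum to 1 (and the percentages to 100, up to rounding), which is a quick check that the table was built correctly.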
Two-way Tables
Statisticians use two-way tables and segmented bar charts to examine the relationship
between two categorical variables.
Entries in the cells of a two-way table can be displayed as frequency counts or as relative
frequencies (just like a one-way table). Or they can be displayed graphically as a
segmented bar chart.
6. Example: The two-way table shows the favorite leisure activities for 50 adults: 20 men
and 30 women. Because entries in the table are frequency counts, the
table is a frequency table.
          Dance   Sports   Total
Male          4       16      20
Female       18       12      30
Total        22       28      50
7. Entries in the "Total" row and "Total" column are called marginal frequencies or
the marginal distribution. Entries in the body of the table are called joint
frequencies.
If we looked only at the marginal frequencies in the Total row, we might conclude
that the two activities had roughly equal appeal. Yet, the joint frequencies show a
strong preference for dance among women, and little interest in dance among
men.
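As a minimal sketch, the marginal frequencies in item 7 can be recomputed from the joint frequencies alone. The variable names are illustrative and not part of the source.

```python
# Joint frequencies from the leisure-activity table (rows: gender, columns: activity).
joint = {
    "Male": {"Dance": 4, "Sports": 16},
    "Female": {"Dance": 18, "Sports": 12},
}
activities = ["Dance", "Sports"]

# Marginal frequencies: row totals, column totals, and the grand total.
row_totals = {gender: sum(row.values()) for gender, row in joint.items()}
col_totals = {a: sum(joint[gender][a] for gender in joint) for a in activities}
grand_total = sum(row_totals.values())

print(row_totals)   # {'Male': 20, 'Female': 30}
print(col_totals)   # {'Dance': 22, 'Sports': 28}
print(grand_total)  # 50
```

The column totals (22 and 28) look roughly balanced, while the joint counts (4 versus 18 for dance) do not, which is the contrast the paragraph above describes.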
8. We can also display relative frequencies in two-way tables. Relative frequencies
computed from the grand total are joint relative frequencies (body of the table) and
marginal relative frequencies ("Total" row and column). Relative frequencies computed
from a row total or a column total are called conditional relative frequencies or the
conditional distribution. The tables below show preferences for leisure activities
relative to the whole table, first as fractions and then as decimals. (Slide 6)
          Dance   Sports   Total
Male       4/50    16/50   20/50
Female    18/50    12/50   30/50
Total     22/50    28/50   50/50

          Dance   Sports   Total
Male        .08      .32     .40
Female      .36      .24     .60
Total       .44      .56    1.00
9. Two-way tables can show relative frequencies for the whole table, like the one
above. The following tables show relative frequencies for rows: (Slide 7)
          Dance   Sports   Total
Male       4/20    16/20   20/20
Female    18/30    12/30   30/30
Total     22/50    28/50   50/50

          Dance   Sports   Total
Male        .20      .80    1.00
Female      .60      .40    1.00
Total       .44      .56    1.00
The tables below show relative frequencies for columns.
          Dance   Sports   Total
Male       4/22    16/28   20/50
Female    18/22    12/28   30/50
Total     22/22    28/28   50/50

          Dance   Sports   Total
Male        .18      .57     .40
Female      .82      .43     .60
Total      1.00     1.00    1.00
10. Each type of relative frequency table makes a different contribution to
understanding the relationship between gender and preferences for leisure
activities. For example, the "Relative Frequency for Rows" table most clearly shows
the probability that each gender will prefer a particular leisure activity. For
instance, it is easy to see that the probability that a man will prefer dance is 20%;
the probability that a woman will prefer dance is 60%; the probability that a man
will prefer sports is 80%; and so on.
The "Relative Frequency for Columns" table shows the gender makeup of the people who
chose each activity. Among those who prefer dance, far more are women than men (82% to
18%), whereas among those who prefer sports the split is closer (57% men and 43% women).
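The row and column relative frequencies discussed in items 9 and 10 can be reproduced with a short Python sketch. The function names row_relative and column_relative are illustrative choices, not from the source.

```python
# Joint frequencies from the leisure-activity table.
joint = {
    "Male": {"Dance": 4, "Sports": 16},
    "Female": {"Dance": 18, "Sports": 12},
}

def row_relative(table):
    """Divide each cell by its row total (conditional on the row category)."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: n / row_total for col, n in cells.items()}
    return result

def column_relative(table):
    """Divide each cell by its column total (conditional on the column category)."""
    columns = list(next(iter(table.values())))
    col_totals = {col: sum(table[row][col] for row in table) for col in columns}
    return {
        row: {col: table[row][col] / col_totals[col] for col in columns}
        for row in table
    }

by_row = row_relative(joint)
by_column = column_relative(joint)

print(round(by_row["Male"]["Dance"], 2))       # 0.2
print(round(by_column["Female"]["Dance"], 2))  # 0.82
```

The same cell (for example, Female and Dance) gives 0.60 by row and 0.82 by column because the two tables answer different questions: what share of women prefer dance, versus what share of dance fans are women.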
11. The information can also be displayed in a segmented bar graph. The following
pictures show the segmented (stacked) bar graphs created in Excel:
Possible association based on Conditional Relative Frequencies (Slide 8)
Two categorical variables are associated if the row conditional relative
frequencies (or column conditional relative frequencies) are different for the rows (or
columns) of the table. For example, if the distribution of leisure activities chosen
by females is different from the distribution chosen by males, then
gender and leisure activity are associated. This difference means that
knowing the gender of a person in the sample tells you something about that person's
activity preference.
The evidence of an association is strongest when the conditional relative
frequencies are quite different. If the conditional relative frequencies are nearly
equal for all categories, then there is probably not an association between
variables.
Examine the conditional relative frequencies in the two-way table of conditional
relative frequencies. Note that for each activity, the conditional relative
frequencies are different for females and males.
Relative frequencies for the whole table:
          Dance   Sports   Total
Male        .08      .32     .40
Female      .36      .24     .60
Total       .44      .56    1.00

Relative frequencies for columns:
          Dance   Sports   Total
Male        .18      .57     .40
Female      .82      .43     .60
Total      1.00     1.00    1.00
1. For what activity would you say that the conditional relative frequencies
for females and males are very different?
2. For what activities are the conditional relative frequencies nearly equal
for males and females?
3. Suppose a person is selected at random from the people who completed
the survey. If you had to predict which activity this person selected,
would it be helpful to know the person’s gender? Explain your answer.
4. Is there evidence of an association between gender and favorite activity?
Explain why or why not.
Association and Cause and Effect
The following example will introduce the important idea that you should not infer a
cause-and-effect relationship from an association between two categorical variables.
Students were given the opportunity to prepare for a college placement test in
mathematics by taking a review course. Not all students took advantage of this
opportunity. The following results were obtained from a random sample of
students who took the placement test: (Slide 9)
                              Placed in   Placed in   Placed in
                              Math 200    Math 100    Math 50     Total
Took Review Course               40          13           7         60
Did not take Review Course       10          15          15         40
Total                            50          28          22        100
Read through the example with participants.
Pose the following questions to the class. Let participants discuss their ideas.
• Do you think there is an association between taking the review course and a
  student's placement in a math class?
• If you knew that a student took the review course, would it make a difference in
  which math course you predicted they were placed in?
• Do you think taking the review course caused a student to place into a higher
  math course?
Let participants work in groups of two to construct a row conditional relative frequency
table.
                              Placed in        Placed in        Placed in
                              Math 200         Math 100         Math 50          Total
Took review course            40/60 ≈ 0.667    13/60 ≈ 0.217    7/60 ≈ 0.117     60/60 = 1.000
Did not take review course    10/40 = 0.250    15/40 = 0.375    15/40 = 0.375    40/40 = 1.000
Total                         50/100 = 0.500   28/100 = 0.280   22/100 = 0.220   100/100 = 1.000
1. Based on the conditional relative frequencies, is there evidence of an
association between whether a student takes the review course and the
math course in which the student was placed? Explain your answer.
(Slide 11)
There is evidence of association as the conditional relative frequencies
are noticeably different for those students who took the course and those
students who did not take the course.
2. Looking at the conditional relative frequencies, the proportion of
students who placed into Math 200 is much higher for those who took
the review course than for those who did not. One possible explanation
is that taking the review course caused improvement in placement test
scores. What is another possible explanation?
Another possible explanation is that students who took the review course
are more interested in mathematics or were already better prepared in
mathematics and, therefore, performed better on the mathematics
placement test.
3. Do you think that this is an example of a cause-and-effect relationship?
Be sure that participants understand that even though there is an association, this does
not mean that there is a cause-and-effect relationship.
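The row conditional relative frequencies for the review-course example can be checked with a short Python sketch. The gap computed at the end is only an informal illustration of "noticeably different" and is an assumption of this sketch, not a method from the source.

```python
# Counts from the placement-test table; columns are Math 200, Math 100, Math 50.
courses = ["Math 200", "Math 100", "Math 50"]
counts = {
    "Took review course": [40, 13, 7],
    "Did not take review course": [10, 15, 15],
}

# Row conditional relative frequencies: each count divided by its row total.
conditional = {
    group: [round(n / sum(row), 3) for n in row]
    for group, row in counts.items()
}

for group, freqs in conditional.items():
    print(group, dict(zip(courses, freqs)))

# Informal comparison (an illustration, not a formal test): gap in the
# proportion placed in Math 200 between the two groups.
gap = conditional["Took review course"][0] - conditional["Did not take review course"][0]
print(round(gap, 3))
```

A gap of roughly 0.42 is evidence of an association in this sample. It says nothing about why the two groups differ, so it cannot by itself support a cause-and-effect claim.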
Now consider the following statistical study:
Fifty students were selected at random from students at a large middle school.
Each of these students was classified according to sugar consumption (high or
low) and exercise level (high or low). The resulting data are summarized in the
following frequency table. (Slide 12)
                              Exercise Level
                              High    Low    Total
Sugar Consumption   High        14     18       32
                    Low         14      4       18
                    Total       28     22       50
1. Calculate the row conditional relative frequencies, and display them in a
row conditional relative frequency table.
                              Exercise Level
                              High               Low                Total
Sugar Consumption   High      14/32 = 0.4375     18/32 = 0.5625     32/32 = 1.000
                    Low       14/18 ≈ 0.778      4/18 ≈ 0.222       18/18 = 1.000
                    Total     28/50 = 0.56       22/50 = 0.44       50/50 = 1.000
2. Is there evidence of an association between sugar consumption category
and exercise level? Support your answer using conditional relative
frequencies.
There is a noticeable difference in the conditional relative frequencies
based on whether a person selected had high or low sugar consumption.
The differences suggest an association between sugar consumption and
exercise level.
3. Do you think it is reasonable to conclude that high sugar consumption is
the cause of the observed differences in the conditional relative
frequencies? What other explanations could explain a difference in the
conditional relative frequencies? Explain your answer.
Participants are encouraged to think about their responses to this exercise
based on their understanding that the results should not be interpreted as
a cause-and-effect relationship. Participants might mention other factors, such as
eating habits and lifestyle.
4. It is possible that students in the above study who are more health conscious tend
to be in the low sugar consumption category and also tend to be in the high
exercise level category.
5. It is not possible to determine if the difference in the conditional relative
frequencies is due to a cause-and-effect relationship.
6. The data summarized in this study were collected in an observational study. In an
observational study, any observed differences in conditional relative frequencies
might be explained by some factor other than the variables examined in the study.
With an observational study, evidence of an association may exist, but it is not
possible to conclude that there is a cause-and-effect relationship.
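For the sugar-consumption study, a minimal Python sketch can compare each row's conditional distribution with the marginal distribution of exercise level (the "Total" row). If the two variables were not associated, each row would be close to that marginal distribution. The helper name proportions is illustrative, not from the source.

```python
# Counts from the sugar-consumption study; columns are exercise level High, Low.
counts = {
    "High sugar": [14, 18],
    "Low sugar": [14, 4],
}

def proportions(row):
    """Return each count as a proportion of the row total."""
    total = sum(row)
    return [n / total for n in row]

# Row conditional relative frequencies.
conditional = {group: proportions(row) for group, row in counts.items()}

# Marginal distribution of exercise level (the "Total" row of the table).
column_totals = [sum(col) for col in zip(*counts.values())]
marginal = proportions(column_totals)

for group, freqs in conditional.items():
    print(group, [round(f, 4) for f in freqs])
print("Total", [round(f, 4) for f in marginal])
```

Both rows differ from the marginal split of 0.56 and 0.44, which supports an association. Because the data come from an observational study, the comparison cannot show what causes the difference.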