AP Statistics: Chapter 3 Categorical Data

MAKE A PICTURE!
First, create a frequency table
Example: number of students at CB South in each grade:
Grade    10     11     12     TOTAL
Count    534    552    515    1601
Proportion = decimal: .30, .05
Percent = %: 30%, 5%
Frequency = # of things (count)
Relative frequency = % of things
Distribution (of a variable) - shows the values of the variable and how often each value occurs in the sample
Examples: bar chart, pie chart, histogram, stemplot, etc.
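To make the frequency vs. relative frequency distinction concrete, here is a minimal Python sketch that converts the CB South grade counts from the frequency table above into proportions and percents. The dictionary layout and the helper name `relative_frequencies` are illustrative assumptions, not part of the original notes.

```python
# Frequencies (counts) from the CB South example: grade -> number of students.
counts = {"10": 534, "11": 552, "12": 515}

def relative_frequencies(freq):
    """Return each category's share of the total as a proportion (0 to 1)."""
    total = sum(freq.values())
    return {category: count / total for category, count in freq.items()}

proportions = relative_frequencies(counts)
for grade, p in proportions.items():
    # A proportion is a decimal; a percent is the same number times 100.
    print(f"Grade {grade}: proportion {p:.4f}, percent {p:.2%}")
```

The proportions of a distribution always add up to 1 (100%), which is a quick check on the arithmetic.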
Categorical Distributions:
1. Bar Chart
[Figure: two bar charts with grade (9th, 10th, 11th, 12th) on the x-axis. "Percent of Students Taking AP Exam": y-axis is percent (relative frequency), 0.0% to 50.0%. "Number of Students Taking AP Exam": y-axis is # (frequency), 0 to 80.]
Notice the spaces in between bars.
2. Pie Chart
[Figure: pie chart, "Percent of Students Taking AP Exam by Grade Level": 9th 6%, 10th 19%, 11th 32%, 12th 43%.]
Be sure to use labels and percents!
3. Contingency tables (aka 2-Way tables)
            Frosh   Soph   Junior   Senior   Total
Male
Female
Total
Rows: gender. Columns: grade. Inner entries = cells; the Total row and Total column = margins.
Identify:
- Row variable: gender
- Column variable: grade
- Values of the variable: the different rows/columns
- Total (n): bottom right of chart
- # of Cells: 8 (don't count totals)
- Totals: margins
Example: Hospitals
            Hospital A   Hospital B   Total
Died             63           16         79
Survived       2037          784       2821
Total          2100          800       2900
- What percent of people died?
P(D) = 79/2900 ≈ .0272 = 2.72%
Notation:
Probability: P(event)
Given/Of: P(A | B) = probability of A given B
And: P(A ∩ B) (overlap)
Of those people that went to Hospital A, what percent died?
P(D | A) = 63/2100 = .03 = 3%
Given that someone went to Hospital B, what is the chance that they died?
P(D | B) = 16/800 = .02 = 2%
Of those people who died, what percent went to Hospital A?
P(A | D) = 63/79 ≈ .7975 = 79.75%
What percent of people died and went to Hospital B?
P(D ∩ B) = 16/2900 ≈ .0055 = 0.55%
Or: P(A ∪ B)
What percent of people survived or went to Hospital B?
P(S ∪ B) = (2821 + 800 - 784)/2900 = 2837/2900 ≈ .9783 = 97.83%
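The hospital calculations can be reproduced with a short Python sketch. The nested-dictionary layout and the helper names `row_total` and `col_total` are assumptions made for illustration; only the counts come from the example. The "or" line computes survived or Hospital B, which is the union that matches the 2837/2900 figure in the notes.

```python
# Counts from the hospital example: counts[outcome][hospital].
counts = {
    "Died": {"A": 63, "B": 16},
    "Survived": {"A": 2037, "B": 784},
}

def row_total(outcome):
    """Total for one outcome across both hospitals (a row margin)."""
    return sum(counts[outcome].values())

def col_total(hospital):
    """Total for one hospital across both outcomes (a column margin)."""
    return sum(row[hospital] for row in counts.values())

n = sum(row_total(outcome) for outcome in counts)  # grand total

p_died = row_total("Died") / n                            # P(D)
p_died_given_a = counts["Died"]["A"] / col_total("A")     # P(D | A)
p_died_given_b = counts["Died"]["B"] / col_total("B")     # P(D | B)
p_a_given_died = counts["Died"]["A"] / row_total("Died")  # P(A | D)
p_died_and_b = counts["Died"]["B"] / n                    # P(D and B)
# "Or" adds both margins, then subtracts the overlap so it is not counted twice.
p_survived_or_b = (
    row_total("Survived") + col_total("B") - counts["Survived"]["B"]
) / n

results = [
    ("P(D)", p_died),
    ("P(D | A)", p_died_given_a),
    ("P(D | B)", p_died_given_b),
    ("P(A | D)", p_a_given_died),
    ("P(D and B)", p_died_and_b),
    ("P(S or B)", p_survived_or_b),
]
for label, p in results:
    print(f"{label}: {p:.4f} = {p:.2%}")
```

Note how the denominator changes: "given/of" questions divide by a row or column total, while "and" and "or" questions divide by the grand total.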
2 types of Distributions for Categorical Variables
1) MARGINAL DISTRIBUTIONS
- How to make: Convert totals into percentages

Example: Hair color vs. Gender
          Brown   Blonde   Black   Red   Total
MALE        26      24       10     3      63
FEMALE      20      35       12     6      73
TOTALS      46      59       22     9     136
- Find the marginal distribution for the HAIR COLOR variable
Brown:  46/136 = 33.82%
Blonde: 59/136 = 43.38%
Black:  22/136 = 16.18%
Red:    9/136 = 6.62%
- Find the marginal distribution for the GENDER variable
Male:   63/136 = 46.32%
Female: 73/136 = 53.68%
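The "convert totals into percentages" step can be sketched in Python as follows. The table layout and the function name `marginal_distributions` are hypothetical choices for this illustration; the counts are the hair color vs. gender counts from the example.

```python
# Hair color vs. gender counts from the example (margins are computed, not stored).
table = {
    "Male": {"Brown": 26, "Blonde": 24, "Black": 10, "Red": 3},
    "Female": {"Brown": 20, "Blonde": 35, "Black": 12, "Red": 6},
}

def marginal_distributions(table):
    """Return (row_marginal, column_marginal) as percents of the grand total."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    n = sum(row_totals.values())  # grand total

    def to_percent(totals):
        return {key: 100 * value / n for key, value in totals.items()}

    return to_percent(row_totals), to_percent(col_totals)

gender_marginal, hair_marginal = marginal_distributions(table)
for name, dist in [("Gender", gender_marginal), ("Hair color", hair_marginal)]:
    print(name, {key: f"{pct:.2f}%" for key, pct in dist.items()})
```

Every marginal percentage uses the same denominator, the grand total of 136.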

MAKE A PICTURE! BAR CHART
[Figure: two bar charts of the marginal distributions (the margins). Left: % (0 to 50) by Hair Color (Brown, Blonde, Black, Red). Right: % (42 to 56) by Gender (Male, Female).]
2) CONDITIONAL DISTRIBUTIONS
- Look at … one variable
- Then look at … each value of the variable individually
- Break down … each value into its pieces
- ALWAYS … in %
Example: Hair Color vs. Gender
          Brown   Blonde   Black   Red   Total
MALE        26      24       10     3      63
FEMALE      20      35       12     6      73
TOTALS      46      59       22     9     136
- Find the conditional distribution for the HAIR COLOR variable (break each hair color down by gender)
          Male              Female
Brown:    26/46 = 56.52%    20/46 = 43.48%
Blonde:   24/59 = 40.68%    35/59 = 59.32%
Black:    10/22 = 45.45%    12/22 = 54.55%
Red:      3/9 = 33.33%      6/9 = 66.67%
- Find the conditional distribution for the GENDER variable (break each gender down by hair color)
          Brown             Blonde            Black             Red
Male:     26/63 = 41.27%    24/63 = 38.10%    10/63 = 15.87%    3/63 = 4.76%
Female:   20/73 = 27.40%    35/73 = 47.95%    12/73 = 16.44%    6/73 = 8.22%
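A conditional distribution divides each cell by the total of its own row or column rather than by the grand total. The Python sketch below shows one way to do this; the helper names `conditional_within_rows` and `transpose` are assumptions for illustration. In the notes' wording, breaking each hair color down by gender is the conditional distribution "for the HAIR COLOR variable", and breaking each gender down by hair color is the one "for the GENDER variable".

```python
# Hair color vs. gender counts from the example.
table = {
    "Male": {"Brown": 26, "Blonde": 24, "Black": 10, "Red": 3},
    "Female": {"Brown": 20, "Blonde": 35, "Black": 12, "Red": 6},
}

def conditional_within_rows(table):
    """For each row, give the percent of that row's total found in each column."""
    return {
        row: {col: 100 * count / sum(cells.values()) for col, count in cells.items()}
        for row, cells in table.items()
    }

def transpose(table):
    """Swap rows and columns so the same helper works for the other variable."""
    flipped = {}
    for row, cells in table.items():
        for col, count in cells.items():
            flipped.setdefault(col, {})[row] = count
    return flipped

# Each gender broken down by hair color (denominators 63 and 73).
hair_given_gender = conditional_within_rows(table)
# Each hair color broken down by gender (denominators 46, 59, 22, 9).
gender_given_hair = conditional_within_rows(transpose(table))

for group, dist in {**hair_given_gender, **gender_given_hair}.items():
    print(group, {key: f"{pct:.2f}%" for key, pct in dist.items()})
```

Each printed distribution adds up to 100%, which is why each group can be drawn as one full bar.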

Represented visually: SEGMENTED (or STACKED) BAR GRAPH
o Each bar = 100%
o Values of variable on the x-axis
o Bars are segmented into parts of each value
[Figure: two segmented bar graphs with a 0% to 100% y-axis. Left: one bar per Hair Color (Brown, Blonde, Black, Red), segmented into Male and Female. Right: one bar per Gender (Male, Female), segmented into Brown, Blonde, Black, and Red.]
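As a rough illustration of how a segmented bar is built, the following Python sketch draws a text version: one bar per gender, each exactly 50 characters wide (standing in for 100%), segmented by hair color. The function name `segmented_bar`, the letter symbols, and the 50-character width are all assumptions made for this sketch; a real graph would normally be drawn by hand or with plotting software.

```python
# Hair color vs. gender counts from the example.
table = {
    "Male": {"Brown": 26, "Blonde": 24, "Black": 10, "Red": 3},
    "Female": {"Brown": 20, "Blonde": 35, "Black": 12, "Red": 6},
}

def segmented_bar(counts, symbols, width=50):
    """Draw one 100% bar as text; each segment's length is proportional to its share."""
    total = sum(counts.values())
    bar, running, drawn = "", 0, 0
    for category, count in counts.items():
        running += count
        end = round(width * running / total)  # right edge of this segment
        bar += symbols[category] * (end - drawn)
        drawn = end
    return bar

# Hypothetical one-letter symbols for each hair color.
symbols = {"Brown": "B", "Blonde": "L", "Black": "K", "Red": "R"}
for gender, cells in table.items():
    print(f"{gender:<7}|{segmented_bar(cells, symbols)}|")
```

Because every bar has the same length, the eye compares shares rather than counts, which is the point of making each bar 100%.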
Independence: When one variable does not affect the other variable
How do we tell independence? Independence exists when the conditional distributions look the
same across all values of the variable (when the sections of the bars look approximately the same).
As a rule of thumb, corresponding percentages generally differ by less than 5 percentage points.
When categorical variables are dependent, they are said to be associated.
[Figure: example segmented bar graphs, one labeled "Independent" and one labeled "Dependent".]
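The rule of thumb above can be sketched in Python by comparing the conditional distributions row by row and reporting the largest gap. The function names `max_gap_in_points` and `looks_independent` are hypothetical, and the 5-point threshold is only the informal guideline from these notes, not a formal statistical test.

```python
# Hair color vs. gender counts from the earlier example.
table = {
    "Male": {"Brown": 26, "Blonde": 24, "Black": 10, "Red": 3},
    "Female": {"Brown": 20, "Blonde": 35, "Black": 12, "Red": 6},
}

def max_gap_in_points(table):
    """Largest gap, in percentage points, between the rows' conditional distributions."""
    conditionals = [
        {col: 100 * count / sum(cells.values()) for col, count in cells.items()}
        for cells in table.values()
    ]
    columns = list(conditionals[0])
    return max(
        max(dist[col] for dist in conditionals) - min(dist[col] for dist in conditionals)
        for col in columns
    )

def looks_independent(table, threshold=5.0):
    """Informal rule-of-thumb check; the threshold is an assumption from the notes."""
    return max_gap_in_points(table) < threshold

gap = max_gap_in_points(table)
print(f"largest gap: {gap:.2f} points, looks independent: {looks_independent(table)}")
```

For the hair color data the largest gap is in the Brown category (41.27% of males vs. 27.40% of females), so by this guideline hair color and gender look associated in this sample.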
AP Stat- worksheet 3A- Categorical Variables practice
In a survey of adult Americans, people were asked to indicate their age and to categorize their political
preference (liberal, moderate, conservative). The results are as follows:
            Liberal   Moderate   Conservative   Total
under 30       83        140          73          296
30 - 50       119        280         161          560
over 50        88        284         214          586
total         290        704         448         1442
1. What are the row and column variables?
2. What percent of Liberals are under 30?
3. Of those over 50, what percent are Liberals?
4. Of those that are moderates, what percent are 30-50?
5. What percent of respondents are moderate and under 30?
6. Calculate the marginal distribution for the AGE variable. Write these down. Then make a bar
graph of the marginal distribution for age.
7. Calculate the marginal distribution for the PREFERENCE variable. Write these down. Then
make a bar graph of this marginal distribution.
8. Calculate the conditional distribution of the AGE variable. Write these down. Then make a
segmented bar graph of this conditional distribution.
9. Calculate the conditional distribution of the PREFERENCE variable. Write these down. Then
make a segmented bar graph of this conditional distribution.
10. Are the two variables independent?
AP Stat- worksheet 3B- Categorical Variable practice
A 4-year study of men more than 70 years old, reported in The New York Times, analyzed blood
cholesterol and noted how many men with different cholesterol levels suffered nonfatal or fatal
heart attacks.
Nonfatal heart attacks   Fatal heart attacks
Low cholesterol   Medium cholesterol   High cholesterol
29   17   18   19   20   9
a. Calculate the marginal distribution for cholesterol level and make a bar graph.
b. Calculate the marginal distribution for severity of heart attack and make a bar graph.
c. Calculate three conditional distributions for the three levels of cholesterol and make a stacked bar
graph.
d. Calculate the conditional distributions for the type of heart attack and make a stacked bar graph.
e. Are the two variables independent?