ECO 138 – Section 1

DID YOU SIGN UP FOR MY STAT LAB?
1. Yes
2. No
ANNOUNCEMENTS

Homework #1 due Sunday at 10:00 pm

Quiz #1 in class August 28th

Part 1 of the Data Project due September 4th
DATA PROJECT

Objective: Ask a question and try to answer it using statistics.
Step 1: DATA COLLECTION – Due Wednesday, September 4th, in class
Step 2: DESCRIPTION OF DATA – Due Monday, September 16th, in class
Step 3: QUESTIONS – Due Monday, October 28th, in class
Step 4: FINAL DATA PROJECT – Due by Thursday, December 5th, 5 PM

COLLECT DATA






Bureau of Labor Statistics (BLS): http://bls.gov/
Energy Information Administration (EIA): http://www.eia.gov/
Bureau of Economic Analysis (BEA): http://www.bea.gov/
Environmental Protection Agency (EPA): http://epa.gov/
U.S. Census Bureau: http://www.census.gov/
Google Data: http://www.google.com/publicdata/directory
REVIEW FROM LAST CLASS


A categorical (or qualitative) variable names categories
and answers questions about how cases fall into those
categories.
A quantitative variable is a measured variable (with
units) that answers questions about the quantity of
what is being measured.

Quantitative examples: income ($), height (inches), weight
(pounds)
REVIEW FROM LAST CLASS


Ordinal variables: there are no natural units for a variable
such as interest in teaching, but the order of
the values reveals information.
Identifier variables are categorical variables with
exactly one individual in each category.
HOMEWORK PROBLEM



We want to study the law of demand and whether it applies to hot dogs.
Compile a list of 20 hot dogs, giving the brand, price, size in ounces, type (beef, pork, turkey, vegetarian), and overall taste rating (good, fair, bad).
Implement the survey on Monday and Wednesday at 5 different grocery stores and also collect the daily sales.
WHAT TYPE OF VARIABLE IS BRAND?
1. Categorical
2. Quantitative
3. Ordinal
4. Identifier
WHAT TYPE OF VARIABLE IS PRICE?
1. Categorical
2. Quantitative
3. Ordinal
4. Identifier
WHAT TYPE OF VARIABLE IS OVERALL
TASTE RATING (GOOD, FAIR, BAD)?
1. Categorical
2. Quantitative
3. Ordinal
4. Identifier
WHAT TYPE OF VARIABLE IS DAILY SALES?
1. Categorical
2. Quantitative
3. Ordinal
4. Identifier
WHERE, WHEN, AND HOW
When and Where give us some nice information about the context.
Example: Values recorded at a large public university may mean something different than similar values recorded at a small private college.
WHERE, WHEN, AND HOW
Class Grade of Econ 101 classes:
Class 1 – 2.56
Class 2 – 3.34

Where – Washington State University
When – during the fall and spring semesters

WHERE, WHEN, AND HOW (CONT.)

How the data are collected can make the
difference between insight and nonsense.

Example: Results from voluntary Internet surveys are often useless.

Example: Data collection for ‘Who will win the Republican Primary?’
  Survey ISU students on campus
  Run a Facebook survey
  Rasmussen Reports national telephone survey

IDENTIFY THE WHO IN THE FOLLOWING
DATASET?



Are physically fit people less likely to die of
cancer?
Suppose an article in a sports medicine journal
reported results of a study that followed 22,563
men aged 30 to 87 for 5 years.
The physically fit men had a 57% lower risk of
death from cancer than the least fit group.
WHO ARE THEY STUDYING?
1. The cause of death for the 22,563 men in the study
2. The fitness level of the 22,563 men in the study
3. The age of each of the 22,563 men in the study
4. The 22,563 men in the study
ARE FIT PEOPLE LESS LIKELY TO DIE OF CANCER?
WHO IS THE POPULATION OF INTEREST?
1. All people
2. All men who exercise
3. All men who die of cancer
4. All men
CHAPTER 3
Displaying and Describing Categorical Data

Two datasets:
  Students currently in my class
  Passengers on the Titanic

METHODS OF DISPLAYING DATA
Frequency Table
 Relative Frequency table
 Bar Chart
 Relative Frequency bar chart
 Pie Chart
 Contingency table
 Contingency tables and Conditional Distributions
 Segmented Bar charts

DATA ON STUDENTS
            Gender   Year in School   Major       My Class
Kim B.      Female   Sr.              Elem. Ed.   ECO 138 Section 1
Stacie M.   Female   So.              Math        ECO 138 Section 1
Tom A.      Male     Gr.              Econ        ECO 435 Section 1
Tim B.      Male     Gr.              Econ        ECO 435 Section 1
Kelly Y.    Male     Gr.              Econ        ECO 435 Section 2
…
FREQUENCY TABLES: MAKING PILES
We can “pile” the data by counting the number of data values in each category of interest.
We can organize these counts into a frequency table, which records the totals and the category names.

ECO 138
Male      22
Female    11
Total     33
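
The "piling" step can be sketched in a few lines of Python. This is only an illustration: the list `genders` is a hypothetical stand-in for the raw class roster, built so that it matches the ECO 138 counts in the table (22 male, 11 female).

```python
from collections import Counter

# Hypothetical raw data: one gender value per ECO 138 student,
# constructed to match the counts in the frequency table.
genders = ["Male"] * 22 + ["Female"] * 11

# "Pile" the values: count how many data values fall in each category.
frequency = Counter(genders)

# Print the frequency table with a total row.
for category, count in frequency.items():
    print(f"{category:<8}{count:>4}")
print(f"{'Total':<8}{sum(frequency.values()):>4}")
```

`Counter` does the counting; the loop only formats the piles as a table.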
FREQUENCY TABLES: MAKING PILES
(CONT.)

A relative frequency table is similar, but gives the percentages (instead of counts) for each category.

ECO 138
Male      22 / 33 * 100 = 66.67%
Female    11 / 33 * 100 = 33.33%
Total     33 / 33 * 100 = 100%
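
The same arithmetic can be written as a short Python sketch. The dictionary name `counts` is an assumption made for illustration; the numbers are the ECO 138 counts from the frequency table.

```python
# Counts from the ECO 138 frequency table.
counts = {"Male": 22, "Female": 11}
total = sum(counts.values())

# Relative frequency: each count divided by the total, times 100.
relative = {category: count / total * 100 for category, count in counts.items()}

for category, pct in relative.items():
    print(f"{category:<8}{pct:6.2f}%")
print(f"{'Total':<8}{sum(relative.values()):6.2f}%")
```

The percentages always sum to 100% (up to rounding), which is a quick check that no category was left out.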
BAR CHARTS



A bar chart displays the distribution of a categorical
variable, showing the counts for each category next to
each other for easy comparison.
A bar chart stays true
to the area principle.
Thus, a better display
for the ship data is:
Slide
3- 23
BAR CHARTS (CONT.)



A relative frequency bar chart displays the relative
proportion of counts for each category.
A relative frequency bar chart also stays true to the
area principle.
Replacing counts
with percentages
in the ship data:
WHAT YEAR IN SCHOOL ARE YOU?
1. Freshman
2. Sophomore
3. Junior
4. Senior
[Poll chart: response shares of 61%, 17%, 17%, and 6%; which share belongs to which option is not recoverable.]
PIE CHARTS



When you are interested in parts of the whole, a pie chart might be your display of choice.
Pie charts show the whole group of cases as a circle.
They slice the circle into pieces whose size is proportional to the fraction of the whole in each category.
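
As a rough illustration of "proportional to the fraction of the whole", the sketch below converts counts into slice angles. It reuses the ECO 138 gender counts from the frequency table (not the ship data), and the variable names are assumptions.

```python
# ECO 138 gender counts from the frequency table.
counts = {"Male": 22, "Female": 11}
total = sum(counts.values())

# Each slice's angle is its fraction of the whole times 360 degrees.
angles = {category: count / total * 360 for category, count in counts.items()}

for category, angle in angles.items():
    print(f"{category:<8}{angle:6.1f} degrees")
```

With these counts, the Male slice covers two thirds of the circle (240 degrees) and the Female slice one third (120 degrees).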
METHODS OF DISPLAYING DATA
Frequency Table (How much?)
Relative Frequency table (What percentage?)
Bar Chart (How much?)
Relative Frequency bar chart (What percentage?)
Pie Chart (What percentage? Or How much?)
Contingency table and Marginal Distributions
Contingency tables and Conditional Distributions

CONTINGENCY TABLES


A contingency table allows us to look at two
categorical variables together.
It shows how individuals are distributed along each
variable, contingent on the value of the other
variable.
 Example: we can examine the class of ticket and
whether a person survived the Titanic:
CONTINGENCY TABLES (CONT.)

Each cell of the table gives the count for a combination of values of the two variables.
For example, the second cell in the crew column tells us that 673 crew members died when the Titanic sank.
CONTINGENCY TABLES
The two variables in this contingency table are
gender and class/section number.
          ECO 138 – Section 1   ECO 435 – Section 1   Total
Male              22                     4              26
Female            11                     3              14
Total             33                     7              40
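
A contingency table like this one can be built by counting pairs of values. The sketch below is illustrative only: `records` is a hypothetical list of (gender, class) pairs constructed to match the counts in the table.

```python
from collections import Counter

# Hypothetical records, one (gender, class) pair per student,
# constructed to match the counts in the contingency table.
records = (
    [("Male", "ECO 138 Section 1")] * 22
    + [("Male", "ECO 435 Section 1")] * 4
    + [("Female", "ECO 138 Section 1")] * 11
    + [("Female", "ECO 435 Section 1")] * 3
)

cells = Counter(records)                                   # one count per cell
row_totals = Counter(gender for gender, _ in records)      # right margin
col_totals = Counter(section for _, section in records)    # bottom margin
grand_total = len(records)

for gender in ("Male", "Female"):
    row = [cells[(gender, section)] for section in col_totals]
    print(gender, row, row_totals[gender])
print("Total", list(col_totals.values()), grand_total)
```

The cell counts come from counting pairs; the margins come from counting each variable on its own.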
CONTINGENCY TABLES (CONT.)


The margins of the table, both on the right and on the
bottom, give totals and the frequency distributions for each
of the variables.
Each frequency distribution is called a marginal distribution
of its respective variable.
MARGINAL DISTRIBUTIONS
The two variables in this contingency table are
gender and class/section number.
          ECO 138 – Section 1      ECO 435 – Section 1     Total
Male      22/40*100 = 55%          4/40*100 = 10%          26/40*100 = 65%
Female    11/40*100 = 27.5%        3/40*100 = 7.5%         14/40*100 = 35%
Total     33/40*100 = 82.5%        7/40*100 = 17.5%        40/40*100 = 100%
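
The marginal percentages in the Total row and Total column can be reproduced with a short sketch. The nested dictionary `table` is an assumed representation of the contingency table; every value is divided by the grand total of 40.

```python
# Contingency table counts: table[gender][class].
table = {
    "Male":   {"ECO 138 Section 1": 22, "ECO 435 Section 1": 4},
    "Female": {"ECO 138 Section 1": 11, "ECO 435 Section 1": 3},
}
grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of gender: row totals over the grand total.
gender_marginal = {
    gender: sum(row.values()) / grand_total * 100
    for gender, row in table.items()
}

# Marginal distribution of class: column totals over the grand total.
sections = list(next(iter(table.values())))
class_marginal = {
    section: sum(table[gender][section] for gender in table) / grand_total * 100
    for section in sections
}

print(gender_marginal)
print(class_marginal)
```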
CONDITIONAL DISTRIBUTIONS

A conditional distribution shows the distribution
of one variable for just the individuals who
satisfy some condition on another variable.

The following is the conditional distribution of ticket
Class, conditional on having survived:
CONDITIONAL DISTRIBUTIONS (CONT.)

The following is the conditional distribution of ticket
Class, conditional on having perished:
CONDITIONAL DISTRIBUTIONS –
CONDITIONED UPON GENDER
The two variables in this contingency table are
gender and class/section number.
          ECO 138 – Section 1      ECO 435 – Section 1     Total
Male      22/26*100 = 84.6%        4/26*100 = 15.4%        26/26*100 = 100%
CONDITIONAL DISTRIBUTIONS –
CONDITIONED UPON GENDER
The two variables in this contingency table are
gender and class/section number.
          ECO 138 – Section 1      ECO 435 – Section 1     Total
Female    11/14*100 = 78.6%        3/14*100 = 21.4%        14/14*100 = 100%
CONDITIONAL DISTRIBUTIONS –
CONDITIONED UPON CLASS
The two variables in this contingency table are
gender and class/section number.
          ECO 138 – Section 1
Male      22/33*100 = 66.7%
Female    11/33*100 = 33.3%
Total     33/33*100 = 100%
CONDITIONAL DISTRIBUTIONS –
CONDITIONED UPON CLASS
The two variables in this contingency table are
gender and class/section number.
          ECO 435 – Section 1
Male      4/7*100 = 57.1%
Female    3/7*100 = 42.9%
Total     7/7*100 = 100%
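
The conditional distributions for the class data differ only in which total goes in the denominator. The sketch below is one way to compute them; the function names and the `table` layout are assumptions made for illustration.

```python
# Contingency table counts: table[gender][class].
table = {
    "Male":   {"ECO 138 Section 1": 22, "ECO 435 Section 1": 4},
    "Female": {"ECO 138 Section 1": 11, "ECO 435 Section 1": 3},
}

def conditional_on_gender(table, gender):
    """Distribution of class among students of one gender (divide by the row total)."""
    row = table[gender]
    total = sum(row.values())
    return {section: count / total * 100 for section, count in row.items()}

def conditional_on_class(table, section):
    """Distribution of gender within one class (divide by the column total)."""
    column = {gender: row[section] for gender, row in table.items()}
    total = sum(column.values())
    return {gender: count / total * 100 for gender, count in column.items()}

print(conditional_on_gender(table, "Male"))
print(conditional_on_gender(table, "Female"))
print(conditional_on_class(table, "ECO 138 Section 1"))
print(conditional_on_class(table, "ECO 435 Section 1"))
```

Conditioning on gender uses the row totals (26 and 14); conditioning on class uses the column totals (33 and 7).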
WHAT CAN GO WRONG? (CONT.)

Don’t confuse similar-sounding percentages: pay particular attention to the wording of the context.

The percentage of students that are female & in ECO 138 Section 1 (cell distribution)

The percentage of females that are in ECO 138 Section 1 (conditioned upon females)

The percentage of ECO 138 Section 1 students that are females (conditioned upon ECO 138 Section 1)
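
All three percentages share the same numerator, the 11 females in ECO 138 Section 1, and differ only in the denominator. A minimal sketch using the counts from the class contingency table (variable names are illustrative assumptions):

```python
# Counts taken from the class contingency table.
female_in_138 = 11     # females in ECO 138 Section 1 (one cell)
all_students = 40      # grand total
all_females = 14       # row total for Female
all_in_138 = 33        # column total for ECO 138 Section 1

# Of all students, the share who are female AND in ECO 138 Section 1.
pct_of_all_students = female_in_138 / all_students * 100

# Of females, the share who are in ECO 138 Section 1.
pct_of_females = female_in_138 / all_females * 100

# Of ECO 138 Section 1 students, the share who are female.
pct_of_138_students = female_in_138 / all_in_138 * 100

print(f"{pct_of_all_students:.1f}%  {pct_of_females:.1f}%  {pct_of_138_students:.1f}%")
```

Reading the wording carefully amounts to deciding which of the three totals belongs in the denominator.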
CONDITIONAL DISTRIBUTIONS (CONT.)
The conditional distributions tell us that there is a difference in class for those who survived and those who perished.
This is better shown with pie charts of the two distributions:
IF YOU ARE MALE, WHAT YEAR IN SCHOOL
ARE YOU?
1. Fr.
2. So.
3. Jr.
4. Sr.
IF YOU ARE FEMALE, WHAT YEAR IN
SCHOOL ARE YOU?
1. Fr.
2. So.
3. Jr.
4. Sr.
CONDITIONAL DISTRIBUTIONS (CONT.)



We see that the distribution of Class/Section for males is different from that for females.
This leads us to believe that Class/Section and Gender are associated, that is, they are not independent.
The variables would be considered independent if the distribution of one variable in a contingency table were the same for all categories of the other variable.
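
One informal way to apply this definition is to compute each row's conditional distribution and see how far apart the rows are. The sketch below does that for the class data; `max_gap` and `balanced_table` are hypothetical names and data introduced only for illustration, and the code compares percentages without judging whether a gap is large enough to matter.

```python
def conditional_by_row(table):
    """Return each row's percentages across the columns."""
    result = {}
    for row_name, row in table.items():
        total = sum(row.values())
        result[row_name] = {col: count / total * 100 for col, count in row.items()}
    return result

def max_gap(table):
    """Largest percentage-point gap between rows, taken over all columns."""
    distributions = list(conditional_by_row(table).values())
    columns = list(distributions[0])
    return max(
        max(d[col] for d in distributions) - min(d[col] for d in distributions)
        for col in columns
    )

# Class data from the contingency table: rows are gender, columns are class.
class_table = {
    "Male":   {"ECO 138 Section 1": 22, "ECO 435 Section 1": 4},
    "Female": {"ECO 138 Section 1": 11, "ECO 435 Section 1": 3},
}

# Hypothetical table in which both rows have the same distribution.
balanced_table = {
    "A": {"x": 10, "y": 20},
    "B": {"x": 5, "y": 10},
}

print(f"Class data gap: {max_gap(class_table):.1f} percentage points")
print(f"Balanced table gap: {max_gap(balanced_table):.1f} percentage points")
```

A gap of zero corresponds to the definition of independence above; a nonzero gap means the conditional distributions differ.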
SEGMENTED BAR CHARTS
A segmented bar chart displays the same information as a pie chart, but in the form of bars instead of circles.
Here is the segmented bar chart for ticket Class by Survival status:

WHICH OF THE COMPARISONS DO YOU
CONSIDER MOST VALID?
1. Overall average, b/c it does not differentiate between the four programs.
2. Individual program comparisons, b/c they take into account the different number of applicants and admission rates for each of the four programs.
3. Overall average, b/c it takes into account the differences in number of applicants and admission rates for each of the four programs.
[Poll chart: response shares of 93%, 7%, and 0%; which share belongs to which option is not recoverable.]
DID YOU SIGN UP FOR MYLAB AND WHAT IS YOUR
GENDER?
1. Female – Yes
2. Female – No
3. Male – Yes
4. Male – No
[Poll chart: response shares of 31%, 31%, 23%, and 15%; which share belongs to which option is not recoverable.]
NEXT TIME…

Chapter 4 – Displaying Quantitative Data