
Lecture 01

ECMT 1010
Introduction to economic statistics
Semester 1, 2024
statistics uses data to answer questions . . .
. . . data are a collection of variables measured on
individual cases or observations
• each variable contains specific information on each case
• data are often organized into a spreadsheet (matrix)
• questions are usually framed in terms of a hypothesis
EXAMPLES http://www.lock5stat.com/datapage.html
• countries dataset
AllCountries.xlsx
• student survey dataset
StudentSurvey.xls
. . . types of variable
• a categorical variable: defines groups
  • e.g., gender, award, year
• a quantitative variable: numerical measure
  • e.g., SAT, height, pulse, year
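The two variable types are summarized differently: categorical variables by counting cases per group, quantitative variables by numerical summaries such as the mean. A minimal sketch, assuming a tiny invented dataset (the rows and column names below are made up for illustration and are not taken from StudentSurvey.xls):

```python
from collections import Counter
from statistics import mean

# Invented rows for illustration only: each dict is one case,
# each key is one variable.
cases = [
    {"year": "FirstYear", "gender": "F", "height": 165, "pulse": 72},
    {"year": "Sophomore", "gender": "M", "height": 180, "pulse": 64},
    {"year": "FirstYear", "gender": "M", "height": 175, "pulse": 80},
    {"year": "Senior", "gender": "F", "height": 170, "pulse": 68},
]

# Categorical variable: count how many cases fall in each group.
year_counts = Counter(case["year"] for case in cases)

# Quantitative variable: a numerical summary such as the mean makes sense.
mean_pulse = mean(case["pulse"] for case in cases)

print(year_counts)
print(mean_pulse)
```

Averaging "year" as recorded here would be meaningless, which is a quick way to tell that it is being treated as categorical.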
. . . relationships between variables
• we often use one variable, the explanatory variable,
to understand or predict the values of another, the
response variable
• for example:
• does meditation help reduce stress?
• does sugar consumption increase hyperactivity?
• does the interest rate affect the exchange rate?
THE key concepts
A population includes all the individuals
or objects of interest
• a sample consists of the cases selected into a dataset;
a sample is a subset of the population
• the process of using a sample to gain information
about the population is called inference
[diagram: population → (sampling) → sample → (inference) → population]
Most famous stuff-up in stats history
The newspaper was published before the end of the
1948 U.S. presidential election, based on the results of
a large telephone poll.
The poll showed that Thomas Dewey would easily
defeat Harry Truman.
• the problem is: Truman won the election.
• what went wrong?
sampling bias occurs when the method used to
select the sample causes it to differ from the
population in a relevant way
• if sampling bias exists, we cannot trust any
generalization from the sample to the population
• that is, we will make incorrect inferences
[diagram labels: population; sample; sample]
• how can we avoid sampling bias?
take a RANDOM sample
• imagine putting the names of the entire
population into a hat, and drawing out 2,000
names at random
• (we can use technology to do this)
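The "names in a hat" idea can be sketched with Python's standard library. This is only an illustration: the population of 100,000 ID numbers and the seed are assumptions, not part of the lecture.

```python
import random

# Hypothetical population of 100,000 people, identified by ID number.
population = list(range(100_000))

rng = random.Random(1010)  # fixed seed so the draw can be repeated

# Draw 2,000 IDs without replacement; every person is equally likely
# to be chosen, just like names drawn from a hat.
sample = rng.sample(population, k=2000)

print(len(sample))
print(sample[:5])
```

Because the draw is without replacement, no person can appear in the sample twice.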
random sampling
Before the 2008 U.S. election, Gallup (a polling firm)
took a random sample of 2,847 Americans.
• 52% of those sampled supported Obama
• 53% voted for Obama in the election
• in this case, the inference was accurate
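To see why a random sample of this size tends to land close to the truth, we can simulate many polls. A rough sketch, assuming a very large population in which 53% support the candidate, so that each respondent can be treated as an independent draw; the number of simulated polls (200) and the seed are arbitrary choices.

```python
import random

rng = random.Random(2008)

TRUE_SUPPORT = 0.53  # population share, taken from the election result above
SAMPLE_SIZE = 2847   # sample size quoted above
N_POLLS = 200        # number of simulated polls (an arbitrary choice)


def simulate_poll() -> float:
    """Return the share of supporters in one simulated random sample."""
    supporters = sum(rng.random() < TRUE_SUPPORT for _ in range(SAMPLE_SIZE))
    return supporters / SAMPLE_SIZE


proportions = [simulate_poll() for _ in range(N_POLLS)]

print(f"lowest sample share:  {min(proportions):.3f}")
print(f"highest sample share: {max(proportions):.3f}")
```

The simulated sample shares should cluster within a few percentage points of 0.53, which is consistent with Gallup's 52% being close to the actual 53%.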
random versus non-random sampling
• random samples (usually) provide accurate information on the population
• non-random samples (usually) suffer from sampling bias; any implied population information is unreliable
• non-random samples cannot be trusted to make generalizations about the population
Reality check . . .
• a random sample is ideal, but may not be feasible
• you may have to alter the ‘target population’ to get
something feasible to sample
EXAMPLE: suppose you are interested in all student
opinions, but you only have data from one class
• inferences are limited to the population sampled
EXAMPLE – sampling bias
Suppose you want to estimate the average number of
hours students spend studying each week
Which is the best method of sampling?
1. Go to the library and ask all the students how much they study.
2. Email all students asking how much they study and use the responses.
3. Hand out a questionnaire in class and make every student respond.
4. Stand outside Manning Bar and ask the people going in.
Bad methods of sampling – 1
• sampling based on something obviously related to
the variable(s) of interest
• e.g., sampling students in the library (or pub) about study
habits
EXAMPLE: many online surveys
• sydneycyclingclub.org.au or www.mynrma.com.au
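The library example can be illustrated with a small simulation. This is a sketch under invented assumptions: study hours are drawn from a made-up distribution, and the chance of meeting a student in the library is assumed to be proportional to the hours they study.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical population: weekly study hours for 20,000 students.
hours = [max(0.0, rng.gauss(15, 5)) for _ in range(20_000)]
population_mean = mean(hours)

# Random sample: every student is equally likely to be selected.
random_sample = rng.sample(hours, k=500)

# Library sample: assume the chance of meeting a student in the library
# is proportional to the hours they study (drawn with replacement).
library_sample = rng.choices(hours, weights=hours, k=500)

random_mean = mean(random_sample)
library_mean = mean(library_sample)

print(f"population mean:     {population_mean:.2f}")
print(f"random sample mean:  {random_mean:.2f}")
print(f"library sample mean: {library_mean:.2f}")
```

Under these assumptions the library sample overstates average study time, because the selection method is related to the variable being measured.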
Bad methods of sampling – 2
• allowing the sample to be made up of whoever
chooses to participate (volunteer bias)
• e.g., email all students and base your analysis on the
replies
• responders may not be representative of the population
EXAMPLE: sites posting reviews, e.g., Google
[diagram: population → sample; sampling bias? other sources of bias?]
association vs causation
two variables are associated if their
values are related to one another
two variables are causally associated
if the value of one variable influences
the value of the other
another example of association
[scatterplot: life expectancy against TVs per 1000 people, by country; r = 0.74]
• TVs do not cause people to live longer
• again, not a causal association
→ association does not imply causal association
What’s going on with TVs and life expectancy?
[diagram: wealth → number of TVs per capita; wealth → life expectancy]
→ wealth is a confounding variable
a third variable associated with both the
explanatory variable and the response
variable is called a confounding variable
• confounding variables are a major problem when
you are trying to establish causal association
• causal association cannot be determined when
confounding variables are present
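A simulation can show how a confounding variable produces a strong association with no causal link. In this sketch all numbers are invented: wealth drives both TV ownership and life expectancy, and TVs have no effect on life expectancy at all. The helper `pearson_r` is written out here for illustration.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(7)

# Hypothetical countries: wealth is an index between 0 and 1.
wealth = [rng.random() for _ in range(5000)]
# Both outcomes depend on wealth plus noise; TVs never enter the life equation.
tvs = [800 * w + rng.gauss(0, 80) for w in wealth]
life = [50 + 30 * w + rng.gauss(0, 4) for w in wealth]

r_all = pearson_r(tvs, life)

# Compare only countries of similar wealth (a narrow band of the index).
band = [i for i, w in enumerate(wealth) if 0.45 <= w <= 0.55]
r_band = pearson_r([tvs[i] for i in band], [life[i] for i in band])

print(f"r across all countries:        {r_all:.2f}")
print(f"r among similar-wealth cases:  {r_band:.2f}")
```

Across all simulated countries the correlation is strong, but among countries of similar wealth it largely disappears, which is what we expect when wealth is the confounding variable.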
EXAMPLE 1: effect of exercise on dementia in the elderly
[diagram: lifestyle choices (confounding variable) linked to both exercise and dementia]
how can we eliminate confounding variables?
by RANDOMLY assigning the
values of the explanatory variable
. . . a process referred to as randomization
EXAMPLE: randomly assign elderly people to either an
exercise program or not
[diagram: random assignment determines exercise, so lifestyle choices no longer influence who exercises; exercise → dementia]
randomized experiment or trial
• different levels of the explanatory variable are called
treatments
• we randomly divide subjects into groups, and assign a
different treatment to each group
• because the groups are chosen randomly, they should look
(roughly) similar in every aspect except the treatment
• emergent group differences may be attributed to the different
treatments
if a randomized experiment yields a strong
association, we may establish causation from the
explanatory to the response variable
randomized experiments are very powerful
because they allow us to infer causality
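Random assignment can be sketched as shuffling the subjects and splitting them into groups. The subjects and their "lifestyle score" below are invented for illustration; the point is that the two groups end up similar on a variable the researcher never used when assigning treatments.

```python
import random
from statistics import mean

rng = random.Random(2024)

# Hypothetical subjects, each with a lifestyle score that could
# otherwise act as a confounding variable.
lifestyle = [rng.gauss(50, 10) for _ in range(1000)]

# Randomly divide the subjects into two groups of equal size.
subjects = list(range(len(lifestyle)))
rng.shuffle(subjects)
treatment = subjects[:500]  # e.g., assigned to the exercise program
control = subjects[500:]    # e.g., not assigned to the program

treatment_mean = mean(lifestyle[i] for i in treatment)
control_mean = mean(lifestyle[i] for i in control)

print(f"mean lifestyle score, treatment: {treatment_mean:.2f}")
print(f"mean lifestyle score, control:   {control_mean:.2f}")
```

The group means should be close, so a later difference in outcomes is hard to blame on lifestyle.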
• an experimental study is one where the researcher controls the
explanatory variable, assigning its values at random
• e.g., randomly assign exercise/non-exercise groups of elderly
• e.g., randomly assign textbook/non-textbook groups of students
• an observational study uses information gathered from
observed behaviour as it naturally exists
• e.g., look at exercise habits and dementia among the elderly
• e.g., look at whether a student buys a textbook and their mark
EXAMPLE: Exercise and the brain
Reynolds, “Phys Ed: Your Brain on Exercise”, NY Times, July 7, 2010.
An experiment to determine whether exercise changes the brain
Step 1: Assign rats randomly to one of two groups
Treatment group
Control group
Step 2: Measure brain activity and IQ in the two groups
• can this experiment establish a causal link between exercise and the brain?
[diagram: an “energetic” gene could influence both exercise and IQ; random assignment removes its influence on exercise]
• an observational study has a confounding variable
• an experiment eliminates it, so causality may be established
EXAMPLE: Knee Surgery for Arthritis (1)
Researchers conducted a study on the effectiveness of a knee
surgery to cure pain from arthritis
• whether people got knee surgery was randomly determined
• the surgery group reported less pain than the control group
Is this evidence that the surgery causes a decrease in pain?
• not necessarily . . . due to the placebo effect
• patients may believe they are better because they have been treated
EXAMPLE: Knee Surgery for Arthritis (2)
“The Placebo Prescription,” NY Times Magazine, 1/9/00
In another study, the control group received fake knee surgery
(patients were anaesthetized and cut open, but no surgery was
performed)
• both groups are subject to a placebo effect, so it cancels out of the comparison
• the reported pain reduction was the same for both groups!
Conclusion: the knee surgery is not effective
randomness in data collection
• is the sample randomly selected?
  • Yes: possible to generalize to the population
  • No: cannot generalize to the population
• is the explanatory variable randomly assigned?
  • Yes: possible to make conclusions about causality
  • No: cannot make conclusions about causality
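The two questions are independent of each other, which a small function makes explicit. The function name and dictionary keys below are hypothetical, chosen only for this sketch.

```python
def possible_conclusions(randomly_selected: bool, randomly_assigned: bool) -> dict:
    """Map the two kinds of randomness to the conclusions they permit."""
    return {
        # Random selection of cases supports generalizing to the population.
        "generalize_to_population": randomly_selected,
        # Random assignment of the explanatory variable supports causal claims.
        "conclude_causality": randomly_assigned,
    }


# Example: a randomly sampled opinion poll with no treatments assigned.
print(possible_conclusions(randomly_selected=True, randomly_assigned=False))
```

A study can therefore support generalization without supporting causal claims, or the reverse.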
does this mean observational studies are useless?
• a random sample is not always achievable (e.g., in economics)
  • if the focus is estimating a statistic about a population, you
    need a random sample but not a randomized experiment
    • e.g., election polling, GDP, unemployment, etc.
  • if the focus is establishing causality, you need a randomized
    experiment
    • e.g., drug testing