Uploaded by Tiany Hoang

Lecture 01

ECMT 1010
Introduction to economic statistics
Semester 1 2022
Tim Fisher
What is statistics?
. . . a collection of methods “used to analyze the
patterns in data, with a view to distinguishing random
variation from underlying causes”
[John Whitfield · Replication Crisis: Shoddy Papers · LRB 7 October 2021]
2
Why study statistics?
Economics
The truth is out there . . .
“Lies, damned lies and statistics”
3
To answer questions, statistics uses data . . .
. . . measurements based on individual units or cases
• a dataset is a collection of variables measured on
individual cases or observations
• a variable contains specific information on each case
• data are often organized into a spreadsheet (matrix)
4
EXAMPLES http://www.lock5stat.com/datapage.html
• countries dataset
AllCountries.xlsx
• student survey dataset
StudentSurvey.xlsx
5
. . . types of variable
• a categorical variable: defines groups
• e.g., gender, award, year
• used to calculate proportions
• a quantitative variable: numerical measure
• e.g., SAT, height, pulse, year
• used to calculate averages
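As a minimal sketch of the distinction above, the Python below computes a proportion from a categorical variable and an average from a quantitative one. The five cases and their values are made up for illustration (they are not taken from StudentSurvey.xlsx), and the helper names proportion and average are hypothetical.

```python
from statistics import mean

# Made-up dataset: each dict is one case, each key is one variable.
students = [
    {"year": "first", "pulse": 60},
    {"year": "second", "pulse": 70},
    {"year": "first", "pulse": 80},
    {"year": "third", "pulse": 66},
    {"year": "second", "pulse": 74},
]


def proportion(cases, variable, category):
    """Share of cases whose categorical variable equals the given category."""
    matches = sum(1 for case in cases if case[variable] == category)
    return matches / len(cases)


def average(cases, variable):
    """Mean of a quantitative variable across all cases."""
    return mean(case[variable] for case in cases)


first_year_share = proportion(students, "year", "first")  # categorical -> proportion
average_pulse = average(students, "pulse")                # quantitative -> average

print(f"proportion in first year: {first_year_share:.2f}")  # 0.40
print(f"average pulse: {average_pulse:.1f}")                # 70.0
```

Here year is stored as a label, so it is treated as categorical; recorded as a number (say, 2022) it could be treated as quantitative, which is why it appears in both lists.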
6
. . . relationships between variables
• we often use one variable, the explanatory variable, to
understand or predict the values of another, the response
variable
• for example:
• does meditation help reduce stress?
• does sugar consumption increase hyperactivity?
• does the interest rate affect the exchange rate?
7
TWO fundamental concepts
a population includes all the
individuals or cases of interest
a sample consists of the individuals
or cases selected into a dataset
• the process of using a sample to gain information about a
population is known as inference, the main topic of ECMT1010
8
[Diagram: population → (sampling) → sample → (inference) → population]
9
EXAMPLE – student life
Suppose we want to investigate aspects of university
student life in Australia.
• What is the population?
• What could be a sample?
• Can we use the sample data to make inferences
about the population?
10
Most famous stuff-up in stats history
11
The newspaper (the Chicago Daily Tribune, headlined “Dewey Defeats Truman”) was
published before the votes were all counted in the 1948 U.S. presidential election,
based on the results of a large telephone poll.
The poll showed that Thomas Dewey would easily
defeat Harry Truman.
• the problem is: Truman won the election.
• what went wrong?
12
sampling bias occurs when the method
used to select the sample causes it to differ
from the population in a relevant way
• if sampling bias exists, we cannot trust any
generalization from the sample to the population
• in other words, we will make incorrect inferences
13
[Diagram: a population and two samples]
14
. . . so how can we avoid sampling bias?
take a RANDOM sample
• imagine putting the names of all the election-day
voters into a hat and drawing 2,000 names at
random
• (we can use technology to do this)
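The hat can be sketched in Python with the standard library's random.sample, which draws without replacement, so no voter is picked twice. The voter roll here is a hypothetical list of placeholder names, and the seed is fixed only so the draw is repeatable.

```python
import random

# Hypothetical voter roll: placeholder names standing in for real voters.
voters = [f"voter_{i}" for i in range(100_000)]

# Fixed seed (an arbitrary choice) so the same sample is drawn every run.
rng = random.Random(1948)

# Draw 2,000 names at random; every voter is equally likely to be chosen.
sample = rng.sample(voters, k=2000)

print(len(sample))   # 2000
print(sample[:3])    # first three names drawn
```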
15
random sampling
In a random sample by Gallup (a polling firm) of
2,847 voters before the 2008 U.S. election, 52% of
the sample supported Obama.
• in the election 53% voted for Obama
• the election is, by definition, the population of voters
• in this case, the inference was accurate
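To see why a random sample of this size tends to land close to the truth, the sketch below simulates many polls of 2,847 voters from a population in which 53% support the candidate. The two figures come from the slide; the number of repetitions, the seed, and the function name simulated_poll are assumptions, and each voter is drawn independently, which approximates sampling from a very large population.

```python
import random

TRUE_SUPPORT = 0.53  # share who voted for Obama (from the slide)
SAMPLE_SIZE = 2847   # size of the Gallup sample (from the slide)
REPETITIONS = 500    # assumption: how many polls to simulate

rng = random.Random(2008)  # fixed seed so the run is repeatable


def simulated_poll(rng, support, n):
    """Sample proportion from n independent draws of supporter / non-supporter."""
    supporters = sum(1 for _ in range(n) if rng.random() < support)
    return supporters / n


polls = [simulated_poll(rng, TRUE_SUPPORT, SAMPLE_SIZE) for _ in range(REPETITIONS)]

# How often does a poll land within 3 percentage points of the truth?
share_within_3_points = sum(abs(p - TRUE_SUPPORT) <= 0.03 for p in polls) / REPETITIONS

print(f"lowest poll: {min(polls):.3f}, highest poll: {max(polls):.3f}")
print(f"share of polls within 3 points: {share_within_3_points:.2f}")
```

The polls differ from one another (random variation), but nearly all of them sit near 53%, which is what makes inference from a random sample reasonable.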
16
random versus non-random sampling
• random samples (usually) provide accurate
information on the population
• non-random samples (almost always) suffer from
sampling bias
• any inference about the population will be wrong
• non-random samples cannot be trusted to generalize about
populations
17
Reality check . . .
• a random sample is ideal, but may not be feasible
• you may have to alter the ‘target’ population to get
something feasible for inference
EXAMPLE: suppose you are interested in all university
student opinions, but only have data from one class
• inferences are limited to the population sampled
18
EXAMPLE – sampling bias
Suppose you want to estimate the average number of
hours students spend studying each week.
Which is the best method of sampling?
1. Go to the library and ask all the students how much they study.
2. Email all students and ask how much they study and use the responses.
3. Hand out a questionnaire in class and make every student respond.
4. Stand outside the student pub and ask the people going in.
19
Bad methods of sampling – 1
• sampling based on something obviously related to
the variable(s) of interest
• e.g., sampling students in the library (or pub) about study
habits
EXAMPLE: should there be more bike paths in Sydney?
• surveys based on either Sydney Cycling Club or NRMA
members wouldn’t be reliable indicators of overall views
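The library example can be illustrated with a small simulation. Everything in it is an assumption made for illustration: the population of 20,000 students, the distribution of weekly study hours, and the rule that a student's chance of being in the library rises with the hours they study.

```python
import random
from statistics import mean

rng = random.Random(1010)  # fixed seed so the run is repeatable

# Hypothetical population: weekly study hours for 20,000 students.
population = [max(0.0, rng.gauss(12, 5)) for _ in range(20_000)]

# Assumption: the more a student studies, the more likely they are to be
# found in the library (probability = hours / 40, capped at 1).
in_library = [hours for hours in population if rng.random() < min(1.0, hours / 40)]

random_sample = rng.sample(population, k=500)   # every student equally likely
library_sample = rng.sample(in_library, k=500)  # only students found in the library

population_mean = mean(population)
random_mean = mean(random_sample)
library_mean = mean(library_sample)

print(f"population mean:     {population_mean:.1f} hours")
print(f"random sample mean:  {random_mean:.1f} hours")
print(f"library sample mean: {library_mean:.1f} hours")
```

Under these assumptions the library sample overstates average study time, because the way the sample is selected is related to the variable being measured.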
20
Bad methods of sampling – 2
• allowing the sample to be made up of whoever
chooses to participate (volunteer bias)
• e.g., emailing all students and then making inferences
based on the responders
• probably not representative of the entire population
EXAMPLE: online reviews of restaurants, products, etc.
• people with an “OK” experience probably won’t bother
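A rough sketch of volunteer bias in online reviews follows. The ratings, their frequencies, and the chance that a customer with each rating posts a review are all invented for illustration; the only idea taken from the slide is that people with an “OK” experience are the least likely to respond.

```python
import random

rng = random.Random(21)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 customers rating from 1 (awful) to 5 (great).
ratings = rng.choices([1, 2, 3, 4, 5], weights=[5, 10, 40, 30, 15], k=10_000)

# Assumption: chance that a customer with each rating bothers to post a review.
review_probability = {1: 0.50, 2: 0.20, 3: 0.05, 4: 0.10, 5: 0.40}

# The "sample" is whoever chooses to respond.
reviews = [r for r in ratings if rng.random() < review_probability[r]]


def share_ok(values):
    """Share of ratings equal to 3, the 'OK' experience."""
    return sum(1 for v in values if v == 3) / len(values)


ok_share_all = share_ok(ratings)
ok_share_reviews = share_ok(reviews)

print(f"'OK' share among all customers: {ok_share_all:.2f}")
print(f"'OK' share among reviewers:     {ok_share_reviews:.2f}")
```

In this setup the posted reviews make “OK” experiences look far rarer than they are, so the reviews are not representative of all customers.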
21
[Diagram: population → sample, annotated “sampling bias?” and “other sources of bias?”]
22
. . . other sources of bias
• even a random sample may give biased results,
especially when data are collected on (or by) humans
• possible sources of bias
• framing (wording or context) of survey questions
• inaccurate or lazy responses
• sources of research funding
. . . many other possibilities
23
1. wording
A random sample was asked: Should there be a tax cut, or
should money be used to fund new government programs?
60% chose tax cut
A different random sample was asked: Should there be a tax cut,
or should money be spent on programs for education, the
environment, health care, crime-fighting, and military defense?
22% chose tax cut
24
2. context
Survey participants were given a list of worries about parenting
and various reasons not to have kids. Then they were asked: If
you had it to do over again, would you have children?
30% said Yes
A random sample of all parents was asked the same question
without any leading material.
91% said Yes
25
3. inaccurate responses
In a study of young adults, 93% said they were in the
top half of the population in terms of driving skill.
Svenson, O. (February 1981). “Are we all less risky and more skillful than our fellow drivers?” Acta
Psychologica 47 (2): 143–148.
In a random sample of U.S. university students, 23%
reported using illicit drugs.
Substance Abuse and Mental Health Services Administration (2010). “Results from the 2009 National Survey
on Drug Use and Health: Volume 1.”
26
how to enhance your statistical literacy . . .
. . . think carefully about how the sample has been collected
. . . recognize that not all forms of data collection will lead to
valid inference
always QUESTION sample SELECTION
27
The 7 things you should be able to do by the end of the semester . . .
1. understand key concepts of inference: estimation with intervals and
testing for significance
2. analyze data using modern resampling or traditional methods
3. use computer software in statistical procedures
4. understand the importance of data collection and the limitations of collection
methods, and be aware of how they affect inference
5. know which statistical methods to use in which situations
6. interpret statistical results effectively and in context
7. be aware of the power of data analysis
28