ECMT 1010 Introduction to economic statistics
Semester 1, 2024

statistics uses data to answer questions . . .

. . . data are a collection of variables measured on individual cases or observations
• each variable contains specific information on each case
• data are often organized into a spreadsheet (matrix)
• questions are usually framed in terms of a hypothesis

EXAMPLES
http://www.lock5stat.com/datapage.html
• countries dataset AllCountries.xlsx
• student survey dataset StudentSurvey.xls

. . . types of variable
• a categorical variable: defines groups
  • e.g., gender, award, year
• a quantitative variable: numerical measure
  • e.g., SAT, height, pulse, year

. . . relationships between variables
• we often use one variable, the explanatory variable, to understand or predict the values of another, the response variable
• for example:
  • does meditation help reduce stress?
  • does sugar consumption increase hyperactivity?
  • does the interest rate affect the exchange rate?

THE key concepts
• a population includes all the individuals or objects of interest
• a sample consists of the cases selected into a dataset; a sample is a subset of the population
• the process of using a sample to gain information about the population is called inference
[Diagram: population → sample (sampling); sample → population (inference)]

Most famous stuff-up in stats history

The newspaper was published before the end of the 1948 U.S. presidential election, based on the results of a large telephone poll. The poll showed that Thomas Dewey would easily defeat Harry Truman.
• the problem is: Truman won the election.
• what went wrong?

sampling bias occurs when the method used to select the sample causes it to differ from the population in a relevant way
• if sampling bias exists, we cannot trust any generalization from the sample to the population
• that is, we will make incorrect inferences

[Diagram: population, sample, sample]
• how can we avoid sampling bias?
take a RANDOM sample
• imagine putting the names of the entire population into a hat, and drawing out 2,000 names at random
• (we can use technology to do this)

random sampling
Before the 2008 U.S. election, Gallup (a polling firm) took a random sample of 2,847 Americans.
• 52% of those sampled supported Obama
• 53% voted for Obama in the election
• in this case, the inference was accurate

random versus non-random sampling
• random samples (usually) provide accurate information on the population
• non-random samples (usually) suffer from sampling bias; any implied population information will be wrong
• non-random samples cannot be trusted to make generalizations about the population

Reality check . . .
• a random sample is ideal, but may not be feasible
• you may have to alter the ‘target population’ to get something feasible to sample
EXAMPLE: suppose you are interested in all student opinions, but you only have data from one class
• inferences are limited to the population sampled

EXAMPLE – sampling bias
Suppose you want to estimate the average number of hours students spend studying each week.
Which is the best method of sampling?
1. Go to the library and ask all the students how much they study.
2. Email all students asking how much they study and use the responses.
3. Hand out a questionnaire in class and make every student respond.
4. Stand outside Manning Bar and ask the people going in.

Bad methods of sampling – 1
• sampling based on something obviously related to the variable(s) of interest
  • e.g., sampling students in the library (or pub) about study habits
EXAMPLE: many online surveys
• sydneycyclingclub.org.au or www.mynrma.com.au

Bad methods of sampling – 2
• allowing the sample to be made up of whoever chooses to participate (volunteer bias)
  • e.g., email all students and base your analysis on the replies
• responders may not be representative of the population
EXAMPLE: sites posting reviews, e.g., Google

sampling bias?
[Diagram: population and sample; other sources of bias?]

association vs causation
two variables are associated if their values are related to one another
two variables are causally associated if the value of one variable influences the value of the other

another example of association
[Scatter plot: Life Expectancy (vertical axis) against TVs per 1000 People (horizontal axis), one labelled point per country, r = 0.74]
• TVs do not cause people to live longer
• again, not a causal association
→ association does not imply causal association

What’s going on with TVs and life expectancy?
[Diagram: wealth, number of TVs per capita, life expectancy]
→ wealth is a confounding variable

a third variable associated with both the explanatory variable and the response variable is called a confounding variable
• confounding variables are a major problem when you are trying to establish causal association
• causal association cannot be determined when confounding variables are present

EXAMPLE 1
effect of exercise on dementia in the elderly
[Diagram: lifestyle choices, exercise, dementia]

how can we eliminate confounding variables?
by RANDOMLY assigning the values of the explanatory variable . . .
a process referred to as randomization

EXAMPLE: randomly assign elderly people to either an exercise program or not
[Diagram: lifestyle choices, random assignment, exercise, dementia]

randomized experiment or trial
• different levels of the explanatory variable are called treatments
• we randomly divide subjects into groups, and assign a different treatment to each group
• because the groups are chosen randomly, they should look (roughly) similar in every aspect except the treatment
• emergent group differences may be attributed to the different treatments

if a randomized experiment yields a strong association, we may establish causation from the explanatory to the response variable

randomized experiments are very powerful because they allow us to infer causality

• an experimental study is a setting where the researcher controls the explanatory variable, assigning its values at random
  • e.g., randomly assign exercise/non-exercise groups of elderly
  • e.g., randomly assign textbook/non-textbook groups of students
• an observational study uses information gathered from observed behaviour as it naturally exists
  • e.g., look at exercise habits and dementia among the elderly
  • e.g., look at whether a student buys a textbook and their mark

EXAMPLE: Exercise and the brain
Reynolds, “Phys Ed: Your Brain on Exercise”, NY Times, July 7, 2010.
An experiment to determine whether exercise changes the brain
Step 1: Assign rats randomly to one of two groups
  Treatment group / Control group
Step 2: Measure brain activity and IQ in the two groups
• can this experiment establish a causal link between exercise and the brain?
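One way to see why the random assignment in Step 1 matters is a small simulation. The sketch below is illustrative only: the hidden trait, the effect sizes, and the function name simulate_gaps are assumptions made up for this example, not data from any study. Exercise is deliberately given no effect on the outcome, so any gap between the groups comes from how the groups were formed.

```python
import random
import statistics


def simulate_gaps(n=20000, seed=1):
    """Return (observational gap, randomized gap) in mean outcome.

    Hypothetical setup: a hidden trait raises the outcome by 10 points
    and makes subjects more likely to exercise. Exercise itself has
    NO effect on the outcome.
    """
    rng = random.Random(seed)

    def outcome(has_trait):
        # Outcome depends only on the hidden trait plus noise.
        return 100 + 10 * has_trait + rng.gauss(0, 5)

    def mean_gap(rows):
        # Difference in mean outcome: exercise group minus control group.
        treated = [y for exercised, y in rows if exercised]
        control = [y for exercised, y in rows if not exercised]
        return statistics.mean(treated) - statistics.mean(control)

    observational, randomized = [], []
    for _ in range(n):
        has_trait = rng.random() < 0.5
        # Observational study: subjects choose, and the trait drives the choice.
        chose_exercise = rng.random() < (0.8 if has_trait else 0.2)
        observational.append((chose_exercise, outcome(has_trait)))
        # Randomized experiment: a coin flip decides, ignoring the trait.
        assigned_exercise = rng.random() < 0.5
        randomized.append((assigned_exercise, outcome(has_trait)))

    return mean_gap(observational), mean_gap(randomized)


obs_gap, rand_gap = simulate_gaps()
print(f"observational gap: {obs_gap:.2f}")
print(f"randomized gap:    {rand_gap:.2f}")
```

Under these assumptions the observational gap should be around 6 points (the trait is present in about 80% of those who choose exercise versus 20% of those who do not, times 10 points), while the randomized gap should be close to 0, because the coin flip balances the hidden trait across the two groups.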
[Diagram: “energetic” gene, random assignment, exercise, IQ]
• an observational study has a confounding variable
• an experiment eliminates it, so causality may be established

EXAMPLE: Knee Surgery for Arthritis (1)
Researchers conducted a study on the effectiveness of a knee surgery to cure pain from arthritis
• whether people got knee surgery was randomly determined
• the surgery group reported less pain than the control group
Is this evidence that the surgery causes a decrease in pain?
• not necessarily . . . due to placebo effect
• patients believe they are better because they have been treated

EXAMPLE: Knee Surgery for Arthritis (2)
“The Placebo Prescription,” NY Times Magazine, 1/9/00
In another study, the control group received fake knee surgery (patients were anaesthetized and cut open, but no surgery was performed)
• both groups are subject to a placebo effect, so it is eliminated from the comparison
• the reported pain reduction was the same for both groups!
Conclusion: the knee surgery is not effective

randomness in data collection
is the sample randomly selected?
• Yes: possible to generalize to the population
• No: cannot generalize to the population
is the explanatory variable randomly assigned?
• Yes: possible to make conclusions about causality
• No: cannot make conclusions about causality

does this mean observational studies are useless?
• a random sample is not always achievable (e.g., in economics)
  o if the focus is estimating a statistic about a population, you need a random sample but not a randomized experiment
    • e.g., election polling, GDP, unemployment, etc.
  o if the focus is establishing causality, you need a randomized experiment
    • e.g., drug testing
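The point about needing a random sample to estimate a population statistic can be checked with a short simulation of the study-hours example. This is a minimal sketch under made-up assumptions: the distribution of weekly study hours, the rule that heavier studiers are more likely to be in the library, and the name sampling_demo are all hypothetical.

```python
import random
import statistics


def sampling_demo(pop_size=50000, n=2000, seed=1):
    """Return (population mean, random-sample mean, library-sample mean)."""
    rng = random.Random(seed)

    # Hypothetical population: weekly study hours for each student.
    # Assumption: the more a student studies, the more likely they are
    # to be found in the library.
    population = []
    for _ in range(pop_size):
        hours = max(0.0, rng.gauss(10, 4))
        in_library = rng.random() < min(1.0, hours / 25)
        population.append((hours, in_library))

    true_mean = statistics.mean([h for h, _ in population])

    # Random sample: every student is equally likely to be drawn.
    random_sample = rng.sample(population, n)
    random_mean = statistics.mean([h for h, _ in random_sample])

    # Biased sample: only students found in the library can be drawn.
    library_goers = [h for h, in_lib in population if in_lib]
    library_mean = statistics.mean(rng.sample(library_goers, n))

    return true_mean, random_mean, library_mean


true_mean, random_mean, library_mean = sampling_demo()
print(f"population mean:     {true_mean:.2f}")
print(f"random sample mean:  {random_mean:.2f}")
print(f"library sample mean: {library_mean:.2f}")
```

With these assumptions the random-sample mean should land close to the population mean, while the library sample should overstate it, since the selection method is related to the variable being measured.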