** STA 1020 - Part 1 (24/Sep/13) **
MATERIAL FOR EXAM #1

STA 1020
Fall 2013, Section 09, MWF 10:40-11:35, 0035 State
Instructor: Dr. J.L. Menaldi
Textbook - Statistics: Concepts and Controversies, by David S. Moore and William I. Notz, 2013, W.H. Freeman & Company [8th ed]
Class Link: http://www.math.wayne.edu/~menaldi/teach/13f1020.htm

“Statistics” is the Science of collecting, describing and interpreting data...
It is said that “Probability” is the vehicle of Statistics, i.e., if it were not for the laws of probability, the theory of statistics would not be possible.

Contents
Exam 1 of 3: Producing Data
Quizzes every chapter and then First Partial Exam
Part 1: How to tell if ‘data’ are well made?
Chapter 1 - Where Do Data Come From?
Chapter 2 - Samples, Good and Bad
Chapter 3 - What Do Samples Tell Us?
Chapter 4 - Sample Surveys in the Real World – mostly skipped!
Chapter 5 - Experiments, Good and Bad
Chapter 6 - Experiments in the Real World
Chapter 7 - Data Ethics – skipped!
Chapter 8 - Measuring
Chapter 9 - Do the Numbers Make Sense? – skipped!

Ch01 - Where Do Data Come From?

Talking about data
* STATISTICS is the Science of collecting (organizing), describing (displaying, summarizing) and interpreting (understanding, comparing) data (numbers in context)
* Using ‘data’ to draw a conclusion about something unknown
* Decision making in the presence of uncertainty (or partial knowledge)
* Pieces of information or numbers are ‘data’ only if the information has a meaning attached!
* Data come from Observational Studies and from Experiments
* NOW: How are data obtained? ‘Individuals’ are the objects described by a set of data, and ‘variables’ are the characteristics of an individual
* Again, Statistics is a collection of procedures and principles for gathering data and analyzing information to help people make decisions when faced with uncertainty.

Roosevelt-Landon Election
What was wrong in the Literary Digest poll?
Founded in 1890, the magazine correctly predicted the winners in the presidential elections of 1916, 1920, 1924, 1928, and 1932. In the 1936 presidential contest between A. Landon and F.D. Roosevelt, the magazine sent out 10 million ballots and received about 1.3 million ballots for Landon and 0.9 million ballots for Roosevelt, so it appeared that Landon would get 57% of the votes. The size of the poll was extremely large when compared with the size of other typical polls.
.......................................................................
In that same 1936 presidential election, George Gallup used a much smaller poll of 50,000 people, and he correctly predicted that Roosevelt would win.
.......................................................................
How could it happen that the larger Literary Digest poll could be wrong by such a large margin? What went wrong?
.......................................................................
Key words: Sampling method, Great Depression, disproportionately wealthy people (car owners, magazine subscribers, people with phones), voluntary response.

Example 1: Who recycles?
Researchers spent lots of time and money weighing (in pounds) the stuff in the curbside recycling basket each week for each residence in two neighborhoods in a California city (referred to as Upper Crust and Lower Mid). The Upper Crust households contributed more pounds per week on average than did the folks in Lower Mid. Can we say that the rich are more serious about recycling? No. Someone noticed that Upper Crust recycling baskets contained lots of heavy glass wine bottles. In Lower Mid, they put out lots of light plastic soda bottles and light metal beer/soda cans. Weight tells us little about commitment to recycling. How to rectify this?

Other Examples
Read Example 2: What’s your race?
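The Literary Digest failure above can be illustrated with a small simulation. This is only a sketch under invented assumptions: the group shares and support rates (`WEALTHY_SHARE`, `SUPPORT`, and the 85% wealthy share of the mailing list) are hypothetical numbers chosen to mimic the story, not historical data. It compares a very large sample drawn from a list that over-represents wealthy voters with a much smaller random sample.

```python
import random

# Hypothetical population (NOT historical data): share of "wealthy" voters
# and the fraction of each group that supports Landon.
WEALTHY_SHARE = 0.30
SUPPORT = {"wealthy": 0.60, "other": 0.27}

# True population support for Landon under these assumptions.
TRUE_SUPPORT = (WEALTHY_SHARE * SUPPORT["wealthy"]
                + (1 - WEALTHY_SHARE) * SUPPORT["other"])


def poll(n, wealthy_share_in_frame, rng):
    """Return the fraction of n respondents who support Landon.

    wealthy_share_in_frame is the chance that a respondent comes from the
    wealthy group; a representative sample uses WEALTHY_SHARE itself.
    """
    yes = 0
    for _ in range(n):
        group = "wealthy" if rng.random() < wealthy_share_in_frame else "other"
        if rng.random() < SUPPORT[group]:
            yes += 1
    return yes / n


rng = random.Random(1936)  # fixed seed so the run is repeatable
# Huge poll from a list (car owners, phone owners, subscribers) that is
# assumed to be 85% wealthy.
biased = poll(100_000, 0.85, rng)
# Much smaller poll that represents both groups in the right proportions.
fair = poll(5_000, WEALTHY_SHARE, rng)

print(f"true support:        {TRUE_SUPPORT:.3f}")
print(f"biased, n = 100,000: {biased:.3f}")
print(f"random, n = 5,000:   {fair:.3f}")
```

Under these assumptions the large poll settles near the wrong answer, because a bigger sample from a biased list only measures the bias more precisely, while the small random sample stays close to the true support.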
Ex 3: Do power lines cause leukemia in children?
Context: Electric currents generate magnetic fields. Really strong magnetic fields can disturb living cells in laboratory studies. What about the weaker magnetic fields we experience if we live near power lines? Some data suggested that more children in these locations might develop leukemia (a cancer of the blood cells).
Result: No evidence (of more than a chance connection between magnetic fields and childhood leukemia).
No risk? (It says that a very careful study could not find any risk that stands out from the play of chance that distributes leukemia cases across the landscape, at least as far as we know!)

Whenever risk statistics are reported, there is a risk that they are misreported. Journalists often present risk data in a way that produces the best story rather than in a way that provides the best information. Very commonly, news reports either don’t contain or don’t emphasize the information you need to understand risk.

Data
* Context of data (representing what, measuring what, which units, . . . )
* Source of data (how and who got them, particular interest, . . . )
* Sampling method (random, voluntary or self-selected, . . . )
* Conclusion (statistical significance vs. practical significance, . . . )

Observational Study / Experiment
Observational Study
* Observes individuals and measures variables of interest but does not attempt to influence the responses
* Describes some group or situation
* Means “no cause-and-effect” conclusion
* Sample Surveys are a type of observational study
Experiment
* Deliberately imposes some treatment on individuals in order to observe their responses
* Studies whether the treatment causes change in the response
* A “cause-and-effect” conclusion is desired
Statistics is about variation: ‘data’ vary because we don’t see everything and because even what we do see and measure, we measure imperfectly. So, in a very basic way, ‘statistics’ is about the real, imperfect world in which we live.

Common Language - 1
What are the key words? (statistics jargon)
* Individuals are the objects described by the set of data; they may be people, but also animals or things
* Variable is any characteristic of an individual, which may change from individual to individual
* Population is the entire group of individuals about which we want information
* Sampling Frame is the list (or set) of individuals (not necessarily the same as the population) from which the sample will be drawn
* Sample is the subset of individuals from which information is collected (and used to draw conclusions about the whole)
* In reality, you have a ‘theoretical’ population, an ‘implementable’ sampling frame, and an ‘actual’ sample

Common Language - 2
* Observational Studies try to gather information without disturbing the scene they are observing
* Sample Survey is a type of observational study; data are collected on a sample, so it looks only at part of the population
* Census is a sample survey that attempts to include the entire population in the sample
* Experiments actually do something (called treatments) to individuals in order to see how they respond. Usually, the goal is to learn whether some treatment actually causes a certain response
* Response is a variable that measures an outcome or result of a study.

Three steps to doing Statistics right:
(1) Think first: know where you are headed and why.
(2) Show is what folks think Statistics is about; the mechanics of calculating statistics and making displays are important, but not the most important part of Statistics.
(3) Tell what you have learned; until you have explained your results so that someone else can understand your conclusion, the job is not done.

Example 4: Public Opinion Polls
Polls such as those conducted by Gallup and many news organizations ask people’s opinions on a variety of issues. The variables measured are responses to questions about public issues. Though most noticed at election time, these polls are conducted on a regular basis throughout the year. For a typical poll:
Population: US residents 18 years of age and over. Noncitizens and even illegal immigrants are included
Sample: Between 1,000 and 1,500 people interviewed by telephone
Sampling Frame? Reachable population?

Example 5: CPS
Government economic and social data come from large sample surveys of a nation’s individuals, households, or businesses. The monthly Current Population Survey (CPS) is the most important government sample survey in the United States. Many of the variables recorded by the CPS concern the employment or unemployment of everyone over 16 years old in a household. The CPS also records many other economic and social variables. For the CPS:
Population: The more than 111 million US households (i.e., all people who share the same living quarters, regardless of how they are related)
Sample: About 60,000 households interviewed
Sampling Frame? Have you been interviewed?

Example 6: TV Ratings
Market research is designed to discover what consumers want and what they use. One example of market research is the television-rating service Nielsen Media Research. These ratings influence how much advertisers will pay to sponsor a program and whether or not the program stays on the air. For the Nielsen national TV ratings:
Population: The over 111 million US households that have a television set
Sample: About 25,000 households that agree to use a “people meter” to record the TV viewing of all people in the household
Sampling Frame? Do you use a people meter?

Example 7: GSS
Social science research makes heavy use of sampling. The General Social Survey (GSS), carried out every second year by the National Opinion Research Center at the University of Chicago, is the most important social science sample survey. The variables cover the subject’s personal and family background, experiences and habits, and attitudes and opinions on subjects from abortion to war.
Population: Adults (aged 18 and over) living in the United States who can be interviewed in English, excluding adults in prisons and college dormitories (& homeless)
Sample: About 3,000 adults interviewed in person in their homes.
Sampling Frame? Have you been interviewed?

Example 8: Helping welfare mothers find jobs
Most adult recipients of welfare are mothers of young children. Observational studies of welfare mothers show that many are able to increase their earnings and leave the welfare system. Some take advantage of the voluntary job-training programs to improve their skills. Should participating in job-training and job-search programs be required of all able-bodied welfare mothers? Observational studies of the current system cannot tell us what the effects of such a policy would be. Even if the mothers studied are a properly chosen sample of all welfare recipients, those who seek out training and find jobs may differ in many ways from those who do not. They are observed to have more education, for example, but they may also differ in values and motivation, things that cannot be observed.
* To see if a required jobs program will help mothers escape welfare, such a program must actually be tried. Choose two similar groups of mothers when they apply for welfare.
Require one group to participate in a job-training program, but do not offer the program to the other group. This is an experiment. Comparing the income and work record of the two groups after several years will show whether requiring training has the desired effect.
* If we hope the training will raise earnings, is it ethical to offer it to some women and not to others?

Key Concepts
* Knowing about statistical methods will have practical consequences in your everyday lives
* Experiments versus Observational Studies
* Common Terms (Individuals, Population, Sampling Frame, Sample, Sample Survey, Census, Variable)
* Data and Variables: identify, classify, and describe the Who, What, When, Where, Why and How

NOW IT’S YOUR TURN:
1.1 Federal Funding - Yes or No?
1.2 Posting lectures on the class web site - Was this an Experiment?
Answer the question in Case Study Evaluated

Data Set
The first time you see a data set, ask yourself these questions:
* What are the objects of interest?
* What variables were measured?
* What are the units of measurement?
* How were the variables measured?
* Who collected the data?
* How did they collect the data?
* Where were the data collected?
* Why did they collect the data?
A data file consists of rows and columns, where each row represents a unique individual (or object), and each column represents a variable (or characteristic) that describes the individuals. Numerical variables describe quantities of the individuals, and Categorical variables describe qualities of the individuals.

Morals to Remember
* When discussing the change in the rate or risk of occurrence of something, make sure you also include the base rate or baseline risk.
* A representative sample of only a few thousand, or perhaps even a few hundred, can give reasonably accurate information about a population of many millions.
Morals to Remember (continued)
* An unrepresentative sample, even a large one, tells you almost nothing about the population.
* Cause-and-effect conclusions cannot generally be made on the basis of an observational study.
* Unlike with observational studies, cause-and-effect conclusions can generally be made on the basis of randomized experiments.
* A “statistically significant” finding does not necessarily have practical importance. When a study reports a statistically significant finding, find out the magnitude of the relationship or difference.
* A secondary moral to this story is that the implied direction of cause and effect may be wrong. In this case, it could be that people who were more lonely and depressed were more prone to using the Internet. And as the follow-up research makes clear, remember that “truth” does not necessarily remain fixed across time. Any study should be viewed in the context of society at the time it was done.
* Simple summaries of data can tell an interesting story and are easier to digest than long lists.
[Utts & Heckard - Mind on Statistics - Brooks-Cole (2007)]

Exercise Ch01
1.10 What is the population? For each of the following sampling situations, identify the population as exactly as possible. That is, say what kind of individuals the population consists of and say exactly which individuals fall in the population. If the information given is not sufficient, complete the description of the population in a reasonable way.
(a) A sociologist is interested in determining the extent to which teens are self-motivated. She selects a sample of four high schools in a large city and interviews all tenth-graders in each of the schools.
(b) The lecturer in a large introductory mathematics course is concerned about the accuracy with which multiple-choice tests are graded by her teaching assistants. After the most recent test, she selects a sample of the exams and regrades them.
(c) The host of a local radio talk show wonders if people who are actively religious are happier than those who are not. The station receives calls from 48 listeners who voice their opinions.

Exercise (answer) Ch01
**Answers 1.10 What is the population? Exact descriptions of the populations may vary.
(a) Teenagers (or “tenth-graders,” but from the description of the situation, the researcher would like information about all teens).
(b) The most recent set of exams (or perhaps, all exams for the course).
(c) All adults, or everyone (the radio host’s question does not necessarily exclude children from consideration).

Multiple choice Ch01
The monthly government sample survey that produces the unemployment rate and other data about employment and earnings is called
(a) the National Household Survey.
(b) the General Social Survey.
(c) the Survey of Employment.
(d) the Current Population Survey.
Answer: (d)
.......................................................................
Each month, the commissioner of the Bureau of Labor Statistics appears before Congress. His most recent testimony (October 3, 2008) began, “Thank you for the opportunity to discuss the September employment and unemployment data that we released this morning. The unemployment rate was unchanged at 6.1 percent ...” The large sample survey that produces monthly data on employment and unemployment is called the
(a) General Social Survey.
(b) Current Population Survey.
(c) Federal Employment Survey.
(d) Gallup Poll.
Answer: (b)

Exercise 2 Ch01
Identifying Data Sets. In a recent survey, 1500 adults in the United States were asked if they thought there was solid evidence of global warming.
Eight hundred fifty-five of the adults said yes. Identify the population and the sample. Describe the sample data set.
**Solution. The population consists of the responses of all adults in the United States, and the sample consists of the responses of the 1500 adults in the United States in the survey. The sample is a subset of the responses of all adults in the United States. The sample data set consists of 855 yes’s and 645 no’s.
.......................................................................
Definition [Chapter 02]
* A ‘statistic’ is a numerical description of a sample characteristic (e.g., percentage of yes in sample, i.e., 855/1500 = 57%).
* A ‘parameter’ is a numerical description of a population characteristic (e.g., percentage of yes in population, i.e., ?).

Ch02 - Samples, Good and Bad

Thought Questions 1, 2
Popular magazines often contain surveys that ask their readers to answer questions about hot topics in the news. Do you think the responses the magazines receive are representative of public opinion? Explain why or why not.
.......................................................................
A survey on poverty and welfare included the following question, “Do you agree with the popular notion that government policy should attempt to assist those individuals who have had the misfortune to end up living in poverty by providing them with much needed financial assistance until they can get back on their feet?” Based on the wording, do you think the author of this question was looking for support or opposition to welfare programs? Explain.
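The arithmetic behind the statistic in Exercise 2 (the global warming survey) can be checked in a few lines. This is a minimal sketch; the variable names are illustrative only.

```python
# Sample data from Exercise 2: 1500 adults surveyed, 855 answered "yes".
n = 1500
yes = 855
no = n - yes                # number of "no" responses in the sample

# The statistic: proportion of "yes" in the sample.
p_hat = yes / n

# The parameter (proportion of "yes" among ALL US adults) is unknown;
# the statistic is only an estimate of it.
print(f"no = {no}, p_hat = {p_hat:.2f}")   # no = 645, p_hat = 0.57
```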
Thought Questions 3, 4
The Cable News Network (CNN) often asks its viewers to call the network with their opinions on certain political issues, like whether or not they favor current foreign policy. Do you think the results of these polls represent the feelings of the general population? Do you think they represent the feelings of all those watching CNN at the time? Explain.
.......................................................................
Suppose you access an online listing of all courses at your institution, alphabetized by department, to determine what proportion of all courses have a statistics course as a prerequisite. If you decide to sample 50 courses in order to get a representative sample of courses, how would you select them? Would it be appropriate to simply select the first 50 courses listed?

Good & Bad Samples
Objective: To obtain samples that are representative of the population (i.e., as little bias as possible, a reduced picture of the whole)
Biased sampling methods
* The design of a statistical study is biased if it systematically favors certain outcomes
* Convenience Sampling: Selection of whichever individuals are easiest to reach
* Voluntary Response Sample: Individuals responding to a general appeal, e.g., write-in or call-in opinion polls.
Convenience Sampling and Voluntary Response Sample are biased!
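Thought Question 4 can be explored with a short simulation. The sketch below assumes a hypothetical catalog of 800 courses, numbered 1 to 800 in alphabetical order by department, in which courses 601 to 720 have a statistics prerequisite. These numbers are invented for illustration. It compares taking the first 50 courses listed with drawing a simple random sample of 50.

```python
import random

# Hypothetical catalog: labels 1..800, alphabetized by department.
# Assumption for illustration only: courses 601..720 require statistics.
courses = list(range(1, 801))
has_stat_prereq = {c: 601 <= c <= 720 for c in courses}


def proportion(sample):
    """Proportion of courses in the sample with a statistics prerequisite."""
    return sum(has_stat_prereq[c] for c in sample) / len(sample)


true_p = proportion(courses)            # 120/800 = 0.15

# Convenience choice: the first 50 courses in the alphabetical listing.
first_50 = courses[:50]

# Simple random sample of 50 courses (seed fixed so the run is repeatable).
rng = random.Random(2013)
srs_50 = rng.sample(courses, 50)

print(f"true proportion: {true_p:.2f}")
print(f"first 50 listed: {proportion(first_50):.2f}")
print(f"SRS of 50:       {proportion(srs_50):.2f}")
```

Under these assumptions the first 50 courses all come from departments early in the alphabet and miss every course with a statistics prerequisite, while a random sample of the same size usually lands much closer to the true proportion.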
Example 1: At The Mall
Manufacturers and advertising agencies often use interviews at shopping malls to gather information about the habits of consumers and the effectiveness of ads. A sample of mall shoppers is fast and cheap. But people contacted at shopping malls are not representative of the entire US population. They are richer, for example, and more likely to be teenagers or retired. Moreover, the interviewers tend to select neat, safe-looking individuals from the stream of customers. Mall samples are biased: they systematically over-represent some parts of the population (prosperous people, teenagers, and retired people) and under-represent others. The opinions of such a convenience sample may be very different from those of the population as a whole.

Example 2: Write-in
Ann Landers once asked the readers of her advice column, “If you had it to do over again, would you have children?” She received nearly 10,000 responses, almost 70% saying “NO!” Can it be true that 70% of parents regret having children? Not at all. This is a voluntary response sample. People who feel strongly about an issue, particularly people with strong negative feelings, are more likely to take the trouble to respond. Ann Landers’ results are strongly biased. Abigail Van Buren (niece of AL) revisited this question in her column and repeated the poll, and the majority of respondents would have children again.
Write-in and call-in opinion polls are almost sure to lead to strong bias. In fact, only about 15% of the public have ever responded to a call-in poll, and these tend to be the same people who call radio talk shows.

Random selection methods are “better”, like “drawing names out of a hat”. . . Use Tables of Random Numbers or Computer Software

SRS
A Simple Random Sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected
A Random Sample (RS) of size n consists of n individuals from the population chosen in such a way that every individual has an equal chance (or is equally likely) to be in the sample actually selected
A Table of Random Digits is a long string of the ten digits 0, 1, . . . , 9 with the following properties:
1 Each entry in the table is equally likely to be any of the ten digits 0, 1, . . . , 9
2 The entries are independent of each other, i.e., the knowledge of one part of the table gives no information about any other part
Note: Usually, a table of random digits is organized by lines, and digits are grouped (Table A) or (Table A 8Ed)

Example 3 - SRS
Joan’s small accounting firm serves 30 business clients. Joan wants to interview a sample of 5 clients to find ways to improve client satisfaction. To avoid bias, she chooses an SRS of size 5. How to do it?
1 Give each client a numerical label, using as few digits as possible. Two digits are needed to label 30 clients, for instance 01, 02, . . . , 29, 30. It is also correct to use labels 00 to 29, or 31 to 60
2 Enter Table A anywhere and read two-digit groups. For instance, if we enter at line 130, then the first 10 two-digit groups in this line are 69 05 16 48 17 87 17 40 95 17. Any two-digit group in Table A is equally likely to be any of the 100 possible groups 00, 01, . . . , 98, 99. Joan used only labels 01 to 30, so we ignore repeated two-digit groups and all groups outside that range; this is valid because (in Table A) each digit is independent of the others.
3 Thus, of the first 10 labels in line 130, we retain only 3 labels, namely, 05, 16, 17. We continue (same line or with line 131, if needed) until five labels are chosen.
4 Finally, we form the SRS with the clients corresponding to the chosen labels

Courses w/Sta Prerequisite
Suppose there are 800 courses at an institution, alphabetized by department (and numbered 001-800), and you decide to randomly select 50 of them to determine what proportion of all the courses have a statistics course as a prerequisite
Use a random digits table to select which 50 courses to sample
For example, pick a line and column at random: suppose we get line 111, column 3
Random labels: 605 130 929 700 412 712 . . . Give details . . .

SRS - How to?
Choose an SRS in two steps
1 Labeling. Assign a numerical label to every individual in the population (or sampling frame). If you are planning to use a table of random digits, then be sure that all labels have the same number of digits.
2 Software or Table. Use random digits to select labels at random.

Stratified RS (Ch4!)
Choose a Stratified Random Sample in two steps
1 Strata. Divide the population into groups of similar individuals, called strata. Choose the strata according to any special interest you have in certain groups within the population or because the individuals in each stratum resemble each other.
2 SRS. Take a separate SRS in each stratum and combine these to make up the complete sample.

Example 4: SRS using software
http://www.randomizer.org
Repeat Example 3.
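Both routes for Example 3 can be sketched in Python. The helper `srs_from_digits` below is a hypothetical name, not part of any library; it follows the table procedure using the ten two-digit groups quoted from line 130. The second part uses the standard-library `random.sample` as a stand-in for software such as the Randomizer; the seed is arbitrary.

```python
import random


def srs_from_digits(groups, low, high, size):
    """Select labels the way a table of random digits is used.

    groups: two-digit groups read from the table, in order.
    Keeps a group only if it is a valid label (low..high) that has not
    been chosen already; stops once `size` labels are chosen.
    """
    chosen = []
    for g in groups:
        label = int(g)
        if low <= label <= high and label not in chosen:
            chosen.append(label)
        if len(chosen) == size:
            break
    return chosen


# First ten two-digit groups of line 130, as quoted in Example 3.
line_130 = "69 05 16 48 17 87 17 40 95 17".split()
from_table = srs_from_digits(line_130, 1, 30, 5)
print(from_table)   # [5, 16, 17]: only three labels so far, keep reading

# Software route: draw 5 distinct labels from 1..30, sorted.
rng = random.Random(30)
from_software = sorted(rng.sample(range(1, 31), 5))
print(from_software)
```

The table route returns only three labels from these ten groups, which matches step 3 of Example 3; more groups from the table would be needed to reach five. The software route returns five distinct labels in one call, and the labels change with the seed.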
On the Randomizer site, we ask for one set of numbers with five numbers per set. We specified the number range as 1 to 30. We requested that each number remain unique, and that the numbers be sorted from least to greatest. We asked to view the outputted numbers with the place markers off. After clicking on the “Randomize Now” button, we obtain the result (which may be different each time we try!)

*Example 8 Ch 4: Stratifying a sample of students
A large university has 30,000 students, of whom 3,000 (10%) are graduate students. An SRS of 500 students gives every student the same chance to be in the sample: 500/30000 = 1/60. We expect an SRS of 500 to contain only about 50 grad students, and a sample of size 50 is not large enough to estimate grad students’ opinions with reasonable accuracy. We may prefer a stratified sample of 200 grad students and 300 undergraduates.
*Check also Multistage Sample Design and other sampling techniques

Sample Surveys (Ch4!)
Errors in sampling
* Sampling errors are errors caused by the act of taking a sample. They cause sample results to be different from the results of a census, e.g., undercoverage (recall sampling frame!).
* Random sampling error is the deviation between the sample statistic and the population parameter caused by chance in selecting a random sample. The margin of error, in a confidence statement, includes only random sampling error.
* Nonsampling errors are errors not related to the act of selecting a sample from the population. They can be present even in a census, e.g., processing errors, nonresponse, wording of questions
The announced margin of error for a sample survey covers only random sampling error. Undercoverage, nonresponse, and other practical difficulties can cause large bias that is not covered by the margin of error.

Believe a Poll (Ch4!)
Questions to ask before you believe a poll
1 Who carried out the survey? Even a political party should hire a professional sample survey firm which follows good survey practices.
2 What was the population? That is, whose opinions were being sought?
3 How was the sample selected? Look for mention of random sampling.
4 How large was the sample? Even better, find out both the sample size and the margin of error for a 95% confidence level
5 What was the response rate? That is, what percent of the original subjects actually provided information?
6 How were the subjects contacted? By telephone? Mail? Face-to-face interview?
7 When was the survey conducted? Was it just after some event that might have influenced opinion?
8 What were the exact questions asked?
Note: Academic survey centers and government statistical offices answer these questions when they announce the results of a sample survey.

Exercise Ch02
2.4 Instant opinion. The BusinessWeek online poll can be found at the Web site indicated in the “Notes and Data Sources”. The latest question appears on the screen, and visitors to the site can simply click buttons to vote. On March 29, 2007, the question was “Do you think Google is too powerful?” In all, 1336 (35.9%) said “Yes,” 2051 (55.1%) said “No,” and 335 (9.0%) said “I’m not sure.”
(a) What is the sample size for this poll?
(b) At the Web site, BusinessWeek includes the following statement about its online poll. “Note: These are surveys, not scientific polls.” Explain why the poll may give unreliable information.
(c) Just above the poll question was the following statement: “Google’s accelerating lead in search and its moves into software and traditional advertising are sparking a backlash among rivals.” How might this statement affect the poll results?

Exercise (answer) Ch02
**Answers 2.4 Instant opinion.
(a) The sample size is 3,722.
(b) Online polls may not be reliable for a number of reasons (e.g., not everyone has a chance to access the poll or to know about the poll, we don’t know if people can answer the poll more than once). (c) This statement might bias more people to say “Yes” than they otherwise would. 35 / 100 http://www.math.wayne.edu/˜menaldi/teach/ JLM (WSU) STA 1020 6 / 17 36 / 100 ** STA 1020 - Part 1 (24/Sep/13) ** Ch02 - Samples, Good and Bad Multiple choice Ch02 Ch03 - What Do Samples Tell Us? The student newspaper runs a weekly question that readers can answer online or by campus mail. One question was “Do you think the college is doing enough to provide student parking?” Of the 82 people who responded, 79% said “No.” When we say that the newspaper poll is biased, we mean that (a) faculty may have a different opinion from students. (b) repeated polls would give results that are very different from each other. (c) the question asked shows gender or racial bias. (d) repeated polls would miss the truth about the population in the same direction. Answer: (d) .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . In a table of random digits, (a) each pair of digits 00, 01, 02, ..., 99 appears exactly once in any row of the table. (b) any pair of entries is equally likely to be any of the 100 possible pairs 00, 01, 02, . . . , 99. (c) a specific pair such as 00 cannot be repeated until all other pairs have appeared. (d) the pair 00 can appear, but 000 is not random and can never appear in the table. Answer: (b) JLM (WSU) STA 1020 Ch03 - What Do Samples Tell Us? Instructor: Dr. J.L. Menaldi Textbook - Statistics: Concepts and Controversies, by David S. Moore and William I. Notz, 2013, W.H. Freeman & Company [8th ed] Class Link: http://www.math.wayne.edu/˜menaldi/teach/13f1020.htm “Statistics” is the Science of collecting, describing and interpreting data... 
Chapter 3 - What Do Samples Tell Us?
Taking a Sample

Thought Questions. . .
During a medical exam, the doctor measures your cholesterol two times. Do you think both measurements would be exactly the same? Why or why not?
.......................................................................
To estimate the percentage of all adults who have an internet connection in their homes, a properly chosen sample of 1100 adults across the US was surveyed, and 60% said “yes”. How close do you think that is to the percentage for the entire country? Within 30%? 10%? 5%? 1%? Exactly the same?

Sampling Terminology
Statistical inference is the process of drawing conclusions about the population based on a sample.
A parameter is a fixed, but unknown, number that describes some characteristic of the population.
A statistic is a known value calculated from a sample; a statistic is used to estimate a parameter.
Bias in repeated samples: the sample statistic consistently misses the population parameter in the same direction.
Variability: different samples from the same population may yield different values of the sample statistic.
The sampling distribution is the distribution of the values a given statistic takes over repeated random samples; it is essential for statistical inference.

Example 1: Proportion in Favor
The proportion of all adults who favor a constitutional amendment (that would define marriage as being between a man and a woman) is a parameter describing the population of 220 million US adults. Call it $p$, for “proportion.” Alas, we do not know the numerical value of $p$ (Why?).
To estimate $p$, Gallup took a sample of 2527 adults. The proportion of the sample who favor such an amendment is a statistic. Call it $\hat{p}$. It happens that 1289 of this sample (of size 2527) said that they favor an amendment, so for this sample $\hat{p} = 1289/2527 = 0.51$ (i.e., 51%).
.......................................................................
Because all adults had the same chance of being among the chosen 2527 (an SRS), it seems reasonable to use the statistic $\hat{p}$ as an estimate of the unknown parameter $p$. It is a fact that 51% of the sample favored an amendment (we know because we asked them). We do not know what percentage of all adults favor an amendment, but we estimate that about 51% do (Based on what?).

Example 2: Lots of Samples
Figure 3.1 Draw 1000 SRSs of size 100 from the same population.
Figure 3.2 Draw 1000 SRSs of size 2527 from the same population as in Figure 3.1.

Two Types of Errors
To reduce bias, use random sampling.
To reduce variability, use larger samples.
A good sampling method has both small bias and small variability.
Figure 3.3: Bias and variability in shooting arrows at a target. Bias means the archer systematically misses in the same direction. Variability means that the arrows are scattered.

Example 3: TV News
Here is what the TV news announcer says: “A new Gallup Poll finds that a slim majority of 51% of American adults favor a constitutional amendment that would define marriage as being between a man and a woman, thus barring marriages between gay and lesbian couples. The margin of error for the poll was 2 percentage points.”
Plus or minus 2% starting at 51% is from 49% to 53%. Most people think Gallup claims that the truth about the entire population lies in that range.
This is what Gallup actually said: “For results based on a sample of this size, one can say with 95% confidence that the error attributable to sampling and other random effects could be plus or minus 2 percentage points for adults.”
That is, Gallup tells us that the margin of error includes the truth about the entire population for only 95% of all its samples. The “95% confidence” is shorthand for that. The news report left out the “95% confidence.”

In 95% of surveys, the sample proportion will not differ from the population proportion by more than the margin of error (“95% confidence”). More details will be discussed in later chapters...
Recall two issues:
Margin of error: $\hat{p} - 1/\sqrt{n} \le p \le \hat{p} + 1/\sqrt{n}$
95% confidence, i.e., only 5% of the SRSs may miss by more than the margin of error
All based on the sampling distribution!
For instance, because the typical margin of error is $1/\sqrt{n}$, cutting the margin of error in half requires a sample four times as large.

Example 4: What is the MoE?
The Gallup Poll in Example 1 interviewed 2527 people. The margin of error for 95% confidence will be about
$1/\sqrt{2527} = 1/50.27 = 0.020$, about 2.0%.
Gallup announced a margin of error of 2%, and our quick method agrees with this. In general our quick method may disagree a bit with Gallup’s for two reasons. First, polls usually round their announced margin of error to the nearest whole percent to keep press releases simple. Second, our rough formula assumes an SRS; more details in later chapters.

Example 5: MoE and Sample Size
In Example 2, we compared the results of taking many SRSs of size $n = 100$ and many SRSs of size $n = 2527$ from the same population.
We found that the spread of the middle 95% of the sample results was about five times larger for the smaller samples. Our quick formula estimates the margin of error for SRSs of size 2527 to be about 2.0%. The margin of error for SRSs of size 100 is about $1/\sqrt{100} = 1/10 = 0.1$, i.e., 10%. Because 2527 is roughly 25 times 100 and the square root of 25 is 5, the margin of error is about five times larger for samples of 100 people than for samples of 2527 people.

Margin of Error / Confidence
The amount by which the proportion obtained from the sample ($\hat{p}$) will differ from the true population proportion ($p$) rarely exceeds the margin of error.
Typical Margin of Error: $1/\sqrt{n}$

Confidence statements
A confidence statement has two parts: a margin of error and a level of confidence. The margin of error says how close the sample statistic lies to the population parameter. The level of confidence says what percentage of all possible samples satisfy the margin of error.
.......................................................................
Population size does not matter: The variability of a statistic from a random sample does not depend on the size of the population as long as the population is at least 100 times larger than the sample.
For an SRS of size $n$, the margin of error for 95% confidence is approximately $1/\sqrt{n}$.

Interpreting confidence statements
We must be careful to interpret confidence intervals correctly.
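The quick method used in Examples 4 and 5 can be checked numerically. The following is a minimal Python sketch, not part of the course material: the function name `quick_margin_of_error` is our own, and the formula is the rough 95% rule $1/\sqrt{n}$ for an SRS.

```python
import math

def quick_margin_of_error(n):
    """Rough 95% margin of error for a sample proportion from an SRS of size n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1 / math.sqrt(n)

# Sample sizes compared in Example 5
for n in (100, 2527):
    print(f"n = {n}: margin of error is about {quick_margin_of_error(n):.1%}")

# Gallup's sample from Example 1: 1289 of 2527 adults in favor
p_hat = 1289 / 2527
moe = quick_margin_of_error(2527)
print(f"95% confidence interval: {p_hat - moe:.3f} to {p_hat + moe:.3f}")
```

Calling the function with n and then with 4n shows the margin of error dropping by half, which is the sample-size rule quoted in the slides.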
There is one correct interpretation, and many different and creative incorrect interpretations, of the confidence interval $\hat{p} - 1/\sqrt{n} \le p \le \hat{p} + 1/\sqrt{n}$.
Correct: “We are 95% confident that the interval from $\hat{p} - 1/\sqrt{n}$ to $\hat{p} + 1/\sqrt{n}$ actually does contain the true value of the population proportion $p$.” This means that if we were to select many different samples of size $n$ and construct the corresponding confidence intervals, then at least 95% of them would actually contain the value of the population proportion $p$. The level of 95% refers to the success rate of the process being used to estimate the proportion.
.......................................................................
INCORRECT: “There is a 95% chance that the true value of $p$ will fall between $\hat{p} - 1/\sqrt{n}$ and $\hat{p} + 1/\sqrt{n}$.” It would also be incorrect to say that “95% of sample proportions fall between $\hat{p} - 1/\sqrt{n}$ and $\hat{p} + 1/\sqrt{n}$.”

Exercise Ch03
3.8 A sampling experiment. Let us illustrate sampling variability in a small sample from a small population. Ten of the 25 club members listed below are female. Their names are marked with asterisks in the list. The club chooses 5 members at random to receive free trips to the national convention.
Alonso       Darwin       Herrnstein   Myrdal       Vogt*
Binet*       Epstein      Jimenez*     Perez*       Went
Blumenbach   Ferri        Luo          Spencer*     Wilson
Chase*       Gonzales*    Moll*        Thomson      Yerkes
Chen*        Gupta        Morales*     Toulmin      Zimmer
(a) Draw 20 SRSs of size 5, using a different part of Table A each time. Record the number of females in each of your samples. Make a histogram like that in Figure 3.1 to display your results. What is the average number of females in your 20 samples?
(b) Do you think the club members should suspect discrimination if none of the 5 tickets go to women?

Exercise (answer) Ch03
**Answers 3.8 A sampling experiment.
Theoretical probabilities of 0, 1, 2, 3, and 4 women in a sample of 5: 0.05652, 0.25692, 0.38538, 0.23715, 0.05929
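The theoretical probabilities for Exercise 3.8 come from a hypergeometric distribution, that is, from counting how many of the possible samples of 5 contain exactly k women. A minimal Python sketch of that count follows; the constant and function names (`MEMBERS`, `WOMEN`, `SAMPLE`, `prob_women`) are our own and purely illustrative.

```python
from math import comb

MEMBERS, WOMEN, SAMPLE = 25, 10, 5  # club size, number of women, trips awarded

def prob_women(k):
    """P(exactly k women in an SRS of SAMPLE members), by counting samples."""
    return comb(WOMEN, k) * comb(MEMBERS - WOMEN, SAMPLE - k) / comb(MEMBERS, SAMPLE)

probs = [prob_women(k) for k in range(SAMPLE + 1)]
for k, p in enumerate(probs):
    print(f"{k} women: {p:.5f}")

# Theoretical mean number of women in a sample
mean_women = sum(k * p for k, p in enumerate(probs))
print(f"mean number of women: {mean_women:.2f}")
```

The first value printed is the chance that no woman is chosen, which is the quantity part (b) of the exercise asks about.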
(a) Results will vary (see the note below), but the theoretical probabilities are listed with this answer; the theoretical mean number of women is 2.
(b) We would expect only about 1 in 20 samples to have no women in them; this is not impossible, but it is unlikely, so it may be enough reason to suspect discrimination.
Probability of 5 women in the sample: 0.00474
Note: These theoretical probabilities were computed using a hypergeometric distribution. 90% of all students who do this exercise will have no women in two or fewer of their 20 samples, while nearly all (99.5% of all students) will have no women in no more than four samples.

Multiple choice Ch03
The student newspaper runs a weekly question that readers can answer online or by campus mail. One question was “Do you think the college is doing enough to provide student parking?” Of the 82 people who responded, 79% said “No.”
1 The number 79% is a
(a) statistic. (b) parameter. (c) reliability. (d) margin of error.
Answer: (a)
.......................................................................
2 If we applied the quick method to the poll we would obtain this 95% confidence interval:
(a) 79% ± 11%. (b) 79% ± 9%. (c) 82 ± 79. (d) 82 ± 11%.
Answer: (a), since $1/\sqrt{82} \approx 0.11$

Chapter 5 - Experiments, Good and Bad

Thought Questions. . .
In an observational study, researchers observe what individuals do (or have done) naturally, while in an experiment they deliberately impose one of several “treatments” on the individuals, ideally assigning them to groups at random.
1 Ex1 Learning on the Web: . . . students learning undergraduate courses on line were “equal in learning” to students taking the same courses in class . . .
2 Ch1 - Ex3 Do power lines cause leukemia in children? Electric currents generate magnetic fields. Strong magnetic fields can disturb living cells. What if you live near power lines?
This last example is a situation where an experiment would not be feasible and thus an observational study is used.

Alzheimer’s disease
In studies to determine the relationship between two conditions (activities, traits, etc.), one of them is often defined as the explanatory (independent) variable and the other as the outcome or response (dependent) variable.
In an experiment to determine whether the drug memantine improves cognition of patients with moderate to severe Alzheimer’s disease, whether or not the patient received memantine is one variable, and cognitive score is the other.
Which is the explanatory variable and which is the response variable?
How would you go about randomizing 100 patients to the two treatment groups (memantine group & placebo group)?
Why is it necessary to randomly assign the subjects, rather than having the experimenter decide which patients should get which treatment?
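One way to carry out the randomization asked about above is to let software play the role of the table of random digits. The following is a minimal Python sketch under stated assumptions: patients are labeled 1 to 100, the two groups are of equal size, and the function name `randomize` and the seed value are our own choices for illustration.

```python
import random

def randomize(subjects, seed=None):
    """Shuffle the subjects by chance and split them into two equal-sized groups."""
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])

patients = range(1, 101)           # hypothetical patient labels 1..100
memantine_group, placebo_group = randomize(patients, seed=2013)
print("memantine:", memantine_group)
print("placebo:  ", placebo_group)
```

Because chance alone decides who lands in each group, the experimenter's judgment (for example, giving the drug to the healthier-looking patients) cannot systematically favor one treatment.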
Common Language
For Experiments or Clinical Trials
Response variable: what is measured as the outcome or result of a study
Explanatory variable: what we think explains or causes changes in the response variable (it often determines how subjects are split into groups)
Subjects: the individuals participating in a study
Treatments: specific experimental conditions (related to the explanatory variable) applied to the subjects. If an experiment has several explanatory variables, a treatment is a combination of specific values of these variables.
Experimental units: the objects on which measurements are taken

Ex.: The effect of day care
. . . The Carolina Abecedarian Project has followed a group of children since 1972. The Abecedarian Project is an experiment in which the subjects are 111 people who in 1972 were healthy but low-income black infants in Chapel Hill, North Carolina. All the infants received nutritional supplements and help from social workers. Half, chosen at random, were also placed in an intensive preschool program. The experiment compares these two treatments. The explanatory variable is just “preschool, yes or no”. There are many response variables, recorded over more than 30 years, including academic test scores, college attendance, and employment. This long and expensive experiment did not completely establish that good day care makes a big difference in later school and work.

Back to Ex1, Really?
Do students who take a course via the Web learn as well as those who take the same course in a traditional classroom? The best way to find out is to assign some students to the classroom and others to the Web. That is an experiment. If students choose for themselves whether to enroll in a classroom or online version of a course, then this is not an experiment (no treatment is imposed on the subjects).
Actually, students who chose the online course were very different from the classroom students; e.g., their average score on tests on the course material given before the courses started was 40.70, against only 27.64 for the classroom students. Confounded variables? Lurking variables!

A follow-up question for the day care experiment that we would like to answer is: “how good must day care be to really help children succeed in life” . . .

Ex1 Confounded Variables
Figure 5.1 Confounding in the Nova Southeastern University study. The influence of course setting (the explanatory variable) cannot be distinguished from the influence of student preparation (a lurking variable).

Confounding
A lurking variable is a variable that has an important effect on the relationship among the variables in a study but is not one of the explanatory variables studied.
Two variables are confounded when their effects on a response variable cannot be distinguished from each other. The confounded variables may be either explanatory variables or lurking variables.
Placebo Effect: A placebo is a dummy treatment with no active ingredients. Many patients respond favorably to any treatment, even a placebo. The response to a dummy treatment is the placebo effect. Perhaps the placebo effect is in our minds, based on trust in the doctor, expectations of a cure, or something else . . . (read Example 3)

Logic Design
Example 4 Sickle-cell Anemia . . . is an inherited disorder of the red blood cells . . . The NIH carried out a clinical trial of the drug hydroxyurea . . . half received placebo . . . schedule medical checkups . . . Lurking variables affected both groups equally . . . The experiment was stopped ahead of schedule because the hydroxyurea group had many fewer pain episodes . . .
Figure 5.2 The design of a randomized comparative experiment to compare hydroxyurea with a placebo for treating sickle-cell anemia, for Example 4.

Cause-and-Effect Conclusions
Randomization produces groups of subjects that should be similar in all respects before we apply the treatments.
Comparative design ensures that influences other than experimental treatments operate equally on all groups.
Therefore, differences in the response variable (as far as our knowledge goes!) must be due to the effects of the treatments.
We use chance to choose the groups in order to eliminate any systematic bias in assigning the subjects to groups, i.e., use SRSs to form the groups.

Example 5 Conserving Energy
An electric company considers placing electronic meters in households to show what the cost would be if the electricity use at that moment continued for a month. One cheaper approach is to give customers a chart and information about monitoring their electricity use. The experiment compares these two approaches (meter & chart) and also a control (information, but no help).

Design & Conclusions
Handling lurking variables:
Experiment: randomize experimental units to receive different treatments (possible confounding variables should “even out” across groups)
Observational study: measure potential confounding variables and determine if they have an impact on the response (may then adjust for these variables in the statistical analysis)
If an experiment or observational study finds a difference between two (or more) groups, is this difference really important?
If the observed difference is larger than what would be expected just by chance, then it is labeled statistically significant.
Rather than relying solely on the label of statistical significance, also look at the actual results to determine whether they are practically important.

Figure 5.3 The design of a randomized comparative experiment to compare three programs to reduce electricity use by households, for Example 5.

Principles of experimental design
The basic principles of statistical design of experiments are:
Control the effects of lurking variables on the response, most simply by comparing two or more treatments
Randomize: use impersonal chance to assign subjects to treatments
Use enough subjects in each group to reduce chance variation in the results
An observed effect of a size that would rarely occur by chance is called statistically significant.
Ex 6 (Living longer through religion) and Ex 7 (Sex bias in treating heart disease?)
Case Study Evaluated!

Exercise Ch05
5.6 Aspirin and heart attacks. Can aspirin help prevent heart attacks? The Physicians’ Health Study, a large medical experiment involving 22,000 male physicians, attempted to answer this question. One group of about 11,000 physicians took an aspirin every second day, while the rest took a placebo. After several years the study found that subjects in the aspirin group had significantly fewer heart attacks than subjects in the placebo group.
(a) Identify the experimental subjects, the explanatory variable and the values it can take, and the response variable.
(b) Use a diagram to outline the design of the Physicians’ Health Study. (When you outline the design of an experiment, be sure to indicate the size of the treatment groups and the response variable. The diagrams in Figures 5.2 and 5.3 are models.)
(c) What do you think the term “significantly” means in “significantly fewer heart attacks”?
Exercise (answer) Ch05
**Answers 5.6 Aspirin and heart attacks.
(a) The subjects are the physicians, the explanatory variable is medication (aspirin or placebo), and the response variable is health, specifically whether the subjects have heart attacks.
(b)
Random Assignment --> Group 1 (11,000 physicians) --> Treatment 1: Aspirin --> Observe heart attacks
Random Assignment --> Group 2 (11,000 physicians) --> Treatment 2: Placebo --> Observe heart attacks
(c) “Significantly” means “unlikely to have occurred by chance if there were no difference between the aspirin and placebo groups.”

Multiple choice Ch05
Does using a cell phone while driving make an accident more likely? Researchers compared telephone company and police records to find 699 people who had cell phones and were also involved in an auto accident. Using phone billing records, they compared cell phone use in the period of the accident with cell phone use during the same period on a previous day. Result: the risk of an accident was four times higher when using a cell phone.
1 This study is
(a) a randomized comparative experiment.
(b) an experiment, but without randomization.
(c) a simple random sample.
(d) an observational study, but not a simple random sample.
Answer: (d)
2 The explanatory variable in this study is
(a) whether the subject had an auto accident.
(b) whether the subject was using a cell phone.
(c) the risk of an accident.
(d) whether the subject owned a cell phone.
Answer: (b)
3 An example of a lurking variable that might affect the results of this study is
(a) whether the subject had an auto accident.
(b) whether the subject was using a cell phone.
(c) whether the subject was talking to a passenger in the car.
(d) whether the subject owned a cell phone.
Answer: (c)
Chapter 6 - Experiments in the Real World

Thought Questions. . .
Suppose you are interested in determining if drinking a glass of red wine each day helps prevent heartburn.
You recruit 40 adults age 50 and older to participate in an experiment. You want half of them to drink a glass of red wine each day and the other half to not do so.
You ask them which they would prefer, and 20 say they would like to drink the red wine and the other 20 say they would not.
You ask each of them to record how many cases of heartburn they have in the next six months. At the end of that time period, you compare the results reported from the two groups.
Give three reasons why this is not a good experiment.

Ex1 Rats & Rabbits
Rats and rabbits that are specially bred to be uniform in their inherited characteristics are subjects in experiments.
To find out if a new breakfast cereal provides good nutrition, we compare the weight gains of young rats fed the new product and rats fed a standard diet. The rats are randomly assigned to diets and are housed in large racks of cages. It turns out that rats in upper cages grow a bit faster than rats in bottom cages. So, if we put rats fed the new product at the top and those fed the standard diet below, the experiment is biased.
Cholesterol level . . . human affection . . . All of the rabbit subjects ate the same diet. Some (chosen at random) were regularly removed from their cages to have their furry heads scratched by friendly people. The rabbits who received affection had lower cholesterol . . .

Ex2 Powerful Placebo
One study found that 42% of balding men maintained or increased the amount of hair on their heads when they took a placebo. Another study told 13 people who were very sensitive to poison ivy that the stuff being rubbed on one arm was poison ivy. It was a placebo, but all 13 broke out in a rash. The stuff rubbed on the other arm really was poison ivy, but the subjects were told it was harmless, and only 2 of the 13 developed a rash.
The strength of the placebo effect (as we can observe!) in medical treatment is hard to pin down because it depends on the exact environment. Even how enthusiastic the doctor is seems to matter a lot. When the ailment is vague and psychological, like depression, some experts think 3/4 of the effect of drugs is just placebo effect. Certainly, others disagree!

Double-blinding
In a double-blind experiment, neither the subjects nor the people who work with them know which treatment each subject is receiving. This helps to control respondent bias. Examples 3 & 4

Sample surveys suffer from nonresponse due to failure to contact some people selected for the sample and the refusal of others to participate. Experiments with human subjects may suffer from similar problems.
Minorities, women, the poor and the elderly have long been underrepresented in clinical trials, for various reasons.
But refusals remain a problem.
Subjects who participate but don’t follow the experimental treatment (called nonadherers) can also cause bias.
Experiments that continue over an extended period of time also suffer dropouts, i.e., subjects who begin the experiment but do not complete it.

Can we generalize?
Are our findings statistically significant? That is, are they too strong to occur often just by chance? [or, equivalently, would our findings rarely occur by chance alone?]
The treatments, the subjects or the environment may not be realistic.

Ex5 Studying frustration
A psychologist wants to study the effects of failure and frustration on the relationships among members of a work team. She forms a team of students, brings them to the psychology lab, and has them play a game that requires teamwork. The game is rigged so that they lose regularly. The psychologist observes the students through a one-way window and notes the changes in their behavior during an evening of game playing. . .
The subjects (students who know they are subjects in an experiment), the treatment (a rigged game) and the environment (the psychology lab) are all unrealistic if the goal is to reach conclusions about the effects of frustration on teamwork in the workplace.

Ex6 Center brake lights
Cars sold in the US since 1986 have been required to have a high center brake light in addition to the usual two brake lights at the rear of the vehicle. This safety requirement was justified by randomized comparative experiments with fleets of rental and business cars. The experiments showed that the third brake light reduced rear-end collisions by as much as 50%.
After a decade in actual use, the Insurance Institute found only a 5% reduction in rear-end collisions. Most cars did not have the extra brake light when the experiments were carried out, so it caught the eye of following drivers. Now that almost all cars have the third light, it no longer captures attention.
Read Example 7: Are subjects treated too well? The Carolina Abecedarian Project (Ex 2, Ch05) faces the same “too good to be realistic” question . . .

Ex8 Completely randomized
Effects of TV advertising . . . All subjects viewed a 40-minute TV program that included ads for a digital camera. Some subjects saw a 30-second commercial; others saw a 90-second version. The same commercial was repeated either 1, 3 or 5 times during the program. After viewing, all of the subjects answered questions about their recall of the ad, their attitude toward the camera, and their intention to purchase it.
Figure 6.1 The treatments in the experiment of Example 8. Combinations of two explanatory variables (2 commercial lengths and 3 numbers of repetitions) form 6 treatments.
. . . the interaction of several factors can produce effects that could not be predicted from looking at the effect of each factor alone . . .
Now it’s your turn. Tasty cakes. . . . baking time or temperatures . . .

Ex9 Coke versus Pepsi
Pepsi wanted to demonstrate that Coke drinkers prefer Pepsi when they taste both colas blind. The subjects, all people who said they were Coke drinkers, tasted both colas from glasses without brand markings and said which they liked better. This is a matched pairs design in which each subject compares the two colas. Because responses may depend on which cola is tasted first, the order of tasting should be chosen at random for each subject.
When more than half the Coke drinkers chose Pepsi, Coke claimed that the experiment was biased.
The Pepsi glasses were marked M and the Coke glasses were marked Q. Aha, said Coke, the results could just mean that people like the letter M better than the letter Q. The matched pairs design is OK, but a more careful experiment would avoid any distinction other than Coke versus Pepsi.

Ex10 Men, women & ads
An experiment to compare the effectiveness of three TV commercials for the same product will want to look separately at the reactions of men and women, as well as the overall response to the ads . . .
Figure 6.2 A block design to compare the effectiveness of three TV advertisements. Female and male subjects form two blocks.
Ex 11 (Comparing welfare) and Statistical Controversies!

Some Techniques
In a double-blind experiment, neither the subjects nor the people who work with them know which treatment each subject is receiving. This helps to control experimenter/respondent bias.
In a completely randomized experimental design, all experimental subjects are allocated at random among all treatments.
A block is a group of experimental subjects that are known before the experiment to be similar in some way that is expected to affect the response to treatments.
In a block design, the random assignment of subjects to treatments is carried out separately within each block. A matched pairs design is a particular case.

Exercise Ch06
6.8 Fatty acids and depression. A group of medical researchers studied the effects of the intake of omega-6 fatty acids relative to omega-3 fatty acids on depression. They randomly assigned 88 highly stressed and depressed subjects to either a diet high in omega-6 fatty acids relative to omega-3 fatty acids or a diet with a much lower amount of omega-6 fatty acids relative to omega-3 fatty acids.
They found that subjects generally showed increased symptoms of depression on the high omega-6 diet compared with those on the low omega-6 diet. The researchers themselves cautioned against interpreting these experimental results as a general warning that diets rich in omega-6 fatty acids increase depression. Why?

Simple or multistage or stratified random samples (or experiments).

Exercise (answer) Ch06
**Answers
6.8 Fatty acids and depression. Since the subjects were “highly stressed and depressed,” the researchers need to be cautious about releasing a general warning, since the existing stress and depression were confounded with the treatments of interest.

Multiple choice Ch06
A study of a drug to prevent hair loss showed that 86% of the men who took it maintained or increased the amount of hair on their heads. But so did 42% of the men in the same study who took a placebo instead of the drug. This is an example of
(a) a sampling error: the study should not have included men whose hair grew without the drug.
(b) the placebo effect: a treatment often works if you believe that it will work.
(c) an error in calculating percentages.
(d) failure to use the double-blind idea.
Answer: (b)
.......................................................................
A psychologist wants to know if adults with normal vision can be fooled by a certain optical illusion. She recruits 50 students from her PSY 120 class and finds that 42 of them are fooled by the illusion. The biggest potential weakness of experiments is
(a) they do not give good evidence for cause and effect.
(b) they only work when we can give a placebo.
(c) it can be hard to generalize conclusions beyond the actual subjects to a wider population.
(d) informed consent is often not possible.
Answer: (c)

Chapter 8
Ch08 - Measuring

Thought Questions . . .
A local health club is doing a survey to see if there is a relationship between strength and fitness. They want to measure the fitness and strength of a sample of 100 members of the club. Which of these two attributes do you think will be easier for them to measure? Explain.
fitness: ‘the quality of being suitable’ or ‘good physical condition; being in shape’
strength: ‘the property of being physically or mentally strong’ or ‘physical energy or intensity’
.......................................................................
Would you get the same result if you measured (something) again and again?
.......................................................................
A study on customer service found that there were more customer complaints registered at a large local grocery store in the past year than at a small local market. Is it fair to conclude that the local market had better customer service? What would be a fairer way to look at the numbers?

Ex1 Patients / Measurement, Ex2 Length . . .
Clinical trials tend to measure things that are easy to measure, e.g., blood pressure, tumor size, virus concentration in the blood.
They often do not directly measure what matters most to patients (does the treatment really improve their lives?). One study found that only 5% of trials published between 1980 and 1997 measured the effect of treatment on patients’ emotional well-being or their ability to function in social settings.

To measure the length of a bed, you can use a tape measure as the instrument (with inches or centimeters as the unit of measurement).
To measure a student’s readiness for college, you might ask the student to take the SAT Reasoning exam (instrument?). The variable is the student’s score (in points, between 600 and 2400, combining the Writing, Critical Reading and Math sections).
You might decide to use the number of people who die in motor vehicle accidents in a year as a variable to measure highway safety. The government’s Fatal Accident Reporting System collects data on all fatal traffic crashes (instrument?, unit?).

We measure a property of a person or thing when we assign a number to represent the property, i.e., the property is being quantified.
We often use an instrument to make a measurement. We may have a choice of the units we use to record the measurements.
The result of measurement is a numerical variable that takes different values for people or things that differ in whatever we are measuring.

Questions to ask in any statistical study:
1 Exactly how are the variables defined?
2 Are the variables a valid way to describe the properties they claim to measure?
3 How accurate are the measurements?

Ex3 Measuring unemployment
Each month the Bureau of Labor Statistics (BLS) announces the unemployment rate for the previous month. People who are not available for work (e.g., retirees and students who do not want to work while in school) are not counted as unemployed. The unemployment rate is the ratio of the number of people unemployed to the number of people in the labor force.
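The definition of the unemployment rate can be written as a one-line calculation. The following is a minimal Python sketch: the function name `unemployment_rate` and all of the counts are hypothetical, chosen only for illustration, and are not BLS figures.

```python
def unemployment_rate(unemployed, labor_force):
    """Return the unemployment rate as a percent of the labor force."""
    if labor_force <= 0:
        raise ValueError("labor force must be positive")
    return 100.0 * unemployed / labor_force


# Hypothetical counts (NOT real BLS data). People who are not in the
# labor force (retirees, students not seeking work) are in neither number.
employed = 9_300
unemployed = 700
labor_force = employed + unemployed  # labor force = employed + unemployed

rate = unemployment_rate(unemployed, labor_force)
print(f"unemployment rate: {rate:.1f}%")
```

Note that the denominator is the labor force, not the whole population, so the way people are classified as "in the labor force" directly changes the rate.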
Ex3 Measuring unemployment
The interviewer for the BLS cannot simply ask “Are you employed?” Many questions are needed to classify a person as employed, unemployed, or not in the labor force. To complete the exact definition of the unemployment rate, the BLS has very detailed descriptions of what it means to be “in the labor force” and what it means to be “unemployed”. For instance, if you are on strike but expect to return to the same job, you count as employed. If you are not working and did not look for work in the last two weeks, you are not in the labor force. The details matter.
Figure 8.1 The unemployment rate from August 1991 to July 1994. The gap shows the effect of a change in how the government measures unemployment.

Know your variables
A variable is a valid measure of a property if it is relevant or appropriate as a representation of that property.
Often a rate (a fraction, proportion, or percent) at which something occurs is a more valid measure than a simple count of occurrences.
A measurement of a property has predictive validity if it can be used to predict success on tasks that are related to the property measured.
Often it is difficult to determine if a measurement is valid, especially for behavioral properties.

Ex4 Measuring highway safety
Roads get better. Speed limits increase. Big SUVs replace cars. Enforcement campaigns reduce drunk driving. How has highway safety changed over time in this changing environment? The Fatal Accident Reporting System says there were 40,716 deaths in 1994 and 42,642 deaths 12 years later in 2006. The number of deaths has increased. But the number of licensed drivers rose from 175 million in 1994 to 201 million in 2006. The number of miles that people drove rose from 2,358 billion to 2,996 billion.
If more people drive more miles, there may be more deaths even if the roads are safer. The count of deaths is not a valid measure of highway safety. Rather than a count, we should use a rate. The number of deaths per mile driven takes into account the fact that more people drive more miles than in the past. This death rate is the ratio of “motor vehicle deaths” to “100s of millions of miles driven”, i.e., 42642/29960 = 1.4 for 2006, and 40716/23580 = 1.7 for 1994. That is a decrease, i.e., there were 18% fewer deaths per mile driven in 2006 than in 1994. Driving is getting safer . . .

Ex5/6 Achievement/IQ tests
When you take a statistics exam, you hope that it will ask you about the points of the course syllabus. If it does, the exam is a valid measure of how much you know about the course material . . . Experts can judge validity by comparing the test questions with the syllabus of material covered.
.......................................................................
Psychologists would like to measure aspects of the human personality that cannot be observed directly, such as “intelligence” or “authoritarian personality”. Some psychologists affirm that IQ tests measure intelligence, but others disagree. If we cannot agree on exactly what intelligence is, we cannot agree on how to measure it!
.......................................................................

Accuracy & Reliability
Errors in measurement
We can think about errors in measurement this way:
measured value = true value + bias + random error
A measurement process has bias if it systematically overstates or understates the true value of the property it measures.
A measurement process has random error if repeated measurements on the same individual give different results. If the random error is small, we say the measurement is reliable.
No measuring process is perfectly reliable.
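The rate arithmetic in Ex4 (highway safety) can be checked with a few lines of Python. This is only a sketch: the death counts and miles driven are the figures quoted in Ex4, while the variable and function names are made up for illustration.

```python
# Figures quoted in Ex4 (Fatal Accident Reporting System).
deaths = {1994: 40_716, 2006: 42_642}
miles_billions = {1994: 2_358, 2006: 2_996}


def deaths_per_100m_miles(year):
    """Deaths per 100 million miles driven in the given year."""
    hundred_millions = miles_billions[year] * 10  # 1 billion = 10 x 100 million
    return deaths[year] / hundred_millions


rate_1994 = deaths_per_100m_miles(1994)
rate_2006 = deaths_per_100m_miles(2006)
pct_change = 100 * (rate_2006 - rate_1994) / rate_1994

# The count went up, but the rate went down.
print(f"1994: {rate_1994:.1f}  2006: {rate_2006:.1f}  change: {pct_change:.0f}%")
# prints: 1994: 1.7  2006: 1.4  change: -18%
```

The count of deaths rose by about 5% over the period, while the rate per mile driven fell by about 18%, which is why the rate is the more valid measure of safety.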
The average of several repeated measurements of the same individual is more reliable (less variable) than a single measurement.
Read Ex7 (The SAT again), Statistical Controversies

Improving reliability, reducing bias
.......................................................................
What time is it? Much modern technology requires very exact measurement of time, such as the Global Positioning System (GPS), which uses satellite signals to tell you where you are. Time used to start with the earth’s path around the sun, which lasts one year, but the earth is much too erratic. Since 1967, time starts with the standard second, defined to be the time required for 9,192,631,770 vibrations of a cesium atom. Physical clocks are bothered by changes in temperature, humidity and air pressure. The cesium atom does not care.
.......................................................................
Navigation: Measuring latitude (an imaginary line around the Earth parallel to the equator) at night in relation to the stars is relatively simple, but measuring longitude (the angular distance between a point on any meridian and the prime meridian at Greenwich) was a ‘problem’ in the past . . .

Ex8 Smart brains?
In the mid-19th century, it was thought that measuring the volume of a human skull would measure the intelligence of the skull’s owner . . . A professor of surgery showed that filling a skull with small lead shot, then pouring out the shot and weighing it, gave a reliable measurement of the skull’s volume. These accurate measurements do not, however, give a valid measure of intelligence. Skull volume turned out to have no relation to intelligence or achievement.

Ex9 Really accurate time
NIST’s (National Institute of Standards and Technology) atomic clock is very accurate but not perfectly accurate.
The world standard is Coordinated Universal Time, compiled by the International Bureau of Weights and Measures (BIPM) in Sèvres, France. BIPM does not have a better clock than NIST. It calculates the time by averaging the results of more than 200 atomic clocks around the world. NIST tells us (after the fact) how much it misses the correct time by (about 10^-9 sec). In the long run, NIST’s measurements of time are not biased (sometimes shorter, sometimes longer than BIPM’s). The average (mean) of several measurements is more reliable than a single measurement.

The National Institute of Standards and Technology (NIST) keeps an even more accurate atomic clock and broadcasts the results (with some loss in transmission) by radio, telephone and internet.

Read Ex 10 (Measuring unemployment again)

Exercise Ch08
8.12 Testing job applicants. The law requires that tests given to job applicants must be shown to be directly job related. The Department of Labor believes that an employment test called the General Aptitude Test Battery (GATB) is valid for a broad range of jobs. As in the case of the SAT, blacks and Hispanics get lower average scores on the GATB than do whites. Describe briefly what must be done to establish that the GATB has predictive validity as a measure of future performance on the job.

Ex11 Authoritarian personality
In 1950, a group of psychologists developed the “F-scale” as an instrument to measure authoritarian personality. The F-scale asks how strongly you agree or disagree with statements such as the following:
Obedience and respect for authority are the most important virtues children should learn.
Science has its place, but there are many important things that can never be understood by the human mind.
Strong agreement with such statements marks you as authoritarian.
The F-scale and the idea of the authoritarian personality continue to be prominent in psychology, e.g., in studies of prejudice and right-wing extremist movements.
Read Case Study Evaluated

Exercise (answer) Ch08
**Answers
8.12 Testing job applicants. It must be shown that scores on the GATB predict future job performance. First, give the GATB to a large number of job applicants for a broad range of jobs. Then, after some time, rate each applicant’s actual performance. These ratings should be objective when possible; if workers are rated by supervisors, the rating should be blind in the sense that the rater does not know the GATB score. Arranging reliable and unbiased rating of job performance may be the hardest part of the task. Finally, examine the relationship between GATB scores and later job ratings.
See Constance Holden, “Academy joins the fray over job testing,” Science, vol. 244 (1989), pp. 1036-1037, for a discussion of the GATB that contains some nice statistical points. The National Academy of Sciences, reviewing the evidence, said that:
(1) The GATB is valid, and is in fact the best single predictor of future job performance, beating interviews, educational background, and past work experience. But the correlation with future ratings is modest, in the neighborhood of 0.3.
(2) The GATB predicts just as well for minorities as for whites.
(3) Nonetheless, the GATB scores of lower-scoring minority groups should be adjusted upward to avoid adverse impact so severe that civil rights law would rule out use of the test.

Multiple choice Ch08
A psychologist says that scores on a test for “authoritarian personality” can’t be trusted because the test counts religious belief as authoritarian. The psychologist is attacking the test’s
(a) validity.
(b) reliability.
(c) margin of error.
(d) confidence level.
Answer: (a)
.......................................................................
In one of the first attempts to discover the speed of light, Simon Newcomb in 1882 made 66 measurements of the time light takes to travel between the Washington Monument and his laboratory on the Potomac River. Why did Newcomb repeat his measurement 66 times and then take the average of the 66 as his final result?
(a) Averaging several measurements reduces any bias that is present in his instruments.
(b) The average of several measurements is more reliable (less variable) than a single measurement.
(c) Even if a measuring process is not valid, averaging several measurements made by this process will be valid.
(d) Both (a) and (c) but not (b).
Answer: (b)
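The reasoning behind answer (b) can be illustrated with a small simulation of the model “measured value = true value + bias + random error”. This is a hedged sketch, not Newcomb’s data: the true value, bias and error size below are hypothetical numbers in arbitrary units, the random error is assumed to be normally distributed, and the function names are made up for illustration.

```python
import random
import statistics


def measure(true_value, bias, noise_sd, rng):
    """One measurement: true value + bias + random error."""
    return true_value + bias + rng.gauss(0.0, noise_sd)


def mean_of_n(n, true_value, bias, noise_sd, rng):
    """Average of n repeated measurements of the same quantity."""
    values = [measure(true_value, bias, noise_sd, rng) for _ in range(n)]
    return statistics.fmean(values)


rng = random.Random(1882)          # fixed seed so the run is repeatable
TRUE, BIAS, SD = 100.0, 2.0, 5.0   # hypothetical values, arbitrary units

# Compare many single measurements with many averages of 66 measurements.
singles = [measure(TRUE, BIAS, SD, rng) for _ in range(2000)]
averages = [mean_of_n(66, TRUE, BIAS, SD, rng) for _ in range(2000)]

spread_single = statistics.stdev(singles)
spread_avg = statistics.stdev(averages)
center_avg = statistics.fmean(averages)

print(f"spread of single measurements:     {spread_single:.2f}")
print(f"spread of 66-measurement averages: {spread_avg:.2f}")
print(f"center of the averages: {center_avg:.2f} (true value {TRUE})")
```

Under these assumptions the averages vary far less than single measurements (more reliable), but they stay centered near the true value plus the bias: averaging reduces random error, not bias, which is why options (a) and (c) are wrong.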