MAT 1000 Mathematics in Today's World

Last Time
1. What is statistics? Numbers plus context (data).
2. The structure of data: individuals and variables.
3. Two methods to collect data: observational studies and experiments.

Individuals: the people or objects being studied.
Variables: the characteristics or attributes of the individuals that are being studied. Variables can be numeric or nonnumeric.
Observational study: the researchers performing the study merely observe the individuals.
Experiment: the researchers attempt to modify, influence, or affect the individuals they are studying.

Example: Every month the government calculates the unemployment rate. Individuals: American adults. Variable: current job status. This must be done with an observational study, because the researchers are not trying to change the job status of any of the individuals in the study.

Today
1. Two types of observational study: census and sample survey.
2. Three methods for choosing a sample: two bad methods and one good one.

Population
The data we collect are attributes of some type of individual (people or objects). The collection of all of these individuals is called the population.
Example 1: if the individuals are Wayne State students, the population is the collection of all Wayne State students.
Example 2: if the individuals are American cities, the population is the collection of all American cities.

Two types of observational study
An important design issue for observational studies: which individuals in the population should we observe? All of them, or only some?

The government could try to determine the unemployment rate by observing all of the individuals in the population, that is, by asking every working-age adult whether they are employed. This would be incredibly expensive and time-consuming. Moreover, we can get a reasonably accurate answer by asking only a small fraction of all working-age adults.
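A small simulation can make this last point concrete. The sketch below builds a hypothetical population of 100,000 working-age adults in which each adult is unemployed with probability 4% (both numbers are invented for illustration, not real data), then compares a census, which checks everyone, with a sample survey that checks only 1,000 adults chosen at random with Python's standard `random.sample`.

```python
import random

# Hypothetical numbers, chosen only for illustration.
POPULATION_SIZE = 100_000   # working-age adults
ASSUMED_RATE = 0.04         # chance that a given adult is unemployed
SAMPLE_SIZE = 1_000

rng = random.Random(42)  # fixed seed so repeated runs give the same output

# Variable "current job status" for each individual: 1 = unemployed, 0 = employed.
population = [1 if rng.random() < ASSUMED_RATE else 0 for _ in range(POPULATION_SIZE)]

# Census: observe every individual in the population.
census_rate = sum(population) / len(population)

# Sample survey: observe only SAMPLE_SIZE individuals chosen at random.
sample = rng.sample(population, SAMPLE_SIZE)
sample_rate = sum(sample) / len(sample)

print(f"census unemployment rate: {census_rate:.2%}")
print(f"sample unemployment rate: {sample_rate:.2%}")
```

Under these assumptions the sample estimate usually lands within about a percentage point of the census value, even though only 1% of the population was observed.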
Two types of observational study
The two types of observational study are
1. Census: the researchers try to observe all of the individuals in the population.
2. Sample survey: the researchers observe only certain individuals in the population.
In a sample survey, the individuals selected for observation are called the sample.

(Figures: the individuals making up a population; a sample; a census.)

Choosing a sample
When you choose a sample, the method you use is important. We would like a sample that represents the whole population, but we can never be sure that it does. Worse, some sampling methods tend to produce samples that differ from the whole population in some important way.

Three commonly used methods for choosing a sample are
1. Convenience samples
2. Voluntary response samples
3. Simple random samples
The first two are bad sampling methods.

Bad sampling methods
Suppose a teacher wants to know whether his students understand a lecture. He could stop the lecture to ask the class questions and wait for volunteers to answer. What's the problem? Usually only the students who understand will volunteer. The sample of students the teacher interacts with may not represent the class as a whole.

A voluntary response sample consists of the individuals who volunteer to be in the sample. Voluntary response samples usually fail to represent the population as a whole. Opinion polls often allow anyone to participate, but the people who participate tend to be the ones who feel strongly about the issue.

Example: An advice columnist (Ann Landers) wanted to know how many parents regretted having children, so she asked her readers. She received over 10,000 responses, and 70% said they regretted having children. Does this sound plausible? No. This was a voluntary response sample survey.
The sample was almost surely not representative of the population of all parents.

A convenience sample consists of the individuals who are easiest to observe. Suppose an employee of a grocery store inspects a large shipment of oranges; if there are too many damaged fruits, the store will return the shipment. The employee might look only at the top crates, and select only the oranges lying at the top of those crates. This is a convenience sample. It will almost surely not represent the population (the whole shipment of oranges): any damaged or unacceptable oranges are probably at the bottom of a crate.

Both voluntary response and convenience sampling share a flaw: they typically lead to unrepresentative samples. A method of choosing a sample is biased if it systematically favors certain outcomes. To understand the word "systematically" in this definition, we need a thought experiment. Imagine repeating a sampling method many times. If the samples we collect usually fail to represent the population in the same way, we say the sampling method is biased.

Suppose the teacher leaves the room and, one after another, several different teachers come in and ask for volunteers to answer questions. Each teacher may talk to a different sample of students, but the volunteers will usually be the students who best understand the material. The teachers are using a biased sampling method: all of them will overestimate how well their students understand the material (their samples misrepresent the population in the same way).

Back at the grocery store, ten employees take turns inspecting the same shipment of oranges. Each one uses convenience sampling, inspecting the oranges that are easiest to find (those at the top of the crates). Each person may inspect a different sample of oranges.
But every one of these inspectors will probably overestimate the quality of the shipment, because convenience sampling is biased.

Notice that bias is a property of a sampling method, not of a single sample. So Ann Landers' opinion poll is a sample survey that uses a biased sampling method (voluntary response).

Random sampling
Is there a method of choosing a sample that always picks a good representation of the population? No! We do not observe the population as a whole, so we can never know for sure that a sample really represents it. Nevertheless, it is possible to choose a sample and be fairly confident that it represents the population. We rely on randomness.

By choosing individuals at random, we make our sample more likely (though not guaranteed) to represent the population. If, instead of asking for volunteers, an instructor calls on students at random, the students called on will probably give a good representation of the class as a whole. Of course, we might randomly choose only the students who understand the material. In other words, voluntary response sampling and random sampling could end up picking exactly the same people.

So what is the advantage of a random sample? Unlike voluntary response, random sampling gives us a chance of picking students who do not understand the material. And we will see later in the course that the odds of picking an unrepresentative sample at random are quite low. For now, we will just look at a practical method for generating a random sample.

Simple random sampling
The method we will discuss is called simple random sampling (SRS). Suppose we want to choose a sample of size n (here n is some natural number). In an SRS, every group of n individuals in the population has an equal chance of being selected as the sample.

To understand what this means, think of the example of a grocer inspecting a shipment of oranges.
Suppose he needs to pick 25 oranges from a large shipment. If he uses convenience sampling, say by picking only the oranges at the top of the crates, he can never pick a group of 25 that includes oranges from the bottom of the crates. If he uses an SRS, every group of 25 oranges has an equal chance of being picked. So he could randomly pick 25 that are all at the bottom of a crate, or 25 that are all at the top. But most likely he will pick a mixture.

How do we pick a simple random sample?
Example: John's small accounting firm serves 30 business clients. John wants to interview a sample of 5 clients to find ways to improve client satisfaction. To avoid bias, he chooses an SRS of size 5.

Step 1: Label. Give each client a numerical label, using as few digits as possible. There are 30 clients, so one-digit labels are not enough. Two-digit labels will work: 01, 02, 03, …, 28, 29, 30.

01 A-1 Plumbing              16 JL Records
02 Accent Publishing         17 Johnson Commodities
03 Action Sport Shop         18 Keiser Construction
04 Anderson Construction     19 Liu's Chinese Restaurant
05 Bailey Trucking           20 MagicTan
06 Balloons, Inc.            21 Peerless Machine
07 Bennett Hardware          22 Photo Arts
08 Best's Camera Shop        23 River City Books
09 Blue Print Specialties    24 Riverside Tavern
10 Central Tree Service      25 Rustic Boutique
11 Classic Flowers           26 Satellite Services
12 Computer Answers          27 Scotch Wash
13 Darlene's Dolls           28 Sewer's Center
14 Fleisch Realty            29 Tire Specialties
15 Hernandez Electronics     30 Von's Video Store

Step 2: Generate random numbers. In practice we use a computer to do this. In the classroom, we can use a table of random digits.

We will pick our sample using the following random digits:
69051 64817 87174 09517 84534 06489 87201 97245
Our labels are two-digit numbers, so we read the digits two at a time (ignoring the gaps between the groups of five):
69 05 16 48 17 87 17 40 95 17
We ignore every two-digit group that is not a label, that is, 00 or anything greater than 30. This leaves
05 16 17 17 17
The clients labeled 05, 16, and 17 go into the sample (17 is used only once). But we need two more! Here are the next few two-digit groups:
84 53 40 64 89 87 20 19 72 45
Disregarding the groups greater than 30, we are left with
20 19
So clients 20 and 19 go into the sample as well. Hence the sample consists of the clients labeled 05, 16, 17, 19, and 20: Bailey Trucking, JL Records, Johnson Commodities, Liu's Chinese Restaurant, and MagicTan.
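Reading a table of random digits is a mechanical procedure, so it can be written as a short program. The Python sketch below uses a hypothetical helper, `srs_from_digit_table` (our own name, not a library function): it removes the gaps, reads the digits in groups as wide as the largest label, skips groups that match no client as well as repeats, and stops once it has the requested number of labels. The client names and the digits are copied from the example above.

```python
# Client list from the example, in the order of their labels 01-30.
CLIENTS = [
    "A-1 Plumbing", "Accent Publishing", "Action Sport Shop", "Anderson Construction",
    "Bailey Trucking", "Balloons, Inc.", "Bennett Hardware", "Best's Camera Shop",
    "Blue Print Specialties", "Central Tree Service", "Classic Flowers", "Computer Answers",
    "Darlene's Dolls", "Fleisch Realty", "Hernandez Electronics", "JL Records",
    "Johnson Commodities", "Keiser Construction", "Liu's Chinese Restaurant", "MagicTan",
    "Peerless Machine", "Photo Arts", "River City Books", "Riverside Tavern",
    "Rustic Boutique", "Satellite Services", "Scotch Wash", "Sewer's Center",
    "Tire Specialties", "Von's Video Store",
]

DIGITS = "69051 64817 87174 09517 84534 06489 87201 97245"


def srs_from_digit_table(digits: str, population_size: int, sample_size: int) -> list[int]:
    """Hypothetical helper: choose labels 1..population_size from a digit table.

    Returns fewer than sample_size labels if the table runs out of digits.
    """
    width = len(str(population_size))  # labels 01..30 need 2 digits
    clean = "".join(ch for ch in digits if ch.isdigit())  # ignore the gaps
    chosen: list[int] = []
    for start in range(0, len(clean) - width + 1, width):
        label = int(clean[start:start + width])
        # Skip groups that are not labels (00 or > population_size) and repeats.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen


labels = srs_from_digit_table(DIGITS, len(CLIENTS), 5)
for label in sorted(labels):
    print(f"{label:02d} {CLIENTS[label - 1]}")
```

With these digits the helper picks labels 05, 16, 17, 20, and 19, matching the hand calculation. On a computer one would more likely skip the digit table and call `random.sample(CLIENTS, 5)` from Python's standard `random` module, which draws without replacement so that every group of 5 clients is equally likely.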
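The thought experiment behind the definition of bias (repeat a sampling method many times and check whether its samples miss in the same direction) can also be simulated. The sketch below assumes a hypothetical shipment of 20 crates of 50 oranges in which the damaged oranges are exactly the bottom 5 of every crate; these numbers are invented for illustration. It repeats a convenience sample (25 oranges drawn only from the top 10 layers of the crates) and an SRS of 25 oranges 1,000 times each, and averages the damaged fraction each method reports.

```python
import random
from statistics import mean

# Hypothetical shipment (assumption for illustration): 20 crates of 50 oranges.
# Each orange is (crate, depth), with depth 0 at the top of its crate.
CRATES, PER_CRATE, DAMAGED_LAYERS = 20, 50, 5
shipment = [(crate, depth) for crate in range(CRATES) for depth in range(PER_CRATE)]

# Oranges near the top of each crate are the "easy to find" ones.
EASY_TO_FIND = [orange for orange in shipment if orange[1] < 10]


def is_damaged(orange: tuple[int, int]) -> bool:
    # Assumption: only the bottom DAMAGED_LAYERS oranges in each crate are damaged.
    return orange[1] >= PER_CRATE - DAMAGED_LAYERS


def damaged_fraction(oranges: list[tuple[int, int]]) -> float:
    return sum(is_damaged(o) for o in oranges) / len(oranges)


rng = random.Random(1)  # fixed seed for reproducible runs
true_fraction = damaged_fraction(shipment)

# Convenience sampling: each inspector draws 25 oranges from the top layers only.
conv = [damaged_fraction(rng.sample(EASY_TO_FIND, 25)) for _ in range(1000)]
# Simple random sampling: every group of 25 oranges is equally likely.
srs = [damaged_fraction(rng.sample(shipment, 25)) for _ in range(1000)]

print(f"true damaged fraction:          {true_fraction:.3f}")
print(f"average, convenience samples:   {mean(conv):.3f}")
print(f"average, simple random samples: {mean(srs):.3f}")
```

Under these assumptions every convenience sample reports no damaged oranges, so every inspector overestimates the quality of the shipment in the same way. The SRS estimates vary from sample to sample, but they average close to the true 10%.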