PowerPoint - Wayne State University

MAT 1000
Mathematics in Today's World
Last Time
1. What is statistics? Numbers plus
context (data).
2. The structure of data: individuals and
variables
3. Two methods to collect data:
observational studies and experiments
Last Time
Individuals: the people or objects being studied
Variables: the individuals’ characteristics or attributes
being studied. Variables can be numeric or nonnumeric.
Observational study: the researchers performing the
study merely observe the individuals.
Experiment: the researchers attempt to modify,
influence, or affect the individuals they are studying.
Last Time
Example
Every month the government calculates the
unemployment rate.
Individuals: American adults.
Variable: current job status.
This must be done with an observational study. The
researchers are not trying to change the job status of
any of the individuals in the study.
Today
1. Two types of observational study:
census and sample survey.
2. Three methods for choosing a sample—
two bad methods and one good one.
Population
The data we collect are attributes of some type of individual
(people or objects).
The collection of all of the individuals is called the population.
Example 1: if the individuals are Wayne State students, the
population is the collection of all Wayne State students.
Example 2: if the individuals are American cities, then the
population is the collection of all American cities.
Two types of observational study
Important design issue for observational
studies:
Which individuals in the population to
observe?
All, or only part?
Two types of observational study
The government could try to determine the unemployment
rate by observing all of the individuals in the population, that is
by asking every working age adult whether they are employed.
This would be incredibly expensive and time-consuming.
More than that, we can actually get a reasonably accurate
answer by only asking a small fraction of all the working age
adults.
Two types of observational study
The two types of observational study are
1. Census: the researchers try to observe all of
the individuals in the population.
2. Sample survey: the researchers only observe
certain individuals in the population.
In a sample survey, individuals selected for
observation are called the sample.
Two types of observational study
[Diagram: the individuals in a population, with panels labeled Sample and Census]
Choosing a sample
When you choose a sample, the method you
use is important.
We would like a sample that represents the
whole population, but we can never be sure that it does.
However, some sampling methods tend to
produce samples which are different from the
whole population in some important way.
Choosing a sample
Three commonly used methods for choosing a
sample are
1. Convenience samples
2. Voluntary response samples
3. Simple random samples
The first two methods are bad sampling
methods.
Bad sampling methods
Suppose a teacher wants to know whether his students are
understanding a lecture.
He could stop his lecture to ask the class questions, and wait
for volunteers to answer.
What’s the problem?
The students who volunteer will mostly be the ones who
understand the material.
The sample of students the teacher is interacting with may not
represent the class as a whole.
Bad sampling methods
A voluntary response sample consists of those individuals who
volunteer to be in the sample.
Voluntary response samples usually fail to represent the
population as a whole.
Opinion polls often allow anyone to participate. But the people
who participate are the ones who tend to feel strongly about
an issue.
Bad sampling methods
Example
An advice columnist (Ann Landers) wanted to know how many
parents regretted having children. So she asked her readers.
She received over 10,000 responses, and 70% said they did
regret having children.
Does this sound plausible?
No. This was a voluntary response sample survey. The sample
was almost surely not representative of the population of all
parents.
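A small calculation can show how voluntary response might produce a figure like 70%. The sketch below is in Python, and every number in it is an invented assumption for illustration, not data from the actual poll: it supposes that 10% of all parents regret having children, that 21% of those parents write in, and that only 1% of the other parents write in.

```python
def respondent_share(pop_share, rate_if_yes, rate_if_no):
    """Share of 'yes' answers among the people who choose to respond.

    pop_share   -- fraction of the whole population who would say yes
    rate_if_yes -- fraction of the 'yes' people who bother to respond
    rate_if_no  -- fraction of the 'no' people who bother to respond
    """
    yes_responders = pop_share * rate_if_yes
    no_responders = (1 - pop_share) * rate_if_no
    return yes_responders / (yes_responders + no_responders)

# Hypothetical numbers, chosen only to illustrate the effect.
share = respondent_share(pop_share=0.10, rate_if_yes=0.21, rate_if_no=0.01)
print(f"Share of respondents saying yes: {share:.0%}")
```

Under these made-up response rates, a view held by 10% of the population shows up in 70% of the responses. If both groups responded at the same rate, the share among respondents would match the share in the population.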
Bad sampling methods
A convenience sample consists of the individuals who are easiest
to observe.
An employee of a grocery store inspects a large shipment of
oranges. If there are too many damaged fruits, the grocery
store will return the shipment.
The employee might only look at the top crates, and only select
the oranges lying at the top of those crates.
This is a convenience sample. It will almost surely not represent
the population (the whole shipment of oranges).
If there are any damaged or unacceptable oranges, they are
probably going to be at the bottom of a crate.
Bad sampling methods
Both voluntary response and convenience sampling have a
similar flaw: they typically lead to unrepresentative samples.
A method of choosing a sample is biased if it systematically
favors certain outcomes.
To understand the word “systematic” in this definition, we
need a thought experiment.
You should imagine taking a sampling method and repeating it
several times.
If the sample we collect will usually fail to represent the
population in the same way, we say the sampling method is
“biased.”
Bad sampling methods
Suppose the teacher leaves the room, and, one after another,
several different teachers come in, and ask for volunteers to
answer questions.
Each teacher may talk to a different sample of students, but
the volunteers will usually be the students who best
understand the material.
They are using a biased sampling method.
All of these teachers will end up overestimating the level of
understanding of their students (samples misrepresent the
population in the same way).
Bad sampling methods
Back at the grocery store, ten employees take turns inspecting
the same shipment of oranges.
Each one uses convenience sampling—they inspect the oranges
that are easiest for them to find (from the top of the crates).
Maybe each person inspects a different sample of oranges.
But every one of these inspectors will probably overestimate
the quality of the shipment.
The reason is that convenience sampling is biased.
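The thought experiment with ten inspectors can be sketched as a short simulation. The shipment below is entirely hypothetical (20 crates, each with a top and a bottom layer of 10 oranges, and damage concentrated in the bottom layers), and the function names are made up for this illustration.

```python
import random

def make_shipment(n_crates=20, per_layer=10):
    """Hypothetical shipment: a list of (layer, damaged) pairs, one per orange."""
    shipment = []
    for crate in range(n_crates):
        for i in range(per_layer):
            # Top layer: one damaged orange in every tenth crate.
            shipment.append(("top", crate % 10 == 0 and i == 0))
            # Bottom layer: three damaged oranges in every crate.
            shipment.append(("bottom", i < 3))
    return shipment

def damaged_fraction(oranges):
    """Fraction of the given oranges that are damaged."""
    return sum(damaged for _, damaged in oranges) / len(oranges)

def convenience_sample(shipment, n, rng):
    """Pick n oranges, but only from the easy-to-reach top layers."""
    top = [orange for orange in shipment if orange[0] == "top"]
    return rng.sample(top, n)

shipment = make_shipment()
true_rate = damaged_fraction(shipment)

rng = random.Random(1)  # fixed seed so the run is repeatable
estimates = [damaged_fraction(convenience_sample(shipment, 25, rng))
             for _ in range(10)]  # ten inspectors, ten different samples

print(f"True damaged fraction: {true_rate:.3f}")
print("Inspectors' estimates:", [f"{e:.2f}" for e in estimates])
```

Each inspector may draw a different sample, yet in this made-up shipment every estimate falls below the true damaged fraction, because the top layers hold only two damaged oranges in total. The samples miss in the same direction every time, which is what "systematically" means in the definition of bias.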
Bad sampling methods
Notice that “bias” is a property of a
sampling method.
So Ann Landers’ opinion poll is a sample
survey that uses a biased sampling
method (voluntary response).
Random sampling
Is there a method of choosing a sample that will always pick a
good representation of the population?
No!
We don’t know anything about the population as a whole. So we
can never know for sure that a sample really represents the
population as a whole.
Nevertheless it is possible to choose a sample and be fairly
confident that it represents the population.
We rely on randomness.
Random sampling
If we choose individuals at random, our sample is more likely (though not
guaranteed) to represent the population.
If instead of asking for volunteers, an instructor calls on students
at random, probably the students called on will give a good
representation of the class as a whole.
Of course, we might randomly choose only the students who
understand the material.
In other words, it is possible to use voluntary response sampling
or random sampling and end up picking the exact same people.
Random sampling
So what’s the advantage of a random sample?
As opposed to voluntary response, we now have a chance of
picking students who may not understand the material.
And we will see later in the course that the odds of picking an
unrepresentative sample at random are quite low.
For now, we will just look at a practical method for generating a
random sample.
Simple random sampling
The method we will discuss is called simple random sampling
(SRS).
Suppose we want to choose a sample of size n (here n is just
some natural number).
In an SRS, any group of n individuals in the population has an
equal chance of being selected as the sample.
Simple random sampling
To understand what this means, think of the example of a grocer
inspecting a shipment of oranges. Suppose he needs to pick 25
from a large shipment.
If he uses convenience sampling, say by picking only the oranges
that are at the top of the crates, he could never pick a group of 25
that includes some oranges from the bottom of the crates.
If he uses SRS, any of these groups has an equal chance of being
picked. So he could randomly pick 25 that are all at the bottom
of a crate, or 25 that are all at the top. But the most likely thing
is that he will pick a mixture.
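The claim that a mixture is the most likely result can be checked with a quick simulation. This is only a sketch: the shipment of 200 top and 200 bottom oranges is a made-up assumption, and Python's `random.sample` stands in for the grocer's random selection.

```python
import random

# Hypothetical shipment: 200 oranges near the top, 200 near the bottom.
shipment = ["top"] * 200 + ["bottom"] * 200

def srs(population, n, rng):
    """Simple random sample: every group of n items is equally likely."""
    return rng.sample(population, n)

rng = random.Random(2024)  # fixed seed so the run is repeatable

# Draw 1000 simple random samples of 25 and count the bottom oranges in each.
bottom_counts = [srs(shipment, 25, rng).count("bottom") for _ in range(1000)]

# A sample is "mixed" if it has at least one top and one bottom orange.
mixed = sum(0 < count < 25 for count in bottom_counts)
print(mixed, "of 1000 simple random samples mix top and bottom oranges")
```

An all-top or all-bottom sample is possible under SRS, but it is so unlikely that it will rarely if ever turn up in a run like this. Convenience sampling from the top produces an all-top sample every time.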
Simple random sampling
How do we pick a simple random sample?
Example
John’s small accounting firm serves 30 business clients. John
wants to interview a sample of 5 clients to find ways to improve
client satisfaction. To avoid bias, he chooses an SRS of size 5.
A-1 Plumbing
Accent publishing
Action Sport Shop
Anderson Construction
Bailey Trucking
Balloons, Inc.
Bennett Hardware
Best's Camera Shop
Blue print specialties
Central Tree Service
Classic Flowers
Computer Answers
Darlene's Dolls
Fleisch Realty
Hernandez Electronics
JL Records
Johnson Commodities
Keiser Construction
Liu's Chinese Restaurant
MagicTan
Peerless Machine
Photo Arts
River City Books
Riverside Tavern
Rustic Boutique
Satellite Services
Scotch Wash
Sewer's Center
Tire Specialties
Von's Video Store
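To connect this example with the definition of an SRS, it may help to count how many different groups of 5 clients could be chosen from 30. The short Python sketch below uses the standard library function `math.comb`; the variable names are only for illustration.

```python
from math import comb

population_size = 30  # John's clients
sample_size = 5       # clients to interview

# Number of different groups of 5 that can be formed from 30 clients.
n_groups = comb(population_size, sample_size)
print(n_groups)  # 142506

# In an SRS each of these groups has the same chance of being the sample.
print(1 / n_groups)
```

So an SRS of size 5 gives each of the 142,506 possible groups of clients the same chance of being selected.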
Simple random sampling
Step 1: Label. Give each client a numerical label, using as few
digits as possible. Here there are 30 clients, so one-digit
numbers are not enough. Two-digit numbers will work:
01, 02, 03, …, 28, 29, 30
01 A-1 Plumbing
02 Accent publishing
03 Action Sport Shop
04 Anderson Construction
05 Bailey Trucking
06 Balloons, Inc.
07 Bennett Hardware
08 Best's Camera Shop
09 Blue print specialties
10 Central Tree Service
11 Classic Flowers
12 Computer Answers
13 Darlene's Dolls
14 Fleisch Realty
15 Hernandez Electronics
16 JL Records
17 Johnson Commodities
18 Keiser Construction
19 Liu's Chinese Restaurant
20 MagicTan
21 Peerless Machine
22 Photo Arts
23 River City Books
24 Riverside Tavern
25 Rustic Boutique
26 Satellite Services
27 Scotch Wash
28 Sewer's Center
29 Tire Specialties
30 Von's Video Store
Simple random sampling
Step 2: Generate random numbers. In practice we use a
computer to do this. For the classroom, we can use a “table of
random digits.”
We will pick our sample using the following random digits:
69051 64817 87174 09517 84534 06489 87201 97245
Our labels are two-digit numbers, so we read two digits at a time (ignoring the gaps
in the list of digits):
69 05 16 48 17 87 17 40 95 17
We ignore every two-digit group that is not one of our labels (00 and anything greater than 30). This leaves
05 16 17 17 17
The clients labeled 05, 16, and 17 go into the sample (we use 17 only once).
But we need two more! Here are the next few two-digit groups:
84 53 40 64 89 87 20 19 72 45
Disregarding the numbers greater than 30, we are left with
20 19
So clients 20 and 19 go into the sample as well.
Hence, the sample consists of the clients labeled 05, 16, 17, 19, and 20:
Bailey Trucking
JL Records
Johnson Commodities
Liu’s Chinese Restaurant
MagicTan
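The table-reading procedure above can also be written out as a short program. This is a sketch in Python, and the function name `sample_from_digit_table` is invented for the illustration. It reads the digits in groups as wide as the largest label, skips groups that are not labels, skips repeats, and stops once the sample is full.

```python
def sample_from_digit_table(digits, population_size, sample_size):
    """Choose labels by reading a line of random digits from left to right."""
    # Ignore the gaps between the blocks of digits.
    stream = "".join(ch for ch in digits if ch.isdigit())
    # Labels for 30 individuals need two digits: 01, 02, ..., 30.
    width = len(str(population_size))
    chosen = []
    for start in range(0, len(stream) - width + 1, width):
        label = int(stream[start:start + width])
        # Skip groups that are not labels, and labels already chosen.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen

table_line = "69051 64817 87174 09517 84534 06489 87201 97245"
labels = sample_from_digit_table(table_line, population_size=30, sample_size=5)
print(labels)          # [5, 16, 17, 20, 19]
print(sorted(labels))  # [5, 16, 17, 19, 20]
```

The labels come out in the order they were found in the table, which matches the hand calculation. In practice a computer generates the random numbers as well: in Python, for instance, `random.sample(range(1, 31), 5)` draws five distinct labels from 1 to 30 in one step.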