Toward Statistical Inference

Stat 226 – Introduction to Business Statistics I
Spring 2009
Professor: Dr. Petrutza Caragea
Section A
Tuesdays and Thursdays 9:30-10:50 a.m.
Chapter 3, Section 3.3

Question: What is the average height of all Stat 226 students?
We have several options to answer this question:
“wild guess”
collect everybody’s height and compute exact average
take a representative sample and compute sample mean
out of the three options it becomes obvious that the third:
taking a representative sample and computing the sample mean
appears to be the most reasonable one. However, this option raises a new
and even more important question, namely:
How reliable is our estimate based on the sample?
Answer: depends on
the choice of the sample, i.e. in which way the sample was obtained
the sample size
(the larger the sample ⇒ the more information we have at hand ⇒
the more accurate and precise our estimate should be)

Let’s recall the “big picture”
µ is the overall mean of the population
µ is a fixed value but unknown
µ is referred to as a population parameter
x̄ is the mean of the sample taken from the population
x̄ varies from sample to sample (random, but we will know its value
once we have collected the sample)
x̄ is referred to as a sample statistic

how do we obtain a representative sample?
we distinguish two types of studies in statistics:
observational studies versus experiments
observational study: observe individuals w.r.t. a variable of interest
in a 1981 study researchers compared scholastic performance of music
students with that of non-music students at a California High School
music students had a much higher overall GPA than non-music students
a whopping 16% of music students had all A’s compared with only 5%
of the non-music students
as a result of the study music programs were expanded nationwide
Students were simply observed, recording the choices (music
education, no music education) they made and the overall outcome
Researchers tried to show an association between music education
and grades. But the study was neither a survey, nor were students
assigned to get music education
What is wrong with concluding that music education causes good grades?

Observational study
In observational studies, treatments don’t get assigned to study
individuals; individuals are simply observed.

experiment: we actively impose a treatment on individuals and
observe the variable of interest
Is a new drug more effective in lowering blood cholesterol level
compared to standard drugs?
a group of patients gets randomly assigned to one of two treatment
groups: new drug and standard drug
receiving the standard drug is called the control treatment
patients do not know which drug they receive to eliminate bias
if neither doctors nor patients know who is receiving which treatment,
then this study is called a double-blind study
An experiment requires a random assignment of study subjects to treatments
experiments are the only way to show cause-and-effect relationships
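A minimal sketch of random assignment, assuming 20 hypothetical patient IDs and two equally sized groups (neither detail comes from the drug example itself). Shuffling the list and splitting it lets chance alone decide who receives which treatment.

```python
import random


def randomize(patients, seed=None):
    """Randomly split patients into two treatment groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(patients)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)       # chance, not the researcher, decides the order
    half = len(shuffled) // 2
    return {
        "new drug": shuffled[:half],
        "standard drug": shuffled[half:],  # control treatment
    }


# Hypothetical patient IDs, for illustration only.
patients = [f"patient_{i:02d}" for i in range(1, 21)]
groups = randomize(patients, seed=226)

for treatment, members in groups.items():
    print(treatment, len(members), sorted(members))
```

Fixing the seed only makes the illustration repeatable; in a real study the assignment would be generated once and kept hidden from patients (and from doctors in a double-blind design).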
There is much more to learn about designing an experiment, but that is
beyond the scope of this class.
Keep in mind though that
experiments can be designed well, but also really badly. Badly
designed experiments often reveal no information at all.
Most of the success in conducting a designed experiment results
directly from how well the pre-experimental planning was done.

how to obtain a random sample
Consider the following example: You want to find out how much debt an
Iowa State student has on average
How should you pick a representative sample?
take all Stat 226 students from our section
go to the dorms and take a random sample of 100 students
go to the library and take a random sample of 100 students
sample from the football team
convenience sampling: the selection of units from the population is
based on easy availability and/or accessibility
e.g. mall survey
trade-off made for ease of obtaining the sample is that samples are
typically not very representative of the population.
very often yield biased responses

voluntary response sample: consists of people who chose
themselves by responding to a general appeal
e.g. NBC, CNN polls
be aware: they often overrepresent people with strong opinions, most
often negative opinions
very often yield biased responses
a study of exercise called for volunteers to run on a treadmill ⇒ study
concluded that “Americans are in great shape”

The advice columnist Ann Landers once asked her readers,
“If you had to do it over again, would you have children?”
A few weeks later, her column was headlined
Indeed 70% of the nearly 10,000 parents who wrote in said they would not
have children if they could make the choice again. These data are
worthless as indicators of opinion among all American parents. The people
who responded felt strongly enough to take the trouble to write Ann
Landers. Their letters showed that many of them were angry at their
children. These people don’t fairly represent all parents. It is not surprising
that a statistically designed opinion poll on the same issue a few months
later found that 91% of parents would have children again. Ann Landers
announced a 70% “No” result when the truth about parents was close to
91% “Yes.”
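A short calculation can show how voluntary response alone might turn a 91% “Yes” population into a 70% “No” mailbag. This is only a sketch: it treats the designed poll's 91% as the truth, counts everyone else as “No”, and the two response rates are invented for illustration.

```python
def share_no_among_respondents(p_no, rate_no, rate_yes):
    """Fraction of 'No' answers among people who choose to respond.

    p_no     -- true fraction of the population answering 'No'
    rate_no  -- probability that a 'No' person bothers to write in
    rate_yes -- probability that a 'Yes' person bothers to write in
    """
    responders_no = p_no * rate_no
    responders_yes = (1 - p_no) * rate_yes
    return responders_no / (responders_no + responders_yes)


p_no = 0.09  # assumption: 91% "Yes" from the designed poll, the rest "No"

# If everyone were equally likely to respond, the mailbag would match the population.
fair = share_no_among_respondents(p_no, rate_no=0.05, rate_yes=0.05)

# Hypothetical rates: unhappy parents are far more likely to write in.
biased = share_no_among_respondents(p_no, rate_no=0.236, rate_yes=0.01)

print(f"equal response rates:   {fair:.0%} 'No'")
print(f"unequal response rates: {biased:.0%} 'No'")
```

Under these assumed rates, unhappy parents would need to be roughly 24 times as likely to write in as happy ones, which is plausible for people with strong negative opinions.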
so how do we choose a sample?
The best way to obtain a representative sample is to let chance choose the
sample from the population.
Simple Random Sample of size n
To obtain a so-called simple random sample of size n
create a list of all individuals of the population and choose n at
random, e.g. using the table of random digits (Table B)
In a simple random sample (SRS) each set of n individuals has an equal
chance of selection.
random selection ⇒ removes bias and subjectivity

using the table of random digits
example: using the map provided in class, choose 5 counties of Iowa
label all individuals assigning each a distinct number/label
labels have to be of the same number of digits, e.g.
pick a line in Table B to start, e.g. line 122
choose a sample of size n by selecting the first n labels that appear
if a label/number does not match any labels in the list or if a
label/number comes up more than once ⇒ skip it
if you cannot obtain a sample of size n in one line, continue in the next
line, e.g. with 123 if you started in 122
example: to draw a simple random sample of size n = 5 of Iowa counties,
start at line 122 and use the labels indicated on the map (provided in
class)
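The table-of-random-digits procedure can be sketched in code. The digit string below is made up and is NOT line 122 of Table B, and the two-digit labels 01 to 99 are an assumption about how the class map numbers Iowa's 99 counties. The function name `srs_from_digits` is likewise hypothetical.

```python
import random


def srs_from_digits(digits, population_size, n, width=2):
    """Select n distinct labels in 1..population_size by reading `digits`
    in consecutive groups of `width` digits, as with a random digit table."""
    digits = "".join(ch for ch in digits if ch.isdigit())  # drop the spacing
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        if label < 1 or label > population_size:
            continue  # label matches no individual: skip it
        if label in chosen:
            continue  # label came up before: skip it
        chosen.append(label)
        if len(chosen) == n:
            return chosen
    raise ValueError("ran out of digits; continue with the next line of the table")


# Made-up digits for illustration (not taken from Table B).
line = "00412 94137 04126"
counties = srs_from_digits(line, population_size=99, n=5)
print(counties)  # 00 is skipped (no such label), the second 41 is skipped (repeat)

# Software can replace the table: random.sample also gives every set of
# 5 labels the same chance of selection.
print(random.sample(range(1, 100), 5))
```

The skip rules are what keep the selection fair: every valid label stays equally likely, and no individual can enter the sample twice.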
Assuming that we obtained a representative sample of size n, how do we
know that the sample mean x̄ from this sample is indeed a “good”
estimate for µ?
Answer: Amazingly, averages of random samples behave in very regular
and predictable ways, so knowing how x̄-values behave in general lets us
deduce how close our x̄-value is likely to be to µ.
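A small simulation can make this regular behavior visible. It is a sketch only: the population of 10,000 heights (mean about 67 inches, standard deviation about 4) is invented for illustration and is not class data.

```python
import random
import statistics

rng = random.Random(226)

# Hypothetical population of heights in inches (an assumption, not real data).
population = [rng.gauss(67, 4) for _ in range(10_000)]
mu = statistics.mean(population)  # the population parameter


def sample_means(n, repeats=1000):
    """Draw `repeats` simple random samples of size n and return their means."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]


# For each sample size: (average of the sample means, spread of the sample means)
summary = {}
for n in (5, 25, 100):
    means = sample_means(n)
    summary[n] = (statistics.mean(means), statistics.stdev(means))

print(f"population mean: {mu:.2f}")
for n, (center, spread) in summary.items():
    print(f"n = {n:3d}: sample means center at {center:.2f}, spread {spread:.2f}")
```

The sample means cluster around µ for every sample size, and their spread shrinks as n grows, which is why a larger random sample gives a more precise estimate.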
more details on how x̄ behaves follow in Section 4.4

SRSs are not always feasible or appropriate
e.g. you may consider so-called stratified random samples: divide
the population into strata, groups of individuals that are similar in
some way that is important to the response. Then choose a separate
SRS from each stratum and combine these SRSs to form the full
sample (more on this on page 179 of the textbook)
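Stratified random sampling can be sketched as follows. The student list, the choice of class year as the stratifying variable, and the per-stratum sample sizes are all hypothetical and serve only to illustrate the idea.

```python
import random


def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Take a separate SRS from each stratum and combine them.

    population    -- list of individuals
    stratum_of    -- function mapping an individual to its stratum name
    n_per_stratum -- dict: stratum name -> number of individuals to draw
    """
    rng = random.Random(seed)
    strata = {}
    for individual in population:
        strata.setdefault(stratum_of(individual), []).append(individual)
    sample = []
    for name in sorted(strata):
        # SRS within the stratum: every subset of the given size is equally likely
        sample.extend(rng.sample(strata[name], n_per_stratum[name]))
    return sample


# Hypothetical student list: (id, class year); class year serves as the stratum.
years = ["freshman"] * 40 + ["sophomore"] * 30 + ["junior"] * 20 + ["senior"] * 10
students = [(f"student_{i:03d}", year) for i, year in enumerate(years)]

sizes = {"freshman": 4, "sophomore": 3, "junior": 2, "senior": 1}
sample = stratified_sample(students, lambda s: s[1], sizes, seed=226)
print(sample)
```

Here the sizes are proportional to the strata, so each class year is guaranteed its share of the sample, something a single SRS of 10 students would only achieve on average.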
