DobbinLectureChapter..

Chapter 3 & Section 2.6
Dates: August 24, 26, 31, 2009
Course outline
Statistics is the science of collecting, organizing, and interpreting
information, which we call data, with the goal of gaining an understanding
from that data.
This is NOT a math class. This is a critical thinking class. My goal is to
give you some statistics tools and principles that will help you make wise
and educated decisions at work and in life.
This course is divided into 2 parts:
1. Gathering and working with data (graphing, summarizing, designing
studies to gather data).
2. Establishing relationships and drawing conclusions from the data
(statistical inference).
Gathering of data:
How do you get data?
First, decide what your population is.
The Population is the entire group of individuals/units that we want
information about.
In a census you attempt to get information from every member of the
population, i.e., a 100% sample.
Populations are usually very large, and a 100% sample would be expensive
and time consuming, so we must take a sample instead. However, the
method of sampling is critical to getting good unbiased information.
A sample is a part of the population that we actually examine in order to
gather information about the whole population.
Lecture 1, Chapter 3 & Section 2.6
Page 1
SAMPLING, 3.2
Design of a sample: the method used to choose the sample from the
population. There are several types of samples. One that you encounter
often is a Voluntary Response Sample.
• consists of people who choose themselves by responding to a general
appeal.
• biased because people with strong opinions (especially negative
opinions) are most likely to respond.
• examples include radio call-in shows and American Idol voting.
A voluntary response sample is not the best type of sample because the data
obtained will probably not be representative of the entire population, and
will probably be biased.
Random Selection of a Sample:
• eliminates or minimizes bias by allowing impersonal chance to do the
choosing of individuals for the sample.
• gives all units/individuals in the population an equal chance to be
chosen.
Types of Random Sampling:
Simple Random Sample (SRS): consists of n individuals selected from the
population in such a way that every sample of n individuals has an equal
chance of being selected. This also gives each individual in the population
an equal chance of being selected.
Stratified Random Sample: is obtained in steps. First divide the
population into groups of similar units/individuals, called strata. Then a
SRS is selected within each stratum. Then the samples are combined to
form the full sample.
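The two steps of a stratified random sample (divide into strata, then take an SRS within each stratum) can be sketched in Python. This is only an illustration: the strata names, the unit labels, the seed, and the function name `stratified_sample` are made up and do not come from the textbook.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take an SRS of n_per_stratum units inside each stratum, then combine."""
    rng = random.Random(seed)
    full_sample = []
    for units in strata.values():
        # random.sample draws without replacement, so this is an SRS
        # within the stratum.
        full_sample.extend(rng.sample(units, n_per_stratum))
    return full_sample

# Hypothetical population split into two strata of similar units.
strata = {
    "urban": [f"urban-{i}" for i in range(1, 41)],
    "rural": [f"rural-{i}" for i in range(1, 21)],
}
full_sample = stratified_sample(strata, n_per_stratum=5, seed=1)
print(full_sample)
```

Every stratum is guaranteed to be represented, which an SRS of the whole population cannot promise.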
Multistage Sample: A method in which the sampling is done in stages,
selecting successively smaller groups within the population and
resulting in a sample consisting of clusters of individuals. Each stage may
employ an SRS, a stratified random sample, or another type of sample. This
is an effort to make sure you are not under-covering any groups when you
choose your sample. The example on page 252 of your textbook shows how
an opinion poll uses multistage sampling by first dividing the United States
into 2007 geographical areas, selecting a portion of these, then dividing the
selected geographical areas into smaller areas, selecting a portion of these
from each, and then finally dividing the smaller areas into neighborhoods of
four nearby units and randomly selecting the neighborhoods to cover with
the opinion poll.
Capture-recapture sample: This type of sampling is done to estimate the
size of the population in wildlife studies. The “capture” phase refers to
capturing, tagging, and releasing a certain number of animals (birds, for
example). The “recapture” phase takes place later, when another sample of
birds is caught and a count is taken of the number that have tags from the
first capture.
As an example, suppose 200 birds are captured, tagged, and released.
The next year another sample of 120 birds is captured, and 12 of them have
the bands from the previous year.
We use the proportion banded in the sample as an estimate of the
proportion banded in the population. So:
12/120 = 200/N
where N = the population size. Solving for N gives N = (200 × 120)/12:
N = 2000 (estimated value)
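The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `estimate_population` and its parameter names are made up for illustration.

```python
def estimate_population(tagged_first, second_sample_size, tagged_in_second):
    """Solve tagged_in_second / second_sample_size = tagged_first / N for N."""
    if tagged_in_second == 0:
        # With no tagged animals recaptured, the proportion is 0 and
        # the equation has no solution.
        raise ValueError("no tagged animals recaptured; estimate undefined")
    return tagged_first * second_sample_size / tagged_in_second

estimate = estimate_population(tagged_first=200,
                               second_sample_size=120,
                               tagged_in_second=12)
print(estimate)  # 2000.0
```

The result is only an estimate: a different recapture sample would usually contain a different number of banded birds and so give a different value of N.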
Examples:
Which type of sample is used for each of the following scenarios?
1. A study is conducted to find out how many undergraduates at Purdue
own cars. It is known prior to the study that seniors are more likely
to own cars than freshmen. The student population at Purdue is
divided into freshmen, sophomores, juniors, and seniors and a
random sample of 200 students is selected from each group.
2. The government wanted to gather some information on
unemployment. They randomly selected 5 of the 50 states. From the
5 selected states they randomly selected 3 counties to participate in
the study. They then randomly selected 10 individuals from each of
the counties to fill out their questionnaire.
3. Ann Landers asked people to send her a response to the following
question: “Do you have children? If so, would you still have
children knowing what you know now?”
4. Ashley wanted to determine the average height of Purdue women
students. She did not have the time to measure the heights of all Purdue
women students, so she randomly selected 50 Purdue women students,
measured each student’s height, and averaged the 50 heights.
How do you select the units in the sample? You can use SPSS or the
random number table in the back of the book (Table B). Which way is
MORE random? Both methods are equally random.
Example:
A club has 12 members. They are:
Gundlach
Remke
Howell
Brenneman
Xu
Reeger
Cline
Mehta
Tuzov
Daye
Zheng
Kuiper
• Use the random number table (Table B) starting at line 130 to take an SRS
of 4 members.
Table B starting at line 130:
69051 64817 87174 09517 84534 06489 87201 97245
05007 16632 81194 14873 04197 85576 45195 96565
68732 55259 84292 08796 43165 93739 31685 97150
45740 41807 65561 33302 07051 93623 18132 09547
Strategy: Label the names in order from 01 through 12. Then look at the
random digits from Table B and read them two digits at a time (draw a line
under every 2 digits). The first 4 distinct (not repeated) 2-digit groups
that are between 01 and 12 are your sample.
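The strategy can be traced mechanically. The sketch below applies it to the four lines of Table B printed above; the function names `two_digit_labels` and `srs_from_table` are made up for illustration.

```python
TABLE_B = [
    "69051 64817 87174 09517 84534 06489 87201 97245",  # line 130
    "05007 16632 81194 14873 04197 85576 45195 96565",  # line 131
    "68732 55259 84292 08796 43165 93739 31685 97150",  # line 132
    "45740 41807 65561 33302 07051 93623 18132 09547",  # line 133
]
members = ["Gundlach", "Remke", "Howell", "Brenneman", "Xu", "Reeger",
           "Cline", "Mehta", "Tuzov", "Daye", "Zheng", "Kuiper"]

def two_digit_labels(lines):
    """Yield the table digits two at a time, ignoring the spaces."""
    digits = "".join(lines).replace(" ", "")
    for i in range(0, len(digits) - 1, 2):
        yield int(digits[i:i + 2])

def srs_from_table(lines, population, n):
    """Keep the first n distinct labels between 01 and len(population)."""
    chosen = []
    for label in two_digit_labels(lines):
        if 1 <= label <= len(population) and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                break
    # Label 01 is the first name, 02 the second, and so on.
    return [population[label - 1] for label in chosen]

sample = srs_from_table(TABLE_B, members, 4)
print(sample)  # ['Xu', 'Brenneman', 'Cline', 'Remke']
```

The labels selected are 05, 04, 07, and 02. Labels such as 00 and 16 are skipped because no member has them, and the second 05 and 04 are skipped as repeats. If the supplied lines ran out before n labels were found, this sketch would simply return fewer names.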

• Use SPSS to take an SRS of 4 members.
Enter all the names in one column. Click on the column and then click
Data → Select Cases → Random sample of cases → Sample → Exactly 4
cases from the first 12 cases → Continue → OK. You will see a “1” by
exactly 4 of the 12 names. These are the selected members for your sample.
The other 8 names will have a 0, meaning they are not selected for the
sample. From the Data Editor page:
Gundlach     1
Remke        0
Howell       0
Brenneman    1
Xu           1
Reeger       0
Cline        0
Mehta        0
Tuzov        1
Daye         0
Zheng        0
Kuiper       0
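For readers without SPSS, the same kind of 0/1 selection column can be sketched with Python's standard `random.sample`, which draws without replacement. This is an analogy, not a description of how SPSS works internally; the seed value and the name `indicator` are arbitrary choices for illustration.

```python
import random

members = ["Gundlach", "Remke", "Howell", "Brenneman", "Xu", "Reeger",
           "Cline", "Mehta", "Tuzov", "Daye", "Zheng", "Kuiper"]

rng = random.Random(2009)          # arbitrary seed, only for reproducibility
selected = rng.sample(members, 4)  # SRS of 4 of the 12 members

# 1 marks a selected member and 0 a member who was not selected,
# mirroring the filter column shown in the Data Editor.
indicator = {name: int(name in selected) for name in members}
for name, flag in indicator.items():
    print(f"{name:<10} {flag}")
```

A different seed gives a different sample, just as rerunning the SPSS selection would.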
Problems with sampling:
• Undercoverage: occurs when some groups in the population are left
out of the process of choosing the sample.
• Nonresponse: occurs when an individual chosen for the sample can’t
be contacted or does not cooperate.
• Response Bias: occurs when the behavior of the respondent or
interviewer changes the sample results. Examples:
  - the respondent lying or having a faulty memory,
  - the race or sex of the interviewer influencing the respondent,
  - poor interviewing technique or wording of questions.
To see how good a survey actually is, you should look for:
• Sampling design
• Wording of questions posed
• Amount of nonresponse
• Date of the survey
Examples: (Problem 3.55, p. 260) Comment on each of the following as a
sample design or a potential sample survey question. Is there any source of
bias? What type of bias?
a) A survey used the following question: “Do you agree that a
national system of health insurance should be favored because
it would provide health insurance for everyone and reduce
administrative costs?”
b) Alex wanted to find out people’s opinions regarding Greater
Lafayette Health Services’ desire to build a new hospital.
Consequently, he took a simple random sample of 500
Lafayette and West Lafayette residents listed in the phone book.
He is concerned, however, that those not listed in the phone book
may have different views.
c) When Alex attempted to collect data from those who made it
into his sample, he was unable to contact some of them and
others refused to answer his survey questions.
SOURCES OF DATA:
Anecdotal evidence consists of data based on individual cases, which
often come to our attention because they are striking in some way
(“News of the Weird” or a “Dateline” lead story). These cases will probably
not be representative of the population. The sample size is small, perhaps
only a single case. Do not draw conclusions from anecdotal evidence. It is
NOT good science!
Available data are data that were produced in the past for some other
purpose but that may help answer a present question. Many times, these
are quite good and useful data. Examples: libraries, Internet websites.
An observational study observes units or
individuals, usually a sample of all units, and measures variables of interest
but does not attempt to influence the responses. We let nature take its course
and observe the response. We do not manipulate the units in any way.
A designed experiment is a procedure in which you deliberately impose
some treatment on individuals in order to measure their responses. In an
experiment, we are always interested in the influence of one or more
variables or factors on the response. We always impose some type of
treatment on the individuals.
Experiment versus an Observational Study
• In an experiment a treatment is imposed on the individuals before a
measurement is taken. The environment is manipulated in some way. If
the experiment is carefully designed and all potential lurking variables
are accounted for and controlled, conclusions regarding causation can
be made. In some situations it is good to conduct more than one
experiment before making any decisions regarding causation.
• In an observational study, the environmental factors are not
controlled or manipulated. A measurement is taken without a
treatment being imposed on the individual. Possible lurking variables
may exist. Consequently, a single observational study cannot establish
causation; numerous studies need to be conducted before drawing
conclusions regarding causation.
Examples:
Which of the following is an experiment and which is an observational
study?
1. To determine whether a review session will improve his students’ test
scores, a STAT 301 instructor divides his class into two groups. He then
requires one group to attend a study session. He compares the test
results of each group.
2. To determine whether a review session will improve his students’ test
scores, a STAT 301 instructor announces a study session to be held the
night before a test. The instructor lists the students who attended the
session and compares their scores to the remaining STAT 301 students’
scores.
Design of Experiments, 3.1
Again, experiments deliberately impose some treatment on individuals in
order to observe their response.
Vocabulary:
An experimental unit is the individual or unit on which the experiment is
done. Often, these units are chosen randomly from the population of units.
When the units are human beings, they are called subjects.
A treatment is a specific experimental condition applied to the units.
Factors are the explanatory variable(s) under study. A factor level is a
specific value of a factor.
The response variable is what is being measured on each unit/subject.
Examples:
Identify the experimental units or subjects, treatments, factors, factor
levels and response variable.
1. In a Food technology study involving the storage of frozen
strawberries, 10 pints were stored at each of 5 storage times. Storage
times were randomly assigned to the pints. The amount of ascorbic
acid content for each pint was measured after storage.
• Experimental units: Pints of strawberries
• Factor: Storage time
• Factor levels: Five different storage times
• Treatments: Each of the five different storage times
• Response variable: Ascorbic acid content measured after storage
2. A sports engineer is interested in determining the effects that speed
and air pressure have on throwing distance for his new mechanical
trainer football throwing machine. Two speeds (40 mph and 55 mph)
and three air pressures (175 psi, 200 psi, 230 psi) were chosen for the
study. Thirty footballs were obtained. Treatments were randomly
assigned to footballs.
• Experimental units: Thirty footballs
• Factors: Speed and air pressure
• Factor levels: Speed at 40 or 55 mph; pressure at 175, 200, or 230 psi
• Treatments: Six different combinations of speed and pressure, with
five trials in each treatment group
• Response variable: Throwing distance for each of the 30 trials
Advantages of Experiments:
• In principle, experiments can give good evidence for causation.
• Experiments allow us to study the specific factors we are interested in,
while controlling the effects of lurking variables.
• Experiments allow us to study the combined effects of several factors,
and possibly detect interactions between factors.
Difficulties which may arise:
• In the simplest designed experiment we would apply a single
treatment and observe the response. This is OK in very controlled
situations, but you may miss lurking variables, especially if you are
using living subjects.
• You may encounter the placebo effect: a patient responds favorably to
being treated, not to the treatment itself (your mind tricks you into
getting better even though the medicine has no effect). A control group
helps to determine whether a treatment is effective.
• Bias: the study systematically favors certain outcomes.
• Lack of realism: if the subjects know they’re in an experiment, they
might not behave naturally during the treatment.
How can we make an experiment objective and fair? (The 3 principles
of experimental design.)
• To help detect the placebo effect, use a control group: a group of
patients who receive a sham treatment (sugar pills instead of the medicine).
• Double-blind is best because then neither the subject nor the
experimenter knows whether the subject is in the treatment or control
group until the experiment is completely finished. (This avoids
unconscious bias by the experimenter.)
• Randomization: Leave the assignment of the individuals to the
treatment groups solely to chance. Do not rely on the judgment of the
experimenter in any way. This reduces or eliminates bias in the
formation of the treatment groups.
• Replication: Use as many individuals in each treatment group as
your experimental budget will permit. This reduces the chance
variation in the average response for each treatment.
What are our choices for type of experiment?
Completely randomized design: In this plan or method of randomization,
the individuals are randomly assigned to the treatment groups without
restriction.
                      Group 1 --> Treatment 1
Random assignment -->                          --> Measure results
                      Group 2 --> Treatment 2
Block design: In a block design, the random assignment of the units to the
treatments is carried out separately within each block, where a block
represents a group of units that are known, before the experiment, to be
similar in some way that will affect the response to the treatments.
(Blocks can be of any size.)
                          Random          Group 1 --> Treatment 1
             Block 1 -->  assignment -->                          --> Measure results
                                          Group 2 --> Treatment 2
Subjects -->
                          Random          Group 1 --> Treatment 1
             Block 2 -->  assignment -->                          --> Measure results
                                          Group 2 --> Treatment 2
Matched pair design: A matched pairs design compares just two
treatments. We impose the two treatments on a pair of subjects/units. If we
don’t have perfectly matched pairs, we choose blocks of two units that are as
closely matched as possible. OR, each block may consist of just one subject,
who gets both treatments, one after the other. We randomly assign the
different treatments to each unit in the matched pair or, in the case of a
single subject/unit acting as a pair, we randomly assign the order of
treatments.
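Example 1 (each block is a pair of matched units):
Within each pair,      one unit   --> Trt 1
random assignment:                            --> compare difference
                       other unit --> Trt 2
...
Example 2 (each unit receives both treatments, in random order):
Random treatment order:
unit 1: Trt 1, then Trt 2 --> measure difference
unit 2: Trt 2, then Trt 1 --> measure difference
...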
Example
Our 12 club members need to learn a new SPSS technique. An ITaP
computer trainer thinks playing classical music in the background helps
people to retain information better. Another ITaP computer trainer believes
drinking coffee while training helps. Their boss decides to design an
experiment to test out their theories. He will divide the club members into 4
groups and then give them a multiple choice test about the new SPSS
technique.
a) What are the factors and their levels?
Classical music during training: Yes or No
Coffee during training: Yes or No
b) What are the treatments?
The four combinations of the two factors at two levels each.
Three subjects will be assigned to each treatment.
c) What are the units/subjects?
The 12 club members taking computer training.
d) What is the response variable?
The test result of each individual after training.
e) Is the response variable categorical or quantitative?
Probably quantitative.
f) Outline the design of the experiment. What type is it?
Completely randomized design.
g) Use Table B at line 133 to randomly assign the members to
the treatments.
Gundlach
Remke
Howell
Brenneman
Xu
Reeger
Cline
Mehta
Tuzov
Daye
Zheng
Kuiper
Table B starting at line 133:
45740 41807 65561 33302 07051 93623 18132 09547
27816 78416 18329 21337 35213 37741 04312 68508
66925 55658 39100 78458 11206 19876 87151 31260
08421 44753 77377 28744 75592 08563 79140 92454
Strategy: Same as choosing a simple random sample (SRS), except we need
to keep going until we have 3 members in each of the 4 treatment groups.
Label the names 01 through 12. The first three 2-digit groups that are
between 01 and 12 (and not repeats) are written down under “Group 1.” The
next three 2-digit groups that are between 01 and 12 (and not repeats) are
written down under “Group 2.” Do the same thing for Group 3. Match names to
the numbers. The remaining 3 names form Group 4.
Group 1
Group 2
Group 3
Group 4
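One way to check your hand work is to let a short program follow the same strategy on the Table B lines printed above. This is a sketch only: the function name `assign_groups` is made up, and which group receives which of the four treatments is left open because the notes do not specify it.

```python
TABLE_B_FROM_133 = [
    "45740 41807 65561 33302 07051 93623 18132 09547",  # line 133
    "27816 78416 18329 21337 35213 37741 04312 68508",  # line 134
    "66925 55658 39100 78458 11206 19876 87151 31260",  # line 135
    "08421 44753 77377 28744 75592 08563 79140 92454",  # line 136
]
members = ["Gundlach", "Remke", "Howell", "Brenneman", "Xu", "Reeger",
           "Cline", "Mehta", "Tuzov", "Daye", "Zheng", "Kuiper"]

def assign_groups(lines, population, group_size, n_groups):
    """Fill all but the last group from the table; the rest form the last."""
    digits = "".join(lines).replace(" ", "")
    needed = group_size * (n_groups - 1)
    labels = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        if 1 <= label <= len(population) and label not in labels:
            labels.append(label)
            if len(labels) == needed:
                break
    if len(labels) < needed:
        raise ValueError("ran out of table digits; supply more lines")
    # Units never drawn from the table form the final group.
    leftover = [k for k in range(1, len(population) + 1) if k not in labels]
    ordered = labels + leftover
    return [[population[k - 1]
             for k in ordered[g * group_size:(g + 1) * group_size]]
            for g in range(n_groups)]

groups = assign_groups(TABLE_B_FROM_133, members, group_size=3, n_groups=4)
for number, names in enumerate(groups, start=1):
    print(f"Group {number}: {names}")
# Group 1: ['Brenneman', 'Cline', 'Remke']
# Group 2: ['Xu', 'Mehta', 'Daye']
# Group 3: ['Zheng', 'Kuiper', 'Tuzov']
# Group 4: ['Gundlach', 'Howell', 'Reeger']
```

The labels are read in the order 04, 07, 02, 05, 08, 10, 11, 12, 09, with repeats and out-of-range pairs skipped along the way.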
SPSS will do this easily if you want to separate your data into just 2 groups
and will also do this if you want 3 or more groups in a step-by-step way, but
the more groups you need the more complicated things get with SPSS.
Example 2:
Twelve overweight females have agreed to participate in a study of the
effectiveness of four reducing regimens, A, B, C, and D. The researcher
first calculates how overweight each subject is by comparing the subject’s
actual weight with her “ideal” weight.
The response variable is the weight lost after eight weeks of treatment.
For this problem, we believe the initial amount overweight will influence the
response variable, so a block design is appropriate for this study.
Arrange the subjects in order of increasing excess weight. Form three
blocks by grouping the four least overweight, then the next four, and so on.
Following are the subjects and their initial amount overweight:
Subject      Amount overweight     Subject      Amount overweight
Birnbaum     35                    Hernandez    25
Brown        23                    Jackson      33
Brunk        34                    Tran         43
Dixon        21                    Loren        32
Moses        41                    Smith        38
Ram          26                    Brennan      44
After forming the three blocks, use the random numbers below to
assign the subjects to the four reducing regimens separately
within each block.
19224 95034 05756 28713 96409 12531 42544 82853
73676 46150 30568 35098
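The block design can be sketched in Python as follows. Two assumptions are worth stating loudly: the pairing of names with amounts overweight is read off the list above in order, and the randomization uses Python's pseudo-random `shuffle` with an arbitrary seed in place of the table digits, so the regimen each subject receives will not match a hand solution. The function name `block_assignment` is made up.

```python
import random

# Assumed pairing of subjects with initial amount overweight.
amount_overweight = {
    "Birnbaum": 35, "Brown": 23, "Brunk": 34, "Dixon": 21,
    "Moses": 41, "Ram": 26, "Hernandez": 25, "Jackson": 33,
    "Tran": 43, "Loren": 32, "Smith": 38, "Brennan": 44,
}
regimens = ["A", "B", "C", "D"]

def block_assignment(weights, treatments, seed=None):
    """Sort by excess weight, cut into blocks, randomize within each block."""
    rng = random.Random(seed)
    ordered = sorted(weights, key=weights.get)  # least overweight first
    size = len(treatments)
    blocks, assignment = [], {}
    for start in range(0, len(ordered), size):
        block = ordered[start:start + size]
        shuffled = treatments[:]
        rng.shuffle(shuffled)  # separate randomization inside this block
        blocks.append(block)
        assignment.update(zip(block, shuffled))
    return blocks, assignment

blocks, assignment = block_assignment(amount_overweight, regimens, seed=0)
for block in blocks:
    print([(name, assignment[name]) for name in block])
```

Because each block contains exactly four subjects and four regimens, every regimen is used once per block, so differences in initial excess weight are balanced across the regimens.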
Ethics in experiments: Section 2.6
The following three principles must be followed when experiments involve
human beings:
• Planned studies should be reviewed by a board to protect the subjects from
harm.
• All subjects must give their informed consent before data are collected.
• All individual data must be kept confidential. Only summaries can be made
public.
Bad examples (which principles were violated?):
Tuskegee Study (Quotation from the Report of the Tuskegee Syphilis Study Legacy
Committee, May 20, 1996. A detailed history is James H. Jones, Bad Blood: The
Tuskegee Syphilis Experiment, Free Press, 1993.)
In the 1930s, syphilis was common among black men in the rural South, a group that had
almost no access to medical care. The Public Health Service Tuskegee study recruited
399 poor black sharecroppers with syphilis and 201 others without the disease in order to
observe how syphilis progressed when no treatment was given. Beginning in 1943,
penicillin became available to treat syphilis. The study subjects were not treated. In fact,
the Public Health Service prevented any treatment until word leaked out and forced an
end to the study in the 1970s.
Personal Space Study (R. D. Middlemist, E. S. Knowles, and C. F. Matter, “Personal
space invasions in the lavatory: suggestive evidence for arousal,” Journal of Personality
and Social Psychology, 33 (1976), pp. 541-546.)
Psychologists observe that people have a “personal space” and get annoyed if others
come too close to them. We don’t like strangers to sit at our table in a coffee shop if
other tables are available, and we see people move apart in elevators if there is room to
do so. Americans tend to require more personal space than people in most other cultures.
Can violations of personal space have physical as well as emotional effects?
Investigators set up shop in a men’s public rest room. They blocked off urinals to
force men walking in to use either a urinal next to an experimenter (treatment group) or a
urinal separate from the experimenter (control group). Another experimenter, using a
periscope from a toilet stall, measured how long the subject took to start urinating and
how long he kept at it.
Tracking Americans Cradle-to-Grave
by Katherine Haley Will (president of Gettysburg College) in the J&C 7/26/06
Does the federal government need to know whether you aced Aristotelian ethics but had
to repeat introductory biology? Does it need to know your family’s financial profile, how
much aid you received and whether you took a semester to help out at home?
The Secretary of Education’s Commission on the Future of Higher Education thinks so.
… the commission called for creation of a tracking system to collect sensitive
information about our nation’s college students…It is a mandatory federal registry of all
American students throughout their collegiate careers—every course, every step, every
misstep. Once established, it could easily be linked to existing K-12 and work force
databases to create an unprecedented cradle-to-grave tracking of American citizens, all
under the watchful eye of the federal government.
The commission calls our nation’s colleges and universities unaccountable, inefficient,
and inaccessible. In response, it seeks to institute collection of personal information
designed to quantify our students’ performance in college and in the workforce.
But many of us are concerned about invading our students’ privacy by feeding
confidential educational and personal data, linked to Social Security numbers, into a
mandatory national database…
We already have efficient systems in place to collect educational statistics…Our existing
systems meet the government’s need to inform public policy without intruding on student
privacy because they report the data in aggregate form [gathered altogether instead of as
individual reports]…
This proposal is a violation of the right to privacy that Americans hold dear. It is against
the law. Moreover, there is a mountain of data already out there that can help us
understand higher education and its efficacy. And, finally, implementation of such a
database, which at its inception would hold “unit” record data on 17 million students,
would be an unfunded mandate on institutions and add greatly to the expense of
education.
DATA ANALYSIS AFTER THE EXPERIMENT:
Now that you have carried out your experiment and you have the data from
your experiment, how do you know if a treatment is effective?
The response variable will be averaged for each group, and the averages of
all the treatments will be compared. Large differences in the treatment
averages suggest that the treatments had an effect.
An observed effect so large that it would rarely occur by chance alone is
called statistically significant. Whether an effect meets this standard is
determined with the methods of statistical inference.
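One way to see what "would rarely occur by chance alone" means is to reshuffle the group labels many times and count how often chance alone produces a difference at least as large as the one observed. The sketch below does this with made-up test scores. It is an illustration of the idea (a permutation approach), not necessarily the method used later in the course, and the data, the seed, and the function name `chance_proportion` are all hypothetical.

```python
import random
from statistics import mean

def chance_proportion(group_a, group_b, n_shuffles=5000, seed=0):
    """Proportion of random regroupings with a difference in averages
    at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # random assignment done by chance alone
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(mean(a) - mean(b)) >= observed:
            count += 1
    return count / n_shuffles

# Hypothetical test scores for two treatment groups.
with_review = [88, 92, 85, 91, 87, 90]
without_review = [78, 82, 80, 75, 84, 79]

p = chance_proportion(with_review, without_review)
print(p)  # a small proportion: chance alone rarely gives so large a gap
```

With these made-up scores every score in the first group exceeds every score in the second, so very few reshufflings reproduce a gap that large, and the observed effect would be called statistically significant.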
Statistical Inference: use a fact about a sample to estimate the truth about
the whole population.
Statistics vs Parameters:
Suppose we have 48 members in our class and we wanted to find the
average height of these students. Because we may not have time to measure
all the students we may take a SRS of size 5 and measure the heights of the
five students in the sample. We could then calculate the sample mean
(average) for the five students which would be a statistic. We would then
use the statistic as an estimate of the height for all 48 members of our class.
The true average height of all 48 members of our class would be a
parameter.
A parameter is a number that describes the population. A parameter is a
fixed number, but in practice we do not know its value.
A statistic is a number that describes a sample. The value of a statistic is
known when we have taken a sample, but it can change from sample to
sample. We often use a statistic to estimate an unknown parameter.
Sampling variability is the variation in the values of the statistic
that are generated by repeatedly selecting samples of the same size.
In our average height example, suppose we selected all possible simple
random samples of size 5 from our population of 48 members of the class.
(Note: That would be 1,712,304 samples.) We could then calculate the
1,712,304 sample means. If we constructed a relative frequency table for all
these sample means, the corresponding relative frequency histogram is
called the sampling distribution of the sample mean.
The sampling distribution of a statistic is the distribution of values taken
by the statistic over all possible samples of equal size selected from the same
population.
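The count of possible samples, and the idea of listing every sample mean, can be checked with a short Python sketch. The class heights are not given in these notes, so the second half uses a made-up population of six heights and samples of size 3 to keep the full list small.

```python
from itertools import combinations
from math import comb

# Number of distinct samples of size 5 from a class of 48.
n_samples = comb(48, 5)
print(n_samples)  # 1712304

# Hypothetical tiny population of heights (inches), samples of size 3.
heights = [62, 64, 65, 67, 70, 74]
sample_means = [sum(s) / 3 for s in combinations(heights, 3)]

population_mean = sum(heights) / len(heights)              # the parameter
mean_of_sample_means = sum(sample_means) / len(sample_means)

print(len(sample_means))   # 20 possible samples
print(population_mean, round(mean_of_sample_means, 6))
```

The 20 values in `sample_means` are the sampling distribution of the sample mean for this tiny population. Their average matches the population mean (up to floating-point rounding), even though individual sample means differ from it.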
Properties of Sampling distribution
1. The sampling distribution of a statistic can be generated by repeatedly
sampling from the population, calculating the statistic and tabulating the
values obtained.
2. All statistics have sampling distributions.
3. Sampling distributions are fundamental to statistical inference because
the sampling distribution describes a regular, predictable pattern of
behavior that emerges with repeated sampling.
4. Sampling distributions provide us with information regarding the
accuracy and precision of our statistic as an estimator of the
corresponding parameter.
• The accuracy of the statistic as an estimator of the corresponding
parameter is related to the center of the sampling distribution
compared with the true value of the parameter of the population.
• The precision of the statistic as an estimator of the corresponding
parameter is related to the spread of the sampling distribution.
Bias and Variability
A statistic used to estimate a parameter is unbiased if the mean of the
sampling distribution of the statistic is equal to the parameter it is estimating.
We then say the statistic is an unbiased estimator of the parameter.
To reduce bias, use random sampling.
The variability of a statistic is described by the spread of its sampling
distribution. The less variability in the value of the statistic, the more
precise the statistic is as an estimator of the parameter. Sampling variability
is controlled by:
1. the sampling design used to generate the sample.
2. the sample size.
To reduce the variability of a statistic from an SRS, use a larger sample.
Population Size Doesn’t Matter
The variability of a statistic from a random sample does not depend on the
size of the population, as long as the population is at least 100 times larger
than the sample.
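Both claims (a larger sample reduces variability, and population size hardly matters) can be explored with a small simulation. Everything here is an assumption made for illustration: the populations are simulated heights with mean 67 and standard deviation 3, the seed is arbitrary, and the function name `spread_of_sample_means` is made up.

```python
import random

rng = random.Random(0)  # arbitrary seed, for reproducibility only

def spread_of_sample_means(population, n, n_samples=1000):
    """Standard deviation of the sample mean over repeated SRSs of size n."""
    means = []
    for _ in range(n_samples):
        sample = rng.sample(population, n)  # SRS without replacement
        means.append(sum(sample) / n)
    center = sum(means) / n_samples
    return (sum((m - center) ** 2 for m in means) / n_samples) ** 0.5

# Two hypothetical populations whose sizes differ by a factor of 20.
small_pop = [rng.gauss(67, 3) for _ in range(10_000)]
large_pop = [rng.gauss(67, 3) for _ in range(200_000)]

spread_n25_small = spread_of_sample_means(small_pop, 25)
spread_n25_large = spread_of_sample_means(large_pop, 25)
spread_n100_large = spread_of_sample_means(large_pop, 100)

print(spread_n25_small, spread_n25_large, spread_n100_large)
```

The first two spreads should come out close to each other (near 3/√25 = 0.6) despite the very different population sizes, while quadrupling the sample size to 100 should cut the spread roughly in half (near 3/√100 = 0.3).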
Example: (Problem 3.58, p. 269) Voter registration records show that 68%
of all voters in Indianapolis are registered as Republicans. To test a random
digit dialing device, you use the device to call 150 randomly chosen
residential telephones in Indianapolis. Of the registered voters contacted,
73% are registered Republicans. Are the numbers 68% and 73% parameters or
statistics?
Example: (Problem 3.60, p. 269) A telemarketing firm in L.A. uses a
device that dials residential telephone numbers in that city at random. Of the
first 100 numbers dialed, 43 are unlisted. This is not surprising, because
52% of all L.A. residential phones are unlisted. Are the numbers 43 and 52%
parameters or statistics?
The Question of Causation
When we talked about observational studies vs. experiments, we said that an
observational study can’t give good evidence towards causation.
Well-designed experiments can help you with this.
In experiments, you usually try to prove that the explanatory variable causes
change in the response variable. You might have a strong association, but
how do you prove causation?
If we go back to our sleep/weight gain story in the Journal and Courier, one
study in the story to support an association between sleep and weight
gain was described as follows:
“In the study conducted by Dr. Shahrad Taheri and colleagues at Stanford
University and the University of Wisconsin-Madison, the scientists
examined the data from 1,024 volunteers in a long-term sleep study
conducted at the Wisconsin campus. They examined the sleep logs kept by
the subjects as well as the duration of their sleep during nights spent at a
sleep lab. Analyzing blood samples taken from the subjects, the researchers
found a clear pattern. Those who slept the least had the most ghrelin and the
least leptin, and for those who slept the longest, vice versa. The scientists
also found that the subjects with the least sleep had a larger body mass
index.”
Causation:
• x and y are associated
• x causes y to change
Example: The amount of sleep a person gets in a night directly causes
changes in hormone levels. OR A person’s hormone levels directly affect
the number of hours of sleep a person gets per night.
Common response:
• x and y are associated
• z is really what causes both x and y to change
Example: The amount of daily exercise a person gets affects both sleep time
and hormone levels.
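A small simulation shows how common response can create an association with no direct causal link. In the sketch below, z drives both x and y, and x never enters the formula for y. The variables are in arbitrary standardized units, and the numbers, seed, and function name `correlation` are made up for illustration; this is not data from the sleep study.

```python
import random

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # arbitrary seed
z = [rng.gauss(0, 1) for _ in range(2000)]   # lurking variable (exercise)
x = [zi + rng.gauss(0, 0.5) for zi in z]     # sleep: depends on z only
y = [zi + rng.gauss(0, 0.5) for zi in z]     # hormone: depends on z only

r = correlation(x, y)
print(round(r, 2))  # strongly positive although x has no effect on y
```

Someone who saw only x and y would find a strong association and might wrongly conclude that one causes the other.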
Confounding:
• x is associated with y
• x is associated with z
• x and z both have effects on y
• it is impossible to separate which effects are from x alone or z alone
• x and z can be either explanatory or lurking variables
Example: The amount of daily exercise and the number of hours of sleep a
person gets BOTH affect hormone levels.
Even a very strong association between 2 variables is not by itself good
evidence that there is a cause-and-effect link between the variables.
Association does not mean causation.
Example: Just because gray-haired people die at a higher rate than
people with other hair colors doesn’t mean that the gray hair itself causes
death.
Even when direct causation is present, it is rarely a complete explanation of
an association between 2 variables.
Example: Do you think sleep is the only thing that determines whether you
have a higher body mass index?
Even well-established causal relations may not generalize to other settings.
Example: Medicine that works for dogs might not work for people or cats.
Doing well on the homework in this class may help you do better on the
exams, but that might not be true in every class you take.
Big Overview of how to answer a research question:
1. Pick a question you want to answer.
2. Decide on your population.
3. Select a sample.
• voluntary response (the only one not random)
• simple random sample
• stratified random sample
• multistage sample
4. Observational study or experiment? If experiment, the choices are:
• completely randomized design
• block design
• matched pairs
5. Collect the data. Make sure you follow ethical principles of
experimentation.
6. Analyze the data. Don’t forget to look at graphs.
7. State your conclusions.