Chapter 3
Obtaining Useful Evidence
Research design is that portion of the statistical process in
which planning is done so that the conclusions are drawn with
confidence and can be supported under scrutiny.
The research design process begins with the research
question(s). Consider the following three questions.
1. Do people believe government money should be spent to
build sports arenas for professional teams?
2. Is there a difference between the prices that home daycares
and childcare centers charge?
3. Does removing invasive plants in spring lead to a
reduction in the percent of invasive plants in the summer?
The first question illustrates research that would be
appropriate for finding out something about a population or testing
a hypothesis about a population. A survey could be used to get
public opinion.
The second question compares groups based on a
characteristic of the group, in this case type of daycare. The
objective is to show if there is a difference between the groups
with regard to cost.
The third question seeks to show if an intervention will have
an effect on some outcome. The goal is to establish a cause and
effect relationship since such a relationship could lead to better
decision-making.
Showing causation (a cause and effect relationship) is a
highly desirable outcome for many research questions.
In a causal relationship, a treatment produces a particular
outcome while not providing the treatment means that particular
outcome is not produced. Thus, simply showing that a certain
response occurred when a treatment was provided does not prove
the treatment caused the response.
A cause and effect relationship implies the existence of two
variables. One of those variables must happen before the other.
The variable that happens first is the independent or explanatory variable.
The variable that it affects is the dependent or response variable.
The values that the explanatory variable can have are called
levels.
Consider education research. How can we maximize results?
To answer this question, we would need to know how to measure
results – the response variable.
We will also need a theory of which variable must be
changed to affect the response variable, that is, what the
explanatory variable will be.
To decide on an explanatory variable, generate a list of possible
variables that might affect the response variable. These can be
grouped by whether they describe students, teachers, or the school.
Students
Teachers
School
Pick some of these to demonstrate the concept of levels
Discuss confounding or latent variables
Consider two educational theories
Theory 1: Students who are more successful in a prerequisite class
will be more successful in the current class. (e.g. success in Math
98 = success in Math 146)
Theory 2: Students will be more successful with teachers who
give regular unannounced quizzes (e.g. weekly or daily) than with
teachers who don’t give quizzes.
What are possible response variables for these two theories?
What are the explanatory variables for these theories?
Proving causation is challenging. Some research designs can
show causation if properly done while others cannot show
causation.
Research Designs
Observational study
The goal is to understand the population or differences
between the populations.
Units are randomly selected.
Cannot show causation
Observational experiment
The goal is to determine the effect of an intervention.
No random assignment is possible.
Might be able to show causation.
Manipulative experiment
The goal is to determine the effect of an intervention.
Random assignment of units to groups facilitates
showing causation.
One of the goals of all research is to avoid bias, the systematic
prejudice in one direction.
Approval for research: Institutional Review Board (IRB).
Observational Studies
The goal of observational studies is to understand certain
characteristics about a population or to test a hypothesis about
those characteristics.
Census
Random Selection
One population or comparison between multiple populations
(grouped based on a characteristic they possess that cannot be
changed either because change is impossible or unethical. For
example, it is not possible to change a person’s race or age group
or income level. It would be unethical to change their use of drugs
or smoking.)
Surveys
The construction of surveys is a more difficult process than it
may first appear if the goal is to get unbiased data.
1. Leading questions:
A. Do you believe the minimum wage should be raised so
that a person working full time is not still living below the poverty
line?
B. Do you believe the minimum wage should be raised, kept
the same, or lowered?
2. Sequence of questions:
A. With which political party do you most closely associate?
(Republican, Democrat, Tea Party, Libertarian, Socialist, Green
Party, Independent)
Do you believe that humans are causing the climate to
change?
B. Do you believe that humans are causing the climate to
change?
With which political party do you most closely associate?
(Republican, Democrat, Tea Party, Libertarian, Socialist, Green
Party, Independent)
Experiments
While the goal of an observational study is to understand
characteristics of a population, the goal of an experiment is to
assess the effect of an intervention or treatment.
To show causation, an experiment requires a comparison
between those receiving the treatment and those not receiving the
treatment. The latter group is called a control group.
The term control is used in two different ways in experiment
designs. One use of control is in the term “control group” which is
a group that does not receive the treatment. The second use of the
word control is that an experiment can be controlled for a latent
variable by making sure that people or units with that variable are
randomly assigned to different explanatory variable levels.
The concept of random assignment is critical for
experiments. Because of potential latent variables that could affect
the response variable, experimental units are randomly assigned to
different groups.
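Random assignment can be sketched in a few lines of Python. This is only an illustration, assuming the standard random module; the function name assign_groups, the subject labels, and the seed are made up for the example. Because chance decides who lands in each group, latent variables tend to be spread across the groups instead of piling up in one of them.

```python
import random

def assign_groups(units, levels, seed=None):
    """Randomly assign each unit to one level of the explanatory variable.

    The units are shuffled and then dealt out in turn, so the group
    sizes differ by at most one.
    """
    rng = random.Random(seed)          # the seed only makes the example repeatable
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {level: [] for level in levels}
    for i, unit in enumerate(shuffled):
        groups[levels[i % len(levels)]].append(unit)
    return groups

# Hypothetical example: 10 subjects and two levels of the explanatory variable.
subjects = [f"subject_{n}" for n in range(1, 11)]
groups = assign_groups(subjects, ["treatment", "placebo"], seed=1)
for level, members in groups.items():
    print(level, sorted(members))
```

Changing or omitting the seed gives a different assignment each time, which is the point: the researcher does not choose who receives the treatment.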
Manipulative experiments are the most effective design
method for establishing causation. In a manipulative experiment,
the researcher controls the level of the explanatory variable. This
is done through the random assignment of experimental units to the
different levels of the explanatory variable. A related design,
called a quasi-experiment, is used when groups of subjects are
assigned a level of treatment.
A well-designed experiment should have both internal and
external validity.
Internal validity means that the design is strong enough to
show a cause-and-effect relationship.
External validity means that applying the treatment to the
entire population produces results similar to those
achieved in the experiment itself.1
Human Experiments
Medical experiments on humans can be more complicated
than other types of experiments because of the mind/body
interactions.
Factors that affect improvement
Subject
Severity of problem
Natural history of the problem (where in the cycle of symptom
variation is the subject at the time of treatment)
Ability to accurately quantify the state of the condition (e.g. pain
1-10)
Belief that the treatment will help (psychosomatic effect)
Desire to please the doctor
Placebo effect
Researcher
Implication of the success or failure of the treatment (financial,
reputation)
Funding source for research
Appropriate assessment of patient’s condition
Effectiveness of laboratory research quality
Knowledge of which subjects receive the treatment
1 http://www.socialresearchmethods.net/kb/external.php
Medical treatments are often tested with manipulative experiments. The
response variable is some measure of improvement (e.g. reduced
pain or symptoms). The explanatory variable is whether a person
receives the medical treatment. To account for potential placebo
or psychosomatic effects, the researchers randomly assign some
subjects to the placebo level and others to the treatment level.
Placebo and psychosomatic effects
Double blind experiments
Bias from breaking blind (80% know which group they are in)
Active Placebo
Quality assurance/quality control (QA/QC) is important for the
research protocol and for the laboratory where tests will be done.
Observational experiments (also known as ex post facto studies)
The objective is similar to that of a manipulative experiment
in that the purpose is to determine the effect of an intervention.
However, in these experiments, the researcher cannot randomly
assign subjects to different groups but can only measure the level of
the explanatory variable and record the response variable, thus
gathering the data as if it were an observational study.
1. Does creating a marine reserve improve the abundance and
varieties of fish?
2. Does flooding farmland fields in winter to create habitat for
migratory birds improve the farmland quality?
(http://mynorthwest.com/11/321774/Farmers-find-flooded-fieldscan-help-birds-crops).
3. Does installing a camera at an intersection reduce the number of
people who run a red light?
Before-after-control-impact or BACI experiment design.
One way of establishing causation with experiments of this
magnitude is with a before-after-control-impact design (BACI).
For example, in Seattle, they are attempting to drill a tunnel under
the city to replace the Alaskan Way Viaduct. To determine the
impact on major roads through the city, a BACI study could be
constructed in which the traffic on a major alternative to the tunnel
could be monitored before and after the opening of the tunnel.
During the same time period, traffic on main north-south roads
above and below the tunnel could be monitored as a control group.
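The core arithmetic of a BACI comparison can be sketched as follows. This is a minimal illustration, not a full analysis: the function name baci_effect and all traffic counts are hypothetical, and a real study would also test whether the difference is statistically significant.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def baci_effect(impact_before, impact_after, control_before, control_after):
    """Change at the impact site minus the change at the control site."""
    impact_change = mean(impact_after) - mean(impact_before)
    control_change = mean(control_after) - mean(control_before)
    return impact_change - control_change

# Hypothetical daily traffic counts, in thousands of vehicles.
alternative_route_before = [40, 42, 41]   # impact site, before the tunnel opens
alternative_route_after = [50, 52, 51]    # impact site, after the tunnel opens
control_roads_before = [30, 31, 29]       # control roads, same "before" period
control_roads_after = [33, 34, 32]        # control roads, same "after" period

effect = baci_effect(alternative_route_before, alternative_route_after,
                     control_roads_before, control_roads_after)
print("Estimated effect of the tunnel (thousands of vehicles):", effect)
```

With these made-up numbers the alternative route changes by 10 and the control roads change by 3, so the change attributed to the tunnel is 7. Subtracting the control change removes whatever happened to traffic everywhere during the same period.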
Significance and Causation
Showing an effect and proving causation are not the same
thing. Observational experiments are not effective for proving
causation. While it is necessary to show an effect in order to
establish causation, it is not sufficient. Causation means that
changes in the explanatory factor cause changes in the response
variable.
Epidemiologists have three criteria they use to establish causation
with observational experiments.
1. The effect exists in many different times and places,
reducing the chance of confounding.
2. A logical or scientific explanation exists for why the
explanatory variable causes changes in the response variable.
3. A different explanation does not exist.2
2 Aliaga, Martha, and Brenda Gunderson. Interactive Statistics. Upper Saddle River, NJ: Pearson Prentice Hall, 2006. Print.
For each new project, the initial research-design efforts can
be confusing. Clarity begins with asking the right specific
questions and is further enhanced by determining the appropriate
response variable, type of research design, etc. In this text we will
use a research design table to help organize our thoughts. When
options are presented, circle the appropriate choice. Think of the
potential latent variables yourself; they are not usually included in
the story.
Research Design Table
Research Question:
Type of Research: Observational Study / Observational Experiment / Manipulative Experiment
What is the response variable?
What is the parameter that will be calculated? Mean / Proportion / Correlation
List potential latent variables.
Grouping/explanatory Variable 1 (if present):
Levels:
Grouping/explanatory Variable 2 (if present):
Levels:
Some of the examples below contain underlined words,
others do not. The purpose of underlining is to help you identify
the key words in the story. Ultimately, you need to identify these
parts without them being underlined.
Example 1. What is the average number of months a cell
phone is used before being discarded?
A cell phone company wants to know the normal life expectancy
of a cell phone for their customers. They review their records to
see the amount of time a number was associated with the same
phone and person. They determine the average number of months
a person uses a cell phone before discarding it.
Example 2. Will requiring a college success class lead to an
improvement in the 3-year graduation rate of Pierce students?
College success classes were required for all new students at Pierce
in Fall 2014. The data that will be recorded is whether a student
has graduated by the end of their third year. A comparison of the
proportion of people who graduate will be made for two time
periods, before the implementation of the policy and after its
implementation.
Example 3. Ground beef is labeled with the percent fat, such
as 4% fat (lean) or 20% fat (not lean). The leaner the meat, the
more expensive it is per pound, so it is important to know if
packages marked as leaner really are leaner. To find out,
packages of ground beef will be purchased and fried, with all the
fat being collected and weighed. The percent of fat from each
package will be calculated (weight of fat divided by weight of
ground beef before cooking). Is the average percentage of fat less
in the packages labeled as leaner ground beef? The average of fat
percentage from the lean packages will be compared to the average
fat percentage from the not-lean packages.
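The calculation described in Example 3 can be sketched as below. The weights are invented for illustration only, and the names percent_fat and mean are not from the text.

```python
def percent_fat(fat_weight, raw_weight):
    """Percent fat = weight of collected fat / weight of beef before cooking, times 100."""
    return 100 * fat_weight / raw_weight

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# Hypothetical measurements in grams: (fat collected, weight before cooking).
lean_packages = [(18, 450), (20, 455), (17, 448)]
not_lean_packages = [(85, 452), (92, 460), (88, 450)]

lean_mean = mean([percent_fat(fat, raw) for fat, raw in lean_packages])
not_lean_mean = mean([percent_fat(fat, raw) for fat, raw in not_lean_packages])

print("Mean percent fat, lean packages:", round(lean_mean, 1))
print("Mean percent fat, not-lean packages:", round(not_lean_mean, 1))
print("Lean packages have the lower mean:", lean_mean < not_lean_mean)
```

Here the response variable is the percent fat of a package, the parameter is the mean, and the label (lean or not lean) is the grouping variable with two levels.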
Example 4. Does the parents’ divorce while a child is in 9th or 10th grade
have a negative impact on the child’s academic success?
To determine the impact of a divorce on a child in 9th or 10th grade,
the mean GPA of students before a divorce is compared to the
mean GPA after the divorce. A control group consisting of
children whose parents were not divorced is compared to the
impact group of children whose parents were divorced.
Example 5. Is there a relationship between the BMI and % body
fat of a person?
BMI (Body Mass Index) is used to determine obesity in people.
However, it is based only on height and weight and not on the
percent of body fat a person has. Consequently, it is natural to
wonder if there is a relationship between BMI and percent body
fat. The assumption is that percent body fat influences BMI.
Extra Examples
Example 6. Is there a relationship between the amount of weight a
person can squat lift and the height the person can jump?
Randomly selected adults are tested to see the amount of
weight they can squat lift and their vertical jump height. The
expectation is that greater leg strength would correspond to higher
jumps.
Example 7. What proportion of students who plan to take time off
of school actually return to school?
A survey of students who left Pierce College over 5 years ago
before completing their degree was done. The former students
were asked if they had returned to school (either at Pierce or
somewhere else). The proportion of students who returned was
calculated.
Complete In-class Activity – Design Tables – Page 275
Sampling
Observational studies and some observational experiments
require random sampling from a population. The next step in the
research design process is to determine how a sample will be taken
from the population so that it is representative of the population.
The objective is to avoid bias.
Bias can result from who is asked, who can’t be asked and
how they are asked.
Another source of bias comes from using data that are not
independent. Data are independent when knowledge of
one datum gives no indication of the value of another.
Time series data are a special concern in this regard.
When time series data are used for a BACI design or for a correlation
study, sample data from years that are far
enough apart that the data do not have serial dependence.
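One common way to gauge serial dependence, which these notes do not prescribe, is the lag-1 autocorrelation: how strongly each value is related to the value just before it. The sketch below uses hypothetical yearly values and made-up function names. A value near 0 suggests little dependence between neighboring observations.

```python
def lag1_autocorrelation(series):
    """Correlation between each value and the one that follows it."""
    n = len(series)
    m = sum(series) / n
    numerator = sum((series[i] - m) * (series[i + 1] - m) for i in range(n - 1))
    denominator = sum((x - m) ** 2 for x in series)
    return numerator / denominator

def every_kth(series, k, start=0):
    """Keep one observation out of every k, beginning at index `start`."""
    return series[start::k]

# Hypothetical yearly measurements that drift slowly upward.
yearly_values = [10, 11, 13, 12, 14, 16, 15, 17, 19, 18, 20, 22]

print("Lag-1 autocorrelation, every year:",
      round(lag1_autocorrelation(yearly_values), 2))
thinned = every_kth(yearly_values, 4)
print("Values kept when sampling every 4th year:", thinned)
```

How far apart the sampled years must be depends on the data. The spacing of 4 here is only an assumption for the example.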
Probability Sampling Methods
Simple random: Random selection of the sample from all units.
Complete In-class Activities – SRS on Calculator – Page 277
Stratified: Define the strata, then take a simple random sample from each
stratum separately.
Systematic: Randomly select one starting value from 1 to k, then add k
repeatedly.
Cluster: Define the clusters, then take a simple random sample of the
clusters and use all the units from the selected clusters.
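The four probability sampling methods can be sketched in Python as follows. This is an illustration only, assuming the standard random module. The frame of 140 numbered units, the two strata A and B, the cluster size of 20, and all function names are assumptions made for the example.

```python
import random

def simple_random_sample(units, n, rng):
    """Select n units at random from all units, without replacement."""
    return rng.sample(units, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample from each stratum separately."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def systematic_sample(units, k, rng):
    """Pick a random start from 1 to k, then take every kth unit after it."""
    start = rng.randint(1, k)          # randint includes both endpoints
    return units[start - 1::k]

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters and use every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(2024)              # the seed only makes the example repeatable
units = list(range(1, 141))            # hypothetical frame of 140 numbered units
strata = {"A": units[:60], "B": units[60:]}                       # hypothetical strata
clusters = {c: units[(c - 1) * 20: c * 20] for c in range(1, 8)}  # 7 clusters of 20

print("Simple random:", sorted(simple_random_sample(units, 10, rng)))
print("Stratified:", stratified_sample(strata, 5, rng))
print("Systematic:", systematic_sample(units, 20, rng))
print("Cluster:", cluster_sample(clusters, 2, rng))
```

Note the difference in what is randomized: individual units for simple random and stratified sampling, only the starting point for systematic sampling, and whole groups for cluster sampling.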
Demonstrate on the book example.
Cluster 1
1 Yes
2 Yes
3 No
4 Yes
5 No
6 Yes
7 No
8 Yes
9 No
10 No
11 No
12 Yes
13 Yes
14 Yes
15 No
16 No
17 Yes
18 Yes
19 No
20 No
West Side of Harbor
Cluster 2
21 No
22 No
23 No
24 No
25 No
26 Yes
27 No
28 No
29 No
30 No
31 No
32 No
33 No
34 No
35 Yes
36 Yes
37 Yes
38 Yes
39 No
40 No
Cluster 3
41 Yes
42 No
43 No
44 Yes
45 No
46 Yes
47 No
48 No
49 No
50 No
51 Yes
52 Yes
53 Yes
54 Yes
55 No
56 No
57 No
58 No
59 Yes
60 No
Cluster 4
61 No
62 No
63 No
64 No
65 No
66 No
67 Yes
68 No
69 No
70 Yes
71 No
72 Yes
73 Yes
74 Yes
75 Yes
76 Yes
77 No
78 Yes
79 No
80 No
Cluster 5
81 No
82 Yes
83 No
84 No
85 Yes
86 Yes
87 No
88 No
89 Yes
90 No
91 No
92 Yes
93 No
94 Yes
95 No
96 No
97 Yes
98 No
99 No
100 No
East Side of Harbor
Cluster 6
101 No
102 No
103 Yes
104 Yes
105 Yes
106 No
107 No
108 Yes
109 No
110 No
111 No
112 Yes
113 No
114 No
115 No
116 No
117 Yes
118 Yes
119 Yes
120 Yes
Cluster 7
121 No
122 Yes
123 No
124 No
125 Yes
126 No
127 Yes
128 No
129 Yes
130 No
131 No
132 Yes
133 No
134 Yes
135 No
136 No
137 Yes
138 No
139 Yes
140 Yes
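A cluster sample on the data above can be sketched as follows. The Yes/No values are transcribed from the table, one 20-character string per cluster. Selecting 2 clusters, the seed, and the function names are assumptions for illustration.

```python
import random

# Yes/No values for units 1-140, transcribed from the table above
# (Y = Yes, N = No), one string of 20 characters per cluster.
CLUSTERS = {
    1: "YYNYNYNYNNNYYYNNYYNN",
    2: "NNNNNYNNNNNNNNYYYYNN",
    3: "YNNYNYNNNNYYYYNNNNYN",
    4: "NNNNNNYNNYNYYYYYNYNN",
    5: "NYNNYYNNYNNYNYNNYNNN",
    6: "NNYYYNNYNNNYNNNNYYYY",
    7: "NYNNYNYNYNNYNYNNYNYY",
}

def proportion_yes(responses):
    """Fraction of responses that are Yes."""
    return responses.count("Y") / len(responses)

def cluster_estimate(clusters, n_clusters, rng):
    """Randomly choose whole clusters and pool every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    pooled = "".join(clusters[c] for c in chosen)
    return chosen, proportion_yes(pooled)

population = "".join(CLUSTERS[c] for c in sorted(CLUSTERS))
print("Proportion of Yes in all 140 units:", proportion_yes(population))

rng = random.Random(7)                 # the seed only makes the example repeatable
chosen, estimate = cluster_estimate(CLUSTERS, 2, rng)
print("Clusters selected:", chosen)
print("Estimated proportion of Yes:", estimate)
```

Across all 140 units there are 56 Yes values, a proportion of 0.4. Individual clusters range from 5 Yes (Cluster 2) to 10 Yes (Cluster 1), so an estimate based on only two clusters can land noticeably above or below 0.4.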
Complete and submit In-class Activities – Compare and Contrast
Sampling Methods – Page 279