Experimental and Single

Factors that threaten the validity of research findings

Material for this presentation has been taken from the seminal article by Don Campbell and Julian Stanley, "Experimental and quasi-experimental designs for research on teaching," which was first published as Chapter 5 in N. L. Gage (Ed.) (1963), Handbook of Research on Teaching.

Two classes of factors that jeopardize the validity of research findings

 Factors concerned with internal validity.

 Do the research conditions warrant the conclusions?

 Without internal validity results are uninterpretable.

 Factors concerned with external validity.

 To what extent can the results be generalized?

 To what populations, settings, treatment variables, and measurement variables?

Factors affecting Internal Validity

Internal validity is threatened whenever there exists the possibility of uncontrolled extraneous variables that might otherwise account for the results of a study.

Eight classes of extraneous variables can be identified.

 History

 Maturation

 Testing

 Instrumentation

 Statistical regression

 Selection

 Research mortality

 Interactions w/ selection

History

Specific events, in addition to the treatment, that occur between the first and second measurement.

The longer the interval between the pretest and posttest, the more viable this threat.

Maturation

Changes in physical, intellectual, or emotional characteristics, that occur naturally over time, and that influence the results of a research study.

In longitudinal studies, for instance, individuals grow older, become more sophisticated, and may become more set in their ways.

Testing

Also called “pretest sensitization,” this refers to the effects of taking a test upon performance on a second testing.

Merely having been exposed to the pretest may influence performance on a posttest.

Testing becomes a more viable threat to internal validity as the time between pretest and posttest is shortened.

Instrumentation

Changes in the way a test or other measuring instrument is calibrated that could account for results of a research study (different forms of a test can have different levels of difficulty).

This threat typically arises from unreliability in the measuring instrument.

This threat can also be present when observers are used.

Statistical Regression

Occurs when individuals are selected for an intervention or treatment on the basis of extreme scores on a pretest.

Extreme scores are more likely to reflect larger (positive or negative) errors in measurement (chance factors).

Such extreme measurement errors are NOT likely to occur on a second testing.
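A small simulation can make the regression artifact concrete. The sketch below is illustrative only: the score model (observed score = true score + random error), the distribution parameters, and the function name `simulate_regression_to_mean` are assumptions and are not taken from the source material. It selects the lowest pretest scorers and retests them with no treatment at all.

```python
import random
from statistics import mean

def simulate_regression_to_mean(n=10_000, true_sd=10.0, error_sd=10.0,
                                cutoff_fraction=0.10, seed=0):
    """Return (pretest mean, posttest mean) for the lowest pretest scorers.

    Assumed model: observed = true score + independent measurement error.
    No treatment is applied between the two testings.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n)]
    pretest = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    posttest = [t + rng.gauss(0.0, error_sd) for t in true_scores]

    # Select the individuals with the most extreme (lowest) pretest scores.
    k = int(n * cutoff_fraction)
    lowest = sorted(range(n), key=lambda i: pretest[i])[:k]

    pre_mean = mean(pretest[i] for i in lowest)
    post_mean = mean(posttest[i] for i in lowest)
    return pre_mean, post_mean

pre_mean, post_mean = simulate_regression_to_mean()
print(f"selected group: pretest mean {pre_mean:.1f}, posttest mean {post_mean:.1f}")
```

Under these assumptions the selected group scores higher on the retest even though nothing was done to it, which is the pattern a remedial program could mistakenly be credited with.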

Differential Selection

This can occur when intact groups are compared.

The groups may have been different to begin with.

If three different classrooms are each exposed to a different intervention, the classroom performances may differ only because the groups were different to begin with.

Selection-Maturation Interaction

Occurs when differential selection is confounded with maturational effects.

The treatment group might be composed of higher aptitude students, or…

The treatment group might have more students who are born during the summer months.

Research Mortality

The differential loss of individuals from treatment and/or comparison groups.

This is often a problem when research participants are volunteers.

Volunteers may drop out of the study if they find it is consuming too much of their time.

Others may drop out if they find the task to be too arduous.

Interaction of Selection with the Other Factors Affecting Internal Validity

Occurs when intact groups, which may not be equivalent, are selected to participate in research interventions.

As in a previous example, three different classrooms may be exposed to different treatments, but one of the classrooms might be composed of students having higher achievement trajectories.

External Validity

Concerned with whether the results of a study can be generalized beyond the study itself:

1. Population validity (when the sample does not adequately represent the population).

2. Personological validity (when personal/psychological characteristics interact with the treatment).

3. Ecological validity (when the situational characteristics of the study are not representative of the population).

Factors affecting External Validity

External validity is threatened whenever conditions inherent in the research design are such that the generalizability of the results is limited.

Four classes of threats to external validity can be identified.

 Reactive or interactive effects of testing

 Interaction effect of selection bias and the intervention.

 Reactive effects of treatment arrangements

 Multiple treatment interference

Reactive effect of testing

Occurs whenever a pretest increases or decreases the respondents’ sensitivity to the treatment.

Studies involving self-report measures of attitude and interest are very susceptible to this threat.

Selection x Treatment Interaction

This can occur when selected treatment or comparison groups are more or less sensitive to the treatment prior to initiating the treatment (or intervention).

Most likely to occur when the treatment and comparison groups are not randomly selected.

Reactive Effects of Experimental

Arrangements

These can occur when the conditions of the study are such that the results are not likely to be replicated in nonexperimental situations.

 Hawthorne effects

 John Henry effects

 Placebo effects

 Novelty effects

Multiple-treatment Interference

This is likely to occur whenever the same research participants are exposed to multiple treatments.

 Sequence effects

 Carry-over effects

Research Designs

We will examine the operative threats to internal and external validity in twelve specific types of research designs.

Some symbols to be used:

R = Random Assignment

X = Treatment Intervention

O = Observation or Measurement

Design 1: One-shot Case Study

This is a widely-used research design in education.

 A single group receives a treatment or intervention.

 Following the treatment individuals are measured on some outcome variable:

 It can be diagramed as follows:

X O

Design 1: One-shot Case Study (Continued)

 This design is typical of a case study

 Inferences typically are based upon expectations of what the results would have been had X not occurred.

 These designs often are subject to the error of misplaced precision, since they often involve tedious collection of specific detail and careful observations.

 The problem is that there usually are numerous rival, plausible sources of effect on the outcome other than X.

Design 2: One-group Pretest-Posttest Design

This, also, is a widely-used research design in education (see the diagram).

A pretest is given, followed by a treatment or intervention, followed by a posttest.

 The difference between O1 and O2 is used to infer an effect due to X.

 This design is subject to five of the eight threats to internal validity and one of the threats to external validity.

Can you name them?

O1  X  O2

One-group Pretest-Posttest Design (Continued)

Threats to internal validity

1. History

Many change-producing events may have occurred between O1 and O2. History is more viable the longer the lapse between the pretest and posttest.

2. Maturation

During the time between O1 and O2 the individuals may have grown older, wiser, more tired, more wary, or more cynical.

3. Testing

The fact that the participants in the study were exposed to a pretest may, by itself, influence performance on the posttest.

One-group Pretest-Posttest Design (Continued)

Threats to internal validity (continued)

4. Instrumentation

If O1 and O2 are obtained from judges (or raters), for example, the judges' standards may shift between the two sets of observations.

Standardized achievement tests might be re-normed between pretesting and posttesting.

5. Statistical regression

For example, if students are selected to participate in a remedial intervention because of extremely low scores on a pretest, they are very likely, as a group, to score higher upon receiving the same (or similar) test as a posttest.

This results mainly from errors in measurement (or unreliability in the tests).

Design 3: Static-group Comparison

In this design (diagramed below) a non-random treatment group is compared to a non-random comparison group.

Problems associated with this design stem from the fact that there is no way to substantiate that the treatment and comparison groups were equivalent to begin with.

X  O1
   O2

Static-group Comparison (Continued)

Threats to internal validity

1. Selection

Here, intact groups are being compared. It is possible that the treatment group was already prepared to do better (or worse) than the comparison group on O; hence the treatment group might have performed differently from the comparison group even in the absence of X.

2. Mortality

It is possible that differences between O1 and O2 are due to the fact that the nature of the treatment is such that treatment-group participants drop out at higher rates than do participants in the comparison group.

Static-group Comparison (Continued)

Threats to internal validity (continued)

3. Interactive effects (e.g., selection and maturation)

It may be that one of the groups being compared has a higher (or lower) achievement trajectory (e.g., when a more advanced class is compared with a less advanced class).

The three designs discussed so far are usually referred to as pre-experimental designs.

We will now turn to a consideration of three true experimental designs.

True Experiments

 True experiments are characterized by random assignment:

 Random assignment of individuals to treatment conditions.

 Random assignment of treatment conditions to individuals.

 When comparison groups are large enough (usually, n > 20) and individuals are selected at random, then representativeness can be assumed.
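Random assignment of individuals to treatment conditions can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the function name `randomly_assign`, the participant IDs, and the seed are hypothetical.

```python
import random

def randomly_assign(participants, n_groups=2, seed=None):
    """Shuffle participants and deal them into n_groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal round-robin so that group sizes differ by at most one.
    return [shuffled[i::n_groups] for i in range(n_groups)]

# Hypothetical participant IDs 1..40, split into treatment and comparison groups.
treatment, comparison = randomly_assign(range(1, 41), seed=42)
print(len(treatment), len(comparison))
```

Because chance alone decides group membership, pre-existing differences are expected to balance out across groups as group size grows.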

Design 4: Pretest-posttest Control Group Design

R  O1  X  O3
R  O2     O4

 Here, individuals are randomly assigned to one of two groups: the treatment group and a comparison group.

 The treatment group receives the intervention.

 The groups are compared in terms of their difference scores: (M_O3 - M_O1) vs. (M_O4 - M_O2)
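The difference-score comparison can be worked through numerically. In the sketch below the scores are invented for illustration, and the function name `gain_score_comparison` is hypothetical. Following the notation above, O1 and O3 are taken to be the treatment group's pretest and posttest, and O2 and O4 the comparison group's.

```python
from statistics import fmean

def gain_score_comparison(o1, o3, o2, o4):
    """Return (treatment gain, comparison gain, difference between the gains)."""
    treatment_gain = fmean(o3) - fmean(o1)    # M_O3 - M_O1
    comparison_gain = fmean(o4) - fmean(o2)   # M_O4 - M_O2
    return treatment_gain, comparison_gain, treatment_gain - comparison_gain

# Hypothetical scores for three individuals per group.
gains = gain_score_comparison(
    o1=[10, 12, 14], o3=[15, 17, 19],   # treatment group: pretest, posttest
    o2=[11, 13, 15], o4=[12, 14, 16],   # comparison group: pretest, posttest
)
print(gains)  # (5.0, 1.0, 4.0)
```

In an actual study the difference between gains would be evaluated with an inferential test rather than read off directly.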

Pretest-posttest Control Group Design (Continued)

 This design and the next two true-experimental designs control for all eight of the threats to internal validity.

 Any differences between groups that might have existed prior to X are (assumed to be) controlled through random assignment.

 Any effects due to history, maturation, testing, instrumentation, regression and so on would be expected to occur with equal frequency in both groups.

Pretest-posttest Control Group Design (Continued)

Factors affecting external validity:

1. Interactions between the treatment and testing.

This occurs whenever the pretest sensitizes the treatment group to the effects of the treatment.

2. Interactions between the treatment and group selection.

This can happen when the population from which the comparison group samples were selected is not the same as the target population.

3. Reactive arrangements

Sometimes the setting for the study is artificially restrictive. When this occurs generalizability suffers.

Design 5: Solomon Four-group Design

This design enjoys several advantages.

1. Both the main effect of testing and the interaction of testing and treatment are testable.

2. There are multiple tests of the effect of X: O2 > O1; O2 > O4; O5 > O6; O5 > O3

R  O1  X  O2
R  O3     O4
R      X  O5
R         O6
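The four posttests of the Solomon design form a 2 x 2 layout (pretested or not, treated or not), which is why the testing effect and the testing-by-treatment interaction become estimable. The sketch below assumes O2 and O5 are the posttest means of the treated groups, O4 and O6 those of the untreated groups, and that only the O2 and O4 groups were pretested. The function name and the numbers are hypothetical.

```python
def solomon_effects(o2, o4, o5, o6):
    """Estimate effects from the four posttest means of a Solomon design.

    o2: pretested and treated     o4: pretested, not treated
    o5: treated, not pretested    o6: neither pretested nor treated
    """
    treatment_effect = ((o2 + o5) - (o4 + o6)) / 2   # treated vs. untreated
    testing_effect = ((o2 + o4) - (o5 + o6)) / 2     # pretested vs. not pretested
    # Does the treatment effect differ depending on whether a pretest was given?
    interaction = (o2 - o4) - (o5 - o6)
    return treatment_effect, testing_effect, interaction

# Hypothetical posttest means.
print(solomon_effects(o2=20, o4=12, o5=16, o6=10))  # (7.0, 3.0, 2)
```

A nonzero interaction in this layout would suggest that the pretest sensitized participants to the treatment.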

Design 6: Posttest-only Design

Pretests are not always necessary. Given randomization of subjects to treatment conditions we can assume that the groups were equivalent prior to the treatment intervention.

In this design all the threats to internal validity are controlled for.

As far as external validity is concerned we might still question whether there might be reactive effects.

R  X  O1
R     O2

Design 6 (continued): Randomized pretest-posttest control group design

    O pre   T    O post
R:  ---------------------------
    O pre   C    O post
R:  ---------------------------
    O pre   C2   O post

Matched comparison group, posttest design

T O

M: ---------

C O

 Validity depends upon how well matching is achieved

 Potential threats to internal validity are same as those for posttest-only designs

More advanced Randomized Designs:

Randomized factorial designs

    T(A1,B1)  O
    ---------------
    T(A1,B2)  O
R:  ---------------
    T(A2,B1)  O
    ---------------
    T(A2,B2)  O

Factorial Design: Example

Word Type (A): Easy (A1), Hard (A2)

Method (B): Computer (B1), Handwriting (B2)

Cell values: 20, 16, 26, 20

Quasi-experiments: Nonequivalent control groups

 In these designs, randomization is either not possible or not feasible.

 Characterized by ...

 using intact groups for treatment and comparison

 a manipulated independent variable

 Often, the best we can expect from education research

Design 7: Non-equivalent Pretest-Posttest

Most widely-used quasi-design in education research.

O1  X  O2
------------------------------
O3     O4

The pretests (O1 and O3) are used to determine whether the groups were equivalent before onset of treatment (and to adjust where necessary).

Design 7 (Continued): Non-equivalent, control group, pretest-posttest design

O pre   T   O post
-------------------------
O pre   C   O post

 Except for reactive effects, most threats to internal validity are controlled

 Again, setting and selection-by-treatment interactions pose threats to external validity

Quasi-experiments: Time series designs

O1  O2  O3  O4  T  O5  O6  O7  O8

 Pre-observations to establish a baseline

 A treatment intervention

 Post-observations to establish new baseline
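One simple way to summarize such a series is the change in level between the pre-treatment baseline and the post-treatment baseline. The sketch below is an assumption-laden illustration: the observations are invented and `level_change` is a hypothetical helper. A plain difference of means ignores any trend already present in the baseline, so it is only a first look.

```python
from statistics import fmean

def level_change(observations, n_pre):
    """Mean of the post-treatment observations minus mean of the baseline."""
    pre, post = observations[:n_pre], observations[n_pre:]
    return fmean(post) - fmean(pre)

# Hypothetical series O1..O8, with the treatment introduced after O4.
series = [12, 13, 12, 13, 18, 19, 18, 19]
print(level_change(series, n_pre=4))  # 6.0
```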

Design 8: Time Series Design

O1  O3  O5  O7  X9   O11  O13  O15  O17
-----------------------------------------------
O2  O4  O6  O8  X10  O12  O14  O16  O18
-----------------------------------------------
:   :   :   :   :    :    :    :    :
-----------------------------------------------
O2  O4  O6  O8  X10  O12  O14  O16  O18
-----------------------------------------------

Design 9: Counterbalanced Designs

X1 O1  X2 O2  X3 O3
--------------------------------------------------
X3 O4  X1 O5  X2 O6
--------------------------------------------------
X2 O7  X3 O8  X1 O9
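The treatment orders in this diagram form a Latin square: each treatment appears once in every position across the groups. One way to generate such orders, assuming simple cyclic rotation is acceptable, is sketched below. The function name `counterbalanced_orders` is hypothetical.

```python
def counterbalanced_orders(treatments):
    """Build one treatment order per group by rotating the list to the right."""
    treatments = list(treatments)
    n = len(treatments)
    # Rotation r moves the last r treatments to the front.
    return [treatments[n - r:] + treatments[:n - r] for r in range(n)]

for order in counterbalanced_orders(["X1", "X2", "X3"]):
    print(" ".join(order))
# X1 X2 X3
# X3 X1 X2
# X2 X3 X1
```

Cyclic rotation balances position but not which treatment immediately precedes which, so carry-over effects may still need separate consideration.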

Treatment Reversal Design without Randomization

O1  O3  X5  O7  X9   O11
-------------------------------
O2  O4  X6  O8  X10  O12

Treatment Reversal Design with Randomization

R  O1  O3  X5  O7  X9   O11
------------------------------------
R  O2  O4  X6  O8  X10  O12

Single (or few) Subject Designs

In certain types of situations these designs are very appropriate.

When the target population is very small.

Particularly applicable to clinical settings.

When studying specific behaviors of unique individuals.

Individuals serve as their own controls.

When we want to show that an intervention can have an effect.

Single-subject designs

 Similar to time-series designs, only with a single individual.

 Repeated measurements over time (baselines).

 Subjects serve as their own controls.

 Involve a manipulated independent variable (the intervention).

Requirements of Single-Subject Designs

External validity is often difficult to establish.

Internal validity requires three things:

Repeated and reliable measurement.

Valid and reliable measuring instruments (or techniques).

Baseline stability.

Single variable rule (manipulate only one variable at a time).
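Baseline stability can be checked informally by fitting a straight line to the baseline observations and inspecting its slope. The sketch below is one possible approach, not a standard from the source: the helper names and the tolerance `max_abs_slope` are hypothetical, and a suitable tolerance depends on the scale of the measure.

```python
def baseline_slope(observations):
    """Least-squares slope of the observations against session number 0, 1, 2, ..."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two baseline observations")
    mean_x = (n - 1) / 2
    mean_y = sum(observations) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(observations))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx

def is_stable(observations, max_abs_slope=1.0):
    """Treat a baseline as stable when its fitted slope is near zero."""
    return abs(baseline_slope(observations)) <= max_abs_slope

print(is_stable([50, 51, 49, 50]))   # True: roughly flat baseline
print(is_stable([20, 40, 60, 80]))   # False: steadily increasing baseline
```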

Basic single-subject designs

 Reversal: A - B - A

 Double reversal: A - B - A - B

 Multiple baseline:

A is a period of no treatment

B is a period of treatment

A-B-A Withdrawal Design

This design involves alternating phases of baseline observation and treatment intervention, X:

O O O O | X O X O X O

Baseline Phase | Treatment Phase

During the treatment phase the intervention is turned on and off.

A-B-A Single Subject Design

O O O O | X X X X | O O O O

Baseline Phase | Treatment Phase | Post-treatment

One problem with this design is that it is sometimes considered unethical to discontinue treatment when the treatment has been shown to be effective.

A-B-A-B Single Subject Design

O O O O | X X X X | O O O O | X X X X

Baseline | Treatment | Baseline | Treatment

The advantage is that it leaves an effective treatment in place.
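In an A-B-A-B design the evidence for an effect is the pattern across phases: the behavior should shift when treatment starts, move back when it is withdrawn, and shift again when it is reinstated. The sketch below computes the mean for each consecutive phase. The scores are invented and `phase_means` is a hypothetical helper.

```python
from itertools import groupby
from statistics import fmean

def phase_means(phases, scores):
    """Return (phase label, mean score) for each run of consecutive sessions."""
    paired = zip(phases, scores)
    return [
        (label, fmean(score for _, score in run))
        for label, run in groupby(paired, key=lambda pair: pair[0])
    ]

# Hypothetical data: four sessions in each of the A, B, A, B phases.
phases = list("AAAABBBBAAAABBBB")
scores = [10, 11, 10, 9, 18, 20, 19, 19, 12, 11, 12, 13, 20, 21, 19, 20]
print(phase_means(phases, scores))
# [('A', 10.0), ('B', 19.0), ('A', 12.0), ('B', 20.0)]
```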

Other Single-Subject Designs

There is a wide variety of single-subject designs:

Multiple baseline designs.

Alternating treatment designs.

Increasing/decreasing treatment intervention designs.

Replicated single-subject designs.

Example of a stable baseline

[Figure: Stable Baseline Pattern; vertical axis from 0 to 100]

Example of an increasing baseline

[Figure: Increasing Baseline Pattern; vertical axis from 0 to 100]

End of Presentation
