Slides for Lecture 10

Hawthorne Effect
• The observed effect is due to the
experiment itself: subjects behave differently because
they are at the center of attention.
• Can show up as a spurt or elevation in
performance or in the physical phenomenon being measured.
• More of a problem when it operates differently in
different cells of the experiment.
• Solution: Add a control group to the experiment.
Have them go through the same experimental
procedure, but administer a placebo instead of the
treatment.
• Example: Testing a new design tool. Bring two
groups into the lab and tell both that you have an
exciting new tool. Use your real tool with one group
and the old tool with the placebo group.
Blind and Double Blind
Procedures
• Medical Terminology
• Blind Administration: When the subject does
not know whether he/she is in the experimental or
control condition
• Double Blind Administration: When the above is
true, and the experimenter also does not know
which condition the subject is in (controls for
expectancy effects)
Experimental terminology in
Multifactor experiments
• Factors / Independent Variable / Treatment
Condition:
  - Is directly manipulated in true experiments, is selected in
quasi experiments.
• Levels of the IV: Each specific variation of the
factor. E.g. the different font sizes
• Main Effect: The difference in the DV between the
different levels of the IV
• Interaction: Does the effect of one independent variable
on the DV depend on the level of the other? Do they interact?
Effect of Font Size and Screen
Resolution on Readability
• Main Effect-Size
• Main Effect-Resolution
• No interaction
[Figure: readability (axis 0 to 9) at Size 12 and Size 16, one line each for Low Resolution and High Resolution]

                  Size 12   Size 16   Row Means
Low Resolution    3         5         4
High Resolution   5         7         6
Column Means      4         6         5
Effect of Font Size and Screen
Resolution on Readability
• Main Effect-Size
• No Main Effect-Resolution
• No interaction
[Figure: readability (axis 0 to 7) at Size 12 and Size 16, one line each for Low Resolution and High Resolution]

                  Size 12   Size 16   Row Means
Low Resolution    3         6         4.5
High Resolution   3.2       6.2       4.7
Column Means      3.1       6.1       4.6
Effect of Font Size and Screen
Resolution on Readability
• Main Effect-Size
• Main Effect-Resolution
• Interaction
  - High sizes at High resolution
have great readability
[Figure: readability (axis 0 to 12) at Size 12 and Size 16, one line each for Low Resolution and High Resolution]

                  Size 12   Size 16   Row Means
Low Resolution    3         5         4.0
High Resolution   4         10        7.0
Column Means      3.5       7.5       5.5
Effect of Font Size and Screen
Resolution on Readability
• Main Effect-Size
• Main Effect-Resolution
• Interaction
[Figure: readability (axis 0 to 12) at Size 12 and Size 16, one line each for Low Resolution and High Resolution]

                  Size 12   Size 16   Row Means
Low Resolution    1         9         4.8
High Resolution   5         10        7.5
Column Means      2.8       9.5       6.1
• Main Effects: When we look at a main effect
(the effect of one variable averaged over the
other), we are ignoring the other variable.
• Interaction: Concerned with the joint effects of
both variables.
  - When the lines are parallel, no interaction is present.
When there is an interaction, the lines are not parallel and
would cross at some point if extended.
  - Independent variables can be depicted on either axis.
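The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a statistical test: it uses the cell means from the third font size and resolution example in these slides, and the function names (marginal_mean, main_effect, interaction) are hypothetical helpers chosen for this sketch. A main effect is computed as a difference between marginal means, and the interaction as a difference of differences, which is zero when the lines are parallel.

```python
# Cell means taken from the third font size x resolution example in these slides.
# Keys are (resolution, font size); values are mean readability scores.
cells = {
    ("low", 12): 3.0, ("low", 16): 5.0,
    ("high", 12): 4.0, ("high", 16): 10.0,
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def marginal_mean(cells, position, level):
    """Average the cells at one level of a factor, ignoring the other factor."""
    return mean(v for key, v in cells.items() if key[position] == level)

def main_effect(cells, position, first_level, second_level):
    """Difference between the marginal means of two levels of one factor."""
    return (marginal_mean(cells, position, second_level)
            - marginal_mean(cells, position, first_level))

def interaction(cells):
    """Difference of differences: size effect at high resolution
    minus size effect at low resolution (0 means parallel lines)."""
    size_effect_high = cells[("high", 16)] - cells[("high", 12)]
    size_effect_low = cells[("low", 16)] - cells[("low", 12)]
    return size_effect_high - size_effect_low

size_effect = main_effect(cells, 1, 12, 16)                # column means: 7.5 - 3.5
resolution_effect = main_effect(cells, 0, "low", "high")   # row means: 7.0 - 4.0
print(size_effect, resolution_effect, interaction(cells))  # prints: 4.0 3.0 4.0
```

With the first example's cell means (3, 5, 5, 7) the same interaction function returns 0, matching the "No interaction" case. Whether a nonzero value is statistically reliable is a separate question that this sketch does not address.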
Establishing a Cause-Effect Relationship
Temporal Precedence
• The cause happened before the effect.
  - Real-life relationships between variables
are never simple.
  - Cyclical situations, involving ongoing
processes that interact, are hard to
interpret.
Covariation of the Cause and Effect
if X then Y
if not X then not Y
• If you observe that whenever X is present, Y
is also present, and whenever X is absent, Y
is also absent, then there is covariation between the
two.
• For example:
Better website, more visitors
Bad website, fewer visitors
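One way to put a number on covariation between two present/absent variables is the phi coefficient of a 2x2 table. The sketch below is only an illustration: the observations are invented, and phi_coefficient is a hypothetical helper written for this example. Phi is 1 when Y is present exactly when X is present, and 0 when there is no covariation.

```python
import math

# Hypothetical observations, invented for illustration:
# each pair is (site_is_good, has_many_visitors).
observations = [
    (True, True), (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, False), (False, True),
]

def phi_coefficient(pairs):
    """Phi coefficient for paired boolean observations (x, y)."""
    both = sum(1 for x, y in pairs if x and y)
    x_only = sum(1 for x, y in pairs if x and not y)
    y_only = sum(1 for x, y in pairs if not x and y)
    neither = sum(1 for x, y in pairs if not x and not y)
    numerator = both * neither - x_only * y_only
    denominator = math.sqrt(
        (both + x_only) * (y_only + neither) * (both + y_only) * (x_only + neither)
    )
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return numerator / denominator

print(phi_coefficient(observations))  # prints: 0.5
```

A high value only shows that X and Y vary together; it says nothing about which came first or whether something else drives both.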
No Plausible Alternative Explanations
• Covariation does not imply causation.
• Rule out alternative explanations. (a third
variable that might be causing the outcome)
• Referred to as the "third variable" or
"missing variable" problem. Also at the heart
of establishing Internal validity.
• For example: a better site goes with more visitors, but a
third variable (better company, more marketing) could explain both.
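A common way to probe a suspected third variable is to compare the groups within each level of that variable. The sketch below uses invented counts and hypothetical helper names (visitor_rate, site_gap); it assumes marketing level is the third variable. In this made-up data, better sites appear to attract more visitors overall, but the gap disappears once sites are compared within the same marketing level.

```python
# Hypothetical counts, invented for illustration:
# (marketing level, site quality) -> (sites with many visitors, total sites)
counts = {
    ("heavy", "better"): (6, 8),
    ("heavy", "worse"): (3, 4),
    ("light", "better"): (1, 4),
    ("light", "worse"): (2, 8),
}

def visitor_rate(counts, site, marketing=None):
    """Share of sites with many visitors, optionally within one marketing level."""
    hits = total = 0
    for (level, quality), (with_visitors, n_sites) in counts.items():
        if quality == site and (marketing is None or level == marketing):
            hits += with_visitors
            total += n_sites
    return hits / total

def site_gap(counts, marketing=None):
    """Visitor rate for better sites minus visitor rate for worse sites."""
    return (visitor_rate(counts, "better", marketing)
            - visitor_rate(counts, "worse", marketing))

print(round(site_gap(counts), 3))   # overall gap favors better sites
print(site_gap(counts, "heavy"))    # gap among heavily marketed sites
print(site_gap(counts, "light"))    # gap among lightly marketed sites
```

Here the overall gap is 7/12 minus 5/12, while both within-level gaps are zero, so marketing alone accounts for the apparent effect of site quality in this invented data set.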
Hypothetical Case Study:
Barnes and Noble site redesign
• Hired one of the famous “ient” web design
companies to redesign the site
• Purpose: make online shopping easy and
the site more attractive
• Paid a lot of money
• Did the site redesign work? Let's look at sales
figures
Hypothetical Data
Effect of Site Redesign on Online Sales
[Figure: chart of Sales (axis 0 to 9) for Old Site and New Site]
• Sales increased!
Problems with Deducing that
site redesign worked
• Temporal relationship
• Covariation
• Alternative Explanations:
Reliability
• Replicability
• Ensure that random confounding factors
are not playing a role
External Validity
• Related to generalizing. Degree to which the
conclusions in your study would hold for other
persons in other places and at other times.
• Sampling Model: Identify the population you
would like to generalize to. Then draw a
random sample from that population. You can
generalize back to it.
  - Problems: Time and place constraints
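The sampling model can be sketched as drawing a simple random sample from a list that stands in for the population. The population list, the sample size, and the function name draw_random_sample are all assumptions made for this illustration; in practice the hard part is obtaining a complete list of the population in the first place.

```python
import random

def draw_random_sample(population, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    Every member of the population has the same chance of being selected.
    A seed can be passed to make the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(population, sample_size)

# Hypothetical sampling frame of 200 users, invented for illustration.
population = [f"user{i:03d}" for i in range(1, 201)]
sample = draw_random_sample(population, 20, seed=10)
print(len(sample), len(set(sample)))  # prints: 20 20
```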
Threats to External Validity
• People: Results of your study could be due to the unusual
type of people who were in the study.
• Places: Limited to experimental context.
  - For example: if you conducted the study in an office
atmosphere.
• Time: Limited to the time period when you did your
experiment.
  - For example: a study on web interfaces in 1997
• Objects: In HCI your results might
extend only to similar objects / interfaces.
What is validity
• Validity refers to the operationalization or
measurement of concepts.
• Any time you translate a concept or
construct into a functioning and
operating reality (the
operationalization), you need to be
concerned about how well you did the
translation.
Internal Validity
Concerns inferences regarding cause-effect or causal
relationships.
•Only relevant in studies that try to establish a
causal relationship.
•Not relevant in most observational or
descriptive studies.
Important for studies that assess the effects of
certain changes to websites, or to products.
Are there alternative
explanations?
• Example: Amazon.com increased the number of tabs on
its home page.
• Assume that a study showed
increase in the number of tabs = increase in ease of
navigation.
Alternative explanations:
• At same time Amazon.com launched a marketing
campaign.
• The key question in internal validity is whether observed
changes can be attributed to your intervention (i.e., the
cause) and not to other possible causes (sometimes
described as "alternative explanations" for the
outcome).
Construct Validity
• Degree to which you can generalize back to the
theoretical construct you started from.
• Construct validity can be thought of as a
"labeling" issue.
• Real Objective: to make the site easier to navigate
  - Operationalization: give users more options on each
page by increasing the number of links.
  - Is increasing the number of links really giving users more
options?
Kinds of construct validity
• Face Validity
• Content Validity
Face Validity
• Does the operationalization of the concept seem
like a good translation "on its face", or
superficially speaking?
• The weakest way to try to demonstrate
construct validity.
• For example: you can take a measure of
math ability, read through the questions, and
decide that it seems like a good
measure of math ability (i.e., the label "math
ability" seems appropriate for this measure).
Content Validity
• Check the operationalization against the
relevant content domain for the construct.
• For example: you are trying to measure
usability. What are the subdomains of usability?
  - Efficiency
  - Attractiveness
  - Control
• Check your measure of usability against these
domains
Research Designs
Single Group Experimental Designs
• Repeated measurements are taken across time for one
group.
• Does not lend itself to clear statistical analysis and
hypothesis testing
• Cannot control for order effects, difficult to generalize
• Can provide us with important information that we
might not get from experiments
Randomized Group
Experimental Designs
• This is what you want to aim for
• You have an experimental and control
group. Randomly assign subjects to
either group
• All sorts of causal inferences possible
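Random assignment can be sketched as shuffling the subject list and splitting it in half. The subject IDs and the function name randomly_assign are hypothetical, and the equal split is an assumption of this sketch rather than a requirement of the design.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into two groups.

    Each subject is equally likely to land in either group, so the groups
    should not differ systematically before the treatment is applied.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}

# Hypothetical subject IDs, invented for illustration.
subjects = [f"S{i:02d}" for i in range(1, 21)]
groups = randomly_assign(subjects, seed=10)
print(len(groups["experimental"]), len(groups["control"]))  # prints: 10 10
```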
Quasi Experimental Design
• When you cannot control who gets assigned to
which group
• For example: in an ex post facto study, IV has
already occurred, you want to draw inferences.
• For example: You want to compare users of
Palm Pilot and Handspring. You have no control
over who goes to which group
Comparing Quasi-Experimental
and Experimental designs
• The experimental design is as sound in both
cases
• It is harder to make causal inferences in
quasi-experimental designs, since the groups were
not equal to start with
• You can pretest the groups and use analysis
of covariance to adjust for initial differences
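The adjustment step of an analysis of covariance can be sketched as follows. This is a simplified illustration with invented scores: it computes only the covariate-adjusted group means, using the pooled within-group slope of posttest on pretest, and it assumes that slope is the same in both groups. It does not perform the significance test, for which a statistics package would normally be used. The names adjusted_means, group_a, and group_b are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def adjusted_means(groups):
    """Covariate-adjusted posttest means.

    groups maps a group name to a list of (pretest, posttest) pairs.
    Each group's posttest mean is shifted to what it would be if the
    group's pretest mean equaled the grand pretest mean.
    """
    grand_pre = mean([pre for rows in groups.values() for pre, _ in rows])
    sxy = sxx = 0.0
    for rows in groups.values():
        pre_mean = mean([pre for pre, _ in rows])
        post_mean = mean([post for _, post in rows])
        sxy += sum((pre - pre_mean) * (post - post_mean) for pre, post in rows)
        sxx += sum((pre - pre_mean) ** 2 for pre, _ in rows)
    slope = sxy / sxx  # pooled within-group regression slope
    return {
        name: mean([post for _, post in rows])
        - slope * (mean([pre for pre, _ in rows]) - grand_pre)
        for name, rows in groups.items()
    }

# Hypothetical (pretest, posttest) scores, invented for illustration.
# group_b starts higher on the pretest, so its raw posttest mean (9) beats group_a (7).
groups = {
    "group_a": [(2, 5), (4, 7), (6, 9)],
    "group_b": [(6, 7), (8, 9), (10, 11)],
}
adjusted = adjusted_means(groups)
print(adjusted)  # prints: {'group_a': 9.0, 'group_b': 7.0}
```

In this invented data the raw posttest means favor group_b, but after adjusting for the pretest difference the ordering reverses, which is the kind of correction the pretest makes possible.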