Eisenbeiss (2013): Introduction to Experimental Design. JNU, Delhi

Introduction to Experiment Design
Sonja Eisenbeiss, University of Essex
seisen@essex.ac.uk
February 2013
Research Topics and Research Questions


A specification of a research topic/question should include information about:
o language(s) of the speakers involved
o populations (adult native speakers, L1 learners, L2 learners, people with a
specific language disorder,…) and criteria for assigning people to these
populations (proficiency levels, clinical classifications for aphasic patients,…)
o constructions
o a research question and predictions, based on a review of the literature
The chosen populations must be easily available and the research questions should
make predictions that can be tested with available methods.
An Example:
• Language: English
• Population: adult native speakers
• Constructions: s-possessives and prepositional of-possessives
the lady's leg vs. the leg of the lady; the table's leg vs. the leg of the table
• Research Question: Does animacy affect the choice of possessive construction?
• Hypotheses: Phrases with animate referents are more easily encoded than phrases
with inanimate referents. Hence, phrases with animate referents tend to be
encoded before phrases with inanimate referents. For possessive constructions,
this means that speakers should prefer to realize an animate possessor (PR) phrase
before an inanimate possessum (PM), i.e. in a PR-initial s-possessive (the lady's
leg). Inanimate PRs should not show such a preference (the leg of the table).
• Literature: Rosenbach (2008) in Lingua and references cited there (use Google
Scholar to find more recent publications referring to this article). Use
http://linguistlist.org/, Glottopedia
(http://www.glottopedia.de/index.php/Main_Page), http://www.wikipedia.org/ and
http://academia.edu/ to find further references and to follow researchers, journals,
topics, etc.
Experiments vs. Spontaneous Data vs. Elicitation
Criteria for Method Selection: Reliability and Validity
• Reliability: the consistency of a measurement/test
  o Test-retest reliability
  o Inter-rater reliability
  → Strict control of variables, test situation, etc.
• Validity: the degree to which the test measures what it intends to measure (e.g.
  children's knowledge of genitives or possessives vs. their ability to focus)
  o Ecological validity: capturing the use of an ability as closely to real-life
    use as possible
  → Natural test situation with minimal extra demands
Types of Methods
• spontaneous speech / naturalistic data
  recordings in natural situations
• "classic" elicitation:
  asking individual participants or small groups of participants to provide information
  about their language:
  o Translation from a lingua franca into the target language
  o Back-translating utterances provided by other speakers
  o Manipulating utterances (e.g. turning them into questions, negating them)
  o Grammaticality/acceptability judgements
• stimulus-based elicitation
  using pictures, videos, games, etc. to encourage speakers to produce language in
  semi-structured situations
  o interactional types: director/matcher (or "confederate description") vs.
    speaker/listener vs. co-player
  o target: broad-spectrum vs. form-focused vs. meaning-focused
• experiments
  controlled manipulation of one or more variables (e.g. syntactic construction: active
  vs. passive sentence)
  o in a fixed setting and
  o with a fixed procedure
Experimental Methods
• What is measured?
o Experiments with behavioural measures measure the behaviour of
participants, e.g. their grammaticality judgements or the correctness rates
for their use of morphological forms.
o Neuro-imaging experiments measure the neural activity during the
experiment (e.g. brain waves in ERPs, or blood flow to regions of the brain
with increased activity in PET)
• Which linguistic abilities / modalities are tested?
o production
o comprehension
o grammaticality judgments
o imitation
• Is the measurement time-sensitive?
o Off-line experiments (e.g. acceptability judgment task with a questionnaire)
do not provide any information about the time-course of linguistic
processing.
o On-line measurements provide information about the time-course of
linguistic processing (e.g. neuroimaging experiments, reaction-time
experiments).
o NOTE: some reaction-time experiments only measure how long it takes to
complete a process without providing information about the time-course of
the individual steps in this process. For instance, in speeded grammaticality
judgment experiments, participants see a sentence on the screen and have to
decide as quickly as possible whether the sentence is grammatical or
ungrammatical. The experimenter studies the correctness rates for the
responses. In addition, reaction times for the responses are measured, i.e. the
experimenter measures how long it takes participants to press the yes-button
for grammatical sentences or the no-button for ungrammatical sentences. This
reaction time reflects the entire process of decision making, not its
individual steps. In contrast, in self-paced reading experiments, participants
read sentences or longer texts on a PC screen and have to press a button
whenever they have finished reading the short piece of text on the screen
(typically just one or a few words). In this way, reading times for these
individual text segments are obtained, and not just overall reading times for
the entire text.
• Which types of stimuli are used?
o Linguistic stimuli (auditory vs. visual presentation of words, sentences,
etc.)
o Non-linguistic stimuli:
  - Static (pictures, drawings) vs. dynamic (e.g. videos, animations, live
    action)
  - Abstract depictions (e.g. drawings, cartoon-style animations) vs.
    naturalistic depictions (e.g. photos, videos, etc.) vs. real objects/people
Variables
Variables are properties of participants, situations, materials, etc. whose values can vary.
Independent Variable (IV) vs. Dependent Variable (DV)
IV: Sometimes also called Experimental Variable (EV).
This is the variable whose values are manipulated by the researcher.
The values of this variable are set up independently by the researcher, i.e. before
the experiment begins.
An IV can have several levels. An experiment can have several IVs.
Conditions result from the combination of the levels of the IVs.
DV: This is the variable that measures the effects of the researcher's manipulation.
The values of this variable are seen as dependent on the values of the IV.
Example 1
Does PR animacy affect native speakers' choice of English possessive constructions?
IV: animacy of Possessor (PR); two levels (animate vs. inanimate)
DV: percentage of s-choice
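To make the DV of Example 1 concrete, the following minimal sketch computes the percentage of s-choice for each level of the IV. The response data and the function name `percent_s_choice` are invented for illustration and are not taken from any actual study.

```python
from collections import defaultdict

# Hypothetical responses: (PR animacy, construction chosen by the participant).
responses = [
    ("animate", "s"), ("animate", "s"), ("animate", "of"), ("animate", "s"),
    ("inanimate", "of"), ("inanimate", "of"), ("inanimate", "s"), ("inanimate", "of"),
]

def percent_s_choice(responses):
    """Return the percentage of s-possessives for each level of the IV."""
    totals = defaultdict(int)
    s_counts = defaultdict(int)
    for level, construction in responses:
        totals[level] += 1
        if construction == "s":
            s_counts[level] += 1
    return {level: 100 * s_counts[level] / totals[level] for level in totals}

dv = percent_s_choice(responses)
print(dv)
```

With these invented data, the animate condition yields a higher percentage of s-choice than the inanimate condition, which is the pattern the hypothesis predicts.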
Example 2
Does PR animacy affect native speakers' choice of English possessive constructions, and
are second language learners influenced by construction choice in their first language?
IV: animacy of Possessor (PR); two levels (animate vs. inanimate)
IV: first language of participant; three levels (English vs. Japanese, which has a
PR-before-PM (PR<PM) construction only, vs. German, which has both s-possessives and
prepositional possessives, but s only for proper names)
DV: percentage of s-choice
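Since conditions result from combining the levels of the IVs, the conditions of Example 2 can be enumerated mechanically. The sketch below is only an illustration; the dictionary keys and level labels are shorthand chosen here.

```python
from itertools import product

# The two IVs of Example 2 and their levels (labels are illustrative shorthand).
ivs = {
    "PR animacy": ["animate", "inanimate"],
    "first language": ["English", "Japanese", "German"],
}

# Every combination of one level per IV is one condition.
conditions = list(product(*ivs.values()))
for condition in conditions:
    print(condition)
```

Two levels of animacy crossed with three first languages give six conditions in total.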
Exercise
Read the following description by Barbora Skarabela and Ludovica Serratrice
(http://www.bu.edu/bucld/proceedings/supplement/vol33/ ) and discuss which variables
should be included in the study:
"The aim of this study was, therefore, to examine whether 4-year-olds and adults are
sensitive to the animacy constraints in the possessive noun phrase. More specifically, we
focused on the following four questions: 1. Do 4-year-olds and adults show preference
for encoding human possessors with the s-genitive than the of-genitive in the baseline
(without exposure to examples in the immediately preceding input)? 2. Do 4-year-olds
and adults respond to syntactic priming of possessive noun phrases with a human
possessor and human possessee? 3. If a priming effect is found, is it of comparable
strength for both structures (the preferred s-genitive and dispreferred of-genitive) and
both populations? 4. If priming effects are found, is the use of possessive constructions
affected by exposure to target forms in the course of the experiment (i.e., do the
structures persist in non-primed contexts in the post-test)? We used a syntactic priming
paradigm to explore our questions. Syntactic priming refers to a procedure in which
participants are exposed to an exemplar from one of two ‘equivalent’ structures in the
language. For example, they are exposed to an example the mother of the girl (vs. the
girl’s mother). Those who have heard such an example (i.e., the 'prime') tend to use the
same syntactic pattern (e.g., the sister of the doctor) in subsequent elicited production
(i.e., the 'target'). The procedure is claimed to provide information about the mental
representation of language, as priming happens only if there is a recognition of a
relationship between the prime and target. The method has been successfully used with
children (Brooks and Tomasello, 1999; Savage, Lieven, Theakston, and Tomasello, 2003;
Huttenlocher, Vasilyeva, and Shimpi, 2004). In this study, children participated in a
simple picture-description task, which provides an ecologically valid set-up for children
but which is also equally appropriate for adults. Importantly, however, the procedure
allows manipulation of subtle non-structural constraints, such as animacy or discourse
status (also see Serratrice, 2008)."
Confounding Variables
When one designs an experiment, one is only interested in the effects of the IVs on the
DV. However, sometimes, other variables unintentionally vary alongside the manipulated
variable (i.e. the IV). This results in confounding.
Example
Some researchers argue that the choice of s- vs. of-possessives is influenced by the type
of relationship between the possessor and the possessum (see e.g. Rosenbach 2008 in
Lingua): s- is preferred for prototypical possessive relations (e.g. ownership, kinship,
part-of-relations) whereas abstract relations tend to be realized by of-constructions (e.g.
the shape of the house). Animacy and the type of relation are closely linked. For instance,
PRs in kinship and ownership relations are typically animate. Thus, if one wants to
explore effects of PR-animacy, one has to keep the type of relationship constant (e.g. use
items with part-of relationships only: the leg of the table/lady vs. the table's/lady's leg).
Moreover, PR-length and PM-length need to be controlled for because they can affect
construction choice as well (try this out for yourselves).
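One simple way to check that PR-length and PM-length do not vary along with animacy is to compare phrase lengths across conditions before running the experiment. The sketch below assumes length is measured in words (syllables or characters would be equally possible) and uses invented part-of items; the function names are hypothetical.

```python
# Hypothetical part-of items; only PR animacy should differ between conditions.
items = [
    {"condition": "animate", "pr": "the lady", "pm": "the leg"},
    {"condition": "inanimate", "pr": "the table", "pm": "the leg"},
    {"condition": "animate", "pr": "the boy", "pm": "the arm"},
    {"condition": "inanimate", "pr": "the chair", "pm": "the arm"},
]

def mean_length(items, condition, phrase):
    """Mean length in words of the PR or PM phrases in one condition."""
    lengths = [len(item[phrase].split())
               for item in items if item["condition"] == condition]
    return sum(lengths) / len(lengths)

def lengths_matched(items, phrase):
    """True if the mean phrase length is identical in both conditions."""
    return (mean_length(items, "animate", phrase)
            == mean_length(items, "inanimate", phrase))

matched = lengths_matched(items, "pr") and lengths_matched(items, "pm")
print("PR and PM lengths matched:", matched)
```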
Irrelevant Variables
Participants
The personal properties of individual participants (e.g. their general speed in responding
to questions or their ability to concentrate during an experiment) should be irrelevant for
the experiment. Researchers use large groups of participants so that the individual
differences, e.g. in reaction times, will be averaged out.
Procedure
The situation in which an experiment is carried out and the instructions given to
participants might influence the outcome of the experiment. Therefore, care must be
taken to keep the situation as similar as possible, e.g. by developing a standardized
experimental procedure and written instructions that are given to each participant before
the experiment.
Randomisation
As participants become more tired in an experiment, responses to stimulus items which
are presented at the end of an experiment might be slower or less accurate. Thus, one
cannot simply present all phrases with animate PRs first and then all phrases with
inanimate PRs. One has to make sure that both types of phrases are randomly distributed
across the experiment.
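A random presentation order can be produced with a seeded shuffle, so that the order is random but can be reproduced later. This is a minimal sketch; the item labels, the function name `randomised_order` and the seed value are placeholders.

```python
import random

# Placeholder labels standing in for the actual phrases.
animate = [f"animate item {i}" for i in range(1, 5)]
inanimate = [f"inanimate item {i}" for i in range(1, 5)]

def randomised_order(items, seed):
    """Return a shuffled copy of the items; the seed makes the order reproducible."""
    rng = random.Random(seed)
    order = list(items)
    rng.shuffle(order)
    return order

order = randomised_order(animate + inanimate, seed=2013)
for position, item in enumerate(order, start=1):
    print(position, item)
```

Using a different seed for each participant would give each participant a different random order.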
Constants
In contrast to variables, constants do not vary. They are fixed, i.e. remain the same. You
can turn an IV or a confounding variable into a constant, e.g. by keeping word or phrase
length constant across the different conditions of the experiment.
Hypotheses
Researchers make predictions about the effects of the IV(s) on the DV. These predictions
are formulated as experimental hypotheses. Hypotheses must be testable, i.e. it must be
possible for the predicted effects to occur or not to occur.
The experimental hypothesis (or: alternative hypothesis) predicts that an effect occurs.
The null hypothesis predicts that this effect does not occur.
There are two types of experimental hypotheses:
A non-directional hypothesis predicts that there is an effect, but does not make any
statement about the direction of the effect (e.g. improvement vs. deterioration).
A directional hypothesis predicts the direction of an effect.
In an experiment, the researcher aims to reject the null hypothesis. If it can be rejected,
the experimental hypothesis is supported.
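The handout does not prescribe a statistical test. Purely as an illustration of how directional and non-directional hypotheses differ, the sketch below uses an exact binomial test. It assumes that each of n participants makes a single s/of choice and that the null hypothesis is a 50% chance of choosing s; the counts (15 s-choices out of 20) are invented.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k s-choices in n trials under the null hypothesis."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def p_directional(k, n, p=0.5):
    """Directional test: probability of k or MORE s-choices under the null."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def p_non_directional(k, n, p=0.5):
    """Non-directional test: total probability of all outcomes that are
    at most as likely as the observed one (deviations in either direction)."""
    observed = binom_pmf(k, n, p)
    return sum(binom_pmf(i, n, p) for i in range(n + 1)
               if binom_pmf(i, n, p) <= observed * (1 + 1e-9))

# Invented data: 15 of 20 participants chose the s-possessive.
print(round(p_directional(15, 20), 4))
print(round(p_non_directional(15, 20), 4))
```

The directional test only counts deviations in the predicted direction, so its p-value here is half that of the non-directional test.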
Exercise
Develop a research question that is related to genitives/possessives and discuss potential
IVs, DVs, confounding and irrelevant variables, hypotheses.
Design Types
• correlational design
• repeated measures design
• independent groups design
• mixed design
• Latin Square design
Correlational Designs
• Two different measures are obtained for each participant and one tries to determine
  whether there is a relationship between the measurements.
• prediction: There is a positive correlation between the measurements (the higher the
  score for variable X, the higher the score for variable Y) OR there is a negative
  correlation between the measurements (the higher X, the lower Y).
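A common way to quantify such a relationship is Pearson's correlation coefficient r, which ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation). The handout does not name a specific coefficient, so this choice is an assumption, and the scores below are invented.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented scores for five participants on two measures X and Y.
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]
r = pearson_r(x_scores, y_scores)
print("positive correlation" if r > 0 else "negative or no correlation")
```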
Repeated Measures Designs
• other names: within group design, same subject design, related design
• The same participants are measured several times.
• prediction: There is a difference between the measurements.
Independent Groups Designs
• other names: between-group design, different subject design, unrelated design
• Two or more groups of participants are measured and the measurements of the groups
  are compared.
• Groups can differ with respect to one variable, e.g. age, proficiency level, L1, etc.
  Then there is one IV.
• Groups can also differ with respect to several of these variables. Then, there is more
  than one IV.
• prediction: There is a difference between the measurements.
Mixed Designs
An experiment can involve both repeated measures and independent groups (mixed
design).
Latin Square Designs
In order to minimize effects of item-specific properties, items in different conditions
should be as similar to one another as possible. On the other hand, one has to avoid
repetitions of materials. For this purpose, so-called Latin-Square Designs were
developed. In such designs, sets of minimally contrasting stimuli are used and
counterbalanced across lists.
A Latin-Square Design Example
One might want to vary PR-definiteness as it has been shown to affect construction
choice (Rosenbach 2008 in Lingua). For this, it would be ideal to create minimal pairs of
items that only differ with respect to the crucial IV. For instance, one could compare
construction choice for items with definite PRs (the table's leg vs. the leg of the table)
with construction choice for items with indefinite PRs (a table's leg vs. the leg of a table).
If one presented both parts of such minimal pairs to each participant, this might highlight
the difference with respect to definiteness. Hence, participants would be alerted to the
purpose of the experiment and they might develop strategies that could lead to unreliable
results.
Thus, researchers would like to use sets of minimally contrasting stimuli (the same phrase
pairs, with definite vs. indefinite PR). However, one would try to avoid presenting every
variant from a set to the same participant. At the same time, each participant should be
presented with one variant of each minimal pair. Both requirements can be met in a
Latin-Square Design, where sets of minimally contrasting stimuli are used and
counterbalanced across lists. Each of these lists is given to a subset of the participants
from each of the experimental groups. The number of lists is determined by the number of
contrasting conditions. I.e., for this example, with two conditions (definite vs. indefinite
PR), one would have two lists of stimulus items:
Table 1: A Latin Square Design for Example 1 (PR definiteness)

Item            LIST I          LIST II
s/of-pair 1     definite PR     indefinite PR
s/of-pair 2     indefinite PR   definite PR
s/of-pair 3     definite PR     indefinite PR
s/of-pair 4     indefinite PR   definite PR
s/of-pair 5     definite PR     indefinite PR
s/of-pair 6     indefinite PR   definite PR
s/of-pair 7     definite PR     indefinite PR
s/of-pair 8     indefinite PR   definite PR
s/of-pair 9     definite PR     indefinite PR
s/of-pair 10    indefinite PR   definite PR
s/of-pair 11    definite PR     indefinite PR
s/of-pair 12    indefinite PR   definite PR
s/of-pair 13    definite PR     indefinite PR
s/of-pair 14    indefinite PR   definite PR
s/of-pair 15    definite PR     indefinite PR
s/of-pair 16    indefinite PR   definite PR
s/of-pair 17    definite PR     indefinite PR
s/of-pair 18    indefinite PR   definite PR
s/of-pair 19    definite PR     indefinite PR
s/of-pair 20    indefinite PR   definite PR
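The counterbalancing pattern in Table 1 can be generated by rotating the conditions across items and lists. The following is a minimal sketch; the function name `latin_square_lists` is made up here, and the rotation rule (item index plus list index, modulo the number of conditions) is one standard way to build such lists rather than a procedure stated in the handout.

```python
def latin_square_lists(n_items, conditions):
    """Build one stimulus list per condition. In list j, item i appears in
    condition (i + j) mod k, so every item occurs once per list and in a
    different condition in each list."""
    k = len(conditions)
    lists = []
    for list_index in range(k):
        assignment = [conditions[(item_index + list_index) % k]
                      for item_index in range(n_items)]
        lists.append(assignment)
    return lists

conditions = ["definite PR", "indefinite PR"]
list_1, list_2 = latin_square_lists(20, conditions)

for number, (cond_1, cond_2) in enumerate(zip(list_1, list_2), start=1):
    print(f"s/of-pair {number}: LIST I = {cond_1}, LIST II = {cond_2}")
```

With two conditions and 20 pairs this reproduces the alternation shown in Table 1. With three conditions the same function would yield three lists.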
Note: In a Latin Square Design, it is crucial that you are working with pairs of stimuli
that are only minimally different. This is the case if you only vary definiteness, but keep
all the lexical elements the same. However, it is not the case for an animacy variation
(PM-leg: PR-table vs. PR-lady). Here, you would not just vary animacy, but also the
lexical element (table vs. lady). Hence, you might not want to use a Latin Square Design in
this case.
Stimuli vs. Distractors/Fillers
The stimulus items are those items which represent the levels of the IV that the researcher
is manipulating. They are the core of the experiment. Distractor/filler items are included
in the experiment to distract participants from the properties of the stimulus items that
the researcher wants to investigate. This is necessary to prevent participants from
developing strategies or response patterns. This is particularly crucial when the same
participants are tested several times and know that they will be tested on similar items
again. Distractors involve different constructions, morphological forms, etc. than the
stimulus items.
Exercise
In English, one can encode a possessive relation with the marker –s (my neighbour's car)
or with the preposition of (the car of my neighbour). Some linguists have suggested that
the animacy of the possessor influences the choice of –s vs. of.
• Suggest a method and a design for an experiment investigating this question
(variables, levels, potential confounding factors, etc.)
• Discuss the appropriateness of the following stimuli:
o the book of John vs. John's book
o James' car vs. the car of James
o lilies of the valley vs. the valley's lilies
o the dog's hair vs. the hair of the dog
o this year's best wine vs. the best wine of this year
o Bloomingdale's lift vs. the lift of Bloomingdale
• Discuss potential distractors for a construction choice experiment (s vs. of). Consider:
o Which types of construction alternatives can you think of?
o Which factors determine construction choice for these alternatives?
Randomisation
As participants become more tired in an experiment, responses to stimulus items which
are presented at the end of an experiment might be slower or less accurate. Thus, one
has to ensure that the order in which stimulus items are presented is random. E.g., when
one compares reaction times for the recognition of irregularly inflected word forms (e.g.
went) with reaction times for regularly inflected word forms (e.g. walked), one cannot
simply present all regular word forms first and then all irregular word forms. One has to
make sure that both types of word forms are randomly distributed across the experiment,
i.e. the items from the different conditions (corresponding to the different levels of the
IV) have to be equally distributed across different parts of the experiment.
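One way to guarantee that conditions are equally distributed across different parts of the experiment is blocked randomisation: the experiment is divided into blocks, each block receives the same number of items from every condition, and the order within each block is shuffled. The sketch below is one possible implementation, not the handout's own procedure. It assumes that the number of items per condition is divisible by the number of blocks; the word lists and the function name `blocked_order` are illustrative.

```python
import random

def blocked_order(items_by_condition, n_blocks, seed):
    """Split the items into n_blocks blocks with equal numbers of items per
    condition, then shuffle the order within each block."""
    rng = random.Random(seed)
    # Randomise which items of a condition end up in which block.
    pools = {cond: rng.sample(items, len(items))
             for cond, items in items_by_condition.items()}
    blocks = []
    for b in range(n_blocks):
        block = []
        for cond, pool in pools.items():
            per_block = len(pool) // n_blocks
            block.extend((cond, item)
                         for item in pool[b * per_block:(b + 1) * per_block])
        rng.shuffle(block)  # random order inside the block
        blocks.append(block)
    return blocks

items_by_condition = {
    "regular": ["walked", "jumped", "played", "looked"],
    "irregular": ["went", "came", "sang", "took"],
}
blocks = blocked_order(items_by_condition, n_blocks=2, seed=1)
for number, block in enumerate(blocks, start=1):
    print("Block", number, block)
```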
Exercises
Question 1
Some researchers have argued that the of-possessive is preferred for possessives with
longer possessor phrases (the car of my old neighbour with the broken leg) while
s-possessives are preferred for possessives with shorter possessor phrases (my neighbour's
car). Describe the basic design of an experiment that you could use to test this claim.
Discuss:
• Method:
o Ability/Modality tested
o Time-sensitivity
o Stimuli provided
• Independent Variable(s)
• Dependent Variable
• Potentially confounding variables and how they can be controlled
• Constants
Question 2
Some researchers have argued that the of-possessive is preferred for possessives with
longer possessor phrases (the car of my old neighbour with the broken leg) while
s-possessives are preferred for possessives with shorter possessor phrases (my neighbour's
car). Discuss the advantages and disadvantages of using (i) a controlled experiment, (ii)
spontaneous speech recordings or (iii) elicitation games. Consider:
• the age and abilities of participants
• the frequency of the construction under study
• the relative reliability and validity of the three types of methods