Introduction to Experiment Design
Sonja Eisenbeiss, University of Essex
seisen@essex.ac.uk
February 2013

Research Topics and Research Questions

A specification of a research topic/question should include information about:
o language(s) of the speakers involved
o populations (adult native speakers, L1 learners, L2 learners, people with a specific language disorder, …) and criteria for assigning people to these populations (proficiency levels, clinical classifications for aphasic patients, …)
o constructions
o a research question and predictions, based on a review of the literature

The chosen populations must be easily available and the research questions should make predictions that can be tested with available methods.

An Example:
Language: English
Population: adult native speakers
Constructions: s-possessives and prepositional of-possessives
   the lady's leg vs. the leg of the lady; the table's leg vs. the leg of the table
Research Question: Does animacy affect the choice of possessive construction?
Hypotheses: Phrases with animate referents are more easily encoded than phrases with inanimate referents. Hence, phrases with animate referents tend to be encoded before phrases with inanimate referents. For possessive constructions, this means that speakers should prefer to realize an animate possessor (PR) phrase before an inanimate possessum (PM), i.e. in a PR-initial s-possessive (the lady's leg). Inanimate PRs should not show such a preference (the leg of the table).
Literature: Rosenbach (2008) in Lingua and references cited there (use Google Scholar to find more recent publications referring to this article). Use http://linguistlist.org/, Glottopedia (http://www.glottopedia.de/index.php/Main_Page), http://www.wikipedia.org/ and http://academia.edu/ to find further references and to follow researchers, journals, topics, etc.

Experiments vs. Spontaneous Data vs. Elicitation

Criteria for Method Selection: Reliability and Validity
• Reliability: the consistency of a measurement/test
   • Test-retest reliability
   • Inter-rater reliability
   Reliability is supported by strict control of variables, test situation, etc.
• Validity: the degree to which the test measures what it is intended to measure (e.g. children’s knowledge of genitives or possessives vs. their ability to focus)
• Ecological validity: capturing the use of an ability as closely to real-life use as possible
   Ecological validity is supported by a natural test situation with minimal extra demands.

Types of Methods
• spontaneous speech / naturalistic data: recordings in natural situations
• "classic" elicitation: asking individual participants or small groups of participants to provide information about their language:
   • Translation from a lingua franca into the target language
   • Back-translating utterances provided by other speakers
   • Manipulating utterances (e.g. turning them into questions, negating them)
   • Grammaticality/acceptability judgements
• stimulus-based elicitation: using pictures, videos, games, etc. to encourage speakers to produce language in semi-structured situations
   • interactional types: director/matcher (or “confederate description”) vs. speaker/listener vs. co-player
   • target: broad-spectrum vs. form-focused vs. meaning-focused
• experiments: controlled manipulation of one or more variables (e.g. syntactic construction: active vs. passive sentence)
   • in a fixed setting and
   • with a fixed procedure

Experimental Methods

What is measured?
o Experiments with behavioural measures measure the behaviour of participants, e.g. their grammaticality judgements or the correctness rates for their use of morphological forms.
o Neuro-imaging experiments measure the neural activity during the experiment (e.g. brain waves in ERPs, or blood flow to regions of the brain with increased activity, as in PET).

Which linguistic ability / modality is tested?
o production
o comprehension
o grammaticality judgments
o imitation

Is the measurement time-sensitive?
o Off-line experiments (e.g. an acceptability judgment task with a questionnaire) do not provide any information about the time-course of linguistic processing.
o On-line measurements provide information about the time-course of linguistic processing (e.g. neuroimaging experiments, reaction-time experiments).
o NOTE: some reaction-time experiments only measure how long it takes to complete a process, without providing information about the time-course of the individual steps in this process. For instance, in speeded grammaticality judgment experiments, participants see a sentence on the screen and have to decide as quickly as possible whether the sentence is grammatical or ungrammatical. The experimenter studies the correctness rates for the responses. In addition, reaction times for the responses are measured, i.e. the experimenter measures how long it takes participants to press the yes-button for grammatical sentences or the no-button for ungrammatical sentences. This reaction time reflects the entire process of decision making, not its individual steps. In contrast, in self-paced reading experiments, participants read sentences or longer texts on a PC screen and have to press a button whenever they have finished reading the short piece of text on the screen (typically just one or a few words). In this way, reading times for these individual text segments are obtained, and not just overall reading times for the entire text.

Which types of stimuli are used?
o Linguistic stimuli (auditory vs. visual presentation of words, sentences, etc.)
o Non-linguistic stimuli:
   Static (pictures, drawings) vs. dynamic (e.g. videos, animations, live action)
   Abstract depictions (e.g. drawings, cartoon-style animations) vs. naturalistic depictions (e.g. photos, videos, etc.) vs. real objects/people

Variables

Variables are properties of participants, situations, materials, ...
whose value can vary.

Independent Variable (IV) vs. Dependent Variable (DV)

IV:
o Sometimes also called Experimental Variable (EV)
o This is the variable whose values are manipulated by the researcher.
o The values of this variable are set up independently by the researcher, i.e. before the experiment begins.
o An IV can have several levels.
o An experiment can have several IVs.
o Conditions result from the combination of the levels of the IVs.

DV:
o This is the variable that measures the effects of the researcher's manipulation.
o The values of this variable are seen as dependent on the values of the IV.

Example 1
Does PR animacy affect native speakers' choice of English possessive constructions?
IV: animacy of Possessor (PR); two levels (animate vs. inanimate)
DV: percentage of s-choice

Example 2
Does PR animacy affect native speakers' choice of English possessive constructions, and are second language learners influenced by construction choice in their first language?
IV: animacy of Possessor (PR); two levels (animate vs. inanimate)
IV: first language of participant; three levels (English vs. Japanese, with PR<PM construction only, vs. German, with s-possessives vs. prepositional possessives, but s only for proper names)
DV: percentage of s-choice

Exercise
Read the following description by Barbora Skarabela and Ludovica Serratrice (http://www.bu.edu/bucld/proceedings/supplement/vol33/ ) and discuss which variables should be included in the study:

"The aim of this study was, therefore, to examine whether 4-year-olds and adults are sensitive to the animacy constraints in the possessive noun phrase. More specifically, we focused on the following four questions:
1. Do 4-year-olds and adults show preference for encoding human possessors with the s-genitive than the of-genitive in the baseline (without exposure to examples in the immediately preceding input)?
2. Do 4-year-olds and adults respond to syntactic priming of possessive noun phrases with a human possessor and human possessee?
3. If a priming effect is found, is it of comparable strength for both structures (the preferred s-genitive and dispreferred of-genitive) and both populations?
4. If priming effects are found, is the use of possessive constructions affected by exposure to target forms in the course of the experiment (i.e., do the structures persist in non-primed contexts in the post-test)?
We used a syntactic priming paradigm to explore our questions. Syntactic priming refers to a procedure in which participants are exposed to an exemplar from one of two ‘equivalent’ structures in the language. For example, they are exposed to an example the mother of the girl (vs. the girl’s mother). Those who have heard such an example (i.e., the 'prime') tend to use the same syntactic pattern (e.g., the sister of the doctor) in subsequent elicited production (i.e., the 'target'). The procedure is claimed to provide information about the mental representation of language, as priming happens only if there is a recognition of a relationship between the prime and target. The method has been successfully used with children (Brooks and Tomasello, 1999; Savage, Lieven, Theakston, and Tomasello, 2003; Huttenlocher, Vasilyeva, and Shimpi, 2004). In this study, children participated in a simple picture-description task, which provides an ecologically valid set-up for children but which is also equally appropriate for adults. Importantly, however, the procedure allows manipulation of subtle non-structural constraints, such as animacy or discourse status (also see Serratrice, 2008)."

Confounding Variables

When one designs an experiment, one is only interested in the effects of the IVs on the DV. However, sometimes other variables unintentionally vary alongside the manipulated variable (i.e. the IV). This results in confounding.

Example
Some researchers argue that the choice of s- vs. of-possessives is influenced by the type of relationship between the possessor and the possessum (see e.g.
Rosenbach 2008 in Lingua): s- is preferred for prototypical possessive relations (e.g. ownership, kinship, part-of relations) whereas abstract relations tend to be realized by of-constructions (e.g. the shape of the house). Animacy and the type of relation are closely linked. For instance, PRs in kinship and ownership relations are typically animate. Thus, if one wants to explore effects of PR-animacy, one has to keep the type of relationship constant (e.g. use items with part-of relationships only: the leg of the table/lady vs. the table's/lady's leg). Moreover, PR-length and PM-length need to be controlled for because they can affect construction choice as well (try this out for yourselves).

Irrelevant Variables

Participants
The personal properties of individual participants (e.g. their general speed in responding to questions or their ability to concentrate during an experiment) should be irrelevant for the experiment. Researchers use large groups of participants so that individual differences, e.g. in reaction times, will be averaged out.

Procedure
The situation in which an experiment is carried out and the instructions given to participants might influence the outcome of the experiment. Therefore, care must be taken to keep the situation as similar as possible for all participants, e.g. by developing a standardized experimental procedure and written instructions that are given to each participant before the experiment.

Randomisation
As participants become more tired in the course of an experiment, responses to stimulus items which are presented at the end of an experiment might be slower or less accurate. Thus, one cannot simply present all phrases with animate PRs first and then all phrases with inanimate PRs. One has to make sure that both types of phrases are randomly distributed across the experiment.

Constants

In contrast to variables, constants do not vary. They are fixed, i.e. they remain the same. You can turn an IV or a confounding variable into a constant, e.g.
by keeping word or phrase length constant across the different conditions of the experiment.

Hypotheses

Researchers make predictions about the effects of the IV(s) on the DV. These predictions are formulated as experimental hypotheses. Hypotheses must be testable, i.e. it must be possible for the predicted effects to occur or not to occur.
The experimental hypothesis (or: alternative hypothesis) predicts that an effect occurs. The null hypothesis predicts that this effect does not occur.
There are two types of experimental hypotheses:
o A non-directional hypothesis predicts that there is an effect, but does not make any statement about the direction of the effect (e.g. improvement vs. deterioration).
o A directional hypothesis predicts the direction of an effect.
In an experiment, the researcher wants to reject the null hypothesis. If the null hypothesis can be rejected, the experimental hypothesis is supported.

Exercise
Develop a research question that is related to genitives/possessives and discuss potential IVs, DVs, confounding and irrelevant variables, and hypotheses.

Design Types
o correlational design
o repeated measures design
o independent groups design
o mixed design
o Latin Square design

Correlational Designs
Two different measures are obtained for each participant and one tries to determine whether there is a relationship between the measurements.
prediction: There is a positive correlation between the measurements (the higher the score for variable X, the higher the score for variable Y) OR there is a negative correlation between the measurements (the higher X, the lower Y).

Repeated Measures Designs
other names: within-group design, same subject design, related design
The same participants are measured several times.
prediction: There is a difference between the measurements.

Independent Groups Designs
other names: between-group design, different subject design, unrelated design
Two groups of participants are measured and the measurements of the two groups are compared.
Groups can differ with respect to one variable, e.g. age, proficiency level, L1, ... Then there is one IV. Groups can also differ with respect to several of these variables. Then there is more than one IV.
prediction: There is a difference between the measurements.

Mixed Designs
An experiment can involve both repeated measures and independent groups (mixed design).

Latin Square Designs
In order to minimize effects of item-specific properties, items in different conditions should be as similar to one another as possible. On the other hand, one has to avoid repetitions of materials. For this purpose, so-called Latin Square Designs were developed. In such designs, sets of minimally contrasting stimuli are used and counterbalanced across lists.

A Latin Square Design Example
One might want to vary PR-definiteness as it has been shown to affect construction choice (Rosenbach 2008 in Lingua). For this, it would be ideal to create minimal pairs of items that only differ with respect to the crucial IV. For instance, one could compare construction choice for items with definite PRs (the table's leg vs. the leg of the table) with construction choice for items with indefinite PRs (a table's leg vs. the leg of a table). If one presented both members of such minimal pairs to each participant, this might highlight the difference with respect to definiteness. Hence, participants would be alerted to the purpose of the experiment and they might develop strategies that could lead to unreliable results.
Thus, researchers would like to use sets of minimally contrasting stimuli (the same phrase pairs, with definite vs. indefinite PR). However, one wants to avoid presenting every variant from such a set to the same participant. At the same time, each participant should be presented with one variant from every minimal pair. Both requirements can be met in a Latin Square Design, where sets of minimally contrasting stimuli are used and counterbalanced across lists.
Each of these lists is given to a subset of the participants from each of the experimental groups. The number of lists is determined by the number of contrasting conditions. I.e., for the definiteness example, with its two conditions, one would have two lists of stimulus items:

Table 1: A Latin Square Design for Example 1

                 LIST I           LIST II
s/of-pair 1      definite PR      indefinite PR
s/of-pair 2      indefinite PR    definite PR
s/of-pair 3      definite PR      indefinite PR
s/of-pair 4      indefinite PR    definite PR
s/of-pair 5      definite PR      indefinite PR
s/of-pair 6      indefinite PR    definite PR
s/of-pair 7      definite PR      indefinite PR
s/of-pair 8      indefinite PR    definite PR
s/of-pair 9      definite PR      indefinite PR
s/of-pair 10     indefinite PR    definite PR
s/of-pair 11     definite PR      indefinite PR
s/of-pair 12     indefinite PR    definite PR
s/of-pair 13     definite PR      indefinite PR
s/of-pair 14     indefinite PR    definite PR
s/of-pair 15     definite PR      indefinite PR
s/of-pair 16     indefinite PR    definite PR
s/of-pair 17     definite PR      indefinite PR
s/of-pair 18     indefinite PR    definite PR
s/of-pair 19     definite PR      indefinite PR
s/of-pair 20     indefinite PR    definite PR

Note: In a Latin Square Design, it is crucial that you are working with pairs of stimuli that are only minimally different. This is the case if you only vary definiteness, but keep all the lexical elements the same. However, it is not the case for an animacy variation (PM-leg: PR-table vs. PR-lady). Here, you would not just vary animacy, but also the lexical element (table vs. lady). Hence, you might not want to use a Latin Square Design in this case.

Stimuli vs. Distractors/Fillers

The stimulus items are those items which represent the levels of the IV that the researcher is manipulating. They are the core of the experiment.
Distractor/filler items are included in the experiment to distract participants from the properties of the stimulus items that the researcher wants to investigate. This is necessary to prevent participants from developing strategies or response patterns.
This is particularly crucial when the same participants are tested several times and know that they will be tested on similar items again. Distractors involve different constructions, morphological forms, … than the stimulus items.

Exercise
In English, one can encode a possessive relation with the marker -s (my neighbour’s car) or with the preposition of (the car of my neighbour). Some linguists have suggested that the animacy of the possessor influences the choice of -s vs. of.
Suggest a method and a design for an experiment investigating this question (variables, levels, potential confounding factors, etc.).
Discuss the appropriateness of the following stimuli:
o the book of John vs. John’s book
o James’ car vs. the car of James
o lilies of the valley vs. the valley’s lilies
o the dog’s hair vs. the hair of the dog
o this year’s best wine vs. the best wine of this year
o Bloomingdale’s lift vs. the lift of Bloomingdale
Discuss potential distractors for a construction choice experiment (s vs. of). Consider:
o Which types of construction alternatives can you think of?
o Which factors determine construction choice for these alternatives?

Randomisation

As participants become more tired in the course of an experiment, responses to stimulus items which are presented at the end of an experiment might be slower or less accurate. Thus, one has to ensure that the order in which stimulus items are presented is random. E.g., when one compares reaction times for the recognition of irregularly inflected word forms (e.g. went) with reaction times for regularly inflected word forms (e.g. walked), one cannot simply present all regular word forms first and then all irregular word forms. One has to make sure that both types of word forms are randomly distributed across the experiment. I.e., the items from the different conditions (corresponding to the different levels of the IV) have to be equally distributed across different parts of the experiment.
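
The counterbalancing and randomisation steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a procedure prescribed by this handout: the function names latin_square_lists and blocked_random_order, the choice of five blocks and the seed value are all assumptions made for the example. The first function rotates conditions across lists as in Table 1; the second shuffles the items in blocks, which is one possible way of keeping the conditions equally distributed across different parts of the experiment.

```python
import random


def latin_square_lists(n_items, conditions):
    """Build one presentation list per condition (hypothetical helper).

    In list number `shift`, item i appears in condition (i + shift) mod k,
    so every item occurs once per list and in every condition across lists.
    """
    k = len(conditions)
    return [
        [(item + 1, conditions[(item + shift) % k]) for item in range(n_items)]
        for shift in range(k)
    ]


def blocked_random_order(trials, n_blocks, seed):
    """Shuffle (item, condition) trials block-wise (hypothetical helper).

    Trials of each condition are dealt evenly into n_blocks blocks, and each
    block is then shuffled, so no condition clusters at the start or the end.
    """
    rng = random.Random(seed)  # fixed seed makes the order reproducible
    by_condition = {}
    for trial in trials:
        by_condition.setdefault(trial[1], []).append(trial)
    blocks = [[] for _ in range(n_blocks)]
    for group in by_condition.values():
        rng.shuffle(group)
        for i, trial in enumerate(group):
            blocks[i % n_blocks].append(trial)
    for block in blocks:
        rng.shuffle(block)
    return [trial for block in blocks for trial in block]


# Two lists for 20 s/of-pairs, as in Table 1 (list I first, then list II).
lists = latin_square_lists(20, ["definite PR", "indefinite PR"])
# Presentation order for list I: 5 blocks of 4 trials, 2 per condition each.
order_list_one = blocked_random_order(lists[0], n_blocks=5, seed=1)
```

Filler items could be handled in the same way by adding them to the trials as a further condition label, so that they are also spread evenly over the blocks.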

Exercises

Question 1
Some researchers have argued that the of-possessive is preferred for possessives with longer possessor phrases (the car of my old neighbour with the broken leg) while s-possessives are preferred for possessives with shorter possessor phrases (my neighbour's car). Describe the basic design of an experiment that you could use to test this claim. Discuss:
o Method:
   o Ability/Modality tested
   o Time-sensitivity
   o Stimuli provided
o Independent Variable(s)
o Dependent Variable
o Potentially confounding variables and how they can be controlled
o Constants

Question 2
Some researchers have argued that the of-possessive is preferred for possessives with longer possessor phrases (the car of my old neighbour with the broken leg) while s-possessives are preferred for possessives with shorter possessor phrases (my neighbour's car). Discuss the advantages and disadvantages of using (i) a controlled experiment, (ii) spontaneous speech recordings or (iii) elicitation games. Consider:
o the age and abilities of participants
o the frequency of the construction under study
o the relative reliability and validity of the three types of methods