MARKING SALIENCE: THE SIMILARITY OF TOPIC AND FOCUS1

JENNIFER E. ARNOLD
University of Pennsylvania

Past research on choice of reference form and other aspects of language processing has suggested that referents differ in terms of their salience. But what makes referents salient? In this paper I investigate how salience is influenced by topic and focus constructions. In Experiment 1, I show that the referents of both the sentential topic and the focus of cleft-like constructions are more salient than other referents. This result is consistent with past psycholinguistic research, but is surprising in light of the traditional opposition between topic and focus in linguistics. A partial explanation of this result is given with a corpus analysis, which shows that these constructions share an additional characteristic: referents of subjects and foci are both more likely to be referred to again in the following discourse than the other referents in their respective utterances. In Experiments 2 and 3, I further investigate the role of the focus by comparing it with the discourse topic. These experiments show that when a clearly defined discourse topic is present, the focus is not as salient as when there is no clear discourse topic. This suggests that salience is a competitive property, such that the salience of one referent is sensitive to the salience of other, competing referents in the discourse. Implications for the nature of salience are discussed.

How do speakers decide when to use a pronoun to refer to something, and when to use a fuller form, like a name or description? This question has been the subject of much research in linguistics, psychology, philosophy, and computer science, in part because of its centrality to understanding how human beings think, speak, and represent the world around them.
Most researchers explain the appropriateness of particular forms of reference by appealing to the salience of the referent in the discourse model that is shared among discourse participants.2 For the most part, pronouns are used when the speaker believes that the referent is already available in the listener's consciousness (i.e., when the entity is "given") (Ariel, 1990; Chafe, 1976; Gundel et al., 1993; among others). But there may be many things in the listener's consciousness, and the speaker needs to know that the listener will choose the correct one.

1 The results of Experiment 1 and the Corpus Analysis were presented in a paper at the 1997 meeting for the Linguistic Society of America, under the title "What is Salience?: The role of topic and focus in processing reference". I would like to thank Thomas Wasow for his guidance at every stage of this project. Thank you also to Amit Almor, Mira Ariel, David Beaver, Eve Clark, Herbert Clark, Miriam Eckert, Adele Goldberg, Maryellen MacDonald, Ellen Prince, Peter Sells, and John Trueswell for their comments on earlier drafts of this paper and/or discussions of these data. Thank you to Carie Lemack for her help with designing and running Experiment 2. None of these people are responsible for any shortcomings of this paper.

2 My use of the term "discourse participants" includes both the speaker and all listeners. I will often use the terms "speaker" and "listener", assuming a prototypical situation of a dialogue. However, "speaker" is also meant to extend to writers and other producers of language, and "listener" also represents readers and other comprehenders. In a dyadic conversation, the listener is also the addressee. However, my data do not specifically distinguish between language comprehension from the perspective of an addressee or an overhearer.
To account for this process, past work on reference has suggested that the entities in the joint discourse model will differ in terms of salience, prominence, activation, or accessibility. Although these terms are not identical, for the purposes of this paper I will categorize them together, and use "salience" as a cover term.3 But what makes things salient? In this paper, I will focus on how the salience of discourse entities is influenced by two kinds of linguistic properties: "topic" and "focus". An inspection of past research reveals an apparent discrepancy between linguistic analyses of information structure and psycholinguistic studies on reference: while linguists have traditionally treated "topic" and "focus" as opposing categories (see e.g. Gundel, 1988; Lambrecht, 1994; Tomlin et al., 1997), different psycholinguistic experiments have revealed that both topic and focus constructions can make their referents salient. The experiments in this study will investigate this discrepancy, and propose an explanation for why certain topic and focus constructions have similar effects on language processing. In the first experiment, I will directly compare the grammatical subject of nonclefted utterances (a topical position) with the focus of clefts (a focus construction). I will demonstrate that both categories make their referents more salient, such that subsequent reference is more natural as a pronoun. I will then use a corpus analysis to investigate the similarity between these categories, and show that the referents of subjects and the focus of clefts share the property that each is more likely than other entities to be mentioned in the following discourse. If the foci of clefts are salient, what happens when they co-occur with a well-established topic? 
Experiments 2 and 3 investigate this situation, and find that the salience of the focus position is highly susceptible to competition from other discourse entities: if a competing discourse entity is highly salient, it appears to draw salience away from the focus. This finding further suggests that salience is a dynamic, competitive phenomenon.

3 Later in the paper I will discuss salience more specifically in terms of the activation of referents in the discourse participants' mental representations. This approach is detailed more fully in Arnold (1998).

1. SALIENCE AND LANGUAGE PROCESSING.

In this paper I am attempting to contribute to a processing account of salience and reference form, which carries several important assumptions. The major underlying issue at hand is how linguistic information leads some people, things, or ideas to be more saliently represented in the mind of the listener. Language comprehension is known to result in a non-linguistic mental model of the discourse (e.g., Johnson-Laird, 1983, 1989; Miller, 1979a, 1979b; Morrow et al., 1987; van Dijk and Kintsch, 1983). It is also known that language use depends on speakers and hearers knowing what is in “common ground” (e.g., Clark 1992, Part I), or the “joint discourse model”. However, it is not entirely known how the joint discourse model is built. I critically assume that speakers and listeners do not know a priori what is salient for others, so they have to infer it. I am further assuming that people estimate the salience of an entity probabilistically, in a graded way. For example, if Ann has good evidence that an object is salient for Bob, then her representation of that object in the joint discourse model will be high on the salience continuum. If Ann has only weak evidence of the object's salience, she will represent that object with only middle-level salience.
If there is no evidence for the object's salience, she will represent it as not salient (or not represent it at all).4 So, what evidence do people use to infer the salience of an entity in the joint discourse model? I will report on three studies that investigate how topic and focus constructions affect salience, and reveal the competitive nature of salience.

4 I am assuming here that speakers refer only to the joint discourse model during production (and furthermore that they rely on comprehenders to use the linguistic information successfully to build the same joint discourse model). However, this simple assumption is likely to be only an approximation of actual discourse processes (e.g., Hanna et al., 1998; Horton and Keysar, 1996; Arnold et al., 1999). Thus, it is likely that the model I am developing here will have to be replaced by a more complex model, which accounts for the extent to which speakers use information from the joint discourse model, and the extent to which other constraints influence their choices in reference form. However, the studies in this paper investigate information that is available to all discourse participants, so these results would be consistent with a more fully developed model.

2. SALIENCE AND REFERENCE FORM.

The tool I will use to measure salience is variation in reference form. This method rests on an important assumption about salience and reference form. Past research has demonstrated that the speaker's choices of forms of reference are greatly influenced by the salience / accessibility of the referent (e.g., Ariel, 1990; Arnold, to appear; Du Bois, 1987; Givón, 1983; Gundel et al., 1993). In addition, the listener's comprehension of these referring forms is affected by the appropriateness of a linguistic form in a given situation (e.g., Almor, 1995; Gordon et al., 1993; Hudson-D'Zmura and Tanenhaus, 1997).
The tight association between salience and reference form means that evidence about reference form can be taken as evidence about the cognitive status of the referent. For example, if an experiment shows that pronouns tend to be used more for the referents of grammatical subjects than for other referents, this can be taken as evidence that the grammatical subject is associated with a more salient representation of that referent. This is the logic employed by many studies on reference (e.g., Givón, 1983; Gordon et al., 1993), and which I will also adopt for this paper.5

5 This assumption is reasonable, given that the factors affecting reference form can be categorized into two general factors (Arnold, 1998): 1) those factors that affect the speaker’s assumption about the salience of referents in the joint discourse model, and 2) the ambiguity of the situation. For example, in a context with only one inanimate referent, the pronoun "it" will be sufficient to specify that referent, but in a context with several inanimate referents, a fuller form may be needed. In this study I will hold the ambiguity of a situation constant, or cross-balance it against other factors, which means that reference form preferences are a plausible indicator of the salience of a referent.

3. TOPIC AND FOCUS: SIMILAR OR DIFFERENT?

Topic and focus are informational categories that have been used to explain various linguistic phenomena. Some studies about these phenomena have looked at the syntactic constructions that can be used to encode topics or foci. Others have asked what cognitive properties are associated with each category. In both of these cases, most work has concentrated on how topical or focused referents relate to their status in the PRIOR discourse, studying the conditions under which something can be put in a topic construction or focus position (e.g., Prince, 1978, 1981, 1992; Ward and Birner, 1998), or the kinds of syntactic constructions used for topics or foci (e.g., Aissen, 1992). In both cases, topics and foci are contrasted with each other syntactically and informationally (for reviews see Gundel, 1988; Lambrecht, 1994; Tomlin et al., 1997). By contrast, I am interested in a complementary question: how do topic and focus constructions influence the discourse status of entities in the FOLLOWING discourse? This question is germane to determining the nature of salience.

Consider topics first. Theories of referential form often describe salience in terms of topical attributes. There are several theories that describe the relationship between cognitive accessibility and choices in referring forms, e.g. Givón's (1983) Topicality continuum, Gundel et al.'s (1993) Givenness hierarchy, Ariel's (1990) Accessibility hierarchy, Prince's (1981) Familiarity Scale, and Chafe's (1994) Activation states. All five theories encode similar continua, but the same attributes that are termed “topical” by Givón are described by others in terms of cognitive accessibility. The link between salience and topicality is also apparent where topics are described in terms of high accessibility. For example, Gundel (1988) argues that an entity has to have a certain level of "givenness" (or salience) to be felicitously used as a topic (i.e., it has to be assumed to be familiar among the discourse participants).

Because "salience" and "topicality" are often described in terms of similar properties, one might think that salience could be understood in terms of topicality. This conclusion would also be supported by experimental research on the role of grammatical subjects.
Sentential topics are not always the same as the grammatical subject, but they have often been associated with this role, especially when they are in sentence-initial position (Chafe, 1994; Lambrecht, 1994:131; Levy, 1982, as cited in Ariel, 1990: 24; Reinhart, 1982). Research has demonstrated that speakers and writers consistently use pronouns (or zero anaphora) more often than full names or descriptions to refer to an entity that last appeared in subject position (Arnold, in press; Brennan, 1995; Stevenson et al., 1994; Sanford and Garrod, 1981:137-8). In terms of comprehension, Gordon et al. (1993) and Hudson-D'Zmura and Tanenhaus (1998) both found that the referents of grammatical subjects ("subject-referents") were more salient than non-subject-referents: comprehension was facilitated if the next sentence used a pronoun to refer to subject-referents, but it was inhibited if a repeated name was used. In contrast, the "repeated-name penalty" was not observed for non-subject-referents. Both studies concluded that their results reflected the higher salience of subjects over non-subjects. Since grammatical subjects are considered topical in English, the association of subjects with salience supports the connection between salience and topicality.

However, salience cannot be fully defined in terms of topicality. While the above studies found repeated-name penalties for subject entities, Almor (in press) found a similar effect for the entities denoted by the foci of clefts. In a self-paced reading experiment, he compared reading times for reference to the focused entity with reading times for reference to the nonfocused entity. For instance, in the following example, "the bird" is compared with "the fruit" under conditions where each referent is focused.

(1) Sample stimuli from Almor (in press), Experiment 1
1a. It was the robin that ate the apple. The bird seemed very satisfied. (FASTER READING TIMES)
1b. It was the robin that ate the apple. The fruit was already half rotten.
2a. What the robin ate was the apple. The bird seemed very satisfied.
2b. What the robin ate was the apple. The fruit was already half rotten. (FASTER READING TIMES)

Almor's results showed that subjects read the anaphor that begins the second sentence ("The bird" or "The fruit") faster if it referred to the focused referent than if it referred to the nonfocused referent. By contrast, when repeated NPs were used (e.g., "the bird" referring to an entity last mentioned as "the bird"), he found the opposite effect: a repeated-NP penalty for focused entities but not unfocused ones. Under the assumption that reading is facilitated for accessible information, these results suggest that the focus of a cleft is more saliently represented than the nonfocus, a conclusion which is also supported by Carpenter and Just's (1977) study.

In sum, past research suggests that both subjects and the focus of clefts make their referents more salient than other elements in the utterance. Since the subject is a topical position but the focus of clefts is a focus position, this observation contrasts with the traditional opposition between topics and foci in terms of their pragmatic status in the prior discourse context. This paper will investigate the apparent similarity between topics and the foci of clefts.6

The evidence for the subject/focus similarity, presented above, comes from different studies, with slightly different methodologies. Because of this, and because topics and foci have so often been contrasted with each other, it is necessary to confirm the similar roles that subjects and the foci of clefts play with respect to salience. Experiment 1 addresses this question.

4. EXPERIMENT 1: SUBJECT AND FOCUS.

The goal of the first experiment was to test the hypothesis that the topic and the focus of clefts are more saliently represented than other referents, by explicitly comparing these two categories.
More specifically, I wanted to test the hypothesis that the supposed salient representation of topics and foci results in an increased preference for less-specified forms of reference. However, any study of categories like "topic" and "focus" is only beneficial to the extent that these categories are well defined. For this reason, I will use the grammatical subject in this study as an operationalization of sentential topic, following the studies discussed above. In the discussion of this experiment, I will avoid any confusion about the category I am discussing by using the term "grammatical subject". Similarly, the category "focus" is operationalized as the focus of cleft-like constructions such as "the one she saw was Ann".7

6 An additional question concerns the extent to which the salience of the foci in Almor's study is generalizable to other types of foci. Theories of information structure posit that all sentences contain a focus (e.g., Lambrecht, 1994: 206), even if they aren't syntactically marked with a cleft. In this paper, I discuss salience with respect to only one type of focus: that marked by clefts. It may be that other types of focus-marking phenomena, like prosody, might make referents salient in the way clefts do. This question will be left to future research.

7 Although this is not strictly a wh-cleft, it has been referred to as a "cleft with a lexical head" (Prince 1978).

4.1 METHODS.

PROCEDURE. The method for this experiment was a rating questionnaire that elicited off-line judgments about naturalness. Participants read three-sentence paragraphs and were asked to rate them for naturalness on a 7-point scale. A sample stimulus set is shown in Figure 1.

1.  The guests were nervously standing around in the living room, trying to decide which person to talk to.
2a. Ann decided to say hi to Emily first.
2b. The one Ann decided to say hi to first was Emily.
3a. Emily/She looked like the friendliest person in the group.
3b. Ann/She hated to be in a room full of people where no-one was talking.

FIGURE 1: Sample stimuli for Experiment 1

The experiment followed a (2x2x2) design, investigating three factors: 1) focus construction (did the second sentence include a focus construction ("the one...") or not?); 2) 3rd-sentence continuation (was the subject of the third sentence the 1st or 2nd mentioned character?); and 3) type of reference for 3rd-sentence subject (noun vs. pronoun). All items appeared an equal number of times in each condition, but only once for a given participant. Half the participants saw stimuli with "the one..."8 constructions, and half saw stimuli with no focus construction.

8 For variation, two of the stimuli used the construction "the person..." instead of "the one...".

PARTICIPANTS. 144 subjects rated 12 stimuli and 10 fillers in exchange for one candy bar. Time of participation was approximately 10 minutes. All subjects were native speakers of American English. "Native speaker" was defined as having begun to learn English by age 5.

PREDICTIONS. Based on the findings of Gordon et al. (1993) and others, I expected that for the non-focused stimuli, the pronominal versions would be rated higher when the referent was the subject of the second sentence, but not when the referent was the non-subject. Based on the findings of Almor (in press), I expected the opposite pattern for the focused stimuli, where I expected the pronominal versions to be rated higher for the focus.

RESULTS. The ratings for each subject9 were first normalized to eliminate individual biases for some subjects to rate the sentences higher overall than other subjects. The noise resulting from this variation was reduced by taking advantage of the fact that all subjects saw the exact same set of filler stimuli.
Each individual's average score for all the filler stimuli was subtracted from the average filler score for all the subjects, and this normalization score was then added to each stimulus score for that individual. The normalized scores were then submitted to a three-way ANOVA. When a subject skipped an item for some reason, the average score for that subject's stimulus items was substituted.

9 One subject was excluded because he was not a native speaker of English, one subject was excluded because he was a native speaker of South African English, not American English, two subjects were excluded because they rated all stimulus items equally, and four randomly chosen subjects were excluded in order to make the cells even. The total number of subjects included in the analysis is 136.

The average ratings for each category are displayed in Table 1. The results showed that when the second sentence did not contain a focus construction, higher ratings were given to stimuli with pronouns in the third sentence that referred to the 1st mentioned entity (the topic), and to stimuli with names that referred to the 2nd mentioned entity. In contrast, when the second sentence did contain an explicit focus construction the preferences were reversed: item ratings showed that people preferred pronouns to refer to the 2nd mentioned character, the focused referent, but they preferred names to refer to the nonfocused entity.

TABLE 1
Normalized average scores for each category in Experiment 1. In each pair, the inequality sign points toward the lower rating.

                 Focus construction: NO      Focus construction: YES
                 pronoun       name          pronoun       name
1st mention      5.14     >    5.00          4.40     <    4.68
2nd mention      5.19     <    5.35          4.80     >    4.71

The results of the ANOVA showed a main effect for focus construction (F1(1,134)=5.469, p<.03; F2(1,11)=38.638, p<.001) and a main effect for 3rd-sentence continuation (F1(1,134)=8.996, p<.005; F2(1,11)=4.736, p<.06).
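The filler-based normalization described in the Results can be illustrated with a short sketch. This is not the original analysis script: the function name, the data layout (one list of ratings per participant), and the toy ratings are all assumptions made for illustration. It takes the "average filler score for all the subjects" to be the mean of the per-participant filler means, which equals the mean over all filler ratings when every participant rated the same number of fillers.

```python
from statistics import mean

def normalize_ratings(filler_ratings, stimulus_ratings):
    """Hypothetical helper: shift each participant's stimulus ratings by
    (grand filler mean - that participant's own filler mean).
    A skipped item (None) is first replaced by the participant's mean
    over the stimulus items they did rate."""
    own_filler_mean = {p: mean(r) for p, r in filler_ratings.items()}
    grand_filler_mean = mean(list(own_filler_mean.values()))
    normalized = {}
    for p, ratings in stimulus_ratings.items():
        rated = [r for r in ratings if r is not None]
        filled = [mean(rated) if r is None else r for r in ratings]
        offset = grand_filler_mean - own_filler_mean[p]
        normalized[p] = [r + offset for r in filled]
    return normalized

# Toy data (invented): P1 rates generously, P2 rates harshly.
fillers = {"P1": [6, 6, 6], "P2": [4, 4, 4]}
stimuli = {"P1": [7, 5, None], "P2": [3, 5, 4]}
result = normalize_ratings(fillers, stimuli)
```

In this toy case the grand filler mean is 5, so P1's ratings are shifted down by 1 and P2's up by 1, which removes the overall difference in how the two participants use the scale while leaving each participant's relative preferences intact.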
The crucial result, however, was the three-way interaction between focus construction, pronominalization, and 3rd-sentence continuation, which was significant by subjects and marginally significant by items (F1(1,134)=6.690, p<.02; F2(1,11)=4.047, p<.07). No other factors were significant (p's >.1).

The results for this experiment showed that readers preferred stimuli that used pronouns over stimuli that used names for referring to both the grammatical subject entity (a topic construction) and the focus of "the one" constructions (a focus construction). By contrast, when the third sentence contained a reference to the nonsubject or nonfocus, they preferred stimuli that used names over stimuli that used pronouns. These results corroborated the findings of Almor (in press) and Gordon et al. (1993). If these preferences for reference form are assumed to reflect the mental representations of the referents, the results suggest that the referents of both subjects and the focus of clefts are more salient than other arguments. This conclusion suggests that despite the traditional opposition between topic and focus, the two categories may be more similar to each other than previously thought. At the same time, the results of this study raise the question of why it should be that these two constructions share the role of designating their referents as more salient. It is this question that the following corpus analysis aimed to answer.

5. CORPUS ANALYSIS: SUBJECT VS. FOCUS.

The results of the Subject/Focus questionnaire suggested that the referents of both subjects and the focus of clefts are more saliently represented than other referents in an utterance. What is it about these two constructions that gives rise to this property? It cannot derive from topicality alone, and neither can it be explained in terms of focus properties alone.
The hypothesis that I investigated with this corpus analysis was that both subjects and the focus of clefts are good indicators that their referents will be mentioned again in the following utterance. That is, both constructions may be pointers to the topic of the following utterance. In a "normal", nonclefted utterance, the best bet for the topic of the following utterance is the topic of the current utterance, since speakers usually talk about the same thing for extended periods of time. On the other hand, a clefted utterance is a marked construction that the speaker may employ to indicate that the topic will shift to the referent of the focus. This follows from Weil's (1887/1844) proposal that the focus of one utterance is related to the topic of the next (as cited in Tomlin et al., 1997). If this is the case, I expected that following nonclefted utterances, speakers would refer to the referent of the subject more often than other elements in the utterance, but that following clefted utterances, speakers would refer more often to the focus of the cleft.

To investigate this hypothesis I conducted a corpus analysis, using the Aligned-Hansard Corpus from 1986. The Aligned-Hansard corpus is a collection of transcripts from the Canadian parliament. It has two advantages. First, the discourse it represents is natural and communicative. Second, it is a relatively formal genre of discourse, which is likely to raise the occurrence of clefts, a relatively formal construction.10 Using the Aligned-Hansard corpus, I compared nonclefted utterances with object-clefted wh-clefts, following the method described below. My hypothesis, as mentioned above, was that speakers would refer to the referents of subjects and focus of clefts more frequently than to other referents in the preceding utterance.

10 The drawbacks of this corpus are that some speakers may have had prepared notes at their disposal, the transcripts are doubtlessly edited to remove disfluencies, and some segments may be translated from French.

5.1 METHOD.

I analyzed the 1986 Aligned-Hansard corpus by extracting two types of utterances: wh-clefts (also sometimes called pseudo-clefts) and nonclefted sentences. To find wh-clefts I searched for "What" (with a capital W) that was not followed by a question-mark. I only considered those utterances that clefted the object, because I was interested in the comparison between the embedded subject and the focused NP. I also discarded cases where there was no following utterance, or where the following utterance belonged to another speaker, yielding 146 total examples of wh-clefts.

The comparison set of data was a random sample of non-clefted sentences, which I assembled by opening each file from 1986, scrolling down three pages, and taking the main clause of the utterance where my cursor landed. In this set I excluded questions, sentences with dummy subjects or nonreferential subjects (like the generic "one"), and cases where there was no following sentence by the same speaker. I also excluded utterances that did not have either a direct object or an object-of-preposition, since I was interested in comparing objects with the foci of clefts.11 This method resulted in 167 examples.

11 Note that it is a conservative choice to exclude cases with no object or object of preposition, because many sentences have more than one object or object of preposition. Therefore, if reference continuations were distributed randomly, there would be a higher chance of the speaker referring to an object-referent than to the subject-referent. Therefore my sample is slightly biased against my hypothesis that subsequent references will refer more to the subject-referent than object-referents.

For each sentence in my sample, S1, I identified the next "independent" utterance, S2, which I defined as a clause that is finite, not a sentential complement of the matrix clause, and not a relative clause. I then coded S2 for the first reference to an entity that had been referred to in S1 (if any). The categories I found in this analysis are presented in Table 2.

TABLE 2
Identity (in S1) of the Referent of the First Referring NP in S2

Nonclefted Sentences (S1)
1. Subject               matrix clause subject
2. Part of Subject       part of matrix clause subject (e.g., "I" when previous subject was "my Party")
3. Related to Subject    something related to the matrix clause subject, such as another member of a set
4. Object/Obj of PP      matrix clause object or object of PP
5. Part of Object        part of matrix clause object
6. Whole                 the whole sentence, or an entire constituent within the sentence (such as an entire PP, VP, or subordinate S)
9. None                  there is no referent in common with S2

Clefted Sentences (S1)
1. Embedded Subject             subject of embedded clause (e.g., What we need is...)
2. Related to Embedded Subj.    related to subject of embedded clause
3. Focused NP                   NP in the focus of the wh-cleft
4. Part of Focused NP           part of the NP in the focus of the wh-cleft
5. Related to Focused NP        related to the NP in the focus of the wh-cleft
6. Embedded Object              object of embedded clause
7. Unclear
8. Whole                        the whole sentence, or an entire constituent within the sentence (such as an entire PP, VP, or subordinate S)
9. None                         there is no referent in common with S2

As mentioned above, I looked at S2 to determine whether any NP had the same referent as one of the NPs in S1, and if so, which one. The hypothesis behind this methodology was that certain categories (namely, subjects and focused NPs) may be indicators that their referents are likely to be mentioned again in the following discourse. If more than one referent from S1 was mentioned in S2, only the first one was analyzed. Once a referent was identified, I noted how it had been referred to in the first utterance (S1) -- that is, what was the grammatical function of the reference in S1. The following examples show how particular cases were coded.
page 13 TABLE 3 Examples from Corpus Analysis first reference example subject S1: The charter is giving individuals a new opportunity to seek redress when their rights are infringed upon. S2: It also imposes on all an obligation to be tolerant of the rights of others. subject S1: The hon . gentleman states that the Government has turned down options. S2: He knows that is not correct . . object of PP S1: I have talked to other whites who have worked in the South African police force S2: and through them I have learned something about the enormously tragic environment of the prison system in South Africa and the harshness of the treatment to which many . . . focus S1: What the Government finally got was a deck hand whose name is Mr. Lander. S2: He has been busying himself rearranging the deck chairs on the deck of the Titanic. focus S1: What we have now is a breath of fresh air , S2: and the people of Canada welcome it . The codings for each utterance were tabulated in terms of four categories: 1) Subject (or embedded Subject, 2) Object / Cleft, 3) Other (references to other NPs or the whole utterance), and 4) None. The first category (Subject / embedded Subject) included cases where the referent was part of the subject or related to the subject, and the second category included cases where the referent was part of or related to the object or cleft. Examples of this type of utterance are provided in Table (4). page 14 TABLE 4 Examples of "Part of" and "Related to" codings. (1) Part of Subject a. The unfortunate aspect of the dilemma facing our producers is that the problem is not of their own doing. Our farmers are the innocent victims of the costly madness of subsidies which has gripped the European Economic Community . . . b. From time to time my Party speaks about the need to democratize these centres of power and put them in the hands of ordinary Canadians. I suggest that those are some constructive and positive means by which we could seek to do that. 
(2) Part of Object
    I would like to quote the words of the Hon. Member, the Liberal health critic. He said: "Higher prices might be justified if it led to more drug research in Canada.
(3) Part of Cleft
    What the Hon. Member has not answered with respect to Challenge '86, has not refuted, is the allegation, the observation, that there has been a significant shifting of funds from the non-profit service and municipal sector to private enterprise. I have no problem with funding training opportunities in private enterprise.
(4) Related to Embedded Subject
    What the Government is entering into now is a bilateral negotiation by appointing trade envoys. The U.S. administration said to Canada that . . .
(5) Related to Cleft
    What we are really talking about is a farm crisis. Just the other day the Committee on Agriculture met with Dr. Lillian Walker and Dr. James Walker.

One might wonder about the rationale for grouping cases where the speaker refers to a part of an argument, or to something related to an argument, with references to the argument itself. The idea behind this choice is that this corpus analysis is measuring discourse patterns -- that is, how often do speakers continue talking about the referents of subjects, and how often do speakers continue talking about the referents of the focus of clefts? Whatever the speaker's choice for continuing a discourse, the listener will interpret the following utterances in terms of what came before, engaging in what Clark and Haviland (1977) call bridging inferences. If the speaker mentions an entity that represents only a subpart of the referent of an NP in the previous sentence, or something that relates to a referent in the first sentence, the listener will need to access the representation of that referent in order to establish a connection. This idea is also reflected in claims by Almor (in press) and Prince (1981:251), who suggest that reference to one member of a set makes the other members of the set more accessible as well.
Therefore, to understand how discourses are frequently continued, it is necessary to include cases where an argument is the topic of the discourse, even if it is not explicitly or fully mentioned.

5.2 RESULTS. The results showed that the most frequent type of continuation for nonclefted utterances is with the referent of the subject, but for clefted utterances it is with the referent of the focus. The total number of continuations in each category is listed in Table (5). The results were analyzed with a chi-squared analysis of distribution, which showed that these four categories behaved differently with respect to nonclefts and clefts (χ²(3) = 45, p < .001).

TABLE 5
The number in each category refers to the total number of cases where the first referent in S2 co-referred with the NP in that position in S1. Data for the Subject and Object are for nonclefted utterances, and data for the Embedded Subject and Focus are for the clefted utterances. Column headings give the noncleft category / cleft category.

             subject /           object /          other    none
             embedded subject    focus of cleft
nonclefts    74                  46                25       22
clefts       25                  81                 7       33

The percentages of total references for Clefts and Nonclefts are presented graphically in Figure (3). In this figure, as in Table (5), references to the subject of nonclefted utterances are grouped with references to the embedded subject in clefted utterances, and the object of nonclefted utterances is grouped with the focus of clefted utterances.

[Figure 3: bar chart. Horizontal axis: Grammatical Function of Referent (subject/embedded subject, object/focus, other/whole, none). Vertical axis: percentage of total references. Nonclefts (n=167): 44%, 28%, 15%, 13%. Clefts (n=146): 17%, 55%, 5%, 23%.]
Figure 3. Percentage of total references in each category for both clefted and nonclefted utterances.

5.3 DISCUSSION OF EXPERIMENT 1 AND CORPUS ANALYSIS. The results of the experiment and corpus analysis showed a remarkable parallel between subjects and foci.
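As a check on the statistic reported in Section 5.2, the chi-squared test of independence can be recomputed directly from the counts in Table 5. The following is a minimal Python sketch using only the standard library. The function and variable names are illustrative and are not part of the original analysis, and the p-value is left to a chi-squared table rather than computed here.

```python
# Counts from Table 5: first co-referring NP in S2, by its position in S1.
# Column order: subject/embedded subject, object/focus, other, none.
counts = {
    "nonclefts": [74, 46, 25, 22],
    "clefts": [25, 81, 7, 33],
}


def chi_squared(table):
    """Return Pearson's chi-squared statistic and degrees of freedom."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            # Expected count if sentence type and category were independent.
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof


def percentages(row):
    """Convert one row of counts to whole-number percentages of its total."""
    total = sum(row)
    return [round(100 * n / total) for n in row]


stat, dof = chi_squared(counts)
print(f"chi-squared({dof}) = {stat:.1f}")
for name, row in counts.items():
    print(name, f"n={sum(row)}", percentages(row))
```

Run on the Table 5 counts, the statistic comes out at approximately 45 with 3 degrees of freedom, in line with the reported value, and the row percentages match those plotted in Figure 3.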
In the Subject/Focus study, readers gave higher ratings to stimuli that used pronouns to refer to the referents of both the subject of nonclefted sentences and the focus of clefts. This pattern contrasted with higher ratings for stimuli that used names to refer to the referents of objects and embedded subjects. This finding, which is consistent with past results of on-line studies (Gordon et al., 1993; Almor, in press), indicates that both subjects and the focus of clefts cause their referents to be represented more saliently in the mind of the comprehender. The most striking result of Experiment 1 is that the focus-referent was similar in salience to the referent of a prototypical topic position like the subject. This shows that salience cannot be understood purely in terms of topicality.

The corpus analysis provides one potential reason for the similarity between subjects and foci: both positions are associated with an increased tendency to appear in the following discourse. That is, the probability of re-mention is higher for subject-referents than other referents in non-clefted sentences, and the probability of re-mention is higher for focus-referents than other referents in clefted sentences. Thus, there is a three-way connection between preference for pronouns, probability of subsequent mention, and the positions of subject and focus, as depicted in Figure 4.

[Figure 4: diagram connecting "Subject-referents / Focus-referents" to "Increased preference for pronouns over names" and to "Increased probability of subsequent reference".]
Figure 4. The relation between pronoun use, probability of subsequent mention, and grammatical positions, as shown by Experiment 1 and the Corpus Analysis.

The parallel findings of Experiment 1 and the corpus analysis support the following hypothesis about why these two positions are salient. Recall that comprehenders need to identify the topic of a sentence. Subjects are often topics, so they are probabilistically taken as topics.
The current topic is likely to be the topic of the next utterance, because speakers are usually coherent, and topic continuation is more frequent overall than topic shift. By contrast, cleft constructions signal a probable change in topic, to the focus of the cleft. The degree to which something is expected to be the topic of the following utterance determines its degree of salience. The more salient a referent is to the comprehender, the more likely it is that the speaker will use a pronoun to refer to it.

This story about referent salience can also be described on a different level. The results presented here show that subjects and the focus of clefts provide a similar resource for speakers, in that speakers can use these positions to signal to their interlocutors the discourse status of certain referents. Those referents that are placed in subject or focus position are relatively more salient, and therefore are good candidates for pronouns that may occur in the subsequent utterance. In this way, subjects and the focus of clefts serve similar discourse functions.

This hypothesis makes sense if we think about what topichood might mean for language comprehension. Topics are what the sentence is "about", in a way that is relevant to the current context (Gundel, 1988; Lambrecht, 1994; Reinhart, 1982). Since speakers tend to stick with the same topic for a while, the thing which a sentence is "about" is the thing which is likely to be mentioned again in the following utterance, and is also the thing most salient to the speaker. This means that if subject-referents tend to be topical, they should also receive multiple reference. This prediction is borne out in the corpus analysis of nonclefted utterances. Focus-referents are also likely to be mentioned again, as shown by the corpus analysis of clefted utterances, supporting the hypothesis that clefts signal a topic shift.
This account has the attractive property of being consistent with other work on language processing, which has also shown that people are sensitive to the degree to which certain information is likely to occur (e.g., Lieberman, 1963; Marslen-Wilson and Tyler, 1980), to the extent of causing comprehension errors in certain ritualized contexts (Cushing, 1994).

5.4 FURTHER QUESTIONS. The account I just presented explains salience in terms of the degree to which a referent is a likely topic of the following utterance. One question this raises concerns the relative weight of topics and the foci of clefts in marking a referent as a likely topic of the following utterance. If topics and foci are similar in marking salience, as the preceding results suggest, then what happens when an utterance contains a cleft-focus AND a well-established topic? Are both referents equally salient, or do they compete with each other? If it is the latter, then which one "wins out"? The following two experiments address this question.

6. EXPERIMENT 2: DISCOURSE-COMPLETION. The stimuli in Experiment 1 were designed to contain either a topic (the subject) or a focus (the focus of a cleft), but not both. In the condition with a focus construction, the focus co-occurred with an embedded subject, which is not a highly topical position. In the condition without a focus construction, the subject co-occurred with an object or object of preposition, which are not specially marked focus positions. Therefore, Experiment 1 did not bear on the question of which referents are more salient when the topic and focus are competing against each other.

Experiment 2 was designed to pit topics and foci against each other. In order to compare these categories, however, it is important to establish how they are operationalized. In the first experiment, "topic" was defined as the grammatical subject, which is most naturally thought of as a sentence-level topic.
However, foci of clefts cannot co-occur with simple subjects of the type investigated in Experiment 1. The entity expressed as the subject in a non-cleft, as in (2a), is expressed as an embedded subject, as in (2b).

(2a) The burglar heard the neighbor's dog.
(b) What the burglar heard was the neighbor's dog.

Thus, in order to directly compare the focus of a wh-cleft with a topic, the embedded subject must be defined as the discourse topic. The discourse topic in the stimuli for this experiment was established by introducing one character as the subject of the first sentence, and referring to this character with a pronoun in the following two sentences. The referent of the discourse topic was compared with a second character, which was introduced either as the focus of a cleft or as an object / object of PP. The experiment aimed to discover which of these two referents would be treated as more salient by the participants.

6.1. METHODS.

PROCEDURE. The experiment was conducted in the language laboratory at Stanford University. Each participant sat in a cubicle outfitted with a tape recorder and headset with a microphone and earphones. The stimuli were provided in written form on several sheets of paper, with several spaces between stimulus items in order to prevent participants from reading the next stimulus item while completing the current one. Each stimulus item consisted of a three-sentence story, an example of which is given in (3). The first sentence introduced one character and provided the context for the story. In the second sentence, the same character was referred to again with a pronoun. The third sentence consisted of either a clefted or nonclefted utterance. This sentence also introduced a new character, as either the object of a nonclefted utterance or the focus of a cleft. The clefted utterance was always an object cleft, such that both utterances contained two participants: a subject/embedded subject and an object/focus.
All characters were introduced with proper names.12

(3a) Ron was looking through his address book, trying to make up his mind. He had an extra ticket to the opera, but he didn't know which friend to invite. The one he decided on at last was Kysha / Fred.
(b) Ron was looking through his address book, trying to make up his mind. He had an extra ticket to the opera, but he didn't know which friend to invite. At last he decided on Kysha / Fred.

Participants were asked to read each paragraph into the tape recorder. At the end of each paragraph, their task was to add a natural continuation sentence to the paragraph. They were asked to make the continuation as seamless and natural as possible. It was specified that the continuation should be another sentence, and not an elaboration of the last sentence. However, no strict definition of "a sentence" was provided, and some continuations had to be discarded because they consisted of conjoined sentences (with "and", "but", or "because"), relative clauses, or other elaborations of the stimulus sentence.

The experiment used a total of 24 items, which were rotated through the four conditions and presented to participants in four lists. These items were combined with 32 items belonging to another experiment, such that each experiment served as a distracter for the other. The items from the other experiment also had three sentences and used proper names, but followed a different structure than the items from this experiment.

PARTICIPANTS. 16 native speakers of English from the Stanford community participated in this and another experiment in exchange for $7. Native speakers were defined as having begun to learn English by age 5. The approximate time to complete both experiments was forty-five minutes.

12 The experiment also included an additional, orthogonal manipulation of ambiguity. Each stimulus appeared half the time with two names that represented the same gender, and half the time with two names that represented different genders. In all cases, names were chosen according to the author's estimation that they are commonly associated with only one gender. In a few cases where the participants did not take the intended interpretation, their responses were excluded. This variable was not the purpose of this study, and therefore will not be discussed in this paper.

6.2 RESULTS. The responses of each participant were recorded, and later transcribed.13 Each response was coded according to the same system as the corpus analysis: the first time the participant referred to one of the characters in the preceding utterance, that reference was coded in terms of 1) who the referent was, and 2) how that reference was made, with a name or pronoun. Thus, each sentence was given one of the following codings for who the referent was: a) NP1 (subject or embedded subject), b) NP2 (object or focus), or c) other/none. For those cases that referred to either NP1 or NP2, a coding of either "pronoun" or "name" was also recorded. Examples of participant completions are shown in (8).

(8) STIMULUS: Leslie arrived at the party where she didn't know anyone. She surveyed the guests in the living room, planning her first move. The one she decided to say hi to was Bob/Pam. / She decided to say hi to Bob/Pam.
SAMPLE COMPLETIONS:
• Pam always stood next to the to the munchies so she thought it would be a good place to go. (NP2)
• Leslie was very nervous about meeting people. (NP1)
• She walked over and sat down next to Bob and asked him how his day went. (NP1)
• Bob was probably the only person that she recognized. (NP2)
• He was really good looking and kinda had looked over her looked over at her with a smile. (NP2)
• Hi Pam, she said. (NP1)
• Bob and Leslie had grown up together in the back hills of the Appalachians.
(other)

The goal of this study was to find out when people would tend to use pronouns, and when they would tend to use names. Based on the results of Experiment 1, I predicted that in a nonclefted utterance, they would use pronouns more when referring to the subject, but in a clefted utterance, they would use pronouns more when referring to the focus. The results, which are presented in Table (5), did not support this prediction.

13 Fourteen responses were excluded. Reasons for exclusion included continuing the last sentence rather than beginning a new one, skipping the item, adding a nonsensical continuation, or interpreting the name of one of the characters with the unintended gender. For example, the context of the sentence made it clear that one participant interpreted the name "Lynn" as a male name (although it was intended as a female name), and one participant interpreted "Ned" as a female name (although it was intended as a male name).

TABLE 5
Percentage of pronouns used, according to the position of the referent (NP1 vs. NP2) and the type of utterance (cleft vs. noncleft). Only the initial references for each completion are included.14

             NP1 referent    NP2 referent
Nonclefts    98% (n=40)      12% (n=142)
Clefts       97% (n=34)      12% (n=137)

The results demonstrated that the referent identity greatly influenced the form of reference that participants used in their continuations. Pronouns were almost always used to refer to the first-mentioned character, both when it was the subject and when it was the embedded subject. In contrast, names were almost always used to refer to the second-mentioned character, both when it was the object and when it was the focus. However, the type of utterance that preceded the continuation did not affect the form of reference at all. These results were submitted to a logistic regression analysis, using SPSS, which indicated that the referent identity (NP1 vs.
NP2) contributed significantly to the prediction of the dependent variable (p<.001), but the type of utterance (cleft vs. noncleft) and the interaction (referent identity x utterance type) did not (p's>.1).

14 An alternate method of coding includes all references to NP1 or NP2, where only the first reference was included if the same character was mentioned more than once. This method sometimes resulted in two data points for a given completion. The pattern of results with this method is similar, except that there is a lower percentage of parallel Subject-to-Subject references, which results in a lower percentage of pronouns referring to Subjects.

             NP1 referent    NP2 referent
Nonclefts    89% (n=83)      12% (n=163)
Clefts       84% (n=77)      11% (n=170)

As with the results reported in Table 5, only referent identity was selected as significant by the logistic regression analysis.

6.3 DISCUSSION. The results of Experiment 2 suggest that when the focus of a cleft co-occurs with a strongly established discourse topic, the topic "wins out" as the more salient referent of the two. This conclusion is supported by the fact that participants usually used names (88%) to refer to the focus of the cleft, but they almost always used pronouns (97%) to refer to the discourse topic. At the same time, it is somewhat surprising that these results showed absolutely no effect of the focus construction, in that names were used equally often to refer to the focus in the focus condition as to the object in the non-focus condition.

The results of Experiment 1 indicated that cleft constructions make the focus more salient than other referents, but Experiment 2 did not confirm these findings. So why did this difference occur? One difference between the two experiments occurred in the stimulus design. The content of the stimuli was identical, but the operationalization of "topic" was different in each case.
Experiment 1 compared the focus of a cleft with the grammatical subject, which represented the sentence-level topic, while Experiment 2 compared it with a discourse-level topic, operationalized by repeated mention of one character. It is possible that when a topic is established through repeated reference to one character, other weaker constraints (such as the use of a cleft construction) cannot influence the representation of new entities as much.

7. EXPERIMENT 3: TOPIC VS. FOCUS. Experiment 3 was designed to investigate why Experiment 1 found an effect for cleft constructions, but Experiment 2 did not. I hypothesized that the difference in how the topic was operationalized in each experiment contributed to the difference in results. In particular, I hypothesized that when one character was established as a discourse topic, it became very highly salient, and far more predictable than other characters. This salience drew the attention away from the other character, regardless of whether it was the focus or not. Following research on syntactic processing (e.g., MacDonald et al., 1994), I hypothesized that discourse entities would compete with each other for salience in the listener's representation. At a broader level, this hypothesis is consistent with proposals that people have limited attentional resources (see Clark, 1996: 274-6), and cannot focus their attention on multiple referents at the same time. Thus, increasing the salience of one character necessarily means decreasing it for another.

Experiment 3 investigated the salience of the focus when it co-occurred with a discourse topic versus when it occurred with a sentence topic. The crucial question to be answered here is whether the existence of a discourse topic can move the referent of the focus to a lower point on the scale of salience.
For that reason, this experiment was restricted to the comparison of pronouns and names that referred to the focus of a cleft, under two conditions: with a discourse topic and without one.

7.1 METHODS. This experiment used the rating questionnaire methodology of Experiment 1. As in Experiment 1, participants read three-sentence paragraphs and were asked to rate them for naturalness on a 7-point scale. A sample stimulus set is shown in Figure 5.

DISCOURSE TOPIC
Eleanor arrived at a party where no one knew anyone else. She sat in the living room, trying to decide how to break the ice. The one she spoke to first was Emily. {Emily/She} looked like the friendliest person in the room.

NO DISCOURSE TOPIC
Everybody arrived at a party where no one knew anyone else. They all sat in the living room, trying to decide how to break the ice. The one Eleanor spoke to first was Emily. {Emily/She} looked like the friendliest person in the room.

FIGURE 5: Sample stimuli for Experiment 3

Thus, the experiment investigated two factors: 1) Did the story contain a clear discourse topic (yes vs. no)? and 2) Was reference to the focus made using a name or a pronoun? The experiment used a total of 24 items. The stimuli were put into four lists, following the same procedure as for Experiment 2. This experiment was also combined with 12 items belonging to another experiment, plus 24 distracters. All distracters and additional stimuli had 3 sentences, but followed a different structure.

The hypothesis was that in the presence of a clear discourse topic, the non-topic character (introduced as the focus of a cleft) would become relatively less salient, by virtue of the fact that more attention would be allocated to the discourse topic, and less to the focus. Since I was only measuring reference form preferences for the focus, I expected that the results would show relatively higher scores for the pronoun condition when there was no discourse topic than when there was a discourse topic.
PARTICIPANTS. 129 native speakers of English from the Stanford community participated in this study.15 Sixty people were paid $3 plus one candy bar, and 2 people were paid $5 for their participation in this rating questionnaire, which lasted approximately 15 minutes. Sixty-seven people received course credit for their participation in this and another experiment, which totaled approximately one hour.16

15 The scores for 17 participants were excluded: five people had not begun learning English by age 5, one participant skipped an entire page, and one provided answers that appeared too pattern-like (suggesting that the individual had not actually paid attention to the stimuli). It was also necessary to exclude ten randomly selected participants to produce even numbers of participants per cell. This left the scores of 112 participants in the analysis.

16 For these 67 participants, three of the filler items were slightly modified, so that they would not create a conflict with the other experiment. The modifications consisted of changing "ice cream truck" to "ice cream van", "When the sun went over the hill" to "When it got dark", and "until the sun rose" to "until morning".

7.2 RESULTS. Following the same procedure as for Experiment 1, the ratings were normalized to adjust for individual biases in rating scales. An inspection of the means in Table 6 shows that when the stimuli contained a discourse topic, participants gave higher ratings for the versions where the focus was referred to with a name. By contrast, when there was no discourse topic, the preference for the name versions disappeared. These scores were submitted to two two-way analyses of variance, one with subjects as a random variable, and one with items as a random variable. The critical result from this experiment was the two-way interaction between topicality and referring form (F1 (1, 111) = 5.101, p<.05; F2 (1, 23) = 7.451, p<.05). The ANOVA also revealed a main effect of topicality (F1 (1, 111) = 12.779, p<.001; F2 (1, 23), p<.01), and a marginal main effect of referring form (F1 (1, 111) = 3.733, p<.06; F2 (1, 23), p<.09).

TABLE 6
Normalized average scores for each category in Experiment 3.

                                                          pronoun         name
Reference to FOCUS in the Discourse Topic condition       4.52      <     4.73
Reference to FOCUS in the No Discourse Topic condition    4.43      =     4.42

7.3 DISCUSSION. The results of Experiment 3 demonstrated that changing the topicality of the non-focused NP has a significant effect on the salience of the focus of a wh-cleft. When the stimuli contained a discourse topic, names were preferred to refer to the new NP, which was introduced as the focus of a cleft. When the stimuli did not contain a clear discourse topic, the name preference disappeared, indicating that the focus was relatively more salient in the no-topic condition than the topic condition.

The results of this experiment have two important implications. First, they explain the different results of Experiments 1 and 2: In Experiment 1, where the focus-referent co-occurred with the relatively non-salient referent of the embedded subject, the focus-referent appeared to be highly salient. In Experiment 2, by contrast, the focus-referent co-occurred with a thrice-repeated discourse topic, and appeared no more salient than the object of a non-cleft. Experiment 3 demonstrated that when a discourse topic has been established by repeating one referent, the other referent becomes less salient, and we observe a shift towards a preference for names over pronouns. This result suggests that the difference in the results of Experiments 1 and 2 reflected a difference in the level of salience of the focused referent.

The second implication of these results is more generally that the degree of salience of one referent can affect the degree of salience of the other.
Recall that in Experiment 3, the degree of topicality of the NON-FOCUSED character was manipulated, but this had an effect on the degree of salience of the FOCUSED character. This result supports the idea that the language comprehender has limited attentional resources, and cannot pay attention to all aspects of the discourse equally. Thus, when one referent becomes more salient, other referents necessarily lose salience. This competitive view of salience is compatible with the view presented here, that salience is a scalar property that can rise or fall. The results of Experiment 3 further support a model in which salience is identified with the activation of mental representations, and where the representations of different referents in a particular discourse compete for activation. This approach is discussed further in section 8.1.

At the same time, this experiment produced a somewhat unexpected result, which is that there was no pronoun preference for the focus of clefts, in either condition. In the topical subject condition, there was a clear name preference for reference to the focus, but in the nontopical subject condition, participants on average rated both pronoun and name conditions equally (4.43 and 4.42, respectively). There are several possible reasons for this. On one hand, there may be a tendency for the first-mentioned NP to be fairly salient, regardless of whether it is the subject or not (Gernsbacher and Hargreaves, 1988). If this is the case, it is still competing with the focus to a certain degree, even when it is not clearly the discourse topic. A second possibility is that participants may view other entities as the discourse topic, such as "Everyone" in Figure 5 above, which would provide some competition for the focus. If this is the case, it may be that the focus construction is a weaker indicator of salience than identity as discourse or sentence topics, and more susceptible to competition from other entities.
In contrast, the referent of the grammatical subject may not be as susceptible to competition from the discourse topic. However, the data from Experiment 3 do not bear on these speculations, since this experiment did not manipulate whether the discourse contained a focus construction or not. For the same reason this result does not contradict the finding from Experiment 1 that the focus construction makes the focus relatively more salient than the grammatical object in comparable utterances.

8. GENERAL DISCUSSION. The major finding from Experiment 1 was that pronouns were preferred over names to refer to the referents of the subject (a topic position), the focus of cleft-like constructions, and the discourse topic. These results join a host of studies that show that grammatical subjects make their referents more salient (e.g., Hudson-D'Zmura and Tanenhaus, 1998; Gordon et al., 1993) and a handful of other studies that show that the focus is the most accessible entity in constructions like clefts (e.g., Almor, in press; Carpenter and Just, 1977). The corpus analysis revealed a second similarity between subjects and the foci of clefts: the entities denoted by both are the things that are most likely to be mentioned in the following utterance. I argued that the salience of subject-referents and focus-referents can be thought of in terms of the degree of likelihood that the entity will be mentioned again.

Experiment 2 investigated the relative salience of topics and the foci of clefts, and found that the salience associated with the focus is not a static feature. When there was a clearly established discourse topic, it made no difference whether or not the other character was introduced as the focus of a cleft, in that names were used predominantly for both objects and the focus of clefts. Thus, when the focus co-exists with a highly salient discourse topic, it is less likely to be perceived as salient.
The results of Experiment 3 suggested that the focus of a cleft is more salient when it is not competing with a discourse topic (or a sentential topic, for that matter): readers preferred names to refer to the focus of the cleft to a greater degree when the discourse contained a well-established, highly salient topic than when it did not.

8.1 TOPICALITY, SALIENCE, AND LANGUAGE PROCESSING. In section 5.3 I presented an account in which the degree of salience was linked to the degree to which comprehenders perceived an entity to be likely to be the topic of the following utterance. Under this view, the current topic-referent is salient because topics tend to persist over several utterances, and the focus-referent is salient because clefts signal a change in topic. This view also fits well with the data from Experiments 2 and 3, which suggest that discourse topics influence the comprehender's estimate of an entity's likelihood of being the topic in the following utterance.

Experiments 2 and 3 also showed that the salience of the discourse topic is stronger than that of the subject. This makes sense because, as noted by many authors, the correlation between grammatical subjects and topics is not perfect. Therefore comprehenders can only make probabilistic estimations of the topicality of a subject-referent. By contrast, if a referent is mentioned repeatedly throughout a discourse, comprehenders have good evidence that it is highly topical.17

Experiments 2 and 3 also revealed an important property of discourse salience: it is competitive. If one referent is highly salient, as is the case when there is a discourse topic, other referents become relatively less salient.
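The competitive property can be made concrete with a toy calculation. The Python sketch below is an illustration only and is not a model proposed or tested in this paper. It assumes that each referent accumulates evidence from cues such as cleft focus and repeated mention, and that a referent's salience is its share of the total evidence. The cue names, the weights, and the choice of a simple proportional share are all hypothetical.

```python
# Toy illustration of competitive salience.
# All weights are hypothetical and are not fitted to any data in this paper.
CUE_WEIGHTS = {
    "mentioned": 1.0,         # baseline for any referent in the utterance
    "cleft_focus": 2.0,       # referent is the focus of a cleft
    "repeated_mention": 3.0,  # referent is established as the discourse topic
}


def salience(referent_cues):
    """Map each referent to its share of the total cue evidence (shares sum to 1)."""
    evidence = {
        referent: sum(CUE_WEIGHTS[cue] for cue in cues)
        for referent, cues in referent_cues.items()
    }
    total = sum(evidence.values())
    return {referent: value / total for referent, value in evidence.items()}


# A cleft with no established discourse topic: focus vs. embedded subject.
no_topic = salience({
    "embedded_subject": ["mentioned"],
    "focus": ["mentioned", "cleft_focus"],
})

# The same cleft, but the embedded subject is also a repeated discourse topic.
with_topic = salience({
    "embedded_subject": ["mentioned", "repeated_mention"],
    "focus": ["mentioned", "cleft_focus"],
})

print(f"focus share without discourse topic: {no_topic['focus']:.2f}")
print(f"focus share with discourse topic:    {with_topic['focus']:.2f}")
```

The cues attached to the focus are the same in both calls, yet its share falls once the other referent gains the repeated-mention cue. This mirrors the qualitative pattern across Experiments 1 to 3, although the specific numbers carry no empirical weight.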
The account I have given so far (the "topicality account") is one where salience is viewed as a graded property which is dependent on the probability that a referent will be the topic of the following utterance, where this probability is sensitive to the competition between the existing referents at any point in the discourse. There is another way of saying the same thing, characterizing salience not in terms of topicality, but rather in terms of probabilistic activation (the "activation account"): comprehenders compute the probability that a given entity will be mentioned in the following utterance, on the basis of information like grammatical function, special constructions like clefts, or the repetition of reference to one entity. (For a discussion of other relevant sources of information, see Arnold, 1998.) This probability can be represented in terms of the activation of the comprehender's mental representations for each referent. It is this probability of re-mention that determines an entity's salience. When two sources of relevant information make two different things likely to be mentioned again, the two referents compete.

The topicality and activation accounts amount to largely the same thing, with subtle differences. One difference is that the activation account does not assume that comprehenders identify topics or foci in the current utterance, but only that they compute the likelihood of an entity being mentioned again. However, these two things are highly correlated. Topical things are defined as things that the sentence/discourse is "about", which means that coherent speakers should refer to them repeatedly.

17 This idea is formalized in Centering Theory, where the backward-looking center is similar to what I am calling the "topic". Grammatical subjects are taken as the highest-ranked forward-looking center (Brennan, Friedman, and Pollard, 1987), but referents are not established as backward-looking centers until they have been mentioned in two consecutive utterances.

Another difference is that the activation account does not assume, for example, that comprehenders identify one referent as the expected topic for the next utterance. Rather, it only says that some referents are RELATIVELY more likely to be mentioned again, and therefore are relatively more salient than others.

An attractive property of activation is that it makes predictions about language processing. If a referent is highly activated (i.e., salient), it is easier for the comprehender to reactivate their representation of that referent when it is mentioned again. If re-activation is easier, then comprehenders should not need as much information to achieve re-activation, and thus less-specified forms of reference like pronouns should be sufficient.

In sum, the topicality account is useful for thinking of how this approach relates to the well-established categories of topic and focus, and to the more general perception that speakers tend to be coherent. However, the activation account has attractive features for thinking about language processing in a way which is consistent with much of the processing literature. An advantage of the approach presented here, whether it is described in terms of topicality or activation, is that it provides a way of thinking of "salience" not as a primitive, but as the result of other discourse factors.

8.2. THE NATURE OF SALIENCE.

The above interpretation of these results has additional implications for the concept of salience. The view of salience presented here is more of a "forward-looking" property than a "backward-looking" property, borrowing the terms from Centering Theory.
That is, it suggests that the speaker focuses on some referents as salient because these referents are more central to what the speaker is saying. The listener takes cues from the speaker as to which elements are salient for the speaker and therefore more likely to be referred to in the upcoming discourse. This view contrasts with past accounts of reference, many of which have focused on the history of the discourse to determine the salience of any particular referent. For example, many researchers have demonstrated an effect of recency, such that more recently mentioned items are more salient than those that haven't been mentioned for a while (e.g., Birner, 1998; Clark and Sengul, 1979; Givón, 1983; Ariel, 1988).18 Similarly, the heightened salience of subjects and the focus of clefts could be seen as information about the history of the discourse.

However, two approaches to the study of reference have considered both "backward-looking" and "forward-looking" information to determine the salience of an entity for the purposes of choosing a referring form. For example, Givón (1983) and colleagues include three measures of topicality: Recency, Ambiguity, and Persistence. Recency, as already mentioned, is a purely backward-looking property. Persistence, on the other hand, measures the number of clauses that contain subsequent reference to an entity, a clearly forward-looking characteristic. In addition, Centering Theory explicitly posits both forward-looking and backward-looking properties of anaphoric expressions (e.g., Grosz et al., 1995).

Under the view presented here, information about the past discourse is important, but the reason that it is important is to provide information about the upcoming discourse. Information about the upcoming discourse is relevant because it makes some parts of the discourse predictable, and thus easier to process for the listener.
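The contrast between a backward-looking measure such as Recency and a forward-looking measure such as Persistence can be made concrete with a toy calculation. The sketch below is a simplification for illustration and is not Givón's (1983) coding procedure: it assumes a discourse that has already been segmented into clauses, each annotated with the set of entities it mentions, and the function names and example data are hypothetical.

```python
def recency(clauses, index, entity):
    """Backward-looking: clauses back to the entity's most recent mention.

    Counts from the clause at `index` to the nearest earlier clause that
    mentions the entity. Returns None if there is no earlier mention.
    """
    for distance, mentioned in enumerate(reversed(clauses[:index]), start=1):
        if entity in mentioned:
            return distance
    return None


def persistence(clauses, index, entity):
    """Forward-looking: number of later clauses that mention the entity."""
    return sum(1 for mentioned in clauses[index + 1:] if entity in mentioned)


# Hypothetical annotated discourse: one set of mentioned entities per clause.
clauses = [
    {"Ann", "Emily"},
    {"Ann"},
    {"Emily"},
    {"Emily", "Ann"},
    {"Emily"},
]

# Measures for "Emily" at the third clause (index 2).
print(recency(clauses, 2, "Emily"))
print(persistence(clauses, 2, "Emily"))
```

Only the second function looks at what follows the current clause, which is the kind of information that the view presented here treats as central.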
As the corpus analysis demonstrated, subjects and foci share the property that their referents are more predictable in the following utterance. This similarity is especially striking, because in other respects these two categories have been treated quite differently.

8.3. CONCLUSIONS.

In sum, this study has tackled the slippery constructs of salience, topic, and focus. The results point to a link between these concepts, where topic and focus constructions make their referents more probable as topics for the following utterance, thus making them salient. This can also be described in terms of activation, where the probability of a referent being mentioned again determines its degree of activation. When a referent is highly activated or salient, less-specified forms of reference are more natural for subsequent reference. This study uses the notion of probabilistic activation to ground the concept of salience in the mechanisms of language production and comprehension.

18 See Arnold, 1998, for a discussion of how recency can also be an indication of the importance of an entity to the upcoming discourse.

REFERENCES

Aissen, J. 1992. Topic and focus in Mayan. Language 68.43-80.
Almor, A. In press. Definite Noun-Phrase Anaphora and Focus. Psychological Review.
Ariel, M. 1988. Referring and accessibility. Journal of Linguistics 24.65-87.
Ariel, M. 1990. Accessing Noun-Phrase Antecedents. London: Routledge.
Arnold, J. E. 1998. Reference Form and Discourse Patterns. Ph.D. Dissertation, Stanford University.
Arnold, J. E., Trueswell, J. C. & Lawentmann, S. M. 1999. Using Common Ground to Resolve Referential Ambiguity. Poster presented at the 40th annual meeting of the Psychonomic Society, Los Angeles, CA.
Arnold, J. In press. Multiple Constraints on Reference Form: Null, Pronominal, and Full Reference in Mapudungun. Preferred Argument Structure: Grammar as Architecture for Function, ed. by J. W. Du Bois, L. E. Kumpf, & W. J. Ashby. John Benjamins.
Birner, B. J. 1998. Recency Effects in English Inversion. Centering Theory in Discourse, ed. by M. A. Walker, A. K. Joshi, & E. F. Prince, 309-323. Oxford: Clarendon Press.
Bower, G. H. & Morrow, D. G. 1990. Mental models in narrative comprehension. Science 247.44-48.
Brennan, S. E., Friedman, M. W. & Pollard, C. J. 1987. A Centering Approach to Pronouns. Proceedings from the 25th Annual meeting of the Association for Computational Linguistics, July 6-9, 1987.
Carpenter, P. A. & Just, M. A. 1977. Reading Comprehension as Eyes See It. Cognitive Processes in Comprehension, ed. by M. A. Just & P. A. Carpenter. Hillsdale, NJ: Lawrence Erlbaum Associates.
Chafe, W. 1994. Discourse, Consciousness, and Time. Chicago: Chicago University Press.
Clark, H. H. & Marshall, C. R. 1981. Definite reference and mutual knowledge. Elements of Discourse Understanding, ed. by A. K. Joshi, B. L. Webber, & I. A. Sag, 10-63. Cambridge: Cambridge University Press.
Clark, H. H. & Sengul, C. J. 1979. In search of referents for nouns and pronouns. Memory and Cognition 7.35-41.
Clark, H. H. 1992. Arenas of Language Use. Chicago: University of Chicago Press.
Clark, H. H. 1996. Using Language. Cambridge: Cambridge University Press.
Cushing, Steven. 1994. Fatal words: communication clashes and aircraft crashes. Chicago: University of Chicago Press.
DuBois, J. W. 1987. The Discourse Basis of Ergativity. Language 63(4).805-855.
Francik, E. P. 1985. Referential choice and focus of attention. Ph.D., Stanford University.
Gernsbacher, M. A. & Hargreaves, D. J. 1988. Accessing sentence participants: The advantage of first mention. Journal of Memory and Language 27.699-717.
Givón, T. 1983. Topic Continuity in Discourse: An Introduction. Topic Continuity in Discourse: A Quantitative Cross-Language Study, ed. by T. Givón, 1-42. Amsterdam: John Benjamins Publishing.
Gordon, P. C., Grosz, B. J. & Gilliom, L. A. 1993. Pronouns, Names, and the Centering of Attention in Discourse. Cognitive Science 17.311-347.
Grice, H. P. 1975. Logic and conversation. Syntax and Semantics 3: Speech acts, ed. by P. Cole & J. L. Morgan. New York: Academic Press.
Grosz, B. J., Joshi, A. K. & Weinstein, S. 1995. Centering: A Framework for Modelling the Local Discourse. Computational Linguistics 21(2).
Gundel, Jeanette K. 1988. Universals of topic-comment structure. Studies in syntactic typology, ed. by Michael Hammond, Edith A. Moravcsik, and Jessica R. Wirth. Amsterdam: John Benjamins.
Gundel, J. K., Hedberg, N. & Zacharski, R. 1993. Cognitive Status and the form of referring expressions. Language 69(2).274-307.
Hanna, J. E., Trueswell, J. C., Tanenhaus, M. K., and Novick, J. M. 1998. Consulting Common Ground During Referential Interpretation. Paper presented at the CUNY conference on human sentence processing.
Horton, W. S. & Keysar, B. 1996. When do speakers take into account common ground? Cognition 59.91-117.
Hudson-D'Zmura, S. & Tanenhaus, M. K. 1997. Assigning Antecedents to Ambiguous Pronouns: The role of the center of attention as the default assignment. Centering in Discourse, ed. by M. Walker, A. Joshi, & E. Prince. Oxford: Oxford University Press.
Johnson-Laird, P. 1983. Mental Models. Cambridge, MA: Harvard University Press.
Johnson-Laird, P. N. 1989. Mental Models. Foundations of Cognitive Science, ed. by M. Posner, 469-499. Cambridge, MA: MIT Press.
Kameyama, M. 1996. Indefeasible Semantics and Defeasible Pragmatics. Quantifiers, Deduction, and Context, ed. by M. Kanazawa, C. Piñón, & H. de Swart, 111-138. Stanford, CA: CSLI.
Lambrecht, Knud. 1994. Information structure and sentence form. Cambridge: Cambridge University Press.
Levelt, W. J. M. 1989. Speaking. Cambridge: MIT Press.
Levy, E. 1982. Toward an objective definition of "Discourse Topic". Papers from the Eighteenth Regional Meeting of the Chicago Linguistic Society, ed. by K. Tuite, R. Schneider, & R. Chametzky, 259-304. Chicago: CLS.
Lieberman, P. 1963. Some effects of the semantic and grammatical context on the production and perception of speech. Language and Speech 6.172-175.
MacDonald, M. C., Pearlmutter, N. J. & Seidenberg, M. S. 1994. The Lexical Nature of Syntactic Ambiguity Resolution. Psychological Review 101(4).676-703.
Marslen-Wilson, W. D. & Tyler, L. K. 1980. The temporal structure of spoken language understanding. Cognition 8(1).1-71.
Miller, G. 1979a. Construction and Selection in the Mental Representation of Text. Cahier De L'Institut de Linguistique de Louvain 5(1-2).185-197.
Miller, G. 1979b. Images and Models, Similes and Metaphors. Metaphor and Thought, ed. by A. Ortony. Cambridge: Cambridge University Press.
Morrow, D. G., Greenspan, S. L. & Bower, G. H. 1987. Accessibility and situation models in narrative comprehension. Journal of Memory & Language 26(2).165-187.
Prince, E. 1978. A Comparison of Wh-Clefts and It-Clefts in Discourse. Language 54(4).883-906.
Prince, E. F. 1981. Topicalization, Focus-Movement, and Yiddish-Movement: A Pragmatic Differentiation. Proceedings from the seventh annual meeting of the Berkeley Linguistics Society.
Prince, E. F. 1992. The ZPG Letter: Subjects, Definiteness, and Information-Status. Discourse Description: Diverse Linguistic Analyses of a Fund-Raising Text, ed. by W. C. Mann & S. A. Thompson, 295-325. Amsterdam: John Benjamins.
Reinhart, T. 1982. Pragmatics and Linguistics: An analysis of Sentence Topics. Ms, distributed by the Indiana University Linguistics Club.
Sanford, A. J. & Garrod, S. C. 1981. Understanding Written Language. New York: Wiley.
Stevenson, R. J., Crawley, R. A. & Kleinman, D. 1994. Thematic Roles, Focus and the Representation of Events. Language and Cognitive Processes 9(4).473-592.
Tanenhaus, M. K. & Trueswell, J. C. 1995. Sentence Comprehension. The Handbook of Perception and Cognition: Volume 11, ed. by J. Miller & P. Eimas, 217-262. Academic Press.
Tomlin, R. S., Forrest, L., Pu, M. M. & Kim, M. H. 1997. Knowledge Integration and Information Management in Discourse. Discourse: A multidisciplinary introduction, ed. by T. van Dijk. London: Sage.
Trueswell, J. C. & Tanenhaus, M. K. 1994. Towards a lexicalist framework of constraint-based syntactic ambiguity resolution. Perspectives on Sentence Processing, ed. by J. C. Clifton, L. Frazier, & K. Rayner. Hillsdale, NJ: Erlbaum.
van Dijk, T. A. & Kintsch, W. 1983. Strategies of discourse comprehension. New York: Academic Press.
Ward, G. & Birner, B. J. 1995. Definiteness and the English existential. Language 71.722-42.
Ward, G. & Birner, B. J. 1996. On the Discourse Function of Rightward Movement in English. Conceptual Structure, Discourse and Language, ed. by A. Goldberg, 463-479. Stanford: Center for the Study of Language and Information Publications.
Ward, G. & Birner, B. J. 1998. Information Status and Noncanonical Word Order in English. Philadelphia: John Benjamins.
Weil, H. 1887 (1844). The order of words in the ancient languages compared with that of the modern languages (C. Super, Trans.). (3rd ed.). Boston: Ginn & Co.

Appendix A: Stimuli for Experiment 1: Subject vs. Focus

1. The camp counselors took all the children and paired them up randomly for a game.
{Fortunately, Judy ended up with Sarah. / Fortunately, the one Judy ended up with was Sarah.}
{{Sarah/She} was known as the best kid in camp at all sports and games. / {Judy/She} was glad to be with someone she knew.}

2. The faculty met the other day to discuss the candidates for the new job.
{Carol supported Eleanor. / The one Carol supported was Eleanor.}
{{Eleanor/She} was the youngest candidate but the most impressive. / {Carol/She} tried earnestly to convince the other members of the faculty as well.}

3. By the time the project was turned in, everybody in the group hated everybody else.
{Stacy hated Diane the most. / The one Stacy hated the most was Diane.}
{{Diane/She} had done nothing but complain and boss other people around. / {Stacy/She} couldn't stand people who didn't pull their weight in a group project.}

4. The fencing coach told everyone in the class to find a partner.
{Tim asked Sean. / The one Tim asked was Sean.}
{{Sean/He} was not only a good fencer but also a good teacher. / {Tim/He} always tried to fence with people who were better than he was.}

5. The coach told all the girls on the varsity team to help those on the JV team.
{Erica had to help Elaine. / The one Erica had to help was Elaine.}
{{Elaine/She} was hard to help because she wasn't very motivated. / {Erica/She} hated having to spend time helping the younger players.}

6. The guests were nervously standing around the living room, trying to decide which person to talk to.
{Ann decided to say hi to Emily first. / The one Ann decided to say hi to first was Emily.}
{{Emily/She} looked like the friendliest person in the group. / {Ann/She} hated to be in a room full of people where no-one was talking.}

7. The runners stood at the starting line, sizing each other up.
{Amy had to watch out for Cynthia. / The one Amy had to watch out for was Cynthia.}
{{Cynthia/She} was known for tripping her opponents at the far end of the track. / {Amy/She} was cautious of anyone who looked that confident.}

8. The Smith brothers were trying to decide which friend to invite skiing.
{Ron wanted to invite Fred. / The one Ron wanted to invite was Fred.}
{{Fred/He} had never been skiing before and would love the chance to try it. / {Ron/He} felt guilty not being able to bring all his friends.}

9. The students found out that somebody had vandalized the classroom, but nobody knew who.
{John suspected Tom. / The one John suspected was Tom.}
{{Tom/He} was the most violent student in the class. / {John/He} couldn't imagine anyone else doing such an awful thing.}

10. The students filed into the class on the first day of school and looked for seats.
{Alex sat next to Tony. / The person Alex sat next to was Tony.}
{{Tony/He} was the new kid and looked like he needed a friend. / {Alex/He} was careful to sit next to someone he liked on the first day.}

11. The TAs for the class were trying to think of someone to invite as a guest lecturer.
{Immediately Jason suggested Andy. / The person Jason suggested immediately was Andy.}
{{Andy/He} was a salient researcher as well as a good speaker. / {Jason/He} had been a TA before and knew who would be good.}

12. At the end of each year the high school basketball team votes for "Player of the Year".
{This year, Eric voted for Nathan. / This year, the one Eric voted for was Nathan.}
{{Eric/He} always voted for the best defense player rather than the best all-around player. / {Nathan/He} was the best all-around player on the team.}