CHAPTER 3

STRUCTURAL PROPERTIES OF SENTENCES

One reason why we can process speech so rapidly is our ability to make systematic use of structure in natural language. What do we mean by structure in language? We can define the structure of language in terms of sets of rules that tell us how words strung together can form a sentence and convey a meaning. When we speak of rules that give structure to language, we do not mean that a speaker consciously follows these rules when uttering a sentence. As Levelt (1989) has said, “A speaker doesn’t have to ponder the issue of whether to make the recipient of give an indirect object (as in John gave Mary the book) or an oblique object (as in John gave the book to Mary).” Nor, Levelt goes on to suggest, does the retrieval of common words require much time or conscious effort (p. 22). These are “automatic” processes over which we exert little conscious control. Yet, for communication to occur, the speaker and the listener must share a common knowledge base, and each must have access to the same knowledge sets and rules.

Think for a moment of a simple “sentence” in the abstract, a sentence following the noun-verb-noun form. Think now of the same “sentence” but in the form of an action, The first noun verbed the second noun. Finally, let us instantiate this sentence with specific words:

The student read the book.
The teacher graded the test.
The teacher heard the student.

Although all three sentences take the form The noun verbed the noun, the first two sentences are not reversible. That is, while you can say, “The student read the book,” or “The teacher graded the test,” you cannot say, “The book read the student,” or “The test graded the teacher.” Only the third sentence is reversible. You can just as easily say, “The student heard the teacher,” as “The teacher heard the student.” Some actions are possible, and some are not. Real-world knowledge can supply constraints that operate as part of the structure of our language.
These properties of language give rise to regularities that make possible a degree of statistical prediction whenever we listen to natural speech. To illustrate, let us begin with the fact that the average college-educated adult may have a speaking vocabulary of 75,000 to 100,000 words (Oldfield, 1963). Suppose someone was about to say a word to you, and you had to guess what the word might be. If all words in the language were equally probable, the probability of it being any particular word would be between .00001 and .000013 (that is, between 1 in 100,000 and 1 in 75,000). Now, clearly, each word is not equally probable. Some words tend to be used much more frequently than others. In writing, the most frequently used word is the, and in spoken telephone conversations, it is I. In fact, the 50 most commonly used words in English make up about 60% of all the words we speak, and about 45% of those we write. We can put this another way: On average, we speak only about 10 to 15 words before we repeat a word (Miller, 1951). Thus, some words are more predictable than others, even out of context.

When words are heard within a context, this predictability increases even further. Imagine someone started to speak to you, but then stopped suddenly in midsentence. If you were asked at that point what you thought the next word might be, you might have a good idea. You could at least say what part of speech the next word might be, whether it would probably be a noun, a verb, an adjective, and so forth. Indeed, you would stand a good chance of correctly guessing the word itself. If someone said, “The train pulled into the ...,” you might say “station,” or you might say “tunnel.” From your knowledge of language, you would, at the very least, have a high expectation for either a noun or an adjective.

Statistical Approximations to English

We can capture this predictive quality of natural language by giving people a few words of a sentence and asking them to guess what they think the next word might be.
We then show this set of words to another person and ask that person to guess the next word, and so on. In this way one can see what people’s linguistic intuitions look like with varying amounts of preceding context. For example, Moray and Taylor (1960) showed subjects the five words, “I have a few little,” and asked the subjects to guess what they thought the next word of this sentence might be. One subject said, “facts.” Moray and Taylor added the word facts, covered the first word, I, and showed “... have a few little facts ______” to another subject. This subject said, “here.” A third subject saw the last set of five words, “... a few little facts here ______,” and was asked to guess the sixth word. This process was continued until an entire 150-word passage was constructed. This example is called a sixth-order approximation to English, because each word was generated based on a context of five preceding words.

Here is an extract from Moray and Taylor’s sample. As you read it, it seems as if our artificial speaker is continually on the verge of saying something meaningful, but never quite does:

I have a few little facts here to test lots of time for studying and praying for guidance in living according to common ideas as illustrated by the painting.

A second-order approximation, where subjects guess the most likely word of a sentence based on seeing only one word of context, would be somewhat less English-like:

The camera shop and boyhood friend from fish and screamed loudly men only when seen again and then it was jumping in the tree.

You might ask what would happen if one created approximations to English after giving subjects a specific context, such as telling them that the words are from a political campaign speech, a romantic novel, or a legal document.
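The guessing procedure described above has a simple mechanical analogue, and a short program may make the idea of an nth-order approximation more concrete. The sketch below is an illustration only and is not the method Moray and Taylor used: instead of asking human subjects, it picks each next word by sampling from the words that followed the same context in a sample text. The function names (build_model, approximate) and the tiny SAMPLE_TEXT are hypothetical and were invented for this example. Following the convention used in this section, an nth-order approximation conditions each word on the n - 1 words before it.

```python
import random
from collections import defaultdict

# Hypothetical sample text, invented for this illustration only.
SAMPLE_TEXT = (
    "the student read the book and the teacher graded the test "
    "the teacher heard the student and the student heard the teacher "
    "the train pulled into the station and the student read the test"
)


def build_model(words, order):
    """Map each context of (order - 1) words to the words that followed it."""
    context_len = order - 1
    model = defaultdict(list)
    for i in range(len(words) - context_len):
        context = tuple(words[i:i + context_len])
        # Repeated followers are kept, so frequent continuations are
        # proportionally more likely to be sampled later.
        model[context].append(words[i + context_len])
    return model


def approximate(words, order, length, seed=0):
    """Generate up to `length` words as an order-th approximation to `words`."""
    if order < 1:
        raise ValueError("order must be at least 1")
    rng = random.Random(seed)
    context_len = order - 1
    model = build_model(words, order)
    # Start from a context that actually occurs in the sample text.
    start = rng.randrange(len(words) - context_len)
    output = list(words[start:start + context_len])
    while len(output) < length:
        context = tuple(output[-context_len:]) if context_len else ()
        followers = model.get(context)
        if not followers:  # this context was never continued in the sample
            break
        output.append(rng.choice(followers))
    return output


corpus_words = SAMPLE_TEXT.split()
for n in (1, 2, 3):
    print(n, " ".join(approximate(corpus_words, order=n, length=12, seed=n)))
```

With so small a sample text the higher orders mostly reproduce stretches of the original, but the general trend is the one described above: the more preceding words each choice depends on, the more sentence-like the output tends to be.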
The following is a fourth-order approximation to English (each word is based on only three words of prior context), produced when respondents were told the words were taken from a mystery novel:

When I killed her I stabbed Paul between his powerful jaws clamped tightly together. Screaming loudly despite fatal consequences in the struggle for life began ebbing as he coughed hollowly spitting blood from his ears (Attneave, 1959, p. 19).

We have long known that increasing the likelihood of words by increasing contextual constraints, either with sentences or with statistical approximations to English, will make the words easier to remember (Miller & Selfridge, 1950), more audible under poor listening conditions (Rubenstein & Pollack, 1963), and more recognizable if they are presented visually for brief durations (Tulving & Gold, 1963; Morton, 1964).

Where Do People Pause When They Speak?

Clearly, listeners know a great deal about the structure of their native language. The speech we hear also has an intonation pattern and rhythm to it that can give the listener hints about what is about to be heard. One of these hints can come from the periodic appearance of pauses in spontaneous speech, whether they are “filled” with uhms and ohs, or with silence. They occur as the speaker thinks of what to say and how to phrase it. Some estimates suggest that as much as 40 to 50 percent of speaking time is occupied by pauses that occur as we select the words we wish to utter.

What happens to these natural pauses when we reduce the planning demands on the speaker? Reading aloud from a script does reduce the proportion of pausing, but it may be impossible for a speaker to speak sensibly without pausing at least 20 percent of the time (Butterworth, 1989, p. 128). Systematic studies verify that the pauses in connected speech tend to occur just before words of low probability in the context, the “thoughtful” words that do not represent a run of association.
They suggest that in fluent speech we do not pause to take a breath. Rather, we take the opportunity to breathe during natural pauses determined by the linguistic content of what we are saying (Goldman-Eisler, 1968). In short, although speech that departs from an expected pattern will be harder to predict, the nature of the speech act itself can signal the listener that such an event is upcoming.

The lesson to be drawn from this discussion is that sentence perception is a surprisingly active process, even though it is ordinarily accomplished rapidly and without conscious effort. Sentence processing represents a continual analysis of the incoming speech stream to detect the structure and meaning of speakers’ utterances as they are being heard. In order to discover how sentence processing takes place, we must understand how the listener accomplishes syntactic and semantic processing. As we shall see, some theorists have claimed that we conduct syntactic and semantic analysis independently, and others have claimed that we ordinarily carry them out at the same time in an interactive fashion.

SYNTACTIC PROCESSING

Resolution Is Necessary for Comprehension

Although the statistical properties of language say something about the consequences of the speaker’s and listener’s knowledge of language structure, they do not themselves explain this structure. During the 1960s some researchers attempted to use transformational grammar to fulfill this goal. These attempts made two important points relevant to our discussion: the difference between surface structure and deep structure, and the difference between competence and performance.

Surface Structure Versus Deep Structure

The first point was a distinction between the surface structure and the deep structure of a sentence. The surface structure of a sentence is represented by the words you actually hear spoken or read: the specific words we have chosen to convey the meaning of what it is we wish to say.
The listener must “decode” this surface structure to discover the meaning that underlies the utterance, the “deep structure” of the sentence. Some sets of sentences have different surface structures, but the same deep structure. An example would be the pair of sentences, The boy threw the ball, and The ball was thrown by the boy. The specific words used (the surface structures) are obviously different. The first sentence is a simple active declarative, and the second is a passive. In spite of this difference, both sentences focus on the fact that a boy threw a ball. The two sentences have different surface structures, but they convey the same meaning. They have the same deep structure.

By contrast, some sentences can have the same surface structure, but different deep structures. A well-known example is the sentence, Flying planes can be dangerous. This sentence could mean that it is dangerous to be a pilot, or it could mean that living near an airport can be dangerous.

The distinction between deep structure and surface structure makes an important point for our understanding of sentence processing. It tells us that sentence processing is conducted in two steps in which the listener analyzes the surface structure and uses this information to detect the deep structure. The latter step conveys the meaning of the sentence that is the primary goal of the communicator (Fodor, Bever, & Garrett, 1974).

Competence Versus Performance

The second point is that the way people produce language is not equivalent to their knowledge of language. Much of what we say consists of incomplete fragments that do not even approach a grammatical sentence (Goldman-Eisler, 1968). This does not mean that we lack the knowledge to produce a complete sentence, or that we do not know the difference between an ungrammatical fragment and a grammatical sentence.
The specification of these rules is critical to an understanding of language competence, that is, what the speaker knows about the structure of the language (Chomsky, 1957, 1965). A theory of performance requires an explanation of how we can understand speech, however incomplete and fragmentary it may be. A complete theory of sentence processing must thus take into account both competence and performance.

SENTENCE PARSING AND SYNTACTIC AMBIGUITY

In the previous section we saw how comprehenders extract syntactic structure from sentences in the form of clausal units. Comprehenders also extract syntactic structure while they are processing clauses word by word. Models of sentence parsing address how the syntactic functions of individual words determine the overall syntactic structure of clauses and sentences. Researchers have found that the way listeners and readers handle ambiguities can offer valuable insights into general processing principles in language comprehension.

Local Ambiguity Versus Standing Ambiguity

Syntactic ambiguity refers to cases where a clause or sentence may have more than one interpretation given the potential grammatical functions of the individual words. The occurrence of such ambiguities, and the fact that language comprehension runs along smoothly in spite of them, have long been of interest in psycholinguistics. There are two types of syntactic ambiguity of interest. The first is referred to as local ambiguity, and the second is referred to as standing ambiguity.

Local ambiguity refers to cases where the syntactic function of a word, or how to parse a sentence, remains temporarily ambiguous until it is later clarified as we hear more of the sentence (Frazier & Rayner, 1989). For example, consider the sentence, When Fred passes the ball, it always gets to its target. This sentence is temporarily ambiguous when we hear the noun phrase
the ball, because it could be completed in two different ways, corresponding to two possible syntactic structures. For instance, another completion might be, When Fred passes, the ball always gets to its target. The ambiguity is referred to as local ambiguity because our uncertainty about the structure of the sentence is only temporary. Once the reader or listener has encountered the phrase “it always gets” or “always gets,” the ambiguity is resolved.

If we are forced to remain uncertain for too long (if the disambiguating information doesn’t arrive right away), we will find a sentence increasingly hard to understand. The sentence, The rat the cat the dog chased bit ate the cheese, is difficult because we must hold too many incomplete substructures in memory before the sentence is finally complete and the full structure can be seen.

Abney and Johnson (1991) clearly summarize the complexities of memory requirements and parsing strategies in the resolution of local ambiguities. A parser could adopt a wait-and-see attitude, holding off making a decision until more information is available. This, however, would tax memory. On the other hand, a parsing strategy that keeps memory load to a minimum would increase the risk of making many preliminary parsing errors at points of local syntactic ambiguity. Some theorists, such as Frazier (1979), have emphasized the need to minimize memory requirements; others, such as Marcus (1980), have emphasized the need to avoid local ambiguities, hence putting a greater burden on memory.

Standing ambiguity refers to cases where sentences remain syntactically ambiguous even when all of the lexical information has been received. For example, the sentence, The old books and magazines were on the bench, remains ambiguous even when the sentence is finished.
That is, it is not clear whether there should be a boundary after books (the books were old, but the magazines may not have been), or whether a boundary should follow magazines (making it clear that both the books and the magazines were old). Similarly, the sentence, I saw the man with the binoculars, does not make clear who has the binoculars. Sentences such as these can only be disambiguated by the broader context in which they are encountered.

MODELS OF SENTENCE PARSING

To understand how theorists have used ambiguous sentences to understand syntactic parsing, consider the sentence, The old man the boats. If you had trouble understanding this sentence, it is probably because you read The old man and assumed it was a noun phrase. When you reached the end of the sentence (the boats) and found no verb, you knew that either the sentence made no sense at all or your initial understanding of the sentence was wrong. If you went back and realized that The old is the noun phrase and man is the verb (meaning “to operate”), the sentence made sense. Sentences that, like this one, are especially misleading when you first encounter them are called “garden path” sentences.

Let us think for a moment about how you might process a sentence such as The old man the boats. Most people’s intuition is that we initially hear only one meaning of the sentence as we are listening to it. Because of this, when we reach the end of the sentence and discover we have done something wrong, we must go back and attempt to reparse the sentence in a different way. Alternatively, your intuition might tell you that as we listen to sentences that contain syntactic ambiguities we process both possible meanings, even though we are consciously aware of only one of them.
In this case, when we get to the end of the sentence and discover our interpretation was wrong, we could solve the problem by switching our attention to the alternative interpretation that has already been generated, albeit at the unconscious level.

Interestingly, versions of each of these two possibilities have been proposed in the psycholinguistics literature. A theory similar to the first possibility is sometimes referred to as the garden path model of sentence processing. A theory similar to the second possibility is sometimes referred to as the constraint satisfaction model.

THE ROLE OF PROSODY IN SENTENCE PROCESSING

Prosody is a general term for the variety of acoustic features (what we hear) that ordinarily accompany a spoken sentence. One prosodic feature is the intonation pattern of a sentence. Intonation refers to pitch changes over time, as when a speaker’s voice rises in pitch at the end of a question or drops at the end of a sentence. A second prosodic feature is word stress, which is, in fact, a complex subjective variable based on loudness, pitch, and timing. Two final prosodic features are the pauses that sometimes occur at the ends of sentences or major clauses and the lengthening of final vowels in words immediately prior to a clause boundary (Cooper & Sorensen, 1981; Ferreira, 1993; Streeter, 1978).

Prosody plays numerous important roles in language processing. Prosody can indicate the mood of a speaker (happy, angry, sad, sarcastic), it can mark the semantic focus of a sentence (Jackendoff, 1972), and it can be used to disambiguate the meaning of an otherwise ambiguous sentence, such as I saw a man in the park with a telescope (Beach, 1991; Ferreira, Henderson, Anes, Weeks, & McFarlane, 1996; Wales & Toner, 1979).

A more subtle effect of prosody is the way it can be used to mark major clauses of a sentence. Consider the sentence, In order to do well, he studied very hard.
If you say this sentence aloud, you will notice how clearly the clause boundary (indicated here by the comma) is marked by intonation, stress, and timing. Note especially how speakers automatically lengthen the final vowel in the word just prior to the clause boundary (in this case, the word well).

Although Garrett and his colleagues used an ingenious splicing technique to eliminate prosodic cues, this had the effect of underestimating their importance when such cues were present. When studies analogous to the click studies are conducted, but with the formal clause boundary and the prosodic marking for a clause boundary placed in direct conflict, clicks migrate to the point marked by prosody just as often as to the formal syntactic boundary (Wingfield & Klein, 1971).

Probably the experiment that cast the most dramatic doubt on whether the click studies were tapping on-line perceptual segmentation, rather than reflecting a post-perceptual response bias, was a study conducted by Reber and Anderson (1970). They found results parallel to the original click studies even when subjects were falsely told that the sentences they would hear contained “subliminal” clicks and were asked to say where they thought these clicks had occurred. Although no clicks were actually presented, subjects more often reported having heard them at clause boundaries than within clauses.

It is certainly the case that clauses are important to the way people remember speech. In one series of experiments, subjects heard a tape-recorded passage that was stopped without warning at various spots in the passage. The moment the tape stopped, subjects were asked to recall as large a segment as possible of what had just been heard. Generally, subjects’ recall was bounded by full clauses, just as one would expect if major linguistic clauses do have structural integrity (Jarvella, 1970, 1971).
The importance of clause boundaries and other syntactic constituents can also be demonstrated by giving subjects tape-recorded passages and telling them to interrupt the tape whenever they want to immediately recall what they have just heard. In such cases, subjects reliably press the tape recorder pause button to give their recall at periodic intervals corresponding exactly with the ends of major clauses and other important syntactic boundaries (Wingfield & Butterworth, 1984).

We should not dismiss all elements of an autonomy principle out of hand. Indeed, we will later review evidence for some degree of autonomous processing in the form of activation of word meaning independent of the sentence context in which the word is embedded. Few writers today, however, espouse the early version of syntactic autonomy that implies that analysis at the semantic level must await completion of a full clause or sentence boundary in the speech stream.

We do not want to suggest that clauses are unimportant units in sentence processing. Rather, our question is whether both syntactic and semantic analyses occur together and continuously interact as we hear a sentence. Let us examine the principles of an interactive view of sentence processing before returning to the arguments for processing autonomy still current in the literature.