What cross-linguistic variation tells us about information density in on-line processing John A. Hawkins UC Davis & University of Cambridge Patterns of variation across languages provide relevant evidence for current issues in psychology on information density in online processing. 2 Some background, first of all. I have argued (Hawkins 1994, 2004, 2009, to appear) for a ‘Performance-Grammar Correspondence Hypothesis’: 3 Performance-Grammar Correspondence Hypothesis (PGCH) Languages have conventionalized grammatical properties in proportion to their degree of preference in performance, as evidenced by patterns of selection in corpora and by ease of processing in psycholinguistic experiments. 4 I.e. languages have conventionalized or ‘fixed’ in their grammars the same kinds of preferences and principles that we see in performance, esp. in those languages in which speakers have alternatives to choose from in language use 5 E.g. between: alternative word orders relative clauses with or without a relativizer, with a gap or a resumptive pronoun extraposed vs non-extraposed phrases ‘Heavy’ NP Shift or no shift alternative ditransitive constructions zero vs non-zero case markers and so on 6 The patterns and principles found in these selections are, according to the PGCH, the same patterns and principles that we see in grammars in languages with fewer conventionalized options (more fixed orderings, gaps only in certain relativization environments, etc). 7 If so, linguists developing theories of grammar and of typological variation need to look seriously at theories of processing, in order to understand which structures are selected in performance, when, and why, with the result that grammars come to conventionalize these, and not other, patterns. See Hawkins (2004, 2009, to appear) 8 Conversely, psychologists need to look at grammars and at cross-linguistic variation in order to see what they tell us about processing. 
since grammars are conventionalized processing preferences. 9 Alternative variants across grammars are also, by hypothesis, alternatives for efficient processing. And the frequency with which these alternatives are conventionalized is, again by hypothesis, correlated with their degree of preference and efficiency in processing. 10 Looking at grammatical variation from a processing perspective can be revealing, therefore. 11 E.g. Japanese, Korean and the Dravidian languages do not move heavy and complex phrases to the end of their clauses, as English does; they move them to the beginning, in proportion to their (relative) complexity. If your psychological model predicts that all languages should be like English, then you need to go back to the drawing board and look at these different grammars, and at their performance, before you define and test your model further. 12 Which brings me to today’s topic: What do grammars and typological variation tell us about information density in on-line processing? 13 Let us define Information as: the set of linguistic forms {F} (phonemes, morphemes, words, etc) and the set of properties {P} (ultimately semantic properties in a semantic representation) that are assigned to them by linguistic convention and in processing. 14 Let us define Density as: the number of these forms and properties that are assigned at a particular point in processing, i.e. the size of a given {Fi}-{Pi} pairing at point … i … in on-line comprehension or production. 15 I see evidence for two very general and complementary principles of information density in cross-linguistic patterns. 16 First, minimize {Fi}: minimize the set {Fi} required for the assignment of a particular Pi or {Pi}. I.e. minimize the number of linguistic forms that need to be processed at each point in order to assign a given morphological, syntactic or semantic property or set of properties to these forms on-line. 
17 The conditions that determine the degree of permissible minimization can be inferred from the patterns themselves and essentially involve efficiency and ease of processing in the assignment of {Pi} to {Fi}. 18 Examples will be given from morphological hierarchies and from syntactic patterns such as word order and filler-gap dependencies. 19 Second, maximize {Pi} maximize the set {Pi} that can be assigned to a particular Fi or {Fi}. I.e. select and arrange linguistic forms so that as many as possible of their (correct) syntactic and semantic properties can be assigned to them at each point in on-line processing. 20 A set of linear ordering universals will be presented in which category A is systematically preferred before B regardless of language type, i.e. A + B. Positioning B first would always result in incomplete or incorrect assignments of properties to B on-line, whereas positioning it after A permits the full assignment of properties to B at the time it is processed. These universals provide systematic evidence for maximize {Pi}. 21 Consider first some grammatical patterns from morphology that support the minimize {Fi} principle minimize the set {Fi} required for the assignment of a particular Pi or {Pi} 22 In Hawkins (2004) I formulated the following principle of form minimization based on parallel data from crosslinguistic variation and language-internal selection patterns. 23 Minimize Forms (MiF) The human processor prefers to minimize the formal complexity of each linguistic form F (its phoneme, morpheme, word or phrasal units) and the number of forms with unique conventionalized property assignments, thereby assigning more properties to fewer forms. These minimizations apply in proportion to the ease with which a given property P can be assigned in processing to a given F. 24 The basic premise of MiF is that the processing of linguistic forms and their conventionalized property assignments requires effort. 
Minimizing the forms required for property assignments is efficient since it reduces that effort by fine-tuning it to information that is already active in processing through accessibility, high frequency, and inferencing strategies of various kinds. 25 MiF is visible in two sets of variation data across and within languages. The first involves complexity differences between surface forms (morphology and syntax), with preferences for minimal expression (e.g. zero morphemes) in proportion to their frequency of occurrence and hence ease of processing through degree of expectedness (cf. Levy 2008, Jaeger 2006). 26 E.g. singular number for nouns is much more frequent than plural, and absolutive case is more frequent than ergative. Correspondingly singularity on nouns is expressed by shorter or equal morphemes, often zero (cf. English cat vs. cat-s), almost never by longer ones. Similarly for absolutive and ergative case marking. 27 A second data pattern captured in MiF involves the number and nature of lexical and grammatical distinctions that languages conventionalize. The preferences are again in proportion to their efficiency, including frequency of use. 28 There are preferred lexicalization patterns across languages. Certain grammatical distinctions are cross-linguistically preferred: certain numbers on nouns; certain tenses; aspects; causativity; some basic speech act types; thematic roles like Agent, Patient, etc. 29 The result is numerous ‘hierarchies’ of lexical and grammatical patterns, e.g. the famous color term hierarchy of Berlin & Kay (1969), and the Greenbergian morphological hierarchies. 30 Where we have comparative performance and grammatical data for these hierarchies it is very clear that the grammatical rankings (e.g. Singular > Plural) correspond to a frequency/ease of processing ranking, with higher positions receiving less or equal formal marking and more or equal unique forms for the expression of that category alone. 
31 Form Minimization Prediction 1 The formal complexity of each F is reduced in proportion to the frequency of that F and/or the processing ease of assigning a given P to a reduced F (e.g. to zero). 32 The cross-linguistic effects of this can be seen in the following Greenbergian (1966) morphological hierarchies (with reformulations and revisions by the authors shown): 33
Sing > Plur > Dual > Trial/Paucal (for number) [Greenberg 1966, Croft 2003]
Nom/Abs > Acc/Erg > Dat > Other (for case marking) [Primus 1999]
Masc, Fem > Neut (for gender) [Hawkins 2004]
Positive > Comparative > Superlative [Greenberg 1966]
34 Greenberg pointed out that these grammatical hierarchies define performance frequency rankings for the relevant properties in each domain. The frequencies of number inflections on nouns in a corpus of Sanskrit, for example, were: Singular = 70.3%; Plural = 25.1%; Dual = 4.6%. 35 By MiF Prediction 1 we therefore expect: For each hierarchy H the amount of formal marking (i.e. phonological and morphological complexity) will be greater or equal down each hierarchy position. 36 E.g. in (Austronesian) Manam (Lichtenberk 1983):
3rd Singular suffix on nouns = 0
3rd Plural suffix = -di
3rd Dual suffix = -di-a-ru
3rd Paucal suffix = -di-a-to
The amount of formal marking increases from singular to plural, and from plural to dual, and is equal from dual to paucal, in accordance with the hierarchy prediction. 37 Form Minimization Prediction 2 The number of unique F:P pairings in a language is reduced by grammaticalizing or lexicalizing a given F:P in proportion to the frequency and preferred expressiveness of that P in performance. 38 In the lexicon the property associated with teacher is frequently used in performance, that of teacher who is late for class much less so. The event of X hitting Y is frequently selected, that of X hitting Y with X’s right hand less so. 
The more frequently selected properties are conventionalized in single lexemes or unique categories and constructions. Less frequently used properties must then be expressed through word and phrase combinations and their meanings must be derived by semantic composition. 39 This makes the expression of more frequently used meanings shorter, that of less frequently used meanings longer, and this pattern matches the first pattern of less versus more complexity in the surface forms themselves correlating with relative frequency. Both patterns make utterances shorter and the communication of meanings more efficient overall, which is why I have collapsed them both into one common Minimize Forms principle. 40 By MiF Prediction 2 we expect: For each hierarchy H (A > B > C) if a language assigns at least one morpheme uniquely to C, then it assigns at least one uniquely to B; if it assigns at least one uniquely to B, it does so to A. 41 E.g. a distinct Dual implies a distinct Plural and Singular in the grammar of Sanskrit. A distinct Dative implies a distinct Accusative and Nominative in the case grammar of Latin and German (or a distinct Ergative and Absolutive in Basque, cf. Primus 1999). 42 A unique number or case assignment low in the hierarchy implies unique and differentiated numbers and cases in all higher positions. 43 I.e. grammars prioritize categories for unique formal expression in each of these areas in proportion to their relative frequency and preferred expressiveness. This results in these hierarchies for conventionalized categories whereby languages with fewer categories match the performance frequency rankings of languages with many. 44 By MiF Prediction 2 we also expect: For each hierarchy H any combinatorial features that partition references to a given position on H will result in fewer or equal morphological distinctions down each lower position of H. 45 E.g. 
when gender features combine with and partition number, unique gender-distinctive pronouns often exist for the singular and not for the plural (English he/she/it vs they); the reverse uniqueness is not found (i.e. with a gender-distinctive plural, but gender-neutral singular). 46 More generally MiF Prediction 2 leads to a general principle of cross-linguistic morphology: Morphologization A morphological distinction will be grammaticalized in proportion to the performance frequency with which it can uniquely identify a given subset of entities {E} in a grammatical and/or semantic domain D. 47 This enables us to make sense of ‘markedness reversals’. E.g. in certain nouns in Welsh whose referents are much more frequently plural than singular, like ‘leaves’ and ‘beans’, it is the singular form that is morphologically more complex than the plural:
deilen ("leaf") vs. dail ("leaves")
ffäen ("bean") vs. ffa ("beans")
Cf. Haspelmath (2002:244). 48 All of these data provide support for our minimize {Fi} principle: minimize the set {Fi} required for the assignment of a particular Pi or {Pi}. I.e. minimize the number of linguistic forms that need to be processed at each point in order to assign a given morphological, syntactic or semantic property or set of properties to these forms on-line. 49 Either the surface forms of the morphology are reduced, in proportion to frequency and/or ease of processing, or lexical and grammatical categories are given priority for unique formal expression, in proportion to frequency and/or preferred expression, resulting in reduced morpheme and word combinations for their expression. 50 The result of both is more minimal forms in proportion to frequency/ease of processing/preferred expressiveness, i.e. fewer and shorter forms for the expression of the speakers’ preferred meanings in performance. 
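The two Minimize Forms predictions for morphological hierarchies can be checked mechanically against a paradigm. The following Python sketch is illustrative only: the function names are hypothetical, and counting hyphen-separated morphs is an assumed stand-in for "amount of formal marking", not a measure proposed in the talk. The data are the Manam suffixes and Sanskrit frequencies cited above.

```python
# Illustrative sketch; function names and the morph-count measure are assumptions.

NUMBER_HIERARCHY = ["Singular", "Plural", "Dual", "Paucal"]

# Manam 3rd person suffixes (Lichtenberk 1983), as cited in the text; "" = zero.
MANAM_SUFFIXES = {
    "Singular": "",
    "Plural": "-di",
    "Dual": "-di-a-ru",
    "Paucal": "-di-a-to",
}

# Sanskrit noun inflection frequencies in percent, as cited in the text.
SANSKRIT_FREQ = {"Singular": 70.3, "Plural": 25.1, "Dual": 4.6}


def morph_count(suffix: str) -> int:
    """Count hyphen-separated morphs; the zero suffix counts as 0."""
    return len([m for m in suffix.split("-") if m])


def frequencies_decline(hierarchy, freq) -> bool:
    """Performance frequency should not increase down the hierarchy."""
    values = [freq[pos] for pos in hierarchy if pos in freq]
    return all(a >= b for a, b in zip(values, values[1:]))


def satisfies_prediction_1(hierarchy, suffixes) -> bool:
    """Formal marking must be greater or equal at each lower position."""
    sizes = [morph_count(suffixes[pos]) for pos in hierarchy if pos in suffixes]
    return all(a <= b for a, b in zip(sizes, sizes[1:]))


def satisfies_prediction_2(hierarchy, unique_forms) -> bool:
    """A unique form at a lower position implies one at every higher position."""
    flags = [pos in unique_forms for pos in hierarchy]
    # Once a position lacks a unique form, no lower position may have one.
    return all(a or not b for a, b in zip(flags, flags[1:]))


if __name__ == "__main__":
    print(frequencies_decline(NUMBER_HIERARCHY, SANSKRIT_FREQ))
    print(satisfies_prediction_1(NUMBER_HIERARCHY, MANAM_SUFFIXES))
    # Sanskrit-type system: unique Singular, Plural and Dual forms.
    print(satisfies_prediction_2(NUMBER_HIERARCHY, {"Singular", "Plural", "Dual"}))
    # Hypothetical system with a Dual but no Plural: excluded by the prediction.
    print(satisfies_prediction_2(NUMBER_HIERARCHY, {"Singular", "Dual"}))
```

A markedness reversal of the Welsh type would be handled by supplying item-specific frequencies, so that the ranking fed to the checks reflects the frequency of the noun in question and not the general Singular > Plural ranking.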
51 Consider now some patterns from syntax that support the minimize {Fi} principle: minimize the set {Fi} required for the assignment of a particular Pi or {Pi}. 52 In Hawkins (2004) I formulated a second minimization principle for the combination of forms and dependencies between them based on parallel data from cross-linguistic variation and language-internal selection patterns: Minimize Domains (MiD). 53 Minimize Domains (MiD) The human processor prefers to minimize the connected sequences of linguistic forms and their conventionally associated syntactic and semantic properties in which relations of combination and/or dependency are processed. 54 E.g. in order to recognize how the words of a sentence are grouped together into phrases and into a hierarchical tree structure the human parser prefers to access the smallest possible linear string of words that enables it to make each phrase structure decision: the principle of Early Immediate Constituents (EIC) (Hawkins 1994). 55 More generally the processing of all syntactic and semantic relations prefers minimal domains (Hawkins 2004). 56 Minimize Domains predicts that each Phrasal Combination Domain (PCD) should be as short as possible. A PCD consists of the smallest amount of surface structure on the basis of which the human processor can recognize (and produce) a mother node M and assign the correct daughter ICs to it, i.e. on the basis of which phrase structure can be processed. 57 Some linear orderings reduce the number of words and their associated properties that need to be accessed for this purpose. The degree of this preference is proportional to the minimization difference for the same PCDs in competing orderings. 58 I.e. linear orderings should be preferred that minimize PCDs by maximizing their “IC-to-word” ratios. The result will be a preference for short before long phrases in head-initial languages like English. 59 (1) a. 
The man vp[waited pp1[for his son] pp2[in the cold but not unpleasant wind]]
(VP PCD = words 1-5: waited for his son in)
b. The man vp[waited pp2[in the cold but not unpleasant wind] pp1[for his son]]
(VP PCD = words 1-9: waited in the cold but not unpleasant wind for)
The three items, V, PP1, PP2 can be recognized and constructed on the basis of five words in (1a), compared with nine in (1b), assuming that (head) categories such as P immediately project to mother nodes such as PP, enabling the parser to construct them on-line.
(1a) VP PCD: IC-to-word ratio of 3/5 = 60%
(1b) VP PCD: IC-to-word ratio of 3/9 = 33%
60 For experimental support (in production and comprehension) for short before long effects in English, see e.g. Stallings (1998), Gibson (1998), Wasow (2002). 61 A Corpus Study Testing MiD in English Structures like (1ab) with vp{V, PP1, PP2} were examined (Hawkins 2000) in which the two PPs were permutable with truth-conditional equivalence (i.e. the speaker had a choice). Only 15% (58/394) had long before short. Among those with at least a one-word weight difference, 82% had short before long, and there was a gradual reduction in the long before short orders the bigger the weight difference (PPS = shorter PP, PPL = longer PP): 62
(2)                      [V PPS PPL]   [V PPL PPS]
PPL > PPS by 1 word      60% (58)      40% (38)
          by 2-4         86% (108)     14% (17)
          by 5-6         94% (31)      6% (2)
          by 7+          99% (68)      1% (1)
63 For head-final languages long before short orders provide minimal domains for processing phrase structure:
(3) a. Mary ga [[kinoo John ga kekkonsi-ta to]s it-ta]vp
       Mary SU yesterday John SU married that said
       ‘Mary said that John got married yesterday’
    b. [kinoo John ga kekkonsi-ta to]s Mary ga [it-ta]vp
64 Why? Because placing longer before shorter phrases in Japanese puts the constructing categories or heads (V, P, Comp, etc) close, or as close as possible, to each other, each being on the right of its respective phrasal sister. 
Result: PCDs are smaller. 65
(4) Some basic word orders of Japanese grammar
a. Taroo ga vp[tegami o kaita]                 NP-V
   T. SU letter DO wrote
   'Taroo wrote a letter'
b. Taroo ga pp[Tokyo kara] ryokoosita          NP-P
   T. SU Tokyo from travelled
   'Taroo travelled from Tokyo'
c. np[[Taroo no] ie]                           Gen-N
   Taroo 's house
The heavier phrasal categories, e.g. NPs, occur to the left of their single-word (shorter) heads in Japanese, e.g. before V and P, and P and V are adjacent on the right of their respective sisters. 66 For experimental and corpus support for long before short phrases in Japanese and Korean when there is a plurality of phrases before V, see Hawkins (1994, 2004), Yamashita & Chang (2001, 2006), Choi (2007). 67 An early corpus study testing long before short in Japanese (Hawkins 1994): [{NPo, PPm} V]
(5) a. (Tanaka ga) [[Hanako kara]pp [sono hon o]np katta]vp
       Tanaka SU Hanako from that book DO bought
       'Tanaka bought that book from Hanako'
    b. (Tanaka ga) [[sono hon o]np [Hanako kara]pp katta]vp
68 ICS = shorter Immediate Constituent; ICL = longer Immediate Constituent; regardless of NP or PP status
(6)                       [ICS ICL V]   [ICL ICS V]
ICL > ICS by 1-2 words    34% (30)      66% (59)
          by 3-4          28% (8)       72% (21)
          by 5-8          17% (4)       83% (20)
          by 9+           9% (1)        91% (10)
Data from Hawkins (1994:152), collected by Kaoru Horie. I.e. the bigger the weight difference, the more the heavy phrase occurs to the left; the mirror-image of English. 69 Given these data from performance, we can now better understand: (a) the Greenbergian word order correlations; (b) why there are two, and only two, productive word order types cross-linguistically, head-initial and head-final; (c) why and when there are “exceptional” departures from the expected head-initial and head-final orders. 70 The "Greenbergian" word order correlations (Greenberg 1963, Dryer 1992)
(7) vp{V, pp{P, NP}}
a. vp[travels pp[to the city]]      VP PCD: travels to
b. [[the city to]pp travels]vp      VP PCD: to travels
c. vp[travels [the city to]pp]      VP PCD: travels the city to
d. [pp[to the city] travels]vp      VP PCD: to the city travels
The adjacency of V and P guarantees the smallest possible string of words for the recognition and construction of VP and its two constituents (V and PP), see the PCDs marked in (7). 71 Language Quantities in Matthew Dryer's (1992) Cross-linguistic Sample
(8) a. vp[V pp[P NP]] = 161 (41%)     c. vp[V [NP P]pp] = 18 (5%)
    b. [[NP P]pp V]vp = 204 (52%)     d. [pp[P NP] V]vp = 6 (2%)
Preferred (a)+(b) with consistent ‘head’ ordering = 365/389 (94%)
72 Both head-initial (English) and head-final (Japanese) orders can be equally efficient for processing: heads may be adjacent to one another on the left of their respective sisters (English), or on the right (Japanese). Hence there are two and only two highly productive word order types, as predicted by MiD. 73 MiD helps us to understand these cross-linguistic patterns and their frequencies. It also enables us to explain some systematic grammatical exceptions to these head-ordering universals. 74 Dryer (1992): there are exceptions to the preferred consistent head ordering when the category that modifies a head is a single-word item, e.g. an adjective modifying a noun (yellow book). 75 Many otherwise head-initial languages have non-initial heads with the adjective preceding the noun here (e.g. English); many otherwise head-final languages have noun before adjective (e.g. Basque). BUT when the non-head is a branching phrasal category (e.g. adjective phrase, cf. English books yellow with age) there are good correlations with the predominant head ordering. Why? 76 When heads are separated by a non-branching single word, then the difference between, say, vp[V [Adj N]np] (read [yellow book]) and vp[V np[N Adj]] (read [book yellow]) is small, only one word. 
Hence the MiD preference for noun initiality (and for noun-finality in postpositional languages) is significantly less than it is for intervening branching phrases, and either less head ordering consistency or no consistency is predicted. 77 English [yellow book] but [book [yellow with age]]. Romance languages have both prenominal and postnominal adjectives (French grand homme / homme grand) but postnominal adjective phrases, like English. 78 Similarly, when there is just a one-word difference between competing domains in performance, e.g. in the corpus data of English and Japanese above, both ordering options are generally productive, and so too in grammars. 79 Center embedding hierarchies and EIC The more complex a center-embedded constituent and the longer the PCD for its containing phrase, the fewer the languages. E.g. in the environment pp[P np[__ N]] we have a center-embedding hierarchy, cf. Hawkins (1983).
(9) Prep lgs:   AdjN 32%     NAdj 68%
                PosspN 12%   NPossp 88%
                RelN 1%      NRel 99%
Mary traveled pp[to np[interesting cities]]          AdjN
              pp[to np[[this country’s] cities]]     PosspN
              pp[to np[[I already visited] cities]]  RelN
80 I.e. The Greenbergian word order universals support domain minimization and locality (Hawkins 2004, Gibson 1998). There are minor and predicted departures from consistent ordering and head adjacency, as we have seen. There are also certain conflicts between MiD and other ease of processing principles, e.g. Fillers before Gaps, which result in e.g. NRel in certain (non-rigid) OV languages (Hawkins 2004, to appear). 81 Apart from these, I see no evidence in grammars for any preference for “non-locality” of the kind that certain psycholinguists have argued for based on experimental evidence with head-final languages (e.g. Konieczny 2000, Vasishth & Lewis 2006). E.g. Konieczny showed in a self-paced reading experiment in German that the verb is read systematically faster when a NRel precedes it, in proportion to the length of Rel. 
82 This finding makes sense in terms of expectedness and predictability (Levy 2008, Jaeger 2006): the longer you have to wait for a verb in a verb-final structure, the more you expect to find one, making verb recognition easier. However, Konieczny found no evidence for this facilitation at the verb in his German corpus data (Uszkoreit et al. 1998). Instead the predictions made for the relevant structures by MiD and locality were strongly confirmed. 83 In fact, corpus studies quite generally do not support non-locality: none of the data from numerous typologically diverse language corpora reported in Hawkins (1994, 2004) support it. 84 Nor do word order universals support it. The Greenbergian correlations strongly support locality, and the exceptions to Greenberg involve either small single-word non-localities or competitions with independently motivated preferences that do produce some non-localities in certain language types – but not because non-locality is a good thing! 85 The experimental evidence for greater ease of processing at the verb appears to be evidence, therefore, for a certain facilitation (arguably through predictability) at a single temporal point in sentence processing: it tells us nothing about processing load for the structure as a whole, and it does not implicate any preference for non-locality as such. 86 Corpus data appear to reflect these overall processing advantages for alternative structures within which the verb may appear early or late. The predictions for these alternations are based squarely on the preferred locality of phrasal daughters and these predictions are empirically correct (Konieczny 2000, Uszkoreit et al. 1998). Non-locality arises only when the locality demands of two phrases are in conflict and cannot be satisfied at the same time. E.g. if N is adjacent to its Rel in German, then N is separated from a final V. 
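The IC-to-word ratios cited for (1a) and (1b), and the one-word difference between [read [yellow book]] and [read [book yellow]], can be reproduced with a small calculation. What follows is a minimal Python sketch under simplifying assumptions, not an implementation of the full EIC metric: the names `IC`, `pcd_length` and `ic_to_word_ratio` are hypothetical, each IC is reduced to its length in words plus the position of the word that constructs it, and treating N as the word that constructs NP in the adjective example is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IC:
    """An immediate constituent, reduced to what the ratio needs."""
    label: str
    words: int            # length of the IC in words
    constructor: int = 0  # 0-based index of the word that constructs the IC

    def __post_init__(self):
        if not 0 <= self.constructor < self.words:
            raise ValueError("constructor index must fall inside the IC")


def pcd_length(ics):
    """Words from the constructor of the first IC to that of the last IC."""
    starts, position = [], 0
    for ic in ics:
        starts.append(position)
        position += ic.words
    first = starts[0] + ics[0].constructor
    last = starts[-1] + ics[-1].constructor
    return last - first + 1


def ic_to_word_ratio(ics):
    """Number of ICs divided by the number of words in the PCD."""
    return len(ics) / pcd_length(ics)


# (1a) waited [for his son] [in the cold but not unpleasant wind]
EX_1A = [IC("V", 1), IC("PP1", 3), IC("PP2", 7)]
# (1b) waited [in the cold but not unpleasant wind] [for his son]
EX_1B = [IC("V", 1), IC("PP2", 7), IC("PP1", 3)]

# Assumption: N constructs NP, so its position inside NP sets the PCD edge.
READ_YELLOW_BOOK = [IC("V", 1), IC("NP", 2, constructor=1)]  # [Adj N]
READ_BOOK_YELLOW = [IC("V", 1), IC("NP", 2, constructor=0)]  # [N Adj]

if __name__ == "__main__":
    for name, ics in [("1a", EX_1A), ("1b", EX_1B),
                      ("Adj N", READ_YELLOW_BOOK), ("N Adj", READ_BOOK_YELLOW)]:
        print(name, pcd_length(ics), round(ic_to_word_ratio(ics), 2))
```

On these assumptions the two PP orders differ by four words in the VP domain, whereas the two adjective orders differ by one, which is the size contrast the argument about single-word modifiers relies on. Head-final structures can be modelled the same way by setting `constructor` to the last word of each IC.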
87 Grammars also support locality in word order universals and provide no evidence for non-locality as an independent factor. Let us turn now to relative clauses and look at the crosslinguistic evidence for form and domain minimization in this area. 88 Relative clauses in many languages (e.g. Hebrew) exhibit both a 'gap' and a 'resumptive pronoun' structure:
(10) a. the students_i [that I teach O_i]        Gap
     b. the students_i [that I teach them_i]     Resumptive Pronoun
In English we find relative clauses with and without a relative pronoun:
(11) a. the students_i [whom_i I teach O_i]      Relative Pronoun
     b. the students_i [O_i I teach O_i]         Zero Relative
89 Patterns in Performance The retention of the relative pronoun in English is correlated, inter alia, with the degree of separation of the relative clause from its head noun: the bigger the separation, the more the rel pros are retained (Quirk 1957, Hawkins 2004:153). 90
(12) a. [the students_i [whom_i I teach O_i]] visited me
     b. [the students_i [O_i I teach O_i]] visited me
(13) a. [the students_i (from Denmark) [whom_i I teach O_i]] visited me
     b. [the students_i (from Denmark) [O_i I teach O_i]] visited me
(14) a. [the students_i (from Denmark)] visited me [whom_i I teach O_i]
     b. [the students_i (from Denmark)] visited me [O_i I teach O_i]
(12a) Rel Pro = 60%     (12b) Zero Rel = 40%
(13a) Rel Pro = 94%     (13b) Zero Rel = 6%
(14a) Rel Pro = 99%     (14b) Zero Rel = 1%
91 The Hebrew gap is favored when the distance between head and gap is small, cf. Ariel (1999):
(15) a. Shoshana hi [ha-isha_i [she-nili ohevet O_i]]       Gap
        Shoshana is the-woman that-Nili loves
     b. Shoshana hi [ha-isha_i [she-nili ohevet ota_i]]     Res Pro
        that-Nili loves her
(15a) Gap = 91%     (15b) Res Pro = 9%
92 Resumptive pronouns in Hebrew become more frequent in more complex relatives with bigger distances between the head and the position relativized on, as in (16b):
(16) a. Shoshana hi ha-isha_i [she-dani siper [she-moshe rixel [she-nili ohevet O_i]]]
     b. Shoshana hi ha-isha_i [she-dani siper [she-moshe rixel [she-nili ohevet ota_i]]]
        Shoshana is the-woman that-Danny said that-Moshe gossiped that-Nili loves (her)
With just 3+ words separating the head and the position relativized on (i.e. gap or resumptive pronoun), there are many more pronouns (Ariel 1999):
(16a) Gap = 58%     (16b) Res Pro = 42%
93 Relative clauses with larger domains are more complex and harder to process. The harder to process relatives have the less minimal and more explicit form, in accordance with our minimize {Fi} principle above. 94 Specifically, the explicit resumptive pronoun makes the relative easier to process because the position relativized on is now explicitly signaled and flagged, in contrast to the zero gap, and because the explicit pronoun shortens various domains for processing combinatorial and dependency relations within the relative clause (these processes must otherwise access the head noun itself), cf. Hawkins (2004). 95 A Cross-linguistic Universal: the Accessibility Hierarchy Keenan & Comrie (1977) proposed an Accessibility Hierarchy (AH) for universal rules of relativization on different structural positions within a clause:
Subjects > Direct Objects > Indirect Objects/Obliques > Genitives
(17) a. the professor_i [that O_i/he_i wrote the letter]                       SU
     b. the professor_i [that the student knows O_i/him_i]                     DO
     c. the professor_i [that the student showed the book to O_i/him_i]        IO/OBL
     d. the professor_i [that the student knows O_i/his_i son]                 GEN
96 Relative clauses "cut off" (may cease to apply) down AH, cf. (18): if a language can form a relative clause on any low position, it can (generally) relativize on all higher positions. 
(18) SU only:                   Malagasy, Maori
     SU & DO only:              Kinyarwanda, Indonesian
     SU & DO & IO/OBL only:     Basque, Catalan
     SU & DO & IO/OBL & GEN:    English, Hausa
(19) ny mpianatra_i [izay nahita ny vehivavy O_i] (Malagasy)
     the student that saw the woman
     'the student that saw the woman' (NOT the student that the woman saw)
97 Distribution of gaps to resumptive pronouns across languages also follows the AH with gaps higher and pronouns lower: If a gap occurs low on the hierarchy, it occurs all the way up; if a pronoun occurs high, it occurs all the way down. 98 Languages Combining Gaps with Resumptive Pronouns (data from Keenan-Comrie 1977)
Language            SU          DO          IO/OBL      GEN
Aoban               gap         pro         pro         pro
Arabic              gap         pro         pro         pro
Gilbertese          gap         pro         pro         pro
Kera                gap         pro         pro         pro
Chinese (Peking)    gap         gap/pro     pro         pro
Genoese             gap         gap/pro     pro         pro
Hebrew              gap         gap/pro     pro         pro
Persian             gap         gap/pro     pro         pro
Tongan              gap         gap/pro     pro         pro
Fulani              gap         gap         pro         pro
Greek               gap         gap         pro         pro
Welsh               gap         gap         pro         pro
Zurich German       gap         gap         pro         pro
Toba Batak          gap         *           pro         pro
Hausa               gap         gap         gap/pro     pro
Shona               gap         gap         gap/pro     pro
Minang-Kabau        gap         *           */pro       pro
Korean              gap         gap         gap         pro
Roviana             gap         gap         gap         pro
Turkish             gap         gap         gap         pro
Yoruba              gap         gap         0           pro
Malay               gap         gap         RP          pro
Javanese            gap         *           *           pro
Japanese            gap         gap         gap         gap/pro
Gaps =              24 [100%]   17 [65%]    6 [26%]     1 [4%]
Res Pros =          0 [0%]      9 [35%]     17 [74%]    24 [96%]
99 Keenan-Comrie argued that these grammatical patterns were ultimately explainable by declining ease of processing down the AH. They hypothesized that the AH was a complexity ranking. Cf. 
Hawkins 1999, 2004:177-190, to appear, for elaboration in terms of Minimize Forms and Minimize Domains. 100 Keenan (1987) gave data from English corpora showing declining frequencies of relative clause usage correlating with the AH positions relativized on. 101 Experimental evidence for SU > (easier than) DO relativization (English)
Wanner & Maratsos (1978): first pointed to greater processing load for DO rels
Ford (1983): longer lexical decision times in DO rels
King & Just (1991): lower comprehension accuracy and longer lexical decision times in self-paced reading experiments
Pickering & Shillcock (1992): significant reaction time differences in self-paced reading experiments, both within and across clause boundaries (i.e. for embedded and non-embedded gap positions)
King & Kutas (1992, 1993): neurolinguistic support using ERPs
Traxler et al (2002): eye movement study controlling also for agency and animacy
Frauenfelder et al (1980) and Holmes & O'Regan (1981): similar (SU > DO) results for French
Kwon et al (2010): for an eye-tracking study of Korean and a recent literature review of the SU/DO asymmetry in English and other lgs
102 Let us take stock. We see in these studies a clear correlation between performance data measuring preferred selections in corpora and ease of processing in experiments, on the one hand, and the fixed conventions of grammars in languages with fewer options, on the other:
● SU relatives have been shown to be easier to process than DO in English and certain other lgs; correspondingly lgs like Malagasy only have the SU option
● the distribution of resumptive pronouns to gaps across grammars follows the AH ranking, with pronouns in the more difficult environments, and gaps in the easier ones: this reverse implicational hierarchy appears to be structured by ease of processing
103 All of these data, morphological and syntactic, support minimize {Fi}, in proportion to the ease with which a given property Pi can be assigned in processing to a given Fi. 
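The two implicational patterns on the Accessibility Hierarchy (relativization cut-offs, and gaps high versus pronouns low) can be stated as simple checks over a language's relativization profile. The Python sketch below is illustrative: the function names are hypothetical, the sample profiles are read off the Keenan & Comrie data cited above, and the final profile is an invented counterexample used only to show what the checks exclude.

```python
AH = ["SU", "DO", "IO/OBL", "GEN"]


def is_top_segment(positions) -> bool:
    """True if the positions form an unbroken run from the top of the AH."""
    ranks = sorted(AH.index(p) for p in positions)
    return ranks == list(range(len(ranks)))


def is_bottom_segment(positions) -> bool:
    """True if the positions form an unbroken run up from the bottom of the AH."""
    ranks = sorted(AH.index(p) for p in positions)
    return ranks == list(range(len(AH) - len(ranks), len(AH)))


def obeys_cutoff(relativizable) -> bool:
    """Relativizing on a low position implies relativizing on all higher ones."""
    return is_top_segment(relativizable)


def obeys_gap_pronoun_pattern(strategies) -> bool:
    """Gaps extend down from the top, resumptive pronouns up from the bottom."""
    gaps = {pos for pos, used in strategies.items() if "gap" in used}
    pros = {pos for pos, used in strategies.items() if "pro" in used}
    return is_top_segment(gaps) and is_bottom_segment(pros)


# Cut-off points as listed in (18).
CUTOFFS = {
    "Malagasy": {"SU"},
    "Kinyarwanda": {"SU", "DO"},
    "Basque": {"SU", "DO", "IO/OBL"},
    "English": {"SU", "DO", "IO/OBL", "GEN"},
}

# Strategy profiles; an empty set marks a position that is not relativizable.
HEBREW = {"SU": {"gap"}, "DO": {"gap", "pro"}, "IO/OBL": {"pro"}, "GEN": {"pro"}}
TOBA_BATAK = {"SU": {"gap"}, "DO": set(), "IO/OBL": {"pro"}, "GEN": {"pro"}}
JAPANESE = {"SU": {"gap"}, "DO": {"gap"}, "IO/OBL": {"gap"}, "GEN": {"gap", "pro"}}

# Invented profile (pronoun high, gap low) that the universal rules out.
HYPOTHETICAL = {"SU": {"pro"}, "DO": {"pro"}, "IO/OBL": {"gap"}, "GEN": {"gap"}}

if __name__ == "__main__":
    for name, positions in CUTOFFS.items():
        print(name, obeys_cutoff(positions))
    for name, profile in [("Hebrew", HEBREW), ("Toba Batak", TOBA_BATAK),
                          ("Japanese", JAPANESE), ("hypothetical", HYPOTHETICAL)]:
        print(name, obeys_gap_pronoun_pattern(profile))
```

Representing each position as a set of strategies lets a cell such as gap/pro count towards both patterns at once, which is how the table's totals treat it.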
104 Let us turn now to our second principle of Information Density, maximize {Pi}: maximize the set {Pi} that can be assigned to a particular Fi or {Fi}. 105 In Hawkins (2004) I argued for a further very general principle of efficiency, in addition to Minimize Forms and Minimize Domains: Maximize On-line Processing. There is a clear preference for selecting and arranging linguistic forms so as to provide the earliest possible access to as much of the ultimate syntactic and semantic representation as possible. 106 This principle also results in a preference for error-free online processing since errors delay the assignment of intended properties and increase processing effort. 107 Maximize On-line Processing (MaOP) The human processor prefers to maximize the set of properties that are assignable to each item X as X is processed, thereby increasing O(n-line) P(roperty) to U(ltimate) P(roperty) ratios. The maximization difference between competing orders and structures will be a function of the number of properties that are unassigned or misassigned to X in a structure/sequence S, compared with the number in an alternative. 108 Clear examples can be seen across languages when certain common categories {A, B} are ordered asymmetrically A + B, regardless of the language type, in contrast to symmetries in which both orders are productive [A+B/B+A], e.g. Verb+Object [VO] and Object+Verb [OV]. Some examples of asymmetries are summarized below: 109 Some Asymmetries (Hawkins 2002, 2004)
(i) Displaced WH preposed to the left of its (gap-containing) clause [almost exceptionless]
    Who_i [did you say O_i came to the party]
(ii) Head Noun (Filler) to the left of its (gap-containing) Relative Clause
    E.g. the students_i [that I teach O_i]
    If a lg has basic VO, then NRel [exceptions = rare] (Hawkins 1983)
    VO                  OV
    NRel (English)      NRel (Persian)
    *RelN               RelN (Japanese)
110 (iii) Antecedent precedes Anaphor [highly preferred cross-linguistically] E.g. 
John washed himself (SVO), Washed John himself (VSO), John himself washed (SOV) = highly preferred over e.g. Washed himself John (VOS)
(iv) Wide Scope Quantifier/Operator precedes Narrow Scope Q/O [preferred]
E.g.
Every student a book read (SOV lgs) preferred
A book every student read (SOV lgs) preferred

In these examples there is an asymmetric dependency of B on A: the gap is dependent on the head-noun filler in (ii) (for gap-filling), the anaphor on its antecedent in (iii) (for co-indexation), the narrow scope quantifier on the wide scope quantifier in (iv) (the number of books read depends on the quantifier in the subject NP in Every student read a book/Many students read a book/Three students read a book, etc).

The assignment of dependent properties to B is more efficient when A precedes, since these properties can be assigned to B immediately in on-line processing. In the reverse B + A there will be delays in property assignments on-line ("unassignments") or misanalyses ("misassignments"). If the relative clause precedes the head noun the gap is not immediately recognized and there are delays in argument structure assignment within the relative clause; if a narrow scope quantifier precedes a wide scope quantifier, a wide scope interpretation will generally be (mis)assigned on-line to the narrow scope quantifier; and so on.

I have argued that MaOP (in the form of Fillers before Gaps) competes with Minimize Domains to give asymmetries in relative clause ordering: a head before relative clause preference is visible in both VO and OV languages, with only rigid V-final languages resisting this preference to any degree (Hawkins 2004:203-10).
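The effect of an "unassignment" on OP-to-UP ratios can be illustrated with a toy calculation. This is a simplified reading of the idea, not the calculation given in Hawkins (2004): at each item the ratio is taken to be the number of properties assigned so far divided by the total ultimately assigned, and the ratios are then averaged over the sequence. The property counts are hypothetical (A carries two properties of its own; B carries one of its own plus one that depends on A), and misassignments are not modelled.

```python
def op_to_up_ratios(newly_assigned):
    """Return the cumulative on-line/ultimate property ratio at each item.

    newly_assigned[i] is the number of properties that become assignable
    when the i-th item of the sequence is processed.
    """
    total = sum(newly_assigned)
    ratios = []
    running = 0
    for count in newly_assigned:
        running += count
        ratios.append(running / total)
    return ratios


def mean_ratio(newly_assigned):
    """Average the per-item ratios over the whole sequence."""
    ratios = op_to_up_ratios(newly_assigned)
    return sum(ratios) / len(ratios)


# Hypothetical counts, for illustration only.
# A + B: B's dependent property is assigned as soon as B is processed.
a_before_b = [2, 2]
# B + A: B's dependent property is unassigned until A is processed.
b_before_a = [1, 3]

print(op_to_up_ratios(a_before_b), mean_ratio(a_before_b))  # [0.5, 1.0] 0.75
print(op_to_up_ratios(b_before_a), mean_ratio(b_before_a))  # [0.25, 1.0] 0.625
```

Under these assumptions the A + B order scores higher because more of the ultimate representation is available earlier, which is the sense in which the dependent category is preferred after the category it depends on.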
        VO & NRel   VO & RelN   OV & RelN   OV & NRel
MiD         +           -           +           -
MaOP        +           -           -           +

WALS data (Dryer 2005ab):
                 Rel-Noun    Noun-Rel or Mixed/Other
Rigid SOV        50% (17)    50% (17)
Non-rigid SOV    0% (0)      100% (17)
VO               3% (3)      97% (116)

Language Variation in Psycholinguistics
What this all means for psycholinguistics is that grammatical patterns and rules provide data that can inform language processing theories (Hawkins 2007, Jaeger & Norcliffe 2009). Conversely, processing can help us understand grammars better.

We can now give an explanation for what has been simply observed and stipulated so far in grammatical models, e.g. the existence of a head ordering parameter, with head-initial (VO) and head-final (OV) lgs being roughly equally productive: they are equally efficient for processing whether adjacent heads occur on the left of their sisters (English), or on the right (Japanese).

Performance data motivate the Accessibility Hierarchy for relative clause formation, the cut-offs for relativization, the reverse implicational patterns for gaps and resumptive pronouns, and numerous other regularities and language-particular subtleties (Hawkins 1999, 2004, to appear).

This approach helps us understand exceptions to proposed universals (involving e.g. differential ordering for single-word versus phrasal modifiers of heads). I.e. linguists can benefit from the inclusion of processing ideas in their theories and descriptions.

The leftward versus rightward movement of heavy phrases in different language types is directly relevant for processing theories, on the other hand (cf. the theory of de Smedt 1994 which predicts only rightward movements). As is the absence of any independent evidence for "antilocality" in any word order universals.
For theories of information density we have seen lots of cross-linguistic patterns and hierarchies in morphology and syntax that support two complementary principles: minimize {Fi} and maximize {Pi}

Minimize {Fi}
Minimize the set {Fi} required for the assignment of a particular Pi or {Pi} in proportion to the processing ease with which each Pi can be assigned.

Maximize {Pi}
Maximize the set {Pi} that can be assigned to a particular Fi or {Fi} at each point in on-line processing.

References
Ariel, M. (1999) 'Cognitive universals and linguistic conventions: The case of resumptive pronouns', Studies in Language 23: 217-269.
Choi, H.W. (2007) 'Length and order: A corpus study of Korean dative-accusative construction', Discourse and Cognition 14: 207-27.
Croft, W. (1990) Typology and Universals, CUP, Cambridge.
de Smedt, K.J.M.J. (1994) 'Parallelism in incremental sentence generation', in G. Adriens & U. Hahn, eds., Parallelism in Natural Language Processing, Ablex, Norwood, NJ.
Dryer, M.S. (1992) 'The Greenbergian word order correlations', Language 68: 81-138.
Dryer, M.S. (2005a) 'Order of relative clause and noun', in M. Haspelmath, M.S. Dryer, D. Gil & B. Comrie, eds., The World Atlas of Language Structures, OUP, Oxford.
Dryer, M.S. (2005b) 'Relationship between the order of object and verb and the order of relative clause and noun', in M. Haspelmath, M.S. Dryer, D. Gil & B. Comrie, eds., The World Atlas of Language Structures, OUP, Oxford.
Ford, M. (1983) 'A method of obtaining measures of local parsing complexity throughout sentences', Journal of Verbal Learning and Verbal Behavior 22: 203-218.
Gibson, E. (1998) 'Linguistic complexity: Locality of syntactic dependencies', Cognition 68: 1-76.
Greenberg, J.H. (1963) 'Some universals of grammar with particular reference to the order of meaningful elements', in J.H. Greenberg, ed., Universals of Language, MIT Press, Cambridge, Mass.
Greenberg, J.H.
(1966) Language Universals with Special Reference to Feature Hierarchies, Mouton, The Hague.
Haspelmath, M. (2002) Morphology, Arnold, London.
Hawkins, J.A. (1983) Word Order Universals, Academic Press, New York.
Hawkins, J.A. (1994) A Performance Theory of Order and Constituency, CUP, Cambridge.
Hawkins, J.A. (1999) 'Processing complexity and filler-gap dependencies', Language 75: 244-285.
Hawkins, J.A. (2000) 'The relative ordering of prepositional phrases in English: Going beyond manner-place-time', Language Variation and Change 11: 231-266.
Hawkins, J.A. (2004) Efficiency and Complexity in Grammars, OUP, Oxford.
Hawkins, J.A. (2007) 'Processing typology and why psychologists need to know about it', New Ideas in Psychology 25: 87-107.
Hawkins, J.A. (2009) 'Language universals and the performance-grammar correspondence hypothesis', in M.H. Christiansen, C. Collins & S. Edelman, eds., Language Universals, OUP, Oxford, 54-78.
Hawkins, J.A. (to appear) Cross-linguistic Variation and Efficiency, OUP, Oxford.
Holmes, V.M. & O'Regan, J.K. (1981) 'Eye fixation patterns during the reading of relative clause sentences', Journal of Verbal Learning and Verbal Behavior 20: 417-430.
Jaeger, T.F. (2006) 'Redundancy and syntactic reduction in spontaneous speech', Unpublished PhD dissertation, Stanford University, Stanford, CA.
Jaeger, T.F. & Norcliffe, E. (2009) 'The cross-linguistic study of sentence production: State of the art and a call for action', Language and Linguistics Compass, Blackwell.
Just, M.A. & Carpenter, P.A. (1992) 'A capacity theory of comprehension: Individual differences in working memory', Psychological Review 99: 122-49.
Keenan, E.L. (1987) 'Variation in Universal Grammar', in E.L. Keenan, Universal Grammar: 15 Essays, Croom Helm, London, 46-59.
Keenan, E.L. & Hawkins, S. (1987) 'The psychological validity of the Accessibility Hierarchy', in E.L. Keenan, Universal Grammar: 15 Essays, Croom Helm, London.
King, J. & Just, M.A.
(1991) 'Individual differences in syntactic processing: The role of working memory', Journal of Memory and Language 30: 580-602.
King, J. & Kutas, M. (1992) 'ERP responses to sentences that vary in syntactic complexity: Differences between good and poor comprehenders', Poster, Annual Conference of the Society for Psychophysiological Research, San Diego, CA.
King, J. & Kutas, M. (1993) 'Bridging gaps with longer spans: Enhancing ERP studies of parsing', Poster presented at the Sixth Annual CUNY Sentence Processing Conference, University of Massachusetts, Amherst.
Konieczny, L. (2000) 'Locality and parsing complexity', Journal of Psycholinguistic Research 29(6): 627-645.
Kwon, N., Gordon, P.C., Lee, Y., Kluender, R. & Polinsky, M. (2010) 'Cognitive and linguistic factors affecting subject/object asymmetry: An eye-tracking study of prenominal relative clauses in Korean', Language 86: 546-82.
Levy, R. (2008) 'Expectation-based syntactic comprehension', Cognition 106: 1126-1177.
Lichtenberk, F. (1983) A Grammar of Manam, University of Hawaii Press, Honolulu.
Primus, B. (1999) Cases and Thematic Roles, Max Niemeyer Verlag, Tuebingen.
Quirk, R. (1957) 'Relative clauses in educated spoken English', English Studies 38: 97-109.
Keenan, E.L. & Comrie, B. (1977) 'Noun phrase accessibility and Universal Grammar', Linguistic Inquiry 8: 63-99.
Stallings, L.M. (1998) 'Evaluating Heaviness: Relative Weight in the Spoken Production of Heavy-NP Shift', Ph.D. dissertation, University of Southern California.
Traxler, M.J., Morris, R.K. & Seely, R.E. (2002) 'Processing subject and object relative clauses: Evidence from eye movements', Journal of Memory and Language 47: 69-90.
Uszkoreit, H., Brants, T., Duchier, D., Krenn, B., Konieczny, L., Oepen, S. & Skut, W. (1998) 'Studien zur performanzorientierten Linguistik: Aspekte der Relativsatzextraposition im Deutschen', Kognitionswissenschaft 7: 129-133.
Vasishth, S. & Lewis, R.
(2006) 'Argument-head distance and processing complexity: Explaining both locality and anti-locality effects', Language 82: 767-794.
Wanner, E. & Maratsos, M. (1978) 'An ATN approach to comprehension', in M. Halle, J. Bresnan & G.A. Miller, eds., Linguistic Theory and Psychological Reality, MIT Press, Cambridge, Mass., 119-161.
Wasow, T. (2002) Postverbal Behavior, CSLI Publications, Stanford University, Stanford.
Yamashita, H. & Chang, F. (2001) '"Long before short" preference in the production of a head-final language', Cognition 81: B45-B55.
Yamashita, H. & Chang, F. (2006) 'Sentence production in Japanese', in M. Nakayama, R. Mazuka & Y. Shirai, eds., Handbook of East Asian Psycholinguistics, Vol.2, CUP, Cambridge.

Acknowledgements
Special thanks to the many collaborators and contributors to this research program as presented here, especially: Gontzal Aldai, Bernard Comrie, Gisbert Fanselow, Luna Filipovic, Kaoru Horie, Ed Keenan, Lewis Lawyer, Barbara Jansing, Stephen Matthews, Fritz Newmeyer, Beatrice Primus, Anna Siewierska, Lynne Stallings, Tom Wasow

Financial Support has been received from the following sources for the research reported here and is gratefully acknowledged:
● German National Science Foundation fellowship (DFG grant INK 12/A1)
● European Science Foundation small grant
● Max Planck Institute for Evolutionary Anthropology (Leipzig) research fellowships 2000-04
● University of California Davis research funds
● University of Cambridge Research Centre for English and Applied Linguistics research funds and UCD teaching buy-outs 2007-10