Default inheritance, Word Grammar morphology and French clitics[1]

Richard Hudson, draft March 2014

Abstract

After a general introduction to the theory of Word Grammar (WG), including a discussion of why a cognitive perspective is important, the paper focuses on two issues: the theory of default inheritance, and (as an example of defaults in action) a detailed analysis of the morphosyntax of French clitics. For default inheritance (DI), there are six potential problems which the paper addresses, and solves:

1. generality (how to generalize beyond morphology)
2. reliability (how to ensure monotonicity)
3. certainty (how to recognise and resolve conflicts)
4. relevance (how to avoid inheriting irrelevant properties)
5. economy (how to avoid storing inherited properties)
6. ‘sensibleness’ (how to avoid silly classifications).

WG avoids or solves these problems by assuming a network structure rather than attribute-value matrices, and by restricting DI to tokens. For French clitics, the analysis takes clitics as words realized by affixes which each have a ‘hostform’, a schematic morphological structure containing ordered position slots. Each clitic has an abstract syntactically-oriented relation to its hostform (such as ‘subject’ or ‘3rd-person direct object’) which is mapped onto one of the position slots by general rules, but these general mappings vary between the default hostform, found (surprisingly) with affirmative imperatives, and the exception, found with all other verbs. According to this analysis, clitics show the same default orders as their non-clitic equivalents: subject before the verb, and direct object followed by indirect after the verb.

1. Theoretical background

Perhaps the most important and distinctive characteristic of Word Grammar (as described, among other places, in Hudson 2007, Hudson 2010, Gisborne 2010; called ‘WG’ in the rest of this chapter) is its cognitive orientation. Not only does it assume that linguistic structures are ultimately conceptual structures, but it also joins other versions of cognitive linguistics in rejecting cognitive modularity. Instead of treating language structure as ‘sui generis’, WG treats it as just an example of ordinary cognitive structure, with similar properties to the structures we use for remembering events, people, social relations and so on. Similarly, language processing is just an example of general-purpose processes such as attention, classification, binding and inference.

Why should linguists, and morphologists in particular, concern themselves with cognition? After all, the mainstream tradition of morphology produces abstract analyses of patterns such as verb paradigms which can easily be seen as existing in their own right, without any connection to other parts of the world. It is very tempting to think we can study the formal properties of morphological patterns without considering how they relate either to people or to the rest of language, leaving processing matters to the psychologists (and syntax to the syntacticians). A hundred years ago this may well have been a wise position as a defence against speculative psychology, but now that cognitive science is so well developed it is indefensible.

[1] Thanks to Nik Gisborne for detailed comments on an earlier draft.
This is especially so when we consider a psychological notion such as ‘defaults’, which makes no sense outside human cognition; defaults are part of our everyday analysis of reality, but they are arguably no more part of that reality itself than the categories to which they attach. Moreover, language itself self-evidently resides in people’s brains, so it is part of our minds, whether or not we are interested in minds, and any theory of how language is organised, whether intentionally cognitive or not, must eventually be reconciled with a theory of how minds are organised. In other words, “a theory that aspires to account for language as a biologically based human faculty should seek a graceful integration of linguistic phenomena with what is known about other human cognitive capacities and about the character of brain computation” (Jackendoff 2011:586). In order to minimise rethinking at a later stage, it would be wise to prepare for it by immediately integrating at least some of the most elementary findings of cognitive science (such as memory networks, spreading activation and default inheritance).

Psycholinguists and linguists are all ‘partners in the broader linguistic enterprise’ (Ferreira 2005). Discussing psycholinguists, Ferreira warns: ‘their focus is on processing, but the representations presumably being generated are linguistic. Therefore, it would be foolish to ignore insights from linguistic theory about the nature of those structures.’ For linguists, the same argument applies but arguably even more strongly as we seek to develop more sophisticated theories of language structure.

A cognitive orientation to morphology does not simply mean applying psycholinguistics to morphological phenomena, as in the recent debate about the single route versus the dual route for processing regular and irregular forms (Pinker 1998). There has been a great deal of productive psycholinguistic work on morphology (Marslen-Wilson 2006), but it has been conducted against the background of mainstream linguistic theories which were not designed with cognitive issues in mind; for instance, virtually all this work has assumed that ‘the lexicon’ is distinct from ‘the grammar’ (or ‘the rules’), whereas cognitive linguists agree in rejecting this distinction. One consequence of the conservative assumptions made by psycholinguists about language structure is that psycholinguistic research has had very little effect on theories of linguistic structure – and in particular, very little effect on theories of morphology.

This is the intellectual background to the theory of WG, which tries to combine reasonably uncontroversial findings of cognitive science with well-motivated assumptions about language structure. For instance, take the simple fact that knowing a language is remembering a lot of facts – that the word pronounced /kæt/ means ‘cat’, that prepositions allow a noun as their complement, that past-tense verbs locate the situation described at a point in time before the moment of speaking, and so on and on. How does human cognition handle facts? An uncontroversial answer, called the Network Notion, is that each fact links two concepts in a network (Reisberg 2007:252), so conceptual knowledge consists of a network of concepts. This answer also implies a definition of what a ‘concept’ is: simply a node in the network, an atom without any internal structure.
This is very different from the ‘network’ notion of morphology in Network Morphology (Brown, Hippisley) or Construction Morphology (Booij), in which the nodes are words or word forms with internal structure. The Network Notion recognises nothing but linked atoms, where each atom is the meeting point of at least two relations to other atoms. In this view, the word CAT is the meeting point of links to the meaning ‘cat’, to the form {cat}, to the word class ‘noun’, and so on, each of which is in turn simply an atomic meeting point.

One of the issues in cognitive science is whether the network is simply an associative network – a collection of undifferentiated associations – or something more sophisticated, with links of different types. At least in AI, the consensus is that links themselves need to be classified, and this is certainly the view from linguistics, where the various links from CAT are traditionally distinguished clearly in terms of relations such as ‘meaning’, ‘realisation’ and so on – the ‘attributes’ of any theory which uses attribute-value feature structures. Similarly in WG, where there are some very general and fundamental link-types:

- ‘is-a’ (e.g. ‘Richard is-a linguist’)
- argument and value (e.g. the ‘meaning’ relation between CAT and ‘cat’ has CAT as its argument and ‘cat’ as its value)
- identity (e.g. Richard is identical to the person who wrote this paper)
- quantity (e.g. the number of legs we expect a cat to have is exactly 4).

These are the elementary relations, and there may be no others. In contrast, the number of more specific relation-types is open-ended because relations such as ‘father’ or ‘meaning’ are merely learnable concepts, like entities such as ‘cat’ or the word CAT. What distinguishes relational concepts from non-relational concepts is their ‘argument’ link to one entity and their ‘value’ link to another.

These ideas are illustrated in Figure 1, which shows part of the network which defines the concept ‘cat’. Each of the square boxes names an entity, while the ovals name relations. Each relation is linked to its value by an arrow pointing at the value, and to its argument by a simple curve without a point. Finally, the small triangle indicates the ‘is-a’ relation iconically, with its broad base on the super-category and its (smaller) apex pointing towards the (smaller) sub-category. In words, Smudge is a cat, a general category which is defined as the meaning of the word CAT and as the purrer in the typical act of purring.

Figure 1: A network for ‘cat’

The conceptual networks presented in diagrams like this, unlike neural networks, are not intended to be models of brain structures. It seems almost certain that the brain does not allocate a single neuron to each concept, but linguistic analysis depends crucially on the assumption that we represent each linguistic concept separately, whether it is a phoneme, a word, a word class, a relation or a meaning. In short, the networks of WG are ‘symbolic networks’ with one node per concept, and vice versa. Morphological analysis requires very clear and stable representations of concepts such as Figure 1 rather than the much more diffuse and opaque ‘subsymbolic’ representation of concepts found in ‘distributed’ models of neural networks (Onnis and others 2006); indeed, we might even use morphology as evidence for separating the mind and the brain.
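Since everything in such a network is either an atomic node or a typed link, the whole apparatus can be sketched very compactly in code. The following Python fragment is purely illustrative (the class names and attributes are mine, not part of WG): it represents the Figure 1 fragment as nodes whose relations are themselves nodes with ‘argument’ and ‘value’ links.

    # A toy symbolic network in the spirit of Figure 1. Every concept is an
    # atomic node; relational concepts are nodes with 'argument' and 'value'
    # links. All names here are illustrative, not official WG notation.

    class Node:
        """An atomic concept: no internal structure, only links to other nodes."""
        def __init__(self, label):
            self.label = label
            self.isa = []                      # 'is-a' links to super-categories
        def __repr__(self):
            return self.label

    class Relation(Node):
        """A relational concept: a node with an 'argument' and a 'value' link."""
        def __init__(self, label, argument, value):
            super().__init__(label)
            self.argument = argument
            self.value = value

    cat = Node("cat")                          # the concept 'cat'
    CAT = Node("CAT")                          # the word CAT
    purring = Node("purring")                  # the typical act of purring
    smudge = Node("Smudge")
    smudge.isa.append(cat)                     # Smudge is-a cat

    meaning = Relation("meaning", argument=CAT, value=cat)
    purrer = Relation("purrer", argument=purring, value=cat)
    # Relations are nodes too, so 'meaning' and 'purrer' could themselves be
    # classified and sub-classified, just like 'cat' or CAT.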
On the other hand, the clarity of a symbolic network such as Figure 1 should not mislead us into thinking that such networks are stable and unchanging. As part of a human mind, a symbolic network is highly dynamic. There are three reasons for this. One is that we are using the network for processing our experiences, for planning actions and for thinking. All three kinds of use require changes to the network as we create new nodes for our new experiences, plans and thoughts, and as we enrich these nodes in the ways explained below; for example, when we see a cat, we must assign it a new node even if we eventually classify it as an example of a cat that we already know. Most such nodes disappear within minutes, or even seconds, of appearing. Another cause of changes in the network is learning, whether learning new low-level concepts (such as the neighbour’s new cat) or some higher-level generalisation induced from existing concepts. And the third cause is the activation in the underlying brain circuits which guides and follows our thinking. We know that concepts are easier to retrieve if we have used them more frequently or more recently (Ellis 2002a), and that an experience can ‘prime’ related concepts (Reisberg 2007:257), so we can be reasonably sure that activation levels vary both on a long-term scale and in the short term. These changes of activation affect the underlying brain-cells directly, but their indirect effect on the mental network is profound as they guide us in retrieving information. We shall see below how important they are in WG when considering the logic of default inheritance.

Where do procedures fit into this view of the mind? The network itself is, of course, purely declarative, even when parts of it are changing. For instance, the link which shows that ‘cat’ is the meaning of the word CAT is simply there, as a declarative fact; it is not a procedure, nor is it the instruction for a procedure. As a fact, it is quite neutral as to directionality or timing, so it is just as relevant to speaking as it is to listening. It allows a speaker to find the word for ‘cat’, just as it allows a listener to find the meaning of CAT. This principle is especially important for morphology, where there is a strong procedural tradition expressed for instance in classroom formulae for building verbs such as: “first take the infinitive, then knock off the final –r, then add ...”. This approach is represented at a theoretical level by Paradigm Function Morphology (Stump 2006a), but declarative approaches offer a viable alternative. The choice between declarative and procedural approaches is clearly a fundamental question for research.

Like other branches of cognitive linguistics, WG assumes that knowledge (including language) is based on the learner’s experiences far more than on innate knowledge. When applied to language, this means that what we know is based on other people’s usage. Usage-based learning explains why some words are far more active than others (Bybee 2010), as explained above, but it also means that we can store a great deal of fine detail about the items we have used – about their social context, their phonetic details, and their linguistic context. There is a great deal of psycholinguistic evidence that ‘language acquisition is a process of dynamic emergence and that learners’ language is a product of their history of usage in communicative interaction’
(Ellis 2002b:297), but common experience also confirms that we know, and use, a great deal of item-specific information; for instance, I know that the word ilk is very limited and I would certainly notice any example and remember something about the speaker and social context. However, if generalisations are based on memorised exemplars, it follows logically that memory for detail must go well beyond idiosyncrasies such as the properties of ilk. As children, we must have induced regular patterns such as those for plural nouns from a collection of very similar particular cases; and there is no evidence that we forget these exemplars after creating the generalisation.

To summarise the argument so far, the cognitive orientation of WG leads to the Network Notion, which in turn leads to:

- a view of mental structures as distinct from brain structures,
- a view of concepts as atoms defined only by their relations to other concepts,
- a view of relations either as basic links, such as ‘is-a’ or ‘argument’, or as relational concepts in an open-ended and hierarchically organised list,
- a view of networks as constantly changing, with new nodes being added and lost or learned, and with underlying activation levels changing with experience.

2. The logic of default inheritance

If knowledge is held in a declarative network, it is important to know what mental tools we have for exploiting it; and, if language knowledge is just ordinary knowledge, we may expect these same tools to provide what we need in processing language. Once again the cognitive orientation matters because it rules out any theory of language processing which requires assumptions specific to language, such as a dedicated ‘morphology module’ which might apply between other modules dedicated to syntax and to phonology; it also rules out processing models which apply just to speaking or just to hearing (e.g. Levelt and others 1999). WG provides five general procedural tools for exploiting the network (Hudson 2010:70-101).

Node-creation: this allows us to create new nodes for handling elements of ongoing experience, as well as for new concepts that we create for induced generalisations. Node-creation also creates new links to the new node. We shall see below that node-creation includes the whole of default inheritance for enriching the newly created nodes.

Binding: this includes the familiar anaphoric binding patterns discussed in syntax and logic, such as the binding of a pronoun by its antecedent; but it is much more general, because it allows us to bind any two existing nodes as ‘the same’, without actually merging them into a single node. For instance, this is the effect of hearing a sentence such as That building is the post office, or indeed of simply realising that the building concerned (already known) is the post office. It may even extend into perception, where ‘the binding problem’ is the challenge of explaining how we coordinate colour and shape in vision (Reisberg 2007:55). It is unclear whether binding is in fact a separate process from node-creation, because the logical effect of identifying nodes A and B can be achieved by creating a new node C with ‘is-a’ links to both A and B.

Activation: this affects the activation level of a given mental node (transmitted via the neurons that underlie it). The activation is an observable physical reality in neuroscience, and the way it changes in mental networks is one of the main research themes of cognitive neuroscience.
One point of general agreement is that it spreads out from one mental node to all its neighbours in a rather indiscriminate way, which gives rise to ‘priming’ effects in experiments in which hearing one word makes a related word easier to retrieve; for example, the word nurse can be shown to prime doctor thanks to the activation of the former spreading indiscriminately to the latter (Reisberg 2007:251-7).

Attention: this is closely related to the notion of activation, because attention seems to be at least in part a matter of channeling activation so that concepts that we focus attention on receive extra activation (Reisberg 2007:112). Attention is important in any theory of language structure and processing if we think of language as a means for channeling the hearer’s attention. For instance, I say dog in order to make you ‘think of’ a dog – i.e. in order to get you to pay attention to that concept, thanks to the ‘meaning’ link between the two. When it comes to morphology, the links between forms and meanings become much more complicated, but they still serve as a route map for the hearer’s attention.

Default inheritance: this is the process by which new nodes are enriched by inference, and the topic of the present section. In essence, default inheritance allows generalisations to have exceptions. The logic of default inheritance comes from Artificial Intelligence (Luger and Stubblefield 1993:387-9), but the basic insight is the same as in the massive literature on prototypes in cognition (Rosch 1978, Taylor 1995).

However intuitively obvious it may seem, default inheritance has been subject to a great deal of theoretical debate in the research literature of both logic and computational linguistics (Bouma 2006, Briscoe and others 1993, Carpenter 1992, Daelemans and Desmedt 1994, Flickinger 1987, Lascarides and others 1996, Luger and Stubblefield 1993, Pelletier and Elio 2005, Russell and others 1993, Touretzky 1986, Vogel 1998). This literature identifies serious problems, especially for attempts to implement default inheritance in computer programs. These problems are so serious that some would argue that default inheritance cannot actually underlie human reasoning. This conclusion is disappointing, considering how obvious the basic idea is: defaults generalise except when overridden; even more simply, generalisations allow exceptions.

The classic non-linguistic example of exceptionality is the fact that penguins are birds, but don’t fly; as mentioned earlier, examples like this are often discussed in the cognitive literature on ‘prototype effects’ – the finding that we recognise some examples of a concept as ‘better’ or clearer (so for instance a sparrow is a better example of a bird than a penguin is, and a table is a clearer example of furniture than an ashtray). Prototype effects are taken as evidence that general concepts such as ‘bird’ or ‘furniture’ are defined by their typical (or prototypical) members rather than by necessary and sufficient conditions. WG accepts this analysis, and explains it in terms of default inheritance logic. If the logic allows any property of any concept to be exceptional, it follows that any example of any concept may be exceptional; consequently, examples can be ranked according to how many exceptional properties they have. The exceptionality of penguins can be represented in WG notation as in Figure 2, where once again the small triangle signals the is-a relation.
In prose, a penguin is-a bird, and in this diagram we are also imagining a ‘token’ – i.e. some exemplar entity – which is-a penguin. The typical bird flies (is the flier in the activity of flying), and conversely, the typical flier is a bird. The fact that this flying happens is indicated by the number (‘#’) ‘>0’, meaning that one would expect to find some examples of flying for every bird. However, exceptionally, penguins don’t fly; this is shown by the ‘0’ number, meaning that one would expect no instance of flying in the case of penguins. Because the token is-a penguin, it inherits 0 rather than >0 for flying – in other words, we don’t expect it to fly. Apart from the WG notation that it introduces, the main point of this diagram is the conflict between 0 and >0, which is resolved in favour of 0 by the logic of default inheritance because penguins are exceptions, and exceptions always defeat defaults.

Figure 2: Penguins as an exception in WG notation

The immediate question is what role default inheritance (DI) can play in morphology, but we must first address the general objections to DI mentioned above. The following paragraphs will identify all the problems with DI of which I am aware, and offer solutions.

The first problem is generality: we need a domain-general theory of DI that applies throughout cognition, and not just in morphology. This is a serious problem for any theory which defines default inheritance in terms of attribute-value matrices, because, as the name implies, these only allow values to be overridden. If the example of penguins not flying is correctly analysed here, it cannot be expressed as an exceptional value, but only as an exceptional argument: ‘penguin’ has an exceptional property in which the exceptionality lies in the argument (no flying) rather than in the value (the flier). Of course, it could be argued that this fact could be reworded so as to reverse the directionality, giving something like ‘a penguin’s means of locomotion is not flying’, but this misses the point that penguins simply cannot fly – it’s not just that they prefer swimming. In any case, it would surely be wrong to base the theory of DI on the assumption that arguments can never be exceptional. WG avoids this problem by using a network which is not equivalent to an attribute-value matrix. A WG network is not a DAG (directed acyclic graph) because it allows cycles, in which a pair of nodes A and B may be related in both directions; in fact, cycles are common and play an important part in WG.

The second problem is reliability: how to cope with a ‘non-monotonic’ logic which allows earlier inferences to be overridden by later ones. If later overriding is possible, no inference is reliable, because at any point we may discover some overriding inference. For instance, if the token in Figure 2 were to inherit from ‘bird’ before it inherited from ‘penguin’, the first inference would turn out later to be invalid. The solution is obvious: design the inheritance process in such a way that the token inherits from ‘penguin’ before it inherits from ‘bird’. Consequently, WG incorporates an algorithm for inheritance (Hudson 2010:90) which consists of two processes:

- ‘the searcher’, which searches for inheritable properties;
- ‘the copier’, which copies them down to the inheriting node.
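The division of labour between searcher and copier can be illustrated with a toy Python sketch; the dictionary representation is my own simplification, not WG notation. Because the search starts at the token and works upwards through the is-a hierarchy (the next paragraph explains what guarantees this), the most specific value for each relation is found first, and nothing found later can override it.

    # Toy default inheritance for the penguin example. The searcher walks up
    # the is-a hierarchy from the token; the copier copies a value only if no
    # value for that relation has been settled already, so no inference is
    # ever revised: the logic is strictly monotonic.

    types = {
        "bird":    {"isa": [],       "flying": ">0"},   # default: birds fly
        "penguin": {"isa": ["bird"], "flying": "0"},    # the exception
        "token":   {"isa": ["penguin"]},                # a newly created exemplar
    }

    def inherit(name):
        copied = dict(types[name])              # start from the node's own facts
        frontier = list(types[name]["isa"])     # nearest super-categories first
        while frontier:
            parent = frontier.pop(0)
            for relation, value in types[parent].items():
                if relation != "isa" and relation not in copied:
                    copied[relation] = value    # first (most specific) value wins
            frontier.extend(types[parent]["isa"])
        return copied

    print(inherit("token"))   # {'isa': ['penguin'], 'flying': '0'} - no flying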
The bottom-up search strategy is guaranteed by a very simple principle: only newly-created nodes inherit; in other words, inheritance is part of the node-creation process mentioned above, and since tokens are always, by definition, at the foot of the is-a hierarchy, there is only one direction in which the searcher can go: up. This principle of Token-only Inheritance will also solve a number of other problems, as we shall see below. In short, the supposed non-monotonicity of DI is an illusion; it is actually strictly monotonic, so every inference is reliable. For example, in the case of penguins, ‘cannot fly’ is inherited first, so ‘can fly’ is never inherited. However, it should also be noted that this solution is only available in a cognitively based theory of morphology which includes psychological processes such as node-creation.

The third problem is certainty: recognising clashes and winners. DI is all about resolving the conflicts between defaults and exceptions, so it is important both to recognise when there is a conflict, and to know which of the competitors should win. In a case like Figure 2, both the conflict and the outcome are straightforward, because ‘>0’ and ‘0’ are competing as the values of the same relation, and ‘0’ must win because it is inherited first. However, there are two situations where the outcome is less obvious.

One is multiple inheritance, where a token inherits from two sources – an important scenario in inflectional morphology, which relies on multiple inheritance from both a lexical category (a lexeme) and an inflectional one (e.g. dogs inherits from both DOG and Plural). In general, multiple inheritance works smoothly precisely because inherited properties are orthogonal – i.e. each source contributes a different set of properties (e.g. in morphology, the lexeme contributes the base and the inflection the affixes); but conflicts can arise, as in the famous ‘Nixon diamond’ (Touretzky 1986), in which the American president Richard Nixon inherited a positive attitude to war from his membership of the Republican party, but a negative one from his Quaker background. But where such conflicts arise, it doesn’t show a problem in the logic, but in the world. People do hold conflicting beliefs and behave in contradictory ways, and multiple inheritance simply explains why the beliefs and behaviours are contradictory. Such conflicts may even arise in morphology, as in the case of the *I amn’t gap, which I believe can be explained by the conflict between inheriting the form am from ‘first person singular’ and aren’t from ‘negative’ (Hudson 2000). If this analysis is correct, the logic of default inheritance explains why the conflict cannot be resolved. In contrast, Network Morphology, with its base in the default logic of DATR, cannot explain the gap, because DATR is designed so that such conflicts can never arise.

The other area of uncertainty is where a term (i.e. value or argument) of one relation is defined indirectly by a term of another relation – i.e. in attribute-value terminology, by ‘re-entrancy’ (Bouma 2006). Figure 3 shows a typical example of this situation, in which an English verb’s default present-tense fully-inflected form (abbreviated to ‘full’) is the same as its base (as in they walk), but the verb BE is an exception, because its present-tense form is {are} rather than the base {be}. As usual, the triangles indicate is-a links, so the nodes labelled A and B are examples of the ‘full’ relation, and D is an example of {are}.
Figure 3: By default, a verb’s present form is its base, but BE is an exception

The uncertainty lies around the relation ‘base’: in relation to ‘present’, C binds its base to its ‘full’, so is the same true of {are} in relation to ‘BE, present’? At the point where the searcher for this token inherits from ‘present’, one might expect it also to inherit the binding of the ‘full’ and base relations; and though it is easy to imagine situations where it doesn’t matter, there may be situations where it does matter, and it is important for the token either to inherit the extra relation, or not to inherit it. Once again, the solution is obvious: let the network decide. If {are} is the base as well as the ‘full’ of ‘BE, present’, show this in the network; if not, not (as in Figure 3). Thus, when the token D inherits from {are}, it also inherits all the relevant relations, but it doesn’t later add any further relations to {are}. Put more formally: if the searcher for token t finds a property [T, R, X], where t is-a T and R is a relation with T and X as its terms, then:

- if there is already a property with t and x (an example of X) as its terms, the copier creates a copy [t, r, x], where r is an example of R;
- if there is already a property with t as its term and r as its relation, the copier does nothing (i.e. the existing property overrides the new one);
- otherwise, the copier creates a copy [t, r, x], where x is a new example of X.

(A toy version of this decision procedure is sketched in code below.)

The fourth problem for DI is relevance: how to avoid irrelevant enrichment. The reason for enriching tokens is to provide information which we can’t derive simply from observation. The problem is that most of the information we could inherit is simply irrelevant in most situations. If you see a cat sunning itself, it is relevant to know that cats like to be stroked, but not that they suckle their babies; and if you see penguins in the zoo, it may be relevant to know that they can’t fly, but not to know that they have skin. This is a serious problem in a realistic model of inheritance, because the process of searching and copying takes time and resources, and because of the sheer quantity of information that we know about general categories, and could therefore inherit. The WG solution exploits its cognitive basis by invoking activation levels and the fact that these apply to relational concepts as well as to entities. In any situation some relations and entities will be more active than others, reflecting different degrees of relevance, so the searcher simply ignores properties in which the relation is below some threshold of activation.

The fifth problem is economy: how to avoid filling our memory with inheritable information. One might think that once we have inherited some fact, we might store it away for use on future occasions, and this does seem to be true for facts that we use very frequently; but very frequently used facts may be only a small minority of the total, and it seems that most inheritable facts simply disappear after we have inherited them. For instance, there is evidence from the past tenses of English verbs that frequently-used forms are stored while those that are used less frequently are not stored, but created as needed (Bybee 1995). The WG explanation once again invokes the Tokens-only Inheritance principle: if only tokens inherit, then inherited properties are not attached to permanent categories in memory, but vanish with the tokens to which they are attached. This leaves us with a question about how inheritable facts attach to very frequently accessed entities, but at least we have changed the problem.
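Returning to the copier’s decision rule formulated above, a toy Python version may help to fix ideas; the triple representation and the names are mine, purely for illustration.

    # A sketch of the copier's three cases for a property [T, R, X] found by
    # the searcher for a token t (where t is-a T). 'properties' holds the
    # (term, relation, term) triples already attached to t; 'examples' maps
    # each type to the set of its known examples.

    def copy_property(t, T, R, X, properties, examples):
        # Override case: t already has a property whose relation is an example
        # of R; the existing property defeats the inherited one, so do nothing.
        if any(rel in examples.get(R, set()) for _, rel, _ in properties):
            return
        r = R + "/" + t                          # a new example of the relation R
        examples.setdefault(R, set()).add(r)
        # Re-entrancy case: some x that is-a X is already a term of a property
        # of t, so reuse it as the value.
        for _, _, term in list(properties):
            if term in examples.get(X, set()):
                properties.add((t, r, term))
                return
        # Default case: create a brand-new example of X as the value.
        x = X + "/" + t
        examples.setdefault(X, set()).add(x)
        properties.add((t, r, x))

    props, ex = set(), {}
    copy_property("D", "{are}", "full", "wordform", props, ex)
    print(props)   # {('D', 'full/D', 'wordform/D')}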
Finally, we have the problem of sensibleness: how to avoid unmotivated classifications. The problem has been presented like this:

... if we define a penguin as a bird that does not fly, what is to prevent us from asserting that a block of wood is a bird that does not fly, does not have feathers and does not lay eggs? (Luger and Stubblefield 1993:389)

The solution, again, is rather obvious: recognise that this is a problem for classification, not for DI. It is classification that establishes the initial is-a link between a token and some stored type, and DI is only responsible for enriching the token on the basis of that classification. What is needed is a sensible theory of classification such as the WG theory based on activation levels (Hudson 2010:93-8).

The conclusion to which this discussion leads is that the supposed problems of DI are all easy to solve in a cognitively-based theory such as WG, so DI is indeed a suitable logic for work in morphology. The rest of this chapter will illustrate the benefits of being able to refer to defaults in morphology.

3. Morphology and realization

WG includes a theory of morphology which is inferential and realizational, so it belongs to the Word and Paradigm family (Stump 2001:3). In other words, morphological structure is not just a case of syntactic structure within the word. Indeed, this choice is almost forced on WG by its theory of syntax, which is based on word-word dependencies rather than on phrase structure. A phrase-structure analysis of Cows moo might consider whether the division of cows into {cow} and {s} is the same kind of split as when cows moo is split into cows and moo, but this option is not available in a theory which does not recognise any units of syntax larger than the word.

The Network Notion means that realization is a relation rather than a process; so the plural of COW has a ‘realization’ relation to {cow} and {s}, rather than being realized by the process of adding {s} to {cow}. It could be argued that these two descriptions are just different metaphors for the same pattern, but there is an important difference. If realization is a process, then it takes place in time, and when more than one process is involved, they can be ordered in time, as ordered rules: first do this, then do that. But as I mentioned in section 1, ordered procedures are problematic in linguistics because of the asymmetry of speaking and listening. What is needed is a time-neutral, declarative statement which can be exploited equally easily when speaking or listening. Moreover, if some complex forms are stored rather than created, ordered rules for creating them are irrelevant but their structures still need to be recorded; for example, if we store the form {{walk}{ed}} as the past tense of WALK, we need to be able to describe this structural pattern rather than providing a recipe for creating it. Consequently, WG morphology has no ordered rules or procedures, but analyses structures simply in terms of static declarative relations, such as the relations ‘full’ and ‘base’ in Figure 3.

One of the controversial issues in realizational theories is whether words are realized by phonological objects or by special morphological objects, such as morphs (a term that I prefer to the much more abstract ‘morpheme’, which tends to encroach on the territory of morphosyntactic categories).
At one time WG favoured direct realization by phonology (Hudson 1984:53, Hudson 1990:181), but the current WG answer accepts special morphological objects, i.e. a level of morphology separating syntax from phonology. The evidence for this position comes from several different directions, all independently pointing to the same conclusion:

The argument from conflicting segmentation. Imagine a small child whose vocabulary already includes both know and nose and who is now confronted with an example of knows. Since knows and nose are homophones, they must have the same phonological structure and segmentation, namely the syllable /nəʊz/. But it is highly likely that the child also recognises the similarity between knows and know in their shared realisation /nəʊ/; but in the phonological structure, this is not a unit – it is merely part of a syllable. The only way in which the child’s mind can recognise the similarity between knows and know is by dividing the former’s realisation in a non-phonological way; in short, by recognising the morph {know}, co-existing with, and mapped onto, the phonological pattern /nəʊz/.

The argument from recycling. The general principle that ‘the rich get richer’, which underlies so much human behaviour (Barabasi 2009), means that we try to ‘recycle’ existing meanings (Hudson and Holmes 2000) and forms in new words. This is most clearly seen in folk etymology, where ordinary people (not linguists) reanalyse complex words in terms of existing words, regardless of the meaning. For instance, when hamburger (originally, a sausage from Hamburg) was adopted in English, its first syllable was identified with the form of ham, even though the sausage didn’t actually contain ham, leading eventually to creations such as cheeseburger. In short, {ham} is a morph shared by both ham and hamburger rather than just a bit of phonology.

The argument from priming. Psychological experiments show that phonological priming (e.g. nurse – verse) dies out much faster than morphological priming (e.g. contain – retain) (Frost and others 2000). This shows that morphological and phonological patterns must be cognitively distinct.

These psychological arguments strongly support the traditional ‘morphomic’ view (Aronoff and Volpe 2006, Blevins 2003) that morphology is a distinct level between syntax and phonology, with its own units (morphs, wordforms and intermediate-sized units), its own classes (root, prefix, suffix, etc.) and its own relations (realization, stem, full, etc.). Typically, then, words have meaning and are realized by morphology, not phonology; morphs are basically meaningless but realize words and are realized by phonology (and graphology); and phonology realizes morphology and is realized by phonetics. This typical arrangement is shown in Figure 4. However, there is no reason to believe that our minds are incapable of representing other relations, such as a direct relation between phonology and meaning – indeed, this seems to be precisely what we have in intonation, where ‘tunes’ carry meaning independently of the words and syntactic structures with which they combine. (Another such case is the existence of ‘phonesthemes’, as in glitter, glisten, glow, gleam, glare and glint, where the pattern /gl/ is related to the idea of bright light.)
Figure 4: four levels: semantics, syntax, morphology, phonology

The architecture of language outlined above is a model of language representation, the kinds of mental structures that we need to represent the things we hear and say (and read and write, if we add a level of graphology in parallel with phonology). It distinguishes three kinds of entities apart from meanings (syntactic, morphological and phonological or graphological), related by realization relations – a conservative model of language architecture. But two familiar elements of traditional morphology are missing.

First, as in other branches of cognitive linguistics, there is no attempt to separate ‘the grammar’ from ‘the lexicon’. In many models these are the names for supposedly separate parts of the underlying system, involving either general rules or specific lexical entries; but if cognitive linguistics is right, then properties are inherited down an inheritance hierarchy which has no boundary between ‘general’ and ‘specific’. Generality is simply a matter of degree, with the same item – such as the lexeme CAT – inheriting its properties from concepts ranging from very general (‘word’) to very specific (‘CAT’).

And secondly, as in Network Morphology (Hippisley) and other sub-varieties of Declarative Morphology (Neef 2006), there are constraints but no rules. Instead of rules, we have the generalisations which are available for inheritance and which act as constraints on possible representations but do not ‘create’ them. One advantage of avoiding rules in this way is that the information is neutral between production and perception, whereas a rule such as ‘to form the plural, add {s}’ applies to production, and needs to be rephrased for perception along the lines of ‘if you see a word ending in {s}, recognise it as plural’. Another advantage of the static analysis is its neutrality between storage and creation, as required by any theory which allows forms to be either stored or created as needed. In both cases, the structures are covered by the same generalisations, so the morphological structure of, say, walks is {{walk}+{z}}, regardless of whether it is stored ready-made or created as needed. As mentioned earlier, this rule-free approach contrasts with Paradigm Function Morphology (Stump), where rules play a fundamental role. A single PFM rule may be logically (and psychologically) equivalent to a relation, but it is unclear whether the same is true of entire blocks of rules. Since Stump invokes his analysis of French pronominal clitics as evidence for rule blocks, section 5 of this chapter shows that the same data can be analysed without recourse to rule blocks. First, though, we need a brief introduction to the WG theory of morphosyntax.

4. Morphosyntax in WG

This section explains how WG handles morphosyntax, the syntax-oriented part of morphology which maps syntactic categories onto morphological structures consisting of morphs. Morphophonology, which relates these morphs to their phonological realisations, is discussed briefly elsewhere (Hudson 2007:81-100) but is relatively underdeveloped in WG. The basis of the WG approach is to rely heavily on a rich set of relations, starting with the ‘realization’ relation and its sub-types. If {cat} is the realization of the lexeme CAT, then what is the relation between {cat} and the plural of CAT, called ‘CAT, plural’?
Clearly there is a relation, and it too is an example of realization; but equally clearly, it cannot be exactly the same kind of relation as the first. In short, ‘realization’ breaks down into a number of more specific relations such as ‘base’ and ‘full’ (meaning ‘fully inflected form’) which we have encountered already. It will be recalled that WG allows relational concepts as well as entity concepts to be classified and sub-classified in a hierarchy. In this analytical system, therefore, the ‘full’ of ‘CAT, plural’ is {{cat}{s}}, while its ‘base’ is {cat}; and ‘fully inflected form’ and ‘base’ are both sub-types of the more general relation ‘realization’. These two relations cover the morphosyntax of the word cats, which is inherited in a regular way from the inflectional category ‘plural’ as shown in Figure 5.

Figure 5: Cats as an inflection of cat

The crucial role in this analysis is played by another relation, called ‘variant’. It is exemplified here by ‘s-variant’, which defines the relation between a plural noun’s fully-inflected form and its base. (The same relation also defines the singular form of a present-tense verb.) The diagram doesn’t try to add any further details about the regular pattern, but it shows that in the case of CAT, the s-variant is {{cat}{s}}. In contrast, the s-variant of MOUSE would be the irregular {mice}. The ‘variant’ relation also provides a convenient mechanism for syncretism which avoids the arbitrary choices of direction involved in rules of referral (Zwicky 1985b). For example, the syncretism of the ‘perfect participle’ and ‘passive participle’ in English is expressed by invoking the same ‘en-variant’ relation in both cases, rather than by arbitrarily selecting one as basic and deriving the other from it (a toy sketch of this ‘variant’ mechanism is given below).

This example shows how WG treats a simple example of inflectional morphology; the most controversial characteristic of WG inflectional morphology is probably the use of inflectional word-types such as ‘plural’ instead of morphosyntactic features (Hudson 2010:44). Features are permitted by WG, but they coexist with the much more flexible is-a hierarchy, so in principle a distinction such as that between singular and plural can be accommodated in either way:

- hierarchically, by recognising ‘plural’ as a subclass of ‘noun’, with singular nouns as the default;
- in terms of the feature (or attribute) ‘number’ and two values, ‘singular’ and ‘plural’.

Features are well motivated in only one case: when there is agreement (X and Y both have the same value for the feature Z). Only then can we be sure that the logic of the system requires a feature. In contrast, hierarchical subclassification is a normal part of any classification system, and in morphology it has the great advantage of providing a natural way of showing markedness by identifying the unmarked case as the default. Indeed, even when a feature is justified by agreement, it can be combined with a hierarchical classification; for example, English agreement rules recognise a feature ‘number’, whose value is by default ‘singular’, but (exceptionally) ‘plural’ for plural nouns.
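To make the ‘variant’ mechanism concrete, here is a toy Python sketch (the function names and the over-simplified regular defaults are mine): a stored irregular defeats the computed default, and syncretism falls out because two inflections simply share one variant relation.

    # The 'variant' relation as lookup-then-default: stored irregulars such
    # as {mice} override the computed regular pattern. Purely illustrative.

    stored_s_variants = {"{mouse}": "{mice}"}

    def s_variant(base):
        if base in stored_s_variants:
            return stored_s_variants[base]      # the exception defeats the default
        return "{" + base + "{s}}"              # default, e.g. {{cat}{s}}

    print(s_variant("{cat}"))     # {{cat}{s}}
    print(s_variant("{mouse}"))   # {mice}

    # Syncretism without rules of referral: both participles invoke the very
    # same en-variant relation, so neither is derived from the other.
    def en_variant(base):
        return "{" + base + "{ed}}"             # over-simplified regular default

    full_of = {"perfect participle": en_variant,
               "passive participle": en_variant}
    print(full_of["passive participle"]("{walk}"))   # {{walk}{ed}}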
The peculiarity of inflectional morphology is that the morphosyntactic categories that it distinguishes are only available via morphology. In contrast, derivational morphology is one of the mechanisms for relating ordinary lexemes which are classified in terms of ordinary lexical classes. For example, it is derivational morphology that recognises the morph {farm} in both the ordinary verb FARM and the ordinary noun FARMER. Similarly, compounding relates two bases to a third which includes them both, as where the base of FARMHOUSE contains the bases of both FARM and HOUSE. The WG structures for these examples are shown in Figure 6, with most labels omitted for simplicity; ‘1’ means ‘the first part’.

Figure 6: Derivation by suffixation and compounding

In addition to inflectional and derivational morphology, morphosyntax handles various kinds of mismatch between syntactic and morphological structure, including fusion and cliticization (Camdzic and Hudson 2007, Hudson 2007:104-15). Fusion maps two words onto a single morph. For example, French au replaces the expected à le, as in au jardin, ‘to the garden’, compared with à la maison, ‘to the house’; and in English the abbreviated form of you are is arguably a single morph with the same pronunciation as your (or possibly even the same morph as in your).

Figure 7: Fusion

In contrast, cliticization maintains the expected one-one relation between words and morphs, but reduces the word’s realisation to a mere affix instead of the expected full wordform. At the level of syntax, a clitic has a normal place in the dependency structure of syntax. But in morphology, as a mere affix it is often said to need a ‘host’ word to support it. Take example (1).

(1) The boys’ll fix it.

Here ‘WILL, present’ is realized by a suffix {’ll}, so this attaches to the preceding word for support. But it cannot be an extra part of the wordform {{boy}{s}}, because this is already ‘full’ – i.e. fully inflected, without any space for a further suffix. Consequently, the clitic must create its own morphological ‘host’ containing both it and the supporting word: {{{boy}{s}}{’ll}}. Strictly speaking, therefore, a clitic’s host is not a word but a hostform, a purely morphological entity without any counterpart in syntactic structure (e.g. boys’ll is not a syntactic unit). As a theoretical construct, it is a normal part of morphological structure: a formal template which accommodates morphs in a rigid and sometimes arbitrary order (Stump 2006b), in contrast with the ‘layered’ structures which are also found. In a language like English, hostforms are very simple, just as morphological structure in general tends to be; so an English hostform has only two slots, which are filled according to the ordinary word-order rules of syntax – in other words, we have just ‘simple clitics’ (Zwicky 1985a). As we shall see below in the analysis of French, the template can be much more complex, just as the ordinary inflectional morphology is. Indeed, it is tempting to speculate about a connection between these two types of complexity: maybe ‘special clitics’ such as those found in French are only found in languages that also have complex templates for inflectional morphology?

A simplified syntactic and morphological structure for example (1) is shown in Figure 8, where the node labelled ‘X’ is the host of the clitic affix {’ll}, and provides two ordered ‘slots’ (actually, relations) labelled ‘0’ (for the main one, the ‘anchor’) and ‘+1’ (for the affix). This is actually just the same kind of morphological structure as the one that relates the node labelled ‘{{boy}{s}}’ to its parts, {boy} and {s}, but these relations are omitted for simplicity.
Figure 8: A simple clitic: ... boys’ll ...

This simple example provides the basic ingredients for the analysis of French clitics in the next section:

- an affix rather than a full wordform as the fully inflected form (‘full’) of the clitic;
- a hostform, which is stored as a completely schematic morphological form with a structure but no specific phonological realization of its own, but which in particular instances brings together the morphological realizations of the clitic and its host;
- a ‘host’ relation linking this hostform to the clitic affix;
- a relation such as ‘+1’ from the hostform to the clitic affix;
- a relation labelled ‘0’ from the hostform to its anchor, a full wordform.

This is enough apparatus to analyse simple clitics, but special clitics, where the clitic is in a special position, need a little more.

5. French clitic pronouns

The challenges of French clitic pronouns are well known because of the ordering and co-occurrence constraints that go beyond anything we might explain in terms of syntax or semantics (Hawkins and Towell 2001:67-71). Since they have been presented by Stump as a test-case for the explanatory power of Paradigm-Function Morphology, it may be interesting to compare the PFM analysis with one based on WG; the following analysis builds on the one in Hudson 2001 and Hudson 2007:111-13, but introduces some significant improvements.

The special clitics that are attached to verbs (not all of which are pronouns) are illustrated in the following examples:

(2) Il ne le lui y donnerait pas.
    he not it to-him there would-give not
    He wouldn’t give him it there.
(3) Il ne te le donnerait pas.
    he not to-you it would-give not
    He wouldn’t give you it.
(4) Il y en mangerait.
    he there from-it would-eat
    He would eat some of it there.

These examples include the following:

- il: a subject pronoun
- ne: a negative marker, paired (here) with pas
- le: a direct-object pronoun
- lui: an indirect-object pronoun
- y: a word meaning ‘there’ or ‘to it’
- en: a word meaning ‘from it’ or ‘of it’.

(Some of these ‘words’ could be analysed as fusions of a preposition de, ‘of, from’, or à, ‘to, at’, with a pronoun, following the model of au for *à le mentioned above; for instance, lui = *à le (human), en = *de le (non-human). For present purposes, this possibility is irrelevant; all that matters is that the morphological realization is an affix.)

These facts call for an analysis in terms of a template in which the order of morphs is simply stipulated as a linear sequence:

Table 1: French clitic pronouns

A (subject): je, tu, il, elle, nous, vous, ils, elles
B (neg): ne
C (non-third or reflexive object): me, te, se, nous, vous
D (third-person non-reflexive direct object): le, la, les
E (third-person non-reflexive indirect object): lui, leur
F (‘to’): y
G (‘of’): en

As one might expect, since only one position is available for each group, only one member of each group is allowed, even if syntax allows two to combine – a clear example of the effects of morphology mentioned above. This rules out examples such as (5), in which two pronouns from group C are combined.

(5) *Il se me présentera.
    he himself to-me will-introduce
    He will introduce himself to me. (Or: He will introduce me to himself.)
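Before turning to the complications, the stipulated ordering of Table 1 can be sketched as a simple data structure; the Python names and slot numbers are illustrative (the numbering anticipates the ‘-7’ to ‘0’ slots introduced below), and the fact that nous and vous belong to both groups A and C is set aside.

    # Table 1 as a morphological template: each group is one position slot
    # relative to the verb (the anchor, slot 0), and each slot can hold at
    # most one clitic. Names and numbering are illustrative.

    SLOT = {"A": -7, "B": -6, "C": -5, "D": -4, "E": -3, "F": -2, "G": -1}
    GROUP = {
        "je": "A", "tu": "A", "il": "A", "elle": "A",
        "nous": "A", "vous": "A", "ils": "A", "elles": "A",   # also group C
        "ne": "B",
        "me": "C", "te": "C", "se": "C",
        "le": "D", "la": "D", "les": "D",
        "lui": "E", "leur": "E",
        "y": "F", "en": "G",
    }

    def linearize(clitics, verb):
        filled = {}
        for clitic in clitics:
            slot = SLOT[GROUP[clitic]]
            if slot in filled:                  # e.g. *Il se me presentera
                raise ValueError("only one clitic per slot: " + clitic)
            filled[slot] = clitic
        return " ".join(filled[s] for s in sorted(filled)) + " " + verb

    print(linearize(["le", "il", "lui", "ne", "y"], "donnerait"))
    # il ne le lui y donnerait   (cf. example (2), minus pas)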
According to Stump’s presentation, ‘ethic datives’ are allowed as exceptions, as in (6).

(6) Il te nous a passé un de ces savons.
    he ‘you’ to-us has passed one of those soaps
    He gave us an incredible telling-off.

The obvious possibility is that ethic datives have an extra position slot between B and C. This slot also seems to be available, for some speakers, in what Stump calls ‘clitic-climbing’, where a dative pronoun belongs to an adjective as in (7).

(7) Jean me te semble fidèle.
    John to-me to-you seems faithful
    John seems to me to be faithful to you.

This detail is largely irrelevant here and can be left to future research. However, there are some challenging complications which require a more sophisticated theory of morphosyntax:

- Typically, these forms all stand before the verb, as in (2) to (4); but if the verb is an affirmative imperative, all but A and B follow the verb, as in Fais-le!, ‘Do it!’ (but: Ne le fais pas!, ‘Don’t do it!’).
- In affirmative imperatives, the order of groups C and D is reversed, as in Donnez-le-moi!, ‘Give it to me!’ In this case, the ordinary form of me and te (you) is replaced by moi and toi.
- Although groups C and E occupy different positions, they cannot combine, as in *Il se lui présentera, ‘He will introduce himself to him.’
- If the pronouns belong syntactically to a non-finite verb depending on an auxiliary verb (avoir, ‘have’, or être, ‘be’), they attach morphologically to the auxiliary (as in Il le leur a envoyé, ‘He has sent it to them.’), and likewise if the non-finite verb depends on faire, ‘make’ (as in Il les lui fera manger, ‘He will make him eat them’).
- If the verb to which the pronouns belong syntactically depends on laisser, ‘let’, envoyer, ‘send’, or a perception verb, the pronouns may attach morphologically either to their own verb or to the verb on which this depends, as in either Tu les lui laisses lire? or Tu la laisses les lire?, both meaning ‘Do you let her read them?’ (Hawkins and Towell 2001:70).

In addition to these morphosyntactic details, clitic pronouns show peculiarities in morphophonology which point clearly to their being involved in a morphological structure rather than merely in a syntactic structure mapped directly onto phonology. For example, the object pronoun le or la can be omitted provided it would immediately precede the pronoun lui (Bonami and Boyé 2007):

(8) Paul la lui apportera.
    Paul it to-him will-bring
(9) Paul lui apportera.

The object pronoun must be present in the syntax, because it is required by the valency of apportera, but the condition for its omission is the presence of lui in the morphology.

How should French clitic pronouns be treated? A widespread view is that since these constraints are clearly morphological rather than syntactic, they must be handled by the machinery of inflectional morphology (Miller and Sag 1997, Bonami and Boyé 2007), an assumption shared by Stump’s very sophisticated analysis in terms of PFM. However, there is an alternative which deserves consideration: that the clitic pronouns are, in fact, clitics – syntactic units realised directly by affixes. In other words, they are not part of inflectional morphology, even though their analysis shares the same morphological apparatus, such as templates. The key difference between the two approaches concerns the categories involved. The inflectional analysis requires extra morphosyntactic features mediating between syntax and morphology, whereas the clitic analysis does not. For example, (8) is syntactically and semantically very similar to (10).

(10) Paul apportera la viande à Jean.
     Paul will-bring the meat to John
In the inflectional analysis, the verb in (8) carries morphosyntactic features which reflect the presence of the two pronouns and which can be invoked by special syntactic rules which explain how the full NPs and PPs of (10) can be replaced by the mere features of (8). In contrast, the clitic analysis dispenses with these mediating features; so instead of saying, for example, that la realizes a feature which is interpreted syntactically and semantically as though it was an ordinary object, the clitic analysis says that la is an ordinary object with the morphological peculiarity of being realised by an affix, {la}. The clitic analysis is cognitively much more plausible. After all, the relation between the morphological form and its non-clitic counterparts is fully transparent – even to the extent of using ordinary determiners le, la, les as pronouns – and the easiest way to understand the choice between (say) la viande and the clitic la is to give them the same syntactic status, just as in the English choice between the meat and it. It would be very strange if learners of French could only understand this rather obvious pattern by postulating inflectional features of the verb.

I now present a WG version of the clitic analysis. We already have the apparatus introduced earlier for simple clitics. The key concept is ‘hostform’, a morphological template which is extremely schematic in the grammar but which in particular cases has a concrete realization just like any other concept; so in (8) the hostform is la lui apportera. Although in a network analysis it is just an atom, it has a link to each of the ‘slots’ which may be part of it and whose order is defined by labels such as ‘0’ (for the host’s anchor) and ‘+1’ (for the first part after 0). If we apply this framework of analysis to the elementary ordering in Table 1, we can say that French clitics have a host whose network structure consists of eight parts (not all of which are present on every occasion), numbered from ‘-7’ to ‘-1’ and (finally) ‘0’, the anchor. This kind of ordering is very familiar in general cognition, as it represents our ability to order random pairs of numbers – to know, for example, that 3 is ordered in between 1 and 7. Somehow, then, each clitic must be mapped onto one of these ordered parts; and the anchor verb is always mapped onto ‘0’.

One of the most challenging facts is the apparently exceptional behaviour of affirmative imperatives, allowing examples like (11) to break the normal pattern in (12).

(11) Donnez-le-moi!
     Give it to-me
(12) Paul me le donnera.
     Paul to-me it will-give

To start with I shall ignore affirmative imperatives as an exception, but the discussion will in fact lead to a very different view in which they actually represent the basic pattern – a complete reversal of the obvious analysis.

Where do hostforms come from? It is easy to see that each clitic could be stored with a schematic hostform in which it already has its position, but suppose two clitics both depend on the same verb: if each of them has a different hostform, how will these combine to define their relative positions? The solution lies in the operation of binding that we have already invoked, and which binds selected concepts to other concepts to show that they are, in some sense, the same. The mechanism for binding is very general, applying in language to operations as diverse as anaphoric binding and parsing, not to mention a host of applications outside language; it relies heavily on activation levels. In the case of clitics, it merges any two hostforms that are highly active, so if we have two clitics, each with its own position in its own hostform, binding would merge the two hostforms into a single one which assigns them different positions – in other words, a ‘clitic cluster’. For example, in the sequence la lui apportera, la and lui each need a hostform with a verb as its anchor, and at an early point in the process of speaking or hearing that is all the processor knows; but since they are both highly active at the same time, and their potential hostforms have compatible properties, and in any case there is only one candidate verb, the two hostforms get bound together.

But even if we can assign all clitics to a single hostform, how will this help to position them before the verb? The answer is again obvious: the merged hostform has a schematic anchor which must be a verb, so we bind the most active verb to it. Notice that this mechanism only generates hostforms when they are needed by a clitic. The result is a single hostform which has bidirectional relations to each clitic but a single relation to the verb. For example, in the case of la lui apportera, this whole sequence is the hostform of the clitic la, with the latter as its ‘-4’ (as explained below) and the verb as its anchor (labelled ‘0’).
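As a rough illustration of this merging, the following Python sketch binds two schematic hostforms into one clitic cluster when both are sufficiently active and their properties are compatible; the threshold and the dictionary representation are my own simplifications, not WG machinery.

    # Binding as hostform merger: each clitic brings its own schematic
    # hostform; highly active, compatible hostforms are bound into a single
    # cluster with one anchor verb. Purely illustrative.

    def merge_hostforms(hostforms, threshold=0.5):
        merged = {"slots": {}, "anchor": None}
        for hf in hostforms:
            if hf["activation"] < threshold:
                continue                        # not active enough to bind
            for slot, filler in hf["slots"].items():
                if slot in merged["slots"]:     # incompatible: slot clash
                    raise ValueError("cannot bind: slot already filled")
                merged["slots"][slot] = filler
            # all the hostforms share one schematic verb anchor
            merged["anchor"] = merged["anchor"] or hf["anchor"]
        return merged

    hf_la = {"slots": {-4: "{la}"}, "anchor": "apportera", "activation": 0.9}
    hf_lui = {"slots": {-3: "{lui}"}, "anchor": "apportera", "activation": 0.9}
    print(merge_hostforms([hf_la, hf_lui]))
    # {'slots': {-4: '{la}', -3: '{lui}'}, 'anchor': 'apportera'}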
One of the most challenging facts is the apparently exceptional behaviour of affirmative imperatives, which allows examples like (11) to break the normal pattern in (12).

(11) Donnez-le-moi!
     give it to-me
(12) Paul me le donnera.
     Paul to-me it will-give

To start with I shall ignore affirmative imperatives as an exception, but the discussion will in fact lead to a very different view in which they actually represent the basic pattern - a complete reversal of the obvious analysis.

Where do hostforms come from? It is easy to see that each clitic could be stored with a schematic hostform in which it already has its position; but suppose two clitics both depend on the same verb: if each of them has a different hostform, how will these combine to define their relative positions? The solution lies in the operation of binding that we have already invoked, which binds selected concepts to other concepts to show that they are, in some sense, the same. The mechanism for binding is very general, applying in language to operations as diverse as anaphoric binding and parsing, not to mention a host of applications outside language, and it relies heavily on activation levels. In the case of clitics, it merges any two hostforms that are highly active; so if we have two clitics, each with its own position in its own hostform, binding merges the two hostforms into a single one which assigns them different positions - in other words, a 'clitic cluster'. For example, in the sequence la lui apportera, la and lui each need a hostform with a verb as its anchor, and at an early point in the process of speaking or hearing that is all the processor knows; but since they are both highly active at the same time, their potential hostforms have compatible properties, and in any case there is only one candidate verb, the two hostforms get bound together.

But even if we can assign all the clitics to a single hostform, how will this help to position them before the verb? The answer is again obvious: the merged hostform has a schematic anchor which must be a verb, so we bind the most active verb to it. Notice that this mechanism only generates hostforms when they are needed by a clitic. The result is a single hostform which has bidirectional relations to each clitic but a single relation to the verb. For example, in the case of la lui apportera, this whole sequence is the hostform of the clitic la, with the latter as its '-4' (as explained below) and the verb as its anchor (labelled '0').
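The binding operation just described can be given the same kind of illustrative sketch. Treating activation as a bare number and compatibility as the mere absence of slot clashes is, of course, a drastic simplification of the WG account.

    # Illustrative only: each candidate is a (slots, activation) pair, with
    # slots mapping position labels to forms as in the previous sketch.
    def bind_hostforms(candidates, threshold=0.5):
        merged = {}
        for slots, activation in candidates:
            if activation < threshold:    # only highly active hostforms bind
                continue
            if set(slots) & set(merged):  # incompatible: two forms, one slot
                continue
            merged.update(slots)          # merge into a single clitic cluster
        return merged

    # la and lui each supply a one-clitic hostform; binding merges them, and
    # the most active verb is then bound to the schematic anchor at '0'.
    cluster = bind_hostforms([({-4: "la"}, 0.9), ({-3: "lui"}, 0.8)])
    cluster[0] = "apportera"
    print(" ".join(cluster[p] for p in sorted(cluster)))  # la lui apportera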
I can now start to illustrate the structures that I believe we need for special clitics in French. Figure 9 shows the structure for (13).

(13) Il la leur présentera.
     he her to-them will-introduce
     'He will introduce her to them.'

In the diagram, the node labelled 'X' is the merged host of the three clitics' realizations, and assigns each form a position number relative to the verb; '3SMS' is a syntactic representation of il ('third singular masculine subject'). It is clear how this structure guarantees the right order of elements, but what is less clear is why the requirements of morphology should always override those of syntax (Sadock 1991). However intuitively obvious it may be, this doesn't seem to follow from the normal principles of default inheritance, and it requires more work on the underlying logic of the WG system.

[Figure 9: Partial structure for Il la leur présentera, with {il}, {la} and {leur} in positions -7, -4 and -3 of the hostform X, and the verb form {{{présent}{er}}{a}} as its anchor at 0.]

Linking the verb to the hostform allows its properties to affect those of the hostform. As mentioned earlier, affirmative imperative verbs affect clitics differently from other verbs, taking enclitics rather than proclitics. To explain these differences we assign affirmative imperatives a special kind of hostform, with different effects on ordering. This solves the problem, but it also creates a different one: how to map the clitics onto the position slots. If {leur}, for example, inherits the label '-3' because this is what it needs when it is attached before the verb, how can it have a different number such as '+2' when it follows the verb? The answer is to introduce a more abstract set of relations to mediate between clitic forms and their positions. Thus instead of assigning '-3' directly to {leur}, we first identify {leur}, within the hostform, as the realization of 'third-person non-reflexive indirect object' (abbreviated to '3io'), and then link this category to different positions according to the type of hostform. This means that in Figure 9 we should add further relations between 'X' and the clitics, including a '3io' arrow from 'X' to {leur}, which would by default be paired with a '-3' position. Exceptionally, however, in a post-verb hostform the '3io' arrow would be paired with a '+2' position. The latter analysis, for (14), is shown in Figure 10.

(14) Présentez-la-leur!
     introduce her to-them
     'Introduce her to them!'

[Figure 10: An affirmative imperative, Présentez-la-leur!, with {la} and {leur} in positions +1 and +2 after the anchor {{présent}{ez}} at 0.]

We are now ready to reverse the obvious analysis, as promised earlier. Why should affirmative imperatives be different from all other verbs in their effect on clitics? Seen simply in terms of morphosyntactic features, this peculiarity is indeed peculiar. But seen in terms of clitics, these verbs have two important and highly relevant properties: they allow neither a subject pronoun nor a negative ne - the first two clitics in the ordering of Table 1. Both of these clitics have functional reasons for preceding the verb:

- The subject clitic precedes the verb because its position is one of the devices used to distinguish declarative and interrogative clauses (e.g. Tu m'aimes 'You love me' versus M'aimes-tu? 'Do you love me?').
- The negative clitic ne precedes the verb so that the verb can separate it from the post-verbal particle or pronoun with which it is normally paired (e.g. Tu ne m'aimes pas 'You do not love me'; Je ne dors jamais 'I never sleep').

Suppose we assume that these two clitics have to precede the verb, and, further, that all the clitics form a single clitic cluster. The result is that all the clitics are dragged before the verb to act as proclitics. These functional pressures are admittedly fairly weak, particularly the pressure from the subject clitic, because it is only loosely attached to the other clitics. Not only can it be inverted in interrogatives without affecting the position of the other clitics, as just mentioned, but unlike the objects it can be shared by two verbs (Bonami and Boyé 2007), as in (15) compared with (16).

(15) Il lira ce livre et le critiquera.
     he will-read this book and it will-criticize
     'He will read this book and criticize it.'
(16) *Il le lira aujourd'hui et critiquera demain.
     he it will-read today and will-criticize tomorrow
     'He will read it today and criticize it tomorrow.'

Nevertheless, it is clear, and generally agreed, that the subject pronouns really are clitics; for example, it is impossible to coordinate them (as in *Il et elle viennent 'He and she come'). And given that the other clitics clearly form a clitic cluster, it seems reasonable to assume that a non-inverted subject is also part of this cluster. My suggestion, therefore, is that it is the subject and negative clitics that are responsible for the proclitic ordering, without which the clitics would be enclitics. In other words, contrary to our assumptions so far, clitics are enclitics by default, and the default is overridden where the hostform contains a subject and/or a negative.
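This default-and-override logic is a textbook case of default inheritance, and it can be sketched in the same illustrative style; representing the contents of a hostform as a bare set of relation labels is my own shortcut, not WG notation.

    # Illustrative only: a hostform is enclitic by default, but a subject or
    # negative clitic in the cluster overrides the default.
    def hostform_type(relations):
        if "subj" in relations or "neg" in relations:
            return "proclitic"        # the overriding, non-default case
        return "enclitic"             # the default (affirmative imperatives)

    print(hostform_type({"3do", "io"}))          # enclitic: Présentez-la-leur!
    print(hostform_type({"subj", "3do", "io"}))  # proclitic: Il la leur présentera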
This suggestion would be quite counterintuitive if we applied it to verb classes, since there is no other reason for taking affirmative imperatives as the default inflection for verbs; but the suggestion actually applies not to verbs but to hostforms. Since hostforms are separate from their anchor verbs, it is quite possible for the default hostform to have a non-default verb as its anchor.

Further support for this analysis comes from the order in which enclitics appear with affirmative imperatives: exactly the same as the order found in syntax, where objects, complements and adjuncts follow the verb and only subjects precede it. Moreover, direct objects precede indirect objects; for example, the order in (17) is the same as the syntactically required order in (18), in contrast with the order found with proclitics in (19) and (20).

(17) Donnez-le-nous! 'Give it to us!'
(18) Donnez le livre à Paul! 'Give the book to Paul!'
(19) Il nous le donnera. 'He will give it to us.'
(20) *Il le nous donnera. 'He will give it to us.'

What this means is that by default French clitics are actually simple clitics, following the ordinary word-order rules of syntax, rather than special clitics. It is only in the non-default case that they require special ordering rules.

We now turn to another curiosity of Table 1: the fact that groups C (me, te, se, nous, vous) and E (lui, leur) cannot combine, even though they occupy different position slots and are completely compatible in syntax. This odd restriction is illustrated in (21) to (24).

(21) Je présenterai Jean à Marie. 'I will introduce John to Mary.'
(22) Je le lui présenterai. 'I will introduce him to her.'
(23) Je me présenterai à Marie. 'I will introduce myself to Mary.'
(24) *Je me lui présenterai. 'I will introduce myself to her.'

Combining a member of C with a member of E is just as impossible as combining two members of C, as in *Je me te présenterai 'I will introduce myself to you.' In this case we may find an explanation in the abstract relations which allow the same clitic to occur either as a proclitic or as an enclitic. Following a suggestion made by Stump, suppose we assign C and E the same abstract category, which we can call 'io' since they are all able to act as indirect objects, even though group C can also act as direct object. This shared category explains why C and E are mutually exclusive. As for the differences in position, these can be explained by taking one of the positions as the default, with the other as an exception. We shall see shortly that there are in fact good reasons to take E as the default for 'io', whose properties are overridden by the pronouns in group C (called '1/2/r' for 'first- or second-person or reflexive').

If these suggestions are right, then the relevant grammar is shown, in part, in Figure 11. In words: enclitic pronouns, found in the enclitic hostform of an affirmative imperative, show the default ordering, with third-person direct objects before other pronouns (the so-called 'io' category, which includes direct objects); all other kinds of verb have a proclitic hostform, which includes a subject (possibly responsible diachronically for the proclitic positioning) and which locates first- and second-person, and reflexive, 'io' pronouns in a different position from non-reflexive third-person indirect objects (the default case). For simplicity, the diagram omits other clitics such as the negative.

[Figure 11: A grammar for French clitic pronouns. Enclitic hostform: 3do at +1, io at +2; proclitic hostform: subj at -7, 1/2/r at -5, 3do at -4, io at -3.]
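Read as mappings from abstract relations to position slots, the grammar in Figure 11 amounts to the two tables below. The dictionary encoding and the fall-back step are my own illustrative devices, but the position numbers are exactly those of the figure.

    # The two hostform types of Figure 11 as relation -> position mappings.
    ENCLITIC  = {"3do": +1, "io": +2}                          # the default ordering
    PROCLITIC = {"subj": -7, "1/2/r": -5, "3do": -4, "io": -3}

    def position(relation, hostform):
        table = ENCLITIC if hostform == "enclitic" else PROCLITIC
        if relation not in table:     # '1/2/r' is a subcase of 'io', so by
            relation = "io"           # default it inherits the 'io' position
        return table[relation]

    # Donnez-le-nous! (17): 3do at +1 precedes 1/2/r, which falls back to io at +2.
    print(position("3do", "enclitic"), position("1/2/r", "enclitic"))    # 1 2
    # Il nous le donnera (19): 1/2/r at -5 precedes 3do at -4.
    print(position("1/2/r", "proclitic"), position("3do", "proclitic"))  # -5 -4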
The analysis so far has solved all the morphological problems, so we are ready to turn to the syntax. I have assumed so far that each clitic depends, in syntactic structure, directly on its host verb, but this need not be so.

(25) Je les ai trouvés. 'I have found them.' (les depends on trouvés)
(26) Tu la laisses les lire? 'Do you let her read them?'
(27) Tu les lui laisses lire? 'Do you let her read them?' (les depends on lire)
(28) Jean en mange beaucoup. 'John eats a lot of it.' (en depends on beaucoup 'a lot')
(29) Jean lui a été fidèle. 'John has been faithful to her.' (lui depends on fidèle)

In all these cases, the pronoun's host is a finite verb higher up the dependency chain than the word on which it depends. This 'clitic climbing' is easy both to describe and to explain if we assume that some verbs, such as auxiliaries, have hostforms of their own, and that these hostforms are available for binding to any available clitics, as described earlier. Thus if ai in (25) has a hostform, and general binding tries to bind the hostform of les to any other active hostform, it will attach les to ai rather than to trouvés. A similar explanation applies to verbs such as laisser 'let' and faire 'make' which allow clitic climbing; in the case of laisser it is optional, so we get the choice between (26) and (27). In contrast, non-verbs such as beaucoup and fidèle have no hostform, so their clitic complement must attach to a verb, as in (28) and (29). All these syntactic complications turn out to be rather simple.

The proposed analysis of French clitics has solved all the problems that I listed earlier, but this was only possible because of certain theoretical premises of WG:

- the logic of default inheritance (which allows enclitics as the default)
- the process of binding (which allows hostforms to bind to each other)
- the subclassification of relations (which allows '1/2/r' to be a subcase of the 'io' relation)
- the assumption that the grammar is sensitive to cognitive contexts such as activation levels.

References

Aronoff, Mark and Volpe, Mark 2006. 'Morpheme', in Keith Brown (ed.) Encyclopedia of Language & Linguistics, Second Edition. Oxford: Elsevier, pp. 274-276.
Barabasi, Albert L. 2009. 'Scale-Free Networks: A Decade and Beyond', Science 325: 412-413.
Blevins, J. P. 2003. 'Stems and paradigms', Language 79: 737-767.
Bonami, Olivier and Boyé, Gilles 2007. 'French pronominal clitics and the design of Paradigm Function Morphology', in Geert Booij (ed.) On-line Proceedings of the Fifth Mediterranean Morphology Meeting (MMM5), Fréjus, 15-18 September 2005. University of Bologna.
Bouma, Gosse 2006. 'Unification, Classical and Default', in Keith Brown (ed.) Encyclopedia of Language & Linguistics, Second Edition. Oxford: Elsevier, pp. 231-238.
Briscoe, Ted, Copestake, Ann and De Paiva, Valeria 1993. Inheritance, Defaults, and the Lexicon. Cambridge: Cambridge University Press.
Bybee, Joan 1995. 'Regular Morphology and the Lexicon', Language and Cognitive Processes 10: 425-455.
Bybee, Joan 2010. Language, Usage and Cognition. Cambridge: Cambridge University Press.
Camdzic, Amela and Hudson, Richard 2007. 'Serbo-Croat Clitics and Word Grammar', Research in Language (University of Lodz) 4: 5-50.
Carpenter, Bob 1992. The Logic of Typed Feature Structures. Cambridge: Cambridge University Press.
Daelemans, W. and De Smedt, K. 1994. 'Default inheritance in an object-oriented representation of linguistic categories', International Journal of Human-Computer Studies 41: 149-177.
Ellis, Nick 2002a. 'Frequency effects in language processing: a review with implications for theories of implicit and explicit language acquisition', Studies in Second Language Acquisition 24: 143-188.
Ellis, Nick 2002b. 'Reflections on frequency effects in language processing', Studies in Second Language Acquisition 24: 297-339.
Ferreira, Fernanda 2005. 'Psycholinguistics, formal grammars, and cognitive science', The Linguistic Review 22: 365-380.
Flickinger, D. 1987. Lexical Rules in the Hierarchical Lexicon. PhD dissertation, Stanford University.
Frost, Ram, Deutsch, Avital, Gilboa, Orna, Tannenbaum, Michael and Marslen-Wilson, William 2000. 'Morphological priming: Dissociation of phonological, semantic, and morphological factors', Memory & Cognition 28: 1277-1288.
Gisborne, Nikolas 2010. The Event Structure of Perception Verbs. Oxford: Oxford University Press.
Hawkins, Roger and Towell, Richard 2001. French Grammar and Usage. London: Arnold.
Hudson, Richard 1984. Word Grammar. Oxford: Blackwell.
Hudson, Richard 1990. English Word Grammar. Oxford: Blackwell.
Hudson, Richard 2000. '*I amn't', Language 76: 297-323.
Hudson, Richard 2001. 'Clitics in Word Grammar', UCL Working Papers in Linguistics 13: 243-294.
Hudson, Richard 2007. Language Networks: The New Word Grammar. Oxford: Oxford University Press.
Hudson, Richard 2010. An Introduction to Word Grammar. Cambridge: Cambridge University Press.
Hudson, Richard and Holmes, Jasper 2000. 'Re-cycling in the Encyclopedia', in Bert Peeters (ed.) The Lexicon/Encyclopedia Interface. Amsterdam: Elsevier, pp. 259-290.
Jackendoff, Ray 2011. 'What is the human language faculty? Two views', Language 87: 586-624.
Lascarides, A., Briscoe, Ted, Asher, N. and Copestake, A. 1996. 'Order independent and persistent typed default unification', Linguistics and Philosophy 19: 1-90.
Levelt, Willem, Roelofs, Ardi and Meyer, Antje 1999. 'A theory of lexical access in speech production', Behavioral and Brain Sciences 22: 1-45.
Luger, George and Stubblefield, William 1993. Artificial Intelligence: Structures and Strategies for Complex Problem Solving. New York: Benjamin Cummings.
Marslen-Wilson, William 2006. 'Morphology and Language Processing', in Keith Brown (ed.) Encyclopedia of Language & Linguistics, Second Edition. Oxford: Elsevier, pp. 295-300.
Miller, P. H. and Sag, Ivan 1997. 'French clitic movement without clitics or movement', Natural Language & Linguistic Theory 15: 573-639.
Neef, Martin 2006. 'Declarative Morphology', in Keith Brown (ed.) Encyclopedia of Language & Linguistics, Second Edition. Oxford: Elsevier, pp. 385-388.
Onnis, Luca, Christiansen, Morten and Chater, Nick 2006. 'Human Language Processing: Connectionist Models', in Keith Brown (ed.) Encyclopedia of Language & Linguistics, Second Edition. Oxford: Elsevier, pp. 401-409.
Pelletier, Jeff and Elio, Renee 2005. 'The case for psychologism in default and inheritance reasoning', Synthese 146: 7-35.
Pinker, Steven 1998. 'Words and rules', Lingua 106: 219-242.
Reisberg, Daniel 2007. Cognition: Exploring the Science of the Mind. Third media edition. New York: Norton.
Rosch, Eleanor 1978. 'Principles of categorization', in Eleanor Rosch and Barbara Lloyd (eds.) Cognition and Categorization. Hillsdale, NJ: Lawrence Erlbaum, pp. 27-48.
Russell, Graham, Ballim, Afzal, Carroll, John and Warwick-Armstrong, Susan 1993. 'A practical approach to multiple default inheritance for unification-based lexicons', in Ted Briscoe, Valeria De Paiva and Ann Copestake (eds.) Inheritance, Defaults and the Lexicon. Cambridge: Cambridge University Press, pp. 137-147.
Sadock, Jerrold 1991. Autolexical Syntax: A Theory of Parallel Grammatical Representations. Chicago: University of Chicago Press.
Stump, Gregory 2001. Inflectional Morphology: A Theory of Paradigm Structure. Cambridge: Cambridge University Press.
Stump, Gregory 2006a. 'Paradigm Function Morphology', in Keith Brown (ed.) Encyclopedia of Language & Linguistics, Second Edition. Oxford: Elsevier, pp. 171-173.
Stump, Gregory 2006b. 'Template Morphology', in Keith Brown (ed.) Encyclopedia of Language & Linguistics, Second Edition. Oxford: Elsevier, pp. 559-562.
Taylor, John 1995. Linguistic Categorization: Prototypes in Linguistic Theory. Oxford: Clarendon.
Touretzky, David 1986. The Mathematics of Inheritance Systems. Los Altos, CA: Morgan Kaufmann.
Vogel, Carl 1998. 'A Generalizable Semantics for a Default Inheritance Reasoner'.
Zwicky, Arnold 1985a. 'Clitics and particles', Language 61: 283-305.
Zwicky, Arnold 1985b. 'How to describe inflection', in Mary Niepokuj, Mary Van Clay, Vassiliki Nikiforidou and Deborah Feder (eds.) Proceedings of the 11th Annual Meeting of the Berkeley Linguistics Society. Berkeley: Berkeley Linguistics Society, pp. 372-386.