Cognitive linguistics and language structure

Abstract

Cognitive linguists agree that language is handled mentally by general cognitive structures and processes rather than by a dedicated mental module. However, in spite of remarkable progress in some areas, cognitive linguists have generally paid little attention to the possible implications of this ‘cognitive assumption’ for the theory of language structure – how language is organised and what structures we assign to utterances. Cognitive Grammar has avoided formalization, and the various versions of construction grammar have adopted rather conservative views on grammatical structure. The exception is Word Grammar, which offers a radical alternative view of language structure. This paper defends the structural claims of Word Grammar on the grounds that most of them follow logically from the cognitive assumption (though a few need to be revised). The paper starts by breaking this assumption into a number of more specific tenets relating to learning, network structures, ‘recycling’, inheritance, relations, activation and chunking. It then shows how these tenets support various claims that distinguish Word Grammar in cognitive linguistics. According to this argument, morphology and syntax are distinct levels, so language cannot consist of nothing but ‘symbols’ or ‘constructions’. Moreover, in syntax the main units – and possibly the only units – must be words, not phrases, so the basic relation of syntax is the dependency between two words, not the relation between a phrase and its part. In other words, sentence structure is dependency structure, not phrase structure – a network, as expected in cognition, not a tree. One of the benefits of this analytical machinery is in the treatment of the various patterns that have been called ‘constructions’, which benefit from the flexibility of a network structure. This kind of structure is also very appropriate for semantic analysis, illustrated here by two examples: the distinction between universal and existential quantifiers, and a detailed structural analysis of the meaning of the how about X? construction, covering its illocutionary force as well as deictic binding. Finally, the paper discusses a formal property, the order of words, morphs and so on, arguing that the constraints on order are expressed in terms of the ‘landmark’ relation of Cognitive Grammar, while the actual ordering requires the more primitive relation found in any ordered string, here called ‘next’. The paper explains how landmark relations can be derived from word-word dependencies in both simple and complex syntactic patterns, and why the words in a phrase normally stay together.

1. Introduction

Now that cognitive linguistics (CL) has established itself as a valid and productive approach to the study of language, it is reasonable to ask what progress it has made on one of the traditional questions of linguistic theory: how language is structured. In particular, how is the answer to this question affected, if at all, by what we might call the Cognitive Assumption in (1)? (1) The only cognitive skills used in language are domain-general skills which are also used outside language. This unifying belief of all cognitive linguists has been expressed more or less pithily by others: ‘knowledge of language is knowledge’ (Goldberg 1995:5), we should ‘derive language from non-language’ (Lindblom and others 1984:187, quoted in Bybee 2010:12), ‘language is not an autonomous cognitive faculty’ (Croft and Cruse 2004:1).
What difference does the Cognitive Assumption make to our ideas about how language is organised, compared with the alternative views in which language is seen either as having nothing to do with cognition, or as a separate module of cognition? The natural place to look for an answer is in the theoretical packages that address this question directly. The Oxford Handbook of Cognitive Linguistics lists three ‘models of grammar’ (Geeraerts and Cuyckens 2007): Cognitive Grammar, construction grammar (without capitals) and Word Grammar. Cognitive Grammar has not yet been developed into a sufficiently formal system because ‘the cost of the requisite simplifications and distortions would greatly outweigh any putative benefits’ (Langacker 2007: 423). Whatever the merits of this strategic decision, it means that we cannot expect a precise account of how language is organised, comparable to the accounts that we find in non-cognitive theoretical packages. When written without capitals, ‘construction grammar’ has sometimes been identified simply with ‘the cognitive linguistics approach to syntax’ (Croft and Cruse 2004:225). In his 2007 survey, Croft divides construction grammar into four versions (including Cognitive Grammar). The Fillmore/Kay ‘Construction Grammar’ (with capitals) is formally explicit, but makes very similar claims about language structure to the non-cognitive model Head-Driven Phrase Structure Grammar (HPSG; Pollard and Sag 1994). The Lakoff/Goldberg version is much less formally explicit, but offers syntactic analyses that are very similar to those of Construction Grammar (Croft 2007:486). Finally, Croft’s own Radical Construction Grammar does comprise original claims about language structure, but the arguments for these claims are only loosely related to the Cognitive Assumption, and indeed, I shall suggest in section 5 that they are incompatible with it. Construction grammarians agree in rejecting the distinction between grammar and lexicon. This is certainly an innovation relative to the recent American tradition (though Systemic Functional Grammar has recognised the grammar-lexicon continuum for decades under the term ‘lexicogrammar’ - Halliday 1961, Halliday 1966). Otherwise, however, Cognitive Grammar and the other versions of construction grammar make assumptions about language structure which are surprisingly conservative considering their radical criticisms of ‘main-stream’ linguistic theories. It would probably be fair to describe the assumed model of syntax as little more sophisticated than Zwicky’s ‘plain vanilla syntax’ (Zwicky 1985), and more generally the assumed grammatical model is little different from the American structuralism of the 1950s. The aim of this paper is to question some of these assumptions on the grounds that they conflict with the Cognitive Assumption. (There are also good ‘linguistic’ reasons for questioning them, but these arguments will be incidental to the main thrust of the paper.) The third ‘model of grammar’ recognised by the Oxford Handbook is Word Grammar. Not all cognitive linguists recognise Word Grammar as a part of cognitive linguistics; for instance, neither of the articles about the other two models mentions Word Grammar, nor is it mentioned at all in the 800 pages of Cognitive Linguistics: An introduction (Evans and Green 2006), and although Croft and Cruse 2004 mention it once, they do not regard it as an example of cognitive linguistics. 
However, if the Cognitive Assumption is critical, then Word Grammar definitely belongs to cognitive linguistics. Since the theory’s earliest days its publications have endorsed this assumption in passages such as the following: ‘... we should assume that there is no difference between linguistic and nonlinguistic knowledge, beyond the fact that one is to do with words and the other is not’ (Hudson 1984:36-7). To reinforce the link to early cognitive linguistics, this is coupled with an approving reference to Lakoff’s view: ‘For me, the most interesting results in linguistics would be those showing how language is related to other aspects of our being human’ (Lakoff 1977). By 1990, cognitive linguistics existed as such and was cited with approval in the next book about Word Grammar (Hudson 1990:8) in connection with ‘cognitivism’, one of the theory’s main tenets. By 2007 it was possible to write of cognitive linguistics that Word Grammar ‘fits very comfortably in this new tradition’ (Hudson 2007:2), and in 2010: ‘Like other ‘cognitive linguists’, I believe that language is very similar to other kinds of thinking’ (Hudson 2010:1). On the one hand, then, Word Grammar incorporates the same Cognitive Assumption as other cognitive theories. Moreover, it has been heavily influenced by the work of other cognitive linguists, such as Lakoff’s work (mentioned earlier) on prototypes, Langacker’s analyses of construal in languages such as Cora (Casad and Langacker 1985), Fillmore’s analyses of English lexical fields such as commercial transactions and risk (Fillmore 1982, Fillmore and Atkins 1992) and his joint work on constructions (Fillmore and others 1988, Kay and Fillmore 1999), and Bybee’s work on learning (Bybee and Slobin 1982). On the other hand, one of the distinctive characteristics of Word Grammar is its focus on questions of language structure – ‘formal’ questions about the ‘formal’ properties of language. Unfortunately, ‘formalism’ is associated in the literature of cognitive linguistics with Chomskyan linguistics (Taylor 2007:572), and as noted earlier, Cognitive Grammar has positively resisted formalisation as a dangerous distraction. Measured solely in terms of insights into the formal structure of language, Chomsky has a point when he claims that cognitive linguistics (as he understands it) ‘has achieved almost no results’ (Chomsky 2011). But there is no inherent incompatibility between the Cognitive Assumption and formalisation. After all, an interest in formal cognitive structures is the hallmark of Artificial Intelligence, and we have already noticed the similarity between Construction Grammar and the very formal theoretical work in HPSG. As in other kinds of linguistics, work is needed on both formal and informal lines, and progress depends on fruitful interaction between the two. This paper focuses on general formal questions about language structure, to argue that the Cognitive Assumption actually leads to quite specific answers which are different from the ‘plain vanilla syntax’ which is generally taken for granted, and that these answers generally coincide with the claims of Word Grammar (though some revision is needed).
The next section analyses the Cognitive Assumption into a number of more specific tenets that are relevant to language, and the following sections apply these assumptions to a number of questions about language structure: the nature of linguistic units, the relations between morphology and syntax, the status of ‘constructions’ and their relation to dependencies in syntax, the nature of meaning and the ordering of words. The last section draws some general conclusions. 2. The Cognitive Assumption unpacked If cognition for language shares the properties of the general-purpose cognition that we apply in other domains, the first question for cognitive linguistics is what we know about the latter. Of course, cognitive scientists know a great deal about cognition, so the immediate question is what they know that might have consequences for the theory of language structure. Most of the following tenets are recognised in any undergraduate course on cognitive psychology or AI as reasonably ‘mainstream’ views, even if they are also disputed; so I support them by reference to textbooks on cognitive psychology (Reisberg 2007) and AI (Luger and Stubblefield 1993). These tenets are also very familiar to any reader of this journal, so little explanation or justification is needed. The first relevant tenet of cognitive psychology might be called ‘the learning tenet’ and consists of a truism: (2) The learning tenet: We learn concepts from individual experiences, or ‘exemplars’. One conclusion that can be drawn from experimental results is that we learn by building ‘prototype’ schemas on the remembered exemplars, but without deleting the latter from memory (Reisberg 2007:321); and another is that schemas have statistical properties that reflect the quantity and quality of the experiences on which they are based. Thus it is not just language learning, but all learning, that is ‘usage-based’. The ‘network tenet’, also called the ‘network notion’ (Reisberg 2007:252), is this claim: (3) The network tenet: The properties of one concept consist of nothing but links to other concepts, rather than ‘features’ drawn from some separate vocabulary. In this view, concepts are atoms, not bundles or boxes with an internal structure. A concept is simply the meeting point of a collection of links to other concepts. Consequently, the best way to present an analysis of some area of conceptual structure is by drawing a network for it. Moreover, since a concept’s content is carried entirely by its links to other concepts, labels carry no additional information and are simply mnemonics to help researchers to keep track of the network models they build; so in principle, we could remove all the labels from a network without losing any information (Lamb 1998:59). An important corollary of the network tenet is that when we learn a new concept, we define it as far as we can in terms of existing concepts. This is such an important idea that we can treat it as a separate tenet: (4) The recycling tenet: Existing concepts are ‘recycled’ wherever possible as properties of other concepts. The recycling tenet explains the often observed fact that social and conceptual networks tend to be ‘scale-free’, meaning that they have more densely-linked nodes than would be expected if links were distributed randomly (Barabasi 2009). It also has important implications for the formal structure of language which we shall consider in relation to morphology (section 3) and semantics (section 6). 
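To make the network and recycling tenets more concrete, here is a minimal sketch in Python (purely illustrative; the concepts and relation labels are invented for the example, not drawn from any Word Grammar implementation). A concept is an atomic node whose entire content is its labelled links to other concepts, and a new concept is defined by recycling nodes that already exist; the labels are nothing but mnemonics for the analyst.

```python
# Illustrative sketch of the network and recycling tenets: a concept is an
# atomic node with no internal structure; its 'content' is just its labelled
# links to other concepts, and its label is only a mnemonic.

class Concept:
    def __init__(self, label):
        self.label = label            # mnemonic only; carries no information
        self.links = []               # list of (relation, other_concept) pairs

    def add_link(self, relation, other):
        self.links.append((relation, other))

animal, dog, tail, bark = (Concept(name) for name in ("animal", "dog", "tail", "bark"))
dog.add_link("isa", animal)           # taxonomic link (see the inheritance tenet below)
dog.add_link("part", tail)            # 'dog' is simply the meeting point of these links
dog.add_link("sound", bark)

# Recycling: a new concept ('puppy') is defined entirely by links to nodes
# that already exist, rather than being built from scratch.
puppy = Concept("puppy")
puppy.add_link("isa", dog)

for relation, other in dog.links:
    print(f"dog --{relation}--> {other.label}")
```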
Next, we have the ‘inheritance tenet’: (5) The inheritance tenet: We build taxonomies of concepts linked to one another by the special relation ‘isa’, which allow generalisations to spread down the taxonomy from more general to more specific concepts. The ‘isa’ relation is widely recognised as a basic relation (Reisberg 2007:270), and at least in AI, this process of generalisation is called ‘inheritance’ (e.g. Luger and Stubblefield 1993: 386), so that more specific concepts can ‘inherit’ properties from more general ones. Many AI researchers accept the need for inheritance to allow exceptions, so that properties are inherited only by default. This logic is variously called ‘default inheritance’, ‘normal inheritance’ or ‘normal mode inheritance’, in contrast with ‘complete inheritance’ which forbids exceptions. If our basic logic is default inheritance, this explains one part of the learning tenet above: how we can accommodate both prototypical and exceptional members in the same category. The fifth relevant claim of cognitive psychology and AI concerns the classification of relations, so we can call it the ‘relation tenet’: (6) The relation tenet: Most relations are themselves concepts which we learn from experience. On the one hand it is obvious that the links in a network are of different types; for instance, the links from the concept ‘dog’ to ‘animal’, ‘tail’ and ‘bark’ are fundamentally different from each other. On the other hand, we cannot assume that these different types are all ‘built in’, part of our inherited conceptual apparatus (Reisberg 2007: 270). A few of them must be built in, because they underlie our most basic logical operations, the clearest example being the ‘isa’ relation mentioned above; but most of them must be learned from experience just like ordinary concepts. One solution to this dilemma is to recognise these learned relations as a sub-type of concept: ‘relation’, contrasted with ‘entity’. The sixth tenet that is relevant to language structure is the ‘activation tenet’: (7) The activation tenet: Retrieval is guided by activation spreading from node to node. Indeed, the strongest evidence for the network tenet is the evidence for spreading activation, notably the evidence from priming experiments and from speech errors (Reisberg 2007:254). In any search, the winner is the most active relevant node, and, all being well, this will also turn out to be the target node. A node’s activation level is influenced partly by previous experience – hence the statistical differences between schemas noted earlier – but partly by the immediately current situation to which the person concerned is paying attention. Finally, we note the ‘chunking tenet’: (8) The chunking tenet: We understand experience by recognising ‘chunks’ and storing these in memory as distinct concepts (Reisberg 2007:173). For present purposes, the most important part of this claim is that we create new concepts for new experiences; so given a series of digits to memorize, we create a new concept for each digit-token as well as for each ‘chunk’ of digits. This is evidence for node-creation on a massive scale, though of course most of the nodes created in this way have a very short life. Those that survive in our memories constitute the exemplars of the learning tenet, and once in memory they are used in future experience as a guide to further node-creation. These seven elementary tenets of cognitive science are closely interconnected.
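Since default inheritance does a lot of work in the rest of the argument, a small sketch may help to fix ideas before we look at how the tenets interact. The fragment below is again purely illustrative (the concepts ‘bird’ and ‘penguin’ and their properties are my own example): a node’s own properties override whatever it would otherwise inherit along the ‘isa’ chain, which is how prototypical and exceptional members can coexist in a single category.

```python
# Illustrative sketch of default inheritance over an 'isa' taxonomy:
# a node's own properties override anything inherited from higher up.

class Node:
    def __init__(self, label, isa=None, **properties):
        self.label = label
        self.isa = isa                     # one 'isa' parent, for simplicity
        self.properties = dict(properties)

    def get(self, attribute):
        """Climb the isa chain until a value is found (inheritance by default)."""
        node = self
        while node is not None:
            if attribute in node.properties:
                return node.properties[attribute]
            node = node.isa
        return None

bird = Node("bird", flies=True, legs=2)
penguin = Node("penguin", isa=bird, flies=False)   # an exceptional sub-category
exemplar = Node("penguin-token", isa=penguin)      # a remembered exemplar

print(exemplar.get("legs"))    # 2: inherited from 'bird'
print(exemplar.get("flies"))   # False: the default is overridden lower down
```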
Take, for example, the temporary exemplar nodes that are created under the chunking tenet. What the network tenet predicts is that these temporary nodes must be part of the conceptual network, since that is all there is in conceptual structure; so (by the recycling tenet) the only way in which we can understand an experience is by linking it to pre-existing nodes of the conceptual network. The activation tenet predicts that these nodes are highly active because they are the focus of attention. Moreover, the inheritance tenet predicts an ‘isa’ relation between each temporary node and some pre-existing node in the network from which it can inherit properties; this follows because the aim of classifying exemplar nodes is to inherit unobservable properties, and ‘isa’ is the relation that allows inheritance. As for the exemplar’s observable properties, these must also consist of links to pre-existing concepts; but according to the relation tenet, these links must themselves be classified, so we distinguish the very general ‘isa’ relation from much more specific relational concepts. To make these rather abstract ideas more concrete before we turn to specifically linguistic structures, consider a scene in which you see an object and recognise it as a cat. Although recognition is almost instantaneous, it must consist of a series of interconnected steps, with considerable feedback from logically later steps to earlier ones. In node-creation, you create highly active temporary nodes for all the objects you can see, including not only the cat’s parts but also the ‘chunk’ that will eventually turn out to be a cat. In classification, you link each temporary node to the most active pre-existing node in the network, which classifies it as a paw, a tail, a cat and so on. In enrichment, you enrich your knowledge of the cat by inheriting from ‘cat’ through highly active relation nodes; for example, if you want to know whether to stroke it, this is how you guess an answer. Each of these processes is covered by some of the seven tenets: node-creation by the learning, activation and chunking tenets; classification by the learning, recycling, network, relation and activation tenets; and enrichment by the learning and inheritance tenets. The rest of the paper will explore the consequences of these rather elementary ideas about general cognition for the theory of language structure, following much the same logical route as Langacker’s (Langacker 1987) but starting from general cognitive psychology rather than Gestalt psychology and ending up with more specific claims about language structure. Whether or not similar bridges can be built from psychology to other theories of language structure I leave to the experts in those theories.

3. Morphology and syntax

One of the issues that divides grammatical theories concerns the fundamental question of how the patterning of language can be divided into ‘levels’ or ‘strata’, such as phonology, grammar and semantics. The basic idea of recognising distinct levels is uncontroversial: to take an extreme case, everyone accepts that a phonological analysis is different from a semantic analysis. Each level is autonomous in the sense that it has its own units – consonants, vowels, syllables and so on for phonology, and people, events and so on for semantics – and its own organisation; but of course they are also closely related so that variation on one level can be related in detail to variation on the other by ‘correspondence’ or ‘realisation’ rules.
A much more controversial question concerns the number of levels that should be distinguished between phonology and semantics. Two answers dominate cognitive linguistics: none (Cognitive Grammar), and one (construction grammar). Cognitive Grammar recognises only phonology, semantics and the ‘symbolic’ links between them (Langacker 2007:427), while construction grammar recognises only ‘constructions’, which constitute a single continuum which includes all the units of syntax, the lexicon and morphology (Evans and Green 2006:753, Croft and Cruse 2004:255). In contrast, many theoretical linguists distinguish two intervening levels: morphology and syntax (Aronoff 1994, Sadock 1991, Stump 2001); and even within cognitive linguistics this view is represented by both Neurocognitive Linguistics (Lamb 1998) and Word Grammar. I shall now argue that the Cognitive Assumption in (1) supports the distinction between morphology and syntax. Consider first the chunking tenet (8). This effectively rules out reductionist analyses which reduce the units of analysis too far; so a reductionist analysis of the sequence ‘19452012’ recognises nothing but the individual digits, ignoring the significant dates embedded in it (1945, 2012). Psychological memory experiments have shown that human subjects look for significant ‘higher’ units, and it is these units, rather than the objective string, that they subsequently recognise. The higher units are the ‘chunks’ which we actively seek, and which create a ‘higher’ level of representation by ‘representational redescription’ (Karmiloff-Smith 1992, Karmiloff-Smith 1994). A chunk based on a collection of ‘lower’ units is not evidence against the cognitive reality of these units, but coexists with them in a hierarchy of increasingly abstract levels of representation; so the individual digits of ‘1945’ are still part of the analysis, but the sequence is not ‘merely’ a sequence of digits: it is more than the sum of its parts. Chunking is really just one consequence of the recycling tenet (4), which says that a concept’s properties are represented mentally by links between that concept and other existing concepts. Rather obviously, this is only possible if the concepts concerned are represented by single mental nodes (even if these nodes are themselves represented at a more concrete level by large neural networks). To pursue the previous example, I must have a single mental node for ‘1945’ because otherwise I couldn’t link it to the concept ‘date’ or to events such as the end of WWII. But once again it is important to bear in mind that the existence of the concept ‘1945’ does not undermine the mental reality of the concepts ‘1’, ‘9’ and so on, which are included among its properties. Bearing chunking in mind, consider now Langacker’s claim that language consists of nothing but phonology, semantics and the links between them: ..., every linguistic unit has both a semantic and phonological pole... Semantic units are those that only have a semantic pole ... phonological units ... have only a phonological pole. A symbolic unit has both a semantic and a phonological pole, consisting in the symbolic linkage between the two. These three types of unit are the minimum needed for language to fulfill its symbolic function. A central claim ... is that only these are necessary. Cognitive Grammar maintains that a language is fully describable in terms of semantic structures, phonological structures, and symbolic links between them.
(Langacker 2007:427) This claim ignores the effects of chunking. Take a simple example such as cat, pronounced /kat/. True, this is phonology, but it is not mere phonology. The ‘symbolic unit’ needs a single phonological pole, and in this simple case we might assume this is the syllable /kat/ – a single phonological unit – but meaning-bearing units need not correspond to single phonological units. Apart from the obvious challenge of polysyllabic words, the literature of linguistics is full of examples where phonological boundaries are at odds with word boundaries, so that (unlike the case of cat) words cannot be identified as a single phonological unit. An extreme case of this mismatch arises in cliticization; for example, the sentence You’re late clearly contains a verb, written as ’re, and has the same meaning and syntax as You are late. And yet, in a non-rhotic pronunciation there is no single phonological unit that could be identified as the ‘phonological pole’ of the symbolic unit corresponding to are. The word you’re sounds different from you, but, given the absence of /r/, the difference lies entirely in the quality of the vowel. No doubt the phonological analysis could be manipulated to provide a single unit, such as an abstract feature, but this would be to miss the point, which is that phonological analysis pays attention to completely different qualities from grammatical analysis. In short, chunking creates mental nodes which are only indirectly related to phonological structure; so we must recognise units corresponding not only to syllables such as /kat/, but also to sequences of syllables such as /katǝgri/ (category) or parts of syllables such as the vowel quality of /jɔ:/ (you’re). These more abstract units, like the dates discussed earlier, are mapped onto more concrete units by properties; so just as ‘1945’ is mapped onto ‘1’ and so on, symbolic units can be mapped onto phonological structures such as /kat/, /katǝgri/ or /jɔ:/. But this is very different from saying that the chunk is a phonological structure. In fact, this view is just the same as the traditional view that words mediate between phonology and meaning. The chunk is the word, and it exists precisely because it serves this mediating function: it brings together a recognisable stretch of phonology and a recognisable bit of meaning which systematically cooccur. And it brings sound and meaning together by having both of them as its properties; so in traditional terms, the word cat is pronounced /kat/ and means ‘cat’. But of course once we have recognised the category of ‘words’, and allowed them to have two properties – a pronunciation and a meaning – there is nothing to stop us from treating them just like any other type of concept, with multiple properties. So just as linguists have been saying for centuries, words can be categorized (as nouns, verbs and so on), words have a language (such as English or French), words have a spelling, they can be classified as naughty or highfalutin, and they even have a history and an etymology. This view of words as collecting points for multiple properties is an automatic consequence of chunking combined with the network tenet. None of this is possible if a symbolic unit is simply a link between a semantic pole and a phonological pole.
A link is a property, but it cannot itself have properties; so even if there is a link between /kat/ and ‘cat’, it cannot be extended to include, say, a spelling link; the only way to accommodate spelling in this model would be to add another symbolic link between ‘cat’ and the spelling <cat>, as though /kat/ and <cat> were mere synonyms. In contrast, recognising words allows us to treat spellings and pronunciations, correctly, as alternative realisations of the same unit. Moreover, if meanings and pronunciations belong to a much larger set of properties, we can expect other properties to be more important in defining some words or word classes. Rather surprisingly, perhaps, Langacker himself accepts that some grammatical classes may have no semantic pole: At the other end of the scale [from basic grammatical classes] are idiosyncratic classes reflecting a single language-specific phenomenon (e.g. the class of verbs instantiating a particular minor pattern of past-tense formation). Semantically the members of such a class may be totally arbitrary. (Langacker 2007:439) Presumably such a class is a symbolic unit (albeit a schematic one), but how can a symbolic unit have no semantic pole, given that a unit without a semantic pole is by definition a phonological unit (ibid: 427)? My conclusion, therefore, is that Langacker’s ‘symbolic units’ are much more than a mere link between meaning and sound: they are units with properties – concepts, defined like all other concepts by their links to other concepts. Following the network tenet that structures can be represented as networks of atomic nodes, this conclusion can be presented as a rejection of the first diagram in Figure 1 in favour of the second.

Figure 1: Symbolic units as concepts rather than links

Suppose, then, that symbolic units are like the unit labelled ‘CAT’. If so, they are indistinguishable from ‘constructions’ as defined by construction grammar. This brings us to the second issue of this section: how well does the notion of ‘construction’ accommodate the traditional distinction between morphology and syntax? As mentioned earlier, one of the claims of construction grammar is that constructions comprise a single continuum which includes all the units of syntax, the lexicon and morphology. Croft describes this as ‘one of the fundamental hypotheses of construction grammar: there is a uniform representation of all grammatical knowledge in the speaker’s mind in the form of generalized constructions’ (Croft 2007:471). According to Croft, the ‘syntax-lexicon continuum’ includes syntax, idioms, morphology, syntactic categories and words. This generalisation applies to Cognitive Grammar as well as construction grammar: ‘the only difference between morphology and syntax resides in whether the composite phonological structure ... is smaller or larger than a word.’ (Langacker 1990:293). Like other cognitive linguists I fully accept the idea of a continuum of generality from very general ‘grammar’ to a very specific ‘lexicon’; but the claim that syntax and morphology are part of the same continuum is a different matter. To make it clear what the issue is, here is a diagram showing an analysis of the sentence Cats miaow in the spirit of construction grammar. The diagram uses the standard ‘box’ notation of construction grammar, but departs from it trivially in showing meaning above form, rather than the other way round.
Figure 2: Cats miaow: a construction analysis in box notation

We must be careful not to give too much importance to matters of mere notation. According to the network tenet (3), all conceptual knowledge can be represented as a network consisting of nothing but links and atomic nodes; this rules out networks whose nodes are boxes with internal structure, such as the one in Figure 2. However, a box is just a notation for part-whole relations, so it will be helpful to remind ourselves of this by translating this diagram into pure-network notation, with straight arrows pointing from wholes to their parts. Bearing this convention in mind, the next figure presents exactly the same analysis as Figure 2:

Figure 3: Cats miaow: a construction analysis in network notation

As before, the vertical lines show the ‘symbolic’ link between each form and its meaning, and the diagram illustrates Croft’s generalisation about constructions providing a single homogeneous analysis for the whole of grammar, including the morphology inside cats as well as the syntax that links cats to miaow. His claim fits well with the ‘plain vanilla’ American Structuralist tradition in which morphology is simply syntax within the word – a tradition represented not only by pre-Chomskyans (Harris 1951, Hockett 1958) but also by Chomsky himself in his famous analysis of English auxiliaries as two morphemes (e.g. be+ing), and by Distributed Morphology (Halle and Marantz 1993). But how well does it mesh with the Cognitive Assumption (and indeed, with the linguistic facts)? I shall now present an objection to it based on the prevalence of homonymy. The argument from homonymy is based on the recycling tenet (4) and goes like this. When we first meet a homonym of a word that we already know, we don’t treat it as a completely unfamiliar word because we do know its form, even though we don’t know its meaning. For instance, if we already know the adjective ROUND (as in a round table), its form is already stored as a ‘morph’ – a form which is on a higher abstraction level than phonology; so when we hear Go round the corner, we recognise this form, but find that the expected meaning doesn’t fit the context. As a result, when creating a new word-concept for the preposition ROUND we are not starting from scratch. All we have to do is to link the existing form to a new word. But that means that the existing form must be conceptually distinct from the word – in other words, the morph {round} is different from the words that we might write as ROUNDadj and ROUNDprep. Wherever homonymy occurs, the same argument must apply: the normal processes of learning force us to start by recognising a familiar form, which we must then map onto a new word, thereby reinforcing a structural distinction between the two levels of morphology (for forms) and syntax (for words), both of which are different from phonology and semantics. The proposed analysis for the homonymy of round is sketched in Figure 4.

Figure 4: Homonymy and the levels of language

It might be thought that this conclusion could be avoided simply by linking the form {round} directly to its two different meanings.
But this won’t do, for purely linguistic reasons: the different meanings are also associated with very different syntactic properties, so distinct concepts are needed to show these correlations. The problem is especially acute in the case of bilingual individuals, who may link the same meaning to words from different languages where the syntactic and morphological differences are even greater. The fact is that round is not mere phonology, because it is a listed, and recognised, ‘English word’ – i.e. a word-form. But this word-form is actually shared by (at least) two distinct words, each with its different syntactic properties and each distinct from the meaning that it expresses. In short, we need to distinguish the forms of morphology from the words of syntax. Moreover, the relation between a word and its form is different from the part-whole relations that are recognised within both morphology and syntax. It makes no sense to say that {round} is a part of ROUND for the simple reason that, if we can measure ‘size’ at all, they are both the same size, at least in the sense that they both map onto the same number of phonological segments. Rather, the relation between form and word is the traditional relation called ‘realization’, where the form ‘realizes’ the word (by making it more ‘real’, or more concrete). The psychological reality of morphological form is evident in another area of language-learning: ‘folk etymology’, where we try to force a new form into an existing pattern, regardless of evidence to the contrary. For instance, our word bridegroom developed out of Old English bryd-guma when the word guma ‘man’ fell out of use, leaving the guma form isolated. The form groom had the wrong meaning as well as the wrong phonology, but it had the great virtue of familiarity – i.e. it already existed in everybody’s mind as an independent concept – so it was pressed into service (‘recycled’) for lack of a better alternative. The fact that folk etymology has affected so much of our vocabulary, and continues to do so, is clear evidence of our enthusiasm for recycling existing forms so that every word consists of familiar forms, regardless of whether this analysis helps to explain their meaning. This conclusion about morphology and syntax undermines the main distinctive claim of construction grammar. According to the construction-based analysis of Cats miaow in Figure 3, the relation of the morphs cat and s to the word cats is just the same as the latter’s relation to the entire sentence, namely a part-whole relation. But if the argument from homonymy is correct, the morphs {cat} and {s} exist on a different level of analysis from the word cats (which we might write, to avoid ambiguity, as CAT:plural); and the morphs’ relation to the word is different from the word’s relation to the rest of the sentence. In short, grammatical structure within the word is not merely a downwards extension of grammatical structure above the word; as has often been pointed out by morphologists since Robins (Robins 2001), morphology is not just syntax within the word. It could be objected that languages vary much more than we might expect (if morphology and syntax are fundamentally different) in the sheer amount of morphology that they have, with ‘analytical’ languages such as Vietnamese showing virtually none. How can a language have no inflectional morphology at all if morphology is a logical necessity in any language?
This objection misses the point of the argument, where there was no mention of the morphological processes or patterns that we associate with inflectional morphology (or, for that matter, with derivational morphology). The only claim is that if a language has homonyms, then it will also distinguish words from the forms that realize them, however simple or complex those realization relations may be. We can push the argument a step further by questioning the constructional claim that every unit of grammar has a meaning. This claim is explicit in Figure 3, where the form s means ‘plural’. In contrast, the two-level analysis of Figure 4 has no direct link between the morph {round} and either meaning, and a similar analysis of cats would only link {s} to ‘plural’ via the word CAT:plural. This separation of form from meaning follows from the homonymy argument: if meanings are correlated (as they are) with syntactic behaviour, they must be a property of words, not forms; so homonyms must be words that share the same form, not meanings that (directly) share the same form. So far, then, the argument seems to support a rather traditional four-level analysis of language in which semantic structures relate to words, which relate to morphs, which relate to phones (or whatever the units of phonological structure may be). However, one of the weaknesses of the traditional view of language architecture is its restrictiveness; it not only claims that meanings are related to words, but it also claims that they cannot be related to morphs or to phones. This is psychologically very implausible, for the simple reason that we learn by recording correlated patterns, so if a morph correlates with a particular meaning, there is nothing to prevent us from learning the correlation as a link between the two. So if the morph {un} correlates with the meaning ‘not’, it seems likely that we will record this link in addition to any more specific link that we may establish between ‘not’ and words such as UNTIDY. There is no reason to believe that we avoid redundancy – in fact, redundancy is to be expected in any adaptive system - so we can assume considerable direct linkage between morphs and meanings. Moreover, one of the continuing embarrassments for the traditional model has always been the existence of ‘phonesthemes’ and other kinds of sound symbolism, such as the meaning ‘indolence or carelessness’ shared by words such as slack, slattern, sleazy, slob, slut (Reay 2006). Such examples seem to show a direct connection between phonological patterns and meanings – a connection which is even more obvious in the case of intonation, where neither morphs nor words are available to mediate the link to meaning. The conclusion of this argument, then, is that homonymy automatically pushes learners to create separate mental representations for morphs and for words. Typically, it is words that are directly linked to meanings, while morphs only have meanings indirectly by virtue of the words that they help to realize, and phones are even more indirectly related to meanings via morphs and words; but exceptions can be found in which morphs, or even phones, have meanings. This is a very different model from construction grammar, in which a construction is by definition a pairing of a meaning with a form, and words and morphs co-exist on the same level as ‘forms’. Nor, on the other hand, is it quite the same as published versions of Word Grammar (Hudson 2007, Hudson 2010), which all assume that only words can have meanings. 
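The multi-level picture argued for here can be summarised with one more illustrative sketch (the data structures and names below are my own, not a published Word Grammar formalisation). A single morph, as homonymy requires, is shared by two distinct words; word class and meaning attach to the words, while the morph merely realizes them and is itself realized in phonology.

```python
# Illustrative sketch of the word / morph / phonology distinction:
# one morph ({round}) realizes two homonymous words, and meanings
# belong to the words rather than to the shared morph.

from dataclasses import dataclass

@dataclass
class Morph:
    form: str          # e.g. "round"
    phonology: str     # crude stand-in for a phonological structure

@dataclass
class Word:
    label: str         # e.g. "ROUNDadj"
    word_class: str
    meaning: str
    realization: Morph  # the morph that realizes this word

round_morph = Morph(form="round", phonology="/raund/")
round_adj = Word("ROUNDadj", "adjective", "'rotund'", round_morph)
round_prep = Word("ROUNDprep", "preposition", "'around'", round_morph)

# Homonymy: two words, one recycled morph.
assert round_adj.realization is round_prep.realization
for word in (round_adj, round_prep):
    print(f"{word.label}: {word.word_class}, meaning {word.meaning}, "
          f"realized by {{{word.realization.form}}} {word.realization.phonology}")
```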
Encouragingly, the argument from homonymy takes us to exactly the same conclusion that many linguists reached in the 1950s by simply looking at the facts of language. In those days the choice was between the American Structuralist approach called ‘Item and Arrangement’ (with its process-based alternative called ‘Item and Process’) and the European approach called ‘Word and Paradigm’ (Robins 2001). The argument centred round the place of the word in grammatical analysis, with the Americans tending to deny it any special place and the Europeans making it central; for the Americans, the grammar within the word is simply a downwards extension of syntax, whereas for the Europeans it was different. The European argument centred on languages such as Latin in which it is very easy to show that morphs have no direct or simple relation to meanings; for example, in the Latin verb amo, ‘I love’, the suffix {o} expresses person, number and tense all bundled up together. The conclusion is that, unlike words, morphs have no meaning in themselves; instead, they help to realise word-classes (such as ‘first person singular present tense’), and it is these that have meaning. Of course this is no longer a debate between Europe and America, as many leading American morphologists accept the same conclusion (Aronoff 1994, Sadock 1991, Stump 2001); but it is very noticeable that the literature of construction grammar follows the American Structuralist path rather than engaging with the debate. Equally encouragingly, the separation of phonology, morphology and syntax is confirmed by psycholinguistic priming experiments, which offer a tried and tested way to explore mental network structures (Reisberg 2007:257). In a priming experiment, the dependent variable is the time (in milliseconds) taken by a person to recognise a word that appears on a screen in front of them, and to make some decision about it which depends on this recognition; for example, as soon as the word doctor appears, the subject may have to decide whether or not it is an English word and then press one of two buttons. The independent variable is the immediately preceding word, which may or may not be related to the current word; for instance, doctor might follow nurse (related) or lorry (unrelated). What such experiments show is that a related word ‘primes’ the current word by making its retrieval easier, and therefore faster; and crucially, they allow us to distinguish the effects of different kinds of priming: semantic (nurse – doctor; Bolte and Coenen 2002, Hutchison 2003), lexical (doctor – doctor; Marslen-Wilson 2006), morphological (factor – doctor; Frost and others 2000) and phonological (nurse – verse; Frost and others 2003). Once again, therefore, we have evidence for a three-level analysis which breaks down the general notion of ‘form’ into three kinds of structure, each with its own distinctive units: phonology, morphology and syntax (whose units are words). This is the architecture claimed in Word Grammar (and many other theories of grammar), but it conflicts with construction grammar and even more so with Cognitive Grammar. One possible objection to this line of argument is that it could be the thin end of a very large wedge which would ultimately reduce the argument to absurdity. Why stop at just the three formal levels of phonology, morphology and syntax? Why not recognise four or five levels, or, for that matter, forty or fifty?
This question requires a clear understanding of how levels are differentiated: in terms of abstractness, and not in terms of either size or specificity. For instance, the only difference between the word DOG, the morph {dog} and the syllable /dɒg/ is their abstractness, because they all have the same size and the same degree of specificity. DOG has properties such as its meaning and its word-class which are more abstract than those of {dog}, which in turn are more abstract than those of /dɒg/. The question, therefore, is whether three levels of abstractness is in some way natural or normal. In relation to English, might we argue for further levels? Are there other languages where the evidence points to many more levels? Maybe, but it seems unlikely given that we already have evidence from massively complicated morphological systems that a single level of morphology is enough even for them (Sadock 1991). 4. Phrases or dependencies Another structural issue which has received very little attention in the cognitive linguistics literature is the choice between two different views of syntax. Once again, cognitive linguistics is firmly located in the tradition of American Structuralism rather than in its European rival. The American tradition dates from the introduction of ‘immediate constituent analysis’, which inspired modern Phrase Structure Grammar (PSG; Bloomfield 1933, Percival 1976). In contrast, the European tradition of Dependency Grammar (DG) dates back a great deal further, and certainly at least to the Arabic grammarians of the 8th century (Percival 1990, Owens 1988), and provided the basis for a great deal of school grammar on both sides of the Atlantic. Among the theories of grammar that are aligned with cognitive linguistics, Word Grammar is the only one that even considers Dependency Grammar as a possible basis for syntax. The essential difference between the two approaches lies in their assumptions about what relations can be recognised in syntax. For PSG, the only possibility is the very elementary relation of meronymy: the relation between a whole and its parts. This restriction follows automatically from the definition of a phrase-structure tree as equivalent to a bracketed string. Brackets are very elementary devices whose purpose is simply to distinguish parts from non-parts; for instance, the brackets in ‘a [b c]’ show that b and c are both parts of the larger unit ‘b c’, but a is not. A bracketed string is inherently incapable of giving information about the relations between parts and non-parts (e.g. between a and b). In contrast, DG focuses on the relations between individual words, recognising traditional relations such as ‘subject’, ‘complement’ and ‘modifier’. To take a very simple example, consider Small birds sing. PSG recognises the phrase small birds as well as the whole sentence, but it recognises no direct relation at all between small and birds, or between birds and sing. In contrast, the individual words and their direct relations are all that a DG recognises, although the phrase small birds is implicit in the link between small and birds. Of course, there are versions of PSG which include the traditional relations as an enrichment of the basic part-whole relation, so that small birds is recognised explicitly as the subject of the sentence. This is true in Lexical Functional Grammar and Head-Driven Phrase Structure Grammar, as well as in other ‘functional’ theories such as Relational Grammar and Systemic Functional Grammar. 
More relevantly here, it is also a characteristic of construction grammar (except Radical Construction Grammar). But this compromise should not obscure the fact that the relations concerned are still basically part-whole relations. All these versions of PSG, including those found in CL, still exclude direct relations between words, such as those between small and birds and between birds and sing. To clarify the issues it will again help to consider concrete diagrammatic structures, so here are three diagrams for the syntactic structure of Small birds sing. (A) is pure PSG, without function labels; (B) is a compromise analysis which is at least within the spirit of construction grammar in terms of the information conveyed, even if tree notation is not popular in the cognitive linguistics literature; and (C) is an example of DG enriched with function labels. The ‘stemma’ notation in (C) was introduced by the first DG theorist (Tesnière 1959) and is widely used in DG circles. It has the advantage of comparability with the tree notation of PSG, but I shall suggest a better notation for dependencies below. All three diagrams show syntax without semantics, but this is simply because the present topic is syntactic relations. This is also why the nodes are unclassified.

Figure 5: Phrase structure or Dependency structure

The issue can be put in concrete terms as two related questions about the example Small birds fly: What is the subject, and what is it the subject of? In the European dependency tradition, the subject is birds, and it is the subject of fly. In contrast, the American PSG tradition takes small birds as the subject of the sentence Small birds fly. The PSG tradition has had such an impact on grammatical theory that most of its followers take it as obviously true, but it has serious weaknesses, especially if we start from the Cognitive Assumption, so it is particularly problematic for cognitive linguistics. This is already recognised in Cognitive Grammar: Symbolic assemblies exhibit constituency when a composite structure ... also functions as component structure at another level of organization. In Cognitive Grammar, however, grammatical constituency is seen as being variable, nonessential and nonfundamental. (Langacker 2007:442) I shall start with the specifically cognitive weaknesses, before turning to more familiar ‘purely linguistic’ weaknesses. From a cognitive point of view, PSG has two weaknesses, to do with relations and with tokens. The first weakness is the extreme poverty of its assumptions about possible relations, in contrast with the assumption accepted by almost every psychologist that cognitive structure is an associative network (Ferreira 2005). By excluding all relations except that between a whole and its parts, it excludes a great deal of normal cognitive structure – and not least, the whole of social structure, based as it is entirely on relations such as ‘mother’ and ‘friend’. The relation tenet (6) asserts that relations of many different types can be learned and distinguished from one another, so there is no reason to prioritise the part-whole relation. And if other relations are possible, then we can freely relate any object to any other object. So just as we can relate one human to another in our social world, in the realm of syntax we can relate one word directly to another.
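To put the contrast in concrete computational terms, here is a final illustrative sketch (my own, with invented class and relation names). A bracketed phrase structure records nothing but part-whole grouping, whereas a dependency representation of Small birds sing links the three word tokens directly by labelled relations, with no phrasal nodes at all.

```python
from dataclasses import dataclass
from typing import Optional

# Phrase structure: only part-whole grouping, with no relations between sisters.
phrase_structure = ("S", [("NP", ["Small", "birds"]), ("VP", ["sing"])])

# Dependency structure: labelled relations between individual word tokens.
@dataclass
class WordToken:
    form: str
    head: Optional["WordToken"] = None   # the word this token depends on
    relation: Optional[str] = None       # e.g. 'subject', 'modifier'

small, birds, sing = WordToken("Small"), WordToken("birds"), WordToken("sing")
small.head, small.relation = birds, "modifier"   # small modifies birds
birds.head, birds.relation = sing, "subject"     # birds is the subject of sing
# sing has no head: it is the root, and carries the meaning of the whole.

for token in (small, birds, sing):
    if token.head is None:
        print(f"{token.form} (root)")
    else:
        print(f"{token.form} --{token.relation}--> {token.head.form}")
```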
Although Bloomfieldian Immediate Constituent Analysis had roots in German psychology, Chomsky’s formalisation in terms of bracketed strings and rewrite rules removed any semblance of psychological plausibility. The second cognitive weakness of PSG is its assumption about tokens, i.e. about the sentences that a grammar generates. The symbols in a sentence structure are seen as mere copies of those used in the grammar; so if the sentence structure contains, say, the symbol N (for ‘noun’), this is the same symbol as the N in the grammar; and the symbol birds in the sentence is the same as the one in the grammar. Once again we find a very impoverished view of cognitive capacity, in which the only operation that we can perform is to copy symbols and structures from the grammar onto the sentence structure. In contrast, the chunking tenet (8) says that when we encounter a new experience, we build a new node for it, and use relevant parts of our stored knowledge to enrich this node. In this view, the nodes for tokens are much more than mere copies of those in the grammar for the relevant types; although the tokens inherit most of the properties of the types, they also have a great many other properties, reflecting the particularities of their context. If the sentence contains the word birds, this is not just the stored word birds, less still the lexeme BIRD or the category ‘plural’; instead, it is a distinct concept from all these, with properties that include the other words in the sentence. Suppose, then, that we take the relation and chunking tenets seriously. What mental structure would we expect for the sentence Small birds sing? First, small and birds are clearly related, so we expect a direct relation between these words; equally clearly, the relation between small and birds is asymmetrical because small modifies the meaning of birds, rather than the other way round. In traditional terminology, small is the ‘modifier’ and depends on birds, which is the implied phrase’s head. But if the meaning of birds is modified by small, it follows that birds – i.e. this particular token of birds – does not mean ‘birds’, but means ‘small birds’; so there is no need to postulate a larger unit small birds to carry this meaning. In other words, the meaning of a phrase is carried by its head word, so the phrase itself is redundant. Furthermore, the link between small and birds explains why they have to be positioned next to each other and in that order. We return to questions of word order in section 7, but we can already see that direct word-word links will play an important part in explaining word order. Similar arguments apply to the subject link. Once again, this is a direct asymmetrical link between two single word-tokens: birds (not: small birds) and sing, bearing in mind that birds actually means ‘small birds’. As in the first example, the dependent modifies the head’s meaning, though in this case the kind of modification is controlled by the specifics of the dependency, traditionally called ‘subject’. Given, then, that birds is the subject of sing, and that birds means ‘small birds’, it follows that the word-token sing doesn’t just mean ‘sing’, but means ‘small birds sing’. So once again, the head word (sing) carries the meaning of the whole phrase, and no separate phrasal node is needed. In this case, the ‘phrase’ is the entire sentence, so we conclude that the sentence, as such, is not needed. The proposed analysis is presented in Figure 6. 
Figure 6: Small birds sing: syntax and semantics

This diagram includes the earlier ‘stemma’ notation for syntactic relations alongside a more obvious notation for the ‘meaning’ relation. This is inconsistent, as the ‘meaning’ relation and the syntactic dependency relations look more different than they should. After all, part of the argument above in favour of dependency analysis is that our conceptual apparatus already contains plenty of relations of various kinds in domains outside language, so the same apparatus can be applied to syntax. The trouble with ‘stemma’ notation is that it is specially designed for syntax, so it obscures similarities to other domains. In contrast, the notation for the ‘meaning’ relation is simply a general-purpose notation for relations as defined by the relation tenet of (6). This general-purpose notation works fine for syntax, and indeed avoids problems raised by the rather rigid stemma notation, so we now move to standard Word Grammar syntactic notation, as in Figure 7, with arrows pointing from a word to its dependents.

Figure 7: Small birds sing: Word Grammar notation

This notation should appeal to cognitive linguists looking for an analysis which treats syntax as an example of ordinary conceptual structures. After all, it makes syntax look like a network, which (according to the network tenet of (3)) is what ordinary conceptual structure is like; in contrast, the notation used for syntax in the cognitive linguistics literature is hard to relate to general conceptual structure. Admittedly, the structure in Figure 7 may not look obviously like a network, but more complicated examples certainly do. The sentence in Figure 8 makes the point, since every arrow in this diagram is needed in order to show well-known syntactic patterns such as complementation, raising or extraction. This is not the place to justify the analysis (Hudson 2007, Hudson 2010), but it is worth pointing to one feature: the mutual dependency between which and do. If this is indeed the correct analysis, then the case for dependency structure is proven, because mutual dependency is impossible in PSG. (And even assuming dependency structure, it is impossible in stemma notation.) In the diagram, ‘s’ and ‘c’ stand for ‘subject’ and ‘complement’. The notational convention of drawing some arrows above the words and others below will be explained in the discussion of word order in section 7.

Figure 8: A complex syntactic network (Which birds do you think sing best?)

Another way to bring out the network-like nature of syntactic structure is to go beyond the structure of the current tokens in order to show how this structure is derived from the underlying grammar. Returning to the simpler example, Small birds sing, we know that birds can be the subject of sing because the latter requires a subject, a fact which is inherited from the ‘verb’ node in the grammar; and similarly, small can be the dependent of birds because it is an adjective, and adjectives are allowed to modify nouns. The word tokens inherit these properties precisely because they are part of the grammar, thanks to ‘isa’ links to selected word types.
This is all as predicted by the inheritance tenet (5), and can be visualised in the diagram below, where the small triangle is the Word Grammar notation for the ‘isa’ relation. In words, because the token small isa the lexeme SMALL which isa adjective, and because an adjective is typically the modifier of a noun (represented by the left-hand dot, which isa noun), it can be predicted (by inheritance) that small is also the modifier of a noun, which (after some processing) must be birds. A similar logic explains the subject link from sing to birds. The main point is that the syntactic structure for the sentence is a small part of a much larger network, so it must itself be a network.

Figure 9: Small birds sing and its underlying grammar

The main conclusion of this section is that syntax looks much more like general conceptual structure if we allow words to relate directly to one another, as in dependency analysis, than if we apply a rigid PSG analysis. Dependency structure is very similar to many other areas of conceptual structure, such as social structure, whereas it is hard to think of any other area of cognition which allows nothing but part-whole relations. To make a rather obvious point, it is arguably PSG that has encouraged so many of our colleagues to accept the idea that language is unique; but PSG is simply a method of analysis, not a demonstrated property of language. The only way to prove that PSG is in fact correct is to compare it with the alternatives, especially dependency analysis; but this has hardly happened in the general literature, let alone in the cognitive linguistics literature. Quite apart from the cognitive arguments for direct dependencies between words, it is easy to find ‘purely linguistic’ evidence; for example, most lexical selection relations involve individual words rather than phrases. Thus the verb DEPEND selects the preposition ON, so there must be a direct relation between these two words which is impossible to show, as a direct relation, in PSG. In the next section I shall show how this advantage of dependency structure applies in handling constructions. However, it is important once again not to lurch from one extreme position to its opposite. I have objected to PSG on the grounds that it rules out direct links between words, a limitation which is arbitrary from a cognitive point of view; but it would be equally arbitrary to rule out ‘chunking’ in syntax. In our simple example, maybe we recognise small birds as a chunk as well as recognising the relations between its parts. Given that we clearly do recognise part-whole relations in other areas of cognition, it would be strange if we were incapable of recognising them in syntax. Even in the area of social relations, we can recognise collectivities such as families or departments as well as relations among individuals. And in syntax there do seem to be some phenomena which, at first sight at least, are sensitive to phrase boundaries. One such is mutation in Welsh (Tallerman 2009), which seems to be triggered by the presence of a phrase boundary and cannot be explained satisfactorily in terms of dependencies. As we have already seen, there is no reason to think that cognition avoids redundancies; indeed, it is possible that gross redundancy is exactly what we need for fast and efficient thinking.
This being so, we cannot rule out the possibility that, even if syntactic structure includes direct word-word dependencies, it also includes extra nodes for the phrases that these dependencies imply (Rosta 1997, Rosta 2005).

5. Idioms and constructions

I have argued for a rather traditional view of language structure in which words are central both by virtue of being the units that define a level of language which is distinct from phonology and morphology, and also as the main (and perhaps only) units on that level. My main argument was based on the Cognitive Assumption and its tenets, but I also showed that the conclusion is required by more traditional types of evidence. We now consider how this model of structure accommodates the grammatical patterns that are so familiar from the constructional literature: idioms (9), (10), clichés or formulaic language (11), meaning-bearing constructions (12), and non-canonical constructions (13).

(9) He kicked the bucket.
(10) He pulled strings.
(11) It all depends what you mean.
(12) Frank sneezed the tissue off the table.
(13) How about a cup of tea?

All such examples show some kind of irregularity or detail that supports the idea of usage-based learning (rather than mere ‘triggering’ of an inbuilt propensity), which in turn is predicted by the learning tenet (2) and the inheritance tenet (5). But if the proposed view of language structure makes these patterns relatively easy to accommodate, this will also count as further ‘purely linguistic’ evidence for this view. The following comments build on a number of earlier discussions of how Word Grammar might handle constructions (Gisborne 2009, Gisborne 2011, Holmes and Hudson 2005, Hudson 2008, Sugayama 2002). One unifying feature of all the patterns illustrated in these examples is that they involve individual words rather than phrases. This is obvious in most of the examples; for instance, the meaning ‘die’ is only available if the verb is KICK combined with BUCKET. An apparent counterexample is (12), from Goldberg 1995:152, where it is given as an example of the ‘caused-motion construction’. According to Goldberg, ‘the semantic interpretation cannot be plausibly attributed to the main verb’, but this is not at all obvious. After all, it is precisely because the verb sneezed describes an action that it can be turned into a causative, and it is because it needs no object in its own right that an extra one can be added. I shall argue below that this is a straightforward example of word-formation, in which the explanation for the syntactic pattern and its meaning revolves around one word, the verb. If this is so, then individual words play a crucial role in every pattern that has been claimed to require ‘constructions’; and crucially, phrase structure, as such, plays no such role. We shall now work through the various types of ‘construction’ listed above, starting with idioms. The main challenge of idioms is that the idiomatic meaning overrides the expected literal meaning, so we need an analysis that includes the literal meaning as well as the idiomatic one, while also showing that the literal meaning is merely potential. A system that generated only one analysis would miss the point as much as an analysis of a pun which showed only one of the two interpretations. This view is supported by the psycholinguistic evidence that literal word meanings become active during idiom production (Sprenger and others 2006), and that the syntactic analysis of an idiom proceeds as normal (Peterson and others 2001).
For example, in processing He kicked the bucket, we activate all the normal syntactic and semantic structures for KICK and BUCKET as well as the meaning ‘die’. A cognitive network is exactly what is needed to explain a multiple analysis like this because it accommodates the effects of spreading activation: the processes by which activation from the word token kicked spreads to the lexeme KICK and then, as activation converges from the and bucket, focuses on the sub-lexeme KICKdie, as found in kick the bucket. The diagram in Figure 10 shows the end-state of this processing, but in principle a series of diagrams complete with activation levels could have shown the stages through which a speaker or hearer passes in reaching this end-state. In words, the word token kicked isa KICKdie, which in turn isa KICK. According to the inheritance tenet (5), kicked inherits the properties of both these nodes, so it inherits from KICKdie the property of requiring the bucket as its object; and thanks to the workings of default inheritance, the lower node’s meaning overrides the default.

Figure 10: An idiom and its literal counterpart

The key notion in this analysis of idioms is ‘sub-lexeme’, one of the distinctive ideas of Word Grammar. Sub-lexemes are common throughout the lexicon, and not just in handling idioms; for example, the lexeme GROW carries important shared properties such as irregular morphology, but its transitive and intransitive uses combine different syntactic patterns with different meanings, so each requires a different sub-lexeme of GROW. This analysis reveals the similarities of morphology as well as the differences of syntax and meaning. The same apparatus seems well suited to the analysis of idioms, which combine special syntax and meaning with a close link to the literal pattern. Unfortunately, Jackendoff thinks otherwise:

Another solution would be to say that kick has a second meaning, ‘die’, that only can be used in the context of the bucket. Coincidentally, in just this context, the and bucket must also have second meanings that happen to be null. Then the meaning of the idiom can in fact be introduced with a single word. The difficulty with this solution is its arbitrariness. There is no nontheory-internal reason to concentrate the meaning in just one of the morphemes. (Jackendoff 2008)

His objection is strange, given the rather obvious fact that the idiom’s meaning ‘die’ is the literal meaning of a verb, so the verb is the obvious word to receive the idiom’s meaning. Moreover, the analysis that he himself offered ten years earlier did ‘concentrate the meaning in just one of the morphemes’ by linking the meaning ‘die’ directly to the verb kick (Jackendoff 1997:169). The apparatus of sub-lexemes and default inheritance allows us to model different degrees of irregularity in idioms. The classic discussion of idioms (Nunberg and others 1994) distinguished just two types: ‘idiomatic phrases’ such as kick the bucket and ‘idiomatically combining expressions’ such as pull strings, which allow a great deal more syntactic flexibility (e.g. Strings have been pulled; Strings are easy to pull; He pulled a lot of strings). However, the historical development of idioms argues against such a clear division. Today’s metaphors tend to turn into tomorrow’s idioms, which become increasingly opaque as the original metaphor vanishes from sight.
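Before turning to the continuum of idiom types, the selection of the idiomatic sub-lexeme can be made concrete with another toy sketch. The stored collocate sets and the crude counting score below are invented for exposition; real spreading activation is graded and continuous, so this is only a caricature of the mechanism described above.

```python
# Toy sketch of how converging activation might favour the idiomatic
# sub-lexeme KICKdie over plain KICK. The collocate sets and the counting
# score are invented; real activation levels would be graded.

candidates = {
    "KICK":    {"meaning": "kick", "collocates": set()},
    "KICKdie": {"meaning": "die",  "collocates": {"THE", "BUCKET"}},
}

def choose(context_lexemes):
    """Pick the candidate whose stored collocates best match the context."""
    def score(name):
        return len(candidates[name]["collocates"] & context_lexemes)
    best = max(candidates, key=score)
    return best, candidates[best]["meaning"]

print(choose({"THE", "BUCKET"}))   # ('KICKdie', 'die')
print(choose({"A", "BALL"}))       # ('KICK', 'kick') - the literal reading survives
```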
It seems much more likely that there is a continuum of irregularity, with kick the bucket at one end and pull strings near the other end. Whereas kick the bucket overrides the entire meaning of KICK, pull strings builds on a living metaphor, so the idiomatic meaning isa the literal one: if I pull strings for my nephew, this is presented as a deviant example of literally pulling strings. This explains the syntactic freedom. In between these two examples, we find idioms such as cry wolf, which derives its meaning from a fairy story that some people know, but relates to that story in such a complicated way that it shows virtually no syntactic freedom. Turning next to formulaic language such as It all depends (on) what you mean, usage-based learning guarantees that we store a vast number of specific exemplars, including entire utterances as well as individual word tokens. Every time we hear an example of a stored utterance, its stored form becomes more entrenched and more accessible, which encourages us to use it ourselves, thereby providing a feed-back loop which maintains the overall frequency of the pattern in the general pool of utterances. Word Grammar provides a very simple mechanism by which formulaic expressions are stored: the concepts that represent the word tokens do not fade away as most tokens do, but persist and become permanent (Hudson 2007:54). For instance, imagine someone hearing (or saying) our earlier example: Small birds sing. Immediately after the event, that person’s mind includes a structure like the one shown in Figure 9, but the nodes for the word tokens are destined to degenerate and disappear within seconds. That is the normal fate of word tokens, but sometimes the tokens are memorable and are remembered – which means that they are available if the same tokens occur again. This process is logically required by any theory which accounts for the effects of frequency:

While the effects of frequency are often not noted until some degree of frequency has accumulated, there is no way for frequency to matter unless even the first occurrence of an item is noted in memory. Otherwise, how would frequency accumulate? (Bybee 2010:18)

In short, formulaic language is exactly what we expect to find, in large quantities, in the language network; and it is represented in exactly the same way as the utterances from which it is derived. Meaning-bearing constructions such as the ‘caused-motion construction’ are more challenging precisely because they go beyond normal usage. If anyone actually said Frank sneezed the tissue off the table, they would certainly be aware of breaking new linguistic ground, which is the exact opposite of the situation with idioms and formulaic language. The standard constructional analysis of such cases was presented by Goldberg (Goldberg 1995:152-79), so we can take this analysis as our starting point. Figure 11 shows Goldberg’s diagram (from page 54) for the transitive use of SNEEZE. This diagram is the result of unifying two others: the one in the middle line for ordinary intransitive SNEEZE, and the one for the caused-motion construction (which accounts for the top and bottom lines). The letter ‘R’ stands for the relation between these two patterns, which at the start of the middle line is explained as ‘means’, expressing the idea that sneezing is the ‘means’ of the motion (rather than, say, its cause or manner).
Figure 11: Caused motion in constructional notation

The constructional notation is unhelpful in a number of respects, but the most important is its semantic rigidity. The trouble is that it requires a one-to-one match between words and semantic units, so one verb can only express one semantic unit, whose arguments are expressed by the verb’s various dependents. This is a problem because the sentence Frank sneezed the tissue off the table actually describes two separate events: Frank sneezing, and the tissue moving off the table. Frank is the ‘sneezer’, but only in relation to the sneezing; and the tissue is the theme of the moving, but not of the sneezing. Similarly, if Frank (rather than the sneeze) is a cause, it is in relation to the moving, and not the sneezing; and off the table describes the direction of the moving, and not of the sneezing. Collapsing these two events into a single semantic structure is at best confusing, and arguably simply wrong. In contrast, the network notation of Word Grammar in Figure 12 provides whatever flexibility is needed. The analysis keeps as close as possible to Goldberg’s, and the example is simplified, in order to focus on the benefits of flexibility in the semantic structure. The dotted lines link the words to their meanings, and as usual the little triangles show ‘isa’ relations. The main benefit of this network notation is the possibility of separating the node labelled ‘Frank sneeze it off’ from the one labelled ‘it off’. The former is the meaning of the verb token sneezed, which, as usual, shows the effects of all the dependents that modify the verb. The latter is a single semantic entity which is contributed jointly by the object and directional adjunct (Goldberg’s ‘oblique’). In itself it isa motion, with a theme and a direction, but in relation to the sneezing, it is the ‘result’ of the verb’s action. Notice how the two-event analysis avoids all the problems noted above; so Frank is the sneezer, but plays no role at all in the movement, and contrariwise ‘it’ and ‘off’ define the movement but have nothing directly to do with the sneezing.

Figure 12: Caused motion in Word Grammar notation

The last kind of ‘construction’ is represented here by How about a cup of tea? Such examples have always been central to Word Grammar (Hudson 1990:5-6), where I call them ‘non-canonical’, but they also play an important part in the literature of constructions, where the classic discussion calls What’s X doing Y a ‘non-core’ construction because the normal ‘core’ rules are suspended (Kay and Fillmore 1999). To change examples, the ‘core’ rules require a sentence to have a finite verb; but there are exceptions without finite verbs such as the how about pattern in How about a cup of tea?. Once again, Word Grammar provides the necessary amount of flexibility, thanks to the focus on word-word dependencies and the possibility of ‘sub-lexemes’ mentioned above. Indeed, it seems that these ‘constructions’ all consist of a continuous chain of dependencies, a claim which cannot even be expressed in terms of phrase structure, so dependency structure is more appropriate than phrase structure (Holmes and Hudson 2005).
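As a small illustration of what ‘a continuous chain of dependencies’ means here, the how about x? pattern can be stored as a list of word-word dependency records and checked by simply following head links. The record format is my own invention for exposition; only the sub-lexeme names come from the analysis discussed below.

```python
# Illustrative sketch: a stored pattern as a chain of word-word dependencies.
# Each record is (dependent, relation, head); 'c' abbreviates 'complement'.
how_about_x = [
    ("ABOUTx", "c", "HOWx"),    # ABOUTx is the complement of HOWx
    ("x",      "c", "ABOUTx"),  # x (the open slot) is the complement of ABOUTx
]

def head_chain(word, dependencies):
    """Follow head links upward from a word to the top of the pattern."""
    head_of = {dep: head for dep, _, head in dependencies}
    chain = [word]
    while chain[-1] in head_of:
        chain.append(head_of[chain[-1]])
    return chain

print(head_chain("x", how_about_x))   # ['x', 'ABOUTx', 'HOWx'] - one unbroken chain
```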
The diagram in Figure 13 shows how the sub-lexemes HOWx and ABOUTx relate to each other and to their super-lexemes, and hints at a semantic analysis paired with the syntactic one. (Note how the notation allows us to ignore the vertical dimension when this is convenient; in this case, the isa triangles face upwards, reversing the normal direction.) This semantic sketch will be developed in the next section as an example of what is possible in a network-based analysis of meaning.

Figure 13: How about x? Dependency syntax

This section has shown how easily Word Grammar accommodates the idiosyncratic patterns that lie at the heart of what I have called ‘constructional’ analyses: idioms such as kick the bucket and pull strings, clichés or formulaic language such as It all depends what you mean, meaning-bearing constructions such as the caused-motion construction, and non-canonical constructions such as how about X?. The key pieces of theoretical apparatus are sub-lexemes, default inheritance, token-based learning, flexible networks with classified relations and, of course, dependency structure in syntax. But where does this argument leave the notion of ‘construction’? It all depends what you mean by ‘construction’ (and ‘construction grammar’), and as we have already seen, this varies a great deal from theory to theory, and from person to person. If ‘construction grammar’ is simply the same as ‘cognitive linguistics’, then nothing changes. Since Word Grammar is definitely part of cognitive linguistics, it must also be an example of ‘construction grammar’ (though it is hard to see what is gained by this double naming). Similar conclusions follow if ‘construction grammar’ is a grammatical theory that rejects the distinction between ‘grammar’ and ‘lexicon’, and recognises very specific grammatical patterns alongside the very general ones. Here too Word Grammar is an ordinary example of construction grammar (Gisborne 2008). However, the debate becomes more interesting if we give ‘construction’ a more precise meaning, and define ‘construction grammar’ as a grammar that recognises nothing but constructions. For many authors, a construction is by definition a pairing of a ‘form’ (some kind of formal pattern, whether phonological, morphological or syntactic) with a meaning: ‘The crucial idea behind the construction is that it is a direct form-meaning pairing that has sequential structure and may include positions that are fixed as well as positions that are open.’ (Bybee 2010:9). This definition implies a very strong claim indeed: that every pattern that can be recognised at the levels of syntax or morphology can be paired with a single meaning. We have already seen (in section 3) that this claim is untenable if morphology and syntax are recognised as distinct levels, because (in this view) morphological structures are typically not directly linked to meaning. Since Word Grammar does recognise morphology and syntax, the grammatical patterns that it recognises cannot be described as ‘constructions’. And yet Word Grammar can accommodate all the idiosyncratic patterns that are often quoted as evidence for constructions. In this sense, then, Word Grammar is a radical departure from ‘construction grammar’, as radical as Croft’s Radical Construction Grammar, which departs in exactly the opposite direction. For Word Grammar, the basic units of syntax are words and the dependency relations between them.
In contrast, Croft believes that the basic units are meaning-bearing constructions:

Radical Construction Grammar ... proposes that constructions are the basic or primitive elements of syntactic representation and defines categories in terms of the constructions they occur in. For example, the elements of the Intransitive construction are defined as Intransitive Subject and Intransitive Verb, and the categories are defined as those words or phrases that occur in the relevant role in the Intransitive construction. (Croft and Cruse 2004:284, repeated as Croft 2007:496)

Croft’s example of a non-construction that would not be recognised is ‘verb’. Furthermore, the term ‘Intransitive Subject’ is presumably merely shorthand for something more abstract, such as ‘the noun that expresses the actor’, or something less abstract, such as ‘the noun before the verb’, since ‘there are no syntactic relations in Radical Construction Grammar’: such relations are redundant if morphosyntactic clues are related directly to semantic relations (ibid:497). As I commented earlier, the claims of Radical Construction Grammar are not derived from the Cognitive Assumption; indeed, they conflict directly with some of the tenets, including the principle for learning that Croft himself expressed so well. In discussing the choice between fully general analyses without redundancy and fully redundant listing of specific examples of general patterns, he concludes: ‘grammatical and semantic generality is not a priori evidence for excluding the more specific models’ (Croft 1998). This is generally accepted as one of the characteristics of usage-based learning, so we can assume that any construction is stored not only as a single general pattern, but also as a collection of individual exemplars that illustrate the pattern. Indeed, the learning tenet requires the exemplars to be learned before the general pattern, because these are the material from which the general pattern is learned. But if exemplars can be mentally represented separately from the general construction, and if they are even represented before the construction, how can the construction be more basic? In short, the basic tenets of the Cognitive Assumption support the more traditional approach which Croft criticises as ‘reductionist’, in which more abstract and general patterns are built out of more concrete and specific patterns.

6. Semantic/encyclopedic structures

If the Cognitive Assumption (1) is right, it follows that there can be no boundary between ‘linguistic meaning’ and general conceptual structure, and therefore no boundary between ‘dictionary’ meaning and ‘encyclopedic information’. The typical meaning of a word or a sentence is simply the part of general conceptual structure that is activated in the mind of the speaker and hearer. This view of meaning is one of the tenets of cognitive linguistics (including Word Grammar) in contrast with the more ‘classical’ or ‘objectivist’ approaches to semantics that have dominated linguistic semantics. Cognitive linguistics cannot match the massive apparatus of formal logic that these approaches bring to bear on the analysis of meaning, but once again the Cognitive Assumption may be able to guide us towards somewhat more formal analyses than have been possible so far. The most relevant consequence of the Cognitive Assumption is the recycling tenet (4), the idea that each new concept is defined in terms of existing concepts.
This immediately rules out any boundary between ‘dictionary’ and ‘encyclopedia’, because any dictionary entry is bound to recycle an encyclopedic entry. For example, take a child learning the word CAT and its meaning: the child stores the word and looks for a potential meaning in general memory, where the (encyclopedic) concept ‘cat’ is the obvious candidate. Recycling guarantees that this concept is the one that the child uses as the meaning of CAT. Recycling also rules out a popular approach to lexical semantics in which lexical meanings are defined in terms of a pre-existing metalanguage such as the ‘Natural Semantic Metalanguage’ suggested by Wierzbicka (Wierzbicka 1996). The argument goes like this (Hudson and Holmes 2000): Once a concept has been created, it is available as a property of other concepts, and should be recycled in this way whenever it is relevant. But new concepts cannot be used in this way if the only elements permitted in a definition are drawn from the elementary semantic metalanguage. To take a concrete example, consider Wierzbicka’s definition of a bicycle (Wierzbicka 1985:112), which refers to the pedals in at least three places: as a part of the structure, as the source of power, and as the place for the rider’s feet. The problem is that ‘pedal’ is not part of the metalanguage, so a circumlocution (‘parts for the feet’) has to be used, obscuring the fact that each reference is to the same object. In contrast, the recycling tenet requires us to recognise the concept ‘pedal’ and to name this concept whenever it is relevant; but this of course is totally incompatible with any attempt to define every concept solely in terms of a fixed list of primitives. Another tenet highly relevant to semantic structure is the network tenet (3), which requires every scrap of information to be expressed in terms of network structures. This means that network notation has to be available for every analysis that can be expressed in other notations such as the predicate calculus. Take, for example, the universal and existential quantifiers which distinguish semantically between sentences such as the following:

(14) Everyone left. ∀x, person(x) → left(x) (‘For every x, if x is a person then x left’)
(15) Someone left. ∃x, person(x), left(x) (‘There is an x such that x is a person and x left’)

The sentences undeniably have different meanings, and the linear notation of formal semantics distinguishes them successfully, but the challenge is to translate the linear notation into a network. Thanks to the inheritance tenet (5), the solution is surprisingly easy (Hudson 2007:33-4). Universal quantification is simply inheritance, because any instance of a category automatically inherits all of that category’s properties (unless of course they are overridden). Consequently, we can represent the meaning of Everyone left as shown in the first diagram of Figure 14. According to this diagram, the typical person left, so one of the properties to be inherited by any example of ‘person’ is that they left. In contrast, the diagram for Someone left shows that some particular person (represented by the dot) left, so leaving is not a property of ‘person’ and therefore cannot be inherited by other people.

Figure 14: Everyone left and Someone left
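A toy sketch may help to show how little machinery this analysis needs: the universal reading stores ‘left’ on the category ‘person’ itself, where every instance inherits it, while the existential reading stores it on one particular instance only. The dictionary encoding and the lookup function below are invented for exposition.

```python
# Illustrative sketch: quantification as (default) inheritance.
# Node names and the encoding are invented for exposition.

def inherited(node, network):
    """Gather a node's properties, including those inherited via 'isa'."""
    chain = []
    while node is not None:
        chain.append(node)
        node = network[node]["isa"]
    props = {}
    for n in reversed(chain):
        props.update(network[n]["props"])
    return props

# 'Everyone left': 'left' is a property of the category 'person' itself.
everyone_left = {
    "person": {"isa": None,     "props": {"left": True}},
    "bob":    {"isa": "person", "props": {}},
}

# 'Someone left': 'left' is a property of one particular (unnamed) person only.
someone_left = {
    "person":   {"isa": None,     "props": {}},
    "someone1": {"isa": "person", "props": {"left": True}},
    "bob":      {"isa": "person", "props": {}},
}

print(inherited("bob", everyone_left).get("left"))   # True: inherited by every person
print(inherited("bob", someone_left).get("left"))    # None: not a property of 'person'
```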
This simple and natural analysis of universal and existential quantification shows the benefit of starting from the Cognitive Assumption; and of course this assumption also leads to an analysis which is cognitively more plausible than traditional logic because default inheritance allows exceptions. As in ordinary conversation, the generalisation that everyone left is not, in fact, overturned completely if it turns out that a few people did not leave; exceptions are to be expected in human reasoning. Word Grammar offers structural analyses for many other areas of semantics (Gisborne 2010; Hudson 1984:131-210; Hudson 1990:123-166; Hudson 2007:211-248; Hudson 2010:220-241), all of which are informed by the Cognitive Assumption. The example of universal and existential quantification illustrates a general characteristic of these analyses: patterns that other theories treat as special to semantics turn out to be particular cases of much more general cognitive patterns that are therefore found in other areas of language structure. We shall consider another example in the discussion of word order (section 7), where I argue that word order rules build on two relations also found in semantics: the landmark relation expressed by spatio-temporal prepositions, and the temporal relations expressed by tense. This sharing of patterns across linguistic levels is exactly as expected given the Cognitive Assumption, but it is rarely discussed in other theories. This article is not the place to summarise all the possibilities of Word Grammar semantics, but it may be helpful to illustrate them through the analysis of one concrete example. In the previous section I gave a syntactic analysis of how about x? in Figure 13 as an example of a non-canonical construction, with a promise of a fuller semantic analysis which I can now redeem. The meaning of the syntactic pattern how about x? is given as a node labelled ‘how about x’, an analysis which does at least show that ‘x’ is defined by the complement of about, though it leaves the rest of the meaning completely unanalyzed and undefined. But how might we define a notion like this? Constructional analyses generally leave semantic elements without definition; for example, Kay and Fillmore’s analysis of the meaning of the WXDY construction recognises an element called ‘incongruity-judgment’ made by a pragmatically-identified judge called ‘prag’ about the entity defined by the ‘x’ word. That is as far as the semantic analysis goes. But according to the network tenet (3), concepts are defined by their links to other concepts, so any semantic element can be defined by links to other concepts. The first challenge in the analysis of how about X? is its illocutionary force. If I say How about a cup of tea?, I am asking a yes-no question, just as in Is it raining? The only oddity is that this yes-no question is introduced by a wh-word, how (or what). Even if the construction originated in a wh-question such as What do you think about ..., the meaning has now changed as much as the syntax (just as it has in how come ...?). How, then, can a semantic network indicate illocutionary force? This is a rather fundamental question for any model, but especially for a usage-based model in which all the contextual features of utterances are part of the total analysis; but cognitive linguistics has so far produced few answers. In contrast, Word Grammar has always had some suggestions for the structural analysis of illocutionary force.
The earliest idea was that it might be defined in terms of how ‘knowledge’ was distributed among participants (Hudson 1984:186-197), and knowledge is clearly part of the analysis. However, I now believe that the recycling principle (4) points to a simpler first step: linking to the notions ‘ask’ and ‘tell’, which are already needed as the meanings of the lexemes ASK and TELL. This is just like the ‘performative hypothesis’ of Generative Semantics (Ross 1970) except that the ‘performative’ structure is firmly in the semantics rather than in syntax (however ‘deep’). And as in the performative analysis, the speaker is the asker or teller, the addressee is the ‘askee’ or the ‘tellee’, and the content of the sentence is what we can call the ‘theme’ – the information transferred from one person to the other. For most theories, this analysis would be very hard to integrate into a linguistic structure because of the deictic semantics involved in ‘speaker’ and ‘addressee’, which link a word token to a person, thereby bridging the normal gulf between ‘language’ and ‘non-language’; but for Word Grammar there is no problem because the Cognitive Assumption rejects any boundary between language and non-language. The analysis of a word token is a rich package which includes a wide range of deictic information specific to the token – its speaker, its addressee, its time and place, and its purpose: what the speaker was trying to achieve by uttering it. In the case of the sentence-root, which carries the meaning of the entire sentence, its purpose may be to give information or to request it – in other words, the token’s purpose is its illocutionary force (Gisborne 2010). This treatment of illocutionary force is applied to How about X? in Figure 15, which is based in turn on Figure 13 above. The one relation which is not labelled in this diagram is that between ‘how about x?’ and ‘x’. We might be tempted to label it ‘theme’, but we should resist this temptation because x is not in fact the thing requested; for example, How about John? is not a request for John. I develop this point below.

Figure 15: The meaning of How about x? – illocutionary force

The next challenge, therefore, is to decide what the ‘theme’ of the question is. If, for example, How about John? is not a request for John, what does it want as an answer? Clearly, it is a request for either ‘yes’ or ‘no’, information about the truth of some proposition which we can call simply ‘p’; but what is ‘p’? This varies with the situation as illustrated by the following scenarios:

(16) We need someone strict as examiner for that thesis, so how about John?
(17) You say you don’t know any linguists, but how about John?
(18) If you think Mary is crazy, how about John?

Similar variation applies to How about a cup of tea?, but this is so entrenched and conventional that it needs no linguistic context. In every case, then, the x of How about x? is suggested as a possible answer to a currently relevant question of identity – the identity of a possible examiner in (16), of a linguist in (17) and of someone crazy in (18). What is needed in the structural analysis of How about x?, therefore, is an extra sub-network representing this identity-question, combined with a representation of x as a possible answer and the truth value of the answer.
This supplementary network has two parts: one part which relates p to the ‘theme’ of the query, the choice between true and false, and another part which relates p to x. Starting with the choice, this involves the Word Grammar treatment of truth in terms of the primitive relation ‘quantity’, whose values range over numbers such as 0 and 1 (Hudson 2007:224-8). A node’s quantity indicates how many examples of it are to be expected in experience; so 1 indicates precisely one, and 0 none. This contrast applies to nouns as expected; so a book has a referent with quantity 1, while the referent of no book has quantity 0; but it also applies to finite verbs, where it can be interpreted in terms of truth. For example, the verb snowed in It snowed refers to a situation with quantity 1, meaning that there was indeed a situation where it snowed; whereas the root-word in It did not snow has a referent with quantity 0, meaning that no such situation existed. Seen in this light, a yes/no question presents a choice between 1 (true) and 0 (false) and asks the addressee to choose one of them. The ‘quantity’ relation is labelled ‘#’ in diagrams, so Figure 16 shows that the proposition ‘p’ has three quantities: ?, 1 and 0.

Figure 16: The meaning of How about x? – content

The mechanics of choice in Word Grammar are somewhat complicated because they involve two further primitive relations called ‘or’ and ‘binding’; these have a special status alongside ‘isa’ and a small handful of other relations. A choice is defined by a set that includes the alternatives in the mutually exclusive ‘or’ relation, and a variable labelled ‘?’ which is simply a member (Hudson 2010:44-47). When applied to the choice between 1 and 0, we recognise a set {1, 0, ?} which contains 1 and 0 as its mutually-exclusive ‘or’ members as well as an ordinary member called ‘?’ which can bind to either of them. The ‘or’ relation is shown by an arrow with a diamond at its base while binding is represented by a double arrow; so the subnetwork at the top of Figure 16 shows that ‘?’, the theme of ‘how about x’, is either 1 or 0. We now have an analysis which shows that How about x means ‘I am asking you whether p is true’, where p has some relation to x. The remaining challenge is to explain how p relates to x. It will be recalled that p is a proposition (which may be true or false), but of course everything in this network must be a concept (because that is all we find in conceptual networks), so propositions must be a particular kind of concept. In this case, the proposition is the ‘state of affairs’ (Pollard and Sag 1994:19) in which two arguments, labelled simply ‘a’ and ‘b’, are identical: the proposition ‘a = b’. For instance, in (16) the proposition p is ‘the examiner we need = John’. The identity is once again shown by the primitive binding relation introduced above, which is shown in Figure 16 as binding a concept ‘q’ to x. Of these two concepts, we already know x as the referent of word x; for example, in How about John?, x is John. The other concept, q, is more challenging because it is the variable concept, the hypothetical examiner, linguist or crazy person in (16), (17) and (18). What these concepts have in common is that they have some currently active relationship to some other currently active entity.
The entity and relationship may be explicit, as in (16) (examiner of that thesis), but How about a cup of tea? shows that they need not be. The analysis in Figure 16 shows the connection to currently active structures by means of the binding procedure (which triggers a search for the currently most active relevant node); so node e at the top of the diagram needs to be bound to some active entity node, illustrating how the permanent network can direct processes that are often considered to be merely ‘pragmatic’. Most importantly, however, the same process applies to the relationship node labelled ‘r’, binding it to an active relationship; so relationships and entities have similar status in the network and are subject to similar mental operations. This similarity of relationships and entities is exactly as required by the relation tenet (6), which recognises relationships (other than primitives) as a particular kind of concept; and it is built into the formal apparatus of Word Grammar. I am not aware of any other theory that treats relationships in this way. This completes the semantic analysis of How about x?, showing that it means something like: ‘I am asking you whether it is true that x is relevantly related to the relevant entity’, where relevance is defined in terms of activation levels. The main point, of course, is not the correctness of this particular analysis, but the formal apparatus that is needed, and that Word Grammar provides. The main facilities were binding, relational concepts, quantities, mutually exclusive or-relations and network notation (with the possibility of adding activation levels). Exactly as we might expect given the Cognitive Assumption, none of these facilities is unique to semantics.

7. Order of words, morphs etc.

In this section we return to the ‘formal’ levels of syntax, morphology and phonology, but we shall find that they share some of the formal structures found in semantics. The question is how our minds handle the ordering of elements – word order in syntax, morph order in morphology and phone order in phonology. Once again the choice of notation is crucial. The standard notation of ‘plain vanilla’ syntax (or morphology or phonology) builds strongly on the conventions of our writing system, in which the left-right dimension represents time; so it is easy to think that a diagram such as Figure 2, the box-notation constructional analysis of Cats miaow, already shows word order adequately. But the network tenet requires every analysis to be translated into network notation, and the fact is that a network has no ‘before’ or ‘after’, or left-right dimension. In a network, temporal ordering is one relationship among many and has to be integrated with the whole. But of course the ordering of elements presents many challenges for linguistic theory beyond questions of notation, so what we need is an analysis which will throw some light on these theoretical questions. Another issue that arises in all questions of ordering is whether the ordering is spatial or temporal. Once again, given our experience of left-right ordering in writing we all tend to think in spatial terms, and indeed linguists often talk about ‘leftward movement’ or even ‘fronting’ (assuming that the ‘front’ of a sentence is its leftmost part). But given the logical primacy of spoken language, this spatial orientation is actually misleading, so we need to think of words and sounds as events in time; and that means that ordering is temporal, rather than spatial.
Admittedly, this choice is blurred by the tendency for temporal relations to be described metaphorically as though they were spatial; but for consistency and simplicity the terminology will be temporal from now on. Temporal ordering of behaviour, including language, requires analysis at two levels of abstraction. On the one hand we have the ‘surface’ ordering which simply records the order of elements in a chain. This is the only kind of ordering there is if the chain is arbitrarily ordered, as in a telephone number (or a psychological memory experiment): so in a series such as ‘3231’, all we can say is that a 3 is followed by a 2, which is followed by another 3, which is followed by a 1. The same is true in any ordered series such as the digits, the alphabet or the days of the week. The only relation needed to record this ordering is ‘next’ (which, to judge by the difficulty of running a series backwards, seems to spread activation in only one direction, so it may be a primitive relation). In network notation we might show this relation as a straight but dotted arrow pointing towards the next element, as in Figure 17. Notice in this diagram how the default ‘next’ relations are overridden by the observed ones, and how much easier it would have been to remember the series in the default order: 1 2 3 4.

Figure 17: The ‘next’ relation in a series of numbers

Like any other organised behaviour, then, a string of words has a surface ordering which can be described simply in terms of the ‘next’ relation; and for some purposes, the ‘next’ relation is highly relevant to language processes. This is most obviously true in phonology, where adjacency is paramount, as when a morph’s final consonant assimilates to the first consonant of the ‘next’ morph. However, cognitive science also recognises a ‘deep’ ordering behind the surface order. The point is that behaviour follows general patterns which have sometimes been analyzed in terms of ‘scripts’ for scenarios such as birthday parties or cleaning one’s teeth (Cienki 2007). Whereas the ‘next’ relation records what actually happens, these deeper analyses explain why events happened in this order rather than in other possible permutations. For example, tooth-cleaning is organised round the brushing, so picking up the brush and putting paste on it precedes the brushing, and rinsing and putting away the brush follow it. A general feature of organised behaviour seems to be its hierarchical structure, with problems (e.g. brushing one’s teeth) generating sub-problems (e.g. picking up the brush and putting it down again); and this structure produces a typical ordering of events which can be described as a series linked by ‘next’. But notice that the deep ordering of problems and sub-problems is more abstract than the ‘next’ relation into which they will eventually be translated; for example, although picking up the toothbrush precedes brushing, we cannot say that brushing is the ‘next’ of picking up the brush because they may be separated by other events such as applying toothpaste. In short, deep ordering mediates between abstract hierarchical relations and surface ordering. Returning to language, we find the Cognitive Assumption once again pushing us towards a particular view of language structure, one in which deep ordering mediates between abstract hierarchical relations and surface ordering.
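Before we turn to syntax proper, the two levels can be sketched in a few lines of code: a deep, goal-centred ordering is flattened into a surface sequence, and the surface sequence is then recorded purely as pairwise ‘next’ links. The event names and the flattening rule are invented for exposition, not drawn from any particular script theory.

```python
# Illustrative sketch: deep ordering (organised round a main event) versus
# surface ordering (a bare chain of 'next' links). Event names are invented.

deep = {
    "main":   "brush teeth",
    "before": ["pick up brush", "apply paste"],
    "after":  ["rinse", "put brush away"],
}

def surface_order(deep):
    """Flatten the deep ordering into a surface sequence of events."""
    return deep["before"] + [deep["main"]] + deep["after"]

def next_links(sequence):
    """Record the surface order purely as pairwise 'next' relations."""
    return {a: b for a, b in zip(sequence, sequence[1:])}

events = surface_order(deep)
print(next_links(events))
# Note that 'brush teeth' is not the 'next' of 'pick up brush', even though
# the deep ordering relates them directly: 'apply paste' intervenes.
```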
This deep ordering is especially important in syntax, perhaps unsurprisingly, because the ordering of words is the greatest leap in the chain of levels linking a completely unordered network of meanings to a completely ordered chain of sounds. For instance, in Cats miaow, the deep ordering mediates between the relation ‘subject’ and the surface relation ‘next’: cats is the subject of miaow; a typical verb follows its subject; therefore, miaow is the ‘next’ of cats. What we need, then, is a suitable relation for describing deep ordering. Cognitive Grammar has already defined the relation that we need: the relation between a ‘trajector’ and its ‘landmark’, as in ‘the book (trajector) on the table (landmark)’ (Langacker 2007:436). Langacker recognises that this relation applies to temporal relations, giving the example of a trajector event occurring either ‘before’ or ‘after’ its landmark event. This kind of analysis is normally reserved for semantics, where it plays an important part in defining the meaning of prepositions (such as before and after) and tenses, where a past-tense verb refers to an event whose time is before the verb’s time of utterance. Word Grammar, however, extends trajector/landmark analysis right into the heart of grammar, as the basis for all deep ordering. For example, in the day after Christmas, the word after takes its position from the word day, on which it depends; so the relation between the word day (landmark) and after (trajector) is an example of the same relation as that between Christmas (landmark) and the day after (trajector), because in both cases the trajector (the day after or the word after) follows its landmark (Christmas or the word day). One of the salient qualities of the trajector/landmark relation is that it is asymmetrical, reflecting the inequality of the ‘figure’ (trajector) and its ‘ground’ (landmark). For example, describing a book as ‘on the table’ treats the table as in some sense more easily identified than the book – as indeed it would be, given that tables are usually much bigger than books; it would be possible to describe a table as ‘under the book’, but generally this would be unhelpful. The same is true for events such as the parts of a tooth-cleaning routine. The main event is the brushing, and the associated sub-events are subordinate, so we naturally think of them taking their position from the brushing, rather than the other way round. And of course the same is even more obviously true in syntax, especially given the dependency view advocated in section 4 in which the relations between words are inherently asymmetrical. Once again, then, we find general cognitive principles supporting the Word Grammar version of syntax as based on word-word dependencies, and not on phrase structure. It could of course be objected that another kind of asymmetrical relation is meronymy, the relation between a whole and its parts; and at least in syntax this may help to explain why the words in a phrase tend to stay together (Rosta 1994) – though we shall see below that even this tendency can largely be explained without invoking phrases. But meronymy cannot in itself be the abstract relation which determines deep ordering because a whole has the same relation to all its parts, so the part:whole relation, in itself, cannot determine how the parts are ordered among themselves. The only relation which can determine this is a relation between the parts, such as word-word dependencies.
Let us suppose, then, that word-word dependencies determine deep ordering – in other words, that a word acts as landmark for all its dependents. Precisely how does this determine the ordering of words? Saying that cats has miaow as its landmark does not in itself guarantee any particular ordering of these two words, so how can we combine the trajector/landmark relation with ordering rules? At this point we should pay attention to the relation tenet (6), which allows relations to be classified and (therefore) sub-classified. Given the more general relation ‘landmark’, any particular specification of the relation gives a sub-relation linked by ‘isa’ to ‘landmark’; so in this case we can recognise just two sub-relations: ‘before’ and ‘after’, each of which isa ‘landmark’. In this terminology, if X is the landmark of Y, and more precisely X is the ‘before’ of Y, then Y is before X (or, perhaps more helpfully, ‘X puts Y before it’). On this analysis, miaow is the before of cats, meaning that cats is before miaow. This analysis is shown in Figure 18, where miaow is the ‘next’ of cats because it is the ‘before’ of cats because cats is its subject.

Figure 18: Cats miaow, with deep and surface ordering

One of the challenges of analysing general behaviour is that events can have complicated relations to one another. Consider your tooth-cleaning routine once again. Suppose you are a child being supervised by a parent, who wants to see you using the paste. In this case you have two goals: to be seen using paste as well as to actually brush your teeth. You pick up the paste and then the brush – the reverse of your normal order – so when you are ready to put paste on the brush, it is already in your hand. Notice how it is the dominant goal – being seen – that determines the ordering of events, and that overrides the normal ordering. This simple example allows us to predict similar complexities within language, and especially in syntax – which, of course, we find in spadefuls. For example, Figure 8 above gives the structure for sentence (19):

(19) Which birds do you think sing best?

Several words have multiple dependencies, in the sense that they depend on more than one other word; for example, which is the ‘extractee’ of do (from which it takes its position), but it is also the subject of sing, and you is the subject not only of do but also of think. (The evidence for these claims can be found in any introduction to syntax.) This is like the multiple dependencies between picking up the paste and the two goals of being seen and of applying paste to the brush; and as in the tooth-cleaning example, it is the dominant (‘highest’) dependency that determines the surface order; so which takes its position from do, and so does you, because do is the sentence-root – the ‘highest’ word in all the chains of dependencies. The conclusion, then, is that a word may depend on more than one other word, and that some of these dependencies may determine its position, while others are irrelevant to word order. In Word Grammar notation the two kinds of dependency are distinguished by drawing the dependency arcs that are relevant to word order above the word-tokens, leaving the remainder to be drawn below the words; the former, but not the latter, are paired with trajector/landmark relations which can therefore be left implicit. (This convention can be seen in Figure 8.) This distinction is exactly as we might expect given the logic of default inheritance.
By default, a word takes as its landmark the word on which it depends; but if it depends on more than one word, it takes the ‘highest’ in the dependency chain – in other words, ‘raising’ is permitted, but ‘lowering’ is not. But even this generalisation has exceptions, in which a word depends on two words but takes the ‘lower’ one as its landmark. One such exception is the German pattern illustrated by (20) (Hudson 2007:144):

(20) Eine Concorde gelandet ist hier nie.
     A Concorde landed is here never.
     ‘A Concorde has never landed here.’

No doubt there are many other exceptions. The main point of this discussion is that general cognitive principles support Word Grammar in its claim that word order can, and should, be handled as a separate relation, rather than simply left implicit in the left-right notation of plain-vanilla syntax. However, word order raises another fundamental issue where Word Grammar is different from other versions of cognitive linguistics. Why do the words of a phrase tend so strongly to stay together? If syntactic structure consists of nothing but dependencies between individual words (as I argued above in section 4), what is the ‘glue’ that holds the words of a phrase together? For example, why is example (21) ungrammatical?

(21) *She red likes wine.

In dependency terms, this structure should be permitted because all the local word-word dependencies are satisfied, as can be seen in Figure 19. In particular, red should be able to modify wine because the order of these two words is correct (just as in red wine), and similarly she should be able to act as subject of likes, just as in She likes red wine. And yet the sentence is totally ungrammatical.

Figure 19: *She red likes wine and its dependencies

The notation allows an obvious explanation in terms of crossing lines – and indeed, this is very easy to explain when teaching syntactic analysis – but why should this be relevant? After all, if a network has no left-right dimension then the intersection of lines is just an artefact of our writing-based notation. Moreover, intersecting arrows don’t matter at all when (following the Word Grammar convention explained above) they are written below the words, so why should they matter above the words? Phrase structure explanations face similar objections when confronted with the need to be translated into network notation: if word order is shown, network-fashion, by a relation such as ‘next’, it is meaningless to ban mother-daughter lines that intersect. The explanation that we need, therefore, must not depend on any particular notation for linear order. The Word Grammar explanation is based on the ‘Best Landmark’ principle (Hudson 2010:53), which is a principle for general cognition, and not just for syntax. When we want to define the location of something, we have to choose some other object as its landmark, and we always have a wide range of possibilities. For instance, imagine a scene containing a church, a tree and a bench. As a landmark for the bench, the church would be better than the tree because it is more prominent – larger, more permanent, more important for the local community and, because of that, more accessible in most people’s minds. But the preferred landmark also depends on distance, because the closer things are, the more likely it is that their relationship is distinctive, and has already been recorded mentally; so if the bench was next to the tree but a long way from the church, the tree would make a better landmark.
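To make the trade-off concrete, here is a toy numerical sketch of the choice between the church and the tree as landmarks for the bench. The prominence values, the distances and the scoring rule are all invented for exposition; Word Grammar fixes none of them.

```python
# Toy sketch of the Best Landmark idea: candidates are scored on a crude
# combination of prominence and nearness. All numbers are invented.

def best_landmark(candidates):
    """Higher prominence and smaller distance both make a better landmark."""
    def score(name):
        prominence, distance = candidates[name]
        return prominence / (1.0 + distance)
    return max(candidates, key=score)

# name: (prominence, distance from the bench in metres)
print(best_landmark({"church": (10, 50.0), "tree": (3, 2.0)}))   # 'tree'
print(best_landmark({"church": (10, 5.0),  "tree": (3, 4.0)}))   # 'church'
```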
The best landmark, therefore, is the one that offers the best combination of prominence and nearness. These considerations may in fact reduce to a single issue, identifiability: a prominent landmark is easy to identify in itself, while nearness makes the landmark’s relation easy to identify. By hypothesis, this principle applies to memory by guiding our choice of landmarks when recording properties; so in remembering the bench in the scene just described, one of its properties would be its physical relationship to either the church or the tree, depending on which qualified as the best landmark. The principle also applies to communication, so when we describe the bench to someone else we only call it the bench by the tree if the tree is the best landmark; and, when we hear this phrase, we assume that the tree is the best landmark (at least in the speaker’s mind). Returning to syntax, the Best Landmark principle also explains why the words in a phrase hang together. As hearers, we always assume that the speaker has chosen the best landmark for each word, and as speakers, we try to satisfy this expectation. In syntax, prominence can easily be defined in terms of dependency, so the most prominent word for a word W is always the word on which W depends, its ‘parent’. Distance can be defined in terms of the number of intervening words, so W’s parent will always be separated from W by as few words as possible, given the needs of other words to be near to their respective landmarks. Consequently, any word W should always be as close as possible to its parent, and the only words which can separate the two words are other words which depend (directly or indirectly) on W’s parent. Now consider the example *She red likes wine. The troublesome word is red, which depends on wine, so wine should be its landmark; but it is separated from wine by likes, which does not need to be there. This order is misleading for a hearer, because it implies that likes must be the best landmark for red; so a speaker would avoid using it. In short, the ungrammaticality of *She red likes wine has just the same explanation as the infelicity of She sat on the bench by the church, if the bench is in fact much closer to the tree.

8. Conclusion

The most general conclusion of this paper is that the Cognitive Assumption has important consequences for language structure. This is hardly surprising considering the history of our ideas about language structure. Our present ideas rest on four thousand years of thinking about language structure, very little of which was driven by a desire to explore mental structures, or was informed by reliable information about general cognition. Other influences were much more powerful, not least the influence of writing. The teaching of literacy has always been the driving force behind a lot of thinking about language structure, so it is hardly surprising that the technology of writing has profoundly influenced our thinking. This influence is one of the themes of this paper, emerging in at least two areas: the difficulty of conceptually separating types and tokens, and the temptation to treat word order in terms of the left-right conventions of writing. Another major source of influence was the rejection of Latin-based grammar, which encouraged many American structuralists to look for simplified models in analysing ‘exotic’ languages; the effect of this was the ‘plain-vanilla syntax’ of the American structuralists.
In some ways these two influences pull in opposite directions, but what they have in common is that neither is concerned at all with cognition. Similarly, standardisation, another driving force behind the study of language, encourages us to think of language as ‘out there’ in the community rather than in any individual’s mind; and the publisher’s distinction between grammars and dictionaries suggests a similar distinction in language structure. These non-cognitive pressures have had a predictable impact on widely accepted views of ‘how language works’, but this is the tradition on which we all build (and within which most of us grew up). And as with any cultural change, it is all too easy to carry unwarranted old beliefs over into the new order.

To simplify, we have seen two different ‘cognitive’ movements in the last few decades. In 1965, Chomsky launched the idea that a grammar could model competence, while also declaring language unique, which in effect made everything we know about general cognition irrelevant to the study of language. As a result, Chomsky’s claims about the nature of language structure derived more from the American structuralists and from mathematics than from psychology, so early transformational grammar can be seen as a continuation of Harris’s theory of syntax combined with some cognitive aims. Similarly, Cognitive Grammar developed out of generative semantics, and construction grammar out of a number of contemporary approaches (including HPSG). Unsurprisingly, perhaps, the view of language structure which these theories offer is rather similar to the traditions out of which they grew. No doubt the same can be said about Word Grammar, but in this case the history is different, and the resulting mix of assumptions is different. Moreover, the theory has been able to change a great deal since its birth in the early 1980s, and at every point cognitive considerations have been paramount, so it is not merely a happy coincidence that Word Grammar tends to be compatible with the assumptions of general cognitive science.

The following list includes the main claims about language structure discussed in the paper, all of which appear to be well supported by what we know about general cognition:
- Morphology and syntax are distinct levels, so language cannot consist of nothing but ‘symbols’ or ‘constructions’.
- Syntactic structure is a network of (concepts for) words, not a tree of phrases.
- This network provides the flexibility needed for analysing all the various kinds of ‘construction’.
- Semantic structure is also a network, and allows detailed analyses of both compositional and lexical meaning.
- The order of elements in syntax or morphology involves just the same cognitive mechanisms as we use in thinking about how things or events are related in place or time, notably the ‘landmark’ relation and a primitive ‘next’ relation.

References

Aronoff, Mark 1994. Morphology by Itself. Stems and inflectional classes. Cambridge, MA: MIT Press
Barabási, Albert L. 2009. 'Scale-Free Networks: A Decade and Beyond', Science 325: 412-413.
Bloomfield, Leonard 1933. Language. New York: Holt, Rinehart and Winston
Bolte, J. and Coenen, E. 2002. 'Is phonological information mapped onto semantic information in a one-to-one manner?', Brain and Language 81: 384-397.
Bybee, Joan 2010. Language, Usage and Cognition. Cambridge: Cambridge University Press
Bybee, Joan and Slobin, Dan 1982. 'Rules and schemas in the development and use of the English past tense', Language 58: 265-289.
Casad, Eugene and Langacker, Ronald 1985. ''Inside' and 'outside' in Cora grammar', International Journal of American Linguistics 51: 247-281.
Chomsky, Noam 2011. 'Language and Other Cognitive Systems. What Is Special About Language?', Language Learning and Development 7: 263-278.
Cienki, Alan 2007. 'Frames, idealized cognitive models, and domains', in Dirk Geeraerts & Hubert Cuyckens (eds.) The Oxford Handbook of Cognitive Linguistics. Oxford: Oxford University Press, pp.170-187.
Croft, William 1998. 'Linguistic evidence and mental representations', Cognitive Linguistics 9: 151-173.
Croft, William 2007. 'Construction grammar', in Dirk Geeraerts & Hubert Cuyckens (eds.) The Oxford Handbook of Cognitive Linguistics. Oxford: Oxford University Press, pp.463-508.
Croft, William and Cruse, Alan 2004. Cognitive Linguistics. Cambridge: Cambridge University Press
Evans, Vyvyan and Green, Melanie 2006. Cognitive Linguistics. An introduction. Edinburgh: Edinburgh University Press
Ferreira, Fernanda 2005. 'Psycholinguistics, formal grammars, and cognitive science', The Linguistic Review 22: 365-380.
Fillmore, Charles 1982. 'Frame semantics', in Linguistic Society of Korea (ed.) Linguistics in the Morning Calm. Seoul: Hanshin, pp.111-138.
Fillmore, Charles and Atkins, Sue 1992. 'Towards a frame-based lexicon: the semantics of RISK and its neighbours', in Adrienne Lehrer & Eva Kittay (eds.) Frames, Fields and Contrasts. New essays in semantic and lexical organisation. Hillsdale, NJ: Erlbaum, pp.75-102.
Fillmore, Charles, Kay, Paul, and O'Connor, Mary 1988. 'Regularity and idiomaticity in grammatical constructions: the case of let alone', Language 64: 501-538.
Frost, Ram, Deutsch, Avital, Gilboa, Orna, Tannenbaum, Michael, and Marslen-Wilson, William 2000. 'Morphological priming: Dissociation of phonological, semantic, and morphological factors', Memory & Cognition 28: 1277-1288.
Frost, Ram, Ahissar, M., Gotesman, R., and Tayeb, S. 2003. 'Are phonological effects fragile? The effect of luminance and exposure duration on form priming and phonological priming', Journal of Memory and Language 48: 346-378.
Geeraerts, Dirk and Cuyckens, Hubert 2007. The Oxford Handbook of Cognitive Linguistics. Oxford: Oxford University Press
Gisborne, Nikolas 2008. 'Dependencies are constructions', in Graeme Trousdale & Nikolas Gisborne (eds.) Constructional approaches to English grammar. New York: Mouton, pp.
Gisborne, Nikolas 2009. 'Light verb constructions', Journal of Linguistics
Gisborne, Nikolas 2010. The event structure of perception verbs. Oxford: Oxford University Press
Gisborne, Nikolas 2011. 'Constructions, Word Grammar, and grammaticalization', Cognitive Linguistics 22: 155-182.
Goldberg, Adele 1995. Constructions. A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press
Halle, Morris and Marantz, Alec 1993. 'Distributed morphology and the pieces of inflection', in Ken Hale & Samuel Keyser (eds.) The view from Building 20: essays in linguistics in honor of Sylvain Bromberger. Cambridge, MA: MIT Press, pp.111-176.
Halliday, Michael 1961. 'Categories of the theory of grammar', Word 17: 241-292.
Halliday, Michael 1966. 'Lexis as a linguistic level', in Charles Bazell, John Catford, Michael Halliday, & Robert Robins (eds.) In Memory of J. R. Firth. London: Longman, pp.148-162.
Harris, Zellig 1951. Structural Linguistics. Chicago: University of Chicago Press
Hockett, Charles 1958. A Course in Modern Linguistics. New York: Macmillan
Holmes, Jasper and Hudson, Richard 2005. 'Constructions in Word Grammar', in Jan-Ola Östman & Mirjam Fried (eds.) Construction Grammars. Cognitive grounding and theoretical extensions. Amsterdam: Benjamins, pp.243-272.
Hudson, Richard 1984. Word Grammar. Oxford: Blackwell
Hudson, Richard 1990. English Word Grammar. Oxford: Blackwell
Hudson, Richard 2007. Language networks: the new Word Grammar. Oxford: Oxford University Press
Hudson, Richard 2008. 'Word Grammar and Construction Grammar', in Graeme Trousdale & Nikolas Gisborne (eds.) Constructional approaches to English grammar. New York: Mouton, pp.257-302.
Hudson, Richard 2010. An Introduction to Word Grammar. Cambridge: Cambridge University Press
Hudson, Richard and Holmes, Jasper 2000. 'Re-cycling in the Encyclopedia', in Bert Peeters (ed.) The Lexicon/Encyclopedia Interface. Amsterdam: Elsevier, pp.259-290.
Hutchison, Keith 2003. 'Is semantic priming due to association strength or feature overlap? A microanalytic review', Psychonomic Bulletin & Review 10: 785-813.
Jackendoff, Ray 1997. The Architecture of the Language Faculty. Cambridge, MA: MIT Press
Jackendoff, Ray 2008. 'Alternative Minimalist Visions of Language', in R. Edwards, P. Midtlying, K. Stensrud, & C. Sprague (eds.) Chicago Linguistic Society 41: The Panels. Chicago: Chicago Linguistic Society, pp.189-226.
Karmiloff-Smith, Annette 1992. Beyond Modularity. A developmental perspective on cognitive science. Cambridge, MA: MIT Press
Karmiloff-Smith, Annette 1994. 'Precis of Beyond modularity: A developmental perspective on cognitive science', Behavioral and Brain Sciences 17: 693-745.
Kay, Paul and Fillmore, Charles 1999. 'Grammatical constructions and linguistic generalizations: The what's X doing Y? construction', Language 75: 1-33.
Lakoff, George 1977. 'Linguistic gestalts', Papers From the Regional Meeting of the Chicago Linguistics Society 13: 236-287.
Lamb, Sydney 1998. Pathways of the Brain. The neurocognitive basis of language. Amsterdam: Benjamins
Langacker, Ronald 1987. Foundations of Cognitive Grammar: Theoretical prerequisites. Stanford: Stanford University Press
Langacker, Ronald 1990. Concept, Image and Symbol. The Cognitive Basis of Grammar. Berlin: Mouton de Gruyter
Langacker, Ronald 2007. 'Cognitive grammar', in Dirk Geeraerts & Hubert Cuyckens (eds.) The Oxford Handbook of Cognitive Linguistics. Oxford: Oxford University Press, pp.421-462.
Lindblom, Björn, MacNeilage, Peter, and Studdert-Kennedy, Michael 1984. 'Self-organizing processes and the explanation of language universals', in Brian Butterworth, Bernard Comrie, & Östen Dahl (eds.) Explanations for Language Universals. Berlin/New York: Walter de Gruyter, pp.181-203.
Luger, George and Stubblefield, William 1993. Artificial Intelligence. Structures and strategies for complex problem solving. New York: Benjamin Cummings
Marslen-Wilson, William 2006. 'Morphology and Language Processing', in Keith Brown (ed.) Encyclopedia of Language & Linguistics, Second edition. Oxford: Elsevier, pp.295-300.
Nunberg, Geoffrey, Sag, Ivan, and Wasow, Thomas 1994. 'Idioms', Language 70: 491-538.
Owens, Jonathan 1988. The Foundations of Grammar: an Introduction to Mediaeval Arabic Grammatical Theory. Amsterdam: Benjamins
Percival, Keith 1976. 'On the historical source of immediate constituent analysis', in James McCawley (ed.) Notes from the Linguistic Underground. London: Academic Press, pp.229-242.
Percival, Keith 1990. 'Reflections on the History of Dependency Notions in Linguistics', Historiographia Linguistica 17: 29-47.
Peterson, Robert R., Burgess, Curt, Dell, Gary, and Eberhard, Kathleen 2001. 'Dissociation between syntactic and semantic processing during idiom comprehension', Journal of Experimental Psychology: Learning, Memory, & Cognition 27: 1223-1237.
Pollard, Carl and Sag, Ivan 1994. Head-Driven Phrase Structure Grammar. Chicago: Chicago University Press
Reay, Irene 2006. 'Sound Symbolism', in Keith Brown (ed.) Encyclopedia of Language & Linguistics, Second edition. Oxford: Elsevier, pp.531-539.
Reisberg, Daniel 2007. Cognition. Exploring the Science of the Mind. Third media edition. New York: Norton
Robins, Robert 2001. 'In Defence of WP (Reprinted from TPHS, 1959)', Transactions of the Philological Society 99: 114-144.
Ross, John 1970. 'On declarative sentences', in Roderick Jacobs & Peter Rosenbaum (eds.) Readings in English Transformational Grammar. Waltham, Mass: Ginn, pp.222-272.
Rosta, Andrew 1994. 'Dependency and grammatical relations', UCL Working Papers in Linguistics 6: 219-258.
Rosta, Andrew 1997. English Syntax and Word Grammar Theory. PhD dissertation, UCL, London.
Rosta, Andrew 2005. 'Structural and distributional heads', in Kensei Sugayama & Richard Hudson (eds.) Word Grammar: New Perspectives on a Theory of Language Structure. London: Continuum, pp.171-203.
Sadock, Jerrold 1991. Autolexical Syntax: A theory of parallel grammatical representations. Chicago: University of Chicago Press
Sprenger, Simone, Levelt, Willem, and Kempen, Gerard 2006. 'Lexical Access during the Production of Idiomatic Phrases', Journal of Memory and Language 54: 161-184.
Stump, Gregory 2001. Inflectional Morphology: A Theory of Paradigm Structure. Cambridge: Cambridge University Press
Sugayama, Kensei 2002. 'The grammar of Be to: from a Word Grammar point of view', in Kensei Sugayama (ed.) Studies in Word Grammar. Kobe: Research Institute of Foreign Studies, Kobe City University of Foreign Studies, pp.97-111.
Tallerman, Maggie 2009. 'Phrase structure vs. dependency: the analysis of Welsh syntactic soft mutation', Journal of Linguistics 45: 167-201.
Taylor, John 2007. 'Cognitive linguistics and autonomous linguistics', in Dirk Geeraerts & Hubert Cuyckens (eds.) The Oxford Handbook of Cognitive Linguistics. Oxford: Oxford University Press, pp.566-588.
Tesnière, Lucien 1959. Éléments de syntaxe structurale. Paris: Klincksieck
Wierzbicka, Anna 1985. Lexicography and Conceptual Analysis. Ann Arbor: Karoma
Wierzbicka, Anna 1996. Semantics: Primes and universals. Oxford: Oxford University Press
Zwicky, Arnold 1985. 'The case against plain vanilla syntax', Studies in the Linguistic Sciences 15: 1-21.