Language and the digital code (DRAFT)

Nigel Love
Department of Linguistics, University of Cape Town

Language is bound to be a focus of intense interest in any attempt to understand human cognition. It is a truism that the cognitive abilities of Homo sapiens outrun those of other species, and are in certain respects unique, and that this is connected in some way with the fact that Homo sapiens uses language. But exactly what it is that language does for us, and how it does it, are questions to which there are no agreed answers.

The general issue of ‘language and cognition’ may at once be separated into two questions, or clusters of questions: (i) how we cognise language and (ii) how language facilitates certain cognitive powers distinctive of human beings. Underlying both, of course, is the prior question of how language itself is to be conceptualised. Not only does that question have priority; I think that trying to answer it may also illuminate the other two.

What does it mean to say, as Don Ross (DR) does, that ‘natural languages are systems for digital signalling’, or that ‘coding information in a digitally structured way is at least one of the things that human languages do’? I suspect that many otherwise competent English-users have at least some difficulty in decoding such statements.

The implied contrast is with analog signalling. If I’m in pain I may grunt and groan. If I’m in extreme pain I may shout and scream. In the latter case the shouting and screaming signals the greater intensity of the pain: the greater loudness of the vocalisations iconically represents, or analogically models, the greater painfulness. But the semantic difference signalled by the switch from a mere grunt to a piercing scream is a feature of the message as a whole. By contrast, digital signalling would allow the separation of different semantic elements: one signal for pain, say, and another for the greater intensity.
This is just what happens when we insert the adjective ‘intense’ in front of the noun ‘pain’ and say ‘I am in intense pain’. ‘Intense’ and ‘pain’ are two discrete digits, or combinations of discrete digits, of the signalling system constituted by the English language. That, at any rate, seems to be the general idea.

What does DR want to say about digital codes? Following Dennett, he argues that one of the functions of such coding is to narrate ‘selves’ into being. A self is a narrative construct similar to a character in a novel. It is a virtual object spun in public discourse, closely monitored and tended by the organism that hosts it and takes the lead in constructing it. Early in the narration many possibilities remain open, but as people get older the possibility space becomes narrower. The reduced space is the self, says Ross; the reduction is what enables others, along with the subject himself, to predict what the person will do across a range of situations. Thus people can successfully coordinate their behaviour with that of others. Describing their behaviour in the terms made available by a natural language forces people to sort the data into categorical spaces of lower dimensionality than would be used by a neuroscientist or behavioural scientist trying to be objective. Thus there is a kind of dimensional compression through translation into a digital representational format. People must interpret and report the data in the terms made available by the culturally determined ‘report form’ they find in the environment – the local language. For people, as for no other animals, the most important part of the environment is a set of virtual artefacts, which are themselves evolved and evolving structures. Humans force their thoughts to conform to evolved digital categorisation spaces by continuously narrating accounts of their behaviour and interpreted mental processes, both ongoing and in retrospect.
This digitality, says Ross, is the essential property of language that makes humans ‘ecologically special’.

I’m not sure Ross is actually talking about ‘selves’ as usually understood. Might ‘adult character’ be nearer the mark? Broadly speaking, this amounts to saying that you can only say what there are words for. And presumably that goes not just for narrating selves (or adult characters) into being, but for all uses of language whatever. So what emerges here, extracted from DR’s particular context, is a very familiar story about language: people can communicate linguistically in so far as there is an established public plan for doing so. And the same question arises about all such stories: how does the public plan come into being? Presumably it is not supposed to have existed prior to any use of language at all. There must have been pre-plan language use, out of which and as a result of which the plan somehow emerged. But it’s hard to get into focus what, according to plan theorists, this ‘emergence’ is supposed to amount to.

Now I’m not sure what DR means when he talks of ‘virtual’ artefacts in this context – that is, I’m not sure what categorisation space he wants ‘virtual’ to occupy. I actually think it’s an excellent word in this context; it’s just that the sense I want to use it in may not be DR’s. Anyway, the virtuality of the linguistic plan, in my sense, is precisely what makes both its role in the human environment and its role in any naturalistic account of language in that environment contentious and tricky in the extreme. All this I shall be going into.

But let’s begin at the beginning. Why are the units of a digital signalling system called digits? Let’s look at some digits.

___________________________________
0123456789
___________________________________
The set of Arabic numerals.

What about them? At bottom they are a familiar example of what semiologists sometimes call an emblematic frame.
An emblematic frame is a closed, finite set of symbols that stand in certain ready-made relationships to one another but have no intrinsic semiotic content. The most familiar examples in our culture are sets of items used in playing games. Take dominoes. A set of dominoes is an emblematic frame. The only given relationships among them are precedence and equivalence. Taking your turn in the usual game played with dominoes requires playing a piece one end of which has an equivalence relationship with the available end of a piece already played. The equivalence is usually symbolised as a matching number of pips. There is also a precedence relation: the higher the precedence of the pieces left in your hand at the end of the game, the worse off you are. But the emblematic frame in question isn’t intrinsically wedded to playing that particular game, or to playing games at all.

A slightly more complex example would be a pack of playing cards. Again, the cards themselves ‘mean’ nothing: they simply offer a system of abstract relationships, again of precedence and equivalence, on which you can impose any significance you like. The standard pack of 52 cards in four suits could obviously be pressed into service for calendrical purposes, for instance, in a society that attached special importance to marking the weeks and seasons of the year. (So this week, in the northern hemisphere, might be the nine of diamonds in a playing-card calendar.) In fact, apart from divination, we mostly use them for playing games. But note the huge variety of card games. Is there any structural similarity between snap and bridge, or patience and whist? They’re all played with the cards, but the system of emblems is being exploited in very different ways.

The Arabic figures are an emblematic frame in this sense; in fact they form a sub-part of the playing-card frame. In this case the only given relationship among the members is precedence, or priority.
There is an established numerical order, as given. But the significance of the ordering is not fixed. In particular, the figures are not necessarily used for counting. Suppose a group of people each pick blindly from a set of tokens numbered 0 to 9: highest goes first, wins the prize or whatever. All that matters is, precisely, that the symbols have an established order of precedence. Nothing is being counted here.

Very simple emblematic frames like this one, when pressed into service for purposes of writing, constitute notations. Using the figures for counting, by treating them as representing numbers, is one of the most frequent uses we have for them, and it may be that historically their more semiologically primitive functions are actually derived from this. But as symbols, as digits, as a notation system, they have no necessary connection with mathematics, and using them for mathematical purposes requires a set of rules. That is to say, the notation has to be used in a particular script. For instance, although the established numerical order may determine the arithmetic value to be attached to the ten symbols themselves, conventions have to be established if they’re to be used in mathematical writing for representing numbers beyond nine:

___________________________________________________
13 + 6 = 73
102 - 56 = 631
___________________________________________________

In my script, when you get beyond nine you write units, tens and hundreds from left to right rather than from right to left. So ‘13 + 6 = 73’ says what the standard script writes as ‘31 + 6 = 37’, and ‘102 - 56 = 631’ says ‘201 - 65 = 136’. The point is that, whatever the actual rules may be, there have to be rules of that kind if the notation is to be used as a script.

I take the use of the Arabic numeral notational system as a script for mathematical writing to exemplify the basic idea of a digital code. The set of digits becomes a code if and when there are rules for its use in systematically encoding semiotic values. The question is whether there is any parallel or analogue in language.
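The dependence of the arithmetic on the script, rather than on the notation, can be sketched in a few lines of Python. This is a minimal illustration under one assumption only, namely the reversed place-value rule described above; the function names (`read_standard`, `read_reversed`, `write_reversed`, `check`) are hypothetical and not drawn from any library. The same two strings of digits come out false under the standard script and true under the reversed one.

```python
# The same ten-symbol notation read under two hypothetical scripts.
DIGITS = "0123456789"  # the emblematic frame: ten symbols in a given order


def read_standard(s: str) -> int:
    """Standard script: the rightmost digit counts units, the next tens, and so on."""
    value = 0
    for ch in s:
        value = value * 10 + DIGITS.index(ch)  # precedence order supplies digit values
    return value


def read_reversed(s: str) -> int:
    """Reversed script (as in the text): the leftmost digit counts units."""
    return read_standard(s[::-1])


def write_reversed(n: int) -> str:
    """Write a non-negative integer in the reversed script."""
    return str(n)[::-1]


def check(equation: str, read) -> bool:
    """Evaluate an equation of the form 'a + b = c' or 'a - b = c' under a given script."""
    left, right = equation.split(" = ")
    a, op, b = left.split()
    x, y = read(a), read(b)
    result = x + y if op == "+" else x - y
    return result == read(right)


for eq in ["13 + 6 = 73", "102 - 56 = 631"]:
    print(eq, check(eq, read_standard), check(eq, read_reversed))
# 13 + 6 = 73 False True
# 102 - 56 = 631 False True
```

Nothing in `DIGITS` changes between the two readings; only the rule applied to the digit strings does, which is one way of seeing why a notation becomes a code only under a script.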
The obvious candidate parallel is alphabetic writing:

________________________________________________
ABCDEFGHIJKLMNOPQRSTUVWXYZ
________________________________________________

Once again we have a notational system, this time with 26 members in this particular version of the Roman alphabet, with standard precedence relations as given, reading from left to right. This system shares some of its more primitive uses with the Arabic numeral system, and in those uses is indeed interchangeable with it. (Hence the term ‘alphanumeric’.) So you can “number” the points in your argument 1, 2, 3, 4 or, if you prefer, a, b, c, d; it makes no difference. In fact, in some systems of vehicle registration “numbers” and telephone “numbers”, letters and figures are combined. Note that in this kind of use even the basic precedence relationships are nullified. A telephone exchange may historically have dished out its numbers in numerical order, and at one time, if no longer, you could tell roughly how old the British civil aircraft you were flying in was by how close to the beginning of the alphabet its registration letters were; but in cases like this all that matters is that your “number” be a combination of digits different from everyone else’s.

So from the semiologist’s point of view the alphabet is no more intrinsically wedded to representing language than the Arabic figures are to representing numbers. The question is whether, when the alphabet is used in connection with language, there is any parallel with the way figures are used in mathematical writing. In other words, is the alphabet, as a linguistic script, capable of use as a digital code? Some writers on the subject seem to think that the mere fact that it consists of a set of digits automatically makes it a digital code.
Thus the literary theorist Florian Cramer says that whereas ‘sounds and images are not code by themselves, but have to be turned into code in order to be computed, any written text already is code’. But this seems to mean just that written text consists of alphanumeric characters. Doubtless that makes it easy to encode for computing purposes: you assign the characters values in the particular (e.g. binary) digital code being used by the computer, such as the ASCII code. You can call a set of such characters ‘code’ if you like. But in that case there is an important difference between ‘code’ and ‘a code’, which is what we’re interested in.

What might the alphabet encode, where language is concerned? One possible, indeed traditional, answer is: speech sounds. The reason this answer seems tempting is that all known uses of the alphabet in connection with language are at least vaguely phonographic in tendency. That is, there is, to varying degrees, a discernible correlation between letters and sounds. This comes across strikingly in cases where the same symbols of the alphabetic notation happen to be used for different purposes in different scripts:

______________________________________________________________________
ALTER [German ‘age, epoch’]
CHAIR [French ‘flesh’]
SAIL [Welsh ‘foundation, basis’]
TRUTH [Welsh ‘falsehood’]
______________________________________________________________________

Alternatively you can read them as English words. Fortuitous translingual homographies demonstrate very clearly the difference between a notation and a script: notationally there are four items, but when you take account of scripts there are eight. But the point I want to make right now is that these completely unconnected words in different languages, although they have nothing in common semantically, nonetheless show a certain phonetic similarity to their English homographs. This is quite remarkable, given that the alphabet is, after all, just a notation.
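What ‘assigning the characters values’ in a computer’s code amounts to can be shown with Python’s built-in `str.encode` and `format`. This is only a sketch: the helper name `ascii_codes` is hypothetical, and ASCII stands in for whatever character encoding a given machine actually uses. The encoding of CHAIR comes out the same whether the string is read as the English word or as French ‘flesh’, which is one way of putting the difference between ‘code’ and ‘a code’.

```python
def ascii_codes(text: str) -> list[tuple[str, int, str]]:
    """Return (character, ASCII value, 7-bit binary form) for each character.

    Raises UnicodeEncodeError if the text contains a non-ASCII character.
    """
    data = text.encode("ascii")  # bytes; iterating yields integers 0-127
    return [(chr(b), b, format(b, "07b")) for b in data]


for ch, value, bits in ascii_codes("CHAIR"):
    print(ch, value, bits)
# C 67 1000011
# H 72 1001000
# A 65 1000001
# I 73 1001001
# R 82 1010010
```

The machine assigns values letter by letter; nothing in the output records which language, if any, the string belongs to.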
In fact, as part of turning the alphabetic notation into a script, some languages impose an extra layer of structure on the notation itself in order to facilitate the correlation with sounds.

______________________________________________________________________
A B C CH D DD E F FF G NG H I L LL M N O P PH R RH S T TH U W Y
______________________________________________________________________

This is the Welsh version of the Roman alphabet. It has no J, K, Q, V, X or Z, but nonetheless 28 “letters”, because consonantal digraphs count as separate units of the script. They have their own names and their own place in alphabetical order, which can make using a Welsh dictionary problematic. It’s as if we thought of, say, ‘shop’ as a three-letter word beginning with ‘esh’.

So the connection with phonography is there. That is why large stretches of broad transcription in a phonetic alphabet that attaches Standard Average European (SAE) sound values to the Roman letters are perfectly readable by an SAE speaker who has never learned to use the phonetic alphabet as such:

______________________________________________________________________
mat [mat], sat [sat], hat [hat], bat [bat]…
______________________________________________________________________

But it’s obvious that the phonographic tendency is just that – a sort of partial and sporadic keeping in touch with the idea of representing sounds. One reason that’s all it is, is that speech neither consists of nor is determinately analysable as a series of discrete sound segments. That is a problem that even phonetic alphabets, specifically and overtly intended to represent sounds, have always come up against, let alone ordinary ones. In fact, although historically the linguistic use of the alphabet may have its origins in some sort of analysis of the phonetic structure of syllables, that is conspicuously not a feature of how it actually works now, for any language you care to mention. English is an especially notorious instance.
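The effect on alphabetical order of counting digraphs as single letters can be sketched in Python. This is a simplified illustration, not a real Welsh collation routine: the alphabet is the 28-unit list given above, the example words are merely illustrative, and the helper names (`welsh_letters`, `welsh_sort_key`) are hypothetical. The splitting rule is itself an assumption: it greedily treats any two-letter sequence in the list as one letter, so it would mis-split any word in which, for example, an n followed by a g are separate letters.

```python
# The Welsh alphabet as listed in the text: 28 units, digraphs included.
WELSH_ALPHABET = ["a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h",
                  "i", "l", "ll", "m", "n", "o", "p", "ph", "r", "rh", "s", "t",
                  "th", "u", "w", "y"]
RANK = {letter: i for i, letter in enumerate(WELSH_ALPHABET)}


def welsh_letters(word: str) -> list[str]:
    """Split a word into Welsh 'letters', preferring a digraph wherever one occurs.

    Simplification: greedy matching ignores genuinely ambiguous sequences.
    """
    word = word.lower()
    letters, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in RANK:
            letters.append(pair)  # a digraph counts as one letter
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


def welsh_sort_key(word: str) -> list[int]:
    """Map a word to alphabet positions; raises KeyError for letters not in the list."""
    return [RANK[letter] for letter in welsh_letters(word)]


words = ["llaw", "lori", "ci", "chwech"]
print(welsh_letters("cyllell"))           # ['c', 'y', 'll', 'e', 'll']
print(sorted(words))                      # ['chwech', 'ci', 'llaw', 'lori']
print(sorted(words, key=welsh_sort_key))  # ['ci', 'chwech', 'lori', 'llaw']
```

Under this rule every word beginning with C precedes every word beginning with CH, and likewise L before LL, which is why a dictionary user who alphabetises letter by letter, in the English manner, looks in the wrong place.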
OK, so what does alphabetic writing represent, if anything, and how?

______________
mdroen raedres
______________

Very likely you can’t make much of that. But try this:

______________________________________________________________________
Reprsntng sepech suodns is evdntly not how aplahbtiec wirtnig wroks pyscohlgcly, at lesat for mdroen raedres
______________________________________________________________________

DR has a term for people like me who find problematic the idea that languages are digital signalling systems. He calls us ‘processing holists’. But I’d say that if something like this can be processed at all, ‘holistically’ is a pretty good word for how it’s being done. Your sense of what’s going on there is a function of seeing the whole lot together, i.e. holistically. And surely right there we see an important disanalogy with genuinely digital signalling. Make the slightest typo when keying in an email address and the stupid machine sends the message straight back to you. If anything can properly be said to be “represented” here, it is something more abstract or high-level than how the inscription would sound as a spoken utterance. So we can safely conclude that if a language is a digital signalling system, the digits we are looking for are not to be equated with the individual letters of alphabetic writing. With what, then? Perhaps with combinations of letters as used to encode words or other individually meaningful units. But this too looks unpromising.
Certainly the alphabet, even as a script, does not automatically lend itself to the systematic encoding of words, as this example shows:

______________________________________________________________________
In any csae, rprsntng sepech suodns is evdntly not how aplahbtiec wirtnig wroks pyscohlgcly, at laset for mdroen raedres
______________________________________________________________________

We can certainly set up standard spelling systems if we like, although they are by no means universal, and we can occasionally get a standard spelling system to make useful word-level distinctions, as with

_______
rite
write
right
wright
_______

But just as often we don’t bother, as with

________________
port (not starboard)
port (not sherry)
port (not harbour)
________________

The fact that we don’t always bother must surely tell us something about how the writing system is actually working. And in any case we can do perfectly well without using any such standard system, even if there is one:

______________________________________________________________________
Cristes mæsse
cristesmesse
christ-masse
kryst-masse
cristemes
cristemasse
cristmes
cristmas
crysmas
cristimas
Christmasse
Christmass
______________________________________________________________________

Those are all historically attested, and a lot of them, along with others you could produce, are still in common use, especially among children sending out home-made Xmas cards. Doubtless we all know, or are capable of coming to know, that these are variant spellings of the same word – or words. The question is how we know that, if the system for signalling words is supposed to be digital. The fact is, writing these out required my computer to generate a different digital encoding for each one. That’s how it keeps them apart. Anyone who wants to say that they all in some sense encode ‘the same thing’ has a problem identifying the level at which the sameness gets encoded. So what do we do?
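The observation that the computer keeps the variants apart by giving each its own digital encoding can be illustrated in Python. In this sketch a selection of the variants listed in the text is encoded as UTF-8 bytes (an assumption; any standard character encoding would serve), and the helper `same_encoding` is hypothetical. At this level sameness is all-or-nothing: ‘Christmass’ differs from ‘Christmas’ just as completely as ‘kryst-masse’ does, and grouping the variants as one word would need further rules that the encoding itself does not supply.

```python
# A selection of the attested variants from the text (ASCII-only, for simplicity).
variants = ["cristesmesse", "christ-masse", "kryst-masse", "cristemes",
            "cristemasse", "cristmes", "cristmas", "crysmas", "cristimas",
            "Christmasse", "Christmass"]


def same_encoding(a: str, b: str) -> bool:
    """Exact, all-or-nothing comparison of the UTF-8 byte sequences of two strings."""
    return a.encode("utf-8") == b.encode("utf-8")


# Each variant gets its own byte sequence: no two coincide.
distinct = {v.encode("utf-8") for v in variants}
print(len(variants), len(distinct))  # 11 11

# None of them counts as 'the same' as the current standard spelling.
print([v for v in variants if same_encoding(v, "Christmas")])  # []
```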
There seems to be no higher level of abstraction to which we can go if we want to vindicate the idea that the sameness underlying the variants is itself somehow signalled digitally. Of course there is no difficulty in finding an ad hoc graphic representation of what we are looking for here. For instance, we can represent the English word of which all the above are variant spellings like this:

______________________________________________________________________
CHRISTMAS

Cristes mæsse
cristesmesse
christ-masse
kryst-masse
cristemes
cristemasse
cristmes
cristmas
crysmas
cristimas
Christmasse
Christmass
Christmas
______________________________________________________________________

Take the current standard spelling and put it in authoritative Gill Sans caps to show that this is a metarepresentation of ‘the word itself’ and not just another variant. The trouble is that, do what you like with the typography, it is just another variant. (Down there at the bottom.) The writing conventions of English, like those of all other languages, offer no superordinate system of metarepresentations consistently establishable as such. One reason for that is, no doubt, that we’ve never felt the need for any such thing. Nor are we any better off (in fact we’re worse off) if we turn to spoken English. Here, of course, we find indefinitely many variant signals associated with all or any one of these written forms.

What am I getting at here? The problem is that these written forms, considered as representations, are indeterminate as to the scope of what they represent. Take any one of them and ask: does it somehow intrinsically stand for the word itself, or merely for instances of that particular version of it? If I took one of the ‘incorrect’ ones at random, e.g.

____________________________________________________
Christmass
____________________________________________________

and asked you to ‘copy that down’, what would you do?
I think you’re as likely as not to ‘correct’ the spelling automatically. In other words, the level of abstraction at which you are supposed to take it is not somehow there to be read off its face. So, if languages encode information in a digitally structured way, where are the digits?

Now of course, given a certain breadth of brush, or a certain level of generality, and especially if the only alternatives on offer are that linguistic signalling is either analog or digital, with no third possibility allowed, then clearly it has to go down as digital, or at least digital-like. But I think resting content with that sort of rough-and-ready characterisation may be positively misleading. There may be some value in teasing out ways in which, and reasons why, languages are not digital signalling systems, and the implications of that for understanding language.

There’s a current academic fashion for writing articles with the title ‘What’s special about language?’; linguists, philosophers and cognitive scientists are all at it. Here’s my take on it. What’s special about language – and what we are up against with the Christmass problem – is that language is interpretatively terminal. There is nothing that stands to language in the relation that language stands in to everything else. Conceptual difficulties arise because, despite that fact, we are prey to the notion that language itself can be made to stand to language in the relation that language stands in to everything else.

Language can be used to investigate and talk about anything under the sun, not least the sun itself. But what about language itself? A particular kind of use of language to talk about language (that is, a particular way of exploiting the reflexivity of language) we call ‘linguistics’. And the fact that linguistics is language about language has often made linguists uneasy. For instance, the linguist J. R.
Firth remarks that ‘the reflexive character of linguistics, in which language is turned back on itself, is one of our major problems’. But Firth never attempted to solve the problem, or even to state precisely what he took it to be. The problem is that turning the medium of inquiry back on itself makes it an object of inquiry, and to envisage treating linguistic phenomena as objects is, in and of itself, to propose a distorted account of them. There are no (first-order) linguistic objects of any kind.

Language, I want to suggest, is a temporally situated, ongoing process – the process of making and remaking signs in contextualised episodes of communicative behaviour. And if that is accepted then, apart from the provision of anecdotal accounts of the specifics of particular communicative episodes, one might be inclined to conclude that pointing this much out is where linguistics ought to stop. Whatever it might be to go further, if going further implies engaging in the kind of retrospective talk about linguistic signs that requires that they be identified in abstracto, it is not and cannot be a matter of reporting on objectively given first-order realities. In a sense, therefore, linguistics is logically impossible. Identifying a sign involves decontextualising the unique communicative episode within which and for purposes of which the sign was created, abstracting and reifying some aspect of that episode, and presenting the reification for inspection and analysis as ‘the sign’ in question. But whatever is thus presented cannot be the sign in question, for the sign has no existence outside its unique communicative episode.

Analytic discourse about language, which involves identifying and citing the linguistic units recurrently signalled in the digital code, requires decontextualisation, abstraction and reification (DA&R). The ultimate foundations for linguistic DA&R lie in the utterly familiar, everyday metalinguistic act of repetition.
What is said or written can be repeated. Someone asks me ‘Did you say bat?’ My answer is: ‘No, I said hat’. But when I say ‘No, I said hat’, I am not somehow identifying the abstract unit of the English vocabulary of which my first utterance was a particular instance. I am simply repeating what I said, i.e. producing another utterance. The repetition will not, of course, be an exact replica. It won’t necessarily be anything objectively like the original. It will merely be similar to the first in whatever dimensions of similarity I think contextually relevant for usefully answering my questioner’s question. What those dimensions are will vary from occasion to occasion. Compare these exchanges:

______________________________________________________________________
Did you say []? No, I said [].
Did you say []? Yes, I said [].
______________________________________________________________________

There is no one, universal, context-neutral dimension of relevant similarity. This is the point DR misses. According to Ross, to say that language or a language is a code is to say that ‘similar public linguistic representations cue similar behavioral responses in individuals with similar learning histories, as a result of conventional associations established by those similar histories’. As far as it goes, that’s fair enough. But what counts as ‘similar’ to what is a matter to be decided on the spot in the light of the particular communication situation. Knowing no Greek, I once managed to find my way out of a maze-like public building in Athens when it dawned on me that if I transliterated a certain sign on the wall I got something very like the English word ‘exodus’. I was working with similarities all right, but presumably not the ones the Greek-speakers responsible for putting the sign there had in mind. That little incident seems to me to epitomise how communication by means of language actually works.
For certain metalinguistic purposes we do indeed entertain the idea of a context-neutral dimension of relevant similarity, which can be generalised and projected as the basis for a thoroughgoing, self-consistent reification of a whole language, giving us a determinate identification of the morphemes, words and higher-level units that constitute the linguistic system. How and why that idea arises in literate societies I went into in my contribution to our predecessor conference in Durban in 2003 (Love 2004), and I don’t want to go into all that again now. The core of the argument is that writing is indeed, as DR says, a milestone in human cultural evolution, but not for the banal reason usually offered, that it makes messages durable and portable. The reason it’s a milestone is that it allows for systematically reconfiguring our conception of language itself, in such a way as to give substance to the idea that utterances are utterances of something more abstract.

What is curious about this illusion is that its hold on us is simultaneously very powerful and yet surprisingly feeble. On the one hand we take it entirely for granted that there is a determinate analysis of our utterances and inscriptions in terms of the words, sentences and so on that they instantiate. That, of course, is the basis for the invention of codes and ciphers of all kinds, digital and otherwise: the very idea of such codes, I suggest, is parasitic on the notion that natural languages work by encoding recoverable meanings in fixed, determinately identifiable forms. We retroject that idea onto first-order language itself and elaborate metalinguistic codifications of languages, i.e. grammar books and dictionaries. The idea that a language is a digital code is, at bottom, the idea that a language just is a grammar book plus a dictionary.
On the other hand, we find ourselves completely unfazed by the banal ease with which we can call in question, or change our minds about, our criteria for identifying linguistic units – which do indeed, as DR says, ‘evolve’. We don’t really believe in this story after all.

What do I have to do in order to use language to communicate? As a speaker-writer, what I have to do is call on my past linguistic experience so as to maximise the likelihood that you will perceive contextually determined relevant similarities between my utterance or inscription here and now and other utterances and inscriptions you are likely to have come across; and as a hearer-reader I have to do my best to understand you in the light of that same experience. This process doesn’t require either of us to be in possession of the deliverances of some particular codification of the language in use, whether private or public. Most of the time I simply don’t have to worry about whether, when you say [] and I say [], or you say [()] and I say [], the dictionary would say they’re the same words or different words, although these differences might be communicationally salient and significant in some circumstances. In fact I don’t have to identify the linguistic units either of us uses at all, any more than the first humans who achieved semiosis by interpreting some vocalisation as non-iconically meaningful had to identify the linguistic units. They couldn’t have had to, because there weren’t any. To interpret recognising that two things are similar as a matter of recognising one thing of which both are instances is just that: an interpretation. It’s not one that we automatically resort to in other areas of life, nor is it one we need to resort to for communication by means of language.

What is the relevance of all this? First of all, it eliminates the need to treat cognising language as something special or sui generis, for instance à la Chomsky.
The Chomskyan paradigm is based on an essentially rhetorical argument about how quickly a language is ‘acquired’, i.e. how quickly we allegedly master the alleged code (contentiously characterised as consisting at its core of a recursive syntax). It couldn’t be done, so the argument runs, unless we were genetically primed for it. This whole line of inquiry rests on thinking of a language as a code.

Secondly, my story requires that we conceive of language and its use in a way that aligns it with what certain people, whom I’m going to call distributed cognitionists, tell us about the mind and its interaction with the body and the world. On my story, where language is concerned we don’t proceed by selecting from a mental storehouse the digits or combinations of digits that have already been prepackaged for us as encoding what we want to say, or by identifying the same digits in other people’s utterances and referring to the mental storehouse in order to decode them. (It may feel like that to some of us sometimes, or perhaps to most of us most of the time, but that’s because we’ve been educated to think about language that way.) Instead, I suggest, the task is to work out on the hoof what semiotic significance to confer on certain phenomena (vocal noises, marks on paper, etc.) in order to operate relevantly on the world in accordance with the requirements of the unique real-time communication situation we find ourselves in. This involves coming up with adaptive responses to the totality of the context in which the communicative event is taking place.

Finally, what makes languages at least seem like digital signalling systems, and yet turn out not to be when you actually look for the digits? I think the answer lies simply in the fact that linguistic signs, as Saussure insisted, are for the most part arbitrary, and that using linguistic signs is a matter of exploiting our ability to perceive and attach significance to similarities and analogies of all sorts.
The reason you can’t actually find the digits is, as I’ve said, that language is interpretatively terminal. Real digital codes have to be invented and their use explained in some higher-order language, one whose semiotic flexibility is simply incompatible with its being itself any kind of established code. It is only because linguistic signs are radically indeterminate with respect to their identity as units of a prior code that natural languages can meet the open-ended, unpredictable communicational demands that we impose upon them.