Language and the digital code (DRAFT)
Nigel Love
Department of Linguistics, University of Cape Town
Language is bound to be a focus of intense interest in any attempt to understand human
cognition. It’s a truism that the cognitive abilities of Homo sapiens, which outrun
those of other species and are in certain respects unique, are connected in some way with the
fact that Homo sapiens uses language. But exactly what it is that language does for us, and
how it does it, are questions to which there are no agreed answers.
The general issue of ‘language and cognition’ may be at once separated into two questions
or clusters of questions: (i) how we cognise language and (ii) how language facilitates
certain cognitive powers distinctive of human beings. Underlying both, of course, is the
prior question of how language itself is to be conceptualised. That question not only
has priority; I think that trying to answer it may also illuminate the other two.
What does it mean to say, as Don Ross (DR) does, that ‘natural languages are systems for digital
signalling’, or that ‘coding information in a digitally structured way is at least one of the
things that human languages do’? I suspect that there are many otherwise competent
English-users who have at least some difficulty in decoding such statements.
The implied contrast is with analog signalling. If I’m in pain I may grunt and groan. If I’m
in extreme pain I may shout and scream. In the latter case the shouting and screaming
signals the greater intensity of the pain. The greater loudness of the vocalisations iconically
represents, or analogically models, the greater painfulness. But the semantic difference
signalled by the switch from a mere grunt to a piercing scream is a feature of the message
as a whole. By contrast, digital signalling would allow the separation of different semantic
elements: one signal for pain, say, and another for the greater intensity, just as when we
insert the adjective intense in front of the noun pain and say ‘I am in intense pain’. Intense
and pain are two discrete digits, or combinations of discrete digits, of the signalling system
constituted by the English language. That at any rate seems to be the general idea.
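The contrast can be put in a toy sketch. Everything in it is an assumption for illustration only: the decibel scale, the 0.7 threshold and the function names are hypothetical. In the analog case the intensity of the pain shows up only as a property of the whole signal; in the digital case it is carried by a separate, discrete element that can be added or left out.

```python
def analog_cry(pain: float) -> float:
    """Analog: loudness (on a made-up decibel scale) rises continuously
    with pain, so intensity is a feature of the signal as a whole."""
    return 40.0 + 80.0 * pain


def digital_report(pain: float) -> str:
    """Digital: discrete elements, one for the pain and, optionally,
    another for its intensity. The 0.7 cut-off is arbitrary."""
    words = ["pain"]
    if pain > 0.7:
        words.insert(0, "intense")
    return "I am in " + " ".join(words)


for level in (0.2, 0.9):
    print(level, analog_cry(level), digital_report(level))
```

Note that the digital version discards information: every pain level above the threshold gets the same word.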
What does DR want to say about digital codes? Following Dennett, he argues that one of
the functions of such coding is to narrate ‘selves’ into being. A self is a narrative construct
similar to a character in a novel. It is a virtual object spun in public discourse and closely
monitored and tended by the organism that hosts and takes the lead in constructing it.
Early in the narration many possibilities remain open, but as people get older the
possibility space becomes narrower. The reduced space is the self, says Ross; the reduction
is what enables others, along with the subject himself, to predict what the person will do
across a range of situations. Thus people can successfully coordinate their behaviour with
that of others. Describing their behaviour in the terms made available by a natural
language forces people to sort the data into categorical spaces of lower dimensionality than
would be used by a neuroscientist or behavioural scientist who was trying to be objective.
Thus there is a kind of dimensional compression through translation into a digital
representational format. People must interpret and report the data in the terms made
available by the culturally determined ‘report form’ they find in the environment – the local
language. For people, as for no other animals, the most important part of the environment
is a set of virtual artefacts, which are themselves evolved and evolving structures. Humans
force their thoughts to conform to evolved digital categorisation spaces by continuously
narrating accounts of their behaviour and interpreted mental processes, both ongoing and
in retrospect. This digitality is the essential property of language, says Ross, that makes
humans ‘ecologically special’.
I’m not sure DR is actually talking about ‘selves’ as usually understood; ‘adult character’ might be nearer the mark.
Broadly speaking, this amounts to saying that you can only say what there are words for.
And presumably that goes not just for narrating selves (or adult characters) into being, but
for all uses of language whatever.
So what emerges here, extracted from DR’s particular context, is a very familiar story
about language. People can communicate linguistically in so far as there is an established
public plan for doing so. And the same question arises about all such stories: how does the
public plan come into being? Presumably it is not supposed to have existed prior to any
use of language at all. There must have been pre-plan language use, out of which and as a
result of which the plan somehow emerged. But it’s hard to get into focus what, according
to plan theorists, this ‘emergence’ is supposed to amount to.
Now I’m not sure what DR means when he talks of ‘virtual’ artefacts in this context – that
is, I’m not sure what categorisation space he wants ‘virtual’ to occupy. I actually think it’s
an excellent word in this context; it’s just that maybe the sense I want to use it in is not
DR’s. Anyway, the virtuality of the linguistic plan, in my sense, is precisely what makes both
its role in the human environment and its place in any naturalistic account of language
contentious and tricky in the extreme.
All this I shall be going into. But let’s begin at the beginning. Why are the units of a digital
signalling system called digits? Let’s look at some digits.
___________________________________
0123456789
___________________________________
The set of Arabic numerals. What about them? At bottom they are a familiar example of
what semiologists sometimes call an emblematic frame. An emblematic frame is a closed,
finite set of symbols with certain ready-made relationships among the members, but which
have no intrinsic semiotic content. The most familiar examples in our culture are sets of
items used in playing games. Take dominoes. A set of dominoes is an emblematic frame.
The only given relationships among them are precedence and equivalence. Taking your
turn at dominoes, in the usual game played with them, requires playing a piece one end of
which has an equivalence relationship with the available end of a piece already played. The
equivalence is usually symbolised as a matching number of pips. There is also a
precedence relation – the higher the precedence of the pieces left in your hand at the end
of the game, the worse off you are. But the emblematic frame in question isn’t intrinsically
wedded to playing that particular game, or to playing games at all. A slightly more complex
example would be a pack of playing cards. Again, the cards themselves ‘mean’ nothing:
they simply offer a system of abstract relationships, again of precedence and equivalence,
on which you can impose any significance you like. The standard pack of 52 cards in four
suits could obviously be pressed into service for calendrical purposes, for instance, in a
society that attached special importance to marking the weeks and seasons of the year. (So
this week, in the northern hemisphere, might be the nine of diamonds in a playing-card
calendar.) In fact, apart from divination, we mostly use them for playing games. But note
the huge variety of kinds of card games. Is there any structural similarity between snap and
bridge, or patience and whist? They’re all played with the cards, but the system of
emblems is being exploited in very different ways.
The Arabic figures are an emblematic frame in this sense; in fact they form a sub-part of
the playing card frame. In this case the only given relationship among the members is
precedence, or priority. There is an established numerical order, as given. But the
significance of the ordering is not fixed. In particular, the figures are not necessarily used
for counting. A group of people each pick blindly from a set of tokens numbered 0 to 9.
Highest goes first, wins the prize or whatever. All that matters is, precisely, that the
symbols have an established order of precedence. Nothing is being counted here. Very
simple emblematic frames, like this one, when pressed into service for purposes of writing,
constitute notations.
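The draw just described can be sketched in a few lines. The names and the `play_order` function are hypothetical; the point of the sketch is that a token's rank is simply its position in the given order, and nothing is ever converted to a number or counted.

```python
DIGITS = "0123456789"  # the given order of precedence, and nothing more


def play_order(draws: dict[str, str], order: str = DIGITS) -> list[str]:
    """Rank players by the token each drew, highest first.

    A token's rank is just its position in `order`; no token is
    converted to a number, added or counted.
    """
    return sorted(draws, key=lambda name: order.index(draws[name]), reverse=True)


draws = {"Ann": "3", "Ben": "9", "Cat": "0"}
print(play_order(draws))  # ['Ben', 'Ann', 'Cat']

# The same procedure works unchanged with any other ordered frame, e.g. letters.
print(play_order({"Ann": "c", "Ben": "a"}, order="abcdefghij"))  # ['Ann', 'Ben']
```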
Using the figures for counting, by treating them as representing numbers, is one of the
most frequent uses we have for them, and it may be that historically their more
semiologically primitive functions are in fact derived from this. But as
symbols, as digits, as a notation system, they have no necessary connection with
mathematics, and using them for mathematical purposes requires a set of rules. That is to
say, the notation has to be used in a particular script. For instance, although the established
numerical order may determine the arithmetic value to be attached to the ten symbols
themselves, if they’re to be used in mathematical writing for representing numbers beyond
nine, conventions have to be established:
___________________________________________________
13 + 6 = 73
102 - 56 = 631
___________________________________________________
In my script, when you get beyond nine you write units, tens and hundreds from left to
right rather than right to left. The point is, whatever the actual rules may be, there have to
be rules of that kind if the notation is to be used as a script.
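The two equations in my script can be checked mechanically. In the sketch below (function names hypothetical) the arithmetic is the same under both writing conventions; only the rule for mapping between strings of figures and numbers changes. It assumes non-negative whole numbers and results.

```python
DIGITS = "0123456789"  # precedence fixes each figure's value from 0 to 9


def read_numeral(s: str, units_first: bool) -> int:
    """Value of a string of figures under a given place-value convention."""
    digits = s if units_first else s[::-1]  # arrange so units come first
    return sum(DIGITS.index(ch) * 10 ** i for i, ch in enumerate(digits))


def write_numeral(n: int, units_first: bool) -> str:
    """Write a non-negative integer under the same convention."""
    if n < 0:
        raise ValueError("this sketch handles non-negative results only")
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, d = divmod(n, 10)
        out.append(DIGITS[d])  # collected units first
    s = "".join(out)
    return s if units_first else s[::-1]


def calc(a: str, op: str, b: str, units_first: bool) -> str:
    """Read both operands, do the arithmetic, write the result back."""
    x, y = read_numeral(a, units_first), read_numeral(b, units_first)
    if op == "+":
        result = x + y
    elif op == "-":
        result = x - y
    else:
        raise ValueError("only + and - are supported")
    return write_numeral(result, units_first)


print(calc("13", "+", "6", units_first=True))    # 73  (i.e. 31 + 6 = 37)
print(calc("102", "-", "56", units_first=True))  # 631 (i.e. 201 - 65 = 136)
print(calc("13", "+", "6", units_first=False))   # 19  (the usual script)
```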
I take the use of the Arabic numeral notational system as a script for mathematical writing
to exemplify the basic idea of a digital code. The set of digits becomes a code if and when
there are rules for its use in systematically encoding semiotic values. The question is
whether there is any parallel or analogue in language.
The obvious candidate parallel is alphabetic writing:
________________________________________________
ABCDEFGHIJKLMNOPQRSTUVWXYZ
________________________________________________
Once again we have a notational system, this time with 26 members, in this particular
version of the Roman alphabet, with standard precedence relations as given, reading from
left to right. This system shares some of its more primitive uses with the Arabic numeral
system, and in those uses is indeed interchangeable with it. (Hence the term
‘alphanumeric’.) So you can “number” the points in your argument 1, 2, 3, 4, or if you
prefer, a, b, c, d – it makes no difference. In fact, letters and figures are combined in
some systems of vehicle registration “numbers” and telephone “numbers”. Note that in
this kind of use even the basic precedence relationships are
nullified – a telephone exchange may historically have dished out its numbers in numerical
order, and at one time, if no longer, you could tell roughly how old the British civil aircraft
you were flying in was by how close to the beginning of the alphabet its registration letters
were – but in cases like this all that matters is that your “number” be a combination of
digits different from everyone else’s. So from the semiologist’s point of view the alphabet
is no more intrinsically wedded to representing language than the Arabic figures are to
representing numbers.
The question is whether, when the alphabet is used in connection with language, there is
any parallel with the way figures are used in mathematical writing. In other words, is the
alphabet, as a linguistic script, capable of use as a digital code?
Some writers on the subject seem to think that the mere fact that it consists of a set of
digits automatically makes it a digital code. Thus the literary theorist Florian Cramer says
that whereas ‘sounds and images are not code by themselves, but have to be turned into
code in order to be computed, any written text already is code’. But this seems to mean
just that written text consists of alphanumeric characters. Doubtless that makes them easy
to encode, for computing purposes: you assign the characters values in the particular, e.g.
binary, digital code being used by the computer, e.g. the ASCII code. You can call a set of
such characters ‘code’ if you like, in which case there is an important difference between
‘code’ and ‘a code’, which is what we’re interested in.
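What ‘code’ amounts to in Cramer’s sense can be shown in a few lines. The sketch assumes plain ASCII text; it maps each character to a number and writes that number in binary. Nothing in the result identifies a word, still less a language: the bytes for CHAIR are the same whether it is read as English or as French.

```python
def ascii_codes(text: str) -> list[int]:
    """Each character's ASCII code; non-ASCII characters raise UnicodeEncodeError."""
    return list(text.encode("ascii"))


def as_binary(text: str) -> list[str]:
    """The same codes written as 8-bit binary strings."""
    return [format(code, "08b") for code in ascii_codes(text)]


print(ascii_codes("CHAIR"))  # [67, 72, 65, 73, 82]
print(as_binary("CHAIR"))
```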
What might the alphabet encode, where language is concerned? One possible, indeed
traditional, answer is: speech sounds. The reason this answer seems tempting is that all
known uses of the alphabet in connection with language are at least vaguely phonographic
in tendency. That is, there is, to different degrees, a discernible correlation between letters
and sounds. This comes across strikingly in cases where the same symbols of the alphabetic
notation happen to be used for different purposes in different scripts:
______________________________________________________________________
ALTER [German ‘age, epoch’]
CHAIR [French ‘flesh’]
SAIL [Welsh ‘foundation, basis’]
TRUTH [Welsh ‘falsehood’]
______________________________________________________________________
Alternatively you can read them as English words.
Fortuitous translingual homographies demonstrate very clearly the difference between a
notation and a script. Notationally there are four items, but when you take account of
scripts there are eight. But the point I want to make right now is that these completely
unconnected words in different languages, although they have nothing in common
semantically, nonetheless show a certain phonetic similarity. Which is quite remarkable,
given that the alphabet is, after all, just a notation.
In fact as part of turning the alphabetic notation into a script some languages impose an
extra layer of structure on the notation itself in order to facilitate the correlation with
sounds.
______________________________________________________________________
A B C CH D DD E F FF G NG H I L LL M N O P PH R RH S T TH U W Y
______________________________________________________________________
This is the Welsh version of the Roman alphabet. No J, K, Q, V, X or Z, but nonetheless
28 “letters”, because consonantal digraphs count as separate units of the script. They have
their own names and their own place in alphabetical order, which can make using a Welsh
dictionary problematic. It’s as if we thought of, say, ‘shop’ as a three-letter word beginning
with ‘esh’.
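The practical consequence for dictionary order can be sketched as follows. This is a simplified illustration, not a full Welsh collation: it splits words greedily into the 28 letters listed above, which goes wrong in some words (in Bangor, for instance, n and g are separate letters), and it assumes the input uses only those letters.

```python
WELSH_ALPHABET = ["a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng",
                  "h", "i", "l", "ll", "m", "n", "o", "p", "ph", "r", "rh",
                  "s", "t", "th", "u", "w", "y"]
RANK = {letter: i for i, letter in enumerate(WELSH_ALPHABET)}
DIGRAPHS = {letter for letter in WELSH_ALPHABET if len(letter) == 2}


def welsh_letters(word: str) -> list[str]:
    """Split a word into Welsh letters, greedily preferring digraphs.
    Simplification: no spelling-based rule can tell when 'ng' is n + g."""
    word = word.lower()
    letters, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            letters.append(word[i:i + 2])
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


def welsh_sort_key(word: str) -> list[int]:
    """Each letter becomes its position in the 28-letter alphabet."""
    return [RANK[letter] for letter in welsh_letters(word)]


words = ["llan", "lol", "chwarae", "cyw", "ddoe", "dyn"]
print(sorted(words))                      # ['chwarae', 'cyw', 'ddoe', 'dyn', 'llan', 'lol']
print(sorted(words, key=welsh_sort_key))  # ['cyw', 'chwarae', 'dyn', 'ddoe', 'lol', 'llan']
```

Plain character-by-character sorting puts chwarae before cyw; treating ch as a letter that follows c reverses them, which is exactly what trips up a reader used to English dictionaries.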
So the connection with phonography is there. That is why large stretches of broad
transcription in a phonetic alphabet based on attaching to the Roman letters Standard
Average European sound values are perfectly readable by an SAE speaker who has never
learned to use the phonetic alphabet as such:
______________________________________________________________________
mat [mat], sat [sat], hat [hat], bat [bat]…
______________________________________________________________________
But it’s obvious that the phonographic tendency is just that – a sort of partial and sporadic
keeping in touch with the idea of representing sounds. One reason that’s all it is is that
speech neither consists of nor is determinately analysable as a series of discrete sound
segments – a problem that phonetic alphabets, specifically and overtly intended to
represent sounds, have always come up against, let alone ordinary ones. In fact,
although historically the linguistic use of the alphabet may have had its origins in some sort of
analysis of the phonetic structure of syllables, that is conspicuously not a feature of how it
actually works now, for any language you care to mention. English is an especially
notorious instance.
OK, so what does alphabetic writing represent, if anything, and how?
______________
mdroen raedres
______________
Very likely you can’t make much of that. But try this:
______________________________________________________________________
Reprsntng sepech suodns is evdntly not how aplahbtiec wirtnig wroks pyscohlgcly, at lesat
for mdroen raedres
______________________________________________________________________
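Text of the second kind can be generated mechanically. The sketch below is one assumed way of doing it, not how the sample was made: it keeps each word’s first and last letters and shuffles the rest (the sample also drops some vowels, which this does not attempt). A fixed seed makes the output repeatable.

```python
import random
import re


def scramble_word(word: str, rng: random.Random) -> str:
    """Keep the first and last letters; shuffle the letters in between."""
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text: str, seed: int = 0) -> str:
    """Scramble every alphabetic run in `text`, leaving spaces and punctuation alone."""
    rng = random.Random(seed)
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)


print(scramble_text("Representing speech sounds is evidently not how "
                    "alphabetic writing works psychologically", seed=1))
```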
DR has a term for people like me who find problematic the idea that languages are digital
signalling systems. He calls us ‘processing holists’. But I’d say that if something like this
can be processed at all, ‘holistically’ is a pretty good word for how it’s being done. Your sense
of what’s going on there is a function of seeing the whole lot together, i.e. holistically.
And surely right there we see an important disanalogy with genuinely digital signalling.
The slightest typo when keying in an email address and the stupid machine sends the
message straight back to you.
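The disanalogy can be made concrete. In the sketch below the addresses and the mailbox table are hypothetical: the ‘digital’ lookup either finds an exact match or nothing, while a similarity score from Python’s standard difflib module registers that a one-transposition typo is nearly the same string. The score is only a stand-in for graded similarity, not a model of how readers process text.

```python
from difflib import SequenceMatcher

mailboxes = {"someone@example.org": "inbox #1"}  # hypothetical address book

intended = "someone@example.org"
typed = "soemone@example.org"  # two letters transposed

# Digital lookup: an exact match or nothing.
print(mailboxes.get(typed))  # None

# Graded similarity: 1.0 would mean identical strings.
similarity = SequenceMatcher(None, intended, typed).ratio()
print(round(similarity, 2))
```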
If anything can properly be said to be “represented” here, it is something more abstract or
high-level than how the inscription would sound as a spoken utterance. So we can safely
conclude that if a language is a digital signalling system, the digits we are looking for are
not to be equated with the individual letters of alphabetic writing.
With what, then? Perhaps with combinations of letters as used to encode words or other
individually meaningful units. But this too looks unpromising. Certainly the alphabet, even
as a script, does not automatically lend itself to the systematic encoding of words, as this
example shows:
______________________________________________________________________
In any csae, rprsntng sepech suodns is evdntly not how aplahbtiec wirtnig wroks
pyscohlgcly, at laset for mdroen raedres
______________________________________________________________________
We can certainly set up standard spelling systems if we like, although they are by no means
universal, and we can occasionally get a standard spelling system to make useful word-level
distinctions, as with
_______
rite
write
right
wright
_______
But just as often we don’t bother, as with
________________
port (not starboard)
port (not sherry)
port (not harbour)
________________
The fact that we don’t always bother must surely tell us something about how the writing
system is actually working.
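What the spelling does and does not settle can be put as a lookup keyed on spelling alone. The toy lexicon below is hypothetical: the glosses for port are taken from the text, the others are supplied for illustration, and the many other senses of right are deliberately left out. For rite, write, right and wright the key picks out a single entry; port returns three candidates, and the spelling leaves the reader to choose among them.

```python
# Hypothetical toy lexicon keyed on spelling alone.
LEXICON = {
    "rite": ["ceremony"],
    "write": ["put into writing"],
    "right": ["correct"],  # other senses left out for simplicity
    "wright": ["maker, craftsman"],
    "port": ["not starboard", "not sherry", "not harbour"],
}


def candidates(spelling: str) -> list[str]:
    """All entries a spelling-keyed lookup returns; [] if none."""
    return LEXICON.get(spelling.lower(), [])


for spelling in ("wright", "port"):
    print(spelling, candidates(spelling))
```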
And in any case we can do perfectly well without using any such standard system, even if
there is one:
______________________________________________________________________
Cristes msse
cristesmesse
christ-masse
kryst-masse
cristemes
cristemasse
cristmes
cristmas
crysmas
cristimas
Christmasse
Christmass
______________________________________________________________________
Those are all historically attested, and a lot of them, and others you could produce, are still
in common use, especially among children sending out home-made Xmas cards.
Doubtless we all know, or are capable of coming to know, that these are variant spellings
of the same word – or words. The question is how we know that, if the system for
signalling words is supposed to be digital. The fact is, writing these out required my
computer to generate different digital encodings for each one. That’s how it keeps them
apart. Anyone who wants to say that they all in some sense encode ‘the same thing’ has a
problem identifying the level at which the sameness gets encoded.
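The point about encodings is easy to check. The sketch below takes the twelve variants listed above (the first in its Old English form, Cristes mæsse), encodes each as UTF-8, and then applies a hypothetical normalisation that ignores case, spaces and hyphens. Every variant gets its own byte sequence, and the normalisation merges only one pair (christ-masse and Christmasse); it does not arrive at a single encoding for ‘the same word’.

```python
VARIANTS = [
    "Cristes mæsse", "cristesmesse", "christ-masse", "kryst-masse",
    "cristemes", "cristemasse", "cristmes", "cristmas", "crysmas",
    "cristimas", "Christmasse", "Christmass",
]

# Each spelling is stored as its own sequence of bytes.
encodings = {v.encode("utf-8") for v in VARIANTS}
print(len(encodings))  # 12


def normalise(spelling: str) -> str:
    """Hypothetical normalisation: lowercase, keep letters only."""
    return "".join(ch for ch in spelling.lower() if ch.isalpha())


print(len({normalise(v) for v in VARIANTS}))  # 11
```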
So what do we do? There seems to be no higher level of abstraction to which we can go if
we want to vindicate the idea that the sameness underlying the variants is itself somehow
signalled digitally. Of course there is no difficulty in finding an ad hoc graphic representation
of what we are looking for here – for instance, we can represent the English word of
which all the above are variant spellings like this:
______________________________________________________________________
CHRISTMAS
Cristes msse
cristesmesse
christ-masse
kryst-masse
cristemes
cristemasse
cristmes
cristmas
crysmas
cristimas
Christmasse
Christmass
Christmas
______________________________________________________________________
Take the current standard spelling and put it in authoritative Gill Sans caps to show that this
is a metarepresentation of ‘the word itself’ and not just another variant.
The trouble is that, do what you like with the typography, it is just another variant. (Down
there at the bottom.)
The writing conventions of English, like those of all other languages, offer no
superordinate system of metarepresentations consistently establishable as such. One
reason for that is, no doubt, that we’ve never felt the need for any such thing. Nor are we
any better off – in fact we’re worse off – if we turn to spoken English. Here of course we
find indefinitely many variant signals associated with all or any one of these written forms.
What am I getting at here? The problem is that these written forms, considered as
representations, are indeterminate as to the scope of what they represent. Take any one of them and
ask: does that somehow intrinsically stand for the word itself, or merely instances of that
particular version of it? If I took one of the ‘incorrect’ ones at random, e.g.
____________________________________________________
Christmass
____________________________________________________
and asked you to ‘copy that down’, what would you do? I think you’re as likely as not
automatically to ‘correct’ the spelling. In other words, the level of abstraction at which you
are supposed to take that is not somehow there to be read off its face.
So, if languages encode information in a digitally structured way, where are the digits?
Now of course, given a certain breadth of brush, or a certain level of generality, and
especially if the only alternatives on offer are that linguistic signalling is either analog, or
else digital, with no third possibility allowed – then clearly it has to go down as digital, or at
least digital-like. But I think resting content with that sort of rough and ready
characterisation may be positively misleading. There may be some value in teasing out
ways in which, and reasons why, languages are not digital signalling systems, and the
implications of that for understanding them.
There’s a current academic fashion for writing articles with the title ‘What’s special about
language?’ – linguists, philosophers, cognitive scientists are all at it. Here’s my take on it.
What’s special about language – and what we are up against with the Christmass problem –
is that language is interpretatively terminal. There is nothing that stands to language in the
relation that language stands in to everything else. Conceptual difficulties arise because,
despite that fact, we are prey to the notion that language itself can be made to stand to
language in the relation that language stands in to everything else.
Language can be used to investigate and talk about anything under the sun, not least the
sun itself. But what about language itself? A particular kind of use of language to talk
about language – that is, a particular way of exploiting the reflexivity of language, we call
‘linguistics’. And the fact that linguistics is language about language has often made
linguists uneasy. For instance, the linguist J. R. Firth remarks that ‘the reflexive character
of linguistics, in which language is turned back on itself, is one of our major problems’. But
Firth never attempted to solve the problem, or even to state precisely what he took it to
be.
The problem is that in turning the medium of inquiry back on itself we make it an object of
inquiry, and to envisage treating linguistic phenomena as objects is, in and of itself, to
propose a distorted account of them. There are no (first-order) linguistic objects of any
kind. Language, I want to suggest, is a temporally situated, ongoing process – the process of
making and remaking signs in contextualised episodes of communicative behaviour. And
if that is accepted then, apart from the provision of anecdotal accounts of the specifics of
particular communicative episodes, one might be inclined to conclude that pointing this
much out is where linguistics ought to stop. Whatever it might be to go further, if going
further implies engaging in the kind of retrospective talk about linguistic signs that requires
that they be identified in abstracto, it is not and cannot be a matter of reporting on
objectively given first-order realities. In a sense, therefore, linguistics is logically
impossible. Identifying a sign involves decontextualising the unique communicative
episode within which and for purposes of which the sign was created, abstracting and
reifying some aspect of that episode, and presenting the reification for inspection and
analysis as ‘the sign’ in question. But whatever is thus presented cannot be the sign in
question. For the sign has no existence outside its unique communicative episode.
Analytic discourse about language – which involves identifying and citing the linguistic
units recurrently signalled in the digital code – requires decontextualisation, abstraction and
reification (DA&R). The ultimate foundations for linguistic DA&R lie in the utterly familiar,
everyday metalinguistic act of repetition. What is said or written can be repeated. Someone
asks me ‘Did you say bat?’ My answer is: ‘No, I said hat’. But when I say ‘No, I said hat’, I
am not somehow identifying the abstract unit of the English vocabulary of which my first
utterance was a particular instance. I am simply repeating what I said – i.e. producing
another utterance. The repetition will not of course be an exact replica. It won’t
necessarily be anything objectively like the original. It will merely be similar to the first in
whatever dimensions of similarity I think contextually relevant for usefully answering my questioner’s
question. What those dimensions are will vary from occasion to occasion. Compare these
exchanges:
______________________________________________________________________
Did you say []? No, I said [].
Did you say []? Yes, I said [].
______________________________________________________________________
There is no one, universal, context-neutral dimension of relevant similarity. This is the point DR
misses. According to Ross, to say that language or a language is a code is to say that
‘similar public linguistic representations cue similar behavioral responses in individuals with
similar learning histories, as a result of conventional associations established by those
similar histories’. As far as it goes, that’s fair enough. But what counts as ‘similar’ to what
is a matter to be decided on the spot in the light of the particular communication situation.
Knowing no Greek I once managed to find my way out of a maze-like public building in
Athens when it dawned on me that if I transliterated a certain sign on the wall I got
something very like the English word ‘exodus’. I was working with similarities all right, but
presumably not the ones the Greek-speakers responsible for putting the sign there had in
mind. That little incident seems to me to epitomise how communication by means of
language actually works.
For certain metalinguistic purposes we do indeed entertain the idea of a context-neutral
dimension of relevant similarity, which can be generalised and projected as the basis for a
thoroughgoing, self-consistent reification of a whole language, giving us a determinate
identification of the morphemes, words and higher-level units that constitute the linguistic
system. How and why that idea arises, in literate societies, I went into in my contribution
to our predecessor conference in Durban in 2003 (Love 2004), and I don’t want to go into all
that again now. The core of the argument is that writing is indeed, as DR says, a milestone
in human cultural evolution, but not for the banal reason usually offered, that it makes
messages durable and portable. The reason it’s a milestone is that it allows for
systematically reconfiguring our conception of language itself, in such a way as to give
substance to the idea that utterances are utterances of something more abstract. What is
curious about this illusion is that its hold on us is simultaneously very powerful and yet
surprisingly feeble. On the one hand we take it entirely for granted that there is a
determinate analysis of our utterances and inscriptions in terms of the words, sentences
and so on, that they instantiate. That of course is the basis for the invention of codes and
ciphers of all kinds, digital and otherwise – the very idea of such codes, I suggest, is
parasitic on the notion that natural languages work by encoding recoverable meanings in
fixed, determinately identifiable forms. We retroject that idea on to first-order language
itself and elaborate metalinguistic codifications of languages – i.e. grammar books and
dictionaries. The idea that a language is a digital code is, at bottom, the idea that a
language just is a grammar book plus a dictionary. On the other hand, we find
ourselves completely unfazed by the banal ease with which we can call in question or
change our minds about our criteria for identifying linguistic units – which do indeed, as
DR says, ‘evolve’. We don’t really believe in this story after all.
What do I have to do in order to use language to communicate? As a speaker-writer what I
have to do is call on my past linguistic experience so as to maximise the likelihood that you
will perceive contextually determined relevant similarities between my utterance or
inscription here and now, and other utterances and inscriptions you are likely to have come
across, and as hearer-reader I have to do my best to understand you in the light of that
same experience. This process doesn’t require either of us to be in possession of the deliverances of some
particular codification of the language in use, whether private or public.
Most of the time I simply don’t have to worry about whether when you say []
and I say [] or you say [()] and I say [] the
dictionary would say they’re the same words or different words, although these differences
might be communicationally salient and significant in some circumstances. In fact I don’t
have to identify the linguistic units either of us uses at all, any more than the first humans who
achieved semiosis by interpreting some vocalisation as non-iconically meaningful had to
identify the linguistic units. They couldn’t have had to, because there weren’t any. To
interpret recognising that two things are similar as a matter of recognising one thing of
which both are instances is just that – an interpretation. It’s not one that we automatically
resort to in other areas of life, nor is it one we need to resort to for communication by
means of language.
What is the relevance of all this? First of all, it eliminates the need to treat cognising
language as something special or sui generis, for instance à la Chomsky. The Chomskyan
paradigm is based on an essentially rhetorical argument about how quickly a language is
‘acquired’, i.e. on how quickly we allegedly master the alleged code (contentiously
characterised as consisting at its core of a recursive syntax). It couldn’t be done, so the
argument runs, unless we were genetically primed for it. This whole line of inquiry rests on
thinking of a language as a code.
Secondly, my story requires that we conceive of language and its use in a way that aligns it
with what certain people I’m going to refer to as distributed cognitionists tell us about the
mind and its interaction with the body and the world. On my story we don’t proceed
where language is concerned by selecting from a mental storehouse the digits or
combinations of digits that have already been prepackaged for us as encoding what we
want to say, or by identifying the same digits in other people’s utterances and referring to
the mental storehouse in order to decode them. (It may feel like that to some of us,
sometimes, or perhaps most of us most of the time, but that’s because we’ve been educated
to think about language that way.) Instead, I suggest, the task is to work out on the hoof
what semiotic significance to confer on certain phenomena (vocal noises, marks on paper,
etc.) in order to operate relevantly on the world in accordance with the requirements of the
unique real-time communication situation we find ourselves in. This involves coming up
with adaptive responses to the totality of the context in which the communicative event is
taking place.
Finally, what makes languages at least seem like digital signalling systems, and yet turn out
not to be when you actually look for the digits?
I think the answer lies simply in the fact that linguistic signs, as Saussure insisted, are for
the most part arbitrary, and that using linguistic signs is a matter of exploiting our ability to
perceive and attach significance to similarities and analogies of all sorts.
The reason you can’t actually find the digits is, as I’ve said, that language is interpretatively
terminal. Real digital codes have to be invented and their use explained – in some
higher-order language that has a semiotic flexibility that’s simply incompatible with its
being itself any kind of established code. It is only because linguistic signs are radically
indeterminate with respect to their identity as units of a prior code that natural languages
can meet the open-ended, unpredictable communicational demands that we impose upon
them.