Explaining Colour Term Typology

Basic colour terms are colour words which:
- are highly salient
- have extensions that aren't included within those of any other colour term

So in English the basic colour terms are red, orange, yellow, green, blue, purple, pink, brown, grey, black and white. There are many more colour words that aren't basic, e.g. crimson, navy, chartreuse, magenta and turquoise.

What needs explaining?
- Prototype properties.
- The range of colours denoted by each colour word varies between languages.
- So does the number of basic colour terms, from 2 to 11 (or maybe 12).
- But there are cross-linguistic regularities in colour term systems.

Prototype Categories

Colour words are prototype categories. Some colours are better examples of the word red than others are. We can say:
'a good red'
'sort of red'
'slightly red'

There is usually a single colour which is the best example of the colour word (the prototype). The further a colour is from the prototype, the less good it is as an example of the colour category.

Colour categories have fuzzy boundaries. It's not clear exactly which colours are members of the category. Some colours are marginal members.

Colour Term Typology

Berlin and Kay (1969) found clear cross-linguistic regularities in colour term systems. They proposed that languages evolved from having only 2 basic colour terms, and gradually added more over time until they reached a ceiling of 11 basic terms. (Languages never lose basic colour terms.)

The prototypes of basic terms from all languages fall into discrete clusters. People are very consistent in their choice of prototype (but not in where they place boundary colours).

Order in which basic colour terms are added:
white, black < red < green or yellow < green and yellow < blue < brown < purple, pink, orange, grey

Kay and Maffi (1999)
- Their results are based on the World Colour Survey, which has collected data from 110 languages.
- There are six foci (black, white, red, yellow, green and blue), which are the prototypes of most colour terms in most languages.
- Languages almost always partition the colour space so that each colour will be named by one basic colour term. Exceptions to this are rare but do exist.

Colour term systems in the proposed evolutionary order (hyphens join foci covered by a single composite term):
white-red-yellow + black-green-blue
white + red-yellow + black-green-blue
white + red-yellow + black + green-blue
white + red + yellow + black + green-blue
white + red + yellow + black + green + blue
white + red + yellow + black-green-blue
white + red + yellow + green + black-blue
white + red + yellow-green-blue + black
white + red + yellow-green + blue + black

- But there are exceptions to this evolutionary order.

MacLaury (1997)
- In languages with a composite green-blue term, the focus is sometimes in green, sometimes in blue, and sometimes in either green or blue.
- It's common for purple, pink, orange and brown to emerge before the green-blue term has split into green and blue. Sometimes this even happens before yellow-red splits.
- The order of emergence of derived categories tends to be purple, brown, pink, grey then orange, but there's lots of variation in this order.
- Co-extension: sometimes two colour terms overlap, covering more or less the same range of colours, but with foci in different places.
- Different speakers of the same language can be at different evolutionary stages, and in some languages most speakers don't fit into any evolutionary classification.
- Some colour systems are based mainly on lightness, while most are based more on hue.

Explaining Language Typology

What causes this kind of typological pattern?
- Individual psychology.

Chomsky's Language Acquisition Device:
Primary Linguistic Data -> Language Acquisition Device -> Individual's Knowledge of Language

- Or an interaction of psychological and social processes.
Hurford's Diachronic Spiral:
Primary Linguistic Data -> Language Acquisition Device -> Individual's Knowledge of Language -> Arena of Language Use -> Primary Linguistic Data -> ...

Typological patterns may be the result of evolutionary biases over a number of generations, not simply a restriction on the kinds of language which are learnable.

Explanations of Colour Term Typology

Is colour term typology due to innate learning biases/UG which restrict the kinds of colour term systems that can be learned?

Or are there just evolutionary pressures which tend to favour the development of some kinds of colour categories over a number of generations?

How to test this:
(1) Create a computational model of colour term acquisition.
- Can this model learn attested colour term systems but not unattested ones?
(2) Create multiple copies of the model.
- Each copy will be an artificial person.
- They can talk to each other over several generations, and learn from one another. (Every so often an older person will die and be replaced by a new person who doesn't know any language.)
- Do the languages which emerge in these simulations have the same properties as real languages?

Computational Evolutionary Linguistics: Expression-Induction Models

- These models should be distinguished from models of biological evolution.
- They simulate historical/diachronic/cultural/glossogenetic change.

There have been several such models of syntax. Kirby (1999):
- An individual tries to express a set of meanings. If they have constructions in their language that allow a meaning to be expressed then they use them. Otherwise they just make up a new language form.
- A new learner then takes this set of utterances and tries to learn the underlying language.
- The learner then becomes the speaker, and tries to express a new set of meanings to a new learner.

This process is repeated over many generations, and after a time a compositional syntactic system emerges.
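The speaker-to-learner loop described above can be sketched in a few lines. This is only an illustration of the transmission loop, not Kirby's model: the learner here simply memorises the meaning-form pairs it hears, so no compositional syntax can emerge. All names (invent_form, speak, learn, iterate) and all parameter values are assumptions made for the sketch.

```python
import random

def invent_form(rng, length=4):
    # Make up a completely new form (a random string of letters).
    return "".join(rng.choice("abcdefgh") for _ in range(length))

def speak(lexicon, meanings, rng):
    # Express each meaning, inventing a form when the speaker has none.
    utterances = []
    for meaning in meanings:
        if meaning not in lexicon:
            lexicon[meaning] = invent_form(rng)
        utterances.append((meaning, lexicon[meaning]))
    return utterances

def learn(utterances):
    # Assumed stand-in for induction: rote memorisation of what was heard.
    return dict(utterances)

def iterate(generations=20, n_meanings=10, sample_size=6, seed=0):
    rng = random.Random(seed)
    meanings = list(range(n_meanings))
    lexicon = {}  # the first speaker knows no language
    for _ in range(generations):
        sample = rng.sample(meanings, sample_size)    # meanings to express
        lexicon = learn(speak(lexicon, sample, rng))  # learner becomes speaker
    return lexicon

final_lexicon = iterate()
```

Because each learner only hears a sample of the meanings, forms for unheard meanings are lost and reinvented. A learner that could generalise from the sample would be needed before any structure could survive this bottleneck.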
Kirby used phrase structure rules and logical predicates and symbols to represent language, but Batali (1998) produced similar results using neural nets.

Tony Belpaeme (2002): Factors influencing the origins of colour categories

A previous expression-induction model of colour term evolution.

A population of agents was created, each of which could learn colour categories using an adaptive network (a type of neural network).

The agents try to communicate by attaching words to the colour categories. They try to use a word that discriminates a target colour from its context.
- If communication is successful, then the hearer will strengthen the association between the target colour and the colour word used.
- If communication is unsuccessful, then the speaker will tell the hearer the correct colour.

Periodically an agent will die and be replaced by a new agent, so as to simulate evolution over several generations.
- Over time a coherent language evolves.
- But the languages don't conform to typological restrictions.

A Bayesian Model of Colour Term Acquisition

The model learns using Bayesian inference.
- Bayesian inference is a statistical procedure based on Bayes' rule, which was proved by Bayes (1763).
- Though Bayesian inference was developed much more recently.

Bayesian inference allows us to calculate the probability of a hypothesis given some relevant data.
- But we can only do this if we know:
(1) how likely each hypothesis is a priori;
(2) how likely we would have been to observe the data if the hypothesis is correct.
- If we know both (1) and (2), we can calculate exactly how probable each hypothesis is using Bayes' rule:

$$P(h \mid d) = \frac{P(h)\, P(d \mid h)}{P(d)}$$

General consequences:
- Other things being equal, hypotheses with higher a priori probabilities also have higher a posteriori probabilities.
- Hypotheses which accurately predict the data are more likely than those which don't.
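As a small worked example of Bayes' rule, the sketch below computes P(h | d) for two competing hypotheses. The hypothesis names and the probability values are made up for illustration. P(d) is obtained by summing P(h) P(d | h) over all hypotheses.

```python
def posterior(priors, likelihoods):
    # Numerator of Bayes' rule for each hypothesis: P(h) * P(d | h).
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    # P(d) is the sum of the numerators over all hypotheses.
    p_data = sum(joint.values())
    return {h: value / p_data for h, value in joint.items()}

# Two equally likely hypotheses; h1 predicts the data better than h2.
priors = {"h1": 0.5, "h2": 0.5}
likelihoods = {"h1": 0.8, "h2": 0.2}
post = posterior(priors, likelihoods)
print(post)  # h1 is about 0.8, h2 is about 0.2
```

With equal priors the posterior is decided entirely by how well each hypothesis predicts the data, which is the second general consequence listed above.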
Using Bayesian Inference as a Psychological Model

In this case the a priori probabilities assigned to hypotheses will correspond to a person's belief in how likely each possibility is before they begin learning.

Data can be anything people observe from which they could learn.

But why is it likely that people learn colour words with Bayesian inference?
- It's arguably the optimal way to learn.
- Bayesian learning replicates much of the empirical evidence concerning colour terms.
- There's already some evidence from other computational models to suggest that people are Bayesian.

Tenenbaum and Xu (2000) showed that a Bayesian model of learning the meanings of concrete nouns made very similar generalisations to those made by people.
- So this provides empirical support for the proposal that people learn word meanings using Bayesian inference.

Griffiths and Tenenbaum (2000) showed that people seem to use Bayesian inference when judging the frequency of periodic events from examples.

What Evidence do Children use to Learn Colour Words?

Children aren't taught language explicitly. They learn by observing other people's speech.

But to learn meanings they must be able to work out what a word was used to mean. This would be the particular colour a word was used to identify.

So the input to learning must be examples of colours and the words that were used to identify those colours.

Learning consists of generalising from these examples to the full range of colours which a colour word can be used to identify.

The Conceptual Colour Space

Physically, light varies from red, with the longest wavelengths, to violet, with the shortest. But that's not how we perceive colour.

Perceptually, colour has a three-dimensional structure: hue, saturation and lightness.

[Figure: the hue circle, running red, orange, yellow, yellow-green, green, turquoise, blue, purple.]

At present the model is concerned only with the hue dimension.
The Bayesian Model

Children a priori assume that each word denotes a continuous range of colour, so a possible hypothesis is like this:

[Figure: a continuous range of the colour space, labelled 'extent of denotation of word'.]

All such ranges of the colour space are considered to be equally likely a priori.

But we can't be sure that all the examples are accurate:
- Accurate examples must come within the hypothesis.
- Erroneous examples can appear anywhere.

The learner must decide on a probability which corresponds to how confident they are that each example is accurate (in the results reported here that's 0.5).

[Figure: a high probability hypothesis and a low probability hypothesis.]

To work out just how likely it is that a colour word can denote any particular colour, we can just add up the probabilities of all the hypotheses which include that colour in their denotations.
- This is called hypothesis averaging.
- And the model is a Bayes optimal classifier.

We can equate the probability that a word denotes a colour with the colour's degree of membership in the colour category.
- This will define a fuzzy set, where the degree of membership can vary between 1 (full membership) and 0 (not a member at all).

Learning the English Colour Words

[Figure: degree of membership in colour category (0 to 1) against hue (red at left to purple at right), for RED, ORANGE, YELLOW, GREEN, BLUE and PURPLE.]

The fuzzy denotations of the English colour words after five examples of each. The learned categories show:
- Prototypes
- Gradation of membership
- Fuzzy boundaries

Learning Berinmo Colour Words

[Figure: degree of membership in colour category (0 to 1) against hue (red at left to purple at right), for Kel, Mehi, Nol and Wor.]

Denotations of Berinmo colour words after ten examples of each.

Berinmo has 5 colour words, but 'wap' only includes light colours, so it doesn't appear on the graph.

Green and blue are both included in the term 'nol'.

The dark term, 'kel', extends into much lighter colours, but only for purple hues.
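The hypothesis averaging described under 'The Bayesian Model' can be sketched as follows. This is a minimal illustration, not the actual implementation. It makes three assumptions that the slides don't state: the hue circle is divided into 40 discrete hues (numbered 0 to 39 here); an accurate example is equally likely to be any hue inside the hypothesis; and an erroneous example is equally likely to be any hue at all. The function names are hypothetical.

```python
N_HUES = 40        # assumed discretisation of the hue circle
P_ACCURATE = 0.5   # confidence that an example is accurate (value from the text)

def arc(start, length):
    # A continuous range of hues on the circle, wrapping round at the end.
    return [(start + i) % N_HUES for i in range(length)]

def hypotheses():
    # Every continuous range, plus the whole circle counted once.
    hyps = [arc(s, n) for s in range(N_HUES) for n in range(1, N_HUES)]
    hyps.append(arc(0, N_HUES))
    return hyps

def likelihood(examples, hyp):
    # Assumed form: an accurate example falls uniformly inside the
    # hypothesis; an erroneous one falls uniformly over all hues.
    inside = set(hyp)
    p = 1.0
    for hue in examples:
        p_if_accurate = 1.0 / len(hyp) if hue in inside else 0.0
        p *= P_ACCURATE * p_if_accurate + (1 - P_ACCURATE) / N_HUES
    return p

def membership(examples):
    # Hypothesis averaging: sum the posterior probability of every
    # hypothesis that includes each hue. The uniform prior cancels out.
    hyps = hypotheses()
    weights = [likelihood(examples, h) for h in hyps]
    total = sum(weights)
    degree = [0.0] * N_HUES
    for hyp, weight in zip(hyps, weights):
        for hue in hyp:
            degree[hue] += weight / total
    return degree

# Five examples of one word, clustered around hue 11.
degrees = membership([10, 11, 12, 11, 10])
```

Under these assumptions the degree of membership is highest around the examples and falls away gradually with distance from them, which gives the prototype and fuzzy boundary properties listed above.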
Learning with Unreliable Evidence

[Figure: degree of membership in colour category (0 to 1) against hue (0 to 100, red at left to purple at right). Curves: 5 accurate; 5 accurate with 5 random; 10 accurate; 10 accurate with 10 random. Markers: first 5 accurate examples; first 5 random examples; next 5 accurate examples; next 5 random examples.]

Learned denotations for English 'green'.

When 50% of the data is random noise:
- With 10 examples, the prototype is roughly correct, but the category boundary is very unclear.
- With 20 examples, performance approaches that with only accurate examples.

So the model can learn in realistic and not just idealised situations.

Does the Model Sufficiently Constrain the Learnable Languages?

- The model can learn real colour systems from natural languages.
- But it can equally well learn colour term systems that don't have any of the properties of those typically seen in real languages.

[Figure: degree of membership in colour category (0 to 1) against colour (red at left to purple at right).]

This colour term system:
- doesn't partition the colour space
- and the term on the right doesn't have prototype properties, because the person has seen so many examples of it.

So the acquisitional model can't explain why colour term systems have these properties.

Evolving Colour Term Systems

Ten copies of the model were made, to simulate a community of ten speakers.

Each person starts knowing one example of a colour word, but each person knows a different word.

- A speaker is chosen.
- A hearer is chosen.
- A colour is chosen.
- The speaker says the word which they think is most likely to be a correct label for the colour, based on all the examples which they have observed so far.
- The hearer hears the word, and remembers the corresponding colour.

Occasionally older people will die and be replaced by new people.

One time in a thousand the speaker will make up a completely new word.
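The loop described above can be sketched as below. To keep the example short, the Bayesian learner is replaced by a much simpler stand-in: the speaker uses the word of the remembered example nearest to the chosen colour. That substitution, the 40-hue circle, the death rate, and the rule that the person with the most remembered examples is the one who dies are all assumptions for illustration, as are the function names.

```python
import random

N_HUES = 40            # assumed discretisation of the hue circle
N_PEOPLE = 10          # community size (from the text)
P_NEW_WORD = 1 / 1000  # chance of inventing a new word (from the text)
P_DEATH = 1 / 60       # assumed chance of a death after each exchange

def hue_distance(a, b):
    # Distance between two hues measured round the circle.
    d = abs(a - b) % N_HUES
    return min(d, N_HUES - d)

def best_word(memory, hue):
    # Stand-in for the Bayesian learner: the word of the nearest example.
    return min(memory, key=lambda ex: hue_distance(ex[1], hue))[0]

def simulate(steps=5000, seed=0):
    rng = random.Random(seed)
    # Each person starts with one example of a different word.
    people = [[(f"w{i}", rng.randrange(N_HUES))] for i in range(N_PEOPLE)]
    n_words = N_PEOPLE
    for _ in range(steps):
        # Only people who know at least one word can speak.
        speaker = rng.choice([i for i, mem in enumerate(people) if mem])
        hearer = rng.choice([i for i in range(N_PEOPLE) if i != speaker])
        hue = rng.randrange(N_HUES)
        if rng.random() < P_NEW_WORD:
            word = f"w{n_words}"   # a completely new word
            n_words += 1
        else:
            word = best_word(people[speaker], hue)
        people[hearer].append((word, hue))
        if rng.random() < P_DEATH:
            # The person with most examples stands in for the oldest;
            # the replacement knows no language.
            oldest = max(range(N_PEOPLE), key=lambda i: len(people[i]))
            people[oldest] = []
    return people

community = simulate()
```

With ten people and one death every 60 exchanges on average, a person hears roughly 60 examples in a lifetime under these assumptions.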
Evolutionary Results

A coherent colour term system emerges which is shared by all people, although the exact ranges of colour which each person uses each word to denote vary a bit.

The following graph shows the denotations learned by one person in a typical simulation. On average people observed 60 colour words during their life, and this person was near to the end of his lifespan.

[Figure: degree of membership in colour category (0 to 1) against colour (red at left to purple at right).]

- The colour words partition the colour space.
- And each colour word has prototype properties.

But what about Typological Patterns?

MacLaury (1999): colour terms are predominantly focussed on certain colours (data from the World Colour Survey).

Heider (1972) found that focal colours were named more rapidly than non-focal colours.

When a Munsell chip was shown to a subject for 5 seconds, and then 30 seconds later the subject was asked to select the chip s/he had been shown from an array of Munsell chips, subjects were more accurate for focal colours than for non-focal colours.
- This was true both for American undergraduates and monolingual Dani speakers, even though Dani only has two basic colour terms.

When Dani speakers were taught names for colours, they made more mistakes in learning labels for non-focal colours than for focal colours.

So focal colours are easier to remember, and are more salient, than non-focal colours.

Neurophysiologically Determined Foci

There are six elemental sensations, corresponding to focal white, black, red, yellow, green and blue.
- This is supported by neurophysiological studies, measuring the response rates of cells in the retina of macaque monkeys using electrodes. (Although Kay and Maffi (1999) claim that those studies don't put the foci in quite the right places.)
- And by psychological studies, showing that some colours are seen as special.
- And also by studies which show clusters of colour term foci on certain Munsell chips.
There is some evidence that focal green and blue appear more similar to each other than do red and yellow, yellow and green, or blue and red (MacLaury, 1997).

Adding Learning Biases

A new version of the model was created in which people were more likely to remember focal examples than non-focal ones.
- All focal colours would be remembered.
- But only 5% of non-focal colours were remembered.

Changes were made to the model so it would allow for the uneven frequencies of colour examples.

Firstly the foci were evenly spaced, and 20 simulations were carried out in each of the conditions of people remembering on average 20, 30, 60, 90, and 120 colour examples during their lifetimes.

For the purposes of the analyses, only terms for which a person had seen at least four examples were included.
- A colour term was considered to name all those colours for which the person considered it to be the best colour word out of all those that they knew.

[Figure: number of colour terms focussed on each colour (0 to 180) against colour (red at left to purple at right).]

The colour terms are predominantly focussed on the innate foci.

[Figure: number of colour terms focussed on each colour (0 to 50) against colour (red at left to purple at right).]

This is the same graph, but for simulations without innate foci. There are no colours which are predominantly chosen as colour term prototypes.

[Figure: number of colour terms which have their prototypes and innate foci in these positions (0 to 400). Series: category prototype; innate focus.]

The graph shows whether the prototype and foci are in the left, middle, right or intermediate fifths of the colour term's range. The category prototype tends to be in the middle of the term, as does the innate focus.
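The memory bias described under 'Adding Learning Biases' amounts to a simple filter on what a hearer stores. The sketch below illustrates that rule only. It assumes 40 discrete hues with four evenly spaced foci at hues 0, 10, 20 and 30; those positions and the function names are hypothetical.

```python
import random

N_HUES = 40                 # assumed discretisation of the hue circle
FOCI = {0, 10, 20, 30}      # assumed evenly spaced innate foci
P_REMEMBER_NONFOCAL = 0.05  # 5% of non-focal examples are remembered

def remembers(hue, rng):
    # Focal colours are always remembered; others only occasionally.
    if hue in FOCI:
        return True
    return rng.random() < P_REMEMBER_NONFOCAL

def observe(memory, word, hue, rng):
    # Store the (word, hue) example only if it is remembered.
    if remembers(hue, rng):
        memory.append((word, hue))

rng = random.Random(0)
memory = []
for hue in range(N_HUES):
    observe(memory, "example-word", hue, rng)
# All four focal hues are now in memory, plus any non-focal hues
# that happened to be remembered.
```

Because focal examples are so much more likely to be stored, the examples a person accumulates for each word tend to pile up on the foci. That is what drives the concentration of prototypes on the innate foci in the simulations.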
[Figure: number of colour terms with prototype and innate foci in these positions (0 to 400), by prototype and innate foci location. Series: category prototype; innate focus.]

With no innate foci, the prototype still tends to be in the middle of the term, but the foci locations can be anywhere.

[Figure: average number of colour terms known by an adult (0 to 12) against average number of examples remembered during a person's lifetime (30, 60, 90, 120, All). Series: with innate foci; without innate foci.]

People who use colour terms more often tend to have more colour terms in their languages.
- Though adding foci seems to reduce the number of colour terms.

But why do we see typological patterns?

Languages have red, yellow, green and blue terms before any derived terms emerge (though quite often derived terms will emerge before blue and green split).

We should see yellow-red composites in systems with only two terms, and green-blue terms should be seen very frequently.

We should see some yellow-green terms, but no red-blue ones.

The only three-colour composite should be yellow-green-blue.

The most common derived term should be purple. Orange should be less frequent, but can appear before purple.

We shouldn't see lime or turquoise terms at all.

Changing Innate Foci Locations

The hues are numbered from 1 to 40.
- Red was placed at hue 5.
- Yellow at 17.
- Green at 24.
- Blue at 30.

The simulations were repeated. Again 20 runs were done in each condition, plus 20 extra ones with a life expectancy of 150.

Names given to each hue (one row per person, one letter per hue):
D D D D D*D D D D C C C C C C C C*C C A A A A A*A A B B B B*B B B B B B B D D D
D D D D D*D D D D D D A A C C C C*C A A A A A A*A A A B B B*B B B B B B B B B D
B D D D D*D D D D D A A C C C C C*C C A A A A A*A A A B B B*B B B B B B B B*B B
D D D D D*D D D D D D C C C C C C*C C C C A A A A A*A B B B*B B B B B D D D D D

The languages were then analysed as follows:
- Only people over half the average age were included.
- The name that they would give to each hue was found (as shown above; each row is one person).
- Each term was then classified. If it contained innate foci, it would be classed as red, yellow-green, blue-red-yellow etc.
- Terms which were in between innate foci were classed as orange, lime, turquoise or purple.
- If people disagreed, the classification supported by the most people was chosen, and, if this was tied, a classification containing fewer foci was preferred over one with more.

Frequencies of Term Types

Term type            Frequency
Red                  116
Yellow               97
Green                86
Blue                 90
Red-Yellow           4
Yellow-Green         4
Green-Blue           16
Blue-Red             1
Yellow-Green-Blue    11
Green-Blue-Red       2
Purple               34
Orange               25
Lime                 4
Turquoise            4

System Type Frequencies

Only system types which occur more than once are included.

System                                        Frequency
6 colour systems:
  orange, purple, red, yellow, green, blue    7
5 colour systems:
  purple, red, yellow, green, blue            8
  orange, red, yellow, green, blue            7
4 colour systems:
  red, yellow, green, blue                    42
3 colour systems:
  red, yellow, green-blue                     9
  red, blue, yellow-green                     2
  red, blue, yellow-green-blue                2
2 colour systems:
  red, yellow-green-blue                      8
  red-yellow, green-blue                      4
  yellow, green-blue-red                      2

Only the systems with green-blue-red categories are unattested.

Grue Category Foci

Where is the prototype in green-blue composite terms? Is it sometimes mainly in green, sometimes mainly in blue, and sometimes there is no strong bias either way?

There were 16 grue categories in the simulations:
- In 8, at least 75% of people put the focus nearest blue.
- In 1, at least 75% of people put the focus nearest green.
- In 7, there was no clear majority as to which focus was preferred.

Sometimes a person would choose both a green and a blue prototype as almost equally good examples of the category.
- But usually the innate focus nearest to the centre of the category is chosen as the best example.

Is the Evolutionary Hypothesis Supported?

Do languages add colour terms but never lose them?
- With an average of 90 examples being observed during a person's lifetime, after 18000 iterations (20 lifetimes) we have red, yellow, green, blue and green-blue.
- Then from 27000 till 72000 we have red, yellow, green, blue.
- At 81000 purple is added.
- But at 90000 it's been lost.

If we reduce the life expectancy so people see an average of 60 examples, we still get mainly red, yellow, green, blue systems, but at one stage we briefly get a red, yellow, green-blue system.

So it seems that the number of colour terms depends mainly on how often people use colour terms during their lifetimes. But we should expect random drift, where sometimes colour terms will be gained, and sometimes lost.

So maybe the evolutionary hypothesis only holds when societies are increasing in technological complexity.

What about the Future? Can we have more than 11 Basic Colour Terms?

If people use colour words more often in the future, will new terms become basic, such as turquoise and lime?

New simulations were done, where people would observe on average 180, 210 or 240 colour examples during their lifetimes.

We see 74 orange, 30 lime, 24 turquoise and 114 purple. There are 60 red, 63 yellow, 60 green and 63 blue. There are also 2 yellow-green and one each of green-blue, blue-red and yellow-green-blue.

So we do see lime and turquoise terms emerging.

But if we look at overall systems, the most common system is orange, purple, purple, red, yellow, green, blue.
- So maybe we're more likely to get another basic purple term before lime or turquoise become basic.

There aren't many systems with lime or turquoise which don't have at least two purple or orange terms.

References

Batali, J. (1998). Computational simulations of the emergence of grammar. In J. R. Hurford, M. Studdert-Kennedy & C. Knight (Eds.) Approaches to the Evolution of Language: Social and Cognitive Bases. Cambridge: Cambridge University Press.

Bayes, T. (1763). An Essay Towards Solving a Problem in the Doctrine of Chances. Philosophical Transactions, Volume 53, pages 370-418.

Belpaeme, T. (2002). Factors influencing the origins of colour categories. Ph.D. Thesis, Artificial Intelligence Lab, Vrije Universiteit Brussel.

Berlin, B. & Kay, P. (1969). Basic Color Terms. Berkeley: University of California Press.

Chomsky, N. (1972). Language and Mind. New York, NY: Harcourt Brace Jovanovich Inc.

Dowman, M. (2001). A Bayesian Approach to Colour Term Semantics. Lingu@scene, Volume 1.

Griffiths, T. L. & Tenenbaum, J. B. (2000). Teacakes, Trains, Taxicabs and Toxins: A Bayesian Account of Predicting the Future. In L. R. Gleitman & A. K. Joshi (Eds.) Proceedings of the Twenty-Second Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum Associates.

Heider, E. R. (1972). Universals of Color Naming and Memory. Journal of Experimental Psychology, Volume 93, pages 10-20.

Hurford, J. R. (1987). Language and Number: The Emergence of a Cognitive System. New York, NY: Basil Blackwell.

Kay, P. & Maffi, L. (1999). Color Appearance and the Emergence and Evolution of Basic Color Lexicons. American Anthropologist, Volume 101, pages 743-760.

Kirby, S. (1999). Learning, Bottlenecks and the Evolution of Recursive Syntax. In E. Briscoe (Ed.) Linguistic Evolution through Language Acquisition: Formal and Computational Models. Cambridge: Cambridge University Press.

MacLaury, R. E. (1997). Color and Cognition in Mesoamerica: Construing Categories as Vantages. Austin, Texas: University of Texas Press.

Tenenbaum, J. B. & Xu, F. (2000). Word Learning as Bayesian Inference. In L. R. Gleitman & A. K. Joshi (Eds.) Proceedings of the Twenty-Second Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum Associates.