A Corpus-Linguistic Approach to the colour naming debate

Luigi SQUILLANTE
Sapienza – Università di Roma (Italy)
Stiftung Universität Hildesheim (Germany)

Abstract

Since the publication of the famous work by Berlin & Kay (1969) on universality and evolution of basic colour terms, several studies on colours have tried to support or reject the hypothesis that language is shaped by universals of perception. Colours are a privileged subject in this respect because of their twofold nature as both results of biological perception and lexical items of language. This work presents a new contribution to the so-called "colour naming debate" from a point of view that, as far as we know, has not yet been taken into account in studies on this subject: the phraseological perspective. Our analysis, carried out on a large set of Italian nominal multiword expressions (MWEs) including colour terms, shows that colours are not equally distributed in this kind of expression and that, if only idiomatic MWEs are considered, the order arising from the quantitative distribution of colour terms strongly reproduces Berlin & Kay's hierarchy. In this sense our results support the idea that perception can influence language and show that phraseological and corpus-based studies can also shed new light on this subject.

1. Introduction

It is well known that different languages develop different lexical categories in order to refer to certain areas of the perceivable spectrum that are commonly called colours. Interest in the differences between the terms that languages use to refer to colours is not new to linguistic studies and can be traced back to the nineteenth century, when works like those by Gladstone (1858), Allen (1879) and Geiger (1880) first appeared. The initial debate focused on whether it was possible to infer an evolution in the human perception of hues from a philological analysis of the usage of colour terms in literature, from the ancient epic poems to the present day.
The answer to such a question was soon provided by the study of Magnus (1880), showing that lexical distinctions or the lack of terms for expressing certain colours did not seem to imply different or deficient perception in the speakers: the fact that some population had just one term to identify a hue, to which another population referred by means of two or more words, was rather due to different ways of categorizing the same physical reality1.

1 This concept would be fully developed in general terms some years later by Saussure, with the notion of the arbitrariness of signs. In the structuralist frame, Hjelmslev (1968[1943]:57-58) would explicitly take the lack of biunivocal correspondence between colour terms in English and Welsh as an example to show how languages choose arbitrarily how to categorise the same physical entities.

The investigation of the correlation between colour terms and perception was pursued by several studies during the twentieth century (Ray 1952, 1953; Conklin 1955; Nida 1959 among others), all of which insisted on the arbitrary way in which languages segment the spectral continuum. However, it was not until the publication of the famous work by Berlin & Kay (1969) on universality and evolution of basic colour terms that this topic started to gain much attention from the scientific community, leading to what today can be defined as the "colour naming debate", spreading throughout the fields of anthropology, cognitive sciences, linguistics and philosophy. The great importance of Berlin & Kay's work is that it represents one of the most influential criticisms of the Sapir-Whorf hypothesis (e.g. Sapir, 1921:219; Whorf 1956 [1940]:212) and of linguistic relativism, which, during the first half of the twentieth century, had broadly dominated the research approaches in the social sciences. As Kay and Maffi (1999:744) underline, "with the ascendance in the 1920s, '30s and '40s of linguistic and cultural relativity [...]
color came to be singled out as the parade example of a lexical domain in which the control of language over perception is patent [...]". Against this interpretation, according to which language shapes perception, Berlin & Kay proposed a universalistic point of view, holding that language is shaped by universals of perception. Their experiment on 98 different languages showed that there exists a simple rule defining a universal hierarchy for the appearance of the basic colour terms in every language: [white, black] < red < [green, yellow] < blue < brown < [purple, pink, orange, grey]2. The hierarchy is built on the basis of the presence of the colour terms within the analyzed languages: according to the order above, if a language has a given term, then all the terms on its left are attested in the same language. The eleven colours included in the hierarchy were chosen according to several principles (Berlin & Kay, 1969:6) intended to define what an ideal basic colour term is3.

The results of Berlin & Kay's experiment were discussed and tested in the ensuing years, leading to a strong polarization between universalists and relativists. As Kay & Maffi (ibid.) recall, psychologists tended to welcome Berlin & Kay's findings and to support them with new empirical tests (among others Bornstein 1973a,b; Brown 1976; Collier 1976; Shepard 1992), while anthropologists raised doubts on methodological issues (e.g. Hickerson, 1971; Durbin, 1972; Collier 1973; Conklin 1973), especially those regarding the concept of "basic colour term" (Saunders 1995, 1997; Lucy 1996). However, empirical and theoretical considerations raised by the following studies produced several modifications of the universal model originally proposed in 1969 (Kay and McDaniel, 1978; Kay and Maffi, 1999, among others) which, although not invalidating the hierarchy shown above, led to different interpretations of the mechanisms governing the evolution of the appearance of colour terms.
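The implicational reading of the hierarchy can be made concrete with a short sketch. The code below is only an illustration, not part of Berlin & Kay's procedure: it assumes that a term is licensed when every term of every group to its left is present, while terms inside the same bracketed group may appear in any order. The function name `violations` and the example inventories are hypothetical.

```python
# Berlin & Kay's hierarchy as given in the text:
# [white, black] < red < [green, yellow] < blue < brown < [purple, pink, orange, grey]
HIERARCHY = [
    {"white", "black"},
    {"red"},
    {"green", "yellow"},
    {"blue"},
    {"brown"},
    {"purple", "pink", "orange", "grey"},
]


def violations(inventory):
    """Return the terms of `inventory` that are not licensed by the hierarchy.

    Assumption of this sketch: a term is licensed only if all terms of all
    groups on its left are present in the inventory.
    """
    inventory = set(inventory)
    unlicensed = []
    for index, group in enumerate(HIERARCHY):
        # All terms that must already be attested before this group appears.
        required = set().union(*HIERARCHY[:index])
        if not required <= inventory:
            unlicensed.extend(group & inventory)
    return sorted(unlicensed)


# Hypothetical inventories, invented for illustration only.
print(violations({"white", "black", "red", "green"}))  # consistent inventory
print(violations({"white", "black", "blue"}))          # blue without red, green, yellow
```

Under this reading, an inventory with white, black, red and green is consistent even though yellow is missing, because green and yellow belong to the same bracketed group.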
One of the main issues concerned the fact that "the evolutionary sequence that views the development of basic color-term lexicon [is] not [seen] as the successive encoding of foci, but as the successive differentiation of previously existing basic color categories"4 (McDaniel, 1974, recalled in Kay and McDaniel, 1978:640). Moreover "the Kay and McDaniel model emphasizes [...] the six primary colors of opponent theory (black, white, red, yellow, green, blue)" (Kay and Maffi, 1999:745). Although the colour naming debate is still active today, in recent years the availability of new data and the development of complex numerical algorithms have opened new ways of testing the hypothesis, confirming the existence of universality in colour naming systems (e.g. Baronchelli et al., 2010).

2 Here "<" indicates implication. Square brackets indicate that any of the terms included can appear first, but all the terms included must be present before the appearance of the next term on the right.

3 The principles require the basic term to be monolexemic, not semantically included in any other colour term (as in the case of crimson, perceived as a shade of red), not collocationally restricted (as in the case of blond) and psychologically salient. If a term is a doubtful case under these criteria, further restrictions apply: it must not be a form derived from other colour terms, nor a word that indicates both a colour and a prototypical object of that hue.

4 While in the first interpretation the appearance of a new colour term would attest the need to recognise a new focal point in the spectral continuum, now the sequence is seen as a gradual differentiation of lexical macro-categories representing light-warm and dark-cool perceptions.

2. Colour terms and Multiword Expressions

In 1879, when the colour naming debate was just beginning, Allen replied to Gladstone's opinions on the ambiguity of the use of colour terms in Homer's poems in these terms: "Mr. Gladstone tells us that they [the Homeric Greeks] could not have understood real colour by their apparent colour terms, because the words are used so loosely. Here, green means green: there, it means fresh or young. [...] Do Englishmen never talk of green old age or Americans of green corn, which is really pale yellow? Is not red blood confronted with sangre azul and red wine with petit vin bleu? [...] In short, are not colour terms always vague, and are they not vaguer in the idealized language of poetry than anywhere else?" (Allen, 1879:267, cited in Berlin & Kay, 1969:137, italics added by the author).

Although Allen's point was just to argue that one cannot abstract the concept of colours from their figurative and sometimes very vague meanings (especially in poetry), all his examples mention colour terms appearing in what nowadays are generally referred to as multiword expressions (hereafter MWEs). MWEs are phenomena of preeminent interest in phraseology and contemporary corpus linguistics. They include a great variety of entities lying on a continuum between lexicon and syntax, whose typical features include morpho-syntactic fixedness, semantic restrictions, semantic unpredictability, non-grammatical constructions, conventionality and institutionalization. Their interpretation generally crosses the boundaries between words (Sag et al., 2002), and one of the most useful definitions, able to cover a great number of phenomena, is that proposed by Calzolari, Fillmore et al. (2002:1934), according to whom an MWE is "a sequence of words that acts as a single unit at some level of linguistic analysis". The phenomenon of MWEs has long been studied in the linguistic tradition because of its relevance to every language.
Despite their apparently anomalous behavior, MWEs are "an important and frequent phenomenon in human language" (Ramisch et al., 2010:1), as Sinclair (1991) established with the formulation of his famous idiom principle, which stated that idiomatic and morpho-syntactically restricted constructions are as normal and natural in discourse as free combinations. Apart from the large body of theoretical work on MWEs developed within the major linguistic frameworks throughout the twentieth century, in recent years the computational approach to this phenomenon has become one of the dominant lines of research in this field. In fact, although none of the features cited above is a necessary and sufficient condition for the presence of an MWE, the components of entities of this kind exhibit a strong tendency to co-occur in texts more frequently than they separately appear with other words. This led to the development of several statistical approaches and association measures to identify, study and automatically extract MWEs from texts (to mention just a few: Evert and Krenn, 2001; Evert, 2004; Kilgarriff, 2006; Ramisch et al., 2010). The great amount of structured textual data available nowadays in large corpora allows researchers to deepen the study of MWEs in new testable and empirical ways. For example, starting from Allen's considerations cited above, one can study the role of colour terms appearing as components of MWEs. The present work is intended to contribute to the colour naming debate from the phraseological point of view, which, as far as we know, has not yet been examined in the studies on this subject. Since MWEs represent a very important linguistic phenomenon and colour terms can occur in such entities as part of the lexicon, it is reasonable to ask whether there are preferences in the choice of colours in the creation of this kind of expression.
The study presented below is focused on the Italian language, although it has potentially relevant crosslinguistic implications.

3. Data and Methodology

In our work we consider the Italian equivalents of the eleven basic colour terms of Berlin & Kay's hierarchy (bianco, eng. white; nero, eng. black; rosso, eng. red; verde, eng. green; giallo, eng. yellow; blu, eng. blue; marrone, eng. brown; viola, eng. purple; rosa, eng. pink; arancione, eng. orange; grigio, eng. grey), plus the colour azzurro (eng. light blue) which in Italian, as well as in other non-Germanic languages such as Russian or Turkish, is considered to be distinct from blue (Philip, 2003:12). In addition, violetto and arancio (two variants of purple and orange) are also taken into account for their potential competition with their synonymous forms.

Our first reference is GRADIT (1999-2007), the most comprehensive lexicographic resource for the Italian language. This dictionary includes about 130,000 MWEs that have been selected according to one or more of the following criteria (De Mauro, 2005:88-89):

- the existence of an unpredictable semantic addition to the meanings of the component words;
- syntactic and/or lexical fixedness with respect to lexical or structural variations that would result in the loss of the idiomaticity of the expression;
- significant presence in a specialized language, where MWEs typically form terminology.

These criteria include both typically figurative expressions generally called idioms and expressions in which the components are interpreted according to one of their basic senses, with no further unpredictable semantic addition (especially in the case of terminology). At the same time, this definition does not cover more flexible expressions like collocations, since GRADIT has not been developed as a combinatory dictionary.
We start by extracting from GRADIT all the nominal MWEs in which a colour term appears as an adjective modifying the nominal head, obtaining a list of 943 entities (such as "scatola nera", eng. black box). In order to have a more complete resource to work on, we also consider MWEs from the Italian corpus PAISÀ (2012). The PAISÀ corpus is a large resource for the Italian language, composed of ca. 380,000 documents and 250 million tokens. It collects different types of texts extracted automatically from the web on the basis of word pairs elicited from GRADIT and used as a seed list. Since it is morphosyntactically annotated, it allows for queries of combinations of part-of-speech categories. In order to collect MWE candidates we use several scripts of the computational tool mwetoolkit (Ramisch et al., 2010) to extract noun + adjective5 combinations, using five statistical association measures provided by the tool (maximum likelihood estimator, pointwise mutual information, log-likelihood, Dice's coefficient, Student's t-score). Once we have sorted the candidates according to their scores, we consider only those that contain one of the colour terms we analyze as an adjective (thus appearing as the second component of the candidate) and that appear among the top 500,000 candidates for each of the five association measures. We then manually filter out false positives that do not satisfy the requirements of our definition, as well as MWEs already attested in GRADIT. At the end of this process we retrieve 99 new MWEs from PAISÀ, so that our material reaches a total of 1042 entities. This set of over 1000 MWE types provides wide coverage and can be considered a reliable and relatively complete set to represent our phenomenon.

5 [noun + adjective] is the general structure for the unmarked noun phrase in Italian.

4. First results

Our analysis showed the following frequency order for the colour terms used in Italian nominal MWEs: bianco (white, ~22.2%), nero (black, ~20.6%), rosso (red, ~18.5%), verde (green, ~10.3%), giallo (yellow, 9.5%), grigio (grey, ~5.8%), azzurro (light blue, ~4.5%), blu (blue, ~3.7%), rosa (pink, ~3.4%), violetto (purple, ~1.1%), marrone (brown, ~0.3%), arancione (orange, ~0.2%), viola (purple, 0.1%), arancio (orange, 0%), as shown in detail in Table 1. Evidently, the universal hierarchy is reproduced exactly only for the five most frequent colours, while for the remaining terms there is no evidence of a correspondence with the hierarchy6.

| Colour term | # of MWEs | % of MWEs | Examples |
|---|---|---|---|
| Bianco | 231 | 22.17 | abete bianco, bandiera bianca, camice bianco |
| Nero | 215 | 20.63 | caffè nero, cintura nera, lavoro nero |
| Rosso | 193 | 18.52 | falco rosso, filo rosso, globulo rosso |
| Verde | 107 | 10.27 | anni verdi, pollice verde, tavolo verde |
| Giallo | 99 | 9.50 | bocca gialla, fiamme gialle, melone giallo |
| Grigio | 60 | 5.76 | corpo grigio, lupo grigio, sostanza grigia |
| Azzurro | 47 | 4.51 | alga azzurra, pesce azzurro, telefono azzurro |
| Blu | 38 | 3.65 | auto blu, fifa blu, sangue blu |
| Rosa | 35 | 3.36 | cronaca rosa, fiocco rosa, quarzo rosa |
| Violetto | 11 | 1.06 | camaleonte violetto, tartufo violetto |
| Marrone | 3 | 0.29 | cintura marrone, lemure marrone |
| Arancione | 2 | 0.19 | bandiera arancione, contrassegno arancione |
| Viola | 1 | 0.10 | gallinella viola americana |
| Arancio | 0 | 0.00 | / |
| Total | 1042 | 100.00 | |

Table 1: Distribution of colour terms into Italian nominal MWEs.

6 It is interesting to note that, combining the occurrences of the MWE types including blu and azzurro in a single set (labeled as blue), we retrieve the universal hierarchy order up to the sixth colour. However, since our study focuses on the salience of the colour terms used to refer to hues and not on the hues themselves, this result is not of preeminent interest.

This result could already be of some interest, showing that the most cognitively salient colours of Berlin & Kay's experiment seem to be preferred also in the construction of MWEs. In this way perception seems to influence the speakers' choices in creating new expressions that become particularly sedimented in language.

However, it must be considered that the result shown above could also be due to the frequency distribution of colour terms in general Italian usage. Chiari (2012), on the basis of empirical evidence, suggests that Zipf's law on the relation between frequency and meanings of a word7 (Zipf, 1949) can be extended to MWEs as well, in the sense that there exists a proportional relation between the frequency of a word and the number of MWEs it forms. Nevertheless, Chiari's empirical observation considers such a relation only for the nominal heads of MWEs, while in our work colour terms only appear as modifiers.

| Colour term | PAISÀ | ItTenTen | Repubblica |
|---|---|---|---|
| Bianco | 27,914 | 324,273 | 51,512 |
| Nero | 26,836 | 339,472 | 71,154 |
| Rosso | 20,156 | 230,685 | 39,051 |
| Blu | 8,046 | 90,023 | 16,938 |
| Verde | 6,749 | 171,839 | 24,890 |
| Giallo | 6,425 | 70,434 | 14,622 |
| Azzurro | 6,392 | 86,214 | 18,555 |
| Grigio | 3,780 | 2,399 | 10,258 |
| Viola | 2,635 | 17,257 | 2,373 |
| Rosa | 1,797 | 74,392 | 10,876 |
| Arancione | 1,681 | 12,103 | 1,003 |
| Marrone | 1,174* | 11,547* | 1,088* |
| Arancio | 268* | 3,689* | 65* |
| Violetto | 181 | 3,121 | 414 |

Table 2: Number of occurrences of basic colour terms (tagged as adjectives) in three Italian corpora. The occurrences marked with an asterisk are a projection based on a set of 100 manually processed random samples: "marrone" and "arancio" were tagged only as nouns in the three corpora, although they obviously appear as adjectives as well.
7 Zipf's law about the frequency/meanings relation states that words occurring with higher frequencies are more generic and thus have a higher number of meanings (senses) than less frequent ones.

To check whether general frequency of use accounts for this distribution, the number of occurrences of every basic colour term in several Italian corpora is considered. Apart from the PAISÀ corpus itself, the ItTenTen (2010) and the Repubblica (2004) corpora were chosen to perform the check. The first of the two new resources is built by web crawling, comprises about 3.1 billion tokens and is available inside the Sketch Engine interface (Kilgarriff et al., 2004). The second corpus is a collection of articles from one of the major Italian newspapers and includes about 380 million tokens. Both corpora are morphosyntactically annotated so that queries with POS categories are allowed. We choose to search only for the occurrences of the basic colour terms classified as adjectives. The results are shown in Table 2. Table 3 shows the different orders for the basic colour terms based on the number of occurrences found in each of the analyzed corpora.

| MWE frequency order | PAISÀ | ItTenTen | Repubblica |
|---|---|---|---|
| Bianco | Bianco | Nero | Nero |
| Nero | Nero | Bianco | Bianco |
| Rosso | Rosso | Rosso | Rosso |
| Verde | Blu | Verde | Verde |
| Giallo | Verde | Blu | Azzurro |
| Grigio | Giallo | Azzurro | Blu |
| Azzurro | Azzurro | Rosa | Giallo |
| Blu | Grigio | Giallo | Rosa |
| Rosa | Viola | Viola | Grigio |
| Violetto | Rosa | Arancione | Viola |
| Marrone | Arancione | Marrone | Marrone |
| Arancione | Marrone | Arancio | Arancione |
| Viola | Arancio | Violetto | Violetto |
| Arancio | Violetto | Grigio | Arancio |

Table 3: Comparison between the frequency order for basic colour terms found from MWEs and in three Italian corpora.

The adaptation of Zipf's law to MWEs seems more or less supported for the first three colours8; verde appears as the fourth term in two of the three corpora, in line with the MWE order, while the remaining terms do not show any significant correspondence.
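The comparison between orders can be reproduced mechanically from the counts. The following is a minimal sketch rather than the procedure actually used in the study: it transcribes the figures of Table 1 and Table 2, sorts each column by decreasing count, and lists the colour terms that occupy the same rank in the MWE order and in a corpus order. The names `frequency_order` and `same_rank` are introduced here for illustration, and the key "PAISA" is written without the accent only for convenience.

```python
# Number of MWE types per colour term, transcribed from Table 1.
MWE_COUNTS = {
    "bianco": 231, "nero": 215, "rosso": 193, "verde": 107, "giallo": 99,
    "grigio": 60, "azzurro": 47, "blu": 38, "rosa": 35, "violetto": 11,
    "marrone": 3, "arancione": 2, "viola": 1, "arancio": 0,
}

# Adjective occurrences transcribed from Table 2
# (values for marrone and arancio are the projections marked with an asterisk).
CORPUS_COUNTS = {
    "PAISA": {
        "bianco": 27914, "nero": 26836, "rosso": 20156, "blu": 8046,
        "verde": 6749, "giallo": 6425, "azzurro": 6392, "grigio": 3780,
        "viola": 2635, "rosa": 1797, "arancione": 1681, "marrone": 1174,
        "arancio": 268, "violetto": 181,
    },
    "ItTenTen": {
        "bianco": 324273, "nero": 339472, "rosso": 230685, "blu": 90023,
        "verde": 171839, "giallo": 70434, "azzurro": 86214, "grigio": 2399,
        "viola": 17257, "rosa": 74392, "arancione": 12103, "marrone": 11547,
        "arancio": 3689, "violetto": 3121,
    },
    "Repubblica": {
        "bianco": 51512, "nero": 71154, "rosso": 39051, "blu": 16938,
        "verde": 24890, "giallo": 14622, "azzurro": 18555, "grigio": 10258,
        "viola": 2373, "rosa": 10876, "arancione": 1003, "marrone": 1088,
        "arancio": 65, "violetto": 414,
    },
}


def frequency_order(counts):
    """Return the colour terms sorted by decreasing count."""
    return sorted(counts, key=counts.get, reverse=True)


def same_rank(order_a, order_b):
    """Return the terms that occupy the same position in both orders."""
    return [a for a, b in zip(order_a, order_b) if a == b]


mwe_order = frequency_order(MWE_COUNTS)
for corpus, counts in CORPUS_COUNTS.items():
    corpus_order = frequency_order(counts)
    print(corpus, corpus_order[:4], same_rank(mwe_order, corpus_order))
```

Comparing exact ranks is a deliberately strict criterion; it treats the bianco/nero swap in ItTenTen and Repubblica as a mismatch even though the two counts are close in magnitude.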
On the one hand, then, the order obtained from MWE type frequencies seems to be relevant independently of the adapted Zipf's law only from giallo on; on the other hand, the lower part of the chart does not show a correspondence with the universal hierarchy. It is also possible that, for the more frequent colour terms, both universals of perception and Zipf's law act to make them appear in a great number of MWEs, but at this point we are not able to discriminate between the two causes.

5. Colour terms and idioms

An interesting result, however, emerges if we take into account only those MWEs that can be classified as idioms, that is, when one of the components is used in a metaphorical or metonymical way or when the global meaning includes an unpredictable semantic addition. In this way most of the terminology of specialized languages is ruled out (e.g. "alga rossa", eng. red algae; "abete bianco", eng. silver fir) because in these cases colours are mostly used just to denote the actual hue they refer to. This filtering retains only 295 of the original 1042 MWEs (see annex below), producing the following distribution of colour terms: nero (~25.1%), bianco (20%), rosso (~16.6%), verde (~10.5%), giallo (~7.1%), blu (~6.4%), azzurro (~5.4%), rosa (~5.1%), grigio (~2.7%), arancione (~0.7%), marrone (~0.3%), arancio (0%), viola (0%), violetto (0%). The results are shown in detail in Table 4. It is possible to note two things: (i) this time the universal hierarchy is reproduced exactly, except for marrone, which seems to be shifted into the last group of the hierarchy; (ii) azzurro can also be included in the last group.
| Colour term | # of MWEs | % of MWEs | Examples |
|---|---|---|---|
| Nero | 74 | 25.08 | aristocrazia nera, pecora nera, uomo nero |
| Bianco | 59 | 20.00 | acque bianche, calore bianco, morte bianca |
| Rosso | 49 | 16.61 | basco rosso, bollino rosso, croce rossa |
| Verde | 31 | 10.51 | carta verde, onda verde, numero verde |
| Giallo | 21 | 7.12 | febbre gialla, pagine gialle, romanzo giallo |
| Blu | 19 | 6.44 | banana blu, caschi blu, colletto blu |
| Azzurro | 16 | 5.42 | arma azzurra, parco azzurro, pesce azzurro |
| Rosa | 15 | 5.08 | balletto rosa, foglio rosa, punto rosa |
| Grigio | 8 | 2.71 | corpo grigio, eminenza grigia, materia grigia |
| Arancione | 2 | 0.68 | bandiera arancione, contrassegno arancione |
| Marrone | 1 | 0.34 | cintura marrone |
| Arancio | 0 | 0.00 | / |
| Viola | 0 | 0.00 | / |
| Violetto | 0 | 0.00 | / |
| Total | 295 | 100.00 | |

Table 4: Distribution of colour terms into Italian nominal idioms.

8 Although the order of bianco and nero seems unstable, Table 2 shows that their numbers of occurrences are very close in magnitude. The occurrences for red, instead, appear quite well separated from the first two terms.

With regard to observation (i), the possibility of downgrading the brown colour term is not so problematic, since Kay & McDaniel's modification of the original universal model already underlined the preeminence of only white, black, red, green, yellow and blue as fundamental categories in the partition of the perceivable spectrum. On the other hand, observation (ii) is fully reasonable and in some way expected: first, azzurro is a derivation from blue in the same way as pink derives from red, and one expects the recourse to such terms, which are just subtle hues of more definite and fundamental colours, once these are already developed and available; secondly, and most importantly, azzurro can be seen as one more basic term that a language like English has not developed yet: in this way, if the English-based hierarchy is valid, it cannot appear anywhere else but in the last group.
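Observation (i) can be checked mechanically. The sketch below is an illustration under stated assumptions, not the method used in the study: it takes the idiom counts of Table 4 for the eleven Italian equivalents of Berlin & Kay's terms (azzurro, arancio and violetto are left out because they have no position in the hierarchy), assigns each term the index of its group, and lists every pair in which a term from an earlier group has fewer idioms than a term from a later group. The names `GROUP`, `IDIOM_COUNTS` and `inversions` are hypothetical.

```python
from itertools import combinations

# Position of each term's group in the hierarchy
# [white, black] < red < [green, yellow] < blue < brown < [purple, pink, orange, grey].
GROUP = {
    "bianco": 0, "nero": 0,
    "rosso": 1,
    "verde": 2, "giallo": 2,
    "blu": 3,
    "marrone": 4,
    "viola": 5, "rosa": 5, "arancione": 5, "grigio": 5,
}

# Number of idiomatic MWE types per colour term, transcribed from Table 4.
IDIOM_COUNTS = {
    "nero": 74, "bianco": 59, "rosso": 49, "verde": 31, "giallo": 21,
    "blu": 19, "rosa": 15, "grigio": 8, "arancione": 2, "marrone": 1,
    "viola": 0,
}


def inversions(counts, group):
    """Return (earlier, later) pairs where the term from the earlier
    hierarchy group has strictly fewer idioms than the term from the later one."""
    found = []
    for a, b in combinations(counts, 2):
        if group[a] == group[b]:
            continue  # terms of the same bracketed group are unordered
        earlier, later = (a, b) if group[a] < group[b] else (b, a)
        if counts[earlier] < counts[later]:
            found.append((earlier, later))
    return sorted(found)


print(inversions(IDIOM_COUNTS, GROUP))
```

With these counts the only inversions involve marrone, which is outnumbered by rosa, grigio and arancione from the last group; every other pair respects the hierarchy.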
The choice of considering idioms only is grounded in studies related to the cognitive theory of metaphor (e.g. Lakoff & Johnson, 1980), as the work by Casadei (1996) on the Italian language, which leads back the interpretation of both idioms and metaphors to cognitive schemes and physical/perceptual experiences. One of the underlying hypotheses in such works is that idioms (and metaphors, in general) can be seen as the result of the speaker's need for expressing abstract concepts in terms of concrete and perceivable elements that can be related to our senses. In our case, we can add that idioms may represent one of the phenomena in language that reflect unsupervised and instinctive links between cognitive schemes and linguistic production in order to express, in the case of colours, more complex concepts via the sense of sight9. Starting from these considerations we can suppose that some colours are more likely to be chosen when expressing figurative meanings because of their cognitive salience in representing certain abstract states or features10. At the same time, the different levels of success for basic colours to be institutionalized into idioms can provide a proof of their different cognitive relevance regardless of the existence of any explicit metaphor based on them (which Casadei, 1996:262, is not able to recognize apart from those for white and black) or conventional figurative meanings. In order to throw light on this point we consider the figurative meanings for each basic colour term that are attested in GRADIT. Table 5 shows the number of meanings concerning colour adjectives that do not refer to the primary denotative meaning indicating their prototypical hue 11. It is possible to see that there is no correlation between the number of institutionalized conventional meanings for the colours and the number of idioms produced. 
The case of blu is the most interesting: although there seems to be no figurative meaning conventionally associated with this colour, it appears anyway in the sixth rank of the chart in Table 4 with 19 idiomatic MWEs. In this sense, the hypothesis that the number of idioms including a specific colour depends on how many conventional figurative meanings are associated with the colour is also falsified. Thus, the fact that the frequency order of the idiom types follows Berlin & Kay's hierarchy can provide further support to the universalistic hypothesis.

| Colour term | # of figurative meanings (adj.) |
|---|---|
| Nero | 12 |
| Bianco | 9 |
| Verde | 6 |
| Grigio | 5 |
| Rosso | 4 |
| Giallo | 3 |
| Arancione | 2 |
| Azzurro | 2 |
| Rosa | 1 |
| Arancio | 0 |
| Blu | 0 |
| Marrone | 0 |
| Viola | 0 |
| Violetto | 0 |

Table 5: Number of figurative meanings attested in GRADIT for the Italian basic colour terms.

9 In this sense it is useful to consider the example of foglio rosa (lit. pink sheet), an Italian certificate printed on pink paper that allows its holder to practise driving a car before obtaining a driving license. The institutionalization of such an expression shows how speakers tend to express the abstract meaning of the document with a reference to its colour: a feature directly connected to the sense of sight.

10 With regard to the cognitive status of the opposition of black and white, Casadei (1996:264) points out Lakoff's opinion, according to which white is related to positive meanings and black to negative ones because of the intuitive experience that darkness (black) implies danger, while light (white) is connected to visibility and safety.

11 To be more explicit, in the case of bianco, for example, the meanings considered are: (i) pale, (ii) clear, (iii) covered by snow, (iv) typical of europoid races (in the case of skin), (v) clean, (vi) pure, (vii) blank, (viii) typical of a Christian association, (ix) typical of anti-revolutionary movements; while the excluded meaning is "of the colour of snow or milk".

6. Conclusion and future works

This work has presented a new contribution to the colour naming debate from the phraseological perspective. The analysis of the presence of basic colour terms in Italian nominal MWEs has shown that colour terms are not equally distributed in this kind of expression, and that the frequency ordering of idiomatic MWEs strongly reproduces the order of the universal hierarchy first proposed by Berlin & Kay. In general we can state that the distribution of colour terms in Italian idioms can be exhaustively explained neither on the basis of Zipf's law on frequency and meanings adapted to MWEs, nor by the consideration that a higher number of conventional figurative meanings for some colour implies a higher number of idioms with the same colour. The fact that the order arising from the distribution of colour terms in idioms follows Berlin & Kay's hierarchy can thus be additional evidence for their conclusions. The idea that colour terms appear in nominal idioms according to the universal hierarchy can also suggest that perceptual preferences related to colours can be seen not only in an evolutionary or crosslinguistic perspective, but also in the linguistic uses within the same language. Finally, this analysis has shown that phraseology and corpus-based studies can also shed new light on the subject.

Future works on this line of research can include the extension of the set of MWEs to those including basic colour terms as the nominal head (e.g. "azzurro cielo", eng. sky blue) or to other grammatical categories such as verbal or adverbial MWEs (e.g. "vedere rosso", eng. to see red; "essere al verde", lit. "to be at green" meaning to be without any money; "di punto in bianco", lit. "of point in white" meaning all at once).
Moreover it is desirable to compare this kind of results in a cross-linguistic perspective, especially between unrelated languages.

Annex – List of idioms by colour term

Nero: abito nero, acque nere, Africa nera, angelo nero, anima nera, aristocrazia nera, bandiera nera, basco nero, bestia nera, borsa nera, borsaro nero, brigate nere, buco nero, caffè nero, camicia nera, carne nera, cintura nera, continente nero, corpo nero, cravatta nera, cronaca nera, effluente nero, eversione nera, febbre nera, fiamme nere, fumarola nera, fumata nera, gabinetto nero, giacca nera, giornata nera, giovedì nero, goccia nera, guelfo nero, humor nero, irraggiamento nero, lavoro nero, libro nero, lista nera, luce nera, magia nera, maglia nera, male nero, maniera nera, mano nera, marciume nero, marea nera, mercato nero, messa nera, morbo nero, morte nera, musica nera, nobiltà nera, numero nero, onda nera, oro nero, pane nero, papa nero, pecora nera, peste nera, pietra nera, polvere nera, pozzo nero, punto nero, scatola nera, settembre nero, specchio nero, tavola nera, tavoletta nera, testa nera, testina nera, umore nero, umorismo nero, uomo nero, vedova nera.
Bianco: acque bianche, albero bianco, arma bianca, arte bianca, bandiera bianca, bianca signora, caffè bianco, camice bianco, cappello bianco, carne bianca, Casa Bianca, circo bianco, clown bianco, colletto bianco, cravatta bianca, cronaca bianca, effluente bianco, elettrodomestici bianchi, fratello bianco, frittura bianca, fumata bianca, globulo bianco, golpe bianco, infarto bianco, libro bianco, luce bianca, lupara bianca, magia bianca, mal bianco, materia bianca, matrimonio bianco, monte bianco, morte bianca, nana bianca, nota bianca, notte bianca, omicidio bianco, pan bianco, perdite bianche, pizza bianca, risultato bianco, rumore bianco, scheda bianca, sciopero bianco, semestre bianco, serie bianca, settimana bianca, sostanza bianca, strada bianca, striscia bianca, telefoni bianchi, terrore bianco, treno bianco, tuta bianca, vedova bianca, voce bianca. Rosso: ala rossa, armata rossa, bandiera rossa, basco rosso, biennio rosso, bollino rosso, brigate rosse, brigatista rosso, calore rosso, camicia rossa, carne rossa, cartellino rosso, clausola rossa, code rosse, croce rossa, debito rosso, disco rosso, febbre rossa, fiamme rosse, filo rosso, gamba rossa, gambe rosse, gambi rossi, gigante rossa, giubba rossa, globulo rosso, guardia rossa, infarto rosso, khmer rosso, libretto rosso, linea rossa, macchia rossa, mal rosso dei suini, mezzaluna rossa, nana rossa, nonna rossa, numero rosso, papa rosso, partito rosso, passaporto rosso, perdite rosse, polpa rossa, punto rosso, serie rossa, soccorso rosso, stella rossa, tappeto rosso, telefono rosso, toghe rosse. 
Verde: anni verdi, archeoastronomia verde, balletto verde, basco verde, benzina verde, berretto verde, biglietto verde, bollino verde, camicia verde, carta verde, croce verde, disco verde, fiamme verdi, libro verde, lira verde, maggese verde, maglia verde, moneta verde, numero verde, onda verde, pasdaran verde, pollice verde, polmone verde, potatura verde, raggio verde, regime verde, tappeto verde, tavolo verde, treno verde, valuta verde, zona verde.
Giallo: bandiera gialla, bocca gialla, cartellino giallo, febbre gialla, fiamme gialle, fumata gialla, maglia gialla, marciume giallo, morbo giallo, nana gialla, oro giallo, pagine gialle, pan giallo, pericolo giallo, pioggia gialla, romanzo giallo, signorina gialla, sindacato giallo, stampa gialla, stella gialla, striscia gialla.
Blu: auto blu, bambino blu, banana blu, basco blu, bollino blu, casco blu, colletto blu, fifa blu, gigante blu, luna blu, macchia blu, morbo blu, parco blu, sangue blu, scettico blu, striscia blu, tuta blu, uomo blu, zona blu.
Azzurro: arma azzurra, camicia azzurra, croce azzurra, fiamme azzurre, maglia azzurra, malattia azzurra, morbo azzurro, nastro azzurro, parco azzurro, partito azzurro, pesce azzurro, pietra azzurra, principe azzurro, sangue azzurro, signorina azzurra, telefono azzurro.
Rosa: balletto rosa, bollino rosa, cartolina rosa, colletto rosa, cronaca rosa, dente rosa di Mummery, fiocco rosa, foglio rosa, maglia rosa, marciume rosa, punto rosa, quote rosa, romanzo rosa, salsa rosa, telefono rosa.
Grigio: corpo grigio, eminenza grigia, lettera grigia, letteratura grigia, marciume grigio, materia grigia, mercato grigio, sostanza grigia.
Arancione: bandiera arancione, contrassegno arancione.
Marrone: cintura marrone.

REFERENCES
ALLEN, G. (1879) - The Colour-Sense. London, Trübner and Company.
BARONCHELLI, A., GONG, T., PUGLISI, A. & LORETO, V. (2010) - Modeling the emergence of universality in color naming patterns. Proceedings of the National Academy of Sciences of the United States of America. 107(6):2403-2407.
BERLIN, B. & KAY, P. (1969) - Basic Color Terms: Their Universality and Evolution. University of California Press.
BORNSTEIN, M. H. (1973a) - The psychophysiological component of cultural difference in color naming and illusion susceptibility. Behavioral Science Notes. 1:41-101.
BORNSTEIN, M. H. (1973b) - Color vision and color naming: A psychophysiological hypothesis of cultural difference. Psychological Bulletin. 80:257-285.
BROWN, R. W. (1976) - Reference. Cognition. 4:125-153.
CALZOLARI, N., FILLMORE, C., GRISHMAN, R., IDE, N., LENCI, A., MACLEOD, C. & ZAMPOLLI, A. (2002) - Towards best practice for multiword expressions in computational lexicons. Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC 2002). Las Palmas, Canary Islands. 1934-40.
CASADEI, F. (1996) - Metafore ed espressioni idiomatiche. Uno studio semantico sull'italiano. Roma, Bulzoni Editore.
CHIARI, I. (2012) - Collocazioni e polirematiche nel lessico musicale italiano. In R. Nikodinovska (ed.). "Lingua, letteratura e cultura italiana". Atti del convegno internazionale 50 anni di studi italiani, Philology Faculty "Blaze Koneski", Skopje. 165-190.
COLLIER, G. A. (1973) - Review of Basic Color Terms. Language. 49:245-248.
COLLIER, G. A. (1976) - Further evidence for universal color categories. Language. 52:884-890.
CONKLIN, H. C. (1955) - Hanunóo Color Categories. Southwestern Journal of Anthropology. 11:339-344.
CONKLIN, H. C. (1973) - Color categorization: Review of Basic Color Terms, by Brent Berlin and Paul Kay. American Anthropologist. 75:931-942.
DE MAURO, T. (2005) - La fabbrica delle parole. UTET.
DURBIN, M. (1972) - Review of Basic Color Terms. Semiotica. 6:257-278.
EVERT, S. (2004) - The Statistics of Word Cooccurrences: Word Pairs and Collocations. Stuttgart, University of Stuttgart.
EVERT, S. & KRENN, B. (2001) - Methods for the qualitative evaluation of lexical association measures. Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics. Toulouse, France. 188-95.
GEIGER, L. (1880) - Contributions to the History of the Development of the Human Race. London, Trübner and Company.
GLADSTONE, W. E. (1858) - Studies on Homer and the Homeric Age. London, Oxford University Press.
GRADIT (1999-2007) - Grande Dizionario Italiano dell'Uso, edited by T. De Mauro. UTET.
HICKERSON, N. (1971) - Review of Berlin and Kay (1969). International Journal of American Linguistics. 37:257-270.
HJELMSLEV, L. (1968) [1943] - I fondamenti della teoria del linguaggio. Introduction and translation by Giulio C. Lepschy. Torino, Einaudi. Originally published as "Omkring sprogteoriens grundlaeggelse", Copenhagen.
ITTENTEN (2010) - Italian Web Corpus available at Sketch Engine. www.sketchengine.co.uk.
KAY, P. & MAFFI, L. (1999) - Color Appearance and the Emergence and Evolution of Basic Color Lexicons. American Anthropologist. 101:743-760.
KAY, P. & MCDANIEL, C. K. (1978) - The linguistic significance of the meanings of basic color terms. Language. 54:610-646.
KILGARRIFF, A., RYCHLY, P., SMRZ, P. & TUGWELL, D. (2004) - The Sketch Engine. Proceedings of EURALEX 2004. Lorient, France. 105-116.
KILGARRIFF, A. (2006) - Collocationality (and how to measure it). Proceedings of the 12th EURALEX International Congress. E. Corino, C. Marello and C. Onesti (eds.). Torino, Edizioni dell'Orso, Alessandria. 997-1004.
LAKOFF, G. & JOHNSON, M. (1980) - Metaphors we live by. Chicago, The University of Chicago Press.
LUCY, J. A. (1996) - The scope of linguistic relativity. In J. J. Gumperz and S. C. Levinson (eds.). "Rethinking Linguistic Relativity". Cambridge University Press.
MAGNUS, H. (1880) - Untersuchungen über den Farbensinn der Naturvölker. Jena, Fraher.
NIDA, E. A. (1959) - Principles of translation as exemplified by Bible translating. In Reuben A. Brower (ed.). "On Translation". Cambridge, Harvard University Press. 11-31.
PAISÀ (2012) - Corpus dell'Italiano, a joint project of the Università di Bologna (S. Scalise, C. Borghetti), CNR Pisa (V. Pirrelli, A. Lenci, F. Dell'Orletta), Accademia Europea di Bolzano (A. Abel, C. Culy, H. Dittmann, V. Lyding) and Università di Trento (M. Baroni, M. Brunello, S. Castagnoli, E. Stemle). www.corpusitaliano.it.
PHILIP, G. S. (2003) - Collocation and Connotation: A corpus-based investigation of Colour Words in English and Italian. Birmingham, University of Birmingham.
RAMISCH, C., VILLAVICENCIO, A. & BOITET, C. (2010) - mwetoolkit: a Framework for Multiword Expression Identification. Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010). Valletta, Malta.
RAY, V. F. (1952) - Techniques and Problems in the Study of Human Color Perception. Southwestern Journal of Anthropology. 8:251-259.
RAY, V. F. (1953) - Human Color Perception and Behavioral Response. Transactions. New York Academy of Sciences (ser. 2). 16:98-104.
REPUBBLICA (2004) - Corpus dell'italiano. Described in M. Baroni, S. Bernardini, F. Comastri, L. Piccioni, A. Volpi, G. Aston, M. Mazzoleni (2004) - "Introducing the la Repubblica Corpus: A large, annotated, TEI(XML)-compliant corpus of newspaper Italian". Proceedings of LREC 2004.
SAG, I., BALDWIN, T., BOND, F., COPESTAKE, A. & FLICKINGER, D. (2002) - Multiword expressions: A pain in the neck for NLP. Proceedings of the 3rd CICLing (CICLing-2002), vol. 2276/2010 of LNCS, Mexico City, Mexico, 1-15.
SAPIR, E. (1921) - Language. New York, Harcourt, Brace.
SAUNDERS, B. (1995) - Disinterring Basic Color Terms: a study in the mystique of cognitivism. History of the Human Sciences. 8(7):19-38.
SAUNDERS, B. (1997) - Are there non-trivial constraints on colour categorization? Behavioral and Brain Sciences. 20:167-228.
SHEPARD, R. (1992) - The perceptual organization of colors. In J. Barkow, L. Cosmides and J. Tooby (eds.). "The Adapted Mind". Oxford, Oxford University Press.
SINCLAIR, J. (1991) - Corpus, Concordance, Collocation. Oxford, Oxford University Press.
WHORF, B. L. (1956) [1940] - Science and Linguistics. In John B. Carroll (ed.). "Language, Thought and Reality: The Collected Papers of Benjamin Lee Whorf". Cambridge, Massachusetts, MIT Press. Originally published in Technology Review. 42:229-231, 247-248.
ZIPF, G. K. (1949) - Human Behavior and the Principle of Least Effort. Addison-Wesley Press.