What is linguistics?

The scientific study of human language, including:

   Phonetics (physical nature of speech)
   Phonology (use of sounds in language)
   Morphology (word formation)
   Syntax (sentence structure)
   Semantics (meaning of words & how they combine into sentences)
   Pragmatics (effect of situation on language use)

Or, carving it up another way:

   Theoretical linguistics (pure and simple: how languages work)
   Historical linguistics (how languages got to be the way they are)
   Sociolinguistics (language and the structure of society)
   Psycholinguistics (how language is implemented in the brain)
   Applied linguistics (teaching, translation, etc.)
   Computational linguistics (computer processing of human language)

Some linguists also study sign languages, non-verbal communication, animal communication, and other topics besides spoken language.

Does linguistics tell people how to speak or write properly?

No. Linguistics is descriptive, not prescriptive. Linguistics can often supply facts which help people arrive at a recommendation or value judgement, but the recommendation or value judgement is not part of linguistic science itself.

What are some good books about linguistics?

(These are cited by title and author only. Full ordering information can be obtained from Books in Print, available at most bookstores and at even the smallest public libraries.)

Cambridge Encyclopedia of Language, by David Crystal (1987), is a good place to start if you are new to this field.

Language, by Edward Sapir (1921), is a readable survey of linguistics that is still worthwhile despite its age.

Some good surveys of linguistics:

   o An Introduction to Language - Fromkin and Rodman (1974)
   o The Social Art - Ronald Macaulay (1995)
   o The Language Web - Jean Aitchison
   o Language: The Basics - R.L. Trask (1996)

Cambridge Textbooks in Linguistics (a series) consists of good, modestly priced introductions to all the areas of linguistics.
Any encyclopedia will give you basic information about widely studied languages, alphabets, etc. Of course, you Web geeks don't know from books.

How did language originate?

Nobody knows. Very little evidence is available.

What is known about prehistoric language?

Quite a lot, if by 'prehistoric' you'll settle for maybe 2000 years before the development of writing. (Language is many thousands of years older than that.)

Languages of the past can be recovered by comparative reconstruction from their descendants. The comparative method relies mainly on pronunciation, which changes very slowly and in highly systematic ways. If you apply it to French, Spanish, and Italian, you reconstruct late colloquial Latin with a high degree of accuracy; this and similar tests show us that the method works.

Also, if you use the comparative method on unrelated languages, you get nothing. So comparative reconstruction is a test of whether languages are related (to a discernible degree).

The ancient languages Latin, Greek, Sanskrit, and several others form a group known as Indo-European. Comparative reconstruction from them gives a language called Proto-Indo-European which was spoken around 2500 B.C. Many Indo-European words can be reconstructed with considerable confidence (e.g., *ekwos 'horse'). The grammar was similar to Homeric Greek or Vedic Sanskrit.

Similar reconstructions are available for some other language families, though none has been as thoroughly reconstructed as Indo-European.

How do linguists decide that languages are related?

When linguists say that languages are related, they're not just remarking on their surface similarity; they're making a technical statement or claim about their history-- namely, that they can be regularly derived from a common parent language.

Proto-languages are reconstructed using the comparative method. The first stage is to inspect and compare large amounts of vocabulary from the languages in question.
Where possible we compare entire paradigms (sets of related forms, such as those of the present active indicative in Latin), rather than individual words. The inspection should yield a set of regular sound correspondences between the languages. By regular, we mean that the same correspondences are consistently observed in identical phonetic environments. Finally, sound changes are formulated: language-specific rules which specify how the original common form changed in order to produce those observed in each descendant language.

Applying the comparative method to the Romance languages, we might find

               Spanish     Sard        French      Italian
   'I sense'   /sjEnto/    /sento/     /sa~/       /sento/
   'sleep'     /suEn^o/    /sonnu/     /som/       /sonno/
   'hundred'   /sjEnto/    /kentu/     /sa~/       /tSento/
   'five'      /sinko/     /kimbe/     /sE~k/      /tSinkwe/
   'I run'     /korro/     /kurro/     /kur/       /korro/
   'story'     /kuEnto/    /kontu/     /ko~t@/     /(rak)konto/

and hundreds of similar examples. We see some correspondences:

         Sard    French   Italian   Spanish
   (1)   /s/     /s/      /s/       /s/
   (2)   /k/     /s/      /tS/      /s/
   (3)   /k/     /k/      /k/       /k/

but they seem to conflict: does Sard /k/ correspond to Spanish /s/ or /k/? Does French /s/ correspond to Italian /s/ or /tS/? In fact we will find that the correspondences are regular, once we observe that (2) is seen before a front vowel (i or e), while (3) is seen in other environments. Alternations within paradigms, such as It. /diko/ 'I say' vs. /ditSe/ 'says', will help us make and confirm such generalizations.

We may interpret these now-regular correspondences as indicating that an initial /s/ in the proto-language has been retained in all four languages, and likewise initial /k/ in Sard; but that /k/ changed to /s/ or /tS/ in the other languages in the environment of a front vowel.

Actually, this process is iterative. For instance, at first glance we might think that German haben and Latin habere 'have' are obvious cognates.
However, after noting the regular correspondence of German h to Latin c, we are forced to change our minds, and look to capere 'seize' as a better cognate for haben.

Thus, similarity of words is only a clue, and perhaps a misleading one. Linguists conclude languages are related, and thus derive from a common ancestor, only if they find regular sound correspondences between them.

To complicate things, derivations may be obscured by irregular changes, such as dissimilation, borrowing, or analogical change. For instance, the normal development of Middle English kyn is 'kine', but this word has been largely replaced by 'cows', formed from 'cow' (ME cou) on the analogy of word-pairs like stone : stones. Analogy often serves to reduce irregularities in a language (here, an unusual plural).

Borrowing refers to taking words from other languages, as English has taken 'search' and 'garage' from French, 'paternal' from Latin, 'anger' from Old Norse, and 'tomato' from Nahuatl.

How do we know that English doesn't derive from French or Nahuatl? The latter case is easy to eliminate: regular sound correspondences can't be set up between English and Nahuatl. But English has borrowed so heavily from French that regular correspondences do occur. Here, however, we find that the French borrowings are thickest in government, legal, and military domains; while the basic vocabulary (which languages borrow less frequently) is more akin to German. Paradigmatic correspondences like sing/sang/sung vs. singen/sang/gesungen also help show that the Germanic words are inherited, the French ones borrowed.

If you want more, Theodora Bynon's Historical Linguistics (1977) is very good, and not long; R.L. Trask's Historical Linguistics (1996) is very readable and covers more recent developments. Anthony Fox's Linguistic Reconstruction: An Introduction to Theory and Method (1995) concentrates on the reconstruction process itself, and assumes some knowledge of linguistics.
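
The correspondence-finding step described for the Romance data can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how working historical linguists proceed: the six cognate sets are read as Spanish, Sard, French, Italian; the optional (rak) prefix of the Italian 'story' form is dropped; and the Sard form is used as a stand-in for the proto-form when deciding whether a front vowel follows. All function and variable names are invented for the example.

```python
from collections import defaultdict

# Cognate sets in the FAQ's ASCII transcription.
# Assumptions: column reading Spanish/Sard/French/Italian; Italian
# 'story' is given without its optional (rak) prefix.
COGNATES = {
    "I sense": {"Spanish": "sjEnto", "Sard": "sento", "French": "sa~", "Italian": "sento"},
    "sleep":   {"Spanish": "suEn^o", "Sard": "sonnu", "French": "som", "Italian": "sonno"},
    "hundred": {"Spanish": "sjEnto", "Sard": "kentu", "French": "sa~", "Italian": "tSento"},
    "five":    {"Spanish": "sinko", "Sard": "kimbe", "French": "sE~k", "Italian": "tSinkwe"},
    "I run":   {"Spanish": "korro", "Sard": "kurro", "French": "kur", "Italian": "korro"},
    "story":   {"Spanish": "kuEnto", "Sard": "kontu", "French": "ko~t@", "Italian": "konto"},
}

ORDER = ("Sard", "French", "Italian", "Spanish")  # order used in the text
FRONT_VOWELS = {"i", "e", "E"}


def initial(word):
    """Return the first segment, treating the affricate tS as one unit."""
    return "tS" if word.startswith("tS") else word[0]


def environment(forms):
    """Classify a cognate set by the segment after the Sard initial.

    Assumption: Sard is used as a proxy for the proto-form, purely
    for illustration.
    """
    sard = forms["Sard"]
    following = sard[len(initial(sard)):][:1]
    return "before front vowel" if following in FRONT_VOWELS else "elsewhere"


def correspondences(cognates):
    """Map each initial-consonant pattern to the environments it occurs in."""
    table = defaultdict(set)
    for forms in cognates.values():
        pattern = tuple(initial(forms[lang]) for lang in ORDER)
        table[pattern].add(environment(forms))
    return dict(table)


if __name__ == "__main__":
    for pattern, envs in correspondences(COGNATES).items():
        print(dict(zip(ORDER, pattern)), sorted(envs))
```

On this small sample the k/s/tS/s pattern turns up only before a front vowel and the k/k/k/k pattern only elsewhere, which is what lets both be traced back to a single proto-sound. Real work needs far more data and has to weed out borrowings and analogical forms first.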

What is Noam Chomsky's transformational grammar all about?

Several things; it really comprises several layers of theory:

(1) The hypothesis that much of the structure of human language is inborn ('built-in') in the human brain, so that a baby learning to talk only has to learn the vocabulary and the structural 'parameters' of his native language -- he doesn't have to learn how language works from scratch.

The main evidence consists of:

   o The fact that babies learn to talk remarkably well from what seems to be inadequate exposure to language; it is claimed that babies acquire some rules of grammar that they could never have 'learned' from what is available to them, if the structure of language were not partly built-in.
   o The fact that the structure of language on different levels (vocabulary, ability to connect words, etc.) can be lost by injury to specific areas of the brain.
   o The fact that there are unexpected structural similarities between all known languages.

For detailed exposition see Cook, Chomsky's Universal Grammar (1988), Newmeyer, Grammatical Theory: Its Limits And Possibilities (1983), and Pinker, The Language Instinct (1994).

This theory is by no means accepted by all linguists, though many would agree that some core part of language is innate.

(2) The hypothesis that to adequately describe the grammar of a human language, you have to give each sentence at least two different structures, called deep structure and surface structure, together with rules called transformations that relate them.

This is hotly debated. Some theories of grammar use two levels and some don't. Chomsky's original monograph, Syntactic Structures (1957), is still well worth reading; this is what it deals with.

(3) Chomsky's name is associated with specific flavors of transformational grammar. The model elaborated over the last few years is called GB (government and binding) theory, which however has been heavily modified by the approach described in the recent The Minimalist Program (1995).

(4) Some people think Chomsky is the source of the idea that grammar ought to be viewed with mathematical precision. (Thus there are occasional vehement anti-Chomsky polemics such as The New Grammarian's Funeral, which are really polemics against grammar per se.)

Although Chomsky contributed some valuable techniques, grammarians have always believed that grammar was a precise, mechanical thing. They are highly divided, however, on the nature and function of those mechanisms!

What is a dialect?

A dialect is any variety of a language spoken by a specific community of people. Most languages have many dialects.

Everyone speaks a dialect. In fact everyone speaks an idiolect, i.e., a personal language. (Your English language is not quite the same as my English language, though they are probably very, very close.) A group of people with very similar idiolects are considered to be speaking the same dialect. Some dialects, such as Standard American English, are taught in schools and used widely around the world. Others are very localized.

Localized or uneducated dialects are not merely failed attempts to speak the standard language. William Labov and others have demonstrated, for example, that the speech of inner-city blacks has its own intricate grammar, quite different in some ways from that of Standard English.

It should be emphasized that linguists do not consider some dialects superior to others, though speakers of the language may do so; and linguists do study people's attitudes toward language, since these have a strong effect on the development of language.

Linguists call varieties of language dialects if the speakers can understand each other and languages if they can't.
For example, Irish English and Southern American English are dialects of English, but English and German are different languages (though related).

This criterion is not always as easy to apply as it sounds. Intelligibility may vary with familiarity and interest, or may depend on the subject. A more serious problem is the dialect continuum: a chain of dialects such that any two adjoining dialects are mutually intelligible, but the dialects at the ends are not. Speakers of Belgian Dutch, for instance, can't understand Swiss German, but between them there lies a continuum of mutually intelligible dialects.

Sometimes the use of the terms 'language' or 'dialect' is politically motivated. Norwegian and Danish (being mutually intelligible) would count as dialects of the same language by this criterion, but are considered separate languages because of their political independence. By contrast, Mandarin and Cantonese, which are mutually unintelligible, are often referred to as 'dialects' of Chinese, due to the political and cultural unity of China, and because they share a common written language.

At this point we usually quote Max Weinreich: "A language is a dialect with an army and a navy."

Because of such problems, some linguists reject the mutual intelligibility criterion; but they do not propose to return to arguments on political and cultural grounds. Instead, they prefer not to speak of dialects and languages at all, but only of different varieties, with varying degrees of mutual intelligibility.

How can I represent phonetic symbols in ASCII?

This summary is presented for convenience only, and is not intended to forestall discussion of alternative systems.

     blb--   -lbd--  --dnt-- --alv-- -rfx-  -pla-- --pal---  --vel-- ----uvl-----
nas  m       M       n[      n       n.            n^        N       n"
stp  p b             t[ d[   t d     t. d.         c J       k g     q G
frc  F V     f v     T D     s z     s. z.  S Z    C C<vcd>  x Q     X g"
apr          r<lbd>  r[      r       r.            j         j<vel>  g"
lat                  l[      l       l.            l^        L
trl  b<trl>                  r<trl>                                  r"
flp                          *       *.
ejc  p`              t[`     t`                    c`        k`      q`
imp  b`              d`      d`                    J`        g`      G`
clk  p!              t!      c!                    c!        k!

     ---- lbv ----    --phr--    ---glt---
nas  n<lbv>
stp  t<lbv> d<lbv>               ?
frc  w<vls> w         H H<vcd>   h<?>
apr  w                           h

alv lat frc:  s<lat> z<lat>
lat flp:      *<lat>
lat clk:      l!

     ------- unr -------     ------- rnd -------     unr
     fnt     cnt     bck     fnt     cnt     bck     cnt rzd
hgh  i       i"      u-      y       u"      u
smh  I                       I.              U
umd  e       @<umd>  o-      Y               o       R<umd>
mid          @                       @.              R
lmd  E       V"      V       W       O"      O
low  &       a       A       &.      a.      A.

Diacritics:
        Vowels:       Consonants:
+ =     ad hoc diacritic
~       nasalized     velarized             [    dental
:       long                                !    click
-       unrounded     syllabic              <H>  pharyngealized
.       rounded       retroflex             <h>  aspirated
`                     ejective/implosive    <o>  unexploded or voiceless
^                     palatal               <r>  rhotacized
;                     palatalized           <w>  labialized
"       centered      uvular                <?>  murmured

Other symbols:
$ %      ad hoc segment
[]       phonetic transcription
//       phonemic transcription
#        syllable or word boundary
space    word/segment separator
' ,      primary and secondary stress
0-9      tones

What are phonemes and why's it so hard to lose a foreign accent? [--markrose]

The sounds (phones) humans can make are infinite; there's (almost always) a continuum of phones between any two phones. In any one language, however, phones are grouped into 20 to 60 or so discrete groups of sounds called phonemes. The range of variation for each phoneme is discounted by speakers and hearers of the language, who perceive the entire range as "the same sound."

The diversity of phones, and their grouping into phonemes, can be clearly seen on this chart from William Labov's Principles of Linguistic Change (1994). The chart is a graph of formant frequencies F1 against F2 for the main vowels of fifty words as spoken by a single person-- in effect, a plot of fifty actual phones. (The words on the chart-- beat, bait, etc.-- are not the words being spoken, but just examples of words with those vowel sounds.)

(Most of the sounds plotted are diphthongs, which are glides between two sounds; this accounts for some of the overlaps on the diagram (and for the little arrows on the symbols). For instance, the sounds Labov calls ay and aw start in about the same place, but ay heads 'northwest' toward [i] and aw heads 'northeast' toward [u].)

The English phoneme /p/ has two phonetic realizations or allophones: aspirated [ph] beginning a word and non-aspirated [p] elsewhere. But since the two types of /p/ never distinguish one word from another, speakers of English generally don't even perceive the difference. (Linguists write phonemic transcriptions between /slashes/, and phonetic transcriptions in [brackets].)

If we can find two words with different meaning but only one difference in sound between them-- a minimal pair-- then we've found distinct phonemes; e.g. /p/ and /b/ in English 'pit' and 'bit'. If two sounds never occur in the same phonetic environment (e.g. English [p] and [ph])-- if they're in complementary distribution-- then they're probably allophones of a single phoneme.

Other languages do not divide up the phonetic space in the same way. For instance, /p/ and /ph/ are separate phonemes in Mandarin Chinese (as in /pa1/ 'eight' and /pha1/ 'flower'). And the vowels of late and let, phonemes in English, are allophones of a single phoneme /e/ in Spanish.

We're trained from childhood to make the phonetic distinctions our language uses to keep its phonemes apart, and to ignore those that lie within phonemes.
Learning to make different distinctions in a foreign language is quite difficult-- usually harder than making new sounds our native language lacks entirely. We'll continue to have an accent in the new language so long as we hear its sounds through our native language's phonemic filter.
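
The two tests mentioned above, minimal pairs and complementary distribution, are mechanical enough to sketch in Python. This is a toy illustration only: the five-word lexicon, its segment-by-segment transcriptions (in the ASCII scheme, with I and & for the vowels of 'pit' and 'pat'), the environment labels, and the function names are all made up for the example, and a real analysis would need a far larger and more carefully described corpus.

```python
from itertools import combinations

# Hypothetical toy lexicon: word -> phonemic transcription as segments.
LEXICON = {
    "pit": ("p", "I", "t"),
    "bit": ("b", "I", "t"),
    "pat": ("p", "&", "t"),
    "bat": ("b", "&", "t"),
    "spit": ("s", "p", "I", "t"),
}

# Hypothetical observations of phones and the environments they occur in.
OBSERVATIONS = [
    ("ph", "word-initial"),
    ("p", "after s"),
    ("p", "word-final"),
    ("b", "word-initial"),
    ("b", "word-final"),
]


def minimal_pairs(lexicon):
    """Return (word1, word2, (seg1, seg2)) for words differing in one segment."""
    pairs = []
    for (w1, s1), (w2, s2) in combinations(sorted(lexicon.items()), 2):
        if len(s1) != len(s2):
            continue  # only same-length words are compared in this sketch
        diffs = [(a, b) for a, b in zip(s1, s2) if a != b]
        if len(diffs) == 1:
            pairs.append((w1, w2, diffs[0]))
    return pairs


def in_complementary_distribution(observations, phone_a, phone_b):
    """True if both phones are attested and never share an environment."""
    envs_a = {env for phone, env in observations if phone == phone_a}
    envs_b = {env for phone, env in observations if phone == phone_b}
    return bool(envs_a) and bool(envs_b) and envs_a.isdisjoint(envs_b)


if __name__ == "__main__":
    for w1, w2, (a, b) in minimal_pairs(LEXICON):
        print(f"{w1} / {w2}: /{a}/ contrasts with /{b}/")
    print(in_complementary_distribution(OBSERVATIONS, "ph", "p"))
    print(in_complementary_distribution(OBSERVATIONS, "ph", "b"))
```

In this toy data 'pit' and 'bit' come out as a minimal pair, so /p/ and /b/ count as separate phonemes, while [ph] and [p] never share an environment and so look like allophones of one phoneme. Complementary distribution is only suggestive; the source itself says "probably".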