LING 001 Introduction to Linguistics
Spring 2010

Writing systems II
Chinese writing system
Reading
Mar. 31

Origins of Chinese characters
• Legend has it that Cangjie, a historian-official who lived in the time of Huangdi (the Yellow Emperor), created Chinese characters, inspired by such natural objects as the sun, the moon, the stars, and the footprints of animals and birds.
• Historical records and archeological finds suggest that Cangjie might, in fact, have been the first person to study and index the Chinese characters.

Secrets discovered in Medicine
• In 1899, Chinese scholar Wang Yirong discovered some peculiar symbols on “dragon bones”, an ingredient of traditional Chinese medicine.
• The symbols were Jiaguwen, or script written on oracle bones (tortoise shells and animal bones), and these particular bones were found to date back 3,000 years.

Evolution of Chinese characters
• Oracle bone script
• Bronze script
• Small seal script
• Clerical script
• Standard script
• Grass script
• Running script
• Simplified script

Structure of Chinese characters
• Pictograms: 日 ‘sun, day’, 人 ‘person’
• Indicatives: 上 ‘up’, 下 ‘down’, 凹 ‘concave’, 凸 ‘convex’
• Semantic-semantic compounds: 休 ‘rest’ = 人 + 木 (person + tree)
• Semantic-phonetic compounds: 蝗 ‘locust’ = 虫 + 皇 (INSECT + huang2)

Structure of Chinese characters
• Most semantic-phonetic compounds (also called phonetic compounds) have a left-right structure, with the semantic radical on the left and the phonetic radical on the right.

Chinese Rebus: Phonetic Loans
[Figure: rebus “I can see you”]
• 來 lái = wheat
• lái = come
• 來 = wheat/come
• 麥 = wheat
• 來 = come

Chinese characters

                          1400-1100 BC   200 AD         18th century    7,000 frequent characters
Pictogram                 227 (23%)      364 (4%)
Indicative                20 (2%)        125 (1%)
Semantic-semantic         396 (41%)      1,167 (13%)
(three types combined)                                  1,500 (3%)      19%
Semantic-phonetic         334 (34%)      7,697 (82%)    47,141 (97%)    81%

Note: The 3,500 most frequently used characters in Modern Chinese would cover 99.48% of a 2-million-character corpus; the first 2,500 characters alone account for 97.97%.

Computer coding of characters
• Decimal (base 10): 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
  e.g., 107 = 1*10^2 + 0*10^1 + 7*10^0
• Binary (base 2): 0, 1
  e.g., 1101011 = 1*2^6 + 1*2^5 + 1*2^3 + 1*2^1 + 1*2^0 (107)
• Bit: the basic unit of information in a computer; it represents either 1 or 0, on or off.
• Byte: a sequence of eight bits, used as a fundamental unit in modern computers. Also called an octet.
• Characters are represented by binary numbers in a computer.

Computer coding of characters
• American Standard Code for Information Interchange (ASCII)
  • a 7-bit character set: values range from 0 to 127
  • 0-31: control codes and formatting (Escape, Tab, ...)
  • 32-126: space, punctuation, digits, and English letters:
    !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
    e.g., “A” = 1000001 (65)
• ISO-8859-1
  • It uses 8 bits, and contains ASCII as a subset.
  • In addition to the ASCII characters, ISO-8859-1 contains various accented characters and other letters needed for writing the languages of Western Europe, and some special characters.

Computer coding of characters
• Some languages (Chinese, Japanese, Korean, etc.) have more than 256 characters.
• Encoding standards for these languages use sequences of bytes for characters.
• Because different standards use different numbers of bytes, the computer can’t tell whether a given byte is a whole character or part of a character; corruption of one byte can corrupt the whole data stream.
• Unicode: a 21-bit encoding space with 1,114,112 code points (U+0000 to U+10FFFF); 95,156 code point values assigned to characters in Unicode 3.2; 879,626 code point values reserved for future character assignments.

UTF-8
• For ASCII characters, the 21-bit value is truncated to 8 bits (a single byte).
• For other characters, the 21-bit value is turned into a sequence of two, three, or four 8-bit values.

Phonology in reading development
• Phonological processes are generally considered to be important for developing word reading skills.
• “The best predictor of reading difficulty in kindergarten or first grade is the inability to segment words and syllables into constituent sound units (phonemic awareness).” (Lyon, 1995)
• “Reading and phonemic awareness are mutually reinforcing: Phonemic awareness is necessary for reading, and reading, in turn, improves phonemic awareness still further.” (Shaywitz, 2003)

Dyslexia
• Phonological deficits are the most significant and consistent cognitive marker of dyslexic children.
• Auditory Analysis Test: it asks a child to segment words into their underlying phonological units and then to delete specific phonemes from the words, e.g., say ‘block’ without ‘buh’.
• Even in high school students, phonological awareness was the best indicator of reading ability.

Phonological activation in reading
• Skilled readers activate phonological representations in reading.
• Lexical decision: In a lexical decision paradigm, subjects are presented with a written stimulus and are asked whether or not the stimulus is a word of their language. A typical finding is that participants take more time to reject pseudohomophone foils than control foils.
• Naming: In the naming paradigm, subjects are again presented with a written stimulus, but this time they are asked to pronounce the stimulus aloud, that is, to “name” the word that is on the screen. A typical finding is that participants take less time to name the word if a homophone (prime) is presented before the target word.

Phonological activation in reading
• Eye movement: Target words were read faster (shorter fixation duration) when a phonologically similar word, e.g., a homophone, was presented briefly at the onset of fixation on the target region (Rayner et al. 1995).
• The prime for a given target (e.g., beach) was either identical to the target (beach), a phonologically similar word (the homophone beech), a visually similar nonhomophone (bench), or a dissimilar word (noise).
• The key comparison is between fixation times on the target when it was preceded by the homophone versus the visually similar word.

Phonological activation in reading
• ERP (event-related potential): Target words had smaller ERPs (averaged electrical activity in the brain) when a phonologically similar word or syllable was presented before the target word (Ashby 2010).

Phonological activation in reading
• The morphemic nature of Chinese writing leads easily to the assumption of a close connection between graphic form and meaning. Characters are not alphabetic.
• On average, 11 characters share a single pronunciation if tone is disregarded; there are about four homophones for each character if tone is considered.

石室诗士施氏, 嗜狮, 誓食十狮。
氏时时适市视狮。
十时, 适十狮适市。
是时, 适施氏适市。
氏视是十狮, 恃矢势, 使是十狮逝世。
氏拾是十狮尸, 适石室。
石室湿, 氏使侍拭石室。
石室拭, 氏始试食是十狮。
食时, 始识是十狮, 实十石狮尸。
试释是事。

Phonological activation in reading
• As in English, readers of Chinese take less time to name the target word if a homophone (prime) is presented before it (Perfetti & Tan 1998).
• Graphic information begins the identification process and is the first to show a priming effect.
• Phonological information precedes semantic information in primed naming.

Phonological activation in reading
• An analysis of 19 published brain mapping studies (fMRI) of phonological processing in reading, six with Chinese and 13 with alphabetic languages, found significant differences between languages (Tan et al. 2005).
• The left middle frontal gyrus is responsible for addressed phonology in Chinese.
• Left temporoparietal regions mediate assembled phonology in alphabetic languages.
• More on language and brain later.
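
Worked example: computer coding of characters
The character-coding slides (place-value expansion, the ASCII code for “A”, and the UTF-8 byte sequences) can be checked with a short Python sketch. This is an illustration under stated assumptions, not part of the original lecture: the helper names `place_value`, `to_binary`, and `utf8_encode` are hypothetical, and the hand-written encoder is only meant to show how a code point is split into one to four bytes; in practice Python's built-in `str.encode("utf-8")` does the same job.

```python
def to_binary(n: int) -> str:
    """Return the base-2 digits of a non-negative integer as a string."""
    return format(n, "b")


def place_value(digits: str, base: int) -> int:
    """Sum digit * base**position, the expansion shown on the binary slide."""
    total = 0
    for position, digit in enumerate(reversed(digits)):
        total += int(digit) * base ** position
    return total


def utf8_encode(code_point: int) -> bytes:
    """Hand-rolled UTF-8 encoder for one code point (illustration only)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if code_point < 0x80:        # fits in 7 bits: one byte, same as ASCII
        return bytes([code_point])
    if code_point < 0x800:       # fits in 11 bits: two bytes
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # fits in 16 bits: three bytes
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),          # up to 21 bits: four bytes
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


if __name__ == "__main__":
    # Decimal and binary expansions of the same number.
    print(place_value("107", 10), place_value("1101011", 2))
    # ASCII code of "A" in decimal and binary.
    print(ord("A"), to_binary(ord("A")))
    # One-, two-, three-, and four-byte characters; chr(0x20000) is a rare CJK character.
    for ch in ["A", "é", "來", chr(0x20000)]:
        encoded = utf8_encode(ord(ch))
        # The hand-rolled result should agree with Python's built-in encoder.
        print(f"U+{ord(ch):04X}", len(encoded), "byte(s)", encoded == ch.encode("utf-8"))
```

An ASCII character such as “A” comes out as a single byte, while a Chinese character such as 來 needs three bytes, which is why a program has to know the encoding before it can tell where one character ends and the next begins.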