IPA and Unicode

I have read the IPA handbook and looked at the IPA and Unicode sites on the Internet in order to find out what the recommendations are for coding IPA in Unicode. The IPA handbook includes a list (Appendix 2) of computer codings of the symbols. These codings are current as of 1998. Since then the IPA chart has been revised only once, in 2005. In this revision only one symbol was introduced (the voiced labiodental tap or flap) and none were removed.

The IPA handbook, p. 31-32: “One problem for those devising IPA character sets which has hindered the interchangeability of data containing phonetic symbols was the lack of an agreed standard coding for the symbols. The International Phonetic Association, through its Workgroup on Computer Coding, has worked with the International Standards Organization in its project to set up a universal character set (UCS [=Universal Character Set = Unicode]) for all alphabets. An agreed set of UCS 16-bit codes is included in the list in appendix 2.”

I would read this as a recommendation from the International Phonetic Association to use Unicode as a standard for coding the IPA symbols.

On p. 165 in Appendix 2 there is a note: “The publication of these lists of coding assignments should not be construed as an endorsement by the IPA of every character in the list, but as a convenient reference to the location of any potential character in the coding tables as currently constituted.”

This would mean that other codings of the characters are possible, too. Still, I think that the lists in the handbook are a good basis for the processing type ‘strict IPA’ in our web application.

The latest version of the handbook is from 1999, which means that the recommendations may have changed since then. Because of this I also looked at the website of the association (http://www.langsci.ucl.ac.uk/ipa/). Under ‘Alphabet’ (http://www.langsci.ucl.ac.uk/ipa/ipachart.html) there is a link ‘IPA and Unicode’.
This link goes directly to the site of the Unicode Consortium (www.unicode.org), to the sub-site ‘Links to Unicode Resources’ (http://www.unicode.org/resources/index.html), which has a further link to ‘Linguistics and Script Specialty Sites’. There, a link titled ‘International Phonetic Alphabet in Unicode’ leads to a page authored by John Wells (emeritus professor of Phonetics in the University of London, formerly secretary of the International Phonetic Association and editor of its Journal, in 2003 elected as its president). This page contains a list of the Unicode codings of the IPA symbols. Since this list is linked from the IPA site via the Unicode site, I have treated it as a recommendation and compared it with the list in the handbook. Here are my conclusions.

Comparison of the handbook and John Wells’ website
[comments within brackets refer to the attached list of codings]

Vowels and consonants

The two lists are almost completely identical.

Differences:
- U+025D (rhotacized open-mid central vowel) is present in Wells’ list but is not listed in the handbook. The symbol cannot be found in the IPA charts. The same sound can be written as the vowel symbol + the diacritic for rhoticity. [excluded]
- There are two affricates in Wells’ list. The handbook lists a number of affricates, but all of them carry the comment that they have been superseded by the use of two symbols instead of the ligature. The updated IPA charts clearly state that “Affricates and double articulations can be represented by two symbols joined by a tie bar if necessary.” [affricates excluded]

Comments:
- The symbol added to the chart in 2005 (voiced labiodental tap or flap) is missing in both lists. The code of this symbol is U+2C71. [included]
- ‘g’ can be coded as U+0261 or U+0067 according to both lists.

Diacritics and suprasegmental symbols

The lists are mostly identical.
Differences:
- Wells’ list includes three symbols which are found neither in the handbook nor in the latest IPA charts (U+2192, U+02B1, U+02B4). [excluded]
- A number of intonational and suprasegmental symbols, as well as the lateral and nasal release, are missing in Wells’ list. [included]
- The handbook clearly gives the wrong code for ‘tie bar below’. The handbook gives U+203F, which is not a combining mark but a punctuation mark. Wells gives U+035C, which is a combining mark and should be the right symbol. [U+035C]
- The handbook gives two alternative codings for the diacritics ‘lowered’ (U+031E/U+02D5) and ‘raised’ (U+031D/U+02D4). Wells only gives U+031D and U+031E, which are combining diacritical marks, while U+02D4 and U+02D5 are spacing modifier letters. All other diacritics are combining marks. For some diacritics where spacing modifier letters were used up to 1989 (for example, U+02D6) the handbook gives clear recommendations to use the combining mark. It is unclear to me why two alternatives are maintained for these two symbols. [U+031D and U+031E]

Comments:
- Some intonational symbols (contours) in the chart are missing in Unicode.
- Three symbols that are part of the chart are missing in both lists (U+1DC4, U+1DC5, U+1DC8). [included]

General comments

- Length (of vowels and consonants) can be marked in IPA either by using the length mark or by writing double symbols (p. 22 in the handbook).
- Some diacritics have two different codings depending on whether they are placed above or below the symbol (tie bar, voiceless).
- There is also a set of symbols for disordered speech. I guess we are not planning to include these symbols in the web application, since the purpose is to analyze dialects and not disordered speech.
- How are the tone symbols treated by the Levenshtein algorithm?
The tone symbols should be included, especially since the IPA handbook states that most of the languages in the world have tone distinctions.

- Principle 4(a) of the IPA: “When two sounds occurring in a given language are employed for distinguishing one word from another, they should wherever possible be represented by two distinct symbols without diacritics. Ordinary roman letters should be used as far as is practicable, but recourse must be had to other symbols when the roman alphabet is inadequate.”
- p. 18 in the handbook: “When a symbol is said to be suitable for the representation of sounds in two languages, it does not necessarily mean that the sounds in the two languages are identical.”

The above citations indicate that the IPA follows the phonemic principle and is not strictly phonetic, which we should keep in mind. However, there is also a section in the handbook about broad vs. narrow transcriptions. The phonemic principle holds for broad transcriptions, and any user of the alphabet can choose how broad or narrow a transcription should be. The IPA offers the tools for both broad and narrow transcriptions, but one should keep in mind that the IPA symbols do not have a fixed phonetic value: the phonetic value of each symbol depends on the level of transcription.
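
To make the coding decisions from the comparison more concrete, here is a minimal Python sketch of how a ‘strict IPA’ mode could check and normalize input before distances are computed. It is only an illustration, not the web application's implementation. The table STRICT_IPA_MAP, all function names, the choice to fold U+0067 into U+0261 (both lists allow either), and the use of U+02DE as the rhoticity diacritic are assumptions made for this example. The sketch relies on Python's standard unicodedata module to tell combining marks (general category starting with "M") apart from spacing characters such as U+203F or U+02D4.

```python
import unicodedata

# Hypothetical normalization table for a 'strict IPA' mode.
# The targets follow the bracketed decisions in the comparison; the
# direction of the g mapping is an assumption (both codings are allowed).
STRICT_IPA_MAP = {
    "\u0067": "\u0261",        # ASCII g -> IPA script g (assumed preference)
    "\u025D": "\u025C\u02DE",  # rhotacized vowel -> vowel + rhoticity diacritic
    "\u203F": "\u035C",        # undertie (punctuation) -> combining tie bar below
    "\u02D4": "\u031D",        # spacing 'raised' -> combining 'raised'
    "\u02D5": "\u031E",        # spacing 'lowered' -> combining 'lowered'
}

TIE_BARS = {"\u0361", "\u035C"}  # tie bar above / below


def normalize_strict_ipa(text):
    """Replace alternative codings by the single coding chosen above."""
    return "".join(STRICT_IPA_MAP.get(ch, ch) for ch in text)


def is_combining(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def segments(text):
    """Group each base symbol with its combining marks.

    Two symbols joined by a tie bar are kept together as one segment.
    Spacing symbols (length mark, tone letters) stay separate segments.
    """
    out = []
    join_next = False
    for ch in text:
        if out and (is_combining(ch) or join_next):
            out[-1] += ch
            if ch in TIE_BARS:
                join_next = True      # the next base symbol belongs here too
            elif not is_combining(ch):
                join_next = False     # the tied base symbol has been consumed
        else:
            out.append(ch)
    return out


def levenshtein(a, b):
    """Plain Levenshtein distance over two sequences of segments."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]
```

With this segmentation, a tone letter such as U+02E5 is its own segment, so adding or removing it counts as one edit, while a combining diacritic changes the segment it is attached to. Whether tone symbols should weigh the same as a full segment is exactly the open question raised under the general comments, and the sketch does not settle it.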