IPA and Unicode

I have read the IPA handbook and looked at the IPA and Unicode sites on the Internet to
find out what the recommendations are for coding IPA in Unicode. The IPA handbook
includes a list (Appendix 2) of computer codings of the symbols. These codings were
updated up to 1998. Since then there has been only one revision of the IPA charts, in
2005. In this revision only one symbol was introduced (the voiced labiodental tap or
flap) and none were removed.
The IPA handbook, pp. 31-32:
“One problem for those devising IPA character sets which has hindered the
interchangeability of data containing phonetic symbols was the lack of an agreed standard
coding for the symbols. The International Phonetic Association, through its Workgroup
on Computer Coding, has worked with the International Standards Organization in its
project to set up a universal character set (UCS [=Universal Character Set = Unicode])
for all alphabets. An agreed set of UCS 16-bit codes is included in the list in appendix 2.”
I would read this as a recommendation from the International Phonetic Association to use
Unicode as a standard for coding the IPA symbols.
On p. 165 in Appendix 2 there is a note:
“The publication of these lists of coding assignments should not be construed as an
endorsement by the IPA of every character in the list, but as a convenient reference to the
location of any potential character in the coding tables as currently constituted.”
This would mean that other codings of the characters are possible, too. Still, I think that
the lists in the handbook are a good basis for the processing type ‘strict IPA’ in our web
application.
The latest edition of the handbook is from 1999, so the recommendations may have changed
since then. For this reason I also looked at the website of the association
(http://www.langsci.ucl.ac.uk/ipa/). Under ‘Alphabet’
(http://www.langsci.ucl.ac.uk/ipa/ipachart.html) there is a link ‘IPA and Unicode’. This
link goes directly to the site of the Unicode Consortium (www.unicode.org), to the page
‘Links to Unicode Resources’ (http://www.unicode.org/resources/index.html), which has a
further link to ‘Linguistics and Script Specialty Sites’. On that page there is a link
titled ‘International Phonetic Alphabet in Unicode’, which leads to a site authored by
John Wells (emeritus professor of Phonetics at the University of London, formerly
secretary of the International Phonetic Association and editor of its Journal, elected
its president in 2003). This site contains a list of the Unicode codings of the IPA
symbols. Since this list is linked from the IPA site via the Unicode site, I have treated
it as a recommendation and compared it with the list in the handbook. Here are my
conclusions:
Comparison of the handbook and John Wells’ website
[comments within brackets refer to the attached list of codings]
Vowels and consonants
The two lists are almost completely identical.
Differences:
• U+025D (rhotacized open-mid central vowel) is present in Wells’ list but is not listed
in the handbook. The symbol cannot be found in the IPA charts. The same symbol
can be written as the vowel symbol + the diacritic for rhoticity. [excluded]
• There are two affricates in Wells’ list. In the handbook there are a number of
affricates but all of them with the comment that they have been superseded by the
use of two symbols instead of the ligature. The updated IPA charts clearly state
that “Affricates and double articulations can be represented by two symbols
joined by a tie bar if necessary.” [affricates excluded]
Comments
• the symbol added to the chart in 2005 (voiced labiodental tap or flap) is missing in
both lists: the code of this symbol is U+2C71 [included]
• ‘g’ can be coded as U+0261 or U+0067 according to both lists
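The two points above that leave more than one coding for the same sound (U+025D versus
vowel + rhoticity diacritic, and the two codings of ‘g’) could be handled by folding the
input to one form before comparison. The following is only a sketch: the table name
FOLD, the function fold_ipa and the choice of U+0261 as the target for ‘g’ are my
assumptions, not recommendations from the handbook or from Wells’ list. It also assumes
that U+025C and U+02DE are the vowel symbol and the rhoticity diacritic meant above.

```python
# Hypothetical folding table for 'strict IPA' input.
# The target forms are assumptions made for illustration only.
FOLD = {
    "\u0067": "\u0261",        # ASCII g -> script g (either is allowed by both lists)
    "\u025D": "\u025C\u02DE",  # rhotacized vowel -> vowel symbol + rhoticity diacritic
}


def fold_ipa(text: str) -> str:
    """Replace each character that has an entry in FOLD; leave all others alone."""
    return "".join(FOLD.get(ch, ch) for ch in text)


if __name__ == "__main__":
    sample = "g\u025D"
    # Print the code points before and after folding.
    print([f"U+{ord(c):04X}" for c in sample])
    print([f"U+{ord(c):04X}" for c in fold_ipa(sample)])
```

Folding in the other direction (script g to ASCII g) would work equally well; what
matters for a distance measure is only that one form is used consistently.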
Diacritics and suprasegmental symbols
The lists are mostly identical.
Differences:
• Wells’ list includes three symbols which are found neither in the handbook nor in
the latest IPA charts (U+2192, U+02B1, U+02B4) [excluded]
• a number of intonational and suprasegmental symbols and the lateral and nasal
release are missing in Wells’ list [included]
• the handbook clearly gives the wrong code for ‘tie bar below’. The handbook
gives U+203F, which is not a combining mark but a punctuation mark. Wells
gives U+035C, which is a combining mark and should be the right symbol [035C]
• The handbook gives two alternative codings for the diacritics ‘lowered’
(031E/02D5) and ‘raised’ (031D/02D4). Wells only gives 031D and 031E, which
are combining diacritical marks, while 02D4 and 02D5 are spacing modifier
letters. All other diacritics are combining marks. For some diacritics where
spacing modifier letters were used up to 1989 (for example, U+02D6) the
handbook gives clear recommendations to use the combining mark. It is unclear to
me why two alternatives are maintained for these two symbols. [031D and 031E]
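The distinction between combining marks and spacing characters made in the last two
points can be checked against the Unicode character database. The sketch below uses
Python's standard unicodedata module; the helper name describe is mine. A non-zero
canonical combining class is taken here as the test for "combining", which is a
simplification that is sufficient for the code points discussed above.

```python
import unicodedata


def describe(cp: int) -> str:
    """Return code point, general category and whether the character combines."""
    ch = chr(cp)
    kind = "combining" if unicodedata.combining(ch) else "spacing"
    return f"U+{cp:04X} {unicodedata.category(ch)} {kind}"


# Tie bar below (handbook vs. Wells), raised/lowered alternatives, and U+02D6.
for cp in (0x203F, 0x035C, 0x02D4, 0x031D, 0x02D5, 0x031E, 0x02D6):
    print(describe(cp))
```

Category Pc is connector punctuation, Mn is a non-spacing (combining) mark and Sk is a
spacing modifier symbol.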
Comments
• some intonational symbols (contours) in the chart are missing in Unicode
• three symbols that are part of the chart are missing in both lists (U+1DC4,
U+1DC5, U+1DC8) [included]
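The bracketed decisions in this comparison could be recorded as data so that input can be
checked against them. The following is a sketch only: the names EXCLUDED, ADDED,
excluded_in and known_to_python are hypothetical, and the sets contain just the code
points named explicitly above (the excluded affricate ligatures are left out because
their code points are not given here).

```python
import unicodedata

# Code points marked [excluded] above.
EXCLUDED = {0x025D, 0x2192, 0x02B1, 0x02B4}
# Code points missing in both lists but marked [included] above.
ADDED = {0x2C71, 0x1DC4, 0x1DC5, 0x1DC8}


def excluded_in(text: str) -> list:
    """Return (index, 'U+XXXX') for every excluded code point found in text."""
    return [(i, f"U+{ord(ch):04X}")
            for i, ch in enumerate(text)
            if ord(ch) in EXCLUDED]


def known_to_python(cp: int) -> bool:
    """True if the running Python's Unicode database has a name for cp."""
    return unicodedata.name(chr(cp), "") != ""


if __name__ == "__main__":
    print(excluded_in("a\u02B4b"))
    for cp in sorted(ADDED):
        print(f"U+{cp:04X}", unicodedata.name(chr(cp), "<unknown>"))
```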
General comments
• length (of vowels and consonants) can be marked in IPA either by using the
length mark or by writing double symbols (p. 22 in the handbook)
• some diacritics have two different codings depending on whether they are placed above
or below the symbol (tie bar, voiceless)
• there is also a set of symbols for disordered speech. I guess we are not planning to
include these symbols in the web application since the purpose is to analyze
dialects and not disordered speech
• how are the tone symbols treated by the Levenshtein algorithm? The tone symbols
should be included, especially since the IPA handbook states that most of the
languages in the world have tone distinctions
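To make the question about tone symbols concrete, here is one possible treatment, given
as a sketch and not as a description of how our web application works. It assumes that
combining marks (including tone diacritics) are attached to the preceding base symbol,
that two symbols joined by a tie bar form one segment, and that spacing tone letters
such as U+02E5 count as segments of their own. The function names segments and
levenshtein are hypothetical, and all edit operations cost 1.

```python
import unicodedata

TIE_BARS = {"\u0361", "\u035C"}  # tie bar above / tie bar below


def segments(text: str) -> list:
    """Split a transcription into base symbols plus their combining marks."""
    out = []
    pending_tie = False
    for ch in unicodedata.normalize("NFD", text):
        is_mark = unicodedata.category(ch).startswith("M")
        if out and (is_mark or pending_tie):
            out[-1] += ch          # attach to the current segment
        else:
            out.append(ch)         # start a new segment
        if ch in TIE_BARS:
            pending_tie = True     # the next base symbol joins this segment
        elif not is_mark:
            pending_tie = False
    return out


def levenshtein(a: list, b: list) -> int:
    """Plain Levenshtein distance over two lists of segments (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # Same syllable with two different tone letters.
    print(levenshtein(segments("ma\u02E5"), segments("ma\u02E9")))
```

With this segmentation a vowel carrying a tone diacritic differs from the plain vowel by
one substitution, while a tone letter counts as an inserted or substituted segment.
Whether these two cases should cost the same is a design decision that remains open.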
Principle 4(a) of the IPA:
“When two sounds occurring in a given language are employed for distinguishing one
word from another, they should wherever possible be represented by two distinct symbols
without diacritics. Ordinary roman letters should be used as far as is practicable, but
recourse must be had to other symbols when the roman alphabet is inadequate.”
p. 18 in the handbook:
“When a symbol is said to be suitable for the representation of sounds in two languages,
it does not necessarily mean that the sounds in the two languages are identical.”
The above citations indicate that the IPA follows the phonemic principle and is not
strictly phonetic, which we should keep in mind. However, there is also a section in the
handbook about broad vs. narrow transcriptions. The phonemic principle holds for broad
transcriptions, and any user of the alphabet can choose how broad or narrow the
transcriptions should be. The IPA offers the tools for both broad and narrow
transcriptions, but one should keep in mind that the IPA symbols do not have a fixed
phonetic value: the phonetic value of each symbol depends on the level of transcription.