Slides, draft (PowerPoint)

1
COLANG2014
Institute on Collaborative Language Research
Orthography Development:
The ‘Midwife’ Approach
Mike Cahill
Colleen Fitzgerald
Keren Rice
Gwen Hyslop
Kristine Stenzel
Contents of Power Point
• Introductory discussion (slides 5-9)
• Introduction to ‘Midwife’ approach (slides 10-28)
• Overview of linguistic issues (slide 29)
• Dealing with allophones (slides 30-42)
• Dealing with allomorphs (slides 43-48)
• Suprasegmental problems (slides 49-52)
• “New” sounds: Dene and Kurtöp (slides 53-79)
• Variation and standardization (slides 80-97)
• Review of Methodology (slides 98-101)
• Further issues (102-119)
• A final political example (120-129)
• Summary (slides 130-131)
• References/Contact info (slides 132-133)
2
Some background
• These slides were developed for a course at InField,
taught in 2008 by Keren Rice and Kristine Stenzel, in
2010 by Gwen Hyslop and Keren Rice, in 2012 by
Colleen Fitzgerald and Keren Rice, and now in 2014
by Keren Rice and Mike Cahill.
• All have first-hand experience in orthography
development (detailed at the end of this presentation).
3
Our Goals:
1. Discuss important questions and parameters
(socio-political, technical-linguistic, psycho-cognitive) related to orthography development
2. Consider an approach to orthography
development (o.d.) based on community
involvement, writing practice, and analysis
3. Provide opportunities for hands-on analysis
4. Exchange experiences, brainstorm, expand
resources
4
Initial Discussion Questions
• What is an orthography?
• How would you define the role of the linguist in
the process of orthography development?
• What do you think a language community expects
from the linguist and from the orthography
development process in general?
• What are the features of a ‘good’ orthography and
what kinds of things do we need to know in order
to develop one?
5
What is an orthography? Some
thoughts for discussion
• Agreed upon system to represent
sounds/words/concepts of a language
• Practical tool for communication
•…
6
What is the role of the linguist?
• Facilitator
• Mediator
•…
7
What does the community expect
from the linguist?
• Intervention around different spellings and
competing orthographies
• Expertise and connections that are not present
in the community
• Legitimacy of the language
•…
8
What are features of a ‘good’
orthography?
• Easy to learn and to produce
• Minimize number of characters, maximize what
they represent
• Culturally relevant
• Transfer from matrix language
• Visually contrastive
•…
9
The ‘Midwife’ approach
What is it?
10
The ‘midwife’ approach to the
development of an orthography
• overall goal: to approach o.d. as a process
• based on exchange and integration of knowledge and experiences of
linguist and language community (LC)
• with LC as active participant, sharing ‘joint responsibility’ for final
outcome
• methodology: practice of writing and analysis of the language
feed into each other
• linguist’s role:
facilitator/guide in the practice - analysis dialectic
• What kind of practice can help identify and focus the issues so that the
analysis becomes more clear?
• What kinds of appropriate metaphors can be useful tools?
11
Basic principles of the approach
• notion of o.d. as a process whereby members of a
language community (LC) come to analyze aspects of
their own language and develop a new practice: writing
• during the process (which may continue over an extended
period of time), orthographic variation is ok
• continuous and reflective practice (LC writing and
reading) is always the primary input to language-analysis
activities
• LC linguistic knowledge and social interpretations are
also a fundamental input
12
An overview of the ‘midwife’
approach to orthography
development
Getting started 1: Discussion with LC
Getting started 2: Types of writing systems
Getting started 3: Learnability
13
Getting started in developing a
writing system – 1: discussion with the LC
Why do we need to study our own
language in order to think about
writing it?
Discussion:
How are oral language and written
language similar and how are they
different?
14
15
Oral Language vs. Written Language
• Oral: Communication between people in same place/time (immediate).
  Written: Communication between people in different places/times (extended).
• Oral: Allows for reductions, use of body language and abbreviated deictic
  references, because misunderstandings / doubts can be resolved then and there.
  Written: Requires more complete forms and additional symbols to aid understanding –
  tools to make sure that the writer’s message will reach the reader intact.
• Oral: Is where innovations and change appear first.
  Written: Tends to reflect stable forms, changes more slowly.
• Oral: Always includes more types of variation, which may show different
  origins, group affiliations, or contexts requiring different registers (e.g.
  formal/informal).
  Written: May include (or not) variation that represents differences between groups of
  speakers of the same language, especially during initial phases; may be unified
  (or not) as a result of a process of practice, analysis (discussion of
  variations and what they represent), and political decision-making.
15
Getting started – 2: presentation/discussion
of types of writing systems and the symbols
they use
What do symbols represent in different types of writing
systems?
1. ‘Morphographic’ / ‘Logographic’ representations of
words or morphemes
16
2. ‘Phonographic’ systems: representations
of syllabic combinations
Cree
17
3. ‘Alphabetic’ representations of
individual sounds
The traditional thought in o.d. is that each symbol in an alphabet
should represent a phonological segment, (ideally) corresponding
(as directly as possible) to the phonemes of the language
Consonantal alphabets: symbols represent consonants
Full alphabets: symbols represent consonants and
vowels (e.g. Greek and Latin alphabets)
18
Mayan
Gloss: make-by-friction / he light-fire / Itzamna / our God
‘Our God Itzamna made his fire using
friction.’
19
All orthographies change over time
Roman Alphabet (2,600+ years old)
20
Getting started – 3: discussion of
‘Learnability’
Who is the writing system for? Will it be
used primarily by native speakers?
Learners of the language?
What kinds of orthographic features might
help increase ‘learnability’ for each of
these target groups?
21
An important assumption
• There is a writing system to begin from. (For
instance, learners are literate in another language
such as English or Spanish.)
• In such cases, familiarity with an existing system
will probably lead the LC to adopt a similar type
of orthographic representation, but will require
analysis so that they can recognize where
adjustments need to be made.
22
A ‘getting started’ exercise
for the LC
This type of exercise works well in workshop-type situations, and will likely provide activities
for many days of work. It is a good way to get a
large variety of members of the LC involved in
the discussions. If activities are organized in
groups, literate and non-literate individuals and
speakers with varying degrees of fluency can
have input.
23
A. LC participants choose a theme (or
themes) and write short texts
(individually or in groups)
B. Participants exchange texts to
read, making lists of doubts they
encounter or alternative ways to
write specific words
24
C. Participants present their
doubts and suggestions to the
entire group – this is the data
that will guide the analysis
and inform decision-making
25
What kinds of information are likely to be
revealed by this initial exercise?
In terms of orthographic symbols, that:
• Various symbols are being used for the same sound
• No symbol is available for a sound in the language
Both cases may result from the effect of literacy in a different
language or from alternate existing orthographies. Recognition
of where the problems lie is a first step in analysis.
26
27
In terms of phonology:
• Sets of examples of important phonological elements
and indications as to their ‘functional loads’
• Evidence of allophonic variation
• Indication of variation between speakers of different
ages or from different regions
In terms of morphology:
• Questions as to word boundaries and other morphological
issues such as what to do about compound words or
complex constructions
28
Continuing the exercise . . .
D. As the participants present the results of these
activities, the linguist should be able to recognize
and group together the different categories of
‘doubts’ and begin to think about how to work on
them with the LC
E. Subsequent activities should focus on individual
issues, analyzing them with the LC so that informed
decisions can be made collectively
28
29
Linguistic issues: what to do
about . . .
• Allophones
• Allomorphs
• Suprasegmentals
• Sounds in the language that are
not represented in a known writing
system
29
Representing allophones
30
31
Allophones in English
pool [pʰ]
spool [p]
Allophones have the same
representation in the orthography.
31
32
An example of allophones and their
representation in the orthography in o.d.:
Kotiria (Eastern Tukanoan) [d] and [r]
In this language, as onset consonants, these
sounds occur in complementary distribution: [d]
word-initially and [r] word-internally
dukuri ‘manioc roots’
duhire ‘you/he/she/they sat’
diero ‘a dog’
• What decision was made in this case?
32
33
Analysis with the LC
a) Participants in the language workshop compiled a list
of words containing the two sounds from their own
written texts
b) All occurrences of the sounds were highlighted, so that
participants could visually observe their distribution
c) Participants were asked if they could think of other
words with different sounds in the positions of [d] and
[r] (in other words, to find minimal pairs), leading to
analysis and recognition of /d/ as a ‘basic sound’
(phoneme) and [r] as a ‘variant’ (allophone)
33
34
d) Once speakers had observed and analyzed for
themselves that [r] was a variant of /d/ in a specific
position, it was possible to discuss whether or not to
represent it with a different symbol
e) Collectively, several of the texts were re-written
using only ‘d’ and speakers were asked to evaluate
how they felt, as writers and readers, about the use of
a single symbol
34
35
Coming to conclusions
f) While recognizing /d/ as the underlying sound, use of the symbol ‘d’
in both positions felt uncomfortable to the participants. They
argued that it contradicted a well-established surface distribution
of sounds, making the written and spoken versions of the
language look too different. Additionally, use of ‘d’ in word-internal
position made the written texts look like they represented
the pronunciation of closely-related languages in the family, in
which the d-r distribution does not occur.
g) Thus, the LC has opted to use different symbols for the ‘d’ and
‘r’ sounds in the orthography, a decision informed by linguistic
analysis but respectful of input from the LC as the end users of
the system.
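The distribution check in steps (a) to (c) can be sketched in a few lines of Python. This is only an illustration, assuming each letter stands for one sound and treating every non-initial position as "internal"; the function names are hypothetical and not part of the workshop materials.

```python
from collections import defaultdict

def positions(words, sounds):
    """Map each sound to the set of positions ('initial' or 'internal') it occupies."""
    found = defaultdict(set)
    for word in words:
        for index, char in enumerate(word):
            if char in sounds:
                found[char].add("initial" if index == 0 else "internal")
    return dict(found)

def complementary(words, a, b):
    """True if sounds a and b both occur and never share a position."""
    found = positions(words, {a, b})
    return a in found and b in found and not (found[a] & found[b])

# The three Kotiria words listed on the slide above.
kotiria = ["dukuri", "duhire", "diero"]
print(positions(kotiria, {"d", "r"}))
print(complementary(kotiria, "d", "r"))
```

A single word with a word-internal d would make `complementary` return False, which is the kind of evidence participants were asked to look for.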
35
What if…?
• Kɔnni (Gur, northern Ghana) has a similar
distribution of [d] and [r]:
• dàáŋ ‘stick’          dígí ‘to cook’
• kʊ́rʊ́bâ ‘bowl’         chʊ̀rʊ́ ‘husband’
• These appear to be allophones of /d/.
But, some complications:
[d] is intervocalic when it’s
• lexeme-initial (in a compound word)
jùò-dìkkíŋ ‘cooking room’ (cf. digi ‘to cook’)
• in borrowed words and ideophones
kòdú ‘banana’ (Twi)
bìn-dúdù ‘dung-beetle’
• Discuss: Does this make a difference?
What other questions would you ask?
Factors to check:
• Speakers’ preferences
• Other neighboring languages
• Any other linguistic or psycholinguistic evidence?
•…
• Almost totally illiterate group, not informed enough to
express a preference
• All related languages have both <d> and <r>. Sometimes
separate phonemes, sometimes not.
Also, influence of English.
• And…
A test…
• Other voiced stops lenite intervocalically; couldn’t
/d/ also? Stops occur in careful speech, fricatives in
casual.
• bɔ́bɪ́ ~ bɔ́βɪ́ ‘to tie’
• hɔ̀gʊ́ ~ hɔ̀ɣʊ́ ‘woman’
• However, Kɔnni speakers can tell the difference between [d]
and [r], and corrected my pronunciation when I
attempted *[hààdɪ́ŋ] rather than [hààrɪ́ŋ] ‘boat’
• Conclusion: /d/ and /r/ have recently become separate
phonemes, and conforming to other languages, are
written with two symbols.
40
Another Allophone Example
• Choctaw, a Native American language in
Mississippi and Oklahoma, has had a number of
orthographies.
• The language has three vowel phonemes:
/i o a/, which can be short, long, or nasalized.
• However, the writing systems often use six
symbols, following how the language was
written in the 19th century, associated with
Cyrus Byington.
40
41
Choctaw Vowels
(from Alphabet links at Choctaw Language School online)
• Two of the allophones of phoneme /a/ get represented in
the writing system. Allophone [a] tends to appear in open
syllables, written as a.
• chaha 'tall'
• taloowa 'sing'
• The other allophone, [ə], tends to appear in short closed
syllables and is written using a symbol not used as a vowel
in English, ν.
• anνmpa 'word, language'
• kνllo 'hard'
41
42
Conclusions from the Choctaw
allophones
• This example using allophones shows that sometimes
the choice is made to write allophones.
• Not all Choctaw vowel allophones are represented
with unique orthographic symbols, but some are.
• We will see some parallels in the upcoming allomorph
examples from English, where some of the variation
can be chosen to be written overtly in the writing
system.
42
43
Questions of allomorphy
43
44
Problems of allomorphy
Shallow vs. deep orthographies
Shallow: close to pronunciation
Deep: preserves graphic identity of
meaningful elements
44
45
English allomorphy
A combination of deep:
cats [s]
dogs [z]
And shallow:
intangible [n] impossible [m]
45
46
Allomorphs: Dene voicing
alternations
sa ‘watch’      sezá ‘my watch’
xa ‘hair’       seghá ‘my hair’
shá ‘knot’      sezhá ‘my knot’
Shallow orthography? (“phonetic”): sa, sezá
Deep orthography? (“phonemic”): sa, sesá
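The difference between the two options can be made concrete with a small Python sketch. It assumes the possessive prefix se- and only the three voicing pairs visible in the examples (z/s, gh/x, zh/sh); the function name is hypothetical. Only sesá appears on the slide; the other two outputs follow by analogy and are not attested spellings.

```python
# Voiced/voiceless pairs seen in the three examples. "zh" is listed before "z"
# so that the digraph is matched first.
DEVOICE = [("zh", "sh"), ("gh", "x"), ("z", "s")]
PREFIX = "se"  # 'my', as in the slide's examples

def deep_spelling(shallow_form):
    """Rewrite a shallow possessed form so the stem keeps its unpossessed consonant."""
    if not shallow_form.startswith(PREFIX):
        return shallow_form
    stem = shallow_form[len(PREFIX):]
    for voiced, voiceless in DEVOICE:
        if stem.startswith(voiced):
            return PREFIX + voiceless + stem[len(voiced):]
    return shallow_form

for form in ["sezá", "seghá", "sezhá"]:
    print(form, "->", deep_spelling(form))
```

A shallow orthography writes the forms as pronounced, so it needs no such rule; a deep orthography asks the reader to apply the voicing alternation when reading aloud.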
46
47
The process 1
An orthography standardization
committee was established to make
decisions about the orthography.
A few decisions involved symbols;
most involved spelling conventions.
The committee considered basic
principles – audience, goals of
writing, transfer from English, …
47
48
The process, continued
The committee identified areas of
concern with the different choices.
People experimented with the different
ways of writing words with these
alternations.
Decision: shallow orthography
Why? Easier to figure out from the
pronunciation
48
49
Beyond the segment:
suprasegmental problems
49
50
Nasalization in Tukanoan
languages
In Tukanoan languages, nasalization is a
property of the morpheme rather than of
individual segments, thus it functions as a
suprasegment and the question quickly arises
as to how it should be represented in the
orthography of these languages
50
51
Analysis of nasalization with
the LC:
Finding metaphors to help speakers understand how nasalization
works in different kinds of languages . . .
‘umbrella’ nasalization –
‘covers’ the entire morpheme
(Tukanoan languages)
‘raincoat’ nasalization –
‘covers’ individual segments
(e.g. in Portuguese)
51
52
Some nasalization proposals
Over time, after analyzing and understanding how nasalization
operates in Tukanoan languages, a number of different proposals for
how to mark nasalization were ‘tested’ by participants in language
workshops. In each case, writing and reading exercises using the
different possibilities were proposed, practiced, and then evaluated.
Eventually, it was collectively decided that:
•Morphemes with nasal consonants (m, n, ñ) require no further
marking, the nasal C being sufficient to identify the morpheme as
+nasal
•In morphemes with no nasal C, the first vowel is marked with a
tilde (ṽ) to indicate the morpheme as +nasal
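The two rules above can be sketched as a small Python function. This is a sketch under stated assumptions: the vowel set is illustrative, the input is one morpheme written without tone marks, and whether the morpheme is nasal is supplied by the analyst (the function name and parameters are hypothetical).

```python
import unicodedata

NASAL_CONSONANTS = {"m", "n", "\u00f1"}   # m, n, ñ
VOWELS = set("aeiouʉ")                     # assumed vowel letters; adjust per language
COMBINING_TILDE = "\u0303"

def write_morpheme(letters, is_nasal):
    """Spell one morpheme following the two rules listed above."""
    letters = unicodedata.normalize("NFC", letters)
    # Oral morphemes, and nasal morphemes that contain a nasal consonant,
    # need no extra marking.
    if not is_nasal or any(ch in NASAL_CONSONANTS for ch in letters):
        return letters
    # Otherwise put a tilde on the first vowel only.
    for index, ch in enumerate(letters):
        if ch in VOWELS:
            marked = letters[:index + 1] + COMBINING_TILDE + letters[index + 1:]
            return unicodedata.normalize("NFC", marked)
    return letters

print(write_morpheme("waha", is_nasal=True))
print(write_morpheme("maa", is_nasal=True))
```

Marking only the first vowel keeps the number of diacritics low while still flagging the whole morpheme as nasal, in line with the ‘umbrella’ metaphor.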
52
53
‘New’ sounds:
sounds not distinctly represented
in a known writing system
53
54
‘New’ sounds 1: Dene mid front
open and closed vowels
• Early orthography used the symbol {e} for
both an open and closed front vowel.
• Both these vowels exist in some dialects.
54
55
The process
• A question: Should these vowels be
differentiated?
• The answer: yes!
• Why?
•Accurate representation of sounds of the
dialect
•Ease of reading
55
56
The process: What symbol to
use?
• A new symbol is needed.
• The open vowel is more common than the closed
vowel.
• Choice: symbol {e} for the open vowel; schwa
(‘upside down e’) for the closed vowel
• This choice was made because the open vowel is more
common, and it meant fewer changes in how people
were already writing.
• This decision was a surprise for some of the linguists
involved, but people liked it because they knew that
schwa was used in linguistics.
56
57
‘New’ sounds 2: The case of
Kurtöp
• Kurtöp is a Tibeto-Burman language of Bhutan
• About 15,000 speakers
• Speakers who are literate are usually familiar
with 1) English and 2) Dzongkha
• Roman orthography was a natural product but
the ’Ucen system was suggested by community
and Dzongkha Development Commission
57
58
Which versions of ‘Ucen?
<tshugs.yig> tshui
<mgyogs.yig> joyi
•We opted to begin with joyi, since it was what children learned
and was purely Bhutanese (as opposed to tshui, which is
shared with Tibetan).
58
59
The ’Ucen syllable
In the Classical Tibetan Orthography,
an abugida derived from Brahmi, and
devised in 632 AD, syllables are
represented according to this diagram.
The “R” represents a simple onset, or
in the case of an onset-less syllable,
the vowel. C1, C2, and C4 may be
used to add consonants to the onset,
making it complex. The V slots are for
vowels (i, e, o go above; u goes
below). C3 represents a single coda (if
present) and C5 makes a complex
coda (rarely occurs).
59
60
The ’Ucen syllable
<bsgrubs>
For example, this is how the Classical
Tibetan word /bsgrubs/ was written. The
complex onset is represented by <b> in
C1 position, <s> in the C2 position, <g>
in the root position, and <r> in the C4
position. The vowel /u/ is represented
below the C4. <b> in C3 and <s> in C5
indicate the complex coda.
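The slot structure just described can be modeled as a simple record. The Python sketch below is illustrative only: the class and function names are hypothetical, the Roman transliteration order (C1, C2, R, C4, V, C3, C5) is inferred from the single <bsgrubs> example, and the default vowel "a" reflects the script's inherent vowel.

```python
from dataclasses import dataclass

@dataclass
class UcenSyllable:
    root: str          # R: the simple onset (or the vowel carrier)
    c1: str = ""       # onset consonant in the C1 slot
    c2: str = ""       # onset consonant in the C2 slot
    c4: str = ""       # onset consonant in the C4 slot
    vowel: str = "a"   # i, e, o are written above; u is written below
    c3: str = ""       # single coda
    c5: str = ""       # second coda consonant (rare)

def transliterate(syllable):
    """Read the slots off in Roman order: C1 C2 R C4 V C3 C5."""
    s = syllable
    return "".join([s.c1, s.c2, s.root, s.c4, s.vowel, s.c3, s.c5])

bsgrubs = UcenSyllable(root="g", c1="b", c2="s", c4="r", vowel="u", c3="b", c5="s")
print(transliterate(bsgrubs))
```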
60
61
The ’Ucen syllable
Traditionally, there is a fixed number
of symbols available for each slot. C1
may be one of five symbols; R may be
one of 30; C2 may be one of three (one of
which is modified from its occurrence
elsewhere); C3 draws from ten possible
symbols; C4 draws from a set of five
(mainly) ‘half’ symbols; and C5 may
be one of two. The top V may be one
of three vowel diacritics and the lower
V is reserved for one diacritic.
In Joyi, various combinations of C2
with R, or C4 with R, lead to unique
symbols reserved for the exclusive
representation of the combination,
similar to ‘conjuncts’ in Devanagari.
61
62
’Ucen and Tibetan
• Classical Tibetan phonology had around 28
consonants (labial, dental, palatal, velar).
• And complex onsets
• And five vowels
• No tone
• ’Ucen was designed for this phonology
62
63
’Ucen and Tibetan
• However, after almost 1,400 years of change,
Lhasa Tibetan (the prescribed standard) has:
• A new series of retroflex consonants
• Two new vowels (front high and mid rounded)
• High and low tonal registers; level and falling
tonal contours
• Changes in voicing/aspiration contrasts
• Simplified onsets
• Words are NOT pronounced as written!
63
64
’Ucen and Bhutan
• The modern use of ’Ucen assumes the 1400 years of
change from Classical Tibetan to modern Lhasa Tibetan.
• ’Ucen is used this way in Bhutan; for example, words
with complex onsets in Classical Tibetan are still written
as such in modern Tibetan/Dzongkha, but not
pronounced as such.
• Representing any pronunciation using ’Ucen requires the
reader to infer the sound changes.
• There is no way to represent various aspects of the
phonology – such as the complex onsets – in the history
of Bhutanese education.
64
65
’Ucen and Tibetan
<bsgrubs>
•For example, the
spelling <bsgrubs> is
pronounced: ɖùp
65
66
’Ucen and Kurtöp
• Kurtöp is not a descendant of Classical Tibetan.
• The phonology of Kurtöp is different from the
phonology of Classical Tibetan or Dzongkha.
• Kurtöp tone, vowel length, and complex onsets
are particularly difficult to represent.
• The following is an illustration of how we
chose to represent complex onsets.
66
67
Kurtöp phonology
Kurtöp complex onsets
67
68
The problem
<pr-> is pronounced as a voiceless retroflex, but
in Kurtöp /pra/ = ‘monkey’
68
69
Midwife process
• So what do you do with the previously
unwritten Kurtöp?
• We presented ideas to a small group of literate
Kurtöp speakers;
• Consulted local teachers
• Consulted highly educated speakers of related
languages with similar phonologies
69
70
Midwife process
Idea 1:
Use ’Ucen in
a way similar
to Roman.
<pra>
But the following problem developed:
How to represent vowels other than /ɑ/?
70
71
Midwife process
This would be confused
with /lé/ in Dzongkha/Tibetan
conventions
<ble>
<bele>
This leads people to tend to pronounce the
word correctly, but does not follow the
traditional conventions and is unattractive.
71
72
Midwife process
• In 2009 we organized a workshop with the
Dzongkha Development Commission, Scott
DeLancey, local leaders and interested
community members to address all the issues
72
73
Proposed solution
•We will add ‘half’ letters to be used directly below the root
consonant.
•Based on existing (but rarely used) conventions established
in Tibetan to represent different languages.
•Should not affect Dzongkha transference issues
•Aesthetically pleasing
•Kurtöp speakers find it intuitive and easy to read
73
74
Proposed solution
• Existing computer fonts do not allow the
needed combinations
• Chris Fynn, DDC font developer, agreed
to adapt the Bhutan ’Ucen fonts (joyi and
tshui) to accommodate the new
combinations
• In addition to the complex onsets, the
adapted fonts will be able to mark tone
74
75
Proposed solution
•Tshui font is finished but
the Joyi font has been held
up indefinitely for
unknown reasons.
•In addition to handling
the ‘new’ complex onsets,
we also have a way (marks
above the other symbols in
top row) to mark tone,
another ‘new’ sound.
75
76
Moving forward (the midwife
process continues)
•The Kurtöp/English/Dzongkha
dictionary is expected to be
published in 2013.
•Kurtöp entries will use the new
font and proposed combinations,
in Joyi if it is made soon, or else
using Tshui.
•Testing will continue…
76
Complex scripts
• SIL’s “Non-Roman Script Initiative” (NRSI) works to develop
computer solutions for complex scripts.
(http://scripts.sil.org/cms/scripts/page.php?item_id=Welcome)
Also see scriptsource.org for a participative site.
78
What have we seen – and not
• Linguistically, we looked at orthography choices with
respect to the implications of representing:
• allomorphs, allophones, suprasegmentals, and sounds in the
language that are not represented in a known writing system
• With sounds that are not represented in a known
writing system, different choices might be made, with
different pros and cons to each choice.
• Let's consider Choctaw, which has a voiceless lateral,
IPA symbol /ɬ/.
78
79
Considering implications of symbol
choice and the language's phonology
• Using IPA:
• Pro: linguistic representation, new representation for unfamiliar sound
• Con: font, no transference
• Adapt English symbols:
• Pro: familiar symbols
• Con: symbols used in unfamiliar ways
• We could imagine lh and hl. Choctaw uses both, in different
environments.
• lh (pνlhki 'fast') before a consonant and hl before a vowel (hlampko 'strong')
• Pro: uses familiar symbols, no font challenges
• Con: Confusion with phonemes [h] and [l] with words like (mahli 'wind',
asil.hah 'to request')
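The lh/hl convention described above is a context-dependent spelling rule, which can be sketched in Python. This is a minimal illustration, not the Choctaw orthography's official definition: the vowel set is assumed from the slide's examples, the treatment of word-final /ɬ/ (written lh here) is an assumption, and the function name is hypothetical.

```python
LATERAL = "\u026c"        # ɬ, the voiceless lateral
VOWELS = set("aio\u03bd")  # a, i, o and the slide's ν; assumed vowel letters

def spell_lateral(phonemic):
    """Write each ɬ as 'hl' before a vowel and as 'lh' elsewhere."""
    out = []
    for index, ch in enumerate(phonemic):
        if ch != LATERAL:
            out.append(ch)
            continue
        following = phonemic[index + 1:index + 2]  # "" at the end of the word
        out.append("hl" if following in VOWELS else "lh")
    return "".join(out)

print(spell_lateral("p\u03bd" + LATERAL + "ki"))   # 'fast'
print(spell_lateral(LATERAL + "ampko"))            # 'strong'
```

The reverse direction is where the stated drawback shows up: a reader who sees hl or lh cannot tell from the letters alone whether they spell /ɬ/ or a sequence of /h/ and /l/.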
79
80
The realities of language:
variation, and standardization
80
81
Orthographic Variation
Discussion questions:
•
Is orthographic variation a problem
and if so, why?
• What kinds of variation are we likely
to encounter?
• What kinds of things can variation
represent?
81
82
Standardization
• What are some of the advantages and
disadvantages of ‘standardization’ or
‘unification’ of an orthography?
82
83
Kinds of variation
Variation at a regional level
Variation at a local level
How can these be dealt with?
83
84
Between community variation: an
example from Dene dialects
              South Slavey   Mountain   Déline   Hare
‘head’        -tthí          -pí        -kwí     -fí
‘mosquito’    tth’ih         p’ih       kw’ih    w’i
‘sand’        tha            fa         wha      wa
‘belt’        -dhe           -ve        -we      -we
Should there be a common spelling for the different dialects?
84
85
The process
- Discussion of dialects: systematic
differences
- Discussion of spelling possibilities
- one spelling for all dialects?
- different spellings for each dialect?
85
86
The decision
Write each dialect with its own
symbols (e.g., tth’ih in South Slavey
and w’i in Hare)
Reasons
- transferability from English
- dialect identity
86
87
Within community variation: an
example from Dene
zha ~ ya ‘snow’
zhú ~ yú ‘clothing’
-zhíi ~ -yíi ‘inside’
• Some questions to ask
What might underlie this variation?
Is the variation really free?
87
88
The first decision
We began with a discussion of variation
and the different ways of dealing with it.
The first decision: standardization
-Write zh if it is ever used in that
word.
-If only y is used, write y.
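Stated as a procedure, this first decision looks as follows. The Python sketch is illustrative only: the function name is hypothetical, and it assumes a list of attested spellings of one word collected from different writers.

```python
def standard_spelling(attested):
    """Pick one spelling from the variants that speakers actually use."""
    with_zh = [form for form in attested if "zh" in form]
    # Write zh if it is ever used in that word; otherwise keep the y form.
    return with_zh[0] if with_zh else attested[0]

print(standard_spelling(["ya", "zha"]))   # variants of 'snow' from the slide
print(standard_spelling(["yú", "zhú"]))   # variants of 'clothing'
```

The rule depends on knowing every variant in use for each word, which is hard to guarantee without reference materials.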
88
89
And the development over time
This did not work in practice
-variation among individuals
-no resource materials
Consequence: Both zh and y are used.
Lesson: Early decisions might have to be
changed based on practice.
89
90
From related dialects to related
languages
• The Dene example shows how different sounds
are treated in closely related varieties.
• What choices might be made in representing
similar sounds in closely related languages?
• One possibility would be to choose the same
symbol.
• Another would be to represent the same sound
in different ways. This is what has happened
in Muskogean languages.
90
91
How to represent similar sounds in
closely related languages?
• The Muskogean languages include Muscogee (Creek),
Seminole Creek, Choctaw, Chickasaw, Alabama and
Coushatta/Koasati.
• All have a phoneme /ɬ/, a voiceless lateral, but the languages
make different orthographic choices.
• Choctaw uses lh (pνlhki 'fast') before a consonant and hl before a
vowel (hlampko 'strong')
• Chickasaw uses lh consistently (hilha 'dance')
• Muscogee (Creek) uses r (rvrŏ 'fish')
• Alabama uses ɬ (ɬaɬo 'fish')
• Coushatta uses th (thatho 'fish')
• Linguists vary in documentation, mostly lh or ɬ
91
92
Writing and Variation in
O'odham
• The O'odham varieties include Tohono O'odham (formerly
Papago), 'Akimel O'odham (formerly Pima), and the Mexican
variety, Sonoran O'otam.
• Multiple writing systems in use, which were developed in a
variety of contexts.
• Tohono O'odham Nation and the Salt River community use the Alvarez
and Hale orthography, which was developed as a linguist-native speaker
collaboration.
• The Saxton orthography is a practical orthography and was tested out with
native speakers, and is used in the Gila River Indian Community.
• The influence of Spanish as a transfer is leading the Sonoran O'otam to
consider another option.
• Linguist Madeleine Mathiot uses yet another system in linguistic
documentation.
92
93
Some differences in the four writing
systems for shared sounds
                    A&H    Saxton   Mathiot   Sonoran proposal
Long vowels         a:     ah       aa        aa
Palatals            ñ      ni       ñ         ñ
Retroflexes         ḍ ṣ    d sh     ḍ x       sh th
Voiceless vowels    ĭ      n/a      ï         n/a
Palatal affricate   c      ch       c         ch
Glottal stop        '      '        ˀ         '
Lateral flap        l      l        l         r
93
94
Sounds which vary across
dialects
• [w] vs. [v]/[ʋ] –
• Alvarez and Hale goes with w
• Saxton goes with w
• Mathiot goes with v
• Sonoran proposal goes with v
• Dialect variation within Tohono O'odham dialects for
certain vowel sequences, like io or eo: hiosig vs. heosig
'flower'
• These are acknowledged and both end up being used.
94
Two types of standardization
• “Unilectal” – the most prestigious speech variety is
chosen. The rest adapt to this.
• “Multilectal” – some elements are chosen from several
dialects. No dialect is favored.
• What are some advantages and challenges of each?
95
Pros and cons
• Unilectal
• Advantage – simplicity. Once the dialect is chosen, don’t
have to focus on the others.
• Challenge – picking the dialect! What counts as “most
prestigious”?
• Appropriate when everyone can agree on “the dialect”
• Multilectal
• Advantage – doesn’t favor one group over another.
• Challenge – doesn’t represent anyone’s actual speech
• Appropriate when no clear “prestige dialect”
96
97
Standardization
Standardization often emerges as the
writing system is used; it may not be
the best starting point.
What do potential users want from
writing?
97
98
A review of the method
98
99
Methodology: a review
The ‘midwife’ approach views input from LC as
fundamental, this input consisting of:
• practice (written material produced by the LC
that concretely reveals issues for analysis,
discussion, and decision-making)
• LC insights (about the language itself,
socio-political issues, and their experiences)
• Do members of the LC regularly write/read in any
language? Are writing/reading themselves new
experiences for them? How can these new practices
be expanded and reinforced?
99
100
• The approach also relies on interwoven activities of
analysis leading to periods of experimentation with
whatever ‘decisions’ have been agreed upon, with
ongoing evaluation by the LC in both the roles of
writers and readers.
• The LC may be viewed more broadly, as in Bhutan, in
which the government is necessarily involved.
• Throughout the o.d. process, the linguist should build
an ongoing written record with explanations and
examples of the analysis and discussion that went into
each decision.
100
101
The role of the Linguist
[Diagram labels: LC; practice; analysis; evaluation; choices]
• monitor and interpret written input
• organize analysis and discuss options
• suggest further practice and record decisions
• interpret LC feedback, looking for clues as to:
  • the functional loads of phonological features
  • other important cognitive issues
  • interference issues
  • socio-political issues
101
102
Some further issues
functional load
cognitive needs
socio-political issues
technological issues
who is the audience?
102
103
Evaluating the ‘functional load’ of suprasegmental features: examples from Kotiria
In Kotiria, three suprasegmentals are associated with root
morphemes: nasalization, glottalization, and tone
• Minimal pairs are found for all three:
wãhã
do’a
kóró
kórò
‘drag/row’
‘kill’
doa
hu
hũ
maa
ma’a
khòá
khóá
sa’a
sã’ã
waa
wa’a
sóà
sóá
kha
khã
wama
wa’ma
báa
baá
waha
‘smoke’
‘dig’
‘hawk’
‘worm’
‘electric eel’
‘chop’
‘envy’
‘stream’
‘give’
‘name’
‘cook’
‘be small’
‘go’
‘rain’
‘leave’
‘grind’
‘young/new’ ‘decompose’
‘umbrella’
‘part/half’
‘rest’
‘swim’
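Pairs like these can be sorted by the feature that separates them. The Python sketch below is one way to do that, assuming the written conventions seen in the forms above (tilde for nasalization, ’ for glottalization, acute and grave accents for tone); the function names are hypothetical.

```python
import unicodedata

GLOTTAL = "\u2019"            # the apostrophe used for glottalization
NASAL = "\u0303"              # combining tilde
TONES = {"\u0301", "\u0300"}  # combining acute and grave

def analyze(word):
    """Split a written form into plain letters and its three suprasegmental markings."""
    decomposed = unicodedata.normalize("NFD", word)
    letters, tones = [], []
    for ch in decomposed:
        if ch == GLOTTAL:
            continue
        if ch in TONES:
            if tones:
                tones[-1] = ch        # attach the tone mark to the preceding letter
        elif not unicodedata.combining(ch):
            letters.append(ch)
            tones.append("")
    return {
        "letters": "".join(letters),
        "nasalization": NASAL in decomposed,
        "glottalization": GLOTTAL in decomposed,
        "tone": tuple(tones),
    }

def contrast(a, b):
    """Name the single feature separating a and b, or None if they are not such a pair."""
    fa, fb = analyze(a), analyze(b)
    if fa["letters"] != fb["letters"]:
        return None
    differing = [k for k in ("nasalization", "glottalization", "tone") if fa[k] != fb[k]]
    return differing[0] if len(differing) == 1 else None

print(contrast("hu", "hu\u0303"))
print(contrast("maa", "ma\u2019a"))
print(contrast("kh\u00f2\u00e1", "kh\u00f3\u00e1"))
```

Counting how many pairs fall under each feature gives only a rough first indication of functional load; the assessment also draws on salience and on what speakers mark in spontaneous writing.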
103
104
However, despite shared phonemic status, each
suprasegmental feature has a different functional load.
This variation is manifested in spontaneous writing and has been
discussed throughout the o.d. process.
nasalization ***
• ++ salient (Roots / Suffixes)
• ++ Min. Pairs
• value unaffected by morphological processes
• always marked in spontaneous writing
glottalization **
• + salient (Roots only)
• ++ M.Ps
• reductions occur in morphological processes
• marked most, but not all, of the time in spontaneous writing
tone *
• + salient (Roots, few Suffixes)
• + M.Ps
• melody variable in morphological processes
• not marked in spontaneous writing
104
105
Recognizing cognitive issues:


In Kotiria root morphemes, internal voiceless Cs are
always pre-aspirated, a regular allophonic variation.
From the purely linguistic perspective, this aspiration
would not need to be represented in the orthography.
Thus, the words could be written as:
[dahpo] ‘head’             dapo
[mahsa] ‘people/beings’    masa
[tuhti] ‘to bark’          tuti
[puhka] ‘blowgun’          puka
[dahʧo] ‘day’              dacho
105
106
However, given the salience of this aspiration
and the fact that when written, it helps readers
identify the root morpheme in a word, the decision was
made to represent this pre-aspiration in the orthography.
+ articulatory salience
+ root recognition in reading
Thus, the words are written as:
[dahpo] ‘head’              dahpo
[mahsa] ‘people/beings’     mahsa
[tuhti] ‘to bark’           tuhti
[puhka] ‘blowgun’           puhka
[dahʧo] ‘day’               dahcho
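Because the pre-aspiration is fully regular, the step from the purely phonemic spellings to the spellings actually adopted can be sketched as a single rewrite rule. The Python below is only an illustration under stated assumptions, not the authors' procedure. The function name `write_preaspiration` is hypothetical, the set of voiceless consonants (p, t, k, s, ch) is limited to what the five examples show, and "root-internal" is approximated as "immediately after a vowel letter".

```python
import re

# Assumption: voiceless consonants covered by the five example words only.
VOICELESS = r"(ch|p|t|k|s)"
# Assumption: a root-internal consonant is one that follows a vowel letter.
VOWELS = "aeiouʉ"

def write_preaspiration(phonemic_spelling: str) -> str:
    """Insert <h> before each voiceless consonant that follows a vowel."""
    return re.sub(rf"(?<=[{VOWELS}]){VOICELESS}", r"h\1", phonemic_spelling)

for word in ["dapo", "masa", "tuti", "puka", "dacho"]:
    print(word, "->", write_preaspiration(word))
```

The rule is mechanical in this direction, which is why the linguistic argument alone would leave the aspiration unwritten; the decision to write it rests on salience and root recognition, not on predictability.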
107
Examples of symbolic-political choices
in o.d. in Kotiria
• Use of the symbol ‘k’ over ‘c/q(ui/e)’: a macro-level choice, to
distinguish the writing system of the indigenous language from
those of the national languages (Spanish/Portuguese)
• Use of the symbol ‘ʉ’ over ‘ɨ’: a regional-level choice, to
differentiate the orthography of a minority indigenous language
from that of the locally dominant indigenous language (Tukano
proper)
• Variation between use of the symbols ‘w/v’ among the Kotiria
from different regions: a group-internal choice distinguishing
sub-groups within the Kotiria population
An attempt to standardize
• In the 1980’s, the Ghana Alphabet Standardization Committee
was formed to standardize the set of symbols that could be used
in Ghanaian language alphabets.
• Case: the [tʃ] sound was written as:
<tʃ>, <ts>, <c>, <ch>, <ky>, <tsch>
• Which one to choose? The answer was obvious, both to me (just
observing) and to others on the Committee…
• I thought, “Of course, <ch>.” Why?
• A Committee member said, “The choice is obvious: <ky>!”
• That was used in his language, Akan, the biggest language in
Ghana. “Obviousness” depends on your background.
109
Cognitive/social issues: Phonemic-based
systems might prove unpopular
Choosing English-based writing systems over
phonemic systems
Navajo code talkers     Young and Morgan dictionary
wol-la-chee             wóláchíí’       ‘ant’
shush                   shash           ‘bear’
moa-si                  mósí            ‘cat’
klizzie                 tł’ízí          ‘goat’
110
Familiarity with English-based
systems
Eastern Pomo
phonemic orthography    local English-based orthography
káli                    caw lee         ‘one’
do:l                    dole            ‘four’
lé:ma                   leh ma          ‘five’
111
One more factor
• What is the writing system for? What does a
writer/reader want from it?
primacy of written text?
valuable information about the speaker?
symbolic system?
something else?
112
Writing Systems for Endangered Language
Communities
• Issues arise when literacy is used for second-language
teaching, because of transfer effects.
• O'odham has a high central vowel, IPA /ɨ/. All the writing
systems in the U.S. use the symbol e to represent this.
The language uses l to represent a flap (IPA /ɺ/), another
possible point of confusion.
• Muscogee (Creek) uses r for the voiceless lateral
(IPA /ɬ/).
• Literacy in the majority language can hinder learners’
awareness of the unique sounds of the endangered language.
113
Parameters: Socio-political
• need for community involvement in o.d. process
• acceptability of orthography (locally and in larger
context)
• relationship with dominant language – use of
conventions
• symbolic issues (±differentiation)
• literacy transference issues (±learnability)
• standardization / variation
114
Parameters: Techno-linguistic
Representation: what to represent, how to represent it,
where to represent it
• choice of script, symbols, conventions
• identification of phonemes/allophonic
processes/other phonological processes/
morphological processes
• evaluation of functional loads
• evaluation of resources where information can be
registered, if not in the orthography itself (practical
grammar, dictionary, etc.)
115
Parameters: Psycho-cognitive
‘Learnability’ (Orthographic depth)
• shallow O: (close to pronunciation)
  • + learnability for beginners and non-(fluent) speakers
  • - readability (may obscure morpheme identities)
  • harder to standardize dialect variation
• deep O: (preserves graphic ID of meaningful elements)
  • - learnability for beginners and non-(fluent) speakers
  • + readability
  • easier to standardize dialect variation
More on reading and writing
• Underrepresentation – using fewer symbols
than phonemes that exist in the language
• Can you think of an example?
• Example: Akan (Ghana) has contrastive nasalization
on vowels, contrastive tone, and 9 phonemic vowels.
Tone and nasalization are not marked, and 7 vowels
are represented in the orthography (developed over a
hundred years ago).
• Underrepresentation
• What are the general implications for reading?
• Since you can’t distinguish phonemic contrasts,
reading is more difficult
• For writing?
• Writing could be easier, since you don’t have as many
choices to make
• What can complicate this picture?
• Reading can be more difficult, but context often can
disambiguate, and fluent readers may be able to cope
with this.
• Overrepresentation – using more symbols than
phonemes that exist in the language
• Can you think of an example?
• Kotiria <d>, <r> for /d/.
• Choctaw <a>, <v> for /a/.
• All cases where different allophones are represented
• Overrepresentation
• What are the general implications for reading?
• Need to be taught two symbols for a phoneme, but
the shallow orthography can be easier to read
• For writing?
• Writing could be harder, since you have to deliberately
think about which symbol to use.
• What can complicate this picture?
• The salience of different allophones can make a big
difference. If speakers are aware of the allophones, then
fewer problems.
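The two notions can be stated mechanically: overrepresentation is one phoneme with more than one symbol, and underrepresentation is one symbol shared by more than one phoneme. The short Python sketch below illustrates that definition only. The function name `find_mismatches` is hypothetical; the Kotiria and Choctaw entries come from the examples above, while the two-vowel inventory used for underrepresentation is an invented toy case, not data from any language discussed here.

```python
from collections import defaultdict

def find_mismatches(spelling):
    """spelling maps each phoneme to the set of symbols used to write it.

    Returns (over, under):
      over  - phonemes written with more than one symbol
      under - symbols that stand for more than one phoneme
    """
    over = {p: sorted(symbols) for p, symbols in spelling.items() if len(symbols) > 1}
    phonemes_by_symbol = defaultdict(set)
    for phoneme, symbols in spelling.items():
        for symbol in symbols:
            phonemes_by_symbol[symbol].add(phoneme)
    under = {s: sorted(p) for s, p in phonemes_by_symbol.items() if len(p) > 1}
    return over, under

kotiria = {"d": {"d", "r"}}            # from the slide: <d>, <r> for /d/
choctaw = {"a": {"a", "v"}}            # from the slide: <a>, <v> for /a/
toy = {"e": {"e"}, "ɛ": {"e"}}         # hypothetical: two vowels, one letter

for name, spelling in [("Kotiria", kotiria), ("Choctaw", choctaw), ("toy", toy)]:
    print(name, find_mismatches(spelling))
```

A check like this only flags where symbols and phonemes fail to line up one to one. Whether a mismatch actually causes trouble depends on the factors discussed above, such as context and the salience of the allophones.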
More on Politics: SE Asia
(condensed from Adams, Larin. 2014. Case studies of orthography decision making in
Mainland Southeast Asia. In Cahill & Rice (eds.), Developing Orthographies for Unwritten Languages.)
• Scripts are not neutral. Commonly:
• i) use a variation of the national script, sometimes by
governmental decree
• ii) use a romanized script
• Complications
• But languages can cross borders, complicating matters.
Which national script?
• Competing religious identities: Buddhist, Christian, Muslim,
Animist.
Buddhist and Christian (Protestant and/or Catholic) often
have local associations.
Case study: E and H
• A man, “J”, was sent to the capital to find help in
developing an orthography, starting a literacy program
(train teachers, provide production workshops, pay for
publishing) and translating the Bible into H. He contacted
SIL, as a known organization.
• At first, there was no way to verify J as a legitimate
representative of H. (He was.)
• J said the project should include H and E (he said E
was a very close dialect of H)
• H had formed a literacy committee
• 3 people from H were invited to a literacy workshop
E and H: money
• After the workshop, participants given funds to
promote literacy in their villages: teaching nonreaders, publication of ‘literate’ by-products such as
calendars or brochures for special events.
• Difficult to monitor how these funds are actually used.
• One effect of participants returning from the workshop
in the capital with money was to create interest.
However, in this case the interest now appears to have
been more about money and less about literacy.
E and H: contact by E
• The next literacy workshop a few months later
included a new delegation of E speakers. They
claimed to represent the E group mentioned by J.
• E had no organizational equivalent of H that could
have deputized this delegation of E speakers.
However, this was not known, and they were treated as
co-owners of the language project in the workshop.
E and H: conflict
• During the workshop, differences developed between H
and E. The E deputation demanded their own project (and
funding). Attempts to mediate failed.
• In retrospect, the E deputation probably cared more about
money than literacy. However, an outside organization like
SIL could not know that and instead opted to fund both, thereby
legitimizing the E deputation.
• While SIL accepted the E, conflicting information led SIL
to seek more objective evidence by surveying the “E” and
“H” villages.
E and H: survey
• In a survey, one needs willing involvement of the groups. The contact
for the E group eventually agreed to the survey but said that the E
villages should be done last. The survey (a wordlist collection,
collecting some sociolinguistic data, comprehension testing)
proceeded in the H area.
• Surprise: some H villages had a substantial S minority, with only 30%
lexical similarity with H and E. H speakers understood S only if they
had been raised around S speakers. S was clearly another language.
• During survey some S speakers said that they were really H people
and the H were just a splinter of the E people. Soon it was apparent
that some E people were trying to influence the survey outcome by
running ahead and planting information in H villages with S people.
E and H: the rise of S
• The national government has a finite number of categories for
minority groups. H and E both had an official government identity,
but S did not. If S took over H’s identity then it would now be
identifiable to the government and to NGOs like SIL. So the S went
along with the attempt of some E people to skew the survey results.
• Survey found no pure E villages; they are always part of a village
whose majority is another ethnic group – M. Further, E children
primarily speak M.
• There are a number of H-only villages, and a long-standing cultural
committee which has representatives of the major religious groups.
The E group has nothing like this. Further, the S are actually a group
whose language is like the M language.
E and H: Decline into conflict
• For some time both H and E came to literacy workshops—
eventually accompanied by an S group demanding their own
language development project separate from the M language.
• What once looked like a viable single language development
project had now devolved into 4 different groups, at least 2 of
which were probably not represented by legitimate community
members.
• High level of conflict between the groups. This conflict was
either started or accelerated by beginning a language
development project—and the resulting fragmentation actually
is creating disunity, delaying literacy for all the groups.
E and H: End of involvement
• Given this situation and a growing number of legal and
physical threats against SIL personnel if they did not
meet demands of one or more of the groups, SIL
decided to cease working with any of the groups.
• Thus, language development that was stimulated by
external involvement resulted in accentuating division
in a group that needs to work together if it is to survive
in the face of a growing national culture.
E and H: Observations
1. Unity matters
2. Know who you’re working with
3. An orthography cannot extend group identity
beyond any pre-existing political or social
organization.
4. Literacy and orthographic decisions are often a
proxy forum for other social, religious or political
issues.
5. Most of the time money creates more problems than
it solves.
130
Summary: the goals
131
End goals of the ‘midwife’
approach
For the LC:
• a practical orthography that is a comfortable tool for both
writers and readers
• a new means of expression developed collectively, with
their own input
• empowerment, incorporating skills and resources for
future decision-making
For the Linguist:
• an experience where some of the ‘heat’ is taken off, but
where creativity is crucial
• a richer analysis, the result of L’s technical knowledge +
LC input
132
Some references
Good starting places:
Cahill, Michael, and Keren Rice (eds.). 2014. Developing Orthographies for Unwritten
Languages. Dallas: SIL International.
Grenoble, Lenore, and Lindsay Whaley. 2006. Orthography. Chapter 6 in Saving
languages: An introduction to language revitalization. Cambridge: Cambridge
University Press. 137-159.
Hinton, Leanne. 2001. New writing systems. In Leanne Hinton and Ken Hale (eds.),
The Green Book of language revitalization in practice. San Diego: Academic Press.
239-250.
Lüpke, Friederike. 2011. Orthography development. In Peter K. Austin and Julia
Sallabank (eds.), The Cambridge handbook of endangered languages. Cambridge:
Cambridge University Press. 312-336.
Sebba, Mark. 2007. Spelling and society: The culture and politics of orthography
around the world. Cambridge: Cambridge University Press.
Seifart, Frank. 2006. Orthography development. In Jost Gippert, Nikolaus P.
Himmelmann, and Ulrike Mosel (eds.), Essentials of language documentation. Berlin:
Mouton de Gruyter. 275-299.
A more exhaustive list can be obtained from the CoLang course website.
133
About us
• Mike Cahill (mike_cahill@sil.org) worked on the Kɔnni orthography in
Ghana in the 1980’s, and has advised on several African languages since,
especially in the Gur family.
• Keren Rice (rice@chass.utoronto.ca) has been working on Dene languages
in northern Canada since the 1970’s, and served on an orthography
standardization committee in the 1980’s.
• Colleen Fitzgerald (cmfitz@uta.edu) has been working on Tohono O'odham
for nearly 2 decades, and on Native languages of Oklahoma since 2009.
• Gwen Hyslop (gwendolyn.hyslop@anu.edu.au) has been working on
languages in Bhutan since 2006, including development of ’Ucen
orthographies for Bhutan’s endangered languages.
• Kris Stenzel (kris.stenzel@gmail.com) has been working since 2000 on
Kotiria and Wa’ikhana, two Eastern Tukanoan languages spoken in
northwestern Amazonia.
We welcome your feedback/comments/questions!