“Geiriau Saesneg yn slipio i fewn”:
Investigating the integration of
English-origin verbs in Welsh
Jonathan Stammers
8 March 2010, Bilingualism Centre
Overview

The Siarad corpus

Code-switching vs. borrowing controversy

Poplack approach: “Nonce Borrowing”

English-origin verbs in Welsh

Analysis: Soft mutation on verbs (2 attempts)

Dealing with word frequency effects

Summary
The Siarad Corpus


40 hours of Welsh/English bilingual speech recorded
& fully transcribed in CHAT format
69 Naturalistic recordings of informal conversations,
typically between 2 speakers, & 30 minutes long;

151 speakers of varying age, sex and background

456,266 words (tokens)

Every word tagged for language


Recordings & transcription done by project team
(Elen Robert, Peredur Davies, Marika Fusser &
myself; Margaret Deuchar – project director)
Freely available to researchers online
Examples in Siarad: Borrowings?
ond mae o mor cheesy mae’n funny yndy ?
“but it’s so cheesy it’s funny isn’t it?” [Fusser29:217]
hynna ’dy’r exam dw i gorod eistedd fory
“that’s the exam I have to sit tomorrow.” [Stammers6: 1273]
Code-switching or Borrowing? Criteria:
Criterion | Borrowing | Code-mixing
no more than one word | + | -
adaptation: phonological | ±/+ | ±/-
adaptation: morphological | + | -
adaptation: syntactic | + | -
frequent use | + | -
replaces own word | + | -
recognised as own word | + | -
semantic change | + | -
(Muysken 2000: 73)
Additional Criteria suggested:
“Core/Cultural” distinction: “Cultural” items are not switches
Flagging: self-correction, repetition, hesitation or stammering flags up a switch
Dictionary
Poplack’s approach




Code-switching and borrowing can be
distinguished absolutely
“Free morpheme constraint”: no word-internal switching
Variationist approach: Comparing morphosyntactic patterning of donor-language items
with native items
“Nonce Borrowing hypothesis”
The Nonce borrowing hypothesis
“One of the goals of these studies is to develop operational criteria
for distinguishing loanwords from codeswitches. Thus, for the
Puerto Rican data, a working hypothesis was that loanwords from
English were phonologically, morphologically, and syntactically
integrated into Spanish, were recurrent and widespread, and that
an English word not satisfying these criteria could only occur in
English monolingual discourse or in code-switches from Spanish to
English.
In general, however, borrowing is a much more
productive process and is not bound by all of these constraints. In
particular, phonological integration and the “social” characteristics
of recurrence (in the speech of an individual) and distribution
(across the community) need not be satisfied. This type of
borrowing is sometimes called “nonce” borrowing.”
(Sankoff, Poplack & Vanniarajan 1990: 74)
Study | Language pair studied | Elements analysed | Linguistic features studied in analyses | Conclusion
Sankoff, Poplack & Vanniarajan 1990 | Tamil-English | Lone English nouns | Case inflections | All are borrowings
Poplack & Meechan 1995 | Wolof-French; Fongbe-French | Lone French nouns | Definite/indefinite reference; NP word order | All are borrowings
Adalar & Tagliamonte 1998 | Turkish-English | Lone English nouns | Vowel harmony; plural affixation; NP word order | All are borrowings
Budzhak-Jones 1998 | Ukrainian-English | Lone English nouns | Case inflections | All are borrowings
Eze 1998 | Igbo-English | Lone English verbs; lone English nouns | Affix distribution; serial constructions; vowel harmony (verbs); determiners; type of nominal reference; NP word order (nouns) | All are borrowings
Samar & Meechan 1998 | Persian-English | Lone English nouns | Definite/indefinite reference; VP word order; case inflections | All are borrowings
Turpin 1998 | (Acadian) French-English | Lone English nouns | Determiners; NP word order; plural marking; discourse flagging | Most are borrowings; minority are switches
Arroyo & Tricker 2000 | Catalan-Spanish | Lone Spanish nouns | Definite/indefinite reference; plural marking; gender | All are borrowings
Shin 2002 | Korean-English | Lone English nouns | Case inflections | All are borrowings
Cacoullos & Aaron 2003 | Spanish-English | Lone English nouns | Determiners | All are borrowings
English verb insertions (1)
More “established English borrowings”:
pasio (to pass), trio (to try), setlo (to settle), canslo (to
cancel), meindio (to mind), cysidro (to consider)
sut mae o’n cope-io efo (.) hynna i gyd?
“how is he coping with all that?” [Fusser29:635]
pan dach chi’n defnyddio wide-angle lenses dach
chi’n emphasize-io ’r foreground.
“when you use wide-angle lenses, you emphasize the
foreground.” [Fusser17: 792]
English verb insertions (2)
bysai hi’m ’di gwisgo helmet ’sai pen
hi ’di cael ei crush-o to bits
“if she hadn’t worn a helmet, her head would have been
crushed to bits.”
[Robert3: 898]
a mae ’di cael ei ºgonnect-io i’r printer
yr computer, de
“and it’s been connected to the computer printer, right.”
[Roberts2: 627]
English verb insertions (3)
anyway, ges i ’yn gazump-io ar hwnna
“anyway, I got gazumped on that one” [Fusser29:700]
maen nhw’n (.) exfoliate-io chdi gynta
(.) ac yn spwnjo chi drosodd gynta
“they exfoliate you first, and sponge you over first”
[Fusser30:27]
Soft Mutation in Welsh
Soft mutation on verbs: Environments (1)
After "i" particle
e.g. oedd e’n mynd i ºgostio pres [Fusser6:524]
After "ei" possessive (with masculine subject)
e.g. fyswn i licio ei ºfenthyg o [Fusser9:375]
After various other particles: heb, am, cyn, gan, ar, neu; dy possessive
e.g. sut mae o am ºfihafio [Fusser15:510]
Soft mutation on verbs: Environments (2)
With gwneud (or ddaru) auxiliary + Subject
e.g. wnest ti ºdrio? [Stammers5:708]
After "i" + (non-overt) Subject
e.g. mae’n gwneud i chdi ºgofio rywbeth dydy? [Stammers7:139]
After Finite Verb + Subject
e.g. sut fedra i ºddeud? [Fusser4:257]
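The forms marked with º in the examples above follow the regular soft-mutation pattern of Welsh initial consonants (p→b, t→d, c→g, b→f, d→dd, g→dropped, m→f, ll→l, rh→r). The following is a minimal Python sketch of that pattern, working on spelling only. The function name `soft_mutate` and the two lookup tables are illustrative assumptions, not part of the study's method, and lexical exceptions and dialect variation are ignored.

```python
# Soft mutation of word-initial consonants, based on Welsh orthography.
# Digraphs are listed first so that "ll" and "rh" are matched before single letters.
SOFT_MUTATION = [
    ("ll", "l"),
    ("rh", "r"),
    ("p", "b"),
    ("t", "d"),
    ("c", "g"),
    ("b", "f"),
    ("d", "dd"),
    ("g", ""),   # initial g is dropped
    ("m", "f"),
]

# Digraphs that begin with a mutable letter but are not themselves subject to soft mutation.
NON_MUTATING = ("dd", "ch", "th", "ph")


def soft_mutate(word):
    """Return the soft-mutated spelling of `word`, or `word` unchanged if it cannot mutate."""
    lower = word.lower()
    if lower.startswith(NON_MUTATING):
        return word
    for initial, mutated in SOFT_MUTATION:
        if lower.startswith(initial):
            return mutated + word[len(initial):]
    return word


# Citation forms of verbs that appear mutated in the slide examples.
for verb in ["costio", "benthyg", "bihafio", "trio", "cofio", "deud"]:
    print(verb, "->", soft_mutate(verb))
```

Comparing an attested token with `soft_mutate(citation_form)` is one simple way to code a token as mutated or unmutated in an environment where mutation is expected.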
Soft Mutation: Variation
E.g. Welsh verb “cerdded” (to walk):
a maen nhw’n mynd i ºgerdded am
tua dwy, dair milltir
“and they’re going to walk for 2 or 3 miles”
[Roberts2: 32]
But frequently mutation doesn’t happen where expected
(especially in informal spoken Welsh):
a (.) does dim byd i poeni amdano
“and there’s nothing to worry about”
[Fusser14: 40]
Three groups of verbs compared in this study (1st Analysis):

Native Welsh: cofio (remember), defnyddio (use), cwyno
(complain), pwyso (push), cneifio (shear), treiglo (mutate), twtio
(tidy)
talu (pay), penderfynu (decide), poeni (worry), lladd (kill), cwrdd
(meet), cau (close), dal (hold), dechrau (start), cael (have), mynd
(go), gweld (see). [irregular verbs and non –(i)o suffix included]

Listed English: trio, cario, clirio, dreifio, sbwnjo, clariffeio, pinsio,
bargeinio, pipo, dipio, trotio, manejio, tsiecio, titso, protestio,
cidnapio
twtsiad, dripian, [non –(i)o suffix included]

Unlisted English: text-io, download-io, brief-io, quote-io, bulk-io, ban-io, bypass-io, crush-o, trample-o, base-io, connect-io,
babysit-io, decorate-io, concentrate-io, mollycoddle-io, power-walk-io
Method



Text-based searches through corpus (and using
word frequency lists) for possible verbs,
extracting examples where mutation expected
(and where consonant can be mutated!)
Coded each verb as mutated or not
First attempt: used a random sampling
technique to find the native Welsh verbs
Results (First Attempt): (1)
[Figure: % mutation where expected (y-axis, 0-100) by frequency per million words of verb (grouped data: 1-9, 10-99, 100-999, 1000-9999)]
Absolute Freq. | Freq./million words | %Mut. Overall | AVG Freq | log(AVG freq)
1-4 | 1-9 | 34.69% | 2.21 | 0.3452
5-45 | 10-99 | 52.68% | 16.44 | 1.2160
46-450 | 100-999 | 75.29% | 161.56 | 2.2083
451-4500 | 1000-9999 | 89.63% | 1962.67 | 3.2928
Correlation coefficient with overall % mutation: 0.7752 (AVG Freq); 0.9936 (log(AVG freq))
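The two correlation coefficients can be checked directly from the four grouped rows of the table. The sketch below assumes they are Pearson coefficients between overall % mutation and, respectively, the average frequency and its base-10 logarithm; the helper name `pearson` is illustrative.

```python
import math

# Grouped values from the table: (average frequency, overall % mutation).
BANDS = [
    (2.21, 34.69),
    (16.44, 52.68),
    (161.56, 75.29),
    (1962.67, 89.63),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


freqs = [freq for freq, _ in BANDS]
rates = [rate for _, rate in BANDS]

r_raw = pearson(freqs, rates)
r_log = pearson([math.log10(f) for f in freqs], rates)
print(f"raw frequency: r = {r_raw:.3f}")
print(f"log frequency: r = {r_log:.3f}")
```

Under that assumption the values come out close to the reported 0.7752 and 0.9936, which is why the log of frequency is the better-behaved predictor for the grouped data.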
Analysis: 1st & 2nd Attempts
Corpus
Earlier Analysis: 46 transcript subset (66%) of Siarad corpus; 301,072 word tokens
Later Analysis: Whole Siarad corpus (69 transcripts; 456,266 word tokens)
Instances selected (where soft mutation expected)
Earlier Analysis: All English-origin verbs (with any suffix or none); sample of native tokens: 5 randomly distributed tokens per transcript of any Welsh verbs, including irregular verbs; 466 tokens altogether (230 native Welsh; 198 listed English; 38 unlisted English)
Later Analysis: All English-origin and native tokens ending in the –(i)o suffix (regular verbs only); 506 tokens altogether (143 native Welsh; 302 listed English; 61 unlisted English)
No. of verb types (overall and by verb status and frequency band)
Earlier Analysis: 147 types overall; native Welsh: 65; listed English: 62; unlisted English: 20; 1-9 words per million: 42; 10-99: 54; 100-999: 41; 1000-9999: 10
Later Analysis: 159 types overall; native Welsh: 44; listed English: 81; unlisted English: 34; 1-9 words per million: 79; 10-99: 72; 100-999: 7; 1000-9999: 1
Initial consonants
Earlier Analysis: Verbs starting with /p/, /t/, /k/, /b/, /d/, /m/, /ɬ/ and /g/ included; /rʰ/ excluded
Later Analysis: Verbs starting with /p/, /t/, /k/, /b/, /d/ and /m/ included; /ɬ/, /g/ and /rʰ/ excluded
Other Possible Variables: (1) Mutation Environment
[Bar chart: % mutation where expected (y-axis) by mutation environment (x-axis, A-F)]
(A) "i" particle
(B) "gwneud" auxiliary + Subject
(C) "i" + (non-overt) Subject
(D) "ei" possessive
(E) Fin Verb + Subject
(F) other particle
Bar labels (assignment to environments not preserved): 65.3%, 65.4%, 60.5%, 62.5%, 58.1%, 44.8%
Other Possible Variables: (2) Initial Consonant
[Bar chart: % mutation where expected (y-axis) by initial consonant of verb (x-axis: b, d, m, p, t, k)]
Bar labels (assignment to consonants not preserved): 68.8%, 59.3%, 61.5%, 64%, 55.8%, 47.5%
Three groups of verbs compared in this study (2nd Analysis):



Native Welsh: cofio (remember), defnyddio (use),
cwyno (complain), pwyso (push), cneifio (shear),
treiglo (mutate), twtio (tidy)
Listed English: trio, cario, clirio, dreifio,
sbwnjo, clariffeio, pinsio, bargeinio, pipo, dipio,
trotio, manejio, tsiecio, titso, protestio, cidnapio
Unlisted English: text-io, download-io, brief-io,
quote-io, bulk-io, ban-io, bypass-io, crush-o,
trample-o, base-io, connect-io, babysit-io,
decorate-io, concentrate-io, mollycoddle-io,
power-walk-io
Results: Second Analysis
[Figure: % mutation where expected (y-axis, 0-100) by word frequency per million words (grouped values: 1-9, 10-99, 100-999, 1000-9999)]
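Results: First & Second Analyses
[Bar chart: number of tokens (y-axis, 0-250), mutated vs. not mutated, by verb status]
Native: 73% mutated; 27% not mutated
Listed Eng.: 66% mutated; 34% not mutated
Unlisted Eng.: 16% mutated; 84% not mutated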
Results: 1st & 2nd Attempts
% Mutation by verb status | Earlier Analysis | Later Analysis
native Welsh | 85.6% | 72.7%
listed English | 61.1% | 66.2%
unlisted English | 18.4% | 16.4%
% Mutation by frequency band (words per million) | Earlier Analysis | Later Analysis
1-9 | 34.7% | 40.9%
10-99 | 52.7% | 58.9%
100-999 | 75.3% | 74.9%
1000-9999 | 89.6% | 86.7%
[Figures: Results (First Analysis); Results (Second Analysis)]
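The percentages by verb status can be combined with the token totals given earlier for each analysis (466 and 506 tokens) to recover the approximate number of mutated tokens per group and an overall rate. This is a back-of-envelope sketch: the counts are inferred by rounding, not taken from the slides, and the names `REPORTED`, `implied_counts` and `overall_rate` are illustrative.

```python
# (token total, reported % mutation) per verb status, as stated on the slides.
REPORTED = {
    "earlier": {
        "native Welsh": (230, 85.6),
        "listed English": (198, 61.1),
        "unlisted English": (38, 18.4),
    },
    "later": {
        "native Welsh": (143, 72.7),
        "listed English": (302, 66.2),
        "unlisted English": (61, 16.4),
    },
}


def implied_counts(groups):
    """Nearest whole number of mutated tokens implied by each reported percentage."""
    return {status: round(total * pct / 100) for status, (total, pct) in groups.items()}


def overall_rate(groups):
    """Token-weighted % mutation across all verb statuses, using the implied counts."""
    mutated = sum(implied_counts(groups).values())
    tokens = sum(total for total, _ in groups.values())
    return 100 * mutated / tokens


for name, groups in REPORTED.items():
    print(name, implied_counts(groups), f"overall: {overall_rate(groups):.1f}%")
```

On these inferred counts the overall rate is roughly 70% in the earlier analysis and 62% in the later one; the drop is consistent with the later sample containing proportionally more English-origin tokens.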
Statistical Testing: 1st & 2nd Analyses
Results of statistical testing (logistic regression) with raw frequency values
Earlier Analysis: Raw frequency marginally significant (p=.044) or not quite significant (p=.072) as a predictor of mutation, depending upon baseline category. Differences between all verb categories significant, including between native Welsh and listed English (where p<.0005)
Later Analysis: Raw frequency marginally significant (p=.042) or not at all significant (p=.682) as a predictor of mutation, depending upon baseline category. Differences between verb categories significant, except between native Welsh and listed English (where p=.174)
Results of statistical testing (logistic regression) with log values of frequency
Earlier Analysis: Log frequency significant (p=.019 or .01 with native and listed English as baseline respectively) as a predictor of mutation. Differences between native Welsh and unlisted English, and between listed and unlisted English significant (p=.019 and .03, respectively), but difference between native Welsh and listed English not at all significant (p=.448)
Later Analysis: Log frequency highly significant as a predictor of mutation (p=.001) with listed English as baseline, but not at all significant (p=.549) with unlisted English as baseline. Differences between native Welsh and unlisted English, and between listed and unlisted English highly significant (p=.001 and .005, respectively), but difference between native Welsh and listed English not at all significant (p=.186)
Summary




English-origin verbs in Welsh are highly productive (–(i)o suffix). Under Poplack’s approach they would almost certainly all be considered a simple case of borrowing
A subset of them, identified by a dictionary criterion, was found to be significantly less integrated morpho-syntactically (with respect to soft mutation): these could be considered “switches”
Strong (log-linear) relationship between word frequency and rate of mutation
This goes against Poplack’s “nonce borrowing” hypothesis: “nonce” items pattern significantly differently from “established” items, based on either dictionary criterion OR frequency
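The log-linear relationship can be illustrated with a straight-line fit of % mutation against log10 of frequency, using the four grouped values from the first analysis (average frequency and overall % mutation per band). This is only a descriptive sketch on grouped data, not the token-level logistic regression reported in the talk; `fit_line` is an illustrative name.

```python
import math

# Grouped first-analysis values: (average frequency, overall % mutation).
BANDS = [
    (2.21, 34.69),
    (16.44, 52.68),
    (161.56, 75.29),
    (1962.67, 89.63),
]


def fit_line(points):
    """Ordinary least-squares fit of % mutation on log10(frequency)."""
    xs = [math.log10(freq) for freq, _ in points]
    ys = [rate for _, rate in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(BANDS)
print(f"% mutation = {intercept:.1f} + {slope:.1f} * log10(frequency)")
```

On these four points the slope is about 19, i.e. each tenfold increase in frequency goes with roughly 19 more percentage points of mutation.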