TALC 2014 abstract

The BUiD Arab Learner Corpus: Explaining Second Language Writing Systems within a Markedness Framework
Yasemin Yildiz
The British University in Dubai
yasemin.yildiz@buid.ac.ae
1 Introduction
This paper has a twofold goal. The first is to contribute to the literature on Second Language Writing Systems (L2WS) by focusing on the British University in Dubai Arab Learner Corpus (BALC). The second is to demonstrate the close relationship between phonology and orthography in L2WS and to critically address the issue of script reform.
Unlike previous studies, which provide a holistic and descriptive analysis of all possible spelling errors of Arabic-speaking learners of English (e.g. Randall and Groom, 2009; Haggan, 1991; Hassan, 2010), this study differs in two respects: 1) BALC will, for the first time, be interpreted within a markedness framework; and 2) particular emphasis will be given to erroneous spelling forms of lexical items that contain complex onset and coda clusters at the phonological level only (e.g. stamped [stæmpt]).
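To make "complex onset and coda clusters" concrete, the sketch below counts the consonants before and after the vowel nucleus in a broad phonemic transcription of a single syllable such as stamped [stæmpt]. It assumes a hypothetical, simplified vowel inventory and one character per segment; it is not part of the BALC annotation or of Wmatrix, and polysyllabic words would need proper syllabification first.

```python
# Hypothetical, simplified set of IPA vowel symbols, one character per segment.
# Diphthongs and length marks are not handled in this sketch.
VOWELS = set("aeiouæɑɒɔəɜɪʊʌɛ")


def cluster_sizes(transcription: str) -> tuple[int, int]:
    """Return (onset size, coda size) for a single-syllable transcription.

    Onset = consonants before the first vowel; coda = consonants after the last vowel.
    """
    segments = transcription.strip("[]/")
    vowel_positions = [i for i, seg in enumerate(segments) if seg in VOWELS]
    if not vowel_positions:
        raise ValueError(f"no vowel found in {transcription!r}")
    onset = vowel_positions[0]
    coda = len(segments) - vowel_positions[-1] - 1
    return onset, coda


def is_complex(size: int) -> bool:
    # Treat any sequence of two or more consonants as a complex cluster.
    return size >= 2


onset, coda = cluster_sizes("[stæmpt]")
print(onset, coda, is_complex(onset), is_complex(coda))  # 2 3 True True
```

Under these assumptions, stamped has a two-consonant onset /st/ and a three-consonant coda /mpt/, so both edges count as complex.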
Existing theories explaining L2WS range from the Contrastive Analysis Hypothesis (Lado 1957; hereafter CAH), which compares the areas where the L2 differs from the L1 to predict what will be difficult for the learner, to Error Analysis (Corder 1967; hereafter EA), which advocates looking only at the learner's developing grammar to ascertain where difficulties exist. Although both theories may be able to predict or account for learners' linguistic difficulties, they have two shortcomings. First, CAH relies only on native language transfer. Second, EA overlooks the relationship between L1 transfer and universal processes. As an alternative, this study explores how the markedness framework can serve as a useful tool for modelling both first language and universal constraints in L2WS. Indeed, according to Spolsky (1989), the markedness condition is a necessary linguistic basis for language learning.
2 Theoretical framework
Trubetzkoy and Jakobson were the first linguists to introduce the idea of ‘markedness’, in the 1930s, when it was treated as a language-particular phenomenon. Trubetzkoy (1939) approached markedness within a descriptive framework, and the notion was initially confined to phonology. Jakobson (1968), however, approached markedness from the perspective of language acquisition. The underlying principle of Jakobson’s theory is that there is a universal order of acquisition, largely based on phonological oppositions and the phonetic properties of segments. On the basis of the structural contrasts in his theory, Jakobson suggested that unmarked forms are acquired earliest and also occur in all the world’s languages.
3 The study
The BUiD Arab Learner Corpus (BALC) consists of 1,865 texts written by first-year university students or secondary school students (year/grade 12, the last year of schooling). It comprises 287,227 word tokens and 20,275 word types. The texts fall into three types: texts collected by MEd students in secondary schools, retired first-year university test essays, and texts from the Common Educational Proficiency Assessment (CEPA) examinations (all school students in the United Arab Emirates must take the CEPA as a university entrance exam). All scripts were handwritten and then converted into text files for incorporation into the corpus.
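A quick arithmetic check on the corpus figures gives the mean text length and the overall type/token ratio. These are derived only from the numbers stated above. A raw type/token ratio falls as a corpus grows, so it serves here as a rough descriptive figure rather than a basis for comparison with corpora of different sizes.

```python
# Corpus figures as reported for BALC.
TEXTS = 1_865
TOKENS = 287_227
TYPES = 20_275

mean_tokens_per_text = TOKENS / TEXTS  # average text length in word tokens
type_token_ratio = TYPES / TOKENS      # distinct word forms per running word

print(f"mean text length: {mean_tokens_per_text:.1f} tokens")  # 154.0 tokens
print(f"type/token ratio: {type_token_ratio:.4f}")              # 0.0706
```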
4 Instrumentation and procedure
Misspellings that exhibit consonant clusters will be identified and categorized using the Wmatrix3 program (Rayson 2003, Rayson 2005), an online integrated corpus linguistics environment. Texts loaded into Wmatrix can be analyzed for word frequency profiles and concordances, and annotated for part of speech (using the well-known CLAWS tagger; see Garside et al. 1997) and for word sense (using a semantic tagger). The semantic component, the UCREL Semantic Analysis System (USAS), has a multi-tier structure with 21 major discourse categories.
These 21 categories are further subdivided. Within the 'Z' category, the tag 'Z99' marks unmatched items, that is, items the system does not recognize. Data will be drawn from the Z99 category, as it collects the spelling errors that the system fails to recognize and provides their frequency distribution. The quantitative analysis will be based on the Z99 findings, and a further qualitative analysis will be conducted within the markedness framework.
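The Z99 step amounts to collecting the tokens a lexicon does not recognize and counting them. The sketch below imitates that idea outside Wmatrix, using a tiny hypothetical word list and an invented sample sentence; it does not use or reproduce Wmatrix or USAS, whose lexicon and tagging are far richer. It also illustrates one limitation of any lookup-based approach: a misspelling that happens to be a real word (e.g. form for from) is recognized and so never appears among the unmatched items.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon standing in for a real dictionary (assumption).
LEXICON = {"i", "it", "the", "letter", "and", "my", "friend",
           "stamped", "helped", "from", "form"}


def unmatched_frequencies(text: str, lexicon: set[str]) -> Counter:
    """Count tokens not found in the lexicon, analogous to items tagged Z99."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tok for tok in tokens if tok not in lexicon)


# Invented learner-style sample, for illustration only.
sample = "I stampid the letter and helpt my frind. I stampid it form my friend."
for form, freq in unmatched_frequencies(sample, LEXICON).most_common():
    print(form, freq)
# stampid 2
# helpt 1
# frind 1
```

Note that "form" (intended: from) is silently accepted, so real-word errors would need a separate check.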
5 Research questions
This study investigates the following three questions:
1) What modification strategies do learners use in producing consonant clusters?
2) To what extent are L2 syllables constrained by permissible L1 syllable structure, and to what extent do universal principles apply or even prevail?
3) What role does markedness play in the production of consonant clusters?
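Research question 1 concerns modification strategies. One possible way to operationalize it, sketched here as an assumption rather than as the study's procedure, is to align each misspelling with its target spelling and check whether the learner added vowel letters (suggesting epenthesis) or dropped consonant letters (suggesting deletion). The sketch uses Python's standard `difflib.SequenceMatcher`; the category labels and the example pairs are hypothetical, input is assumed to be lowercase, and spelling is only an indirect reflection of pronunciation.

```python
from difflib import SequenceMatcher

VOWEL_LETTERS = set("aeiou")


def modification_strategy(target: str, attempt: str) -> str:
    """Label a misspelling as 'epenthesis', 'deletion', 'other' or 'identical'.

    epenthesis: every edit inserts only vowel letters
    deletion:   every edit removes only consonant letters
    """
    ops = [op for op in SequenceMatcher(None, target, attempt).get_opcodes()
           if op[0] != "equal"]
    if not ops:
        return "identical"
    inserted_vowels = all(
        tag == "insert" and set(attempt[j1:j2]) <= VOWEL_LETTERS
        for tag, i1, i2, j1, j2 in ops)
    deleted_consonants = all(
        tag == "delete" and not (set(target[i1:i2]) & VOWEL_LETTERS)
        for tag, i1, i2, j1, j2 in ops)
    if inserted_vowels:
        return "epenthesis"
    if deleted_consonants:
        return "deletion"
    return "other"


# Hypothetical target/attempt pairs, for illustration only.
for target, attempt in [("plastic", "pilastic"), ("stamped", "stamed"), ("helped", "helpt")]:
    print(f"{target} -> {attempt}: {modification_strategy(target, attempt)}")
# plastic -> pilastic: epenthesis
# stamped -> stamed: deletion
# helped -> helpt: other
```

Substitutions such as helpt for helped fall into "other" here; a fuller analysis would align phonemic rather than orthographic forms.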
References
Corder, S. P. 1967. “The Significance of Learners’ Errors”. International Review of Applied Linguistics 5: 161-169.
Garside, R., Leech, G. and McEnery, T. (eds.) 1997. Corpus Annotation: Linguistic Information from Computer Text Corpora. London: Longman.
Gnanadesikan, A. E. 2004. “Markedness and faithfulness constraints in child phonology”. In R. Kager, J. Pater and W. Zonneveld (eds.) Constraints in Phonological Acquisition. Cambridge: Cambridge University Press, 73-108. [ROA-76]
Haggan, M. 1991. “Spelling errors in native Arabic-speaking English majors: A comparison between remedial students and fourth year students”. System 19(1): 45-61.
Lado, R. 1957. Linguistics across Cultures: Applied Linguistics for Language Teachers. Ann Arbor: University of Michigan Press.
Randall, M. 2007. Memory, Psychology and Second Language Learning. Philadelphia: John Benjamins Publishing Company.
Randall, M. and Groom, N. 2009. “Introducing the BUiD Arab Learner Corpus: A resource for studying the acquisition of L2 English spelling”. In M. Mahlberg, V. González-Díaz and C. Smith (eds.) Proceedings of the Corpus Linguistics Conference CL2009, University of Liverpool, UK, 20-23 July 2009.
Rayson, P. 2003. Matrix: A Statistical Method and Software Tool for Linguistic Analysis through Corpus Comparison. Ph.D. thesis, Lancaster University. Available online at http://ucrel.lancs.ac.uk/people/paul/publications/phd2003.pdf
Rayson, P. 2005. Wmatrix: A Web-based Corpus Processing Environment. Computing Department, Lancaster University. Available online at http://www.comp.lancs.ac.uk/ucrel/wmatrix/
Spolsky, B. 1989. Conditions for Second Language Learning: Introduction to a General Theory. Oxford: Oxford University Press.
Trubetzkoy, N. 1939. Grundzüge der Phonologie (Principles of Phonology). Travaux du cercle linguistique de Prague 7.
Yildiz, Y. and Ozek, Y. 2009. “The Role of Markedness in Vocabulary Learning”. In Proceedings of the International Conference of Technology, Education and Development (ICERI 2009).