Corpora in language education

Corpus Linguistics
Richard Xiao
lancsxiaoz@googlemail.com
Aims of this session
• Lecture
– The state of the art of using corpora in language education
– Issues of using corpora in language teaching
– Case study: Using contrastive corpus linguistics to inform
SLA research
• Lab session (homework)
– Semantic prosody and DDL
Corpus revolution
• An increasing interest since the early 1990s in applying
the findings of corpus-based research to language
pedagogy
– 10 well-received biennial international conferences on
Teaching and Language Corpora (TaLC, 1994-2012)
– At least 30 authored or edited books, covering a wide
range of issues concerning the use of corpora in language
pedagogy, e.g. corpus-based language descriptions,
corpus analysis in the classroom, and learner corpus research
Wichmann et al (1997), Partington (1998), Bernardini (2000), Burnard and
McEnery (2000), Kettemann and Marko (2002, 2006), Aston (2001),
Ghadessy, Henry, and Roseberry (2001), Hunston (2002), Granger et al
(2002), Connor and Upton (2002), Tan (2002), Sinclair (2003, 2004), Aston
et al (2004), Mishan (2005), Nesselhauf (2005), Römer (2005), Braun, Kohn
and Mukherjee (2006), Gavioli (2006), Scott and Tribble (2006), Hidalgo,
Quereda and Santana (2007), O’Keeffe, McCarthy and Carter (2007),
Aijmer (2009), Bennett (2010), Campoy, Gea-Valor and Belles-Fortuno
(2010), Cunningham (2010), Harris and Jaén (2010), Jaén, Valverde and
Pérez (2010), Reppen (2010), Volodina (2010)
Corpus revolution
• Books published in China
– 杨达复 (2000), 濮建忠 (2003), 何安平 (2004a, 2004b),
华南师范大学外国语学院 (2005), 卫乃兴, 李文中, 濮建忠 (2005),
杨惠中 (2005), 王立非, 梁茂成等 (2007)
Teaching and corpora: A convergence
• Leech’s (1997) three focuses of the convergence
– Indirect use of corpora in teaching (e.g. reference
publishing, materials development, language testing, and
teacher training)
– Direct use of corpora in teaching (e.g. teaching about,
teaching to exploit, and exploiting to teach)
– Development of teaching-oriented corpora (e.g. LSP and
learner corpora)
• Corpus analysis can be illuminating ‘in virtually all
branches of linguistics or language learning’ (Leech 1997:
9)
Direct vs. indirect uses
• Indirect uses
– Largely relating to what to teach
• Direct uses
– Primarily concerning how to teach
• Development of teaching oriented corpora
– Can relate to both
Reference publishing
• Corpus revolution in reference books (at least for English)
– Nearly unheard of for dictionaries and reference grammars
published since the 1990s not to claim to be based on corpus
data; even people who have never heard of a corpus are using
the product of corpus research
• Corpus-based dictionaries
– Learner dictionaries
– Frequency dictionaries
• Corpus-based reference grammars
– Longman Grammar of Spoken and Written English
– Collins COBUILD series
– Hunston’s Pattern Grammar
Syllabus design and materials development
• Previous research has demonstrated that the use of
grammatical structures in TEFL textbooks differs
considerably from the use of these structures in
native English
– ‘a kind of school English which does not seem to exist
outside the foreign language classroom’ (Mindt 1996: 232)
• The order in which those items are taught in
non-corpus-based syllabi ‘very often does not correspond
to what one might reasonably expect from corpus
data of spoken and written English’ (ibid: 245-6)
Syllabus design and materials development
• Corpora can be useful in this area - a simple yet
important role of corpora in language teaching is to
provide more realistic examples of language usage
reflecting the nuances and complexities of natural
language
• Corpora can also provide data, especially frequency data,
which may further impact on what is taught, and in what
order
• Touchstone book series (McCarthy et al 2005-2006)
– Based on the Cambridge International Corpus
– Aiming at presenting the vocabulary, grammar, and language
functions that students encounter most often in real life
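As a rough illustration of the kind of frequency data mentioned above, the sketch below builds a ranked word-frequency list from raw text. The regular-expression tokenizer, the function name `frequency_list`, and the sample sentence are assumptions made for illustration only; real corpus tools handle tokenization and lemmatization far more carefully.

```python
import re
from collections import Counter

def frequency_list(text, top_n=5):
    """Return the top_n (word, count) pairs, most frequent first."""
    # Lowercase the text and keep alphabetic tokens
    # (an apostrophe is allowed inside a word, e.g. "don't").
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens).most_common(top_n)

# Invented sample text, not taken from any corpus.
sample = "The cat sat on the mat. The dog sat on the log."
for word, count in frequency_list(sample, top_n=3):
    print(word, count)
```

Run over a large reference corpus instead of a toy sentence, a list like this is one possible input to decisions about what to teach first.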
Syllabus design and materials development
• Hunston (2002: 189): ‘The experience of using corpora
should lead to rather different views of syllabus design.’
• The Lexical Syllabus (Willis 1990), as implemented in the
Collins COBUILD English Course (Willis, Willis and Davids
1988-1989)
– Three focuses of a lexical syllabus: ‘(a) the commonest word
forms in a language; (b) the central patterns of usage; (c) the
combinations which they usually form’ (Sinclair and Renouf
1988)
– Not a syllabus for vocabulary items only, but rather covering
‘all aspects of language, differing from a conventional
syllabus only in that the central concept of organization is
lexis’ (Hunston 2002: 189)
Language testing
• An emerging area of language teaching which has started
to use the corpus-based approach
• Alderson (1996) envisaged the following possible uses of
corpora in language testing
– test construction, compilation and selection, test
presentation, response capture, test scoring, and
calculation and delivery of results
– ‘The potential advantages of basing our tests on real
language data, of making data-based judgments about
candidates’ abilities, knowledge and performance are clear
enough. A crucial question is whether the possible
advantages are borne out in practice’ (Alderson 1996: 258-259)
Language testing
• The concern raised in Alderson’s conclusion appears to
have been addressed satisfactorily 10 years later
– Nowadays, computer-based tests are considered to be comparable to
paper-based tests (cf. Choi, Kim and Boo 2003), as exemplified by
computer-based versions of TOEFL tests
• Major test service providers like UCLES have recently used
corpora in testing (cf. Ball 2001; Hunston 2002: 205)
– As an archive of examination scripts
– To develop test materials
– To optimize test procedures
– To improve the quality of test marking
– To validate tests
– To standardize tests
Teacher development
• Corpora have been used recently in language teacher
training to enhance teachers’ language awareness and
research skills
– Rationale: For students to benefit from the use of corpora,
teachers must first of all be equipped with a sound knowledge
of the corpus-based approach
• The integration of corpus studies in language teacher
training is only a quite recent phenomenon (cf. Chambers
2007)
– It may take more time, and ‘perhaps a new generation of
teachers, for corpora to find their way into the language
classroom’ in secondary education (Braun 2007: 308)
Direct uses of corpora
• Leech’s (1997) three direct uses of corpora in teaching
– 1) Teaching about
• Teaching corpus linguistics as an academic subject
– Part of the curricula for linguistics and language related degree
programs at both postgraduate and undergraduate level
– 2) Teaching to exploit
• Providing students with ‘hands-on’ know-how so that they
can exploit corpora as student-centred learning activities
– 3) Exploiting to teach
• Using the corpus-based approach to teaching language and
linguistics courses, which would otherwise be taught using
non-corpus-based methods
• (1) and (3) are mainly associated with language / linguistics
programmes
From three P’s to three I’s
• The traditional three-P approach
– Presentation – Practice – Production
• The exploratory three-I approach (cf. Carter
and McCarthy 1995)
– Illustration: looking at real data
– Interaction: discussing and sharing opinions and
observations
– Induction: making one’s own rule for a particular
feature, which ‘will be refined and honed as more
and more data is encountered’ (ibid 1995: 155)
Data-driven learning (DDL)
• Direct use of corpora in pedagogy is essentially DDL
• Johns (1991): ‘research is too serious to be left to the
researchers’
– The language learner should be encouraged to become ‘a
research worker whose learning needs to be driven by
access to linguistic data’ (Johns 1991)
• Johns (1997: 101) compares the learner to a language
detective: ‘Every student a Sherlock Holmes!’
• His DDL website gives some very good examples of data-driven learning
– www.lancs.ac.uk/fass/projects/corpus/Kibbitzers/Kibbitzers.chw
Data-driven learning (DDL)
• The DDL approach involves three stages of inductive
reasoning with corpora (Johns 1991)
– Observation (of concordanced evidence)
– Classification (of salient features)
– Generalization (of rules)
• Roughly corresponding to Carter and McCarthy’s
(1995) three I’s in the exploratory corpus-based
approach, but fundamentally different from the
traditional three-P approach
– Three-P approach: top-down deduction
– Three-I / DDL approach: bottom-up induction
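The observation stage relies on concordance lines. Below is a minimal keyword-in-context (KWIC) sketch, assuming plain-text input and simple whole-word matching; the function name `kwic` and its `width` parameter are hypothetical, and the sample sentences are learner examples quoted in the case study of this lecture.

```python
import re

def kwic(text, node, width=30):
    """Return keyword-in-context lines for every occurrence of `node`."""
    lines = []
    # \b restricts matches to whole words; re.escape guards special characters.
    pattern = r"\b" + re.escape(node) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the node words line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

sentences = [
    "A very unhappy thing was happened in this week.",
    "I was graduated from Zhongshan University",
    "His choice is protect by the laws.",
]
for line in kwic(" ".join(sentences), "was"):
    print(line)
```

With the node word aligned in one column, learners can scan what typically precedes and follows it, which is the starting point for classification and generalization.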
Data-driven learning (DDL)
• Can be either teacher-directed or learner-led (i.e.
‘discovery learning’) to suit the needs of learners at
different levels, but basically learner-centred
• Leech (1997: 10): The autonomous learning process ‘gives the student
the realistic expectation of breaking new ground as a “researcher”,
doing something which is a unique and individual contribution’
• This is true of advanced learners only!
• The key to successful data-driven learning is the
appropriate level of pedagogical mediation depending on
the learners’ age, experience, and proficiency level, etc
– ‘A corpus is not a simple object, and it is just as easy to derive
nonsensical conclusions from the evidence as insightful ones’
(Sinclair 2004: 2)
Direct uses: Current situation
• So far confined largely to learning at more advanced
levels, especially in tertiary education
• Almost absent from the general ELT classroom, e.g. in secondary
education (and from the teaching of other foreign languages
at all levels)
– Learners’ age, level and experience
– Time constraints and curricular requirements
– Knowledge and skills required of teachers for corpus analysis
and pedagogical mediation
– Access to appropriate resources such as corpora and tools
– …or indeed probably a combination of all of these factors
LSP corpora vs. professional communication
• Third focus of convergence: development of teaching-oriented
corpora: LSP, parallel, and learner corpora
• Teaching of language for specific purposes and professional
communication can benefit greatly from domain- or genre-specific
specialized corpora both directly and indirectly, e.g.
– Coxhead’s (2000) Academic Word List (AWL)
– Paul Nation’s Range and GSL/AWL
• http://www.victoria.ac.nz/lals/about/staff/paul-nation
– Biber’s (2006) comprehensive analysis of university language based on
the TOEFL 2000 Spoken and Written Academic Language Corpus
– McCarthy and Handford’s (2004) exploration of pedagogical
implications regarding spoken business English on the basis of the
Cambridge and Nottingham Spoken Business English Corpus (CANBEC)
LSP corpora vs. professional communication
• Specialized corpora in translation teaching
– ‘Large corpora concordancing’ (LCC) can help students
to develop ‘awareness’, ‘reflectiveness’ and
‘resourcefulness’, the skills that distinguish a translator
from those unskilled amateurs (Bernardini 1997)
– Corpora help trainee translators become aware of
general patterns and preferred ways of expressing things
in the target language, get better comprehension of
source language texts and improve production skills
(Zanettin 1998)
– Comparable and parallel corpora in translation studies
Parallel concordancing
• Parallel corpora and parallel concordancing
are particularly useful in translation teaching
• They can also aid the so-called ‘reciprocal
learning’ (Johns 1997)
– i.e. two language learners with different L1
backgrounds are paired to help each other learn
the other’s language
Learner corpora
• Welcomed as one of the most exciting recent
developments in corpus-based language studies
• For indirect use, they have been explored to inform
curriculum design, materials development and
teaching methodology (cf. Keck 2004)
• For direct use, they provide a bottom-up approach to
language teaching and learning - as opposed to the
top-down approach with native corpora of the target
language (Osborne 2002)
Learner corpora
• Can also provide indirect, observable, and empirical
evidence for the invisible mental process of language
acquisition and serve as a test bed for hypotheses
generated using the psycholinguistic approach in SLA
research
• Provide an empirical basis enabling the findings
previously made on the basis of limited data of a handful
of informants to be generalized
• Have widened the scope of SLA research so that
interlanguage research nowadays treats learner
performance data in its own right rather than as
decontextualised errors in traditional error analysis (cf.
Granger 1998: 6)
Ongoing debate: Frequency & authenticity
• Often considered as two of the most important
advantages of using corpora
• Also the targets of criticism from language pedagogy
researchers
– Corpus data impoverishes language learning by giving
undue prominence to what is simply frequent at the
expense of rarer but more effective or salient expressions
(Cook 1998)
– Corpus data is authentic only in a very limited sense in that
it is de-contextualized – genuine but not authentic
(Widdowson 1990, 2000, 2003)
– …flawed arguments
Frequency
• ‘Using corpus data not only increases the chances of
learners being confronted with relatively infrequent
instances of language use, but also of their being able to
see in what way such uses are atypical, in what contexts
they do appear, and how they fit in with the pattern of
more prototypical uses’ (Osborne 2001: 486)
• ‘Frequency ranking will be a parameter for sequencing and
grading learning materials’ because ‘frequency is a measure
of probability of usefulness’ and ‘high-frequency words
constitute a core vocabulary that is useful above the
incidental choice of text of one teacher or textbook author’
(Goethals 2003: 424)
Frequency
• Do you agree?
– ‘What is frequent in language will be picked up by
learners automatically, precisely because it is
frequent, and therefore does not have to be
consciously learned’ (Kaltenböck and Mehlmauer-Larcher 2005: 78)
• This is not true, however – cross-linguistic difference
– Determiners such as a and the are certainly very
frequent in English, yet they are difficult for Chinese
learners of English because their mother tongue does
not have such grammatical morphemes and does not
maintain a count-mass noun distinction
Frequency
• Frequency ‘should be only one of the criteria used to
influence instruction’; ‘the facts about language and
language use which emerge from corpus analyses
should never be allowed to become a burden for
pedagogy’ (Kennedy 1998: 290)
– overall teaching objectives
– learners’ concrete situations
– cognitive salience
– learnability
– generative value
– teachers’ intuitions
Frequency
• It would be inappropriate for language teachers, syllabus
designers, and materials writers to ignore ‘compelling
frequency evidence already available’ (Leech 1997: 16)
– ‘Whatever the imperfections of the simple equation “most
frequent” = “most important to learn”, it is difficult to deny that
frequency information becoming available from corpora has an
important empirical input to language learning materials.’
– Leech, G. (2011) ‘Why frequency can no longer be ignored in
ELT’. 外语教学与研究 (Foreign Language Teaching and Research) 2011(1).
• Frequency can at least help syllabus designers, materials
writers and teachers alike to make better-informed and
more carefully motivated decisions (cf. Gavioli and Aston
2001: 239)
Authenticity
• Corpus data are authentic by definition
• Widdowson (1990, 2000) questions the use of
authentic texts in language teaching
– Authenticity of language in the classroom is ‘an
illusion’ (1990: 44) because even though corpus
data may be authentic in one sense, its
authenticity of purpose is destroyed by its use
with an unintended audience of language learners
Authenticity
• Widdowson’s (2003) distinction between
genuineness (features of text as a product) vs.
authenticity (features of discourse as a process)
– Corpora are genuine in that they comprise attested
language use, but they are not authentic for language
teaching because they have been deprived of their contexts
(as opposed to co-texts)
– Implication?
• Only language produced for imaginary situations in the
classroom is ‘authentic’
Authenticity
• Product (text) vs. process (discourse)
– Interesting but not always useful
– Using product as evidence for process may not be less
reliable; sometimes this is the only practical way of
finding out about process (cf. Stubbs 2001)
• Stubbs (2001) draws a parallel between corpora in
corpus linguistics and rocks in geology
– ‘both assume a relation between process and product.
By and large, the processes are invisible, and must be
inferred from the products.’
Authenticity
• Like geologists who study rocks (products)
because they are interested in geological
processes (e.g. earthquakes, volcanoes) to
which they do not have direct access, SLA
researchers can analyze learner performance
data (products) to infer the inaccessible
mental process of SLA
Authenticity
• If we do follow Widdowson’s distinction…
– Genuine: attested
– Authentic: occurring in real communicative
context
• …are the imaginary situations conjured up for
classroom teaching authentic?
– Do they occur in real communicative context?
– When students are learning and practising a
shopping ‘discourse’, are they actually doing
shopping?
Authenticity
• Furthermore, invented examples often do not reflect
nuances and complexities of real usage (Fox 1987)
– Students who have been taught ‘school English’
cannot readily cope with English used by native
speakers in real life (Mindt 1996: 232)
• ‘The preference for “authentic” texts requires both
learners and teachers to cope with language which
the textbooks do not predict’ (Wichmann 1997: xvi)
– Corpora are useful for this purpose
Corpus-based pedagogy: Today
• Currently, corpora appear to have played a
more important role in helping to decide what
to teach (i.e. indirect uses) than how to teach
(i.e. direct uses)
– Indirect uses of corpora seem to be well
established
– Direct uses of corpora in teaching are largely
confined to tertiary education and are nearly
absent from the general language classroom
From today to tomorrow
• If corpora are to be further popularised in
more general language teaching contexts, there
are two priorities for the near future
– Corpus linguists must create and facilitate access to
corpora that are pedagogically motivated, in both
design and content, to meet pedagogical needs and
curricular requirements so that corpus-based learning
activities become an integral part, rather than an
additional option, of the overall language curriculum
– Language teachers should be provided, through pre-service
training or continued professional
development, with the required knowledge and skills
for corpus analysis and pedagogical mediation of
corpus-based learning activities
Corpus-based pedagogy: Tomorrow
• If these two tasks are accomplished, it is my
view that corpora will not only ‘revolutionize
the teaching of grammar’ in the 21st century,
as Conrad (2000: 549) has predicted, but will
also fundamentally change, with the aid of a
new generation of teachers, the ways we
approach language teaching, including both
what is taught and how it is taught
Using CCL to inform SLA
• Introducing Contrastive Corpus Linguistics (CCL)
• Presenting a brief summary of the relevant findings in
a corpus-based contrastive study of passives in English
and Chinese (Xiao, McEnery and Qian 2006)
• Exploring passives in the Chinese Learner English
Corpus (CLEC) in comparison with a comparable native
English corpus
Contrastive corpus linguistics
• Contrastive analysis (CA)
– Recognised as an important part of foreign language
teaching methodology following WWII
– Dominant throughout the 1960s
– But soon lost ground to more learner-oriented approaches
such as error analysis, performance analysis and
interlanguage analysis
– Revived in the 1990s
• …largely thanks to the advances of the corpus methodology, which
is inherently comparative in nature (Salkie 2002, Xiao 2011)
• Contrastive Corpus Linguistics brings together the
strengths of contrastive analysis and corpus analysis
Contrastive corpus linguistics
• Parallel vs. comparable corpora
– Parallel corpus: source texts plus translations
– Comparable corpus: different native languages
sampled with comparable sampling criteria and
similar balance
• Can parallel corpora be used in contrastive studies?
– ‘translation equivalence is the best available basis of
comparison’ (James 1980: 178)
– ‘studies based on real translations are the only sound
method for contrastive analysis’ (Santos 1996: i)
Contrastive corpus linguistics
• Translated language is merely an unrepresentative
special variant of the target native language which is
perceptibly influenced by the source
language...unreliable for contrastive analysis if relied
upon alone
– Baker 1993; Gellerstam 1996; Teubert 1996; Laviosa 1997;
McEnery and Wilson 2001; McEnery and Xiao 2002;
McEnery and Xiao 2007; Xiao and Yue 2009, Xiao 2010,
2011, 2012
• In contrast, comparable corpora are well suited for
contrastive study as they are unaffected by
translationese
Contrastive corpus linguistics
Comparable corpora in this study
• Two English corpora
– Freiburg-LOB (FLOB)
– BNCdemo (4 M words of conversations)
• Two Chinese corpora
– Lancaster Corpus of Mandarin Chinese (LCMC)
– LDC CallHome Mandarin Transcripts: 300K words
• English and Chinese data are comparable in
composition and sampling period
– Providing a reliable basis for the cross-linguistic contrast
of passives in the two languages
English vs. Chinese passives (1)
• Ten times as frequent in English as in Chinese
– Dynamicity
– Pragmatic meaning
– Different habitual tendency
– Unmarked notional passives
[Figure: bar chart comparing the frequency of passives in English and Chinese]
• Chinese learners of English are very likely to
underuse passives in their interlanguage
English vs. Chinese passives (2)
• Passive formation
– English passives
• Auxiliary be/get followed by a past participial verb
– Chinese passives
• Passivised verbs do not inflect morphologically
• Also the notion of auxiliary verbs is less salient in Chinese
• Syntactic passives (e.g. 被, 叫, 让)
• Lexical passives (e.g. 挨, 受(到), 遭(到))
• Unmarked notional passive and topic sentences (topic + comment)
• Special structures (e.g. disposal 把 and predicative 是…的)
• Choosing the correct auxiliary and the proper inflectional form of
the passivised verb can be a difficult area for Chinese
learners acquiring English passives
English vs. Chinese passives (3)
• Long vs. short passives
• Short passives are predominant in English (over 90% in speech
and writing)
– Often used as a strategy that allows one to avoid mentioning the agent
when it cannot or must not be mentioned
• 3 out of 5 syntactic passive markers in Chinese (为…所, 叫, 让)
only occur in long passives
• For 被 and 给 passives, proportions of short forms (60.7% and
57.5% respectively) are significantly lower than in English
– The agent must normally be spelt out at early stages of Chinese,
though the constraints have become more relaxed
• Chinese learners of English are expected to overuse long
passives and underuse short passives
English vs. Chinese passives (4)
• Chinese passives are more frequently used with an
inflictive meaning
– Chinese passives were used at early stages primarily for
unpleasant or undesirable events (bei, “suffer”)
• Marking negative pragmatic meanings is not a basic feature
of the English passive norm (be passives)
– Get-passives sometimes (37.7% of the time) refer to
undesirable events
• Chinese learners are more likely to use English passives
for undesirable situations
[Figure: stacked bar chart of pragmatic meaning (percent) by language.
English be passives: negative 15.0%, neutral 80.3%, positive 4.7%.
Chinese bei passives: negative 51.5%, neutral 37.8%, positive 10.7%.]
Interlanguage of Chinese learners
• CLEC (learner data): the Chinese Learner English Corpus
– One million words
– Essays
– Five proficiency levels (high school students and university
students)
– Fully annotated with learner errors using a tagset of 61
error types clustered in 11 categories
• LOCNESS (control data): the Louvain Corpus of Native
English Essays
– ca. 300,000 words
– Essays
– British A-Level children and British and American university
students
• Roughly comparable in terms of task type, learner age
and sampling period
Underuse of passives
Corpus     Words       Passives   Frequency per 100K words
CLEC       1,070,602   9,711        907
LOCNESS      324,304   5,465      1,685
LL score and p value: LL=1235.6, 1 d.f., p<0.001
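The normalized frequencies in this table can be recomputed from the raw counts. The following is a minimal sketch; the function name `per_100k` is an assumption introduced for illustration, and the counts are those reported on this slide.

```python
def per_100k(hits, corpus_size):
    """Normalized frequency: occurrences per 100,000 words."""
    return hits / corpus_size * 100_000

# Raw counts as reported for the two corpora on this slide.
clec = per_100k(9_711, 1_070_602)
locness = per_100k(5_465, 324_304)

print(round(clec))      # 907
print(round(locness))   # 1685
# How many times more frequent passives are in the native data.
print(round(locness / clec, 2))
```

Normalizing to a common base is what makes the two corpora comparable despite their very different sizes.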
Long vs. short passives
• As can be expected from the contrastive analysis, in
comparison with native English writing, long passives
are more frequent in Chinese learner English
– Long passives in CLEC
• 9.14%: 888 out of 9,711
– Long passives in LOCNESS
• 8.44%: 461 out of 5,465
• ...the difference is marginal and not statistically
significant
– LL=2.184, 1 d.f., p=0.139
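The log-likelihood (LL) test used here can be sketched as follows. This is one common formulation (the G2 statistic over a 2x2 contingency table), written under the assumption that it is close to what the concordancing tool computes; the function names are hypothetical, and the p value uses the standard identity for a chi-square variable with 1 d.f.

```python
import math

def log_likelihood(k1, n1, k2, n2):
    """G2 statistic for k1 hits out of n1 vs. k2 hits out of n2."""
    observed = [k1, n1 - k1, k2, n2 - k2]
    p = (k1 + k2) / (n1 + n2)  # pooled proportion under the null hypothesis
    expected = [n1 * p, n1 * (1 - p), n2 * p, n2 * (1 - p)]
    # Cells with zero observations contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)

def p_value_1df(g2):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(g2 / 2))

# Long passives out of all passives: CLEC vs. LOCNESS.
g2 = log_likelihood(888, 9_711, 461, 5_465)
print(round(g2, 2), round(p_value_1df(g2), 3))
```

For the long-passive counts above, the result should come out close to the LL and p value reported on this slide; other tools may use a slightly different variant of the formula.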
Pragmatic meanings
• Passives are more frequently negative in
Chinese learner English
– CLEC
• Negative: 25.7%
• Positive: 5.9%
• Neutral: 68.4%
– LOCNESS
• Negative: 16.8%
• Positive: 4.4%
• Neutral: 78.8%
– LL=7.4, 2 d.f., p=0.025
• Consistent with earlier
finding (50.5% vs. 15%)
[Figure: stacked bar chart of positive, neutral and negative passives (percent) by corpus, CLEC vs. LOCNESS]
Passive errors vs. learner levels
[Figure: passive error frequency per 200,000 words by learner level (ST2 to ST6); series: auxiliary errors, misformation, misuse, underuse, all error types]
Error types vs. learner levels
• Error types are associated with learner levels when the dataset is
taken as a whole
– LL=51.774, 12 d.f., p<0.001
• But similar learner groups also show similar error types
– ST2 >> ST3: statistically significant (LL=27.303, 3 d.f., p<0.001)
– ST3 >> ST4: not significant (LL=6.955, 3 d.f., p=0.073)
– ST4 >> ST5: statistically significant (LL=18.563, 3 d.f., p<0.001)
– ST5 >> ST6: not significant (LL=6.987, 3 d.f., p=0.072)
ST2 (High school students)
ST3/ST4 (Junior/Senior non-English major students)
ST5/ST6 (Junior/Senior English major students)
Underuse errors
• Likely to be a result of L1 transfer, as can be predicted from
results of cross-linguistic contrast and confirmed by the
learner-native corpus comparison
• Typically occur with verbs whose Chinese equivalents are not
normally used in passives, e.g.
– A birthday party will hold in Lily’s house. (ST2)
– The woman in white called Anne Catherick. (ST5)
• Also occur under the influence of the Chinese topic sentence
– The supper had done. (ST2)
晚饭 <*bei> 做好 了
supper <*PASS> cook-ready ASP
(topic: 晚饭; comment: 做好了)
Misuse errors
• 1) Intransitive verbs used in passives, e.g.
– A very unhappy thing was happened in this week. (ST2)
– I was graduated from Zhongshan University (ST5)
• 2) Misuse of ergative verbs, e.g.
– …the secince <sic science> is developed quickly (ST4)
• 3) Training transfer (overdone passive training in classroom
instructions), e.g.
– …many machine <sic machines> and appliance <sic appliances> are
used electricity as power (ST5)
– Because they have been mastered everything of this job… (ST4)
Misformation errors
• Possibly a result of L1 interference
• Related to morphological inflections
– Passivised verbs do not inflect in Chinese
• Chinese learners tend to use uninflected verbs or
misspelt past participles in passives, e.g.
– His relatives can not stop him, because his choice is
protect by the laws. (ST6)
– Since the People’s Republic of china <sic China> was found
on October 1, 1949… (ST2)
Auxiliary errors
• Related to omission and misuse of auxiliaries
• A result of L1 interference
– Auxiliaries are not a salient linguistic feature in Chinese
• Chinese is not a morphologically inflectional language
• Chinese learners tend to omit or misuse auxiliaries in
passives, e.g.
– In China, since the new China <sic was> established,
people’s life has goten <sic gotten> better and better. (ST3)
– I am not a smoker, but why do <sic are> we forced to be a
second-hand smoker? (ST5)
Case study summary
• The learner’s performance in interlanguage can be
predicted, diagnosed, and accounted for from the
perspective of Contrastive Corpus Linguistics
• The integrated approach that combines contrastive
analysis (CA) and contrastive interlanguage analysis (CIA)
is an indispensable tool in SLA research
– Granger (1998: 14): ‘if we want to be able to make firm
pronouncements about transfer-related phenomena, it is
essential to combine CA and CIA approaches.’
• Corpora and language education [语料库与语言教育]. 中国外语教育 2008(5)
• The use of corpora in language teaching [语料库在语言教学中的运用].
浙江大学学报(人文社科版) 2010(6)
Lab: semantic prosody and DDL
• Sentence (a) was produced by a Chinese-speaking postgraduate of
tourism, which Tim Johns suggested revising as (b). Why? Can you
provide evidence from available corpora to support your answer
and revise (c-e) from CLEC?
– (a) Although economic improvement may be caused by tourism, the
investment and operational costs of tourism must also be considered.
– (b) Although tourism may lead to economic improvement, the investment
and operational costs of tourism must also be considered.
– (c) The city caused him great interest, caused all citizens to grasp time and
chances, to work for a better life.
– (d) <...> there are a lot of advantages are caused by them.
– (e) During the past fifty years, the political, economic, and social changes in
China have caused dramatic changes in people’s lives.
• BNCweb (collocation) or FLOB
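One way to gather evidence for this task is to list the words that co-occur with forms of cause. The sketch below is a minimal collocate counter, assuming the corpus is available as a list of plain-text sentences; the function name `collocates`, the window size, and the regular expression for the node are assumptions for illustration. It is run here only on sentences (a) and (e) from the task, so it shows the mechanics rather than any finding about semantic prosody.

```python
import re
from collections import Counter

def collocates(sentences, node_pattern, window=4):
    """Count words within `window` tokens either side of each node match."""
    counts = Counter()
    node = re.compile(node_pattern)
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, tok in enumerate(tokens):
            if node.fullmatch(tok):
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts

# Sentences (a) and (e) from the task above.
sentences = [
    "Although economic improvement may be caused by tourism, the investment "
    "and operational costs of tourism must also be considered.",
    "During the past fifty years, the political, economic, and social changes "
    "in China have caused dramatic changes in people's lives.",
]
counts = collocates(sentences, r"caus(e|es|ed|ing)")
print(counts.most_common(5))
```

Applied to sentences exported from a reference corpus, the most frequent content-word collocates give a first impression of whether cause tends to keep unpleasant company.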