Discriminating CEFR levels in Greek
L2: a corpus-based study of young
learners’ written narratives
Giagkou Maria
Kantzou Vicky
Stamouli Spyridoula
Tzevelekou Maria
Learner Corpus Research Conference, Bergen, Norway, September 27, 2013
Background
• Specification of CEFR functional descriptions: criterial features
(specific lexical and grammatical features used differently by L2
learners at different proficiency levels)
• Cambridge English Profile Programme (Hawkins & Filipović 2012,
Hawkins & Buttery 2010)
• CEFR proficiency levels and L2 acquisition of various linguistic features
• SLATE network (Second Language Acquisition and Testing in Europe)
• Different languages: Dutch, Italian and Spanish (Kuiken, Vedder & Gilabert
2010), Finnish (Alanen, Huhta & Tarnanen 2010, Martin et al. 2010), French
(Forsberg & Bartning 2010, Prodeau et al. 2012), Norwegian (Carlsen 2010)
• Mostly adult learners, smaller number of studies on young L2 learners
(Pallotti 2010).
Research objectives
• Identification of criterial properties for Greek as L2 -> specification of the CEFR
proficiency levels with respect to the linguistic features of Greek -> help educators and
researchers discriminate the language production of each level from the production of
adjacent levels.
• Focus:
• young L2 learners of Greek enrolled in Greek state schools (immigrants and indigenous
minorities): 18% of students nowadays are immigrants or repatriated Greeks learning Greek as
L2 (Gropas & Triandafyllidou 2011)
• written narratives, because children are familiar with them from an early age and because
narratives have been widely investigated in L1 and L2 acquisition.
• Investigation of the developing narrative ability at micro- and macro-level, as indicated
by:
• Narrative length
• Clause subordination
• Discourse markers
• Modifiers
• Grammatical accuracy
• Lexical density
• Previous research findings in Greek L1 and L2 acquisition (Kantzou 2010, 2012, Stamouli
2010, Varlokosta & Triantafillidou 2003), and in evaluation of Greek text difficulty
(Giagkou 2012)
Elicitation task and level allocation
• Two writing tasks performed by ca. 1200 immigrant and repatriated children (October 2011 to
February 2012):
• a narrative based on the Cat Story picture series (Hickmann 2003)
• a letter or diary entry
• Two evaluators placed each student at a CEFR level on the basis of the two written productions
• Rating was based on CEFR descriptors and more specifically on the Overall Written Production,
Creative Writing and Lexical, Grammatical and Orthographic Competence scales.
Corpus
• Narratives based on the Cat Story picture series (letters and diary entries excluded
from further analysis)
• Only narratives placed at the same level by both evaluators are included
• Corpus of 150 scripts (9742 tokens). Levels A2, B1 and B2 are represented in the corpus,
with 50 scripts in each level.
Level   Scripts   Tokens                                    Clauses
        N         N      Mean    (std)     Min   Max        N      Mean    (std)    Min   Max
A2      50        2384   47.68   (14.34)   19    83         511    10.22   (3.19)   4     18
B1      50        3193   63.86   (13.2)    31    95         654    13.08   (3.2)    8     22
B2      50        4165   83.30   (22.19)   53    181        842    16.84   (4.11)   9     33
Total   150       9742   64.95   (22.37)   19    181        2007   13.38   (4.43)   4     33
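As a quick consistency check on the corpus table, the per-level means can be recomputed from the token and clause totals. This is a minimal sketch: the totals are copied from the table, while the dictionary layout and function names are illustrative assumptions, not part of the original study.

```python
# Totals per level, copied from the corpus table (50 scripts per level).
corpus = {
    "A2": {"scripts": 50, "tokens": 2384, "clauses": 511},
    "B1": {"scripts": 50, "tokens": 3193, "clauses": 654},
    "B2": {"scripts": 50, "tokens": 4165, "clauses": 842},
}

def mean_per_script(level, unit):
    """Mean number of tokens or clauses per script at one level."""
    row = corpus[level]
    return round(row[unit] / row["scripts"], 2)

def corpus_total(unit):
    """Sum of scripts, tokens or clauses over all three levels."""
    return sum(row[unit] for row in corpus.values())

for level in corpus:
    print(level, mean_per_script(level, "tokens"), mean_per_script(level, "clauses"))
print("Total", corpus_total("scripts"), corpus_total("tokens"), corpus_total("clauses"))
```

The recomputed values agree with the means and totals reported in the table.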
Sample
• 150 primary school pupils
• 83 boys, 67 girls
• grades 3 to 6 (aged 8-14)
• different linguistic backgrounds, mainly Albanian (49%) and Russian (15%)
• residing in geographically diverse regions of Greece
[Figure: map of Greece with the number of pupils per region]
Transcription and annotation
• Manual transcription: a) a version preserving learner’s spelling, and b)
a corrected version
• Clause separation: clause expresses a single situation and has one
predicate (Berman and Slobin 1994).
• Annotation:
• Type of clause
• Clitics within the verb frame
• Adjectives and adverbs
• Discourse markers
Transcription and annotation
Type of clause
• Independent
• Dependent
• Relative clauses
• Complement clauses
• Clauses of purpose
• Clauses of cause
• Clauses of time
• Center-embedding
mia mera // mia γata citaksa kala kala ta mikra pulacia
[pu i mitera iχe pai]
[na vri trofi]
One day // a cat looked at the little birdies
[that the mother had left]
[to find food]
Transcription and annotation
Clitics
• Clitics within the verb frame
• Appropriate use:
ce o scilos tis δaγose tin ura tis (A2)
and the dog bit its (=the cat’s) tail
• Inappropriate use:
ce i γata arχize na treχi ce o scilos ton ciniγuse (B1)
and the cat started running and the dog was chasing him*
(inappropriate gender marking)
Transcription and annotation
Adjectives and adverbs
• Adjectives
• Descriptive:
itan ena mikro spitaci pu iχe mikra pulacia (A2)
there was a little house that had little birdies
• Evaluative:
i kakurγa γata pinuse (B2)
the wicked cat was hungry
• Adverbs
• Descriptive:
i γata skarfani apano sto δendro (A2)
the cat was climbing on the tree
• Evaluative:
ta pulacia citusan ti γata paraxena (B2)
the birdies were looking at the cat weirdly
Transcription and annotation
Discourse markers
• Additive
• Temporal
• Contrastive
• Inferential
• Other
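The annotation layers described above (clause type, clitics, discourse markers) can be turned into per-script metrics of the kind reported in the results. The sketch below is only an illustration: the `Clause` record, its field names and the toy script are hypothetical and do not reproduce the authors' annotation scheme or tooling.

```python
from dataclasses import dataclass, field

@dataclass
class Clause:
    """Hypothetical record for one annotated clause."""
    tokens: int
    clause_type: str = "independent"  # or "relative", "complement", "purpose", "cause", "time"
    clitics: list = field(default_factory=list)            # True = appropriate use, False = inappropriate
    discourse_markers: list = field(default_factory=list)  # e.g. ["additive"], ["temporal"]

def script_metrics(clauses):
    """Frequency-based metrics for one script (a list of Clause records)."""
    n_clauses = len(clauses)
    n_tokens = sum(c.tokens for c in clauses)
    n_dependent = sum(1 for c in clauses if c.clause_type != "independent")
    n_markers = sum(len(c.discourse_markers) for c in clauses)
    clitics = [ok for c in clauses for ok in c.clitics]
    return {
        "tokens": n_tokens,
        "clauses": n_clauses,
        "pct_dependent": 100 * n_dependent / n_clauses,
        "markers_per_clause": n_markers / n_clauses,
        "pct_markers_to_tokens": 100 * n_markers / n_tokens,
        # Undefined when the script contains no clitics at all.
        "pct_correct_clitics": 100 * sum(clitics) / len(clitics) if clitics else None,
    }

# Invented toy script, for illustration only.
toy_script = [
    Clause(tokens=5, discourse_markers=["additive"]),
    Clause(tokens=5, clause_type="relative", clitics=[True]),
    Clause(tokens=6, discourse_markers=["temporal"], clitics=[False]),
    Clause(tokens=4, clause_type="purpose"),
]
print(script_metrics(toy_script))
```

Percentages that have no denominator in a given script (for example correct clitics in a script without clitics) are returned as `None` rather than as zero, so that such scripts can be left out of the corresponding comparison.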
Analysis
• Annotated linguistic features -> metrics based on frequency of
occurrence per level
• Comparison of means: one-way ANOVA
• Post-hoc multiple comparisons between levels A2, B1 and B2:
Bonferroni tests
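The analysis steps listed above can be sketched in plain Python. The slides do not say which software was used, so this is an illustrative reimplementation of the textbook one-way ANOVA F statistic and of the Bonferroni-adjusted significance threshold, run on invented toy values rather than on the corpus data. Obtaining p-values would additionally require the F distribution (for instance from a statistics library), which is not shown here.

```python
from itertools import combinations
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict mapping level -> list of values."""
    values = [v for g in groups.values() for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def bonferroni_alpha(groups, alpha=0.05):
    """Pairwise level comparisons and the per-comparison alpha after Bonferroni correction."""
    pairs = list(combinations(groups, 2))
    return pairs, alpha / len(pairs)

# Invented toy metric values per level, for illustration only.
toy = {"A2": [1, 2, 3], "B1": [2, 3, 4], "B2": [5, 6, 7]}
print(one_way_anova(toy))
print(bonferroni_alpha(toy))
```

With three levels and 150 scripts the same formulas give 2 and 147 degrees of freedom, which is the form in which most F values are reported in the results.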
Results
Narrative length
• Main effect and all post-hoc comparisons were significant for:
• Number of tokens [F (2, 147)=54.673, p<0.001]
• Number of clauses [F (2, 147)=44.000, p<0.001]
-> The longer the narrative, the higher the level
Results
Clause subordination (1/2)
• Percentage of dependent clauses:
• Zero occurrences are possible in A2 and B1 (though rarer), but at least one
dependent clause is expected in B2
• Main effect is significant [F (2, 147)=40.172, p<0.001] and so are all post-hoc
comparisons
• A successful discriminator both between A2-B1 and between B1-B2
-> A script with no dependent clauses is most likely to be below B2
Results
Clause subordination (2/2)
• Percentage of the different types of dependent clauses:
• Complement, relative, purpose and causal clauses did not significantly discriminate levels
• Only temporal clauses achieved a significant main effect [F (2, 114)=6.109, p=0.003], and only in
discriminating A2 from B1 and from B2.
-> In A2 narratives sequential events are not subordinated. Temporal clauses
are used from B1 onwards.
Results
Center-embedding
• Percentage of embedded clauses:
• Significant main effect [F (2, 147)=6.417, p=0.001]
• Post-hoc tests: A2 - B2
• Embedding used by:
• A2: 3 learners
• B1: 9 learners
• B2: 29 learners. More than one embedding in the same script
-> More than one embedding indicates a B2 learner.
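Since each level contains 50 scripts, the learner counts for center-embedding can be expressed as shares of each level. A minimal sketch; the counts are taken from the slide and the variable names are illustrative.

```python
scripts_per_level = 50
learners_with_embedding = {"A2": 3, "B1": 9, "B2": 29}

def share_with_embedding(level):
    """Percentage of learners at a level who use at least one center-embedded clause."""
    return 100 * learners_with_embedding[level] / scripts_per_level

for level in learners_with_embedding:
    print(level, share_with_embedding(level))
```

Embedding is thus used by 6% of A2 learners, 18% of B1 learners and 58% of B2 learners.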
Results
Clitics
• Percentage of correct clitics out of all clitics:
• Significant main effect [F (2, 120)=17.380, p<0.001] and all post-hoc comparisons
• A2: minimum=0%, maximum=100%
• B1: more than half of the learners have got all their clitics correct
• B2: occasional inappropriate uses by only 3 learners
-> A B2 learner should be expected to use clitics correctly in terms of
gender, number, person and case agreement
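The degrees of freedom reported with each F value show how many scripts entered a given test: in a one-way ANOVA the within-groups degrees of freedom equal the number of scripts minus the number of levels. The sketch below applies this to the values reported on the slides; the function name is illustrative.

```python
def scripts_in_test(df_between, df_within):
    """Number of observations in a one-way ANOVA reported as F(df_between, df_within)."""
    n_groups = df_between + 1
    return df_within + n_groups

print(scripts_in_test(2, 147))  # tests run on the full corpus
print(scripts_in_test(2, 114))  # types of dependent clauses (temporal clauses)
print(scripts_in_test(2, 120))  # percentage of correct clitics
```

F (2, 147) corresponds to all 150 scripts, whereas F (2, 114) and F (2, 120) correspond to 117 and 123 scripts. A plausible reading, which the slides do not state explicitly, is that scripts without any dependent clauses or without any clitics were left out of these two tests because the percentage is undefined for them.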
Results
Discourse markers: general metrics
• All features were found statistically significant:
• average number of discourse markers per clause [F (2, 147)=14.141, p<0.001]
• percentage of discourse markers to tokens [F (2, 147)=19.958, p<0.001]
• Both are successful discriminators of A2-B2 and B1-B2
Results
Discourse markers: type of marker
• Mean number of the different types of markers per clause: statistically significant
• Additive markers: all levels
• Temporal markers: A2-B2 and B1-B2
• Contrastive markers: A2-B1 and A2-B2
• Inferential markers: A2-B2
• Other markers: B1-B2
-> Exclusive use of the additive και /ce/ (=and) and the temporal μετά /meta’/ (=then) is expected
in A2. All other additive or temporal markers should indicate a learner above A2.
-> B1 learners reduce the use of και and μετά, and they start marking contrast.
-> Inference marking is never encountered in A2. It should be expected from learners in B1 or
above.
Results
Verb and noun modifiers
• Main effect statistically significant:
• percentage of evaluative adjectives to adjectives: B1-B2 and A2-B2
• percentage of evaluative adverbs to adverbs: all level pairs
• Not statistically significant:
• average number of adjectives per clause
• percentage of adjectives to tokens
• average number of adverbs per clause and percentage of adverbs to tokens
-> Systematic use of evaluative adjectives and adverbs indicates a learner above level
A2, and most likely of level B2
Results
Lexical density
• Not statistically significant
Results
at a glance
Metrics                                            A2-B1   B1-B2   A2-B2
Narrative length
  Number of tokens and clauses                     ✓       ✓       ✓
Subordination
  Percentage of dependent clauses                  ✓       ✓       ✓
  Percentage of temporal clauses                   ✓       -       ✓
  Percentage of embedded clauses                   -       -       ✓
Clitics
  Percentage of correct clitics                    ✓       ✓       ✓
Discourse markers
  Mean number of discourse markers per clause      -       ✓       ✓
  Percentage of discourse markers to tokens        -       ✓       ✓
  Percentage of additive discourse markers         ✓       ✓       ✓
  Percentage of temporal discourse markers         -       ✓       ✓
  Percentage of contrastive discourse markers      ✓       -       ✓
  Percentage of inferential discourse markers      -       -       ✓
Modifiers
  Percentage of evaluative adjectives              -       ✓       ✓
  Percentage of evaluative adverbs                 ✓       ✓       ✓
Criterial features
at a glance
Subordination
• A2: Temporal clauses are not expected
• B1: Systematic use of temporal clauses
• B2: At least one dependent clause; embedding is encountered more than once
Discourse
• A2: Exclusive use of the additive και and the temporal μετά is expected; no inference
• B1: Start marking contrast; start marking inference
• B2: Systematize inference marking
Grammatical accuracy
• B2: Clitics used correctly in terms of gender, number and case agreement
Evaluation
• B2: Systematic use of evaluative adjectives and adverbs
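The criterial features summarized here can be read as a small checklist for an individual script. The sketch below encodes a few of them as simple rules. It is an illustration of how the findings might be applied, not a classifier proposed or validated by the authors, and the dictionary keys are hypothetical names for per-script counts.

```python
def criterial_hints(script):
    """Return level hints for one script, given hypothetical per-script counts."""
    hints = []
    if script.get("dependent_clauses", 0) == 0:
        hints.append("no dependent clauses: most likely below B2")
    if script.get("temporal_clauses", 0) > 0:
        hints.append("temporal clauses: expected from B1 onwards")
    if script.get("embedded_clauses", 0) > 1:
        hints.append("more than one embedding: indicates B2")
    if script.get("inferential_markers", 0) > 0:
        hints.append("inference marking: expected from B1 or above")
    return hints

# Invented example counts, for illustration only.
example = {"dependent_clauses": 4, "temporal_clauses": 1,
           "embedded_clauses": 2, "inferential_markers": 0}
for hint in criterial_hints(example):
    print(hint)
```

Each rule yields a hint rather than a level decision, since the slides describe tendencies ("most likely", "should be expected") and not categorical cut-offs.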
Further research…
• Larger sample of A2-B2 learners and C1-C2
• More fine-grained analysis of indices, e.g. temporal clauses denoting
simultaneity
• New indices, e.g. verbal morphology, vocabulary growth
• Different discourse types and modalities
References
Alanen, Riikka, Huhta, Ari & Tarnanen, Mirja (2010). Designing and assessing L2 writing tasks across CEFR proficiency levels. In
Bartning, Martin & Vedder (Eds.), 21-56.
Bartning, Inge, Martin, Maisa & Vedder, Ineke (eds.) (2010) Communicative development and linguistic development: intersections
between SLA and language testing research. Eurosla Monographs Series 1. Available at:
http://eurosla.org/monographs/EM01/EM01tot.pdf. (date accessed 21/05/2013).
Carlsen, Cecilie (2010) Discourse connectives across CEFR-levels: A corpus based study. In Bartning, Martin & Vedder (Eds.), 191-210.
Forsberg, Fanny & Bartning, Inge (2010) Can linguistic features discriminate between the communicative CEFR-levels? A pilot study
of written L2 French. In Bartning, Martin & Vedder (Eds.), 133-158.
Giagkou, Maria. (2012). A readability statistical model for pedagogically relevant text retrieval. In Papadopoulou & Recythiadou
(Eds), Proceedings of the 32nd Annual Meeting Department of Linguistics, AUTH (pp 65-76). Thessaloniki: Institute of Modern
Greek Studies.
Gropas, R. & Triandafyllidou, A. (2011). Greek education policy and the challenge of migration: an ‘intercultural’ view of assimilation.
Race Ethnicity and Education, 14(3), 399-419.
Hawkins, John A. & Buttery, Paula (2010) Criterial Features in Learner Corpora: Theory and Illustrations. English Profile Journal 1(1):
1-23.
Hawkins, John A. & Filipović, Luna (2012) Criterial Features in L2 English: Specifying the Reference Levels of the Common European
Framework (English Profile Studies). Cambridge: Cambridge University Press.
Hickmann, Maya (2003) Children’s discourse: Person, space and time across languages. Cambridge: Cambridge University Press.
Kantzou, Vicky (2010) The temporal structure of narrative in the acquisition of Greek as a first and as a second language. PhD Thesis.
Athens: National and Kapodistrian University of Athens. [In Greek]
Kantzou, Vicky (2012) The temporal structure of narratives in second language acquisition of Greek. In: Gavriilidou Zoi, Efthymiou
Aggeliki, Thomadaki Evangelia & Kambakis-Vougiouklis Penelope (eds) Selected Papers – The 10th International Conference of
Greek Linguistics (pp 354-364) Komotini/Greece: Democritus University of Thrace. Available at:
http://www.icgl.gr/files/English/26.Kantzou_10ICGL_pp.354-364.pdf (date accessed 21/05/2013).
Kuiken, Folkert, Ineke Vedder & Roger Gilabert (2010) Communicative adequacy and linguistic complexity in L2 writing. In Bartning,
Martin & Vedder (Eds.), 81-100.
Pallotti, Gabriele (2010) Doing interlanguage analysis in school contexts. In Bartning, Martin & Vedder (Eds.), 159-190.
Prodeau, Mireille, Lopez, Sabine & Véronique, Daniel (2012) Acquisition of French as a Second Language: Do developmental stages
correlate with CEFR levels? Journal of Applied Language Studies 6(1): 47–68.
Stamouli, Spyridoula (2010) Narrative development in Greek L1 and child L2. PhD Thesis. Athens: National and Kapodistrian
University of Athens. [in Greek]
Varlokosta, Spyridoula & Triantafillidou, Leda (2003). Proficiency Levels in Greek as a Second Language. Athens: Centre for
Intercultural Education, University of Athens. [in Greek]
Thank you!
Part of this work, data collection and rating, was funded by the
educational project “Education of Repatriate and Immigrant
Students”, Action 1 “Linguistic and Educational Support for
Reception Classes”, Aristotle University of Thessaloniki
(National Strategic Reference Framework 2007-2013 and the
Ministry of Education and Religious Affairs)