SLABank

SLABank Database Guide
This guide provides documentation regarding the data on bilingualism and second language
acquisition (SLA) in the TalkBank database. All of these data are available from
http://talkbank.org/data/BilingBank. TalkBank is an international system for the exchange
of data on spoken language interactions. The majority of the corpora in TalkBank have
either audio or video media linked to transcripts. All transcripts are formatted in the
CHAT system and can be automatically converted to XML using the CHAT2XML
converter. TalkBank data dealing with first language acquisition are available from the
CHILDES site at http://childes.psy.cmu.edu.
1. BELC (Spanish-English)
2. BCN-L2 (Berber-Spanish, Arabic-Spanish)
3. Connolly (Japanese-English)
4. CUHK (Chinese-English)
5. DiazRodriguez (Spanish-Various)
6. Dresden (German-English/French/Czech)
7. ESF (Arabic/Finnish/Punjabi/Spanish/Turkish-Dutch/English/French/German/Swedish)
8. FLLOC (English-French)
9. Køge (Turkish-Danish)
10. Langman (Chinese-Hungarian)
11. Liceras
12. Paradis
13. PAROLE (Various-English, Various-French)
14. Qatar
15. Reading (English-French)
    The Interviews
    Participants
    List of Files
16. SPLLOC (English-Spanish)
17. TCD (English-French)
1. BELC (Spanish-English)
The Barcelona English Language Corpus (BELC) has its origin in the Barcelona Age
Factor (BAF) project. This is a project that examines the effects of age on the acquisition
of English as a foreign language.
The BAF Project began at a moment when the changes in the timing of foreign language
instruction brought about by a new Education Law were being progressively
implemented in both primary and secondary schools around Spain, entailing an earlier
introduction of the foreign language in primary education from grade 6 (11 years) to
grade 3 (8 years). The replacement of the previous curriculum by the new curriculum
took eight years, during which it was possible to find pupils who had begun English
instruction at the age of 11, under the previous curriculum, and pupils who had begun
English instruction at the age of 8, under the new curriculum. In addition to these central
groups, two other age groups were also included in the design of the study, one of
adolescents whose initial age of learning English was 14 and one of adults who began
instruction in English at the age of 18 or older.
The research on age effects on the learning of English as a foreign language was
conducted with students from state schools in Catalonia (Spain). It is important to note
that Catalonia is a bilingual community with a majority language, Spanish, known by
practically the totality of the population, and a minority language, Catalan, which is the
community language and the language of instruction in the state school system in
Catalonia. English is the first foreign language in most schools and hence the third
language of school pupils. It is also important to note that the earlier introduction of
the foreign language entailed a decrease in intensity. That is, whereas English had been
taught for three hours per week under the former curriculum (beginning in grade 6), at the
time of data collection under the new curriculum it was taught for two and a half hours per
week on average from grade 3 to grade 10, and for two hours per week in grades 11 and
12. The approximate amount of instruction in English was about 750 hours under the
former curriculum, distributed over seven years; and about 800 hours, distributed over ten
years, under the new one.
Data were collected at four times: after 200 hours of instruction, 416 hours, 726 hours
and 826 hours (Times 1, 2, 3, and 4, respectively), though only one of the groups was
available at all four times (see Table 1 below). There were 2063 subjects in total, but it
should be noted that a number of them had had more hours of instruction, either because
of extracurricular exposure or because of retaking a course grade. Pupils with only school
exposure (OSE) fulfilled the conditions for comparison. Table 1 below indicates the
number of subjects in each group, the age at which they began instruction in English and
each group’s mean chronological age at testing.
Table 1. Characteristics of subjects in the study
                     Time 1 (200 h.)    Time 2 (416 h.)    Time 3 (726 h.)    Time 4 (826 h.)
Group A (AO = 8)     A1                 A2                 A3                 A4
                     AT = 10;9          AT = 12;9          AT = 16;9          AT = 17;9
                     N = 284            N = 278            N = 338            N = 155
                     OSE = 164          OSE = 140          OSE = 71           OSE = 71
Group B (AO = 11)    B1                 B2                 B3                 _
                     AT = 12;9          AT = 14;9          AT = 17;9
                     N = 286            N = 240            N = 296
                     OSE = 107          OSE = 96           OSE = 51
Group C (AO = 14)    C1                 C2                 _                  _
                     AT = 15;9          AT = 19;1
                     N = 40             N = 11
                     OSE = 21           OSE = 4
Group D (AO = 18+)   D1                 D2                 _                  _
                     AT = 28;9          AT = 39;4
                     N = 91             N = 44
                     OSE = 67           OSE = 21
AO = age of onset
AT = age at testing
N = number of subjects
OSE = only school exposure
The data included in BELC correspond to those subjects who could be followed
longitudinally and for whom there are two, three or four collection times over a period of
seven years, although not all subjects fulfilled all the tasks (See Table 2).
The files in the TalkBank database are taken across the four times and across four tasks.
The files are grouped in folders by task. The file names give first the time (1, 2, 3,
4) then the group (A, B, C), then the task (c, i, n, r), then the subject number (L06, etc).
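As an illustration of the naming scheme just described, the following Python sketch splits a
BELC file name into its four components. It rests on two assumptions that this guide does not
confirm: that the components are simply concatenated with no separators (the name
"2BnL06.cha" is hypothetical), and that the task letters c, i, n, r stand for composition,
interview, narrative and role-play. Both should be checked against the actual files.

```python
import re

# Assumed meaning of the task letters, inferred from the four BELC tasks.
TASKS = {
    "c": "written composition",
    "i": "oral interview",
    "n": "oral narrative",
    "r": "role-play",
}

# Assumed layout: time, group, task and subject concatenated, e.g. "2BnL06".
NAME = re.compile(r"^([1-4])([ABC])([cinr])(L\d+)(?:\.cha)?$")


def parse_belc_name(name):
    """Return the components of a BELC file name as a dict."""
    match = NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised BELC file name: {name!r}")
    time, group, task, subject = match.groups()
    return {
        "time": int(time),
        "group": group,
        "task": TASKS[task],
        "subject": subject,
    }


# Hypothetical file name, used only to show the call.
print(parse_belc_name("2BnL06.cha"))
```

Raising an error on unrecognised names, rather than guessing, makes it easy to spot files that
follow a different pattern.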
Written composition. The written composition dealt with a familiar topic: “Me: my past,
present and future”. Students were given a set time (15 minutes), the same for
everybody.1
Oral narrative. The narrative was elicited from a series of six pictures at which the
subjects could freely look before and while they were telling the story in the presence of
the researcher. In the story there are two main protagonists, a boy and a girl, who are
getting ready for a picnic; a secondary character, their mother; and a character that
disappears and later reappears, a dog that gets into the food basket and eats the children's
sandwiches.
Oral interview. It was a semi-guided interview that began with a series of questions
about the subject’s family, daily life and hobbies. This constituted a warming-up phase
that helped students feel more at ease. In general, interviewers attempted to elicit as many
responses as possible from the learners, and accepted learner-initiated topics in order to
create as natural and interactive a situation as possible.
Role-play. The role-play task was performed in randomly chosen pairs. In the role-play
one of the students was given the role of the mother/father while the second student was
given the role of the son/daughter. The latter had to ask permission to have a party at
home and both students were asked to negotiate setting, time, activities (music, eating,
drinking), etc. The researcher gave the initial instructions and when needed also elicited
talk by reminding learners of topics for discussion or led the task to its completion by
asking about the outcome of the negotiation.
1
Younger and less proficient learners did not use up all the time they were given because of their
language limitations.
Table 2. Spoken tasks performed by BELC longitudinal learners
[Table contents not recoverable. Rows: subjects L1 to L55. Columns: interview (INT),
narrative (NARR) and role-play (ROLE) at each collection time (T1 to T4).]
Table 3. Written compositions performed by BELC longitudinal learners.
[Table contents not recoverable. Rows: subjects L01 to L55. Columns: collection times.]
The main results of the BAF Project so far can be found in the volume Age and the Rate
of Foreign Language Learning (see below).
2014 UPDATE in the folders “narratives-2014” and “compositions-2014”
DESCRIPTION OF THE SUBJECTS:
The subjects (N=21) constitute a subsample from a larger on-going project
(participants N=232, L1 Spanish and L1s Spanish and Catalan), which explores the
influence of such independent variables as starting age, cumulative L2 input, frequency
of the current contact with an L2, as well as the influence of cognitive abilities (working
memory, attention switching capacity and language aptitude) on L2 proficiency and on
L2 oral and written performance.
The subsample of participants presented here (N=21; 6 male, 15 female) were
undergraduate students, many of them majoring in English, with an intermediate to
advanced level of English.
Their average age at first testing was 23.6 (SD 8.3) and the range 18-52.
This group had had at least 6 years of English language learning experience: the average
length was 14.2 years (SD 8.2; range 6-38).
The mean starting age, defined as the beginning of exposure to English as FL (preschool,
primary school or secondary school) was 9.84 (SD 3.33) and the range 4-15.
Most of the participants were multilingual, and had been learning an L3 for at least 1 year
(mean 2.6, SD 1.2, range 1-5).
DESCRIPTION OF THE DATA:
The data that we present here contain the transcriptions of the EFL oral production task
with the matching sound files, and EFL written compositions.
N=6 participants performed the oral production task and the written composition twice,
one year apart (Time 1 and Time 2).
N=15 participants performed the oral production task and the written composition twice,
two years apart (Time 1 and Time 3).
L2 oral production task:
L2 oral production was a video-retelling task elicited with a video prompt (the
7-minute “Alone and Hungry” episode from a Charlie Chaplin movie). The
subjects watched the whole episode once, then they watched the 1st part of the episode
(3.5 minutes approximately) and were asked to retell this part. After that, the subjects
watched the 2nd part of the episode and subsequently retold the 2nd part. The
transcriptions correspond to the retelling of the 1st part of the episode.
L2 written composition:
The written composition dealt with a familiar topic: “My past, present and future
expectations”. Students were given 15 minutes to write the task.
The data are organized into 3 main files, which contain 2 sub-files each. The file name
gives the type of the data:
1. EFL oral narratives_transcriptions:
contains the transcriptions of the EFL oral narratives and consists of 2 sub-files:
Narratives_transcriptions_Time 1_Time 2
Narratives_transcriptions_Time 1_Time 3
2. EFL oral narratives_sound files:
contains the sound files of the EFL oral narratives and consists of 2 sub-files:
Narratives_sound files_Time 1_Time 2
Narratives_sound files_Time 1_Time 3
3. EFL written compositions:
contains EFL written compositions and consists of 2 sub-files:
Written compositions_Time 1_Time 2
Written compositions_Time 1_Time 3
WHO ARE WE?
Our research group (GRAL) consists of the following members. Unless otherwise
indicated, all participants are located in the Department of English at the University of
Barcelona.
Dr. Carme Muñoz (coordinator) munoz@ub.edu
Dr. Mª Luz Celaya mluzcelaya@ub.edu
Dr. Júlia Baron juliabaron@ub.edu
Dr. Natália Fullana (Language and Literature Education) nataliafullana@ub.edu
Dr. Roger Gilabert rogergilabert@ub.edu
Dr. Mayya Levkina mayya.levkina@ub.edu
Ms. Aleksandra Malicka amalicka@ub.edu
Ms. Anna Marsol amarsol@ub.edu
Dr. Immaculada Miralpeix imiralpeix@ub.edu
Dr. Joan Carles Mora mora@ub.edu
Dr. Teresa Navés tnaves@ub.edu
Ms. Mireia Ortega m.ortega@ub.edu
Dr. Laura Sánchez laura.sanchez@ub.edu
Dr. Raquel Serrano raquelserrano@ub.edu
Dr. Elsa Tragant tragant@ub.edu
Collaborator:
Dr. Mª Angels Llanes (Universitat de Lleida) allanes@dal.udl.cat
Research assistants:
Ms. Marina Ruiz Tada marinaruiztada@gmail.com
Ms. Olena Vasylets vasylets@ub.edu
Articles that make use of these data should cite:
Muñoz, C. (ed.) (2006) Age and the Rate of Foreign Language Learning. Clevedon:
Multilingual Matters.
2. BCN-L2 (Berber-Spanish, Arabic-Spanish)
Aurora Bel
ALLENCAM Research Group
Universitat Pompeu Fabra
Barcelona (Spain)
aurora.bel@pcf.edu
1. Project
The BCN-L2 Spanish Corpus was collected within a research project supported by two
grants to Aurora Bel from the Ministry of Science and Innovation of the Spanish
Government (FFI2009-09349 & FFI2012-35058). The project aims at investigating
different phenomena at the syntax-pragmatics and syntax-morphology interfaces in the
acquisition of new languages (mainly L2 Catalan and L2 Spanish) in educational
contexts.
2. Elicitation task
The corpus consists of 228 spoken and written narrative texts gathered following the
procedure designed within the international project Developing Literacy in Different
Contexts and Different Languages, P.I.: R. Berman (Berman, 2008). Participants were
shown a three-minute silent video displaying scenes of interpersonal conflicts at school,
and were then asked to tell and write in Spanish a similar story that happened to a friend.
The fact that the participants were asked to tell somebody else’s story necessarily implies
the production of third-person referents, as opposed to what happens with personal
narratives.
3. Participants (origin, ages and language proficiency level)
Data collection was performed during the spring of 2011 and 2012 in different secondary
schools in the metropolitan area of Barcelona. Participants are 88 native speakers of
Moroccan Arabic (Darija) and 26 speakers of Berber (Amazigh) living in Catalonia. For
all the participants Moroccan Arabic or Berber is their family language. In most cases
their first contact with Spanish and Catalan (the two environmental languages) coincides
with their entry in the Spanish school system (usually at preschool level). In general, they
use the family language on a daily basis with family and the environmental languages
with friends (for a detailed description of language use patterns and language proficiency,
see Bel & García-Alcaraz 2013).
Participants were grouped into four age ranges, as established by the Spanish
secondary education system (Enseñanza Secundaria Obligatoria, ESO). The
correspondences between the different systems are shown in Table 1 below.
Table 1. Age ranges and grades
Age range    Spanish grade    US equivalent
12-13        1º ESO           7th grade
13-14        2º ESO           8th grade
14-15        3º ESO           9th grade
15-16        4º ESO           10th grade
Participants were also classified into different levels of proficiency in Spanish. We
followed the criteria established by the CEFR (Common European Framework of
Reference for Languages, 2001), which divides learners into three levels, which can be
further divided into six sublevels:
Table 2. Levels of proficiency in Spanish
CEFR                                                                        Level of proficiency
A Basic User          A1 Breakthrough or beginner                           1
                      A2 Waystage or elementary                             2
B Independent User    B1 Threshold or intermediate                          3
                      B2 Vantage or upper intermediate                      4
C Proficient User     C1 Effective Operational Proficiency or advanced      5
                      C2 Mastery or proficiency                             6
4. Filenames and ID
All participants were assigned a code number to ensure confidentiality, and this number
was used to identify the two files with the transcription of their oral and written
narratives. The filenames use the following syntax:
Subject number    from 01 to 156
L1 language       dar stands for darija (Moroccan Arabic); ber stands for bereber (Berber)
Age ranges        1E, 2E, 3E, 4E where E stands for ESO
Text modality     o stands for spoken (oral), e stands for written
For example, a file that is named ‘10ber1Eo.cha’ is an oral text produced by participant
number 10, who is a native speaker of Berber from the 1st grade of ESO.
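The filename syntax above can be decoded mechanically. The following Python sketch is an
illustration only, not part of the corpus tooling: the function and type names are made up
here, and the pattern assumes that every file follows the syntax exactly as shown in the
example ‘10ber1Eo.cha’.

```python
import re
from typing import NamedTuple


class BcnFile(NamedTuple):
    subject: int    # participant code number
    l1: str         # family language
    grade: int      # ESO grade, 1 to 4
    modality: str   # "spoken" or "written"


L1_CODES = {"dar": "Moroccan Arabic (Darija)", "ber": "Berber"}
MODALITIES = {"o": "spoken", "e": "written"}

# subject number, L1 code, grade digit followed by E (ESO), modality letter
FILENAME = re.compile(r"^(\d+)(dar|ber)([1-4])E([oe])\.cha$")


def parse_bcn_filename(name: str) -> BcnFile:
    """Decode a BCN-L2 transcript filename into its four components."""
    match = FILENAME.match(name)
    if match is None:
        raise ValueError(f"not a BCN-L2 filename: {name!r}")
    subject, l1, grade, modality = match.groups()
    return BcnFile(int(subject), L1_CODES[l1], int(grade), MODALITIES[modality])


print(parse_bcn_filename("10ber1Eo.cha"))
```

Applied to the example above, the function returns subject 10, Berber, grade 1, spoken.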
ID headers are arranged as follows:
@Participants: STU Target_Student, INV Investigator
@ID: spa|periferias_L2|STU|16;08.00|male|ber|26|Target_Student|4E|2|
@ID: spa|periferias_L2|EST|||||Investigator|||
The participants are introduced in the Participants compulsory header with the codes STU
(for Student) and INV (for Investigator), and their corresponding role. The information in
the ID header for the target student is structured as follows: target language
(spa=Spanish), project name (periferias_L2), participant code (STU), age, sex (male or
female), participant’s L1 (ber=Berber or ary=Moroccan Arabic), subject number code (as
explained above), participant’s role, grade in the Spanish school system (1E, 2E, 3E, 4E,
as specified in Table 1) and level of proficiency in Spanish according to the CEFR (from
1 to 6, as specified in Table 2).
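The field order just described can be read off an ID header by splitting on the ‘|’
character. The Python sketch below is illustrative: the field labels in FIELDS are names
chosen here for convenience, not official CHAT terms, and the age parser assumes the
years;months.days form seen in the example header.

```python
import re

# Field order as described in this section (labels are illustrative).
FIELDS = ("language", "project", "code", "age", "sex",
          "l1", "subject", "role", "grade", "proficiency")


def parse_id_header(line):
    """Split one @ID header into a dict keyed by FIELDS."""
    if line.startswith("@ID:"):
        line = line[len("@ID:"):]
    values = line.strip().split("|")
    if len(values) < len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    # zip() ignores the empty element produced by the trailing "|".
    return dict(zip(FIELDS, values))


def parse_age(age):
    """Convert an age such as '16;08.00' to (years, months, days)."""
    match = re.match(r"^(\d+);(\d+)(?:\.(\d+))?$", age)
    if match is None:
        return None  # empty or non-standard age field
    years, months, days = match.groups()
    return int(years), int(months), int(days or 0)


student = parse_id_header(
    "@ID: spa|periferias_L2|STU|16;08.00|male|ber|26|Target_Student|4E|2|")
print(student["l1"], student["grade"], parse_age(student["age"]))
```

Empty fields, as in the investigator's header, are kept as empty strings so that the
positions of the remaining fields stay aligned.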
5. Some notes on transcription
All the collected texts (spoken and written) are orthographically transcribed following
CHAT conventions and segmented into clauses, so that each tier contains a clause
(Berman & Slobin 1994). All the transcriptions were checked by a second transcriber to
ensure reliability. Other important remarks concerning transcription are listed below:
- Proper names (people and institutions) are replaced by X, Y, W, etc.
- Accents and Spanish letter ‘ñ’ are incorporated.
- Correction of orthographic errors in written texts is included in brackets as shown in the
following example:
Example 1
*STU: porque no decía nada respecto a esa situacion [: situación]
because he didn’t say anything about that situation
- Word segmentation errors in written texts are marked as shown in the following
examples:
es^condido (instead of ‘escondido’, hidden)
yasta [: ya está]
- Omissions are marked differently depending on the modality of production:
Spoken texts: le ha da(d)o
He hit him
Written texts: le ha pillao [: pillado]
He caught him
- Words in Catalan (the other environmental language) are marked with @s followed by
the corresponding word in Spanish using the replace notation:
taula@s [: mesa] (‘taula’ is the Catalan word for ‘table’)
- Enclitic pronouns, which are attached orthographically to the verb, are marked as
follows:
dá+me+lo (give it to me)
quedar+se (to remain)
This does not affect proclitic pronouns, since they are conventionally written separately
from the verb (‘me lo da’, he gives it to me).
- Punctuation marks that could come into conflict with CHAT format, as well as
typographic conventions typical of written texts, are identified in brackets, as in the
following examples:
Example 2
*STU: yo no (h)ice nada.
I didn’t do anything
*STU: y tampoco tenía intención [% punto].
and I had no intention either [% period]
Example 3
*STU: el [% e mayúscula] problema empezó.
the [% e upper case] problem started
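To show how the conventions listed above combine, the following Python sketch turns a coded
line into plain target-form text. It is a simplified illustration that handles only the
codes described in this section; it is not a CHAT parser and not a tool distributed with the
corpus, and the function name is made up here.

```python
import re


def normalize(line):
    """Reduce a coded BCN-L2 line to plain target-form text."""
    text = re.sub(r"^\*[A-Z]+:\s*", "", line)                   # drop the speaker code
    text = re.sub(r"\[%[^\]]*\]", "", text)                      # drop [% ...] comments
    text = re.sub(r"\S+\s*\[:\s*([^\]]+?)\s*\]", r"\1", text)    # word [: target] -> target
    text = re.sub(r"@s\b", "", text)                             # drop the @s marker
    text = re.sub(r"\((\w+)\)", r"\1", text)                     # da(d)o -> dado
    text = text.replace("+", "").replace("^", "")                # clitic and segmentation marks
    text = re.sub(r"\s+([.?!])", r"\1", text)                    # no space before final punctuation
    return re.sub(r"\s+", " ", text).strip()


for coded in ("*STU: porque no decía nada respecto a esa situacion [: situación]",
              "*STU: y tampoco tenía intención [% punto].",
              "dá+me+lo"):
    print(normalize(coded))
```

Replacements are applied before the other marks are stripped, so that a coded form such as
taula@s [: mesa] is reduced to the Spanish target word in one step.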
6. Team work
Three research assistants (Júlia Perera, Mònica Tarrés and Estela García-Alcaraz, who
also supervised the process) collected the data and transcribed the spoken and written
texts. Transcription and assessment of language level were coordinated by Dr. Elisa
Rosado.
The authors of the corpus would appreciate being notified and receiving a copy, or a
summary, of any work using the data of the corpus. For a full description of the data
collection methods, codes, and analyses followed in most of these studies, please consult
these basic works, which should also be cited in publications using these data:
Bel, A. & García-Alcaraz, E. (2013) Subjects in the L2 Spanish of Moroccan Arabic
speakers: evidence from bilingual and second language learners. T. Judy & S.
Perpiñán (eds.) The Acquisition of Spanish as a Second Language: Data from
Understudied Languages Pairings. Amsterdam: John Benjamins.
Bel, A., García-Alcaraz, E. & Rosado, E. (forthcoming) Reference comprehension and
production in L2 Spanish: the view from null-subject languages. Issues in Hispanic
and Lusophone Linguistics. Amsterdam: John Benjamins.
3. Connolly (Japanese-English)
Steve Connolly
Hazawa, 2-12-11
Nerima-ku, Tokyo Japan 176-0003
(03) 5999-5997
Connolly@inter.net
This project was entitled “Peer-to-peer discourse journal writing by Japanese Junior High
School EFL Students” and was submitted as a doctoral thesis. A peer-to-peer “secret”
dialogue journal project, emulating projects by Green and Green (1993) and Worthington
(1997), was instituted between 30 Japanese junior high school students at one public
school, and 15 students each at two other Tokyo public schools. The project spanned five
terms during which the students exchanged journals weekly in English with partners, who
changed each term. Using names and school names was forbidden in order to maintain a
sense of mystery, and to force the partners to learn as much as permissible about each
other by communicating in English. The supervising teachers did not correct or respond
to the entries; the researcher occasionally scanned them to check for sole use of the L2
and for appropriateness of content.
There were four entries by each partner in Terms 1, 4, and 5. There were six each in
Terms 2 and 3. The 60 secret journal participants were average public middle-school
students from a Tokyo suburb. They entered the seventh grade at around 12 years old,
and were 12-13 at the beginning of the journal project. At the time the project ended, a
year-and-a-half later, the participants were in the ninth grade and were 14-15 years old.
All three schools that participated in the project are average public schools in the same
ward (county), and all three are in close proximity. Schools N and T are within
approximately 2.75 km and 1.75 km, respectively, of school K. The enrollments of the
schools varied. School T had only two classes of eighth graders: they averaged over 37
students per class. School K had three classes that averaged over 32 students per class,
and school N had four classes which averaged over 30 students per class.
Given that the schools are situated in the same ward, the curricula for the three schools
are uniform and are mandated by a combination of the Japanese federal agency
responsible for education (the Ministry of Education or Mombusho) and the ward
education committee. Mombusho provides general educational guidance, while the ward
committee chooses textbooks and makes other day-to-day administrative decisions.
The students attended three 50-minute English classes per week, which were taught by
their Japanese English teachers, based largely on the grammar-translation approach.
Depending on the student's year, these included 9-18 classes per year that were
team-taught by a Japanese English teacher and a native English speaker, in an effort to
bring more of a communicative approach to the classroom. The curriculum sequence was
dictated by a textbook common to all of the middle schools in the ward.
The purpose of the study was to investigate the pedagogical efficacy of peer-to-peer
dialogue journals. In addition to the journals themselves, three sets of data were collected
and analyzed: a free-writing quiz, a free-speaking quiz, and term-end surveys. The
journals themselves, the free-writing quiz, and the free-speaking quiz were transcribed
using the CHAT format.
In the third term, 290 eighth graders at all three schools took a surprise ten-minute
free-writing quiz. A one-way MANOVA showed that the journal participants statistically
significantly outperformed the journal non-participants on measures of fluency, accuracy,
and syntactic complexity.
In the fourth term, 96 eighth graders at one school took a surprise recorded three-minute
free-speaking quiz. A one-way ANOVA showed that the journal participants statistically
significantly outperformed the journal non-participants on the measure of fluency.
After each of the first four terms, the participants completed written surveys to gauge
their attitudes toward their partners, the activity, and their feelings about their linguistic
improvement. In general, the responses indicated that the participants enjoyed the project,
and they felt that the journal contributed to increases in their writing and reading
proficiencies, less so to their listening and speaking proficiencies. They also felt that on
occasion they learned something from their partners.
After the project, the journals were analyzed, using repeated-measures ANOVAs, for
trends over the five terms in measures of total words and word types per entry, mean
length of utterance (MLU), and for common errors. Only the MLU showed no significant
term-to-term changes over the five terms. The trends were generally down in Terms 2
and 3 (six entries apiece) and back up to Term 1 levels in Terms 4 and 5 (four entries
apiece). The trends did not show marked improvement in any of the measures; however,
the journal participants statistically significantly outperformed the journal
non-participants on both the free-writing and free-speaking quizzes.
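For readers unfamiliar with the per-entry measures named above, the following Python sketch
computes total words, word types and MLU for one entry. It is only an illustration: the
sample entry is invented, MLU is counted here in words rather than morphemes, and nothing in
this guide states that the study computed its measures this way.

```python
import re


def entry_measures(utterances):
    """Return (total words, word types, MLU in words) for one journal entry."""
    # One list of lower-cased word tokens per utterance; punctuation is ignored.
    per_utterance = [re.findall(r"[a-z']+", u.lower()) for u in utterances]
    tokens = [word for words in per_utterance for word in words]
    mlu = len(tokens) / len(per_utterance) if per_utterance else 0.0
    return len(tokens), len(set(tokens)), mlu


# Invented entry, used only to show the call.
entry = ["I like soccer .", "do you like soccer ?"]
print(entry_measures(entry))
```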
This type of activity is one that adolescents enjoy because of their desire to socialize, and
doing so in English probably contributes greatly to linguistic improvement. Furthermore,
because teachers do not intervene at all, the workload on supervising teachers is minimal.
Green, C., & Green, J. M. (1993). Secret friend journals. TESOL Journal, 2(3), 20-23.
Worthington, L. (1997). Let’s not show the teacher: EFL students’ secret exchange
journals. Forum, 35(3), 2-7.
4. CUHK (Chinese-English)
Brian MacWhinney
Department of Psychology
Carnegie Mellon University
Pittsburgh, PA 15213
These data were collected and transcribed by students in a class that Brian MacWhinney
taught at the Chinese University of Hong Kong in the Spring semester of 2007. They track
Chinese speakers at various ages learning English and, in one case, French.
5. DiazRodriguez (Spanish-Various)
Lourdes Diaz Rodriguez
lourdes.diaz@upf.edu
The DIAZ corpus contains oral L2 Spanish data from adult learners with Indo-European and
Asian mother tongues, both semi-spontaneous and experimental, obtained in Barcelona, Spain
under the umbrella of a research project supervised by Dr. Lourdes Díaz Rodríguez
(Universitat Pompeu Fabra, Spain), and funded by the Spanish Government. A parallel
set of data was gathered in Ottawa, in an instructed FL setting (no immersion in the
language) under the supervision of Prof. J.M. Liceras.
(a) Semi-spontaneous data were obtained through structured interviews (conducted
by a Spanish-speaking interviewer), mainly covering the student’s context and
language contact profile.
(b) Experimental data came from structured questionnaires consisting of 1-2 picture
description tasks, eliciting vocabulary, DPs and verb inflection; 1-3 sets of
questions requiring the production of interrogative sentences, relative clauses,
cleft-clauses and repetitions.
Subjects’ mother tongues were: German, Swedish, Icelandic, Korean and Chinese.
All data in this set were gathered in Barcelona among learners of L2/L3 Spanish who
volunteered. All were interviewed by consent at school (EOI) and University premises
(UPF). Their production was audio-taped and later transcribed at the Universitat Pompeu
Fabra, Spain. The research team that has taken part in the different intervals of data
gathering consisted of: P. Álvarez; K. Bekiou; A. Bel; M. Bini; A. Blanco; P. Deza; R.
Fernández Fuertes; B. Laguardia; J. A. Redó; E. Rosado; G. Feliu; A. Ruggia and L.
Díaz.
The research reported was supported by grants from the Spanish Ministerio de
Educación, and Ministerio de Ciencia e Innovación to Dr. Lourdes Díaz Rodríguez from
1995-2000, namely: PB94-1096-C02-01; BFF2000-0928; HUM2006-10235.
6. Dresden (German-English/French/Czech)
Angelika Kubanek-German
ELL Saxony
University of Braunschweig
a.kubanek-german@tu-bs.de
The Early Language Learning (“Fruehes Fremdsprachenlernen”) Project was a project
funded by the Department of Education of Saxony in 2000. A foreign language - English,
French, Czech - was offered for 4 hours per week to 8- and 9-year-olds, i.e. grades 3 and 4,
instead of the then standard 1 hour per week. A study, commissioned by the Department
of Education and conducted by Angelika Kubanek-German, investigated 12 classes (150
pupils) during the first two years of the program, autumn 2000 to summer 2002. The
overall research project (see preliminary report, Kubanek-German 2003) pursued three
aims:
1. to assess the linguistic achievement of the children after 2 years of learning,
contrasting the subgroups: intensive versus standard; and between different
languages
2. to gain a holistic picture of primary foreign language learning by focusing the
research activities not only on the foreign language but also on more unexplored
territory such as cultural awareness, and
3. as a sub-question, to investigate whether curriculum-anchored notions of what a
child can do in the foreign language class are justified, thus expanding on the
notion of child-orientation (cf. Kubanek-German 2001)
Data in TalkBank are from assessment interviews that lasted 25 minutes and were
composed of three parts. Part 1 (warm up) included themes familiar to the children. Part
2 (water interview) involved questions based on an unfamiliar picture book about the
theme of water. In part 3 (rat search) students used teamwork to solve the “rathunt”
puzzle. Children were interviewed in pairs and the same tasks were used in all three
languages by the same interviewer. For English, there were 20 boys and 18 girls. For
French, 10 boys and 8 girls. For Czech, 16 boys and 16 girls. Data were collected in
Chemnitz, Radebeul, Dresden, and Leipzig.
The English teacher set high objectives in the linguistic domain. The pedagogical style
was rather teacher-centred. She used immediate correction. She most clearly changed her
attitude towards the research project towards the positive. For her class, there was no
catchment area restriction. Her pupils did very well in the communication test. She was a
trained primary teacher, and had taught Russian at primary level. After 1990, a re-training
for English was offered to those teachers of Russian, including language training in
Britain.
The French teacher had spent some time in France teaching German as a foreign
language. There was a fear at the inception of the intensive programme that French would
not meet with acceptance on the part of the parents (in contrast to English). However:
after one year, the whole school where she was employed successfully started offering
only intensive French (i.e. in both grade 3 classes): the programme is non-selective. This
teacher supported the less fluent teachers of French in the project. Her approach is
holistic, she uses a lot of body language. She took the 4th graders to Brittany (classe de
mer) - a long way from Dresden. “It is just fascinating to see how much they can do” is
the statement that best characterises her attitude.
The Czech teacher is a native speaker with training for grammar school, but he had been
teaching at the primary level before the pilot project began. He taught grammar more
explicitly and was concerned about pronunciation. He explained this by stressing the
difficulties of the Czech language. It should be stated, though, that he, as well as the
others, did many songs and dances and rhymes with the class.
7. ESF (Arabic/Finnish/Punjabi/Spanish/Turkish-Dutch/English/French/German/Swedish)
Wolfgang Klein
Clive Perdue
Max Planck Institut
Nijmegen, Netherlands
klein@mpi.nl
The ESF (European Science Foundation Second Language) Database is a
computerized archive of data collected by research groups of the ESF project in five
European countries: France, Germany, Great Britain, The Netherlands and Sweden. The
project concentrates on the spontaneous second language acquisition of forty adult
immigrant workers living in Western Europe, and their communication with native
speakers in the respective host countries. The target languages are Dutch, English,
French, German and Swedish. For each target language, two source languages were
selected. The corpora are:
- Dutch L2 and Arabic L1
- Dutch L2 and Turkish L1
- English L2 and Panjabi L1
- English L2 and Italian L1
- French L2 and Arabic L1
- French L2 and Spanish L1
- German L2 and Italian L1
- German L2 and Turkish L1
- Swedish L2 and Finnish L1
- Swedish L2 and Spanish L1
The Dutch, English, and French L2 transcripts have accompanying audio. The German
and Swedish L2 transcripts do not. Biographical information about the informants is
currently in the bios.zip file. A filename like lsfbe24a.1.cha indicates:
l    subject from the longitudinal group,
s    source language is Spanish,
f    target language is French,
be   the informant's name is Berta,
2    the session took place in the 2nd data collection cycle,
4    it was the 4th encounter in that cycle,
a    the activity transcribed is a free conversation (activity code A),
1    it is the 1st conversation in the encounter.
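The filename layout described above can be split mechanically. The following is a minimal sketch in Python, assuming every ESF transcript name follows exactly the shape of the example lsfbe24a.1.cha; the function name and the field labels are illustrative and are not part of the ESF documentation. Only the letter codes shown in the example are documented here, so the sketch returns the raw codes rather than expanding them.

```python
import re

# Field layout taken from the worked example "lsfbe24a.1.cha" in this guide.
# The group names are illustrative labels, not official ESF terminology.
ESF_NAME = re.compile(
    r"^(?P<group>[a-z])"          # e.g. l = longitudinal group
    r"(?P<source>[a-z])"          # e.g. s = Spanish
    r"(?P<target>[a-z])"          # e.g. f = French
    r"(?P<informant>[a-z]{2})"    # e.g. be = Berta
    r"(?P<cycle>\d)"              # data collection cycle
    r"(?P<encounter>\d)"          # encounter within the cycle
    r"(?P<activity>[a-z])"        # activity code
    r"\.(?P<conversation>\d+)"    # conversation within the encounter
    r"\.cha$"
)


def parse_esf_filename(name):
    """Split an ESF transcript name into its coded fields, or return None."""
    match = ESF_NAME.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("cycle", "encounter", "conversation"):
        fields[key] = int(fields[key])
    return fields


print(parse_esf_filename("lsfbe24a.1.cha"))
```

Names that do not fit the assumed shape yield None, so files with a different naming pattern can be listed and checked by hand.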
Publications that use this corpus should cite:
Perdue, C. (ed.) (1993). Adult Language Acquisition. Vol 1: Field Methods. Cambridge
University Press
8. FLLOC (English-French)
Florence Myles
Modern Languages
School of Humanities
University of Southampton
Southampton SO17 1BJ
England
e-mail: fjm@soton.ac.uk
Linguistic Development in Classroom learners of French: a Cross sectional Study: This
directory contains sound files and corresponding transcripts from an ESRC-funded one
year project which ran from October 2001 to September 2002 (ESRC grant
R000234754). One of its aims was to provide a database of learner language for years 9,
10 and 11 of secondary education in the UK context. The Project Director was Florence
Myles and the other team members were Emma Marsden, Rosamond Mitchell and Sarah
Rule.
Three groups of twenty learners in each of years 9, 10 and 11 (i.e. in their 3rd, 4th and
5th year respectively of learning French in the UK educational context; age 13-14, 14-15,
15-16 respectively) in a local secondary school were tested.
A gender-balanced sample from the three different year groups, and containing pupils of
all the ability range, as judged by the teachers and the pupils' school grades, was used in
the study. The sample is however slightly biased towards the top ability pupils, as they
are more likely to show signs of further development. The participants were numbered
1-20 for each year group. However, as this was a short-term cross-sectional study, if a
cohort pupil was absent a replacement pupil carried out the task; replacement pupils were
given random numbers between 60 and 90. This ensured that the number of pupils in each year
that carried out a particular task was always 20. In selecting and involving informants in
the research, the project followed the Recommendations on Good Practice in Applied
Linguistics of the British Association of Applied Linguistics (1994) on the responsibility
of researchers in respecting the privacy of participants, ensuring confidentiality of
personal details and in maintaining openness about the goals of the research.
Four oral tasks were administered to all 60 subjects, on a one-to-one basis with a researcher.
The tasks used were the same for all years, in order to enable a comparison of results.
Moreover, some of the tasks were the same as those used in the 'Progression Project' (to
enable comparisons to be drawn). The tasks were as follows:
Cartoon story (Loch Ness Monster): in this task, learners have to tell a story on the basis
of a series of cartoon pictures. This task was developed and used in the Progression
Project. It also provides valuable information on learners' developing discourse level
skills. Task Code L
Interrogative elicitation task: this task is an information gap activity in which the subjects
have to find out missing information from the researcher in order to reconstruct a
drawing. The main purpose of this task is to elicit interrogative constructions and
pronominal reference, as well as gender markings. This task was also developed and used
in the Progression Project. Task Code I
Photos task: One-to-one interview with a researcher: this is a directed conversation with a
researcher in which the subject has to respond to a number of questions, as well as ask
questions based on photographs brought by the researcher. The main purpose of this task
is to elicit a wide range of structures, with a particular focus on verbal morphology (past
tense, future). A version of this task was used in the Progression Project, although we
modified it in order to ensure elicitation of a range of temporal reference (as we were
dealing with more advanced learners). Task Code P
Negative elicitation task: learners have to describe a famous person by saying what they
do and do not do (following picture cues), and the researcher has to guess who the
famous person is on the basis of the learner's description and a series of possible
celebrities. Task Code N
Recording
All tasks were recorded digitally, and took around 15 minutes each, in a one-to-one
situation with a researcher, making a total of around one hour of spoken language per
pupil.
Additional Conventions
In this section, we describe some of the general decisions we have taken in the
transcribing of French interlanguage oral data, as well as some of the adaptations we have
made to the CHILDES system, in the context of L2 data. As will become obvious, many
of the decisions were dictated by our research agenda in both the Linguistic Development
and the Progression projects, and our choice to use the automatic morphosyntactic parser.
And although it means that sometimes, the transcription is somewhat deviant from the
actual phonological shape of the words produced by learners, we felt it is not too much of
a problem as other researchers interested in e.g. phonology, can listen to the sound files
as they read the transcripts, and add their own level of coding. The data has been
transcribed orthographically. This is necessary in order to use the French
morphosyntactic parser on the completed transcripts, as it will not recognise non-words.
There is no extensive coding of errors and overlaps are not marked, since they can be
heard in the sound files. Learner utterances have been carefully segmented into distinct
utterances, but this has not been done for the researcher.
If a participant exactly repeats the researcher (or another participant in the case of pair
tasks), it has been coded as follows:
*32N: [^ eng: how do you say he goes]
*ADR: il va
*32N: il@g va@g au cinema
@g is added after every repeated word. @g has been added to the special form marker
file (sf.cut) in the French MOR program, so that the imitation is not included for
analysis by the French morphosyntactic parser, as this could give misleading
information about the current grammar of the learner.
In order for the French MOR programme to ignore the English we coded whole
utterances as follows:
*SAR: [^ eng: yes you begin by asking questions]
*43P: [^ eng: how do you say dog?]
Use of a single English word to complete a French Phrase
If an English word has been used to complete a French phrase, then we have coded the
words as follows:
Noun          @s:d
Adjective     @s:a
Adverb        @s:adv
Preposition   @s:pre
Verb          @s:v
Pronoun       @s:pro
Determiner    @s:det
Conjunction   @s:con
For example:
*28L: il achete le skirt@s:d
These forms are then analysed by the morphosyntactic parser as 'English N', 'English V',
'English A', etc., rather than being ignored, which would produce outputs that do not
correspond to the learner's grammar (e.g. in this example, suggesting that this learner's
grammar allows a determiner to be followed by nothing, as the parser would not recognise
'skirt'). These special form markers have been added to the sf.cut file in MOR and they
have also been added to the depfile in CLAN (so the files pass through check).
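For researchers who want to list the English insertions without running MOR, the markers can be picked out of a main tier line with a short script. This is only a sketch in Python: the suffix table is copied from the list of codes above, while the function name and the regular expression are our own assumptions about how the marked words appear in the transcripts.

```python
import re

# Suffix-to-category table copied from the list of @s: codes in this section.
ENGLISH_WORD_CODES = {
    "d": "noun",
    "a": "adjective",
    "adv": "adverb",
    "pre": "preposition",
    "v": "verb",
    "pro": "pronoun",
    "det": "determiner",
    "con": "conjunction",
}

# A marked word is assumed to be any run of non-space characters
# followed directly by "@s:" and a lower-case code.
MARKED_WORD = re.compile(r"(\S+)@s:([a-z]+)")


def english_insertions(main_line):
    """Return (word, category) pairs for English words marked with @s:."""
    pairs = []
    for word, code in MARKED_WORD.findall(main_line):
        pairs.append((word, ENGLISH_WORD_CODES.get(code, "unknown")))
    return pairs


print(english_insertions("*28L: il achete le skirt@s:d"))
```

Codes that are not in the table are reported as "unknown" rather than dropped, so that unexpected markers remain visible.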
Indeterminate forms
In beginner datasets, it is often difficult to determine which form a learner has intended,
as learners often produce something very approximate. There are four examples of this
use of indeterminate forms that occur consistently in our data and we coded them as
follows:
- Definite articles which sound like something between le and la: le@n
- Indefinite articles which sound like something between un and une: un@n
- First person subject pronouns which sound like something between je and j'ai: je@n
- A verb form which sounds like something between a and est: a@n
These forms have been added to the neo.cut file (see below), and are analysed by the
parser as e.g. definite article, without specifying the gender.
Neologistic verb endings
Our learners also used neologistic verb forms, which were usually non-finite. Each of
these new forms is written on the main tier then added to the MOR programme in a
neo.cut file, created, then saved as part of the MOR lexicon. For example:
pren {[scat neo:v:inf]} "prendre"
will be transcribed as pren on the main tier, and analysed by the parser as neo:v:inf
(neologism:verb:infinitive)
We have also added a number of words, particularly nouns, to the MOR lexicon.
For example, we added le shopping, le jogging, le badminton, and le t_shirt, so that they
can be recognised and therefore tagged by the parser.
Additionally, the following project-specific conventions were used in order to code
'intended tense', in the context of the Photos task:
In the 'Photos' task, each photoset was designed to elicit a dialogue in the present, past or
future (by referring to holidays just gone - Christmas, forthcoming - summer, and to
hobbies - present). We have therefore coded the data for intended tense use. For example,
in the following sentence, we wanted to be able to know that the infinitive form 'aller'
was produced in a context where a future form would be expected:
*84P: l'ete prochain je aller Marjorca .
would be transcribed as follows:
*84P: l'ete prochain je aller@f Marjorca .
where the following tags have been added to the sf.cut file in MOR:
@p {[scat inf:pres]} for contexts where a present form would be expected
@f {[scat inf:future]} for contexts where a future form would be expected
@c {[scat inf:past]} for contexts where a past form would be expected
This enables the morphosyntactic parser to analyse these forms as v:inf:future|aller, and
therefore to retrieve them easily for analysis.
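The intended-tense tags can also be retrieved directly from the main tier, for instance to count how often an infinitive appears where a future form was expected. The sketch below is in Python and rests on assumptions: that the tags always appear as a word-final @p, @f or @c, and that nothing else in the Photos transcripts uses those three suffixes. The function name is ours.

```python
import re

# Tag meanings as given in the list above.
INTENDED_TENSE = {"p": "present", "f": "future", "c": "past"}

# Match word@p, word@f or word@c; the lookahead rejects longer suffixes
# such as @pro, and "@s:..." never matches because "s" is not a tense tag.
TAGGED_FORM = re.compile(r"(\S+)@([pfc])(?![\w:])")


def intended_tense_tags(main_line):
    """Return (form, expected tense) pairs found on one main tier line."""
    return [
        (form, INTENDED_TENSE[tag])
        for form, tag in TAGGED_FORM.findall(main_line)
    ]


print(intended_tense_tags("*84P: l'ete prochain je aller@f Marjorca ."))
```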
Directories
Interrogatives Year 9
Interrogatives Year 10
Interrogatives Year 11
Loch Ness Year 9
Loch Ness Year 10
Loch Ness Year 11
Negatives Year 9
Negatives Year 10
Negatives Year 11
Photos Year 9
Photos Year 10
Photos Year 11
All the files in each directory have a corresponding MOR file in the appropriate
directory. We would like to acknowledge Christophe Parisse's expert guidance in making
some of these adaptations to the French MOR programme.
The Files are labelled in the following way:
Soundfiles: 01L9SAR.wav
Transcriptions: 01L9SAR.cha (01 is the number of the student, L is the task code, 9 is the
student's year, SAR is the abbreviation for the researcher)
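The labelling scheme can be decoded automatically when sorting files by task or year group. The following Python sketch assumes that every file follows the pattern of the example 01L9SAR (a two-digit student number, one task letter, the year, and a three-letter researcher code); the function name and the returned field names are illustrative only.

```python
import re

# Task codes as given in the task descriptions of this section.
FLLOC_TASKS = {
    "L": "Cartoon story (Loch Ness Monster)",
    "I": "Interrogative elicitation task",
    "P": "Photos task",
    "N": "Negative elicitation task",
}

FLLOC_NAME = re.compile(
    r"^(?P<student>\d{2})(?P<task>[LIPN])(?P<year>9|10|11)"
    r"(?P<researcher>[A-Z]{3})\.(?P<ext>wav|cha)$"
)


def parse_flloc_filename(name):
    """Decode a FLLOC sound file or transcript name, or return None."""
    match = FLLOC_NAME.match(name)
    if match is None:
        return None
    student = int(match.group("student"))
    return {
        "student": student,
        # Replacement pupils were numbered between 60 and 90.
        "replacement": 60 <= student <= 90,
        "task": FLLOC_TASKS[match.group("task")],
        "year": int(match.group("year")),
        "researcher": match.group("researcher"),
        "kind": "sound file" if match.group("ext") == "wav" else "transcript",
    }


print(parse_flloc_filename("01L9SAR.wav"))
```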
Publications using these data should cite:
Myles 2002: Full Report of Research Activities and Results. Linguistic Development in
Classroom Learners of French.
www.regard.ac.uk/research_findings/R000223421/report.pdf
9. Køge (Turkish-Danish)
Jens Normann Jørgensen
University of Copenhagen
Copenhagen, DK
normann@hum.ku.dk
These data were collected from adolescent Turkish-Danish bilinguals in the town of Køge
near Copenhagen. The data include interviews in Danish and Turkish and group
discussions in both Danish and Turkish. There are audio files, but they are not yet
available to TalkBank.
10. Langman (Chinese-Hungarian)
Dr. Juliet Langman
Division of Bicultural-Bilingual Studies
University of Texas at San Antonio
6900 North Loop, 1603 West
San Antonio, TX 78249
jlangman@lonestar.utsa.edu
This corpus is made up of 10 files consisting of interviews conducted in 1994 with 11
Chinese immigrants living in Hungary. The bulk of the conversation is in Hungarian, although in the case of those who speak English there is also English, and in the case of
one transcript (KIN10) there are significant amounts of Chinese (with a Hungarian
translation in a %tra dependent tier). Interviews focused on issues related to their arrival
in Hungary as well as their daily life activities. With the exception of KIN2 and KIN10
none of the participants had had formal training in Hungarian. Interviewers were the
researcher, as well as three different Hungarian undergraduates. Data were collected with
two purposes in mind: the analysis of communicative strategies among adult second-language
learners learning in a nonstructured environment, and the analysis of the
acquisition of morphology of an agglutinative language. The following additional form
markers have been used in the (*) speaker lines of the transcripts:
@e = English word, e.g., go@e
@c = Chinese word, e.g., xie@c
@a = adult-invented word, e.g., pigyilni@a
The following special codes have been used on the %lan tier:
$MIX   utterances with some form of code-switching or borrowing
$CHI   utterance in Chinese (used only in KIN10)
The following special codes have been used on the %rep (repetition) tier to identify:
1. whose speech is repeated
   SRP   self-repetition of immediately previous utterance
   ORP   other repetition of immediately previous utterance
   SRE   self-repetition of an utterance not immediately preceding
   ORE   other repetition of an utterance not immediately preceding
2. the function of the repetition
   MIS   misunderstanding, prompting, asking for clarification
   VAL   validation repetition of previous utterance
   EXP   explanation to ease understanding
   COR   correction and language learning functions
3. the form of the repetition
   PAR   partial
   COM   exact
   TRA   translation
   PLU   repetition including additional information
These three types of codes could be combined as in: %rep: SRP:MIS:PAR
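A combined code such as the one above can be expanded back into its three components with a small lookup. This Python sketch copies the code tables from the lists in this section; the function name and the returned dictionary keys are our own, and the sketch assumes that the codes on a %rep tier are separated by colons as in the example.

```python
# Code tables copied from the three lists of %rep codes in this section.
REP_WHOSE = {
    "SRP": "self-repetition of immediately previous utterance",
    "ORP": "other repetition of immediately previous utterance",
    "SRE": "self-repetition of an utterance not immediately preceding",
    "ORE": "other repetition of an utterance not immediately preceding",
}
REP_FUNCTION = {
    "MIS": "misunderstanding, prompting, asking for clarification",
    "VAL": "validation repetition of previous utterance",
    "EXP": "explanation to ease understanding",
    "COR": "correction and language learning functions",
}
REP_FORM = {
    "PAR": "partial",
    "COM": "exact",
    "TRA": "translation",
    "PLU": "repetition including additional information",
}


def decode_rep_tier(tier_line):
    """Expand a %rep tier such as '%rep: SRP:MIS:PAR' into its components."""
    label, _, value = tier_line.partition(":")
    if label.strip() != "%rep":
        raise ValueError("not a %rep tier: " + tier_line)
    decoded = {}
    for code in value.strip().split(":"):
        if code in REP_WHOSE:
            decoded["whose"] = REP_WHOSE[code]
        elif code in REP_FUNCTION:
            decoded["function"] = REP_FUNCTION[code]
        elif code in REP_FORM:
            decoded["form"] = REP_FORM[code]
        else:
            raise ValueError("unknown %rep code: " + code)
    return decoded


print(decode_rep_tier("%rep: SRP:MIS:PAR"))
```

Because each code is looked up by table rather than by position, a tier that carries only one or two of the three code types is still decoded.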
Error coding focused exclusively on morphology and is represented on two separate
tiers, %err and %mor. The %mor tier shows the actual target form for each error marked.
The %err tier marks the types of errors using the following codes:
$OMI:      omission
$OMI:PAR   partial omission
$INS:      insertion
$INS:PAR   partial insertion
$SWI       switched form
$SWI:PAR   partially switched form
Partial support for data collection and analysis was provided through a grant awarded to
Dr. Csaba Pléh, OTKA grant T018173, A magyar morfológia pszicholingvistikai
vizsgálata (The psycholinguistic study of Hungarian morphology).
Publications using these data should cite:
Langman, Juliet. (1998) “Aha” as Communication Strategy: Chinese speakers of Hungarian. In Regan, V. (ed.) Contemporary Approaches to Second-language Acquisition in
Social Context: Crosslinguistic Perspectives. Dublin: University College Dublin
Press, 32-45.
Langman, Juliet. (1997). Analyzing second-language learners’ communication
strategies: Chinese speakers of Hungarian. Acta Linguistica Hungarica 44, 277–299.
Langman, Juliet. (1995-1996). The role of code-switching in achieving understanding:
Chinese speakers of Hungarian. Acta Linguistica Hungarica, 43, 323–344.
11. Liceras
Liceras, Juana
Department of Modern Languages
University of Ottawa
jliceras@uottawa.ca
Josiane     F   female
LucAndre    F   male
Nicholas    E   male
NicholasM   E   male
Tristan     F   male
ClaireH     E   female
ClaireP     F   female
Falco       E   male
Ginger      E   female
Joanna      E   female, Polish also
Phillippe   F   male
form formaciónpreguntas formulaciónpreguntas
narr narraciónes
cont contestarpreguntas
pers preguntaspersonales
rep repeticiones
comp completaroraciones
comppreg completarpreguntas
compre comprehension
role roleplaying
12. Paradis
13. PAROLE (Various-English, Various-French)
The Corpus PAROLE (PARallèle Oral en Langue Etrangère) was compiled by members
of the Langages research team (Laboratoire LLS) at the Université de Savoie (Chambéry,
France), to investigate the characteristics of different L2 proficiency levels. The
particularity of the corpus is our attempt to incorporate temporal elements of spoken
production in the main transcription line, along with more classic coding of errors and
retracings.
PAROLE is composed of oral productions by 68 young adult learners of three foreign
languages (English, French, Italian), as well as a benchmark corpus of productions by 27
native speakers performing the same tasks. Transcripts and recordings of three tasks (two
summaries of a video clip immediately after viewing, and a short autobiographical
narrative) will constitute the PAROLE corpus. Task details are provided in the PAROLE
Manual (PAROLE_documents folder).
In addition to the speaking tasks, all the non-native subjects completed a battery of tests
and questionnaires, furnishing complementary data on their L2 knowledge, experience,
motivation for L2 study, and two aspects of language-learning aptitude (nonword
repetition and morpho-syntactic analysis). Test results for the learner subjects are
available in the subject_data file (PAROLE_documents folder), and references for the
tests used are provided in the PAROLE Manual (same folder). Pdf files of the subject
profile and the motivation questionnaires used (English L2 subjects) are also included in
the documents folder.
PAROLE was funded through a global research grant given to the Laboratoire LLS by
the French Ministère de l'Education Nationale, as part of the contrats quadriénnaux
between the Ministry and the Université de Savoie for 2003-2006 and 2007-2010. The
Ministry also provided funds for two doctoral students working on the corpus.
We began pre-testing production triggers and assembling test materials in 2003; most of
the French L2 and English L2 subjects were recorded in 2005 and the native speakers in
2006, and transcription work began in earnest in 2006. Due to illness and a shortage of
personnel, the Italian recordings and transcriptions are lagging behind English and
French; the first wave of Italian files should be available on-line by the end of 2008 (and
we apologize for this frustrating delay).
We have attempted to adhere to CHAT conventions as closely as possible; major
innovations concern the scoped timing of "hesitation groups" (unbroken sequences of
hesitation phenomena, such as silent pauses, filled pauses, and certain paralinguistic
noises). We have also made a distinction between words produced in the learners' L1
(coded with the new suffix "@l1"), and words produced in another foreign language
(coded "@s").
See the PAROLE Manual for detailed descriptions of our use of CHAT coding symbols,
occasional additions to the code base, our criteria for utterance delimitation, error coding,
etc. (PAROLE_documents folder).
Participants in the learner corpus (54 females, 14 males):
33 learners of English (24 French-L1, 9 German-L1; average age 21);
12 learners of French (5 Spanish-L1, 3 Chinese-L1, 2 Swedish-L1, 1 Polish-L1, 1
English-L1; average age 23);
23 learners of Italian (all French-L1; average age 19).
Participants in the native-speaker corpus (20 females and 7 males):
9 English-L1 (average age 21);
8 French-L1 (average age 22);
10 Italian-L1 (average age 23).
All participants were enrolled in a French or Italian university (either in a normal or
study-abroad program) at the time of recording. See the subject_data file for detailed
information on each participant (PAROLE_documents folder).
The corpus consists of audio files (.wav format) and transcripts for each participant
performing two short video summary tasks ("task A," "task C"), and one short
autobiographical narrative ("task E"; on-line publication planned in late 2008). Sound
files and transcripts are segmented according to task. All transcripts have been carefully
linked to the digital sound files with bullet points in Sonic Mode. We recommend that
researchers wishing to work with PAROLE organize their files with sound files and
transcripts in the same folder, for optimal comparison between the transcripts and the
productions. Carefully disambiguated tagged files are stored together in a special folder
for each language.
Key to file names (three-digit numbers refer to each subject):
L2 English learners: 0
L2 Italian learners: 2
L2 French learners: 4
British and NZ English: N0
North American English: N1
Italian native-speakers: N2
French native-speakers: N4
The single letter (a, c, or e) following the subject number indicates which task is
involved: file "010a.cha" is the CHAT transcript for English learner 010 performing task
A (first video description); file "010a.wav" is the sound file corresponding to this
transcript & task; file "010a.pst.cex" is the tagged transcript.
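The key to file names can be applied programmatically when sorting the corpus by group or task. The Python sketch below is based on the learner example "010a.cha". Note one assumption that the text above does not confirm: native-speaker file names are taken to begin with the N-prefixed code followed by the remaining digits of the subject number. The function name and field labels are illustrative.

```python
import re

# Group codes and tasks as listed in the key to file names above.
PAROLE_GROUPS = {
    "0": "L2 English learner",
    "2": "L2 Italian learner",
    "4": "L2 French learner",
    "N0": "British and NZ English native speaker",
    "N1": "North American English native speaker",
    "N2": "Italian native speaker",
    "N4": "French native speaker",
}
PAROLE_TASKS = {"a": "task A", "c": "task C", "e": "task E"}
PAROLE_KINDS = {
    "cha": "CHAT transcript",
    "wav": "sound file",
    "pst.cex": "tagged transcript",
}

PAROLE_NAME = re.compile(
    r"^(?P<subject>N?\d+)(?P<task>[ace])\.(?P<ext>cha|wav|pst\.cex)$"
)


def parse_parole_filename(name):
    """Decode a PAROLE file name, or return None if it does not fit."""
    match = PAROLE_NAME.match(name)
    if match is None:
        return None
    subject = match.group("subject")
    # Assumption: native-speaker names start with "N" plus one digit.
    key = subject[:2] if subject.startswith("N") else subject[:1]
    if key not in PAROLE_GROUPS:
        return None
    return {
        "subject": subject,
        "group": PAROLE_GROUPS[key],
        "task": PAROLE_TASKS[match.group("task")],
        "kind": PAROLE_KINDS[match.group("ext")],
    }


print(parse_parole_filename("010a.cha"))
```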
All recordings took place in a small, closed classroom or office, without distractions or
interruptions. Video support materials ("triggers") were presented on a portable computer,
and integrated into .html pages that the subject manipulated directly. See PAROLE
Manual for details of interview structure, video presentation, interviewer behavior,
recording equipment, etc.
HILTON, H. E. (forthcoming, 2008) Connaissances, procédures et productions orales en
L2. AILE.
HILTON, H. E. (forthcoming, 2008) The link between vocabulary knowledge and spoken
L2 fluency. Language Learning Journal.
OSBORNE, J. (2007) Investigating L2 fluency through oral learner corpora. In M.C.
Campoy & M.J. Luzón (eds.) Spoken Corpora in Applied Linguistics. Frankfurt:
Peter Lang, 181-197.
OSBORNE, J. & RUTIGLIANO, S. (2007) Constitution d’un corpus multilingue
d’apprenants d’une L2: recueil et exploitation des données. In H. Hilton (ed.)
Acquisition et didactique, Actes de l’atelier didactique, AFLS 2005. Chambéry : LLS,
Collection Langages, 141-156.
14. Qatar
Yun Zhao
Department of Modern Language
Carnegie Mellon University
This is a corpus of spoken interviews with Qatari learners of English, contributed by Yun
Zhao.
Name            Grade   Nationality   Gender
Sam             12      Qatari        Male
Abe             12      Qatari        Male
Charles         11      Qatari        Male
Tom             12      Qatari        Male
Larry           12      Qatari        Male
Ali (missing)   12      Qatari        Male
Bill            12      Qatari        Male
Harry           12      Jordanian     Male
Arnold          12      Qatari        Male
Jenny           12      Qatari        Female
Nancy           12      Qatari        Female
Lucy            12      Qatari        Female
Anne            12      Qatari        Female
Alice           11      Qatari        Female
Paula           11      Qatari        Female
Pat             12      Qatari        Female
Tina            12      Qatari        Female
Linda           11      Qatari        Female
Donna           11      Kuwaiti       Female

Reading skills   Language Usage   Average English
39.61            65.76            52.685
62.57            73.67            68.12
61.12            80.31            70.715
44.35            39.31            41.83
93.66            89.91            91.785
47.66            52.76            50.21
75.32            99               87.16
Missing          Missing          Missing
77.63            97               87.315
96.81            99               97.905
99               97.2             98.1
94.54            87.75            91.145
78.46            98.8             88.63
53.06            57.4             55.23
53.37            84.62            68.995
79.65            89.32            84.485
66.95            90.51            78.73
71.32            91.1             81.21
15. Reading (English-French)
Brian Richards
Dept. of Arts and Humanities in Education
University of Reading
Bulmershe Court
Earley, Reading RG6 1HY United Kingdom
B.J.Richards@reading.ac.uk
These data on French foreign language oral interviews were transcribed as part of a study
of the reliability and validity of oral assessment in modern foreign languages in the
General Certificate of Secondary Education (GCSE). GCSE is a public examination normally taken by school children in the United Kingdom at the age of 16, i.e. after the 11
years of compulsory schooling. The 34 interviews constitute one part of the French oral
examination: the so-called “free conversation.” Here, the French teacher interviews
students about everyday topics such as school, home, family, holidays, future aspirations,
hobbies, and interests. Other parts of the oral examination such as role-plays are not
part of these data. The title of the project was “Oral Assessment in Modern Languages
Project”, funded by the Research Endowment Trust Fund of the University of Reading.
Our analyses have compared lexical and grammatical features of the children’s language
with teachers’ expectations of foreign language learners of this age, and with the language of French native speakers in a similar interview setting (Chambers & Richards,
1995). We have also compared teachers’ impressionistic assessments of the presence of
qualities specified in the assessment criteria with our own objective counts using the
CLAN software (Richards & Chambers, 1996). We are currently looking at teacher-student
interaction, focusing on the teachers’ accommodation strategies.
The Interviews
Teachers conduct the oral examinations, including the interviews, on set dates and on
topics determined by the official examination board. Only one teacher and one student
are present during each interview, the audio recording being made by the teacher. The
teacher enters assessments on a mark sheet during the interview, and on completion of
the examination the tapes and mark sheets are sent to the examination board. A sample of
tapes is re-marked by a moderator appointed by the examination board and the teachers’
assessments adjusted if necessary. The average length of the interviews is 5 minutes 30
seconds. They range from 3 minutes to 12 minutes.
Participants
All 34 participants come from the same all-ability secondary school (11-18 comprehensive school) in an English-speaking area of South Wales. They are 16 years old and are
native speakers of English who have been learning French for 5 years. All have also spent
at least one year learning Welsh and some have had the opportunity to learn German.
The school is situated in a predominantly working-class area, but the students selected
here cover a wide range of social background. It should be noted that students with the
weakest performance in French were excluded from this sample because the focus of our
study was the Higher Level examination. This part of the examination, which is taken in
addition to Basic Level, gives students access to the highest grades. Students in the
sample obtained pass grades ranging from Grade A (the highest) to Grade E. No students
with Grades F and G were included.
Two teachers, one female and one male, are involved in the conduct of the interviews.
Neither is a native speaker of French; both are native speakers of British English who
have learned French as a foreign language and have a degree in Modern Languages.
As a condition of using the school’s tapes we promised that the identity of the school,
teachers, and students would not be revealed. We have therefore used pseudonyms for
these. In addition, we have changed the names of all locations mentioned on the tapes, as
well as names of sports teams and exchange schools in France and Germany. Francine
Chambers, who is a native speaker of French, transcribed the recordings and subsequently
checked the transcripts edited and coded by Brian Richards. Fiona Richards did the final
checking.
The following points should be noted:
1. In transcribing the French language we have followed the CHILDES manual (sections 4.5.14 and 27.4.1) in dealing with apostrophes and hyphens: apostrophes are
followed by a space (l’ aim, c’ est); hyphens in compounds are replaced by a plus
sign (le week+end); dashes between words (est-ce que) are replaced by spaces (est
ce que).
2. It is difficult to draw a line between an English accent and a pronunciation error;
because an assessment criterion of the GCSE examination is whether an utterance
would be comprehensible to a “sympathetic native speaker,” only those student
errors that were serious enough to cause a breakdown of communication, or
which were followed by a teacher correction, were coded. These were transcribed
in UNIBET on the %err tier.
3. Some students answer questions in English or insert English words. Where the
whole utterance is in English, a separate speaker tier for the student (*STE) has
been created. English words inserted in French are marked with the @e suffix (father@e). Students who are also learning German sometimes use German words.
These are marked with a @g suffix. Both the @e and @g symbols are contained
in the 00DEPADD file.
4. Other additions to the 00DEPADD file are: +//? (self-interruption of a question)
and +..? (question tailing off).
5. Acknowledgment tokens have been coded as back channels and are marked [+
bch]. These can be excluded from MLU and MLT counts using the -s"[+ bch]"
switch.
6. The exclamations and interactional markers used are: “aah,” “euh,” “mm,” and
“um.” To omit these from analyses they can be placed in an exclude file.
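For users who process the transcripts outside CLAN, the effect of points 5 and 6 can be approximated in a few lines. The Python sketch below is not a substitute for the CLAN switches: it simply drops utterances marked [+ bch] and removes the four interactional markers and the utterance terminators before counting words. The function name and the speaker codes in the example (*STU, *TEA) are hypothetical.

```python
# Interactional markers listed in point 6 above.
FILLERS = {"aah", "euh", "mm", "um"}

# Utterance terminators, including the two additions from point 4.
TERMINATORS = {".", "?", "!", "+//?", "+..?"}


def countable_words(main_line):
    """Return the words of one main tier line that would enter a word count.

    Back channels marked [+ bch] contribute nothing; fillers and
    terminators are removed from all other utterances.
    """
    if "[+ bch]" in main_line:
        return []
    _speaker, _, text = main_line.partition(":")
    return [
        token
        for token in text.split()
        if token not in FILLERS and token not in TERMINATORS
    ]


print(countable_words("*STU: euh j' aime le week+end ."))
print(countable_words("*TEA: mm . [+ bch]"))
```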
List of Files
In the table below, the fourth column shows the combined total of points obtained by
each student for the tests in Speaking, Listening, Reading, and Writing in the GCSE
examination. A maximum of 7 points is awarded for each of these 4 skills, giving a
possible total of 28 points. The fifth column shows the score for the whole oral test,
including the interview and role-plays.
Table 1: Recordings and GCSE Scores
File number   Sex      Teacher Sex   GCSE Points   Oral Test
W01.cha       male     male          19            4
W02.cha       male     male          17            3
W03.cha       female   female        16            2
W04.cha       female   female        11            3
W05.cha       female   male          16            4
W06.cha       male     male          19            4
W07.cha       male     male          18            4
W08.cha       female   male          22            5
W09.cha       male     male          15            4
W10.cha       female   female        14            3
W11.cha       female   male          20            4
W12.cha       male     male          17            4
W13.cha       male     female        12            2
W14.cha       male     male          12            3
W15.cha       male     male          19            4
W16.cha       female   female        11            2
W17.cha       male     female        16            4
W18.cha       female   male          23            6
W19.cha       male     male          23            6
W20.cha       male     female        12            2
W21.cha       male     male          19            5
W22.cha       female   female        10            2
W23.cha       female   male          20            4
W24.cha       male     male          17            4
W25.cha       female   male          21            5
W26.cha       female   male          14            4
W27.cha       female   male          21            5
W28.cha       female   male          21            5
W29.cha       male     male          21            4
W30.cha       female   female        16            3
W31.cha       male     male          24            6
W32.cha       female   male          25            6
W33.cha       male     male          8             7
W34.cha       female   male          26            6
Publications using these data should cite:
Chambers, F., & Richards, B. J. (1995). The “free conversation” and the assessment of
oral proficiency. Language Learning Journal, 11, 6–10.
16. SPLLOC (English-Spanish)
Laura Dominguez
University of Southampton
SPLLOC is a corpus of L2 Spanish collected by a team of researchers at the universities
of Southampton, Newcastle, and York, sponsored by an ESRC research grant award
(2006-2008). The data are also freely available in anonymised form through the project
website (http://www.splloc.soton.ac.uk) for use by other second language acquisition
researchers.
The L2 oral Spanish data have been collected from classroom learners in schools and
universities in England, using a series of specially designed elicitation tasks, including
storytelling, picture description, discussion and individual interview. There were 20
learners at each of 3 levels: beginners (Year 9 students aged 13-14), intermediate students
(A2 students aged 17-18), and fourth year undergraduates. All of them were native
English speakers. Depending on their level, each learner was audio-recorded undertaking
between 3 and 5 oral tasks. They also completed computer-based and paper-based tasks
that provided complementary data on aspects of their Spanish knowledge. For
comparison purposes, small numbers of native speakers were also recorded undertaking
the same tasks. The resulting database contains 290 digital sound files (240 learner
recordings, 50 native speaker recordings) that are accompanied by transcripts in
CHILDES format. Some files also have an extra layer of tagging which identifies parts of
speech.
17. TCD (English-French)
Seán Devitt, F.T.C.D
Senior Lecturer in Education
School of Education
University of Dublin, Trinity College
Dublin 2, Ireland
sdevitt@tcd.ie
This project, designed and implemented by Seán Devitt, School of Education, Trinity
College, Dublin, set out to track the development of the means of expressing temporality
by children learning French as a second language in France. The subjects were five
children, aged between eight and twelve, of three different nationalities (Irish, Polish and
Cambodian), who were in primary school in Paris in the early part of 1982. The data
presented here were gathered over a five-month period from March 31 to September 6,
1982, during a sabbatical term and a summer holiday.
The five-month stay of the researcher and his family in France was funded by two grants,
one from the National Board for Science and Technology (now Enterprise Ireland) and
one from the Ministère des Affaires Étrangères of the French Government, organized by
the Service Culturel de l'Ambassade de France in Dublin. The French Government,
through the Ministère de l'Éducation Nationale, provided further support by arranging for
the researcher’s three children to attend school in Paris. The school picked by the
Ministère for Marie and Ann was Ecole rue de la Plaine in the 20th arrondissement of
Paris. [Their older brother, Séamus, was admitted to the nearby Lycée Hélène Boucher.]
The Ministère also helped in locating the other three subjects in nearby schools.
The two Irish subjects, Marie and Ann, were aged 11 and 8. They were the researcher's
daughters, and had been to France twice prior to 1982 for holidays. On one of these
occasions (July-August 1980) they had spent three of the five weeks of their holiday in a
Centre Aéré, a type of holiday camp, which is described below. Neither had studied
French at school and their exposure to French had been minimal apart from on these
visits to France. Their stay in France was planned to last five months. After
that they were to return to Ireland.
On March 31 the family (parents and three children) arrived in Paris to find that the
apartment they had booked was quite unsatisfactory. Ten days were spent in looking for
proper accommodation. A small apartment was eventually located but did not become
available until April 23. The intervening two weeks were spent with English-speaking
friends in Hermonville, a village some 9 kilometers from Reims. Marie and Ann were
allowed to attend the village school for one of these weeks; the other week coincided
with the Easter holidays.
The language spoken at home was normally English. Contact with French was, therefore,
confined mainly to the school in the first three months. However, further opportunities
for contact were provided by television in the evenings and at weekends, by visits to
friends, and by visits of friends to the apartment. There was one longer visit of three days
(without their parents) to friends in Reims.
The third subject, PPM, was a twelve-year-old Polish boy, an only child. His father had
come to France in 1978 to find work as a plumber; PPM and his mother had remained in
Poland. In October 1981 they came to Paris to visit the father for a few weeks. While
they were there, martial law was declared in Poland and they were unable to return
immediately. By September 1982 (the end of the research period) PPM seemed to have
accepted that he would be staying in France; his mother had not. In Poland PPM would
have been in the first year of secondary school.
PPM had absolutely no knowledge of French before coming to France. Neither had his
mother. Since she presumed that she was to return to Poland at the first possible
opportunity, she did not set about learning it. The family lived in an apartment in an inner
suburb of Paris. The language spoken at home was invariably Polish. At the time of the
recordings PPM had not made friends with French children. At weekends he would go to
the Bois de Vincennes with his father to play ball. He had one Polish friend who had been
in France since he was seven and spoke French fluently. Otherwise he had little contact
outside of school with native French speakers. In school his contact with native French
children also seemed limited.
The fourth and fifth subjects, PCF and CCM, were two Cambodian children, sister and
brother. PCF was nine years old, the youngest of ten children. Her brother, CCM, was
twelve. Some time in 1980 both had fled Cambodia with their parents and three other
siblings. Before that they had attended school but under very difficult conditions, often
having to spend a large part of the day working in the fields. The family spent some
months in refugee camps in Thailand before arriving in France in January 1981. They
stayed a few months with an older brother who had come to Paris some years previously
and had married a French woman. The family then moved to their own apartment.
Neither child had had any contact with French before coming to France. While they were
staying with their brother, he and his wife were an important source of support for
learning French. At home the language spoken was generally Cambodian, with some
Chinese. When their sister-in-law visited the home, or when they visited her home, she
spoke French with them.
Schooling for the five subjects
From April 23, three weeks after their arrival, until the end of June Marie and Ann
attended the local primary school in their area. Marie was in CM2, Ann in CE2. They
received no special treatment in the form of a special class for foreigners, but were fully
integrated into their classes from the first day. This was specifically requested when
applying for permission for them to attend school in France.
In January 1982, three months after his arrival in France, PPM began to attend the local
primary school, L'Ecole X in V. The school had a special language programme for
foreigners like PPM, involving several hours of French tuition per day. Once children
were judged ready for it, they were permitted to attend lessons in other subjects, usually
in a class of children a little younger than themselves. PPM was taking this programme at
the time of the first recording in late May. He had just begun to attend Mathematics
lessons in a mainstream class (CM1, for 10-year-olds). He had been in France for seven or
eight months at the time of the first recording.
In April 1981, three months after their arrival in France, PCF and CCM went to a school
in Z, an inner suburb of Paris, where they followed a special programme in French for
foreign children. In February 1982 they changed to Ecole B near their apartment. Here
they were fully integrated into the school, PCF in CM1, CCM in CM2. Both had close
French friends. According to their teachers, both children were very bright and were
performing very well in class. They had been in France for a year and a half at the time of
the first recording.
In France the school day lasts from 8.30 am to 4.30 pm, with a two-hour break for lunch
and two other shorter breaks. Children are free to go home for lunch or to have it in the
cantine. Marie, Ann, and the Polish boy stayed in school for lunch. The two Cambodian
children went home. There was also the option of remaining in school from 4.30 to 6.00
for supervised study, preceded by a short break. Marie and Ann remained for the
supervised study until 6.00.
In the second half of the five-month period, when schools were closed, the two Irish
children were allowed by the Mairie de Paris to attend a Centre Aéré. [A Centre Aéré is a
type of holiday camp that French municipal authorities organize during the summer
months for children up to the age of 16. These centres are usually located in nearby
forests and children are transported there and back by bus from various collection points.
They are very carefully organized and supervised, providing a wide range of physical
activities (football, horse-riding, swimming, etc.), activities to develop manual skills
(macramé, model making, etc.), nature walks, etc. Children are assigned each day (in
groups of about seven) to specially trained moniteurs/monitrices.] From the beginning of
July to the end of August (with a break of ten days in the beginning of August) Marie and
Ann went daily to a Centre Aéré. They had to be at the meeting point by 8.30 am each
morning. Shortly afterwards they were taken by bus to the Centre Aéré. They returned
late in the afternoon and were met by their parents between 6.00 and 6.15. They did not
go to the Centre Aéré for the first ten days of August, because they were unhappy that
their friends were not staying on for August and that they would have a different set of
moniteurs.
PPM spent July in Paris, with only minimal contact with French speakers. He spent
August with his parents on the Mediterranean. For CCM and PCF, the two months of the
holidays were spent in Paris or with relatives in the suburbs. During the holidays their
French friends were away and they had little or no contact with French speakers, except
with their sister-in-law.
Frequency and timing of recordings
Because of the extended settling-in time while accommodation was being sought, it was
over three weeks after arrival before the first recordings could be made with Marie and
Ann. At first every possible opportunity was taken to record them in contact with native
speakers. Once the rhythm of school-life was established certain constraints were
imposed and recordings in the school could be made only at intervals of about a week.
Recordings at home continued to be made as often as the opportunity presented itself.
Once the holidays began (beginning of July) all recordings had to be made in the
evenings at home, since Marie and Ann objected strenuously to the idea of recordings
being made at the Centre Aéré.
In the cases of PPM, CCM and PCF recordings began in late May, since it was some time
after the researcher's arrival before they were located as suitable subjects. The recordings
were made in specially designated rooms in the schools on a set day every week, unless
that day happened to be a holiday, when the recording for that week had to be dropped.
Once the holidays began recordings took place in the homes of the subjects, but at longer
intervals so as not to intrude too much on family life.
A number of Marie and Ann's recording sessions were totally unstructured. For example,
they simply wore the radio-microphone in the canteen or in the playground. Others were
carefully structured, with the children having a particular task to perform, such as filling
out a family tree for someone else in a group, or preparing with friends for a class outing
to a big store.
This wide range of settings for recordings (both those the children were aware of and
those they were unaware of) might be expected to have provided a rich supply of
linguistic output. It did not. Marie and Ann interacted very naturally in many of these
settings with very little or no language. For example, Ann and her friend were filmed
playing with dolls for over two hours during which very little was said. A game of
elastique involving three or four children produced almost no language at all. On other
occasions the structured interactions produced many instances of the same basic
structures. For example, in the session filling out the family tree the question “Comment
s'appelle...” kept recurring. In the preparation for the class outing to the big store, Marie
was not inclined to intervene as the other children became totally taken up in the activity.
These early recordings yielded very sparse data and have generally been disregarded.
For this reason it was decided to fall back on interview-type settings for most of the
remaining recordings, since these seemed to produce much more data. In the case of
PPM, CCM and PCF, this was the solution adopted from the beginning because of the
limited access (about one hour per week). While school lasted native-speaker peers were
used for the interviews that took place in specially designated rooms in the schools. The
native speakers were given general indications to follow, such as to share information
about how the previous weekend had been spent, or to find out how the subjects had
come to France, or to have them compare their native countries with France.
These interviews with native speaker peers were more or less successful depending on
the person involved. In some cases the native speakers (especially those of about nine
years of age) simply "ran out of steam" and had nothing further to say. Alternatively they
jumped from one topic to another. In general, however, the interview-type setting, in
spite of its limitations, provided the subjects with the opportunity of using French in a
wide variety of discourse types. On certain occasions, and especially during holiday time
when native speaker peers were not available and recordings had to be made in the home,
the researcher conducted the interviews.
There are twelve recordings each of Marie and Ann on their own, and a further four
where they were recorded together. Overall frequency was once every ten to twelve days.
There are eight recordings of PPM, the first five at weekly intervals, and the remaining
three at three to four week intervals. There are eight recordings of PCF, and two of CCM
on their own, and a further two where they were recorded together. Frequency was
similar to that of PPM. The date of each recording is coded through the three digits at the
end of the file name. Thus, 615 means the 15th of June.
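The date convention just described can be decoded mechanically. The Python sketch below assumes that the first of the three digits is the month and the last two are the day (as in 615 for the 15th of June), and that the year is 1982, the year of the recordings. The function name and the file name marie615.cha are hypothetical and serve only as an illustration.

```python
import datetime
import re


def recording_date(file_name, year=1982):
    """Decode the trailing three digits of a file name as month + day."""
    stem = file_name.rsplit(".", 1)[0]          # drop an extension such as .cha
    match = re.search(r"(\d)(\d{2})$", stem)
    if match is None:
        raise ValueError("no three-digit date code at the end of " + file_name)
    month, day = int(match.group(1)), int(match.group(2))
    # datetime.date raises ValueError for impossible dates such as month 0.
    return datetime.date(year, month, day)


# Hypothetical file name, used only to show the decoding.
print(recording_date("marie615.cha"))   # prints 1982-06-15
```

A single month digit is sufficient here because all recordings fall between March and September.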
Acknowledgements
The research was facilitated in every possible way by the principals and teachers in the
three schools concerned. Rooms were made available for recording, arrangements were
made for the subjects to leave their classes, and French children were recruited to take
part in the interviews. On occasion the teachers in rue de la Plaine allowed cameras and
video-recorders into their classrooms for whole class recordings. I would like to thank the
headmaster of Ecole de la Plaine, M. Watier, and the teachers of Marie in CM2, M.
Rubelli and Mme Dutot, and of Ann in CE2, Mlle. Schmidt, for the way they welcomed
and looked after our children. To my wife Ann I owe an enormous debt of gratitude for
her support and encouragement for the project right from the beginning. Without her
constant encouragement it would never have reached this stage.
Above all, I must thank the children who took part in the project: the native children who
so readily agreed to act as interviewers, but especially the five subjects who were
prepared to participate so readily and so fully over the whole period of the project.
Without them there would have been nothing. It is to them, Marie, Ann, PPM, CCM and
PCF, that this body of data is dedicated in a very special way.