SLABank Database Guide

This guide provides documentation regarding the data on bilingualism and second language acquisition (SLA) in the TalkBank database. All of these data are available from http://talkbank.org/data/BilingBank. TalkBank is an international system for the exchange of data on spoken language interactions. Most of the corpora in TalkBank have either audio or video media linked to the transcripts. All transcripts are formatted in the CHAT system and can be automatically converted to XML using the CHAT2XML converter. TalkBank data dealing with first language acquisition are available from the CHILDES site at http://childes.psy.cmu.edu.

1. BELC (Spanish-English)
2. BCN-L2 (Berber-Spanish, Arabic-Spanish)
3. Connolly (Japanese-English)
4. CUHK (Chinese-English)
5. DiazRodriguez (Spanish-Various)
6. Dresden (German-English/French/Czech)
7. ESF (Arabic/Finnish/Punjabi/Spanish/Turkish-Dutch/English/French/German/Swedish)
8. FLLOC (English-French)
9. Køge (Turkish-Danish)
10. Langman (Chinese-Hungarian)
11. Liceras
12. Paradis
13. PAROLE (Various-English, Various-French)
14. Qatar
15. Reading (English-French)
    The Interviews
    Participants
    List of Files
16. SPLLOC (English-Spanish)
17. TCD (English-French)

1. BELC (Spanish-English)

The Barcelona English Language Corpus (BELC) has its origin in the Barcelona Age Factor (BAF) project, which examines the effects of age on the acquisition of English as a foreign language. The BAF Project began when the changes in the timing of foreign language instruction introduced by a new Education Law were being progressively implemented in primary and secondary schools across Spain. These changes moved the introduction of the foreign language in primary education earlier, from grade 6 (age 11) to grade 3 (age 8).
The replacement of the previous curriculum by the new one took eight years, during which it was possible to find pupils who had begun English instruction at the age of 11, under the previous curriculum, and pupils who had begun at the age of 8, under the new curriculum. In addition to these central groups, two other age groups were included in the design of the study: adolescents whose initial age of learning English was 14, and adults who began instruction in English at the age of 18 or older. The research on age effects on the learning of English as a foreign language was conducted with students from state schools in Catalonia (Spain). It is important to note that Catalonia is a bilingual community with a majority language, Spanish, known by practically the entire population, and a minority language, Catalan, which is the community language and the language of instruction in the Catalan state school system. English is the first foreign language in most schools, and hence the third language of school pupils. It is also important to note that the earlier introduction of the foreign language entailed a decrease in intensity. Whereas English had been taught for three hours per week under the former curriculum (beginning in grade 6), at the time of data collection under the new curriculum it was taught for an average of two and a half hours per week from grade 3 to grade 10, and for two hours per week in grades 11 and 12. The total amount of instruction in English was about 750 hours under the former curriculum, distributed over seven years, and about 800 hours, distributed over ten years, under the new one. Data were collected at four times: after 200, 416, 726 and 826 hours of instruction (Times 1, 2, 3 and 4, respectively), though only one of the groups was available at all four times (see Table 1 below).
There were 2063 subjects in total, but a number of them had received more hours of instruction, either through extracurricular exposure or because they had repeated a school year. Pupils with only school exposure (OSE) fulfilled the conditions for comparison. Table 1 below indicates the number of subjects in each group, the age at which they began instruction in English and each group's mean chronological age at testing.

Table 1. Characteristics of subjects in the study

Group           Time 1 (200 h)       Time 2 (416 h)       Time 3 (726 h)       Time 4 (826 h)
A (AO = 8)      A1: AT = 10;9        A2: AT = 12;9        A3: AT = 16;9        A4: AT = 17;9
                N = 284, OSE = 164   N = 278, OSE = 140   N = 338, OSE = 71    N = 155, OSE = 71
B (AO = 11)     B1: AT = 12;9        B2: AT = 14;9        B3: AT = 17;9        -
                N = 286, OSE = 107   N = 240, OSE = 96    N = 296, OSE = 51
C (AO = 14)     C1: AT = 15;9        C2: AT = 19;1        -                    -
                N = 40, OSE = 21     N = 11, OSE = 4
D (AO = 18+)    D1: AT = 28;9        D2: AT = 39;4        -                    -
                N = 91, OSE = 67     N = 44, OSE = 21

AO = age of onset; AT = age at testing; N = number of subjects; OSE = only school exposure

The data included in BELC correspond to those subjects who could be followed longitudinally and for whom there are two, three or four collection times over a period of seven years, although not all subjects performed all the tasks (see Table 2). The files in the TalkBank database span the four collection times and four tasks, and are grouped in folders by task. Each file name gives first the time (1, 2, 3, 4), then the group (A, B, C), then the task (c, i, n, r), then the subject number (L06, etc.).

Written composition. The written composition dealt with a familiar topic: "Me: my past, present and future". Students were given a set time (15 minutes), the same for everybody.1

Oral narrative. The narrative was elicited from a series of six pictures, which the subjects could look at freely before and while telling the story in the presence of the researcher.
In the story there are two main protagonists, a boy and a girl, who are getting ready for a picnic; a secondary character, their mother; and a character that disappears and later reappears, a dog that gets into the food basket and eats the children's sandwiches.

Oral interview. This was a semi-guided interview that began with a series of questions about the subject's family, daily life and hobbies. This constituted a warm-up phase that helped students feel more at ease. In general, interviewers attempted to elicit as many responses as possible from the learners, and accepted learner-initiated topics in order to create as natural and interactive a situation as possible.

Role-play. The role-play task was performed in randomly chosen pairs. One student was given the role of the mother/father and the other the role of the son/daughter. The latter had to ask permission to have a party at home, and both students were asked to negotiate the setting, time, activities (music, eating, drinking), etc. The researcher gave the initial instructions and, when needed, also elicited talk by reminding learners of topics for discussion, or led the task to its completion by asking about the outcome of the negotiation.

1 Younger and less proficient learners did not use up all the time they were given because of their language limitations.

Table 2. Spoken tasks (interview, narrative and role-play) performed by BELC longitudinal learners L1 to L55 at Times 1 to 4.

Table 3. Written compositions performed by BELC longitudinal learners L01 to L55.

The main results of the BAF Project so far can be found in the volume Age and the Rate of Foreign Language Learning (see below).

2014 UPDATE in the folders "narratives-2014" and "compositions-2014"

DESCRIPTION OF THE SUBJECTS: The subjects (N=21) constitute a subsample from a larger ongoing project (N=232; L1 Spanish, or L1s Spanish and Catalan) that explores the influence of independent variables such as starting age, cumulative L2 input and frequency of current contact with the L2, as well as the influence of cognitive abilities (working memory, attention-switching capacity and language aptitude), on L2 proficiency and on L2 oral and written performance. The participants in this subsample (N=21; 6 male, 15 female) were undergraduate students, many of them majoring in English, with an intermediate to advanced level of English. Their average age at first testing was 23.6 (SD 8.3; range 18-52). All had at least 6 years of English language learning experience: the average length was 14.2 years (SD 8.2; range 6-38). The mean starting age, defined as the beginning of exposure to English as a foreign language (in preschool, primary school or secondary school), was 9.84 (SD 3.33; range 4-15). Most of the participants were multilingual and had been learning an L3 for at least 1 year (mean 2.6, SD 1.2, range 1-5).

DESCRIPTION OF THE DATA: The data presented here contain the transcriptions of the EFL oral production task with the matching sound files, and the EFL written compositions. Six participants performed the oral production task and the written composition twice at a 1-year interval (Time 1 and Time 2).
Fifteen participants performed the oral production task and the written composition twice at a 2-year interval (Time 1 and Time 3).

L2 oral production task: L2 oral production was elicited with a video-retelling task, using as a prompt the 7-minute "Alone and Hungry" episode from a Charlie Chaplin film. The subjects watched the whole episode once, then watched the first part of the episode (approximately 3.5 minutes) and were asked to retell it. After that, the subjects watched the second part and retold it. The transcriptions correspond to the retelling of the first part.

L2 written composition: The written composition dealt with a familiar topic: "My past, present and future expectations". Students were given 15 minutes to write it.

The data are organized into 3 main folders, each containing 2 subfolders. The folder name gives the type of data:
1. EFL oral narratives_transcriptions: the transcriptions of the EFL oral narratives, in 2 subfolders:
   Narratives_transcriptions_Time 1_Time 2
   Narratives_transcriptions_Time 1_Time 3
2. EFL oral narratives_sound files: the sound files of the EFL oral narratives, in 2 subfolders:
   Narratives_sound files_Time 1_Time 2
   Narratives_sound files_Time 1_Time 3
3. EFL written compositions: the EFL written compositions, in 2 subfolders:
   Written compositions_Time 1_Time 2
   Written compositions_Time 1_Time 3

WHO ARE WE? Our research group (GRAL) consists of the following members. Unless otherwise indicated, all members are in the Department of English at the University of Barcelona.
Dr. Carme Muñoz (coordinator), munoz@ub.edu
Dr. M Luz Celaya, mluzcelaya@ub.edu
Dr. Júlia Baron, juliabaron@ub.edu
Dr. Natália Fullana (Language and Literature Education), nataliafullana@ub.edu
Dr. Roger Gilabert, rogergilabert@ub.edu
Dr. Mayya Levkina, mayya.levkina@ub.edu
Ms. Aleksandra Malicka, amalicka@ub.edu
Ms. Anna Marsol, amarsol@ub.edu
Dr. Immaculada Miralpeix, imiralpeix@ub.edu
Dr. Joan Carles Mora, mora@ub.edu
Dr. Teresa Navés, tnaves@ub.edu
Ms. Mireia Ortega, m.ortega@ub.edu
Dr. Laura Sánchez, laura.sanchez@ub.edu
Dr. Raquel Serrano, raquelserrano@ub.edu
Dr. Elsa Tragant, tragant@ub.edu
Collaborator: Dr. Ma Angels Llanes (Universitat de Lleida), allanes@dal.udl.cat
Research assistants: Ms. Marina Ruiz Tada, marinaruiztada@gmail.com; Ms. Olena Vasylets, vasylets@ub.edu

Articles that make use of these data should cite:
Muñoz, C. (ed.) (2006). Age and the Rate of Foreign Language Learning. Clevedon: Multilingual Matters.

2. BCN-L2 (Berber-Spanish, Arabic-Spanish)
Aurora Bel
ALLENCAM Research Group
Universitat Pompeu Fabra
Barcelona (Spain)
aurora.bel@pcf.edu

1. Project
The BCN-L2 Spanish Corpus was collected within a research project supported by two grants to Aurora Bel from the Ministry of Science and Innovation of the Spanish Government (FFI2009-09349 & FFI2012-35058). The project investigates phenomena at the syntax-pragmatics and syntax-morphology interfaces in the acquisition of new languages (mainly L2 Catalan and L2 Spanish) in educational contexts.

2. Elicitation task
The corpus consists of 228 spoken and written narrative texts gathered following the procedure designed within the international project Developing Literacy in Different Contexts and Different Languages (P.I.: R. Berman; Berman, 2008). Participants were shown a three-minute silent video of scenes of interpersonal conflict at school, and were then asked to tell and write in Spanish a similar story that had happened to a friend. Because the participants were asked to tell somebody else's story, they necessarily had to produce third-person referents, unlike in personal narratives.

3. Participants (origin, ages and language proficiency level)
Data collection was carried out during the spring of 2011 and 2012 in different secondary schools in the metropolitan area of Barcelona. Participants are 88 native speakers of Moroccan Arabic (Darija) and 26 speakers of Berber (Amazigh) living in Catalonia. For all the participants, Moroccan Arabic or Berber is the family language. In most cases their first contact with Spanish and Catalan (the two environmental languages) coincided with their entry into the Spanish school system (usually at preschool level). In general, they use the family language daily with their family and the environmental languages with friends (for a detailed description of language use patterns and language proficiency, see Bel & García-Alcaraz 2013).

Participants were grouped into four age ranges, as established by the Spanish compulsory secondary education system (Educación Secundaria Obligatoria, ESO). The correspondences between the different systems are shown in Table 1 below.

Table 1. Age ranges and grades
Age range   Spanish grade   US equivalent
12-13       1º ESO          7th grade
13-14       2º ESO          8th grade
14-15       3º ESO          9th grade
15-16       4º ESO          10th grade

Participants were also classified by level of proficiency in Spanish. We followed the criteria established by the CEFR (Common European Framework of Reference for Languages, 2001), which divides learners into three levels, each further divided into two sublevels, giving six sublevels in all:

Table 2. Levels of proficiency in Spanish
CEFR level            Sublevel                                           Proficiency code
A Basic User          A1 Breakthrough or beginner                        1
                      A2 Waystage or elementary                          2
B Independent User    B1 Threshold or intermediate                       3
                      B2 Vantage or upper intermediate                   4
C Proficient User     C1 Effective Operational Proficiency or advanced   5
                      C2 Mastery or proficiency                          6

4. Filenames and ID
All participants were assigned a code number to ensure confidentiality, and this number was used to identify the two files containing the transcriptions of their oral and written narratives.
The filenames use the following syntax:
- Subject number: from 01 to 156
- L1 language: dar stands for Darija; ber stands for Berber
- Age range: 1E, 2E, 3E, 4E, where E stands for ESO
- Text modality: o stands for spoken (oral), e stands for written

For example, a file named '10ber1Eo.cha' is an oral text produced by participant number 10, a native speaker of Berber in the 1st grade of ESO.

ID headers are arranged as follows:
@Participants: STU Target_Student, INV Investigator
@ID: spa|periferias_L2|STU|16;08.00|male|ber|26|Target_Student|4E|2|
@ID: spa|periferias_L2|INV|||||Investigator|||

The participants are introduced in the required @Participants header with the codes STU (for Student) and INV (for Investigator), together with their corresponding roles. The information in the @ID header for the target student is structured as follows: target language (spa = Spanish), project name (periferias_L2), participant code (STU), age, sex (male or female), participant's L1 (ber = Berber or ary = Moroccan Arabic), subject number code (as explained above), participant's role, grade in the Spanish school system (1E, 2E, 3E, 4E, as specified in Table 1) and level of proficiency in Spanish according to the CEFR (from 1 to 6, as specified in Table 2).

5. Some notes on transcription
All the collected texts (spoken and written) are orthographically transcribed following CHAT conventions and segmented into clauses, so that each tier contains one clause (Berman & Slobin 1994). All transcriptions were checked by a second transcriber to ensure reliability. Other important remarks concerning transcription are listed below:
- Proper names (people and institutions) are replaced by X, Y, W, etc.
- Accents and the Spanish letter 'ñ' are retained.
- Corrections of orthographic errors in written texts are given in square brackets, as in the following example:
Example 1
*STU: porque no decía nada respecto a esa situacion [: situación]
'because he didn't say anything about that situation'
- Word segmentation errors in written texts are marked as in the following examples:
es^condido (instead of 'escondido', hidden)
yasta [: ya está]
- Omissions are marked differently depending on the modality of production:
Spoken texts: le ha da(d)o 'he hit him'
Written texts: le ha pillao [: pillado] 'he caught him'
- Catalan words (from the other environmental language) are marked with @s, followed by the corresponding Spanish word in replacement notation:
taula@s [: mesa] ('taula' is the Catalan word for 'table')
- Enclitic pronouns, which are attached orthographically to the verb, are marked as follows:
dá+me+lo (give it to me)
quedar+se (to remain)
This does not affect proclitic pronouns, since they are conventionally written separately from the verb ('me lo da', he gives it to me).
- Punctuation marks that could conflict with the CHAT format, as well as typographic conventions typical of written texts, are identified with comments in square brackets, as in the following examples:
Example 2
*STU: yo no (h)ice nada.
'I didn't do anything'
*STU: y tampoco tenía intención [% punto].
'and I had no intention either [% period]'
Example 3
*STU: el [% e mayúscula] problema empezó.
'the [% e upper case] problem started'

6. Teamwork
Three research assistants (Júlia Perera, Mònica Tarrés and Estela García-Alcaraz, who also supervised the process) collected the data and transcribed the spoken and written texts. Transcription and assessment of language level were coordinated by Dr. Elisa Rosado. The authors of the corpus would appreciate being notified and receiving a copy, or a summary, of any work using the data of the corpus.
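The filename syntax and @ID layout in section 4, and the replacement conventions in section 5, are regular enough to decode mechanically. The Python sketch below is illustrative only: the function names are hypothetical, the regular expression assumes every file name has exactly the shape of the example '10ber1Eo.cha', the @ID field names are our own labels for the order described above, and target_text handles only the single-word [: ...] replacements, [% ...] comments, parenthesized omissions and ^ and + marks shown in the examples (not multi-word CHAT scopes).

```python
import re

# Codes from the filename key in section 4 (mapping labels are illustrative).
L1_CODES = {"dar": "Darija", "ber": "Berber"}
MODALITY = {"o": "spoken", "e": "written"}

# Assumed shape: subject number, L1 code, ESO grade (1E-4E), modality, ".cha".
FILENAME_RE = re.compile(r"^(\d+)(dar|ber)([1-4]E)([oe])\.cha$")


def parse_filename(name):
    """Split a file name such as '10ber1Eo.cha' into its four fields."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name!r}")
    subject, l1, grade, mode = m.groups()
    return {"subject": int(subject), "l1": L1_CODES[l1],
            "grade": grade, "modality": MODALITY[mode]}


# Hypothetical field labels, in the order described for the student's @ID header.
ID_FIELDS = ["language", "project", "code", "age", "sex",
             "l1", "subject", "role", "grade", "cefr_level"]


def parse_id_line(line):
    """Map an '@ID:' header line onto the ten named fields."""
    body = line.split(":", 1)[1].strip()        # text after '@ID:'
    values = body.split("|")[:len(ID_FIELDS)]   # drop the trailing empty field
    return dict(zip(ID_FIELDS, values))


def target_text(clause):
    """Rebuild the intended (target) form of a transcribed clause."""
    text = re.sub(r"\S+ \[: ([^\]]+)\]", r"\1", clause)  # error [: target] -> target
    text = re.sub(r"\s*\[%[^\]]*\]", "", text)            # drop [% ...] comments
    text = re.sub(r"\(([^)]*)\)", r"\1", text)            # da(d)o -> dado
    return text.replace("^", "").replace("+", "")         # es^condido, dá+me+lo


if __name__ == "__main__":
    print(parse_filename("10ber1Eo.cha"))
    print(parse_id_line("@ID: spa|periferias_L2|STU|16;08.00|male|ber|26|Target_Student|4E|2|"))
    print(target_text("le ha pillao [: pillado]"))
```

Resolving replacements before stripping comments and parentheses keeps each rule independent of the others, so the same function can be applied to both spoken and written clauses.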
For a full description of the data collection methods, codes and analyses followed in most of these studies, please consult the following works, which should also be cited in publications using these data:
Bel, A. & García-Alcaraz, E. (2013). Subjects in the L2 Spanish of Moroccan Arabic speakers: evidence from bilingual and second language learners. In T. Judy & S. Perpiñán (eds.), The Acquisition of Spanish as a Second Language: Data from Understudied Language Pairings. Amsterdam: John Benjamins.
Bel, A., García-Alcaraz, E. & Rosado, E. (forthcoming). Reference comprehension and production in L2 Spanish: the view from null-subject languages. Issues in Hispanic and Lusophone Linguistics. Amsterdam: John Benjamins.

3. Connolly (Japanese-English)
Steve Connolly
Hazawa, 2-12-11
Nerima-ku, Tokyo
Japan 176-0003
(03) 5999-5997
Connolly@inter.net

This project, entitled "Peer-to-peer discourse journal writing by Japanese Junior High School ERL Students", was submitted as a doctoral thesis. A peer-to-peer "secret" dialogue journal project, modeled on projects by Green and Green (1993) and Worthington (1997), was set up among 30 Japanese junior high school students at one public school and 15 students each at two other Tokyo public schools. The project spanned five terms, during which the students exchanged journals weekly in English with partners, who changed each term. Using names and school names was forbidden in order to maintain a sense of mystery, and to push the partners to learn as much as permissible about each other by communicating in English. The supervising teachers did not correct or respond to the entries; the researcher occasionally scanned them to check that only the L2 was used and that the content was appropriate. There were four entries by each partner in Terms 1, 4 and 5, and six each in Terms 2 and 3. The 60 secret journal participants were average public middle-school students from a Tokyo suburb.
They entered the seventh grade at around 12 years old, and were 12-13 at the beginning of the journal project. When the project ended, a year and a half later, the participants were in the ninth grade and were 14-15 years old. All three schools that participated in the project are average public schools in the same ward (county), and all three are in close proximity: schools N and T are approximately 2.75 km and 1.75 km, respectively, from school K. The enrollments of the schools varied. School T had only two classes of eighth graders, averaging over 37 students per class. School K had three classes averaging over 32 students per class, and school N had four classes averaging over 30 students per class. Because the schools are in the same ward, their curricula are uniform and are mandated by a combination of the Japanese national education authority (the Ministry of Education, or Mombusho) and the ward education committee. Mombusho provides general educational guidance, while the ward committee chooses textbooks and makes other day-to-day administrative decisions. The students attended three 50-minute English classes per week, taught by their Japanese English teachers and based largely on the grammar-translation approach. Depending on the students' year, 9-18 classes per year were team-taught by a Japanese English teacher and a native English speaker, in an effort to bring a more communicative approach to the classroom. The curriculum sequence was dictated by a textbook common to all of the middle schools in the ward. The purpose of the study was to investigate the pedagogical efficacy of peer-to-peer dialogue journals. In addition to the journals themselves, three sets of data were collected and analyzed: a free-writing quiz, a free-speaking quiz, and term-end surveys.
The journals themselves, the free-writing quiz and the free-speaking quiz were transcribed in CHAT format. In the third term, 290 eighth graders at all three schools took a surprise ten-minute free-writing quiz. A one-way MANOVA showed that the journal participants significantly outperformed the non-participants on measures of fluency, accuracy and syntactic complexity. In the fourth term, 96 eighth graders at one school took a surprise recorded three-minute free-speaking quiz. A one-way ANOVA showed that the journal participants significantly outperformed the non-participants on the measure of fluency. After each of the first four terms, the participants completed written surveys to gauge their attitudes toward their partners and the activity, and their perceptions of their own linguistic improvement. In general, the responses indicated that the participants enjoyed the project and felt that the journals contributed to gains in their writing and reading proficiency, and less so to their listening and speaking proficiency. They also felt that they occasionally learned something from their partners. After the project, the journals were analyzed with repeated-measures ANOVAs for trends over the five terms in total words and word types per entry, mean length of utterance (MLU), and common errors. Only the MLU showed no significant term-to-term changes over the five terms. The trends were generally down in Terms 2 and 3 (six entries apiece) and back up to Term 1 levels in Terms 4 and 5 (four entries apiece). The trends did not show marked improvement in any of the measures; however, the journal participants significantly outperformed the non-participants on both the free-writing and free-speaking quizzes. This type of activity is one that adolescents enjoy because of their desire to socialize, and doing it in English probably contributes greatly to linguistic improvement.
Furthermore, because teachers do not intervene at all, the workload on supervising teachers is minimal.

Green, C., & Green, J. M. (1993). Secret friend journals. TESOL Journal, 2(3), 20-23.
Worthington, L. (1997). Let's not show the teacher: EFL students' secret exchange journals. Forum, 35(3), 2-7.

4. CUHK (Chinese-English)
Brian MacWhinney
Department of Psychology
Carnegie Mellon University
Pittsburgh, PA 15213

These data were collected and transcribed by students in a class that Brian MacWhinney taught at the Chinese University of Hong Kong in the Spring semester of 2007. They track Chinese speakers of various ages learning English and, in one case, French.

5. DiazRodriguez (Spanish-Various)
Lourdes Diaz Rodriguez
lourdes.diaz@upf.edu

The DIAZ corpus contains oral L2 Spanish data, both semi-spontaneous and experimental, from adult learners with Indo-European and Asian first languages. The data were obtained in Barcelona, Spain, within a research project supervised by Dr. Lourdes Díaz Rodríguez (Universitat Pompeu Fabra, Spain) and funded by the Spanish Government. A parallel set of data was gathered in Ottawa, in an instructed foreign language setting (no immersion in the language), under the supervision of Prof. J.M. Liceras.
(a) Semi-spontaneous data were obtained through structured interviews conducted by a Spanish-speaking interviewer, mainly on the student's learning context and language contact profile.
(b) Experimental data came from structured questionnaires consisting of 1-2 picture description tasks eliciting vocabulary, DPs and verb inflection, and 1-3 sets of questions requiring the production of interrogative sentences, relative clauses, cleft clauses and repetitions.
Subjects' mother tongues were German, Swedish, Icelandic, Korean and Chinese. All data in this set were gathered in Barcelona among volunteer learners of L2/L3 Spanish. All were interviewed, with their consent, on school (EOI) and university (UPF) premises.
Their production was audio-recorded and later transcribed at the Universitat Pompeu Fabra, Spain. The research team that took part in the different phases of data gathering consisted of: P. Álvarez; K. Bekiou; A. Bel; M. Bini; A. Blanco; P. Deza; R. Fernández Fuertes; B. Laguardia; J. A. Redó; E. Rosado; G. Feliu; A. Ruggia and L. Díaz. The research reported was supported by grants from the Spanish Ministerio de Educación and Ministerio de Ciencia e Innovación to Dr. Lourdes Díaz Rodríguez from 1995-2000, namely: PB94-1096-C02-01; BFF2000-0928; HUM2006-10235.

6. Dresden (German-English/French/Czech)
Angelika Kubanek-German
ELL Saxony
University of Braunschweig
a.kubanek-german@tu-bs.de

The Early Language Learning ("Fruehes Fremdsprachenlernen") Project was funded by the Department of Education of Saxony in 2000. A foreign language (English, French or Czech) was offered for 4 hours per week to 8- and 9-year-olds, i.e. grades 3 and 4, instead of the then standard 1 hour per week. A study commissioned by the Department of Education and conducted by Angelika Kubanek-German investigated 12 classes (150 pupils) during the first two years of the program, from autumn 2000 to summer 2002. The overall research project (see preliminary report, Kubanek-German 2003) pursued three aims:
1. to assess the linguistic achievement of the children after 2 years of learning, comparing the intensive and standard subgroups as well as the different languages;
2. to gain a holistic picture of primary foreign language learning by focusing the research not only on the foreign language but also on less explored territory such as cultural awareness; and
3. as a sub-question, to investigate whether curricularly anchored notions of what a child can do in the foreign language class are justified, thus expanding on the notion of child-orientation (cf. Kubanek-German 2001).

Data in TalkBank are from 25-minute assessment interviews composed of three parts. Part 1 (warm-up) covered themes familiar to the children. Part 2 (water interview) involved questions based on an unfamiliar picture book on the theme of water. In Part 3 (rat search), students worked as a team to solve the "rathunt" puzzle. Children were interviewed in pairs, and the same tasks were used in all three languages by the same interviewer. For English, there were 20 boys and 18 girls; for French, 10 boys and 8 girls; for Czech, 16 boys and 16 girls. Data were collected in Chemnitz, Radebeul, Dresden and Leipzig.

The English teacher set high objectives in the linguistic domain. Her pedagogical style was rather teacher-centred, and she used immediate correction. Her attitude towards the research project changed most clearly for the positive. For her class, there was no catchment area restriction. Her pupils did very well in the communication test. She was a trained primary teacher and had taught Russian at primary level. After 1990, retraining in English, including language training in Britain, was offered to teachers of Russian.

The French teacher had spent some time in France teaching German as a foreign language. At the inception of the intensive programme there was a fear that French would not meet with acceptance on the part of the parents (in contrast to English). However, after one year, the whole school where she was employed successfully started offering only intensive French (i.e. in both grade 3 classes); the programme is non-selective. This teacher supported the less fluent teachers of French in the project. Her approach is holistic; she uses a lot of body language. She took the 4th graders to Brittany (classe de mer), a long way from Dresden. "It is just fascinating to see how much they can do" is the statement that best characterises her attitude.
The Czech teacher is a native speaker trained to teach at grammar school level, but he had been teaching at primary level before the pilot project began. He taught grammar more explicitly and was concerned about pronunciation, which he explained by stressing the difficulties of the Czech language. It should be noted, though, that he, like the others, did many songs, dances, and rhymes with the class.

7. ESF (Arabic/Finnish/Italian/Punjabi/Spanish/Turkish-Dutch/English/French/German/Swedish)
Wolfgang Klein
Clive Perdue
Max Planck Institut
Nijmegen, Netherlands
klein@mpi.nl

The ESF (European Science Foundation Second Language) Database is a computerized archive of data collected by the research groups of the ESF project in five European countries: France, Germany, Great Britain, the Netherlands and Sweden. The project concentrates on the spontaneous second language acquisition of forty adult immigrant workers living in Western Europe and on their communication with native speakers in the respective host countries. The target languages are Dutch, English, French, German and Swedish. For each target language, two source languages were selected. The corpora are:
- Dutch L2 and Arabic L1
- Dutch L2 and Turkish L1
- English L2 and Punjabi L1
- English L2 and Italian L1
- French L2 and Arabic L1
- French L2 and Spanish L1
- German L2 and Italian L1
- German L2 and Turkish L1
- Swedish L2 and Finnish L1
- Swedish L2 and Spanish L1
The Dutch, English, and French L2 transcripts have accompanying audio; the German and Swedish L2 transcripts do not. Biographical information about the informants is currently in the bios.zip file.
A filename like lsfbe24a.1.cha indicates:
l    the subject is from the longitudinal group
s    the source language is Spanish
f    the target language is French
be   the informant's name is Berta
2    the session took place in the 2nd data collection cycle
4    it was the 4th encounter in that cycle
a    the activity transcribed is a free conversation (activity code A)
1    it is the 1st conversation in the encounter

Publications that use this corpus should cite:
Perdue, C. (ed.) (1993). Adult Language Acquisition. Vol. 1: Field Methods. Cambridge University Press.

8. FLLOC (English-French)
Florence Myles
Modern Languages
School of Humanities
University of Southampton
Southampton SO17 1BJ
England
e-mail: fjm@soton.ac.uk

Linguistic Development in Classroom Learners of French: A Cross-sectional Study
This directory contains sound files and corresponding transcripts from an ESRC-funded one-year project which ran from October 2001 to September 2002 (ESRC grant R000234754). One of its aims was to provide a database of learner language for Years 9, 10 and 11 of secondary education in the UK context. The Project Director was Florence Myles, and the other team members were Emma Marsden, Rosamond Mitchell and Sarah Rule.

Three groups of twenty learners, one in each of Years 9, 10 and 11 (i.e. in their 3rd, 4th and 5th years of learning French in the UK educational context, aged 13-14, 14-15 and 15-16 respectively), were tested in a local secondary school. A gender-balanced sample from the three year groups, containing pupils from across the ability range as judged by the teachers and the pupils' school grades, was used in the study. The sample is, however, slightly biased towards the top-ability pupils, as they are more likely to show signs of further development. The participants were numbered 1-20 for each year group.
However, as this was a short-term cross-sectional study, if a cohort pupil was absent, a replacement pupil carried out the task; these replacement pupils were given random numbers between 60 and 90. This ensured that the number of pupils in each year carrying out a particular task was always 20. In selecting and involving informants in the research, the project followed the Recommendations on Good Practice in Applied Linguistics of the British Association for Applied Linguistics (1994) on researchers' responsibility to respect the privacy of participants, ensure the confidentiality of personal details, and maintain openness about the goals of the research.

Four oral tasks were administered to all 60 subjects, on a one-to-one basis with a researcher. The tasks were the same for all years, to enable a comparison of results. Moreover, some of the tasks were the same as those used in the 'Progression Project', so that comparisons could be drawn. The tasks were as follows:

Cartoon story (Loch Ness Monster): learners tell a story on the basis of a series of cartoon pictures. This task was developed and used in the Progression Project. It also provides valuable information on learners' developing discourse-level skills.
Task code: L

Interrogative elicitation task: this is an information-gap activity in which the subjects have to find out missing information from the researcher in order to reconstruct a drawing. The main purpose of this task is to elicit interrogative constructions and pronominal reference, as well as gender marking. This task was also developed and used in the Progression Project.
Task code: I

Photos task (one-to-one interview with a researcher): this is a directed conversation in which the subject has to respond to a number of questions, as well as ask questions based on photographs brought by the researcher.
The main purpose of this task is to elicit a wide range of structures, with a particular focus on verbal morphology (past tense, future). A version of this task was used in the Progression Project, although we modified it to ensure elicitation of a range of temporal reference (as we were dealing with more advanced learners).
Task code: P

Negative elicitation task: learners have to describe a famous person by saying what they do and do not do (following picture cues), and the researcher has to guess who the famous person is on the basis of the learner's description and a series of possible celebrities.
Task code: N

Recording
All tasks were recorded digitally and took around 15 minutes each, in a one-to-one situation with a researcher, making a total of around one hour of spoken language per pupil.

Additional Conventions
In this section, we describe some of the general decisions we have taken in transcribing French interlanguage oral data, as well as some of the adaptations we have made to the CHILDES system for L2 data. As will become obvious, many of the decisions were dictated by our research agenda in both the Linguistic Development and the Progression projects, and by our choice to use the automatic morphosyntactic parser. Although this means that the transcription sometimes deviates from the actual phonological shape of the words produced by learners, we felt this is not too much of a problem, as other researchers interested in, e.g., phonology can listen to the sound files as they read the transcripts and add their own level of coding.

The data have been transcribed orthographically. This is necessary in order to run the French morphosyntactic parser on the completed transcripts, as it will not recognise non-words. There is no extensive coding of errors, and overlaps are not marked, since they can be heard in the sound files.
Learner utterances have been carefully segmented into distinct utterances, but this has not been done for the researcher. If a participant exactly repeats the researcher (or another participant in the case of pair tasks), the repetition has been coded as follows:
*32N: [^ eng: how do you say he goes]
*ADR: il va
*32N: il@g va@g au cinema
@g is added after every repeated word, and it has been added to the special form marker file (sf.cut) in the French MOR program. It ensures that the imitation is not included in the analysis by the French morphosyntactic parser, as this could give misleading information about the learner's current grammar.

In order for the French MOR programme to ignore English, we coded whole English utterances as follows:
*SAR: [^ eng: yes you begin by asking questions]
*43P: [^ eng: how do you say dog?]

Use of a single English word to complete a French phrase
If an English word has been used to complete a French phrase, we have coded the word as follows:
Noun          @s:d
Adjective     @s:a
Adverb        @s:adv
Preposition   @s:pre
Verb          @s:v
Pronoun       @s:pro
Determiner    @s:det
Conjunction   @s:con
For example:
*28L: il achete le skirt@s:d
These forms are then analysed by the morphosyntactic parser as 'English N', 'English V', 'English A', etc., rather than being ignored, which would produce outputs that do not correspond to the learner's grammar (e.g. in this example, suggesting that the learner's grammar allows a determiner to be followed by nothing, as the parser would not recognise 'skirt'). These special form markers have been added to the sf.cut file in MOR and also to the depfile in CLAN (so that the files pass CHECK).

Indeterminate forms
In beginner datasets, it is often difficult to determine which form a learner intended, as learners often produce something very approximate.
There are four kinds of indeterminate forms that occur consistently in our data, and we coded them as follows:
- definite articles which sound like something between le and la: le@n
- indefinite articles which sound like something between un and une: un@n
- a first person subject pronoun which sounds like something between je and j'ai: je@n
- a verb form which sounds like something between a and est: a@n
These forms have been added to the neo.cut file (see below) and are analysed by the parser as, e.g., definite article, without specifying the gender.

Neologistic verb endings
Our learners also used neologistic verb forms, which were usually non-finite. Each of these new forms is written on the main tier and then added to a neo.cut file, which was created for this purpose and saved as part of the MOR lexicon. For example, given the lexicon entry
pren {[scat neo:v:inf]} "prendre"
the form is transcribed as pren on the main tier and analysed by the parser as neo:v:inf (neologism:verb:infinitive).
We have also added a number of words, particularly nouns, to the MOR lexicon. For example, we added le shopping, le jogging, le badminton, and le t_shirt, so that they can be recognised and therefore tagged by the parser.

Additionally, the following project-specific conventions were used to code 'intended tense' in the Photos task. In this task, each photo set was designed to elicit a dialogue in the present, past or future, by referring to holidays just gone (Christmas), forthcoming holidays (summer), and hobbies (present). We have therefore coded the data for intended tense. For example, in the following sentence, we wanted to record that the infinitive form 'aller' was produced in a context where a future form would be expected:
*84P: l'ete prochain je aller Marjorca .
This is transcribed as follows:
*84P: l'ete prochain je aller@f Marjorca .
where the following tags have been added to the sf.cut file in MOR:
@p {[scat inf:pres]}     for contexts where a present form would be expected
@f {[scat inf:future]}   for contexts where a future form would be expected
@c {[scat inf:past]}     for contexts where a past form would be expected
This enables the morphosyntactic parser to analyse these forms as, e.g., v:inf:future|aller, and therefore to retrieve them easily for analysis.

Directories
Interrogatives Year 9
Interrogatives Year 10
Interrogatives Year 11
Loch Ness Year 9
Loch Ness Year 10
Loch Ness Year 11
Negatives Year 9
Negatives Year 10
Negatives Year 11
Photos Year 9
Photos Year 10
Photos Year 11
All the files in each directory have a corresponding MOR file in the appropriate directory.

We would like to acknowledge Christophe Parisse's expert guidance in making some of these adaptations to the French MOR programme.

The files are labelled in the following way:
Sound files: 01L9SAR.wav
Transcriptions: 01L9SAR.cha
(01 is the number of the student, L is the task code, 9 is the student's year, and SAR is the abbreviation for the researcher.)

Publications using these data should cite:
Myles, F. (2002). Full Report of Research Activities and Results: Linguistic Development in Classroom Learners of French. www.regard.ac.uk/research_findings/R000223421/report.pdf

9. Køge (Turkish-Danish)
Jens Normann Jørgensen
University of Copenhagen
Copenhagen, DK
normann@hum.ku.dk

These data were collected from adolescent Turkish-Danish bilinguals in the town of Køge near Copenhagen. The data include interviews in Danish and Turkish and group discussions in both Danish and Turkish. There are audio files, but they are not yet available to TalkBank.

10. Langman (Chinese-Hungarian)
Dr. Juliet Langman
Division of Bicultural-Bilingual Studies
University of Texas at San Antonio
6900 North Loop 1604 West
San Antonio, TX 78249
jlangman@lonestar.utsa.edu

This corpus is made up of 10 files consisting of interviews conducted in 1994 with 11 Chinese immigrants living in Hungary. The bulk of the conversation is in Hungarian, although with participants who speak English there is also some English, and one transcript (KIN10) contains significant amounts of Chinese (with a Hungarian translation on a %tra dependent tier). Interviews focused on issues related to the participants' arrival in Hungary as well as their daily life activities. With the exception of the participants in KIN2 and KIN10, none had had formal training in Hungarian. The interviewers were the researcher and three different Hungarian undergraduates. Data were collected with two purposes in mind: the analysis of communication strategies among adult second-language learners acquiring the language in an unstructured environment, and the analysis of the acquisition of morphology in an agglutinative language.

The following additional form markers have been used on the (*) speaker lines of the transcripts:
@e = English word, e.g., go@e
@c = Chinese word, e.g., xie@c
@a = adult-invented word, e.g., pigyilni@a

The following special codes have been used on the %lan tier:
$MIX   utterance with some form of code-switching or borrowing
$CHI   utterance in Chinese (used only in KIN10)

The following special codes have been used on the %rep (repetition) tier to identify:
1. whose speech is repeated
SRP   self-repetition of the immediately previous utterance
ORP   other-repetition of the immediately previous utterance
SRE   self-repetition of an utterance not immediately preceding
ORE   other-repetition of an utterance not immediately preceding
2. the function of the repetition
MIS   misunderstanding, prompting, asking for clarification
VAL   validation, repetition of the previous utterance
EXP   explanation to ease understanding
COR   correction and language learning functions
3. the form of the repetition
PAR   partial
COM   exact
TRA   translation
PLU   repetition including additional information
These three types of codes can be combined, as in:
%rep: SRP:MIS:PAR

Error coding focused exclusively on morphology and is represented on two separate tiers, %err and %mor. The %mor tier shows the actual target form for each error marked. The %err tier marks the types of errors using the following codes:
$OMI       omission
$OMI:PAR   partial omission
$INS       insertion
$INS:PAR   partial insertion
$SWI       switched form
$SWI:PAR   partially switched form

Partial support for data collection and analysis was provided through a grant awarded to Dr. Csaba Pléh, OTKA grant T018173, A magyar morfológia pszicholingvisztikai vizsgálata (The psycholinguistic study of Hungarian morphology).

Publications using these data should cite:
Langman, J. (1998). “Aha” as communication strategy: Chinese speakers of Hungarian. In V. Regan (ed.), Contemporary Approaches to Second-language Acquisition in Social Context: Crosslinguistic Perspectives. Dublin: University College Dublin Press, 32-45.
Langman, J. (1997). Analyzing second-language learners’ communication strategies: Chinese speakers of Hungarian. Acta Linguistica Hungarica, 44, 277–299.
Langman, J. (1995-1996). The role of code-switching in achieving understanding: Chinese speakers of Hungarian. Acta Linguistica Hungarica, 43, 323–344.

11. Liceras
Juana Liceras
Department of Modern Languages
University of Ottawa
jliceras@uottawa.ca

Participant   Language           Sex
Josiane       F                  female
LucAndre      F                  male
Nicholas      E                  male
NicholasM     E                  male
Tristan       F                  male
ClaireH       E                  female
ClaireP       F                  female
Falco         E                  male
Ginger        E                  female
Joanna        E (also Polish)    female
Phillippe     F                  male

Task codes:
form       question formation
narr       narratives
cont       answering questions
pers       personal questions
rep        repetitions
comp       sentence completion
comppreg   question completion
compre     comprehension
role       role-playing

12. Paradis

13. PAROLE (Various-English, Various-French)
The PAROLE corpus (PARallèle Oral en Langue Etrangère) was compiled by members of the Langages research team (Laboratoire LLS) at the Université de Savoie (Chambéry, France) to investigate the characteristics of different L2 proficiency levels. A distinctive feature of the corpus is our attempt to incorporate temporal elements of spoken production into the main transcription line, along with more classic coding of errors and retracings.

PAROLE is composed of oral productions by 68 young adult learners of three foreign languages (English, French, Italian), as well as a benchmark corpus of productions by 27 native speakers performing the same tasks. Transcripts and recordings of three tasks (two summaries of a video clip, made immediately after viewing, and a short autobiographical narrative) will constitute the PAROLE corpus. Task details are provided in the PAROLE Manual (PAROLE_documents folder). In addition to the speaking tasks, all the non-native subjects completed a battery of tests and questionnaires, furnishing complementary data on their L2 knowledge, experience, motivation for L2 study, and two aspects of language-learning aptitude (nonword repetition and morphosyntactic analysis). Test results for the learner subjects are available in the subject_data file (PAROLE_documents folder), and references for the tests used are provided in the PAROLE Manual (same folder).
PDF files of the subject profile and motivation questionnaires used (English L2 subjects) are also included in the documents folder.

PAROLE was funded through a global research grant given to the Laboratoire LLS by the French Ministère de l'Education Nationale, as part of the contrats quadriennaux (four-year contracts) between the Ministry and the Université de Savoie for 2003-2006 and 2007-2010. The Ministry also provided funds for two doctoral students working on the corpus. We began pre-testing production triggers and assembling test materials in 2003; most of the French L2 and English L2 subjects were recorded in 2005 and the native speakers in 2006, and transcription work began in earnest in 2006. Due to illness and a shortage of personnel, the Italian recordings and transcriptions are lagging behind English and French; the first wave of Italian files should be available online by the end of 2008 (and we apologize for this frustrating delay).

We have attempted to adhere to CHAT conventions as closely as possible; the major innovations concern the scoped timing of "hesitation groups" (unbroken sequences of hesitation phenomena, such as silent pauses, filled pauses, and certain paralinguistic noises). We have also distinguished between words produced in the learners' L1 (coded with the new suffix "@l1") and words produced in another foreign language (coded "@s"). See the PAROLE Manual (PAROLE_documents folder) for detailed descriptions of our use of CHAT coding symbols, occasional additions to the code base, our criteria for utterance delimitation, error coding, etc.

Participants in the learner corpus (54 females, 14 males): 33 learners of English (24 French-L1, 9 German-L1; average age 21); 12 learners of French (5 Spanish-L1, 3 Chinese-L1, 2 Swedish-L1, 1 Polish-L1, 1 English-L1; average age 23); 23 learners of Italian (all French-L1; average age 19).
Participants in the native-speaker corpus (20 females, 7 males): 9 English-L1 (average age 21); 8 French-L1 (average age 22); 10 Italian-L1 (average age 23). All participants were enrolled in a French or Italian university (either in a regular or a study-abroad program) at the time of recording. See the subject_data file (PAROLE_documents folder) for detailed information on each participant.

The corpus consists of audio files (.wav format) and transcripts for each participant performing two short video summary tasks ("task A", "task C") and one short autobiographical narrative ("task E"; online publication planned for late 2008). Sound files and transcripts are segmented according to task. All transcripts have been carefully linked to the digital sound files with bullets in Sonic Mode. We recommend that researchers wishing to work with PAROLE keep sound files and transcripts in the same folder, for optimal comparison between the transcripts and the productions. Carefully disambiguated tagged files are stored together in a special folder for each language.

Key to file names (three-digit numbers refer to each subject):
L2 English learners: 0
L2 Italian learners: 2
L2 French learners: 4
British and NZ English: N0
North American English: N1
Italian native speakers: N2
French native speakers: N4
The single letter (a, c, or e) following the subject number indicates which task is involved: file "010a.cha" is the CHAT transcript for English learner 010 performing task A (first video description); file "010a.wav" is the sound file corresponding to this transcript and task; file "010a.pst.cex" is the tagged transcript.

All recordings took place in a small, closed classroom or office, without distractions or interruptions. The video support materials ("triggers") were presented on a portable computer, integrated into .html pages that the subject manipulated directly.
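The file-name key above is regular enough to decode mechanically. The Python sketch below is one illustrative way to do it, not a tool distributed with PAROLE. It assumes that the group code is the first digit of the three-digit subject number and that native-speaker files put an "N" before that number (for example a hypothetical N012a.cha); the guide only shows learner examples, so both points should be checked against the actual files. The names parse_parole_name and ParoleFile are hypothetical.

```python
import re
from typing import NamedTuple

# Assumption: native-speaker files prefix "N" to the three-digit subject number
# (e.g. a hypothetical "N012a.cha"); the guide only shows learner examples.
FILE_RE = re.compile(r"^(N?)(\d{3})([ace])\.(cha|wav|pst\.cex)$")

# (prefix, first digit of subject number) -> speaker group, following the key
GROUPS = {
    ("", "0"): "L2 English learner",
    ("", "2"): "L2 Italian learner",
    ("", "4"): "L2 French learner",
    ("N", "0"): "British/NZ English native speaker",
    ("N", "1"): "North American English native speaker",
    ("N", "2"): "Italian native speaker",
    ("N", "4"): "French native speaker",
}
TASKS = {"a": "task A", "c": "task C", "e": "task E"}
KINDS = {"cha": "CHAT transcript", "wav": "sound file", "pst.cex": "tagged transcript"}


class ParoleFile(NamedTuple):
    subject: str
    group: str
    task: str
    kind: str


def parse_parole_name(name: str) -> ParoleFile:
    """Split a PAROLE file name such as '010a.cha' into its coded fields."""
    m = FILE_RE.match(name)
    if m is None:
        raise ValueError(f"not a PAROLE file name: {name!r}")
    prefix, number, task, ext = m.groups()
    group = GROUPS.get((prefix, number[0]))
    if group is None:
        raise ValueError(f"unknown speaker group in {name!r}")
    return ParoleFile(prefix + number, group, TASKS[task], KINDS[ext])


if __name__ == "__main__":
    # ParoleFile(subject='010', group='L2 English learner', task='task A', kind='CHAT transcript')
    print(parse_parole_name("010a.cha"))
```

Decoding the names into labelled fields makes it straightforward to group files by speaker population or task, for instance when pairing learner and native-speaker versions of task A.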
See the PAROLE Manual for details of interview structure, video presentation, interviewer behavior, recording equipment, etc.

HILTON, H. E. (forthcoming, 2008). Connaissances, procédures et productions orales en L2. AILE.
HILTON, H. E. (forthcoming, 2008). The link between vocabulary knowledge and spoken L2 fluency. Language Learning Journal.
OSBORNE, J. (2007). Investigating L2 fluency through oral learner corpora. In M. C. Campoy & M. J. Luzón (eds.), Spoken Corpora in Applied Linguistics. Frankfurt: Peter Lang, 181-197.
OSBORNE, J. & RUTIGLIANO, S. (2007). Constitution d’un corpus multilingue d’apprenants d’une L2 : recueil et exploitation des données. In H. Hilton (ed.), Acquisition et didactique, Actes de l’atelier didactique, AFLS 2005. Chambéry: LLS, Collection Langages, 141-156.

14. Qatar
Yun Zhao
Department of Modern Languages
Carnegie Mellon University

This is a corpus of spoken interviews with Qatari learners of English, contributed by Yun Zhao.

Name           Grade  Nationality  Gender  Reading skills  Language usage  Average English
Sam            12     Qatari       Male    39.61           65.76           52.685
Abe            12     Qatari       Male    62.57           73.67           68.12
Charles        11     Qatari       Male    61.12           80.31           70.715
Tom            12     Qatari       Male    44.35           39.31           41.83
Larry          12     Qatari       Male    93.66           89.91           91.785
Ali (missing)  12     Qatari       Male
Bill           12     Qatari       Male    47.66           52.76           50.21
Harry          12     Jordanian    Male    75.32           99              87.16
Arnold         12     Qatari       Male    Missing         Missing         Missing
Jenny          12     Qatari       Female  77.63           97              87.315
Nancy          12     Qatari       Female  96.81           99              97.905
Lucy           12     Qatari       Female  99              97.2            98.1
Anne           12     Qatari       Female  94.54           87.75           91.145
Alice          11     Qatari       Female  78.46           98.8            88.63
Paula          11     Qatari       Female  53.06           57.4            55.23
Pat            12     Qatari       Female  53.37           84.62           68.995
Tina           12     Qatari       Female  79.65           89.32           84.485
Linda          11     Qatari       Female  66.95           90.51           78.73
Donna          11     Kuwaiti      Female  71.32           91.1            81.21

15. Reading (English-French)
Brian Richards
Dept. of Arts and Humanities in Education
University of Reading
Bulmershe Court
Earley, Reading RG6 1HY
United Kingdom
B.J.Richards@reading.ac.uk

These data on French foreign language oral interviews were transcribed as part of a study of the reliability and validity of oral assessment in modern foreign languages in the General Certificate of Secondary Education (GCSE). The GCSE is a public examination normally taken by school children in the United Kingdom at the age of 16, i.e. after 11 years of compulsory schooling. The 34 interviews constitute one part of the French oral examination: the so-called “free conversation.” Here, the French teacher interviews students about everyday topics such as school, home, family, holidays, future aspirations, hobbies, and interests. Other parts of the oral examination, such as role-plays, are not part of these data. The title of the project was “Oral Assessment in Modern Languages Project”, funded by the Research Endowment Trust Fund of the University of Reading. Our analyses have compared lexical and grammatical features of the children’s language with teachers’ expectations of foreign language learners of this age, and with the language of French native speakers in a similar interview setting (Chambers & Richards, 1995). We have also compared teachers’ impressionistic assessments of the presence of qualities specified in the assessment criteria with our own objective counts using the CLAN software (Richards & Chambers, 1996). We are currently looking at teacher-student interaction, focusing on the teachers’ accommodation strategies.

The Interviews
Teachers conduct the oral examinations, including the interviews, on set dates and on topics determined by the official examination board. Only one teacher and one student are present during each interview, and the audio recording is made by the teacher.
The teacher enters assessments on a mark sheet during the interview, and on completion of the examination the tapes and mark sheets are sent to the examination board. A sample of tapes is re-marked by a moderator appointed by the examination board, and the teachers’ assessments are adjusted if necessary. The average length of the interviews is 5 minutes 30 seconds; they range from 3 to 12 minutes.

Participants
All 34 participants come from the same all-ability secondary school (an 11-18 comprehensive school) in an English-speaking area of South Wales. They are 16 years old and are native speakers of English who have been learning French for 5 years. All have also spent at least one year learning Welsh, and some have had the opportunity to learn German. The school is situated in a predominantly working-class area, but the students selected here cover a wide range of social backgrounds. It should be noted that students with the weakest performance in French were excluded from this sample because the focus of our study was the Higher Level examination. This part of the examination, which is taken in addition to Basic Level, gives students access to the highest grades. Students in the sample obtained pass grades ranging from Grade A (the highest) to Grade E; no students with Grades F and G were included.

Two teachers, one female and one male, conducted the interviews. Neither is a native speaker of French; both are native speakers of British English who have learned French as a foreign language and have a degree in Modern Languages. As a condition of using the school’s tapes, we promised that the identity of the school, teachers, and students would not be revealed. We have therefore used pseudonyms for these. In addition, we have changed the names of all locations mentioned on the tapes, as well as the names of sports teams and exchange schools in France and Germany.
Francine Chambers, a native speaker of French, transcribed the recordings and subsequently checked the transcripts edited and coded by Brian Richards. Fiona Richards did the final checking. The following points should be noted:
1. In transcribing French, we have followed the CHILDES manual (sections 4.5.14 and 27.4.1) in dealing with apostrophes and hyphens: apostrophes are followed by a space (l’ aim, c’ est); hyphens in compounds are replaced by a plus sign (le week+end); and dashes between words (est-ce que) are replaced by spaces (est ce que).
2. It is difficult to draw a line between an English accent and a pronunciation error. Because one assessment criterion of the GCSE examination is whether an utterance would be comprehensible to a “sympathetic native speaker,” only those student errors that were serious enough to cause a breakdown of communication, or which were followed by a teacher correction, were coded. These were transcribed in UNIBET on the %err tier.
3. Some students answer questions in English or insert English words. Where the whole utterance is in English, a separate speaker tier for the student (*STE) has been created. English words inserted in French are marked with the @e suffix (father@e). Students who are also learning German sometimes use German words; these are marked with the @g suffix. Both the @e and @g symbols are contained in the 00DEPADD file.
4. Other additions to the 00DEPADD file are +//? (self-interruption of a question) and +..? (question tailing off).
5. Acknowledgment tokens have been coded as back channels and are marked [+ bch]. These can be excluded from MLU and MLT counts using the -s"[+ bch]" switch.
6. The exclamations and interactional markers used are “aah,” “euh,” “mm,” and “um.” To omit these from analyses, they can be placed in an exclude file.
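The spelling conventions in point 1 are mechanical enough to sketch in code. The Python function below is a minimal illustration under stated assumptions, not the procedure the transcribers actually used: it treats whitespace as the token boundary and relies on a hypothetical compound list (COMPOUNDS) to decide whether a hyphen joins a compound (le week+end) or separates words (est ce que), since the spelling alone cannot tell the two apart. The function name normalize_french is likewise hypothetical.

```python
import re

# Hypothetical compound list: the guide gives only "week-end" as an example of
# a hyphenated compound; a real pass would need a fuller lexicon.
COMPOUNDS = {"week-end"}

# An apostrophe (straight or typographic) directly followed by a non-space character.
APOSTROPHE = re.compile(r"(['’])(?=\S)")


def normalize_french(line: str) -> str:
    """Apply the apostrophe/hyphen conventions to one whitespace-tokenized line."""
    # Rule 1: apostrophes are followed by a space, e.g. c'est -> c' est
    line = APOSTROPHE.sub(r"\1 ", line)
    tokens = []
    for token in line.split():
        if "-" in token:
            if token.lower() in COMPOUNDS:
                # Rule 2: hyphen inside a compound -> plus sign, e.g. week+end
                token = token.replace("-", "+")
            else:
                # Rule 3: dash between words -> space, e.g. est ce que
                token = token.replace("-", " ")
        tokens.append(token)
    return " ".join(tokens)


print(normalize_french("c'est le week-end"))    # c' est le week+end
print(normalize_french("est-ce que tu aimes"))  # est ce que tu aimes
```

Words with an internal apostrophe, such as aujourd'hui, would also be split by this sketch, so a fuller version would probably need an exception list alongside the compound list.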
List of Files
In the table below, the fourth column shows the combined total of points obtained by each student in the GCSE examination tests of Speaking, Listening, Reading, and Writing. A maximum of 7 points is awarded for each of these 4 skills, giving a possible total of 28 points. The fifth column shows the score for the whole oral test, including the interview and role-plays.

Table 1: Recordings and GCSE Scores
File number  Sex     Teacher sex  GCSE points  Oral test
W01.cha      male    male         19           4
W02.cha      male    male         17           3
W03.cha      female  female       16           2
W04.cha      female  female       11           3
W05.cha      female  male         16           4
W06.cha      male    male         19           4
W07.cha      male    male         18           4
W08.cha      female  male         22           5
W09.cha      male    male         15           4
W10.cha      female  female       14           3
W11.cha      female  male         20           4
W12.cha      male    male         17           4
W13.cha      male    female       12           2
W14.cha      male    male         12           3
W15.cha      male    male         19           4
W16.cha      female  female       11           2
W17.cha      male    female       16           4
W18.cha      female  male         23           6
W19.cha      male    male         23           6
W20.cha      male    female       12           2
W21.cha      male    male         19           5
W22.cha      female  female       10           2
W23.cha      female  male         20           4
W24.cha      male    male         17           4
W25.cha      female  male         21           5
W26.cha      female  male         14           4
W27.cha      female  male         21           5
W28.cha      female  male         21           5
W29.cha      male    male         21           4
W30.cha      female  female       16           3
W31.cha      male    male         24           6
W32.cha      female  male         25           6
W33.cha      male    male         8            7
W34.cha      female  male         26           6

Publications using these data should cite:
Chambers, F., & Richards, B. J. (1995). The “free conversation” and the assessment of oral proficiency. Language Learning Journal, 11, 6–10.

16. SPLLOC (English-Spanish)
Laura Dominguez
University of Southampton

SPLLOC is a corpus of L2 Spanish collected by a team of researchers at the universities of Southampton, Newcastle, and York, funded by an ESRC research grant (2006-2008). The data are freely available in anonymised form through the project website (www.splloc.soton.ac.uk) for use by other second language acquisition researchers.
The L2 oral Spanish data have been collected from classroom learners in schools and universities in England, using a series of specially designed elicitation tasks, including storytelling, picture description, discussion and individual interview. There were 20 learners at each of 3 levels: beginners (Year 9 students aged 13-14), intermediate learners (A2 students aged 17-18), and fourth-year undergraduates. All of them were native speakers of English. Depending on their level, each learner was audio-recorded undertaking between 3 and 5 oral tasks. They also completed computer-based and paper-based tasks that provided complementary data on aspects of their Spanish knowledge. For comparison purposes, small numbers of native speakers were also recorded undertaking the same tasks. The resulting database contains 290 digital sound files (240 learner recordings and 50 native speaker recordings), accompanied by transcripts in CHILDES format. Some files also have an extra layer of tagging that identifies parts of speech.

17. TCD (English-French)
Seán Devitt, F.T.C.D.
Senior Lecturer in Education
School of Education
University of Dublin, Trinity College
Dublin 2, Ireland
sdevitt@tcd.ie

This project, designed and implemented by Seán Devitt, School of Education, Trinity College, Dublin, set out to track the development of the means of expressing temporality by children learning French as a second language in France. The subjects were five children aged between eight and twelve, of three different nationalities (Irish, Polish, and Cambodian), who were in primary school in Paris in the early part of 1982. The data presented here were gathered over a five-month period from March 31 to September 6, 1982, during a sabbatical term and a summer holiday.
The five-month stay of the researcher and his family in France was funded by two grants: one from the National Board for Science and Technology (now Enterprise Ireland) and one from the Ministère des Affaires Étrangères of the French Government, organized through the Service Culturel de l'Ambassade de France in Dublin. The French Government, through the Ministère de l'Éducation Nationale, provided further support by arranging for the researcher's three children to attend school in Paris. The Ministère placed Marie and Ann at the Ecole rue de la Plaine in the 20th arrondissement of Paris. [Their older brother, Séamus, was admitted to the nearby Lycée Hélène Boucher.] The Ministère also helped to locate the other three subjects in nearby schools. The two Irish subjects, Marie and Ann, were aged 11 and 8. They were the researcher's daughters and had been to France twice before 1982 on holiday. On one of these occasions (July-August 1980) they had spent three of the five weeks of their holiday in a Centre Aéré, a type of holiday camp described below. Neither had studied French at school, and apart from these visits their exposure to French had been minimal. Their stay in France was planned to last five months, after which they were to return to Ireland. On March 31 the family (parents and three children) arrived in Paris to find that the apartment they had booked was quite unsatisfactory. Ten days were spent looking for suitable accommodation. A small apartment was eventually found but did not become available until April 23. The intervening two weeks were spent with English-speaking friends in Hermonville, a village some 9 kilometers from Reims. Marie and Ann were allowed to attend the village school for one of these weeks; the other week coincided with the Easter holidays. The language spoken at home was normally English, so during the first three months contact with French was confined mainly to school.
However, television in the evenings and at weekends, visits to friends, and visits from friends to the apartment provided further opportunities for contact. There was also one longer visit of three days (without their parents) to friends in Reims. The third subject, PPM, was a twelve-year-old Polish boy and an only child. His father had come to France in 1978 to find work as a plumber, while PPM and his mother remained in Poland. In October 1981 they came to Paris to visit the father for a few weeks. While they were there, martial law was declared in Poland and they were unable to return immediately. By September 1982 (the end of the research period) PPM seemed to have accepted that he would be staying in France; his mother had not. In Poland PPM would have been in the first year of secondary school. PPM had absolutely no knowledge of French before coming to France, and neither did his mother. Since she presumed she would return to Poland at the first possible opportunity, she did not set about learning it. The family lived in an apartment in an inner suburb of Paris, and the language spoken at home was invariably Polish. At the time of the recordings PPM had not made friends with French children. At weekends he would go to the Bois de Vincennes with his father to play ball. He had one Polish friend who had been in France since the age of seven and spoke French fluently; otherwise he had little contact with native French speakers outside school. Even in school his contact with native French children seemed limited. The fourth and fifth subjects, PCF and CCM, were two Cambodian children, sister and brother. PCF was nine years old, the youngest of ten children; her brother, CCM, was twelve. Sometime in 1980 both had fled Cambodia with their parents and three other siblings. Before that they had attended school, but under very difficult conditions, often having to spend a large part of the day working in the fields.
The family spent some months in refugee camps in Thailand before arriving in France in January 1981. They stayed for a few months with an older brother who had come to Paris some years earlier and had married a French woman, and then moved to their own apartment. Neither child had had any contact with French before coming to France. While they were staying with their brother, he and his wife were an important source of support for learning French. At home the language spoken was generally Cambodian, with some Chinese. When their sister-in-law visited the home, or when they visited her, she spoke French with them.
Schooling for the five subjects
From April 23, three weeks after their arrival, until the end of June, Marie and Ann attended the local primary school. Marie was in CM2, Ann in CE2. They were not placed in a special class for foreigners but were fully integrated into their classes from the first day; this was specifically requested when applying for permission for them to attend school in France. In January 1982, three months after his arrival in France, PPM began to attend the local primary school, L'Ecole X in V. The school had a special language programme for foreign children like PPM, involving several hours of French tuition per day. When the children were judged ready for it, they were allowed to attend lessons in other subjects, usually in a class of children slightly younger than themselves. PPM was following this programme at the time of the first recording in late May and had just begun to attend Mathematics lessons in a mainstream class (CM1, for 10-year-olds). He had been in France for seven or eight months at the time of the first recording. In April 1981, three months after their arrival in France, PCF and CCM went to a school in Z, an inner suburb of Paris, where they followed a special French programme for foreign children. In February 1982 they moved to Ecole B near their apartment.
Here they were fully integrated into the school, PCF in CM1 and CCM in CM2. Both had close French friends. According to their teachers, both children were very bright and were performing very well in class. They had been in France for about a year and a half at the time of the first recording. In France the school day lasted from 8.30 am to 4.30 pm, with a two-hour lunch break and two other shorter breaks. Children were free to go home for lunch or to eat in the canteen. Marie, Ann, and the Polish boy stayed in school for lunch; the two Cambodian children went home. There was also the option of remaining in school from 4.30 to 6.00 for supervised study, preceded by a short break. Marie and Ann stayed for the supervised study until 6.00. In the second half of the five-month period, when schools were closed, the two Irish children were allowed to attend a Centre Aéré run by the Mairie de Paris. [A Centre Aéré is a type of holiday camp that French municipal authorities organize during the summer months for children up to the age of 16. These centres are usually located in nearby forests, and children are transported there and back by bus from various collection points. They are very carefully organized and supervised, providing a wide range of physical activities (football, horse-riding, swimming, etc.), activities to develop manual skills (macramé, model making, etc.), nature walks, and so on. Each day children are assigned, in groups of about seven, to specially trained moniteurs/monitrices.] From the beginning of July to the end of August (with a ten-day break at the beginning of August) Marie and Ann went daily to a Centre Aéré. They had to be at the meeting point by 8.30 am each morning, and shortly afterwards they were taken by bus to the Centre Aéré. They returned late in the afternoon and were met by their parents between 6.00 and 6.15.
They did not go to the Centre Aéré for the first ten days of August because they were unhappy that their friends were not staying on for August and that they would have a different set of moniteurs. PPM spent July in Paris, with only minimal contact with French speakers, and August with his parents on the Mediterranean. CCM and PCF spent the two months of the holidays in Paris or with relatives in the suburbs. During the holidays their French friends were away, and they had little or no contact with French speakers apart from their sister-in-law.
Frequency and timing of recordings
Because of the extended settling-in period while accommodation was being sought, more than three weeks passed after arrival before the first recordings could be made with Marie and Ann. At first every possible opportunity was taken to record them in contact with native speakers. Once the rhythm of school life was established, certain constraints were imposed, and recordings in the school could be made only at intervals of about a week. Recordings at home continued to be made whenever the opportunity arose. Once the holidays began (at the beginning of July), all recordings had to be made in the evenings at home, since Marie and Ann objected strenuously to being recorded at the Centre Aéré. For PPM, CCM, and PCF, recordings began in late May, since it was some time after the researcher's arrival before they were located as suitable subjects. These recordings were made in specially designated rooms in the schools on a set day every week; if that day happened to be a holiday, the recording for that week was dropped. Once the holidays began, recordings took place in the subjects' homes, but at longer intervals so as not to intrude too much on family life. A number of Marie and Ann's recording sessions were totally unstructured; for example, they simply wore the radio microphone in the canteen or in the playground.
Others were carefully structured, with the children performing a particular task, such as filling out a family tree for someone else in a group or preparing with friends for a class outing to a big store. This wide range of recording settings (both those the children were aware of and those they were unaware of) might be expected to have provided a rich supply of linguistic output. It did not. In many of these settings Marie and Ann interacted very naturally while using little or no language. For example, Ann and her friend were filmed playing with dolls for over two hours, during which very little was said, and a game of elastique involving three or four children produced almost no language at all. On other occasions the structured interactions produced many instances of the same basic structures: in the family-tree session, for example, the question “Comment s'appelle...” kept recurring. In preparing for the class outing to the big store, Marie was disinclined to intervene as the other children became totally absorbed in the activity. These early recordings yielded very sparse data and have generally been disregarded. For this reason it was decided to fall back on interview-type settings for most of the remaining recordings, since these seemed to produce much more data. For PPM, CCM, and PCF, this solution was adopted from the beginning because access was limited (about one hour per week). While school was in session, native-speaker peers conducted the interviews, which took place in specially designated rooms in the schools. The native speakers were given general guidelines, such as sharing information about how the previous weekend had been spent, finding out how the subjects had come to France, or having them compare their native countries with France. The success of these peer interviews varied depending on the person involved.
In some cases the native speakers (especially those of about nine years of age) simply "ran out of steam" and had nothing further to say; in others they jumped from one topic to another. In general, however, the interview-type setting, in spite of its limitations, gave the subjects the opportunity to use French in a wide variety of discourse types. On certain occasions, especially during the holidays when native-speaker peers were not available and recordings had to be made in the home, the researcher conducted the interviews. There are twelve recordings each of Marie and Ann on their own and a further four in which they were recorded together; overall, they were recorded about once every ten to twelve days. There are eight recordings of PPM, the first five at weekly intervals and the remaining three at intervals of three to four weeks. There are eight recordings of PCF and two of CCM on their own, and a further two in which they were recorded together, at a frequency similar to that of PPM. The date of each recording is encoded in the last three digits of the file name: the first digit gives the month and the last two the day. Thus, 615 means June 15.
Acknowledgements
The research was facilitated in every possible way by the principals and teachers of the three schools concerned. Rooms were made available for recording, arrangements were made for the subjects to leave their classes, and French children were recruited to take part in the interviews. On occasion the teachers at rue de la Plaine allowed cameras and video recorders into their classrooms for whole-class recordings. I would like to thank the headmaster of Ecole de la Plaine, M. Watier, and the teachers of Marie in CM2, M. Rubelli and Mme Dutot, and of Ann in CE2, Mlle. Schmidt, for the way they welcomed and looked after our children. To my wife Ann I owe an enormous debt of gratitude for her support and encouragement for the project from the very beginning. Without her constant encouragement it would never have reached this stage.
Above all, I must thank the children who took part in the project: the native children who so readily agreed to act as interviewers, but especially the five subjects who were prepared to participate so readily and so fully over the whole period of the project. Without them there would have been nothing. It is to them, Marie, Ann, PPM, CCM and PCF, that this body of data is dedicated in a very special way.
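For readers processing the TCD files programmatically, the date convention noted under "Frequency and timing of recordings" (three final digits of the file name, so that 615 means June 15) can be decoded with a small helper. This is a minimal sketch under stated assumptions: the function name `decode_recording_date` and the example file names are hypothetical, the three digits are assumed to sit at the very end of the file-name stem, the first digit is read as the month and the last two as the day, and the year is fixed at 1982 because all recordings fall between March 31 and September 6, 1982.

```python
import re
from datetime import date

# Hypothetical helper (not part of the corpus tools). Assumes the file-name
# stem ends in three digits: month (one digit), then day (two digits),
# e.g. "615" = June 15. A single-digit month suffices because all TCD
# recordings were made between March and September 1982.
_DATE_SUFFIX = re.compile(r"(\d)(\d{2})$")


def decode_recording_date(filename: str, year: int = 1982) -> date:
    """Return the recording date encoded at the end of a TCD file name."""
    stem = filename.rsplit(".", 1)[0]  # drop an extension such as ".cha"
    match = _DATE_SUFFIX.search(stem)
    if match is None:
        raise ValueError(f"no three-digit date suffix in {filename!r}")
    month, day = int(match.group(1)), int(match.group(2))
    return date(year, month, day)  # raises ValueError for impossible dates


if __name__ == "__main__":
    # "example615.cha" is an invented name, not an actual corpus file.
    print(decode_recording_date("example615.cha"))  # 1982-06-15
```

Raising `ValueError` on a missing suffix or an impossible date (such as a "6" followed by "31") makes malformed names fail loudly rather than producing a silently wrong date.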