ANR-06-CORP-006 Multimodal Learning Corpus Exchange (Échange de corpus d'apprentissage multimodaux, MULCE)

Eurocall 2010 Workshop proposal
"Dissemination and comparison of research findings: developing Contextualized Learning and Teaching Corpora (LETEC)"
Eurocall 2010, Bordeaux, Wednesday 8 September 2010

Workshop coordinators: Maud CIEKANSKI (University of Paris 8), Marie-Laure BETBEDER (University of Franche-Comté)
Mulce Project coordinator: Thierry CHANIER (University of Clermont-Ferrand), ANR Corpus in Social and Human Sciences.

The Eurocall 2010 workshop will be the second scientific event for Mulce [1], after our first symposium at EPAL07 (Grenoble). This document is an initial workshop proposal for the Eurocall 2010 conference, Bordeaux. We propose a full-day workshop in order to gather researchers working on corpora from different communities. The morning session will focus on participants from outside Eurocall, in particular members of the CSCL and CEHL communities (involved in learning corpora, not necessarily in language learning). The afternoon session will highlight language learning applications. Only the afternoon session will be integrated into the Eurocall format.

Sincerely,
Maud Ciekanski and Marie-Laure Betbeder
Workshop coordinators

[1] Mulce is funded by the ANR Corpus et Outils en SHS (ANR-06-CORP-006). Mulce gathers members of several laboratories and universities: LRL (Université Blaise Pascal), LIFC (Université de Franche-Comté) and CREET (The Open University), coordinated respectively by Thierry Chanier, Christophe Reffay and Marie-Noëlle Lamy.

1. Synopsis of the workshop proposal

Title of workshop: Dissemination and comparison of research findings: Developing Contextualized Learning and Teaching Corpora (LETEC) http://mulce.univ-fcomte.fr/

Workshop coordinators: Marie-Laure Betbeder and Maud Ciekanski

Target audience: Practitioner-researchers from the two main communities working on learning and teaching corpora (CALL community and Computing Environment for Human Learning (CEHL) community).

Prior knowledge required: The workshop welcomes experienced participants who contribute to the development of learning and teaching corpora (LETEC) in any way: research, pedagogic developments, tools and interfaces, and corpora of various types (monolingual, bilingual, spoken, written, multimodal, specialized, learner, etc.).

Contents: Whilst it is becoming increasingly easy to save traces of interaction in online educational exchanges, there is at the same time a growing interest in the research community in the construction of data sets allowing for the study of the learning processes themselves. However, such data sets are rarely structured into corpora, and comparing or re-analysing them is difficult. The workshop is a concrete step in this direction. For a deeper collaboration within and between our communities, we suggest sharing structured data collections. The Mulce project (http://mulce.univ-fcomte.fr/) aims to propose a structure for Teaching and Learning Corpora (including pedagogical and research contexts), paying particular attention to the logging and analysis of ‘traces’ of interaction. Two main corpora (asynchronous data and synchronous data) have been built according to this structure. The workshop proposes a dialogue in two phases (morning and afternoon) on sharing corpora and tools to improve interaction analysis across different fields (CSCL, CALL, and CEHL). The morning programme focuses on the CSCL and CEHL perspectives, whereas the afternoon programme focuses on the CALL perspective and on spoken corpus research. Part of the workshop defines the notion of a ‘Teaching and Learning Corpus’, shows its main structure and browses some parts of the structured interaction data developed as part of the Mulce project.
Several of the activities will involve analysis tools (Calico, Tatiana), standards (TEI, XML), data annotation (multimodal interaction, spoken interaction) and the use of corpora (Mulce platform to browse and analyze a shared corpus; Computer Learner Corpora).

Two reasons motivated the choice of the Eurocall 2010 conference to present the achievements of the Mulce (Multimodal Learning Corpus Exchange) project, after our previous workshop at EPAL07 (Echanger Pour Apprendre En Ligne, Grenoble):
- The dissemination of our work in the CALL community, which involves several SIGs concerned with corpora, on the European and international scales (approximately 300 participants from 30 countries);
- The proximity of the conference setting (Bordeaux), which enables different French communities working on corpora (CSCL, CEHL and spoken corpora) to join the Eurocall audience.
In addition, the Eurocall conference offers a framework for publication (Recall (Eurocall association) and Alsic (ADALSIC association)), according to the usual publishing procedures.

Workshop objectives and methodology: The speakers will first prepare an extract from their learning interaction corpus in the format of their tool, and will let the workshop audience use their demo after the workshop (access to their tool and platform, possibility of working on one's own corpora). Participants will discover how to transform data and create their own corpora based on the Mulce format, how to use a variety of tools for interaction analysis (Calico, Tatiana), how to annotate and specify multimodal and spoken data (TEI, specific format), and the applications of such corpora (e.g. Computer Learner Corpora).

Presentation time: A full day organized in two phases in order to encourage dialogue on language learning corpora, and on the specificity of research related to this topic, between the following research communities:
- the CSCL (Computer-Supported Collaborative Learning) and CEHL communities
- the CALL (Computer Assisted Language Learning) community
From the CSCL community: in addition to our current partners in the Mulce project (Calico: Bruillard; Tatiana: Lund), and after fruitful contacts at CSCL09, we expect to invite international specialists such as Suthers or Harrer. From the CALL community, we aim to gather several specialists concerned by the project (interaction analysis: Hampel; learner corpora: Granger; spoken corpora: Jacobson). A round-table conference at the end of the day will bring together the coordinators of three Eurocall SIGs working on related issues: “Computer-Mediated Communication” (R. O’Dowd), “Natural Language Processing/Intelligent CALL” (C. Tschichold) and “CorpusCALL” (A. Boulton).

Provisional workshop programme:

Outside the Eurocall format
9:45: Workshop Introduction and Welcome Address (workshop coordinators)
10:00: Structures for corpora in CSCL: new challenges? (A. Harrer, University of Duisburg-Essen)
10:30: Benefits of structuring learning and teaching corpora for the understanding of online learning and online interactions: The Mulce platform (Christophe Reffay, ENS Cachan)
11:00: Coffee Break
11:15: Analysis Tool presentation I - Corpus exchange and interoperability: The Calico project (E. Bruillard, ENS Cachan and Alain Mille, University of Lyon1)
11:45: Analysis Tool presentation II - The Tatiana project (K. Lund, University of Lyon2)
12:15: Feedback, questions, discussion on structure, instrumentation, collaboration and sharing in CSCL
12:30: Break

Integrated into the Eurocall format
14:00: Corpus-based research in CALL: what are we looking for?
(M.-N. Lamy, Open University)
14:45: Data processing I - Online multimodal interaction corpora: alignment, annotation, transcription (R. Hampel, Open University)
15:15: Kurt Kohn, University of Tübingen, Germany
15:45: Coffee Break
16:00: Use of corpora in research - Tools and query interface on heterogeneous data (Mulce project)
16:30: Use of corpora in teaching - Applications of research on Computer Learner Corpora in CALL (S. Granger, Catholic University of Leuven)
17:00: Round-table conference and open discussion on the potentialities of disseminating and comparing research findings: E. Bruillard (University of Caen), T. Chanier (University of Clermont-Ferrand), M.-N. Lamy (Open University), R. O’Dowd (University of Leon), C. Tschichold (University of Wales Swansea), A. Boulton (University of Nancy2).
18:00: Workshop ends

AV equipment provided: Access to a videoconferencing system for some of the sessions (those led by remote speakers).

Fees to attend: Our objective is to create a rich dialogue between researchers from different communities. Since the members of the CSCL and CEHL communities are not necessarily members of Eurocall, we propose two options:
- Non-Eurocall members: they may attend the full session for 45 euros per half day, i.e. 90 euros for the full session (without paying Eurocall fees);
- Eurocall members (paying Eurocall fees): they may attend the full session at half price (45 euros for the day).

Coordinators' qualifications: Marie-Laure Betbeder is a lecturer in Computer Science (Computer Science laboratory at the University of Franche-Comté) in the area of Technology Enhanced Learning. She is interested in the analysis of interaction in collaborative distance training and learning situations. Since 2006, as a member of Mulce, she has also been studying the structure of the data stemming from such situations, in order to build shareable corpora usable by other members of the community.
Maud Ciekanski is a lecturer in Applied Linguistics and Distance Education. Her main research focus is interaction analysis in language learning contexts, multimodal communication and intercultural communication. Since 2006, she has been a member of Mulce and is in charge of the analysis tasks. Her previous research concerned self-directed language learning, autonomy and adviser training.

2. Introduction and organisation

2.1. Our conference of choice

We have chosen Eurocall 2010 (http://www.eurocall-languages.org) as the appropriate international scientific event for our Mulce workshop. There are two reasons for this choice: (1) to ensure dissemination of the outputs of the Mulce project within the CALL community both across Europe and beyond (about 300 delegates from 30 different countries are expected to attend the conference); (2) the location of the conference (Bordeaux) means that the French CSCL (Computer-Supported Collaborative Learning) community and French colleagues working on oral corpora can easily attend.

Eurocall 2010 is co-organised by Eurocall (European Computer Assisted Language Learning) and by the ‘Adalsic’ association. These associations manage the refereed journals Recall and Alsic (http://alsic.org) respectively. Following the conference, a special issue of each journal will be published, with articles selected according to the usual procedure (using three referees). A framework for publication will thus be available for our colloquium presenters.

The colloquium will take place over one day, in two stages. Stage 1 will focus on the CSCL community; Stage 2 will be oriented to the CALL community. Overall, the programme of the day will allow these two groups to network around the concept of corpus and the research allied to it. We plan to invite international CSCL experts (D. Suthers), as well as our French partners, those with whom we share tools for analysis (E. Bruillard, ENS Cachan; K. Lund, ICAR; A. Mille, LIRIS).
Concerning the CALL aspects of the colloquium, we will bring together experts in online communication for language learning (R. Hampel), others with expertise in the neighbouring domain of learner corpora (S. Granger), and others again with experience of working on corpora from a linguistic point of view (K. Kohn). Finally, we will invite to a Round Table the three Eurocall SIG leaders who work on themes close to ours, namely "Computer-Mediated Communication" (R. O'Dowd), "Natural Language Processing / Intelligent CALL" (C. Tschichold) and "Corpus CALL" (A. Boulton).

2.2. Rationale

"Dissemination and comparison of research findings: developing Learning and Teaching Corpora (LETEC)"

Our workshop builds on the colloquium "Corpus d’apprentissage en ligne : conception, réutilisation, échange" [2] organised by the MULCE (MUltimodal contextualized Learner Corpus Exchange) project team at EPAL07 (France). It involves researchers from diverse backgrounds and invites them to examine data collected during online learning sessions, as well as the tools and research methods used, with a view to building shareable corpora to be made available to different groups of researchers.

[2] oai: edutice.archives-ouvertes.fr:edutice-00161113_v1

The objective of the workshop is to bring together researchers and practitioners who helped create the existing corpora, or who wish to participate in the creation of new corpora from online learning modules, using corpus research methodologies from EIAH (Environnements Informatiques pour l'Apprentissage Humain, Computer Environments for Human Learning) or those that have been or are being developed in CALL.

Whilst it is becoming increasingly easy to save traces of interaction in online educational exchanges, there is at the same time a growing interest in the research community in the construction of data sets allowing for the study of the learning processes themselves.
However, such data sets are rarely structured into corpora, and comparing or re-analysing them is difficult. When constructing a corpus, there is a need to systematically assemble the data around converging themes, aiming to cover the chosen themes exhaustively, then to organise and structure these data according to shared standards (XML, TEI, etc.). Finally, the data need to be accessible and downloadable online, via search or annotation tools. Because the data are complex and heterogeneous (access traces, interaction traces, learner productions, tests, interviews, etc.), a system of synchronisation and internal linkages is required. Making sense of the learner interactions after the event is a priority.

We will dedicate the morning to reflection on how to research corpora and how corpora are used within CSCL, highlighting questions of specification, instrumentation, implementation and interoperability, all of which inform our understanding of the conditions for supporting multiple analyses and re-analyses. Participants will show examples of environments and tools that, among other things, help researchers to manage, synchronize, visualize and analyze their data in order to create new representations that make it easier to understand how computer-mediated collaboration works. In these examples, online collaboration in a variety of domains will be a main focus.

In the afternoon, researchers working on corpus-building in linguistics and applied languages will come together. These disciplinary areas present many new challenges, not least because of the importance of synchronicity and multimodality. The activities will cover a range of domains of application within which the notion of corpora has become central to research, such as corpora of online language learning, learner corpora and corpora of spoken language.
Highlighting potential cross-fertilisation between the chosen methodologies, tools and methods of application (notably in the area of language learning), participants will support their points with demonstrations of software using corpus extracts. Part of the discussion will focus on ethics and rights issues.

The workshop will be conducted in English, with examples from different languages. Interfaces and tools will be in French and in English.

2.3. Presentation format

The workshop is mainly targeted at CALL community practitioners and researchers, but also, more widely, at the CEHL (Computing Environment for Human Learning) community. Speakers will focus on operational aspects of their research. They will prepare an extract from their interaction corpus (language or other online learning situations), in the format of their own analysis tool. Speakers will enable the audience to test their tool with the given corpus extract after the workshop. To this end, speakers will fill out a short form describing the corpus and the analysis tool: the pedagogical context, a short description of the corpus extract and of the format used by the tool, a short description of the downloadable tool (together with a download link and access code), and a description of the research questions associated with the tool. The form data will then be published in the workshop proceedings.

2.4. Proposed agenda (first draft)

The following agenda is a work in progress. Speakers have not yet been formally contacted but are in touch with one or more members of the Mulce project.

Morning, outside the Eurocall format
9h00-9h45: Participant welcoming
9h45-10h00: Workshop agenda presentation (two perspectives: CSCL and CALL)
10h00-10h30: Structures for corpora in CSCL: new challenges? (A. Harrer, University of Duisburg-Essen)
10h30-11h00: Benefits of structuring learning and teaching corpora for the understanding of online learning and online interactions (C.
Reffay, ENS Cachan)
Coffee Break
11h15-11h45: Analysis Tool presentation I - Corpus exchange and interoperability: The Calico project (E. Bruillard, University of Caen and Alain Mille, University of Lyon1)
11h45-12h15: Analysis Tool presentation II - The Tatiana project (K. Lund, University of Lyon2)
12h15-12h30: Feedback, questions, discussion on structure, instrumentation, collaboration and sharing in CSCL
12h30: Lunch

Afternoon, integrated into the Eurocall format
14h00-14h45: Corpus-based research in CALL: what are we looking for? (M.-N. Lamy, Open University)
14h45-15h15: Data processing I - Online multimodal interaction corpora: alignment, annotation, transcription (R. Hampel, Open University)
15h15-15h45: (Kurt Kohn, University of Tübingen, Germany)
Coffee Break
16h00-16h30: Use of corpora in research - Tools and query interface on heterogeneous data (Mulce project)
16h30-17h00: Use of corpora in teaching - Applications of research on Computer Learner Corpora in CALL (S. Granger, Catholic University of Leuven)
17h00-18h00: Round-table conference and open discussion on the potentialities of disseminating and comparing research findings: E. Bruillard (University of Caen), T. Chanier (University of Clermont-Ferrand), M.-N. Lamy (Open University), R. O’Dowd (University of Leon), C. Tschichold (University of Wales Swansea), A. Boulton (University of Nancy2).
18h00: End of the workshop

2.5. Workshop proceedings

The workshop proceedings, coordinated by Marie-Laure Betbeder and Maud Ciekanski, will gather rights-free papers together with a synthesis of the workshop’s discussions. As well as the papers, a note on access and download procedures for the corpora and tools will be made available. The proceedings will be published in the Edutice online archive.

Mulce Project: Maud Ciekanski, Marie-Laure Betbeder, Thierry Chanier, Marie-Noëlle Lamy and Christophe Reffay.