THE DATA TRANSCRIPTION AND ANALYSIS (DTA) TOOL: HANDS-ON WORKSHOP
María Blume, Pontificia Universidad Católica del Perú; Isabelle Barrière, Long Island University & Yeled V'Yalda Early Childhood Center; Cristina Dye, Newcastle University; and Ted Caldwell, Gorges
With the invaluable help of Carissa Kang and Jonathan Masci, Cornell University
Development of Linguistic Linked Open Data (LLOD) Resources for Collaborative Data-Intensive Research in the Language Sciences
July 25th, 2015

This tool was funded by the National Science Foundation, CI-TEAM program: “Transforming the Primary Research Process Through Cybertool Dissemination: An Implementation of a Virtual Center for the Study of Language Acquisition” (María Blume and Barbara Lust). NSF OCI-0753415

Purpose and goals
Purpose
• To create a culture of national and international collaboration among researchers and their labs.
• To create shared principles and methods of data documentation, management and collaboration.
• To enable the practice of these principles and methods through the use of cybertools.
Goal
• To provide a new generation of researchers and students, including those with diverse disciplinary, geographical and cultural backgrounds, with a solid foundation in these principles and methods through the use of these new cybertools.

Purpose and goals
Purpose
• To create a tool for collaboration that allows for the management, documentation and analysis of crosslinguistic language data.
Goal
• To provide a resource that allows users to manage data across datasets and projects, including the ability to reuse previously collected data.

Virtual Center for the Study of Language Acquisition (VCLA)
http://vcla.clal.cornell.edu/

The Virtual Center for Language Acquisition Research
A community of researchers who are linked by their assumption that the most fundamental questions of language acquisition require interdisciplinary collaboration, both theoretical and empirical methods, and a cross-linguistic approach.
• Eight founding member institutions.
• One international collaborator in Peru.

The VCLA website
A center that unites through the web a series of research labs across the country and the world. [Its] mission is to foster collaborative research among researchers working in the area of language acquisition, collaborations which are potentially interdisciplinary, which may be at a distance geographically and which may involve the comparative study of multiple languages, and interactions on shared data, as well as a variety of lab methods.

The VCLA
List of projects by VCLA members to give undergraduate and graduate students and other researchers ideas for future research and collaboration.

Courses
We have created a series of courses centered on research methodology, best practices and the intensive hands-on experience with cybertools such as the Experiment Bank and Web DTA.

The Web Conferences: Elluminate
Cornell and UTEP students meeting during our first course.

The Web Conferences: Elluminate
A UTEP student presents her research proposal to peers and faculty at UTEP and Pontificia Universidad Católica del Perú.

The Virtual Linguistics Lab (VLL)
http://clal.cornell.edu/vll

The Virtual Linguistics Lab
The VLL portal provides structured access to the components of a virtual linguistic lab:
• Materials for the scientific and collaborative study of language acquisition.
• Web-based courses, integrating synchronous and asynchronous forms of interactive information distribution.

Meeting the Challenges through a Virtual Linguistics Lab
The VLL includes:
• A series of web-based courses, integrating synchronous and asynchronous forms of interactive information distribution.
• A web-based experiment bank and data transcription and analysis tool, with an associated set of data collected over 20 years by the Cornell Language Acquisition Lab and other labs across the USA.
• A series of structured audio-visual demonstrations and related learning modules.
These materials are integrated into a university-supported cyberinfrastructure to meet the high-availability needs of a distance learning program.

VLL Components
Laboratory methods:
• Research methods manual.
• Standards.
Courses:
• Teaching materials.
• Audio/visual samples (lessons, assignments, data).
• Web conferences.
• Discussion board.

VLL Portal Topics

Teaching Modules
Provide graduate and undergraduate students with a set of interactive web-based lessons which teach them the specific procedures of investigating language knowledge. These link to:
• Audio/video examples.
• Glossary.
• The Experiment Bank.
• The methods manual.
• The Data Transcription and Analysis tool.
• Published or unpublished papers.
• Specific exercises/homework.
The modules:
• provide students with selected excerpts of language data to be studied and analyzed.
• give students a virtual experience of an interview of a subject and real experience of analysis of the subject's language.
• allow students to learn a method to use in their own research or practice.
• allow students to learn how to analyze previously collected data.

A teaching module
The teaching modules give students access to:
• Audio/Visual examples.
• PowerPoint presentations explaining the methods.
• Readings.
• Interactive assignments.

Audio/Visual Materials
Teaching the procedure for the Act Out task.

Audio/Visual Materials
An experimental study showing the Elicited Imitation task done with a 2-year-old in Peru.

An interactive assignment
Elicited Imitation assignment comparing monolingual and bilingual children. These assignments train students to transcribe and analyze data, and to compare their results to the original paper’s results.

An interactive assignment
A child subject enjoys the experiment. These samples give the student a virtual experience of data collection.

Cybertools
• Multilingualism questionnaire.
• Data Transcription and Analysis Tool (DTA): includes an Experiment Bank and gives access to libraries of comparable data.
• DTA User’s Manual.
• Virtual workshops.

Cybertool access through VLL

Data quality: the opportunities
Technology can enable:
• Precision and completeness in data-capture procedures.
• Capacity for many levels of structural description and analysis.
• Capacity to link points of data along multiple dimensions.

Why do we need the DTA tool in the study of language acquisition and use?
• Multiple languages.
• Multiple formats.
• Multiple methods of data collection: observational vs. experimental, cross-sectional or longitudinal.
• Multiple aspects of metadata: age and/or developmental/cognitive stage of speaker, social and pragmatic context, culture.

Data management and use
• Different labs practice distinct forms of data management.
• The scientific use of any single record requires access to many levels of data, ranging from raw (establishing provenance) to structured and analyzed data (establishing intellectual worth).

Data Transcription and Analysis Tool (WebDTA)
http://webdta.clal.cornell.edu/site/login
A primary research tool which provides the user with a web interface that guides him/her through the steps for generating, storing and accessing data.
• Users contribute data in a structured, uniform manner.
• Users access calibrated data from a shared relational database.
• Diverse data become comparable at many levels.

The Data Transcription and Analysis Tool (WebDTA)
• Collects all information related to a study (experimental or observational) in the same location.
• Makes all information about the study available to the public: researchers seeking to replicate or criticize it, and students studying the particular method or research topic.
• Trains researchers and students on how to organize research data.

WebDTA Tool
• It stores its data in a relational database on a centralized server (other systems store flat text files).
• It supports both Natural Speech and Experimental data.
• It can be used for both Research and Education (structured teaching modules).
• It is open-ended. New specialized coding screens can be added.
• It has robust query capabilities based on its relational database structure.

Brief development history
The Virtual Linguistics Lab (VLL), its Data Transcription and Analysis Tool (DTA, WebDTA) and the proprietary methodology that supports these were developed over 30 years through the personal effort of Prof. Lust and the contributions of students and peers.
Several rudimentary versions of the DTA were sketched out and crafted in old software. However, when user-friendly relational databases became commonplace, research and student users were able to define a new approach. A more powerful version of the DTA using FoxPro as the engine was developed.
Katharina Boser, Reiko Mazuka, Julie Eisele, Paul Navarre, David Parkinson, Shamitha Somashekar, and María Blume.

Brief development history
Cliff Crawford prompted the CLAL's development of a web-based interface for the DTA tool and held major responsibility for programming the first web-based interface, using PostgreSQL.
The current version of the DTA tool, which unifies the previously independent cybertool Experiment Bank with the DTA, was developed by Ted Caldwell and Greg Kops at Gorges, Web Development and Internet Solutions (http://www.gorges.us/) with María Blume and Barbara Lust, and with input from students of the Cornell Language Acquisition Lab (Natalia Buitrago, Gabriel Clandorf, Poornima Guna, Jennie Lin, and Jordan Whitlock) and UTEP (Marina Kalashnikova and Martha Rayas).

DTA Schema Structure
The current version of the WebDTA tool is built on Yii, a PHP web development framework that uses the "Model-View-Controller" pattern to structure the application and the "Active Record" pattern to manage records from the database. MySQL is used as the database platform. All are open source technologies.
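To make the relational approach concrete, here is a minimal sketch of how study data might be laid out in linked tables and queried with several conditions at once. It is not the WebDTA schema: the table and column names are hypothetical, loosely modeled on the project, dataset, session and utterance levels that appear in the tool's screens and URLs, and it uses Python with an in-memory SQLite database in place of the Yii/MySQL stack described above. The sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical tables; the real WebDTA schema is not shown in these slides.
SCHEMA = """
CREATE TABLE projects   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE datasets   (id INTEGER PRIMARY KEY,
                         project_id INTEGER NOT NULL REFERENCES projects(id),
                         name TEXT NOT NULL);
CREATE TABLE subjects   (id INTEGER PRIMARY KEY, code TEXT NOT NULL, language TEXT);
-- A session ties one subject to one dataset, so the same subject
-- can be reused in several datasets without duplicating the subject record.
CREATE TABLE sessions   (id INTEGER PRIMARY KEY,
                         dataset_id INTEGER NOT NULL REFERENCES datasets(id),
                         subject_id INTEGER NOT NULL REFERENCES subjects(id),
                         age_months INTEGER);
CREATE TABLE utterances (id INTEGER PRIMARY KEY,
                         session_id INTEGER NOT NULL REFERENCES sessions(id),
                         speaker TEXT NOT NULL,
                         text TEXT NOT NULL);
"""


def build_demo_db():
    """Create an in-memory database with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO projects VALUES (1, 'Demo natural speech project')")
    conn.executemany("INSERT INTO datasets VALUES (?, ?, ?)",
                     [(1, 1, "Dataset A"), (2, 1, "Dataset B")])
    conn.execute("INSERT INTO subjects VALUES (1, 'S01', 'Spanish')")
    # The same subject (id 1) is recorded in two different datasets.
    conn.executemany("INSERT INTO sessions VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 24), (2, 2, 1, 36)])
    conn.executemany("INSERT INTO utterances VALUES (?, ?, ?, ?)",
                     [(1, 1, "child", "quiero agua"),
                      (2, 1, "adult", "¿qué quieres?"),
                      (3, 2, "child", "mira el perro")])
    return conn


def child_utterances(conn, language, min_age, max_age):
    """Multi-condition query: speaker role, subject language and age range."""
    rows = conn.execute(
        """
        SELECT d.name, s.age_months, u.text
        FROM utterances u
        JOIN sessions s   ON s.id = u.session_id
        JOIN datasets d   ON d.id = s.dataset_id
        JOIN subjects sub ON sub.id = s.subject_id
        WHERE u.speaker = 'child'
          AND sub.language = ?
          AND s.age_months BETWEEN ? AND ?
        ORDER BY u.id
        """,
        (language, min_age, max_age),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in child_utterances(db, "Spanish", 20, 40):
        print(row)
```

Because sessions link subjects to datasets, one query can draw on a subject's data across datasets, which is the kind of comparison that is hard to make over flat text files.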
External links
We are collaborating with Cornell University’s Albert Mann Library in their current pilot program, DataStaR (Data Staging Repository), “intended to help researchers create high quality metadata in the formats required by external repositories…” (Steinhart 2010: 1). (Funded by the National Science Foundation, Grant No. 111-0712989.)
The program adopts a semantic web approach to metadata.
At present, one VCLA dataset (Sinhala language) from more than 400 children studied in Sri Lanka has been entered in DataStaR, linking the VCLA database to the Library staging repository, and is available for collaborative use through this repository.
DataStaR uses RDF (Resource Description Framework) statements and OWL (Web Ontology Language) classes in order to integrate different metadata frameworks across disciplines.
http://datastar.mannlib.cornell.edu/display/n6291 and http://www.news.cornell.edu/stories/Oct11/SinhalaTools.html

Project sample
An Experimental Project: https://webdta.clal.cornell.edu/projects/151/overview
A Natural Speech Corpus: https://webdta.clal.cornell.edu/projects/15/datasets/40/sessions
Coding Gesture: https://webdta.clal.cornell.edu/projects/235/datasets/249/sessions/2648/transcriptions/1669/utterances
Code-switching project: https://webdta.clal.cornell.edu/projects/230/overview
Queries: https://webdta.clal.cornell.edu/queries

Accessing the DTA
Permission is required due to human subjects issues. Contact Barbara Lust (bcl4@cornell.edu) or María Blume (mblume@pucp.pe).
Go to https://webdta.clal.cornell.edu/ (the link is in a document called "DTA address", which we e-mailed you along with documents containing data from children that you can use to practice. We have removed the identifying data.)

Acknowledgments
María Blume and Barbara Lust. 2008. Transforming the Primary Research Process Through Cybertool Dissemination: An Implementation of a Virtual Center for the Study of Language Acquisition. NSF OCI-0753415
Lust, Barbara. 2003.
Planning Grant: A Virtual Center for Child Language Acquisition Research. National Science Foundation. NSF BCS-0126546

VCLA founding members:
• Cornell: Marianella Casasola, Claire Cardie, James Gair, and Qi Wang.
• NeuroFocus: Elise Temple.
• Boston College: Claire Foley.
• Rutgers University at New Brunswick: Liliana Sánchez.
• Rutgers University at Newark: Jennifer Austin.
• California State University at San Bernardino: YuChin Chien.
• Southern Illinois University at Carbondale: Usha Lakshmanan.

VCLA affiliates:
• City University of New York: Gita Martohardjono, Valerie Shafer, and Isabelle Barrière.
• Newcastle University: Cristina Dye.
• Ben Gurion University at the Negev: Yarden Kedar.
• Tyndale University College and Seminary: Sujin Yang.
• Columbia University: Joy Hirsch.
• University of Texas at El Paso: Ellen Courtney and Alfredo Urzúa.
• University of California at San Diego: Sarah Callahan.
• Pontificia Universidad Católica del Perú: Jorge Iván Pérez Silva.
• Kyungsung University: Kwee Ock Lee.
• Central Institute of English and Foreign Languages: R. Amritavalli.
• Osmania University: A. Usha Rani.

Janet McCue and Barbara Lust. 2004-2006. National Science Foundation Award: Planning Information Infrastructure Through a New Library-Research Partnership. (SGER = Small Grant for Exploratory Research)
American Institute for Sri Lankan Studies, Cornell University Einaudi Center.
Cornell University Faculty Innovation in Teaching Awards, Cornell Institute for Social and Economic Research (CISER).
New York State Hatch grant.
Our application developers: Ted Caldwell and Greg Kops (GORGES).
Our consultants: Cliff Crawford and Tommy Cusick.
Our student RAs: Darlin Alberto, Gabriel Clandorf, Natalia Buitrago,

References
Berners-Lee, Tim. 3/2009. TED Lecture. Tim Berners-Lee on the next Web. http://en.wikipedia.org/wiki/Linked_data.
Bickel, Balthasar, Bernard Comrie, and Martin Haspelmath. 2008. Leipzig Glossing Rules: Conventions for Interlinear Morpheme-by-Morpheme Glosses.
Available online at http://www.eva.mpg.de/lingua/resources/glossing-rules.php.
Blume, María and Barbara Lust. 2011a and in prep. Data Transcription and Analysis Tool User’s Manual (with the collaboration of Shamitha Somashekar and Tina Ogden).
Blume, María and Barbara Lust. 2011b. Presentation to the National Science Foundation. CI Team Principal Investigator’s Meeting. University of Illinois at Urbana-Champaign, Ill. May 24-26. Transforming the Primary Research Process Through Cybertool Dissemination: An Implementation of a Virtual Center for the Study of Language Acquisition. NSF OCI-0753415.
Farrar, S.O. and Langendoen, D.T. 2003. A linguistic ontology for the semantic web. GLOT International 7(3): 97-100.
Khan, Huda, Brian Caruso, Brian Lowe, Jon Corson-Rikert, Diane Dietrich and Gail Steinhart. 2011. DataStaR: Using the Semantic Web approach for Data Curation. International Journal of Digital Curation 2(6): 209-221.
Lowe, Brian. 2009. DataStaR: Bridging XML and OWL in Science Metadata Management. Metadata and Semantics Research 46: 141-150. http://www.springerlink.com/content/q0825vj78ul38712/
Lust, Barbara, Suzanne Flynn, María Blume, Elaine Westbrooks, and Theresa Tobin. 2010. Constructing Adequate Documentation for Multi-faceted Cross Linguistic Language Data: A Case Study from a Virtual Center for Study of Language Acquisition. In Grenoble, Lenore and Louanna Furbee (eds.), Language Documentation: Theory, Practice and Values, pp. 127-152. Amsterdam/Philadelphia: John Benjamins.
Open Archives Initiative (OAI), http://www.openarchives.org/ (15 Mar. 2005).
Open Language Archives Community (OLAC), http://www.language-archives.org/ (24 Feb. 2011).
Simons, G., Farrar, S., Fitzsimons, B., Lewis, W., Langendoen, D.T. and Gonzalez, H. 2004a. The semantics of markup: Mapping legacy markup schemas to a common semantics.
Simons, G., Fitzsimons, B., Langendoen, D.T., Lewis, Wm., Farrar, S., Lanham, A., Basham, R. and Gonzalez, H. 2004b.
http://emeld.org/workshop/2004/langendoen-paper.html
Steinhart, Gail. 2010. DataStaR: A Data Staging Repository to Support the Sharing and Publication of Research Data. 31st Annual IATUL Conference - The Evolving World of eScience: Impact and Implications for Science and Technology Libraries. June 20-24, 2010. West Lafayette, IN. http://docs.lib.purdue.edu/iatul2010/conf/day2/8/.

DTA: Project list

DTA: Project info
Metadata on Experimental or Naturalistic research. These screens help students and researchers save and access the basic information for a research study, and keep track of the publications, presentations, related studies, and bibliography related to a research project.

DTA Metadata: Subject info
Subject information that allows one subject’s data to be used in multiple datasets.

DTA: Research Design
These screens help students and researchers save and access the research study’s design.

DTA: Summary Report
This report shows the data at the project level.

DTA: Summary Report
From the project report one can access the summary reports for the different datasets of the project.

DTA Metadata: Session info
This screen provides information for each time a subject was recorded for a given dataset.

DTA: Recordings
One can include several “recordings” for each session, including audio, video, and previous transcripts.

DTA: Transcription
This screen allows one to transcribe, switch between recordings, and time-align recordings and transcripts.

DTA: Basic Natural Speech Coding
Basic levels of linguistic coding to train students. Additional levels of general or project-specific coding can be created by users.

DTA: Experimental Coding for Grammaticality Judgment Task
An example of a project-specific coding created for an experimental task.

DTA: Query
A multi-condition query. Different queries can be created and saved by users as needed.

The Virtual Workshops: Topics
Virtual Workshops teach users how to navigate our cybertools.
The Virtual Workshops: The DTA Manual
It allows users to take notes and has quizzes to check for understanding of the cybertool.

The Virtual Workshops: A video demonstration
Prof. Lust explains the purpose and motivation of the cybertool to students and researchers at Cornell, Rutgers New Brunswick, MIT and UTEP.

The Virtual Workshops: A video demonstration
The DTA tool programmer, Ted Caldwell, shows students the different DTA screens and their purpose, with added commentary by María Blume and Barbara Lust.