Fanny Meunier, "Computer tools for the analysis of learner corpora"
(Source: Granger, S. (ed.) 1998. Learner English on Computer. Longman. Chapter 2)

FIND ANSWERS TO THE FOLLOWING QUESTIONS

1. Raw and annotated data
> What is a raw corpus and what is a tagged (= annotated) corpus?
> Name at least 3 forms of (linguistic, PK) annotation.

1.1 POS tagging
> What is the average success rate of automatic POS taggers?
> What is meant by the complexity/refinement of a tagset?
> Do we need special POS taggers for tagging interlanguage data?
> How can one determine the best tagger for a given research purpose?

1.2 Syntactic parsing
> What is (syntactic) parsing?
> What is the connection between POS tagging and parsing?
> Can parsing be performed automatically?
> What is 'skeleton' parsing?
> What is 'partial parsing'?

1.3 Semantic tagging
> Do we have automatic semantic taggers?
> Why would semantic tagging be of service to CLC research?

1.4 Discoursal tagging
> Are discourse taggers available?

1.5 Error tagging
> How can spellchecking help with the editing of non-spelling errors?
> Can error editing be automated?
> What is the potential advantage of research on error-tagged corpora?

2. Working with software tools to analyse interlanguage

2.1 General statistics

2.1.1 Word counting
> What is the advantage of the word statistics obtained from the WORDS program described by Meunier over the simple word counts available e.g. in Microsoft Word?
> Can we use different word-counting tools when comparing different corpora?

2.1.2 Word/sentence statistics
> What is the difference between a sentence and a T-unit?
> What is the difference between non-native speaker (NNS) and native speaker (NS) writers' use of varied sentence lengths?

2.2 Lexical analysis

2.2.1 Frequency analysis
> How can frequency lists be applied to discover facts about learner language?
> What is the dispersion/distribution of an item in a corpus, and why is this information useful?
> In general terms, what is a comparison of wordlists? (See the frequency-list sketch below the outline. NOTE: WordSmith Tools is available at IFA on the local network and in the 603 multimedia lab, should you need any of its features; I hope to be able to give you a demo soon.)

2.2.2 Context analysis
> NOTE: you should by now be familiar with all kinds of queries presented here; the specific syntax conventions described by Meunier are typical of WordSmith Tools and a few other off-line concordancers. Internet-based tools often require their own scripting conventions (so-called 'regular expressions') for 'multiple-item queries'. (A regular-expression query sketch appears below the outline.)
> What is a stoplist?
> How does Meunier differentiate between 'collocation facilities' and 'collocation generators'?

2.2.3 Lexical variation analysis
> What is the type-token ratio (TTR)?
> Can we use TTR to compare corpora/texts of different lengths? (See the TTR sketch below the outline.)
> According to Meunier, why is TTR not a discriminating feature between NS and NNS writers?

2.2.4 Other lexical measures
> What is lexical density (LD)?
> Does LD depend on the length of a corpus?
> How can lexical sophistication be measured automatically?

2.3 Grammatical analysis
> What three techniques of querying a POS-tagged corpus can be applied to probe grammar use in a corpus? (A tag-sequence query sketch appears below the outline.)

2.4 Syntactic analysis
> At what stage of advancement is corpus-based syntactic analysis?

3. Conclusion
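
ILLUSTRATIVE SKETCHES (added to this handout; not from Meunier's chapter)

For 2.2.1: the sketch below builds raw frequency lists for two corpora and compares them after normalising to a per-1,000-words base, which is what makes a wordlist comparison across corpora of different sizes meaningful. This is a minimal Python illustration, not the WORDS program or WordSmith Tools; the file names native.txt and learner.txt and the cut-off value are placeholders.

```python
from collections import Counter

def freq_list(path):
    """Crude frequency list: lowercase tokens, surrounding punctuation stripped."""
    with open(path, encoding="utf-8") as f:
        tokens = [t.strip(".,;:!?\"'()").lower() for t in f.read().split()]
    return Counter(t for t in tokens if t)

ns, nns = freq_list("native.txt"), freq_list("learner.txt")
ns_total, nns_total = sum(ns.values()), sum(nns.values())

# Compare relative frequencies (per 1,000 words) so that corpora of
# different sizes can be set side by side.
for word in sorted(set(ns) | set(nns)):
    ns_rel = 1000 * ns[word] / ns_total
    nns_rel = 1000 * nns[word] / nns_total
    if abs(ns_rel - nns_rel) > 1.0:  # arbitrary cut-off for the demo
        print(f"{word:<15} NS {ns_rel:6.2f}   NNS {nns_rel:6.2f}")
```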
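For 2.2.2: a toy keyword-in-context (KWIC) concordancer showing what a 'multiple-item query' looks like when written as a regular expression, the kind of scripting convention Internet-based tools often require. The pattern and the file name are illustrative assumptions, not the syntax of any particular concordancer.

```python
import re

def concord(text, pattern, width=40):
    """Print a KWIC line for each regular-expression match."""
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()].rjust(width)
        right = text[m.end():m.end() + width]
        print(f"{left}[{m.group()}]{right}")

with open("learner.txt", encoding="utf-8") as f:
    text = f.read().replace("\n", " ")

# Multiple-item query: 'make'/'makes' followed by 'mistake(s)'
# with up to three intervening words.
concord(text, r"\bmakes?\b(?:\s+\w+){0,3}\s+mistakes?\b")
```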
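For 2.2.3 and 2.2.4: raw TTR (types divided by tokens) next to a length-insensitive variant, plus a lexical-density function. The chunk-based 'standardised' TTR is one common workaround for TTR's dependence on text length, not a procedure Meunier prescribes, and the POS labels in lexical_density are illustrative only.

```python
def ttr(tokens):
    """Raw type-token ratio; it drops as a text grows, because running
    text repeats old types faster than it introduces new ones."""
    return len(set(tokens)) / len(tokens)

def std_ttr(tokens, chunk=1000):
    """Mean TTR over fixed-size chunks, comparable across text lengths."""
    chunks = [tokens[i:i + chunk] for i in range(0, len(tokens) - chunk + 1, chunk)]
    chunks = chunks or [tokens]  # fall back for texts shorter than one chunk
    return sum(ttr(c) for c in chunks) / len(chunks)

def lexical_density(tagged):
    """Proportion of lexical (content) words among all words,
    given (word, tag) pairs."""
    lexical = {"NOUN", "VERB", "ADJ", "ADV"}  # illustrative tag labels
    return sum(1 for _, tag in tagged if tag in lexical) / len(tagged)

with open("learner.txt", encoding="utf-8") as f:
    tokens = f.read().lower().split()
print(f"raw TTR {ttr(tokens):.3f}   standardised TTR {std_ttr(tokens):.3f}")

# Two of the three words below are lexical, so LD = 2/3.
print(lexical_density([("the", "DET"), ("cat", "NOUN"), ("sat", "VERB")]))
```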
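For 2.3: one way of querying a POS-tagged corpus is to search for tag sequences rather than word forms. The 'word_TAG' layout and the CLAWS-like tag labels below are assumptions made for the demo, not a claim about any specific corpus format.

```python
import re

# A tagged snippet in 'word_TAG' layout (CLAWS C5-like labels, for illustration).
tagged = "the_AT0 students_NN2 has_VHZ made_VVN progress_NN1"

# Tag-sequence query: plural noun (NN2) immediately followed by
# singular 'has' (VHZ) -- a crude probe for agreement slips.
for m in re.finditer(r"(\w+)_NN2 (\w+)_VHZ", tagged):
    print("possible agreement slip:", m.group(1), m.group(2))
```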