Fanny Meunier Computer tools for the analysis of learner corpora

(Source: Granger, S. (ed.) 1998. Learner English on Computer. Longman. Chapter 2)
FIND ANSWERS TO THE FOLLOWING QUESTIONS
1. Raw and annotated data
> What is a raw corpus and a tagged (= annotated) corpus?
> Name at least 3 forms of (linguistic, PK) annotation.
1.1 POS tagging
> What is the average success rate of automatic POS taggers?
> What is meant by the complexity/refinement of a tagset?
> Do we need special POS taggers for tagging interlanguage data?
> How can one determine the best tagger for a given research purpose?
1.2 Syntactic parsing
> What is (syntactic) parsing?
> What is the connection between POS tagging and parsing?
> Can parsing be performed automatically?
> What is 'skeleton' parsing?
> What is 'partial parsing'?
1.3 Semantic tagging
> Do we have automatic semantic taggers?
> Why would semantic tagging be of service to CLC research?
1.4 Discoursal tagging
> Are discourse taggers available?
1.5 Error tagging
> How can spellchecking help with the editing of non-spelling errors?
> Can error editing be automated?
> What is the potential advantage of research on error-tagged corpora?
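NOTE: error annotation schemes vary from project to project; the sketch below uses an invented `<err type='...'>` markup and invented type codes purely to show how an error-tagged corpus can be queried automatically. It is not Meunier's or Granger's actual tagset.

```python
import re

# Hypothetical error-tagged sentence; the markup and the type codes
# (GA, LS) are invented for illustration only
tagged = "I have <err type='GA'>a</err> good news and <err type='LS'>actual</err> problems"

# Once errors are tagged, counting and sorting them by type is mechanical
errors = re.findall(r"<err type='(\w+)'>(.*?)</err>", tagged)
print(errors)
```

This is the kind of query that makes error-tagged corpora attractive: the tags turn error retrieval into a simple search rather than a manual re-reading of the texts.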
2. Working with software tools to analyse interlanguage
2.1 General statistics
2.1.1 Word counting
> What is the advantage of word statistics obtained from the WORDS programme described by Meunier over simple word counts available e.g. in Microsoft Word?
> Can we use different word counting tools when comparing different corpora?
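NOTE: different tools give different totals for the same text because they tokenize differently. A minimal Python sketch (my own illustration; the two tokenization rules are assumptions, not those of WORDS or Microsoft Word):

```python
import re

text = "It's a well-known fact - isn't it?"

# One tool might simply split on whitespace, counting the lone dash
# and attached punctuation as words...
whitespace_tokens = text.split()

# ...while another treats apostrophes and hyphens as word-internal
# and drops punctuation, giving a different total for the same text
regex_tokens = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)

print(len(whitespace_tokens), len(regex_tokens))  # 7 vs 6
```

This is why comparisons across corpora should use the same counting tool (and settings) throughout.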
2.1.2 Word/sentence statistics
> What is the difference between a sentence and a T-unit?
> What is the difference between non-native-speaker (NNS) and native-speaker (NS) writers' use of varied sentence lengths?
2.2 Lexical analysis
2.2.1 Frequency analysis
> How can frequency lists be applied to discover facts about learner language?
> What is the dispersion / distribution of an item in a corpus and why is this information useful?
> In general terms, what is a comparison of wordlists? (NOTE: WordSmith Tools is available at IFA on the local network and in the 603 multimedia lab, should you need any of its features; I hope to be able to give you a demo soon.)
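NOTE: a toy illustration of a frequency list plus a crude dispersion measure (here simply the number of corpus parts an item occurs in); the data and the range-based measure are my own assumptions, not Meunier's:

```python
from collections import Counter

# A toy "corpus" divided into parts (e.g. different texts or writers);
# the sentences are invented sample data
parts = [
    "the cat sat on the mat".split(),
    "the dog barked at the cat".split(),
    "a bird sang in a tree".split(),
]

# Overall frequency list across the whole corpus
freq = Counter(tok for part in parts for tok in part)

# Crude dispersion: in how many parts does each word occur?
dispersion = {w: sum(w in part for part in parts) for w in freq}

print(freq.most_common(2))
print(dispersion["the"], dispersion["bird"])
```

An item that is frequent overall but confined to one text or one writer tells a different story from one spread evenly across the corpus, which is why dispersion information matters alongside raw frequency.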
2.2.2 Context analysis
> NOTE: you should by now be familiar with all kinds of queries presented here; the specific syntax conventions described by Meunier are typical of WordSmith Tools and a few other off-line concordancers. Internet-based tools often require their own scripting conventions (so-called 'regular expressions') for 'multiple-item queries'.
> What is a stoplist?
> How does Meunier differentiate between 'collocation facilities' and 'collocation generators'?
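NOTE: to make the 'multiple-item query' and stoplist ideas concrete, here is a small Python sketch using the `re` module; the query pattern and the stoplist contents are invented for illustration and do not reproduce WordSmith's own syntax:

```python
import re

sentence = "I have made a lot of mistakes but I have also made progress"

# A 'multiple-item query' as a regular expression: 'make/made' followed
# by 'mistake(s)' with up to three intervening words
pattern = re.compile(r"\b(make|made)\b(?:\s+\w+){0,3}\s+mistakes?\b")
match = pattern.search(sentence)
print(match.group(0) if match else "no match")

# A stoplist filters out high-frequency function words, e.g. before
# listing the collocates of a node word
stoplist = {"i", "a", "of", "but", "have", "also"}
content_words = [w for w in sentence.lower().split() if w not in stoplist]
print(content_words)
```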
2.2.3 Lexical variation analysis
> What is the type-token ratio (TTR)?
> Can we use TTR to compare corpora/texts of different lengths?
> According to Meunier, why is TTR not a discriminating feature between NS and NNS writers?
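NOTE: the length-dependence of TTR is easy to demonstrate. The mini-texts and the fixed chunk size below are my own choices, but the chunked mean is similar in spirit to the standardised TTR offered by some corpus tools:

```python
def ttr(tokens):
    # Type-token ratio: number of distinct word forms / total word count
    return len(set(tokens)) / len(tokens)

# Repeating a text multiplies the tokens but not the types,
# so plain TTR inevitably drops as a text gets longer
short = "the cat sat on the mat".split()
long_ = short * 10

print(round(ttr(short), 2), round(ttr(long_), 2))

def standardized_ttr(tokens, chunk=6):
    # Mean TTR over fixed-size chunks: comparable across text lengths
    chunks = [tokens[i:i + chunk] for i in range(0, len(tokens) - chunk + 1, chunk)]
    return sum(ttr(c) for c in chunks) / len(chunks)

print(round(standardized_ttr(long_), 2))
```

The same vocabulary yields a TTR of 0.83 at 6 tokens but 0.08 at 60 tokens, which is why raw TTR cannot be used to compare texts of different lengths.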
2.2.4 Other lexical measures
> What is lexical density (LD)?
> Does LD depend on the length of a corpus?
> How can lexical sophistication be measured automatically?
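NOTE: lexical density is generally computed as the proportion of content (lexical) words among all words; the closed-class word list below is a hypothetical stand-in for the POS-based classification a real tool would use:

```python
# Hypothetical function-word (closed-class) list; a real tool would
# identify content vs function words from POS tags instead
FUNCTION_WORDS = {"the", "a", "an", "is", "are", "of", "in", "on", "and", "to"}

def lexical_density(tokens):
    # Proportion of content words among all running words
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)

tokens = "the students wrote long essays in the quiet library today".split()
print(round(lexical_density(tokens), 2))  # 7 content words / 10 tokens
```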
2.3 Grammatical analysis
> What three techniques of querying a POS-tagged corpus can be applied to probe grammar use in a corpus?
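NOTE: once a corpus is POS-tagged (e.g. in word_TAG format), grammar queries reduce to pattern matching on tags alone, on words alone, or on word + tag combinations. The tiny sample and the tag names below are invented for illustration, not taken from any particular tagset:

```python
import re

# A hypothetical POS-tagged sample in word_TAG format
tagged = "She_PRON has_AUX been_AUX working_VERB hard_ADV ._PUNCT"

# Query by tag alone: retrieve all verbal forms regardless of the word
verbs = re.findall(r"(\w+)_(?:VERB|AUX)", tagged)
print(verbs)

# Query by word + tag combination: 'working' only when tagged as a verb,
# which disambiguates it from e.g. nominal uses
hits = re.findall(r"working_VERB", tagged)
print(len(hits))
```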
2.4 Syntactic analysis
> At what stage of advancement is corpus-based syntactic analysis?
3. Conclusion