Pre-Reading Questions for Session 9

Applied Linguistics & Foreign Language Teaching
Dr. Mei-hui Liu
Fanny Chang
G99120009
Reading for the sixth class session (Nov 9, 2011): Schmitt (2010), Chapter 6: Corpus linguistics
* Questions:
1. The authors state that “a corpus refers to a large principled collection of natural texts” (p. 89).
From your reading of pp. 91-92, what are some of those “principles” that should guide corpus
construction?
Ans: Because corpus creators collect the data they need from natural texts (e.g., texts from real
occurrences such as daily conversations, newspapers, and speeches), they must not draw on artificial
sources (e.g., simulated conversations). Otherwise, the data will be of little value because they cannot
reveal how language is actually used. Therefore, principles such as deciding the purpose of the corpus
(general or specialized), collecting samples from real contexts, and choosing the right tool to transfer
the raw texts onto computers are essential for corpus creators to bear in mind.
2. What are some differences between a general corpus and a specialized corpus? (see pp. 91-92)
Ans: The major difference between a general corpus and a specialized corpus lies in the nature of their
data. A general corpus aims to include as many linguistic features as possible to meet the needs of
researchers and language learners. For example, a general corpus may contain about 100 million words,
from which users can produce frequency lists, concordances, and so on. Specialized corpora support the
same analyses (e.g., frequency lists), but they focus on a specific area. As the book mentions, such
corpora might explore child language, teenage language, newspaper language, etc. Therefore, general
corpora usually contain more data than specialized corpora.
3. Why are corpora of written language much more common than those of spoken language? (see p. 94)
Ans: The most salient reason that written corpora are more common than spoken corpora is the way the
texts are converted into electronic form. For written corpora, creators only need scanners and software
to turn paper documents into electronic files. Creators of spoken corpora have no such convenient
equipment; they must do the more tiring work of transcribing natural speech on a computer and saving it
as electronic files. Spoken corpora can therefore be said to require one more step than written corpora.
However, written corpora are not entirely free of troublesome work: scanned paper documents still need
error correction and proofreading. Therefore, both spoken and written corpora have their own
difficulties, and from this perspective they go through similar steps.
4. According to pp. 94-95, what kinds of things can be encoded via markup/annotation/tagging?
Ans: Markup, annotation, and tagging enrich the information and value of a corpus and give users a fuller
understanding of the data. Without these techniques, a corpus can only be used to look for instances of a
word. Markup and annotation encode different kinds of features: markup encodes macro-level
characteristics, and annotation encodes micro-level features. Markup typically encodes structural
features of written texts such as titles, authors, places, and subheadings (i.e., background information
about the data). Markup thus adds information that helps users better understand the texts they are
viewing. Annotation and tagging provide more specific information. Tagging is in fact one kind of
annotation: the book's example is part-of-speech tagging, in which each lexical item is labeled with its
grammatical category. Those who need this information can therefore carry out more precise
investigations.
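To make the difference between markup and tagging concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the entry, its header fields, and the simple tag names (NOUN, VERB, etc.) are all invented for this example and are not taken from the book or from any real corpus.

```python
from collections import Counter

# Hypothetical corpus entry (all values invented for illustration).
# "header" plays the role of markup: information about the text as a whole.
# "tokens" carries annotation: a part-of-speech tag for each word.
entry = {
    "header": {"title": "Travel diary", "mode": "written", "author_age": 24},
    "tokens": [
        ("i", "PRON"), ("will", "MODAL"), ("book", "VERB"), ("a", "DET"),
        ("flight", "NOUN"), ("and", "CONJ"), ("read", "VERB"), ("a", "DET"),
        ("book", "NOUN"),
    ],
}

def words_with_tag(entry, tag):
    """Return the words in the entry that carry the given tag, in order."""
    return [word for word, word_tag in entry["tokens"] if word_tag == tag]

def tags_for_word(entry, word):
    """Count which tags a given word form receives in the entry."""
    return Counter(word_tag for w, word_tag in entry["tokens"] if w == word)

print(entry["header"]["mode"])        # markup: what kind of text this is
print(words_with_tag(entry, "NOUN"))  # annotation: all words tagged as nouns
print(tags_for_word(entry, "book"))   # 'book' occurs once as a verb, once as a noun
```

Without the tags, a search for 'book' would return both occurrences with no way to separate the verb from the noun; with them, a user can ask for one use only.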
5. What is the benefit of adding such markup/annotation/tagging to the raw texts in a corpus?
Ans: The three techniques share the same objective: they all provide additional information about the raw
texts to aid users' understanding. However, they do so from different angles (e.g., markup supplies
background information about a raw text, whereas annotation and tagging help users understand the
linguistic features of lexical items). With markup, corpus users can understand the raw texts more
thoroughly (e.g., by knowing the speakers' gender, age, mother tongue, and occupation in a spoken
corpus). Annotation and tagging then give users information about individual lexical items. Therefore,
whenever users are in doubt while browsing, they can look up the unfamiliar items to learn more about
them.
6. On p. 98 & 100, the authors mention 2-3 things that corpus analysis can tell us. What are they?
Ans: The authors mention a few things, such as frequency of occurrence, word lists, concordances, and
words that occur together. Frequency information shows how often a word is used by giving a count next
to the word, so it can also be used to compare the frequencies of two words. Word lists provide a helpful
guideline for teachers when they are deciding which words to teach. If a teacher has a hard time choosing
target words for teaching airport English, the teacher can consult corpora to find the words that are
frequent in that area. Concordances and co-occurrence information are related, but they have different
functions. A concordance presents how a word is used in different contexts, so users can see a list of
occurrences of the target word. Co-occurrence (collocation) information shows which words usually occur
together. This information is very helpful because language learners can look up unfamiliar words and
see what usually accompanies them.
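The three kinds of information described in this answer (word frequency, occurrences of a word in context, and words that occur together) can be sketched with a few lines of Python. This is a rough sketch under assumptions, not how any particular concordancing package works: the four sentences form an invented mini-corpus, and the function names `frequency_list`, `kwic`, and `collocates` are made up for this example.

```python
import re
from collections import Counter

# Hypothetical mini-corpus (invented sentences); a real corpus holds millions of words.
TEXTS = [
    "Please tell me where the boarding gate is.",
    "Can you tell me when the flight will board?",
    "I speak a little Japanese, so I can talk to the gate agent.",
    "They say the flight is delayed, so go to the gate later.",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(texts):
    """Count how often each word occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

def kwic(texts, keyword, width=3):
    """Key word in context: each hit with up to `width` words on either side."""
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{left:>25} [{token}] {right}")
    return lines

def collocates(texts, keyword, span=1):
    """Count the words found within `span` positions of the keyword."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == keyword:
                counts.update(tokens[max(0, i - span):i])
                counts.update(tokens[i + 1:i + 1 + span])
    return counts

freq = frequency_list(TEXTS)
print(freq.most_common(5))                        # frequency list
for line in kwic(TEXTS, "tell"):                  # concordance lines for 'tell'
    print(line)
print(collocates(TEXTS, "gate").most_common(3))   # neighbours of 'gate'
```

With only four sentences the counts mean little, but the same steps applied to a large corpus give the frequency lists and concordance listings discussed in the chapter.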
7. The authors reviewed a number of corpus research studies on pp. 100-101. What are the other specific
questions or topics you think would be interesting to investigate through corpus research?
Ans: I think investigating the language used in different types of written assignments would be
interesting and useful for college students. When I was a university freshman, I found that my peers and
I often used spoken-style words in our writing (e.g., compositions and short essays). Because we were
novice writers, we could not distinguish written language from spoken language. Though some words are
appropriate in both written and spoken contexts, others are more appropriate in only one. If a student
uses many spoken-style words in writing, the writing will read like a transcript instead of a
composition. Therefore, this topic can help novice writers better understand the vocabulary appropriate
to writing.
8. According to the authors, what are two general ways that corpora can be applied to language teaching?
(see p. 102)
Ans: Corpora can be applied to classroom language teaching in two ways. The first supports teaching from
the teacher's perspective. For example, teachers can use corpora as a reference to adjust and check their
teaching materials. If a teacher wants to teach spoken academic language for formal speeches, he/she can
use corpora to investigate the features and words that are frequent in formal speech. Corpora also save
teachers time because their data are vast: teachers can prepare a rich lesson and still have time to
design classroom activities. The second way focuses on learners' own interaction with corpora. However,
sufficient computer equipment is needed to bring learners into the corpus environment. If it is not
available, teachers can print out the corpus information for their students instead. For example, a
printout showing the uses of the word ‘perceive’ in various contexts, with the patterns associated with
it, helps solve the problem of insufficient equipment.
9. Which of the example activities (see pp. 102-103) seem most interesting to you? Why?
Ans: Many activities are mentioned in the chapter, such as frequency lists and collocational activities,
but I think concordancing is the most interesting one because of its complexity. A concordance lists the
occurrences of a word together with their surrounding contexts. I think concordancing is complex because
a word usually has several meanings, and several words can share a meaning to some degree (e.g., ‘speak’,
‘say’, ‘talk’, and ‘tell’ have similar meanings in Chinese). Though such words have similar meanings,
their uses are mostly different: the four words ‘speak’, ‘say’, ‘talk’, and ‘tell’ are used in different
contexts and for different purposes. Therefore, it is interesting to use corpus resources to investigate
the different usages of words with similar meanings. In this way, as a language learner, I can learn more
accurately how a word should be used (e.g., to express that I have ability in a language, I will use
‘speak’: “I can speak Japanese”).
10. Please note down two punch lines of this chapter.
(1). P. 101: Corpus-based studies of particular language features and comprehensive works such as The
Longman Grammar of Spoken and Written English (Biber et al., 1999) will also serve language teachers
well by providing a basis for deciding which language features and structures are important and also how
various features and structures are used.
(2). P. It is worth noting here that the use of concordancing tasks in the classroom is a matter of some
controversy- strongly advocated by those who favor an inductive or data-driven approach to learning
(Johns, 1994), but criticized by others who argue that it is difficult to guide students appropriately and
efficiently in the analysis of vast numbers of linguistic examples (Cook, 1998).
11. Please write down any questions or comments, if any, after doing the reading assignment.
If we (as teachers) want to incorporate language corpora into our teaching, we can introduce corpus
information such as frequency lists and usage patterns. But is it really necessary to present frequency
information together with newly introduced words? If so, how do language learners benefit from knowing
the frequency information?
Note: You don’t need to put answers here, but make sure that you:
(1) understand what frequency list, concordance listing, and KWIC refer to;
(2) try the hands-on activity and see the suggested answers in the back of the book.