Reflective essay 1– Corpus linguistics and

Reflective essay_1_Ling115
Jinxiao Song
Discussion on the first week: Introduction (Barlow, 2011; Norvig, 2011)
We discussed two articles in the first class. One is a presentation paper by Peter
Norvig from 2011; the other is Michael Barlow’s paper ‘Corpus linguistics and theoretical
linguistics’. Both articles prompted interesting discussion about the debate over, and the
relationship between, corpus linguistics and classical linguistics.

Corpus Linguistics
Corpus linguistics is the study of language as expressed in samples of real-world text. It
offers a digestive approach to deriving a set of abstract rules by which a natural
language is governed or by which it relates to another language. [1]

What research question does the article address?
The two articles discuss a debate between data-based linguistic research and the
classical, theory-based approach.
1) Barlow introduces some basic issues concerning the treatment of theory and data in
corpus linguistics and outlines three broad areas in which corpus linguistics has made a
significant contribution to our understanding of language: the provision of
frequency information, the highlighting of the importance of collocations, and the
description of variation and text types. The article also points out the drawbacks of
traditional categories of linguistic representation and suggests that corpus linguistics can
serve as a handy tool for uncovering differences inherent in language through data-based
approaches. The paper illustrates such differences with an overview of
British and American linguistic traditions, focusing on their connection with findings
from corpus analysis. In addition, Barlow raises several current theoretical issues:
abstractions and generalizations, patterns in the data versus patterns in the mind, and
the cognitive and social dimensions of language.
[1] http://en.wikipedia.org/wiki/Corpus_linguistics
2) Norvig’s presentation article is the more interesting of the two. His essay discusses
what Chomsky said, speculates on what he might have meant, and tries to determine the
truth and importance of his claims. Chomsky strongly objects to relying on statistical
algorithms as a tool for linguistic research and argues that statistical modeling is
unscientific as a method. In response to Chomsky’s criticism of statistical and
probabilistic NLP, Norvig clearly articulates his views on the philosophy of science and
on the relationship between science and engineering.
 Why do you find the research question interesting?
Corpus linguistics makes use of programs and machine-readable data collections
to help people analyze texts more conveniently. Various corpora have
proved helpful to linguistic research. For example, when we examine how a word’s
meaning has changed, we may ask when the word first came into use,
what contexts it appears in, and whether there is statistical evidence that shows its
historical change clearly. I realized the importance of corpora when I was learning to use
COCA, the Corpus of Contemporary American English. I wanted to know how the meaning of
the word ‘entrench’ has changed over time. I only needed to search for
[entrench], and the corpus returned a summarized list of forms such as
‘entrenched’ and ‘entrenching’, the token frequency for the years I chose, charts
showing which genres are more likely to use the word, and so on. This is useful,
convenient, and fast. Besides semantics, corpora can be used to study language
acquisition, spelling conventions, and syntactic change. It is hard to imagine how long
it would take a person to collect the same data and generate the same report by hand.
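To make the kind of report described above concrete, here is a minimal Python sketch. It does not use COCA or its interface: the four-sentence corpus, the function name lemma_report, and the use of a simple prefix match in place of real lemmatization are all assumptions made for illustration only.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus of (year, genre, text) records. This is NOT COCA data.
TOY_CORPUS = [
    (1995, "news", "The army began to entrench along the river."),
    (2005, "academic", "Entrenched interests resisted the reform."),
    (2005, "news", "The policy is entrenching old habits."),
    (2015, "spoken", "Those views are deeply entrenched by now."),
]


def lemma_report(corpus, stem):
    """Count word forms starting with `stem`, overall, per year, and per genre.

    Prefix matching is a crude stand-in for lemmatization (an assumption).
    """
    forms = Counter()
    by_year = defaultdict(int)
    by_genre = defaultdict(int)
    for year, genre, text in corpus:
        # Lowercase the text and keep alphabetic tokens only.
        for token in re.findall(r"[a-z]+", text.lower()):
            if token.startswith(stem):
                forms[token] += 1
                by_year[year] += 1
                by_genre[genre] += 1
    return forms, dict(by_year), dict(by_genre)


forms, by_year, by_genre = lemma_report(TOY_CORPUS, "entrench")
print(forms.most_common())  # word forms with their token counts
print(by_year)              # token frequency per year
print(by_genre)             # token frequency per genre
```

Even on four sentences the three tables mirror what a corpus interface reports: the forms of the word, its frequency over time, and its distribution across genres. A real corpus does the same counting over hundreds of millions of words.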
However, corpus data rest, at the most basic level, on human
understanding of linguistics. Chomsky’s theoretical approach has certainly contributed
to our understanding of the nature of language, but that does not mean a statistical
approach cannot help us understand language more deeply. At the same time, without our
intuitive judgment of what comes out of a statistical approach, corpus results can be
misleading. As the example in Norvig’s article points out, “no matter how many
repetitions of ‘ever’ you insert, two sentences are grammatical, two are not. A
probabilistic Markov chain model cannot handle all of English.”
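The point about repeated ‘ever’ can be illustrated with a small sketch. The code below trains a first-order Markov (bigram) model, the simplest case, on two hypothetical training sentences and then scores sentences in which the subject and verb either agree or disagree. The training sentences, the function names, and the choice of a bigram model without smoothing are all assumptions for illustration; they are not taken from Norvig’s article.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: one sentence per subject, with agreeing verbs.
TRAINING = [
    "I never ever ever fiddle",
    "She never ever ever fiddles",
]


def train_bigram(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def sentence_prob(counts, sentence):
    """Probability of a sentence under the unsmoothed bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        total = sum(counts[prev].values())
        if total == 0:
            return 0.0
        prob *= counts[prev][cur] / total
    return prob


def variants(n_ever):
    """Two grammatical and two ungrammatical sentences with n_ever 'ever's."""
    middle = " ".join(["never"] + ["ever"] * n_ever)
    return {
        "I fiddle (grammatical)": f"I {middle} fiddle",
        "She fiddles (grammatical)": f"She {middle} fiddles",
        "I fiddles (ungrammatical)": f"I {middle} fiddles",
        "She fiddle (ungrammatical)": f"She {middle} fiddle",
    }


model = train_bigram(TRAINING)
for label, sentence in variants(3).items():
    print(label, sentence_prob(model, sentence))
```

Because the model only looks one word back, the verb is predicted from the preceding ‘ever’ and never from the subject, so all four sentences receive the same score. A higher-order model pushes the problem further out but does not remove it once the number of ‘ever’s exceeds the model’s memory.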
 What conclusion about the question does the article draw?
Barlow concludes that the relationship between corpus linguistics and theoretical
linguistics is multifaceted, so there are numerous ways of approaching the topic.
Researchers will hold differing views on particular aspects of the current
issues and on the relationship between corpus data and linguistic theory. We can expect
progress within corpus linguistics as the size and range of corpora increase
and the theoretical frameworks become more sophisticated.
Norvig criticizes Chomsky’s view in a polite manner. He compares Chomsky’s
viewpoint to that of a Platonist, a rationalist, and perhaps a mystic. What Chomsky argues
for is ideal and abstract, which is why he is not interested in language performance.
Norvig’s point of view, by contrast, is more empirical and more useful in the real world.
Probabilistic and statistical approaches have achieved a great deal in search engines such
as Google, corpora, speech recognition, dictionaries, sociolinguistics, and more. We should
not dismiss the strengths of these approaches because of some errors and misleading
results, which most of us can still notice.
 Do you find the authors’ approach satisfactory? If not, how else would you do it?
Chomsky’s argument is not convincing to me. My own view is
that the resources and methods of statistical speech and language processing, rather than
being an alternative to, competitor with, or replacement for the scientific study of
speech, language, and communication, give us wonderful new tools for doing
science in this area. Tools certainly have drawbacks: even with a knife we might cut our
fingers. But should we therefore deny the wonderful artworks, foods, and sculptures we
have created with knives? What matters more is training ourselves to handle these tools
with more experienced, controlled methods.