Academic writing (updated on August 8)

Academic writing in the
Computational Linguistics
research area
Kees van Deemter
University of Aberdeen
Aim of these slides
Aim: to offer guidance to students at the 2011
HIT-MSRA Graduate Summerschool in Human
Language Technology (Harbin, China)
With thanks to the participants of the summer
school, and to Judith Masthoff, whose
comments led to various improvements
Plan of the talk
1. Paper reviewing: a game for anthropologists,
adapted for Computational Linguists
2. The peer review process
3. Some ultra-brief notes on writing style and
plagiarism
A game: Let’s review some papers
A game by the anthropologist Jim Moore (UCSD)
(adapted from his web site)
Moore’s Game
1) I think the sky is blue. The end.
BAD: no research; all opinion. Research is
about trying to learn more than what you
already know/think.
2) Jones (2004) says the sky is blue. The end.
POOR: If it were Jones (2004), Wong (2005),
and Dagosto (2010), then OK (though you
picked a dull topic!). This is a "report", a
dressed-up set of notes.
3) Jones (2004) says the sky is blue. I think this
is wrong because it looks kinda grey to me.
The end.
POOR: A hint of something interesting, but
just one person's opinion. Arguments need to
be supported by data and/or explicit logic.
4) Smith (1902) says the sky is yellow. Jones
(2004) says it is blue. The end.
BAD: no attempt at resolution of obvious
conflict. (Another "report")
5) Smith (1902) says the sky is yellow. Jones
(2004) says it is blue. Experts disagree, but
that can be explained because they were
writing 102 years apart. The end.
Unconvincing: The author attempts to resolve
the conflict, but the resolution looks silly. (The
sky was yellow in 1902??)
6) Smith (1902) says the sky is yellow. Jones
(2004) says it is blue. Doe (1967) describes a
neurological disorder affecting people who spend
too much time on Black's Beach. This disorder
reverses colors so that one perceives "yellow"
when looking at blue objects. Since Smith lived in
La Jolla (Who's Who in LJ, 1910), she may have
been suffering from this disorder. The end.
GREAT: Given a paradox, the author found
relevant material, and suggested a resolution.
• So much for anthropology …
How about Computational Linguistics?
1) I have an algorithm X for computing C.
The end.
Assume: C is a worthwhile problem
Author explains X well
Poor. X may be great, but we don’t know!
2) I have an algorithm X for computing C. Its Y
score is Z. The end.
Poor. How can readers know whether Z is high or low?
3) I have an algorithm X for computing C. Its Y
score is Z. Algorithm X’ has a Y score of Z-α.
The end.
Assume: α is substantial
X’ is as computationally efficient as
X
Better. Comparison with another algorithm.
But is X’ the best algorithm to compare X
with? Or is it just a straw man?
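The comparison in item 3 can be made concrete with a small sketch. It assumes, purely for illustration, that the Y score is plain accuracy and that X and X’ are evaluated on the same test items. The function name `accuracy` and the toy labels are invented here and do not come from the slides.

```python
def accuracy(predictions, gold):
    """Fraction of items where the prediction matches the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("need at least one test item")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Toy data (invented): gold labels and the outputs of X and X'.
gold = ["N", "V", "N", "A", "V", "N", "N", "V"]
x_out = ["N", "V", "N", "A", "V", "N", "V", "V"]        # algorithm X
x_prime_out = ["N", "V", "V", "A", "N", "N", "V", "V"]  # baseline X'

z = accuracy(x_out, gold)                 # Y score of X
z_baseline = accuracy(x_prime_out, gold)  # Y score of X'
alpha = z - z_baseline                    # the gap reported in the paper
print(f"X: {z:.3f}  X': {z_baseline:.3f}  alpha: {alpha:.3f}")
```

Scoring both algorithms with one function on one test set is what makes the two numbers comparable. Whether the resulting gap counts as substantial is a separate question that this sketch does not answer.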
4) I have an algorithm X for computing C. Its Y
score is Z. The standard algorithm X’ (Carroll
et al. 2010) has a Y score of Z-α. The end.
Getting there! (A standard algorithm makes a
good basis for comparison.) This could give
you a nice conference paper. – But why does
X have a better Y score than X’?
5) I have an algorithm X for computing C. Its Y
score is Z. The standard algorithm X’ (Carroll
et al. 2010) has a Y score of Z-α. This is
probably because X takes U into account,
since U is known to be relevant for C
(Watanabe 1996). The end.
Good! – But does the Y score matter?
6) I have an algorithm X for computing C. Its Y
score is Z. The standard algorithm X’ (Carroll et
al. 2010) has a Y score of Z-α. This is probably
because X takes U into account, since U is known
to be relevant for C (Watanabe 1996). The Y
score is known to correlate well with human
judgments (Zhao 2010) and human task
performance (Yang 2011). (…) The end.
Very good! Potentially, you’ve got yourself the core of
a journal article.
Still, maybe X is a boring variant of X’? Moreover, is there any
generality to what you’ve found, or is X only applicable to problem C?
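The claim in item 6 that the evaluation score correlates with human judgments is something one can check. A minimal sketch, assuming the Pearson coefficient is the correlation measure used (the slides do not say which one is meant); all numbers below are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented numbers: metric scores for five system outputs and the
# mean human rating (1-5 scale) given to the same outputs.
metric_scores = [0.2, 0.4, 0.5, 0.7, 0.9]
human_ratings = [1.5, 2.0, 3.0, 3.5, 4.5]

r = pearson(metric_scores, human_ratings)
print(f"Pearson r = {r:.2f}")
```

A value of r close to 1 would support using the metric as a stand-in for human judgments. A value near 0 would suggest that improvements on the metric mean little to readers.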
7) I have an algorithm X for computing C. Its Y score is Z. The
standard algorithm X’ (Carroll et al. 2010) has a Y score of
Z-α. This is probably because X takes U into account, since
U is known to be relevant for C (Watanabe 1996). The Y
score is known to correlate well with human judgments
(Zhao 2010) and task performance (Yang 2011). X uses an
entirely new method, which I also show to be applicable to
an important open problem in physics. (Applications to
cures for obesity and hair loss are plausible.) The end.
Great! The resulting journal article
could change your research area.
• In practice, most of us are happy if we reach
level 6, of course.
(And as for obesity and hair loss: don’t believe
everything you read.)
2. The peer review process
Peer review: the main mechanism for deciding whether
a result is worth publishing (e.g., as a journal article)
(1) Authors submit article
(2) Editors select expert reviewers (“peers”)
(3) Reviewers assess article
(4) Editors decide: accept/reject/revise
If revise then authors may go back to (1)
Conference paper submissions usually lack the “revise” option
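Steps (1) to (4) form a loop that ends on accept or reject. The sketch below is one hypothetical way to model that loop; the names `Decision` and `review_cycle` are invented for illustration, and real journals differ in how many revision rounds they allow.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    REVISE = "revise"


def review_cycle(decisions, is_conference=False):
    """Walk through editorial decisions, one per submission round.

    Returns (final_decision, rounds_used). A "revise" decision sends the
    authors back to step (1). Conferences are modelled as having no
    "revise" option, as on the slide.
    """
    rounds = 0
    for decision in decisions:
        rounds += 1  # steps (1)-(4): submit, pick reviewers, assess, decide
        if decision is Decision.REVISE:
            if is_conference:
                raise ValueError("conference submissions have no 'revise' option")
            continue  # authors revise and resubmit
        return decision, rounds
    # Ran out of decisions while the paper was still under revision.
    return Decision.REVISE, rounds


outcome, rounds = review_cycle([Decision.REVISE, Decision.REVISE, Decision.ACCEPT])
print(outcome.value, rounds)  # accept 3
```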
A review form
*** REVIEW:
--- Please provide a detailed review, including justification for
--- your scores. This review will be sent to the authors unless
--- the PC chairs decide not to do so. This field is required.
….. (this is where you write a few paragraphs about the strengths and weaknesses of the paper,
with suggestions for improvement)
…..
--------------------------------------------------------------
*** REMARKS FOR THE PROGRAMME COMMITTEE:
--- If you wish to add any remarks for PC members, please write
--- them below. These remarks will only be used during the PC
--- meeting. They will not be sent to the authors. This field is
--- optional.
….. (quite often you’ll leave this section blank)
--------------------------------------------------------------
Review form (ctd.)
*** OVERALL EVALUATION:
---------------
3 (strong accept)
2 (accept)
1 (weak accept)
0 (borderline paper)
-1 (weak reject)
-2 (reject)
-3 (strong reject)
*** REVIEWER'S CONFIDENCE:
-----------
4 (expert)
3 (high)
2 (medium)
1 (low)
0 (null)
*** APPROPRIATENESS: from 1 (lowest) to 5 (highest)
-----------
5 (appropriate (most submissions))
4 (appropriate for CL/NLP but less so for this event)
3 (possibly relevant, but not quite computational)
2 (only marginally relevant)
1 (inappropriate)
*** CLARITY: from 1 (lowest) to 5 (highest)
-----------
5 (admirably clear)
4 (understandable by most)
3 (mostly understandable with effort)
2 (important unanswered questions)
1 (mostly confusing)
Review form (ctd.)
*** ORIGINALITY: from 1 (lowest) to 5 (highest)
-----------
5 (surprising: new and noteworthy insights)
4 (creative: relatively few would have put these ideas together)
3 (somewhat conventional: a number of people could have come up with this)
2 (rather boring: a minor improvement on previous work)
1 (not new)
*** SOUNDNESS: from 1 (lowest) to 5 (highest)
-----------
5 (excellent: approach is apt, claims are supported)
4 (solid: approach or evaluation could be strengthened though)
3 (fair: main claims probably correct)
2 (troublesome: should have been done differently, or justified better)
1 (fatally flawed)
*** IMPACT: from 1 (lowest) to 5 (highest)
-----------
5 (major: will alter others' research direction or basic approach)
4 (substantial: ideas or results will help others' ongoing research)
3 (somewhat influential: will be cited for comparison or as a minor contribution)
2 (marginal: might be cited)
1 (none)
*** END
--------------------------------------------------------------
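The form above is, in effect, a small data structure: one required text field, one optional text field, and seven scores with fixed ranges. A minimal sketch of how such a form could be represented and checked follows. The class name `Review` and the method `problems` are hypothetical; only the field names and score ranges are taken from the form.

```python
from dataclasses import dataclass

# Allowed ranges, copied from the form: (lowest, highest), inclusive.
SCORE_RANGES = {
    "overall": (-3, 3),
    "confidence": (0, 4),
    "appropriateness": (1, 5),
    "clarity": (1, 5),
    "originality": (1, 5),
    "soundness": (1, 5),
    "impact": (1, 5),
}


@dataclass
class Review:
    text: str  # required free-text review
    overall: int
    confidence: int
    appropriateness: int
    clarity: int
    originality: int
    soundness: int
    impact: int
    pc_remarks: str = ""  # optional, not sent to the authors

    def problems(self):
        """Return a list of readable problems; empty if the form is valid."""
        found = []
        if not self.text.strip():
            found.append("review text is required")
        for name, (low, high) in SCORE_RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                found.append(f"{name} must be between {low} and {high}, got {value}")
        return found


review = Review(text="Clear paper, weak evaluation.", overall=1, confidence=3,
                appropriateness=5, clarity=4, originality=3, soundness=3, impact=6)
print(review.problems())
```

The example deliberately contains one out-of-range score (impact), so the printed list contains a single problem.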
3. Briefly:
How about the actual writing?
Excerpt from
W. Strunk’s “The Elements of Style”:
http://www.bartleby.com/141/
(with help from a web page of the University of
Central Kansas)
1. Use simple and direct sentences. Usually, the
active voice is best. Write "Good writers avoid
passive voice," not "The passive voice is avoided
by good writers."
(Possible exception: when the subject of the
sentence is irrelevant. In some disciplines, most
academics prefer “A study was performed” over
“We performed a study”. There is evidence,
however, that agent-less passives are difficult to
interpret.)
2. Write positively. For example, write "The
rats were always sick" instead of "The rats
were never healthy." Use definite and specific
sentences. For example, write "it rained every
day for a week" instead of "a period of
unfavorable growth conditions set in.”
3. Delete unnecessary words. For example:
the question as to whether ......................... whether
the main aim of this study will be to determine .... we seek to determine
advance notice ..................................... notice
at this point in time .............................. now
be that as it may .................................. but
in the event that .................................. if
general consensus .................................. consensus
due to the fact that ............................... because
chemotherapeutic agent ............................. drug
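The list above is a lookup table from wordy phrases to concise ones, so it can be applied mechanically. The following is a rough sketch, not a real style checker: the function name `tighten` is invented, and the code ignores capitalisation at the start of a sentence.

```python
import re

# Phrase pairs copied from the list above: wordy -> concise.
REPLACEMENTS = {
    "the question as to whether": "whether",
    "the main aim of this study will be to determine": "we seek to determine",
    "advance notice": "notice",
    "at this point in time": "now",
    "be that as it may": "but",
    "in the event that": "if",
    "general consensus": "consensus",
    "due to the fact that": "because",
    "chemotherapeutic agent": "drug",
}

# One case-insensitive pattern; longer phrases come first so that they
# win over any shorter phrase they might contain.
_PHRASES = sorted(REPLACEMENTS, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in _PHRASES) + r")\b",
    re.IGNORECASE,
)


def tighten(text):
    """Replace each wordy phrase with its concise counterpart."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(0).lower()], text)


print(tighten("We stopped due to the fact that there was no general consensus."))
```

A tool like this only catches the phrases it has been given. Deciding which words are unnecessary in a new sentence remains the writer's job.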
4. Each paragraph should convey a single
major idea and have a topic sentence. The
topic sentence should state the main idea of
the paragraph.
Comment: This is just one technique. Others
may work equally well.
5. Proofread your report before finalizing it.
Have a friend or lab partner read a draft of
your writing and suggest improvements.
6. Don't plagiarize. Learn to summarize and
cite the relevant references.
Plagiarism is “taking someone else’s work and
passing it off as one’s own”
• There is a grey area. I got my definition from the
Mac’s Dictionary application. Do I have to
acknowledge this?
• If you take someone else’s ideas then (try to) say
who had them first
• If you also take someone else’s words verbatim
(for more than just a few words) then
put quotes around the text as well
– A grey area: “just a few words”
Caveat
These remarks on writing style are very
sketchy. A course might be useful for you.
Practice is essential.
Summing up
• Paper writing: there’s more to a good paper
than an algorithm and a score
• Peer review: a cornerstone of the scientific
process. An important process to understand.
Get involved early, as an author and a reviewer
• Writing style: look up W. Strunk’s “Elements of
Style” on the web.
The End