Predicting Text Quality for Scientific Articles
Annie Louis
University of Pennsylvania
Department of Computer and Information Science
Philadelphia, PA 19104
lannie@seas.upenn.edu
Abstract
My work aims to build a system to automatically predict the writing quality of scientific articles from two genres: academic publications and science journalism. Our goal is to employ these predictions in article recommendation systems and to provide feedback during writing.
Introduction
For people in academia and research institutions, writing is an important and regular activity: producing journal articles, grants, patent applications, and reports to funding agencies. However, scientists and students receive very little instruction on science writing. Tools which provide automated feedback on writing can greatly aid academic writers by giving quick and ready comments. Another mode of science writing is news, where journalists recount research findings and their impact. Here, identifying well-written and interesting articles could help to develop article recommendation systems. Such applications would depend on identifying linguistic properties of texts which are correlated with how readers perceive their quality. But a comprehensive computational study of science writing quality has not been done so far.

In my thesis, I aim to identify indicators of writing quality for scientific articles. Specifically, I consider two forms of science reporting: academic writing in journal publications and science journalism, where findings from academia and industry are explained in lay terms. Using a corpus of quality ratings obtained from the respective target audiences, I will study which linguistic properties of the texts best predict the ratings. I will investigate the use of these predictions in two applications: authoring tools for academic writing and an article recommendation service for science news.

My research will have a wider impact than the specific tasks I explore in my thesis. Typically, information retrieval for scientific articles is guided by citation analysis and the impact factor of journals. For news writing, only relevance and recency are considered. Article rankings which also incorporate writing quality will significantly improve the user’s browsing experience. Further, we can also apply our predictions to automatically evaluate the linguistic quality of systems which produce summaries of science articles.

Corpus of Good and Poor Science Writing
For journal articles, citations and publication venue are
good indicators of the quality of the content in these articles but are not directly associated with writing quality. In
my prior work, I have analyzed factors related to quality
of content and those affecting linguistic quality for summaries of generic news articles (Louis and Nenkova 2009;
Pitler, Louis, and Nenkova 2010). But for science writing,
the correlates of linguistic quality have not been explored so
far. To start this line of research, I will build the first large-scale corpus of science articles with text quality ratings.
We plan to use journal articles from different domains
within biology for our corpus of academic writing. I will
ask graduate students in these areas to annotate the writing
quality of these articles. I have performed a pilot annotation
on 17 articles from the ACL conference proceedings on the
topic of Machine Translation. Each article was rated on a
scale of 1 to 5 for individual sections: abstract, introduction,
related work, and conclusion. Around 50% of the abstracts
were rated poor and 30% of the other sections also had low
ratings. So the quality of writing does vary considerably.
For the news domain, I have collected a corpus of good
science writing in the form of New York Times articles
which were selected to appear in the ‘Best American Science Writing’ series. These books, published annually since 1999, collect newspaper articles with excellent science writing. This set of 65 well-written articles forms our development data for analysis. Later, we plan
to collect ratings on more articles to obtain examples of both
good and poor writing. I will also ask people to provide finer-grained ratings for the interest level of the lead, the background knowledge required, the explanation of the research, and the storyline.
Automatic Prediction of Text Quality
Good academic writing involves a number of skills: the ability to compare ideas with other work, motivate the research,
substantiate and defend claims, and clearly detail the experiments. Further, the writing must be non-verbose and direct.
In news writing, several other factors also come into play.
The research should be reported in a plain manner and made
interesting for lay readers. Writers should also provide adequate background knowledge and employ comparison with
known items to explain the research.
I aim to identify measurable indicators of such properties
related to good writing and use them to build an automatic
predictor of text quality. Some of the specific tasks I plan to address are the identification of general versus specific sentences, the identification of rhetorical zones, the prediction of verbosity, the detection of visual and concrete words, and the analysis of article topic.
The idea of general and specific sentences in writing relates to the process of presenting a claim and then substantiating it with more details. In fact, it has been observed that
in academic writing, articles take the shape of an hour-glass
with the introduction and conclusion presenting general material, and the experiment and methods sections containing a
lot of details (Swales and Feak 1994). A measure of the distribution of these two types of sentences in an article could
be indicative of quality. But no annotations for such sentences are available, so I have built a classifier by making use of discourse relations (Louis and Nenkova 2011). In the Penn Discourse Treebank corpus, there are annotations for Instantiation-type discourse relations, which relate
one general and one specific sentence. Training on these sentences with features based on words, polarity, language models, and specificity, our classifier reaches 75% accuracy in labeling a sentence as general or specific.
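As a rough illustration only (not the actual system), the following Python sketch trains a bag-of-words classifier on a handful of invented sentences standing in for the PDTB Instantiation pairs; the real features also include polarity, language models, and specificity.

# Illustrative general-vs-specific classifier. The training sentences and
# labels below are toy stand-ins for sentence pairs drawn from PDTB
# Instantiation relations (first argument general, second specific).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "The results were surprising.",                                    # general
    "Accuracy rose from 61% to 75% on the held-out set.",              # specific
    "Several factors affect performance.",                             # general
    "Sentence length and word polarity were the strongest features.",  # specific
]
labels = [1, 0, 1, 0]  # 1 = general, 0 = specific

# Bag-of-words stands in for the richer word, polarity, language-model,
# and specificity features described above.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(sentences, labels)
print(model.predict(["The method works well.",
                     "It was trained on 10,000 annotated pairs."]))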
Another approach is to analyze the different rhetorical
zones in an article: problem definition, comparisons with
other approaches and motivation for the proposed approach.
In prior work, I have built discourse parsers based on semantic features to identify rhetorical relations in general news
articles (Pitler, Louis, and Nenkova 2009). I plan to explore
similar methods for zone identification in science articles. I
will then analyze how the size and location of these zones
are correlated with quality. Further, we can build a Markov
chain to record the succession of zones in well-written articles and use it to predict shifts which lead to poor quality.
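To make the Markov chain idea concrete, here is a minimal sketch under assumed zone labels (the zone names and training sequences are hypothetical): transition probabilities are estimated from well-written articles, and low-probability shifts in a draft become candidates for feedback.

# Minimal first-order Markov chain over rhetorical zone labels.
from collections import Counter, defaultdict
import math

# Hypothetical zone sequences from well-written articles.
train_sequences = [
    ["problem", "motivation", "comparison", "approach"],
    ["problem", "motivation", "approach", "comparison"],
]

transitions = defaultdict(Counter)
for seq in train_sequences:
    for prev, cur in zip(seq, seq[1:]):
        transitions[prev][cur] += 1

ZONES = 4  # size of the zone label set

def shift_log_prob(prev, cur, alpha=1.0):
    # Add-one smoothing so unseen shifts get a small nonzero probability.
    counts = transitions[prev]
    return math.log((counts[cur] + alpha) / (sum(counts.values()) + alpha * ZONES))

# Score each zone shift in a draft; very low values flag unusual shifts.
draft = ["problem", "comparison", "motivation", "approach"]
for prev, cur in zip(draft, draft[1:]):
    print(prev, "->", cur, round(shift_log_prob(prev, cur), 3))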
I am also building a method to identify verbose sentences. Using aligned uncompressed and compressed sentences, I have identified which syntactic productions have a
high probability of deletion. With this information, I will score each sentence by the deletion probabilities of the productions it contains; sentences that score high under this model are considered more verbose.
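A small sketch of this scoring scheme, with invented productions and deletion probabilities (in practice both would be estimated from the aligned uncompressed and compressed sentence pairs):

# Verbosity scoring from syntactic production deletion probabilities.
# All values below are made-up placeholders.
deletion_prob = {
    ("NP", "DT JJ NN"): 0.62,
    ("NP", "DT NN"): 0.18,
    ("VP", "VB NP"): 0.10,
    ("ADVP", "RB"): 0.71,
}

def verbosity_score(productions):
    # Average deletion probability over a sentence's productions; higher
    # scores mean more deletable, hence more verbose, material.
    probs = [deletion_prob.get(p, 0.0) for p in productions]
    return sum(probs) / len(probs) if probs else 0.0

sentence_productions = [("NP", "DT JJ NN"), ("ADVP", "RB"), ("VP", "VB NP")]
print(round(verbosity_score(sentence_productions), 3))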
Apart from the above generic indicators of quality, my
preliminary studies have pointed towards a few other properties which are unique to science writing in the news.
Better articles employ visual words which evoke images
in the minds of the readers. Similarly, words that have an ‘audible’ characteristic, for example, “swooshes” and “rattle”, are also appealing when used in articles. Further, concrete words would be more desirable than abstract ones
such as “important” and “significant”. For these three dimensions, I will learn lexicons over a large corpus of news
using some seed words and a label propagation mechanism
to find similar words.
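The following toy sketch shows label propagation of this kind over a tiny hand-built word similarity graph; the words, similarities, and seed labels are all invented, and a real lexicon would be learned from distributional similarities over a large news corpus.

# Toy label propagation for lexicon learning.
import numpy as np

words = ["glitter", "shimmer", "swoosh", "rattle", "important", "significant"]
seeds = {"glitter": 0, "swoosh": 1, "important": 2}  # visual / audible / abstract

# Symmetric word similarity matrix (hypothetical values).
S = np.array([
    [1.0, 0.9, 0.1, 0.1, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.1, 0.0, 0.0],
    [0.1, 0.1, 1.0, 0.8, 0.0, 0.0],
    [0.1, 0.1, 0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.0, 0.0, 0.9, 1.0],
])

P = S / S.sum(axis=1, keepdims=True)  # row-normalized transition matrix
F = np.zeros((len(words), 3))         # label scores per word
for w, c in seeds.items():
    F[words.index(w), c] = 1.0

for _ in range(20):
    F = P @ F
    # Clamp seed words back to their known labels after each step.
    for w, c in seeds.items():
        F[words.index(w)] = 0.0
        F[words.index(w), c] = 1.0

for w, row in zip(words, F):
    print(w, "->", ["visual", "audible", "abstract"][int(row.argmax())])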
Certain types of sentence constructions also appear to be
suited to evoking surprise and creating interest. I have noticed such a trend particularly in lead paragraphs. By examining how the syntax of lead sentences differs from that of typical sentences elsewhere in the article, I will obtain some syntactic correlates of quality writing.
In addition, certain topics, when presented in the news, tend
to generate more interest. For example, articles on medicine
and health comprise the majority of well-written articles in
our corpus. So I will analyze the correlation of topic with
text quality.
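As a trivial sketch of such an analysis (with invented topics and ratings), one could compare mean quality ratings per topic:

# Mean quality rating per topic; the data here are placeholders.
from collections import defaultdict

ratings = [("medicine", 4.5), ("medicine", 4.0), ("physics", 3.0),
           ("physics", 3.5), ("medicine", 4.2)]

by_topic = defaultdict(list)
for topic, score in ratings:
    by_topic[topic].append(score)

for topic, scores in sorted(by_topic.items()):
    print(topic, round(sum(scores) / len(scores), 2))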
Authoring Tools and Article Recommendation
A straightforward application of our predictions would be
in the context of news article recommendation. To evaluate this, I plan to conduct a user study in a browsing scenario. The user will be asked to identify topics they are
interested in. Then we can provide articles on the topic ordered by our writing quality predictions and compare the
reported user experience to an interface where articles are
chosen based only on relevance to the topic.
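One simple way to combine the two signals in such an interface, sketched here with invented articles, scores, and an arbitrary mixing weight, is to rerank topic-relevant articles by a linear interpolation of relevance and predicted quality:

# Hypothetical reranking of topic-relevant articles by predicted quality.
articles = [
    {"title": "Gene therapy advance", "relevance": 0.91, "quality": 0.40},
    {"title": "New exoplanet found", "relevance": 0.88, "quality": 0.85},
    {"title": "Vaccine trial update", "relevance": 0.80, "quality": 0.90},
]

def rerank(items, quality_weight=0.5):
    # Linear interpolation of relevance and predicted writing quality.
    def score(a):
        return (1 - quality_weight) * a["relevance"] + quality_weight * a["quality"]
    return sorted(items, key=score, reverse=True)

for article in rerank(articles):
    print(article["title"])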
I will also develop a tool to provide authoring support for
academic writing. Simply providing ratings for sections will
be less useful in this setting. Rather, the tool will provide
annotations on drafts based on different factors that are correlated with writing quality. For example, we can highlight
general sentences which are not followed by proper substantiation, offer suggestions for reorganizing by tracking the
layout of different zones and identify inadequate analyses
by recording the sizes of the rhetorical zones. These annotations would serve as comments that are quick, readily available, and consistent.
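As one example, the substantiation check might look like the sketch below, where is_general stands in for the trained general-versus-specific classifier (the stub here simply treats sentences without numbers as general):

# Flag general sentences that are not followed by a specific sentence.
def is_general(sentence):
    # Stub for the trained classifier: no digits -> treat as general.
    return not any(ch.isdigit() for ch in sentence)

def flag_unsubstantiated(sentences):
    flags = []
    for i, sent in enumerate(sentences):
        nxt = sentences[i + 1] if i + 1 < len(sentences) else None
        if is_general(sent) and (nxt is None or is_general(nxt)):
            flags.append(i)  # general claim with no specific follow-up
    return flags

draft = [
    "Our method improves accuracy.",
    "It is also much faster.",
    "Training takes 3 hours on 10,000 examples.",
]
print(flag_unsubstantiated(draft))  # -> [0]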
Current and Future Work
I have collected news articles and publications for our experiments and am preparing to conduct the annotations. I have
built the general versus specific classifier. I have also experimentally validated some indicators of quality for science
journalism. The studies on academic writing and authoring support are mostly future work.
References
Louis, A., and Nenkova, A. 2009. Automatically evaluating
content selection in summarization without human models.
In Proceedings of EMNLP, 306–314.
Louis, A., and Nenkova, A. 2011. General versus specific
sentences: automatic identification and application to analysis of news summaries. Technical Report MS-CIS-11-07,
University of Pennsylvania.
Pitler, E.; Louis, A.; and Nenkova, A. 2009. Automatic
sense prediction for implicit discourse relations in text. In
Proceedings of ACL-IJCNLP, 683–691.
Pitler, E.; Louis, A.; and Nenkova, A. 2010. Automatic
evaluation of linguistic quality in multi-document summarization. In Proceedings of ACL, 544–554.
Swales, J. M., and Feak, C. 1994. Academic writing for
graduate students: A course for non-native speakers of English. Ann Arbor: University of Michigan Press.