PURPOSE - Corpuslg

ELC
2010
Lexico-Grammatical Patterns
in English Scientific Abstracts: presenting
the research’s purposes and results
Carmen Dayrell
Arnaldo Candido Jr.
Stella Tagnin
Sandra Aluísio
DLM
ICMC / NILC
Context
English for Academic Purposes
• Academic communication poses real challenges for novice researchers (Hyland 2009: ix)
• Demands are heavier for non-native speakers of English (Hyland 2009: 5, Milton and Hyland 1999, Vold 2006)
Difficulties relate to:
• lexical and syntactical features of the target genre
• rhetorical motivations behind linguistic choices
• disciplinary variation
• cultural differences across languages
Context
Local Context
• Courses on English academic writing
• Writing tools for non-native speakers of English
Assist graduate students to write
scientific papers in English
Context
Courses on English Academic Writing
2004 to 2010
• USP
  Department of Physics (IFSC)
  Department of Pharmaceutical Sciences (FCF)
  Department of Computer Science (ICMC)
• UNESP
  IBILCE
  Dentistry and Biology/Genetics
• UFSCar
  Department of Biology/Genetics
Context
Writing tools: Scipo-Farmácia
(http://www.nilc.icmc.usp.br/scipo-farmacia/)
Abstracts
Background
Gap
Purpose
Methodology
Results
Conclusion
Context
Writing tools: Scipo-Farmácia
(http://www.nilc.icmc.usp.br/scipo-farmacia/)
Examples from
published abstracts
Context
Why Abstracts?
Relevant in various academic contexts
However … (Swales & Feak 2009: xiii)
Constructing an efficient, clear abstract is
a fairly difficult task, even for experienced
and widely published writers
In Brazil:
Abstracts are part of most research
papers written in Portuguese as well as
PhD and master’s dissertations
Purpose
General Objective
Investigate the potential differences
between English abstracts written by
Brazilian graduate students vis-à-vis
abstracts taken from published
papers from the same disciplines
Purpose
Aim of this study
To investigate the recurring
lexico-grammatical patterns
used for presenting either the
purposes or results of the research
Purpose
Rhetorical ‘moves’ in abstracts
Swales and Feak (2009: 5)
Background / Introduction
→ Purpose
Methods / Materials / Subjects / Procedures
→ Results / Findings
Discussion / Conclusion / Implications / Recommendations
Purpose
Lexico-grammatical patterns
Pattern: The AIM of this / the / the present STUDY
AIM: aim, purpose, objective, goal, aims, objectives, purposes
STUDY: study, work, investigation, article, research, project, paper
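A pattern of this kind can be searched for with a regular expression. The sketch below is only an illustration: the slot fillers are copied from the slide, but the regex, the function name `find_purpose_patterns` and the sample sentences are assumptions, not the procedure used in the study.

```python
import re

# Slot fillers copied from the slide; the regex itself is an illustrative assumption.
AIM = r"(?:aims?|purposes?|objectives?|goal)"
DET = r"(?:this|the present|the)"
STUDY = r"(?:study|work|investigation|article|research|project|paper)"

PURPOSE_PATTERN = re.compile(
    rf"\bthe\s+{AIM}\s+of\s+{DET}\s+{STUDY}\b", re.IGNORECASE
)


def find_purpose_patterns(text):
    """Return every 'The AIM of this STUDY' realisation found in text."""
    return [match.group(0) for match in PURPOSE_PATTERN.finditer(text)]


# Hypothetical sentences, written for this example only.
sample = (
    "The aim of this study was to evaluate enamel wear. "
    "The main objective of the present work is skipped because of 'main'. "
    "The objectives of the present paper are twofold."
)
print(find_purpose_patterns(sample))
```

A strict regex like this misses realisations with intervening modifiers (such as "main"), which is one reason the surrounding context still has to be inspected by hand.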
Corpora
Student Abstracts
Corpus   Area                                Abstracts   Tokens   Average Number Words (ANW)
ST-EXA   Physical Sciences and Engineering   169         34,151   202
ST-BIO   Life and Health Sciences            138         27,911   202
Corpora
Student Abstracts
Physical Sciences and Engineering (ST-EXA)
Discipline            # texts
Physics               85
Computing             46
Earth Sciences        20
Engineering           18
Total                 169

Life and Health Sciences (ST-BIO)
Discipline            # texts
Dentistry             47
Pharmaceutical Scs.   39
Biology               21
Biophysics            21
Bioengineering        5
Biomedical Scs.       5
Total                 138
Corpora
English Abstracts
Physical Sciences and Engineering
Discipline            ST    PB
Physics               85    425
Computing             46    230
Earth Sciences        20    100
Engineering           18    90
Total                 169   845

Life and Health Sciences
Discipline            ST    PB
Dentistry             47    235
Pharmaceutical Scs.   39    195
Biology               21    105
Biophysics            21    105
Bioengineering        5     25
Biomedical Scs.       5     25
Total                 138   690
Corpora
Published Abstracts
Corpus   Area                                Abstracts   Tokens    Average Number Words (ANW)
PB-EXA   Physical Sciences and Engineering   845         139,591   165
PB-BIO   Life and Health Sciences            690         159,940   231
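The ANW figures reported for the four corpora can be reproduced from the abstract and token counts. The sketch below assumes that ANW is simply tokens divided by abstracts, truncated to a whole number; the slides do not state how the figure was computed, and the function name is hypothetical.

```python
# (abstracts, tokens) as reported on the corpus slides.
CORPORA = {
    "ST-EXA": (169, 34151),
    "ST-BIO": (138, 27911),
    "PB-EXA": (845, 139591),
    "PB-BIO": (690, 159940),
}


def average_number_of_words(abstracts, tokens):
    # Assumption: ANW = tokens / abstracts, truncated to a whole number.
    return tokens // abstracts


anw = {name: average_number_of_words(*counts) for name, counts in CORPORA.items()}
print(anw)
```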
Corpora
Published Abstracts
• Taken from papers published in various leading academic journals (CAPES - QUALIS A)
• Preference given to authors affiliated to universities in English-speaking countries
Methods
Methodology
1. Identification of rhetorical moves
2. Identification and comparison of
lexico-grammatical patterns in
‘purposes’ and ‘results’
Methods
1. Identifying Rhetorical Moves
a) Automatic tagging: AZEA (Argumentative Zoning for English Abstracts) (Genovês et al. 2007)
• a corpus-based machine learning system
• PURPOSE: to automatically identify components of the schematic structure of scientific abstracts in English: Background, Gap, Purpose, Methodology, Result, Conclusion
• AZEA achieved 80.4% accuracy (kappa 0.73) using a very small training corpus
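Accuracy and kappa are both computed by comparing the tagger's labels with a reference labelling. The sketch below shows the standard Cohen's kappa calculation on a handful of hypothetical labels; it does not reproduce AZEA's evaluation data, and the function name is an assumption.

```python
from collections import Counter


def accuracy_and_kappa(gold, predicted):
    """Observed agreement and Cohen's kappa for two equal-length label lists."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("label lists must be non-empty and equally long")
    n = len(gold)
    observed = sum(g == p for g, p in zip(gold, predicted)) / n
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Chance agreement: both labellings pick the same label independently.
    expected = sum(
        gold_counts[label] * pred_counts[label] for label in gold_counts
    ) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return observed, (observed - expected) / (1 - expected)


# Hypothetical labels for a six-sentence abstract.
gold = ["background", "purpose", "methodology", "result", "result", "conclusion"]
predicted = ["background", "purpose", "result", "result", "result", "conclusion"]
observed, kappa = accuracy_and_kappa(gold, predicted)
print(f"accuracy={observed:.3f} kappa={kappa:.3f}")
```

Kappa is lower than raw accuracy because it discounts the agreement that two labellings would reach by chance.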
Methods
AZEA’s features
Basic Features
1. Sentence Length
2. Position within the abstract
3-5. Verb Tense, Voice and Modal
6. Previous Component
7-8. Formulaic patterns
14 additional features to distinguish
between Results and Methods and
improve accuracy
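To make the basic features concrete, the sketch below computes toy versions of four of them (sentence length, position, presence of a modal, previous component) for each sentence of an abstract. It is an assumption-laden illustration, not AZEA's implementation: the tokenisation, the modal list, the `START` placeholder and the function name are all hypothetical, and tense and voice are left out because they require a tagger.

```python
import re

# Hypothetical modal list for this illustration.
MODALS = {"can", "could", "may", "might", "must", "shall", "should", "will", "would"}


def basic_features(sentences, labels=None):
    """Toy feature dictionaries, one per sentence of a single abstract."""
    features = []
    last_index = max(len(sentences) - 1, 1)
    for index, sentence in enumerate(sentences):
        tokens = re.findall(r"[A-Za-z0-9'-]+", sentence.lower())
        previous = labels[index - 1] if labels and index > 0 else "START"
        features.append({
            "length": len(tokens),             # feature 1: sentence length
            "position": index / last_index,    # feature 2: 0.0 = first, 1.0 = last
            "has_modal": any(token in MODALS for token in tokens),  # part of 3-5
            "previous": previous,              # feature 6: previous component
        })
    return features


# The first sentence is quoted from the tagging example in this talk;
# the second is an abridged version of the sentence that follows it.
sentences = [
    "We propose a Local-Density approximation to calculate the entanglement "
    "entropy of the inhomogeneous one-dimensional Hubbard model.",
    "Such inhomogeneity can be due to the finite size.",
]
feats = basic_features(sentences, labels=["purpose", "background"])
print(feats)
```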
Methods
Azea-Web
http://www.nilc.icmc.usp.br/azea-web/
Methods
1a. AZEA tagging
<purpose> We propose a Local-Density approximation to
calculate the entanglement entropy of the inhomogeneous
one-dimensional Hubbard model. </purpose>
<background> Such inhomogeneity can be due to the finite
size, the presence of impurities, or the periodic variation of
the interaction and the external potential, as in
superlattices. </background>
<purpose> We show that, to inhomogeneities due to finite
size, our approximation reproduces the know
thermodynamic limit and also the limit of the entanglement
entropy in n=1, obtained by Cardy and Calabrese.
</purpose>
Methods
1b. Manual Validation
<purpose> We propose a Local-Density approximation to
calculate the entanglement entropy of the inhomogeneous
one-dimensional Hubbard model. </purpose>
<background> Such inhomogeneity can be due to the finite
size, the presence of impurities, or the periodic variation of
the interaction and the external potential, as in
superlattices. </background>
<result> We show that, to inhomogeneities due to finite
size, our approximation reproduces the know
thermodynamic limit and also the limit of the entanglement
entropy in n=1, obtained by Cardy and Calabrese.
</result>
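Relabellings such as the change from purpose to result can be listed automatically by parsing both versions of the abstract. The sketch below is a minimal illustration under stated assumptions: the regex, the function names and the abridged sentences are not taken from AZEA or from the validation workflow used in the study.

```python
import re

# Matches <label> ... </label> pairs; assumes tags are not nested.
TAG_PATTERN = re.compile(r"<(\w+)>\s*(.*?)\s*</\1>", re.DOTALL)


def parse_tagged(text):
    """Return (label, sentence) pairs, collapsing internal whitespace."""
    return [(label, " ".join(body.split())) for label, body in TAG_PATTERN.findall(text)]


def relabelled(automatic, validated):
    """List (sentence, automatic label, validated label) where the label changed."""
    changes = []
    for (auto_label, sentence), (gold_label, gold_sentence) in zip(
        parse_tagged(automatic), parse_tagged(validated)
    ):
        if sentence == gold_sentence and auto_label != gold_label:
            changes.append((sentence, auto_label, gold_label))
    return changes


# Sentences abridged from the example abstract for brevity.
automatic = (
    "<purpose> We propose a Local-Density approximation. </purpose>\n"
    "<background> Such inhomogeneity can be due to the finite size. </background>\n"
    "<purpose> We show that our approximation reproduces the limit. </purpose>"
)
validated = automatic.replace(
    "<purpose> We show that our approximation reproduces the limit. </purpose>",
    "<result> We show that our approximation reproduces the limit. </result>",
)
changes = relabelled(automatic, validated)
print(changes)
```

The comparison only works when both versions contain the same sentences in the same order; corrected sentence breaks and multi-label sentences need separate handling.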
Methods
Manual Tagging:
Correcting sentence break
Before correction:
<purpose> We find aRb/aNa=1.959(5), </purpose> <background> aK/aNa =1.786(6), </background> <purpose> and aRb/aK=1.097(5). </purpose>
After correction:
<result> We find aRb/aNa=1.959(5), aK/aNa =1.786(6), and aRb/aK=1.097(5). </result>
Methods
Manual Tagging: multi-labels
Single label:
<purpose> Using whole-cell rapid-agonist application techniques and the cell-attached single-channel recording configuration, we examined human 5-HT3A(QDA) receptors expressed in human embryonic kidney 293 cells . </purpose>
Multi-label:
<method> Using whole-cell rapid-agonist application techniques and the cell-attached single-channel recording configuration, </method> <purpose> we examined human 5-HT3A(QDA) receptors expressed in human embryonic kidney 293 cells . </purpose>
Methods
Lexico-grammatical patterns
1. Semi-automatic identification of patterns: WordSmith Tools 5 (Scott 2007)
• Starting point: most frequent items and clusters in each corpus
• Analysis of the surrounding context
• Patterns should occur at least once per 10,000 words in either corpus
2. Comparison of frequencies: statistical test of significance
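The two steps can be sketched as follows: raw counts are normalised to a rate per 10,000 words, the inclusion threshold is applied, and the frequencies in two corpora are compared. The slides do not name the significance test, so the two-term log-likelihood statistic often used for corpus frequency comparison is assumed here; the counts are hypothetical, and only the corpus sizes (ST-BIO and PB-BIO) come from the slides.

```python
import math


def per_10k(count, corpus_tokens):
    """Normalised frequency: occurrences per 10,000 words."""
    return count * 10_000 / corpus_tokens


def meets_threshold(count_a, tokens_a, count_b, tokens_b, minimum=1.0):
    """True if the pattern reaches the minimum rate in either corpus."""
    return (
        per_10k(count_a, tokens_a) >= minimum
        or per_10k(count_b, tokens_b) >= minimum
    )


def log_likelihood(count_a, tokens_a, count_b, tokens_b):
    """Two-term log-likelihood (G2) for one item in two corpora.

    Assumption: the study's test is not named on the slides; this is a
    common choice, not necessarily the one the authors used.
    """
    total = count_a + count_b
    expected_a = tokens_a * total / (tokens_a + tokens_b)
    expected_b = tokens_b * total / (tokens_a + tokens_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical counts; corpus sizes are those of ST-BIO and PB-BIO.
student_count, student_tokens = 40, 27911
published_count, published_tokens = 100, 159940

print(round(per_10k(student_count, student_tokens), 2))
print(round(per_10k(published_count, published_tokens), 2))
g2 = log_likelihood(student_count, student_tokens, published_count, published_tokens)
# With 1 degree of freedom, G2 > 3.84 corresponds to p < 0.05.
print(g2 > 3.84)
```

Normalising before comparing matters here because the published corpora are several times larger than the student corpora, so raw counts are not comparable.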
Results
Overall …
Significant differences:
• Between student and published
abstracts
• Across the two broad areas
Results
PURPOSE: Life and Health Sciences (BIO)
Pattern: The AIM of this / the present / the STUDY
[Bar chart: frequency per 10,000 words, STS vs. PUB; y-axis from 0.0 to 20.0]
Items, first list: aim, objective, purpose, aims, objectives; our; study, work, review, paper
Items, second list: aim, purpose, objective, goal, aims, objectives, purposes, intent; study, work, investigation, article, project, research, clinical trial, paper
Results
PURPOSE: Life and Health Sciences (BIO)
Pattern: (In this STUDY), we VERB (the/a)
[Bar chart: frequency per 10,000 words, STS vs. PUB; y-axis from 0.0 to 20.0]
Verbs, first list: REPORT, DESCRIBE, INVESTIGATE, SHOW, ANALYSE, EVALUATE, DETERMINE, …
Verbs, second list: INVESTIGATE, EXAMINE, REPORT, PROPOSE, TEST, HYPOTHESIZE, DESCRIBE, PRESENT, SEEK TO, ANALYSE, EVALUATE, DEMONSTRATE, …
Results
PURPOSE:
Physical Sciences and Engineering (EXA)
1. The AIM of this STUDY
2. This STUDY VERB
3. (In this STUDY), we VERB (the/a)
[Bar chart: frequency per 10,000 words of patterns 1 to 3, STS vs. PUB; y-axis from 0.0 to 30.0]
Results
RESULTS:
1. Results VERB (that/the), e.g. The results show that
2. we VERB (that/the), e.g. we found that
[Bar chart: frequency per 10,000 words of patterns 1 and 2 in ST-BIO, ST-EXA, PB-BIO and PB-EXA; y-axis from 0.0 to 30.0]
Contributions
Main Contributions
1. Pedagogic applications
a) Syllabus
b) Teaching material
2. Development of writing tools
Contributions
Pedagogic applications
Overuse and underuse
Patterns
✓ Results VERB (that/the)
✗ BE PARTICIPLE to VERB (e.g. was found to be)
Items within patterns
It BE observed that
X It BE shown/found that
Contributions
Writing Tools: AZEA
Manual
validation
AZEA++
New features to be considered:
• Lexico-grammatical patterns
• Multi-labels
• Disciplinary variations
Future Work
Writing Tools
Physical Sciences and Engineering
Life and Health Sciences
Thank you!