A Genre Analysis of Chinese and English
Abstracts of Academic Journal Articles:
A Parallel-Corpus-based Study
NIU Gui-ling
School of Foreign Languages, Zhengzhou University
mayerniu@163.com
Contents
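I. Brief Introduction of the Chinese-English Parallel Abstract Corpus (CEPAC)
• Funded by the Ministry of Education Humanities and Social Sciences Research Planning Fund project "A Parallel-Corpus-Based Genre Analysis of Chinese and English Abstracts of Chinese and International Academic Journal Articles"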
II. A Genre Analysis of Chinese and English
Abstracts of Academic Journal Articles
Why RA Abstracts?
• Abstracts were first introduced into medical research articles during the 1960s (Swales, 2010).
• Under-researched: in 2005, Montesi & Urdiciain found barely 28 studies on research article (RA) abstracts.
• Now: a sea of information, "an information explosion", with several million research papers published each year and continual announcements of new journals being launched, online, in hard copy, or both.
• RA abstracts "have become a tool of mastering and managing the ever increasing information flow in the scientific community" (Ventola 1994a: 333) and have developed into an increasingly important part-genre (Swales 2009).
Objective
• MOVEs and steps: to distinguish abstracts from research articles as an independent genre in terms of rhetorical MOVE structure.
• To generalize and illustrate the generic structure potential (GSP) of abstracts by exploiting the large volume of text in the corpus and analyzing how rhetorical MOVEs are used in it, with genre analysis and schematic structure as the theoretical basis.
What Is an Abstract?
• Cleveland (1983:104): An abstract summarizes the essential contents of a particular knowledge record and is a true surrogate of the document.
• Graetz (1985): The abstract is a time-saving device that can be used to find particular parts of the article without reading it; … knowing the structure in advance will help the reader to get into the article; … An abstract should be concise and precise, indicating to the potential reader two things: (a) what was done, and (b) important results obtained.
Functions
• Huckin (2001): 4 distinguishable functions:
(1) They function as stand-alone mini-texts, giving readers a short summary of a study's topic, methodology and main findings;
(2) They function as screening devices, helping readers decide whether they wish to read the whole article or not;
(3) They function as previews for readers intending to read the whole article, giving them a road-map for their reading;
(4) They provide indexing help for professional abstract writers and editors.
• (5) They provide reviewers with an immediate oversight of the paper they have been asked to review (Swales, 2010).
Chinese-English Parallel Abstract Corpus (CEPAC)
Corpus components
Overview of CEPAC
Corpus: 3,780 texts; 673,246 words/characters
Corpus size:
CC: 1,260 texts; 209,889/376,972 words/characters
CE: 1,260 texts; 221,240 words
EE: 1,260 texts; 242,117 words
Corpus Structure
Three sub-corpora: CC (Chinese) Corpus, CE (English) Corpus, EE (English) Corpus
Functions: Corpus Management & Text Update; Data Retrieval
Five Main Disciplines
Under each of the three sub-corpora, five disciplinary categories are established, and 252 abstracts were chosen for each category. The sub-disciplines chosen in the three sub-corpora also largely correspond to one another, which keeps the texts balanced.
1 H Health Sciences
2 S Social Sciences and Humanities
3 P Physical Sciences and Engineering
4 L Life Science and Biomedical Sciences
5 Y Language Sciences and Literature
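Inter-, Intra-lingual & Interdisciplinary Comparison
CEPAC sub-corpora:
• CC: corpus of Chinese abstracts from Chinese journals (HCC, LCC, PCC, SCC, YCC)
• CE: corpus of English abstracts from Chinese journals (HCE, LCE, PCE, SCE, YCE)
• EE: corpus of English abstracts from international journals (HEE, LEE, PEE, SEE, YEE)
Comparisons:
• Monolingual, intralingual comparison
• Bilingual, interlingual contrast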
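The sub-corpus codes above combine a discipline letter with a sub-corpus suffix. A minimal Python sketch of that labeling scheme follows. The function names are hypothetical, and the reading that "interlingual" pairs CC with CE while "intralingual" pairs CE with EE is an assumption drawn from the languages involved, not something the slides spell out.

```python
from itertools import product

# Discipline letters and sub-corpus suffixes as listed in the slides.
DISCIPLINES = {
    "H": "Health Sciences",
    "S": "Social Sciences and Humanities",
    "P": "Physical Sciences and Engineering",
    "L": "Life Science and Biomedical Sciences",
    "Y": "Language Sciences and Literature",
}
SUBCORPORA = {
    "CC": "Chinese abstracts, Chinese journals",
    "CE": "English abstracts, Chinese journals",
    "EE": "English abstracts, international journals",
}
TEXTS_PER_CELL = 252  # abstracts per discipline per sub-corpus (from the slides)


def cell_codes():
    """Return the 15 discipline/sub-corpus codes such as 'HCC' or 'YEE'."""
    return [d + s for d, s in product(DISCIPLINES, SUBCORPORA)]


def comparison_pairs(kind):
    """Pair sub-corpora within each discipline.

    Assumed reading: 'interlingual' contrasts CC with CE (Chinese vs. English),
    'intralingual' compares CE with EE (English vs. English).
    """
    first, second = {
        "interlingual": ("CC", "CE"),
        "intralingual": ("CE", "EE"),
    }[kind]
    return [(d + first, d + second) for d in DISCIPLINES]


if __name__ == "__main__":
    codes = cell_codes()
    print(len(codes), "cells,", len(codes) * TEXTS_PER_CELL, "texts in total")
    print(comparison_pairs("intralingual"))
```

Fifteen cells of 252 abstracts each account for the 3,780 texts reported in the corpus overview.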
Annotation
Macrostructure and Meaning of MOVEs and Labels (cited from: Swales & Feak, 2009)
MOVE     Typical Labels                                         Implied Questions
MOVE 1   Background/introduction/situation                      1. What do we know about the topic? 2. Why is the topic important?
MOVE 2   Purpose/present research                               What is this study about?
MOVE 3   Methods/materials/subjects/procedures                  How was it done?
MOVE 4   Results/findings                                       What was discovered?
MOVE 5   Conclusion/discussion/implications/recommendations     What do the findings mean?
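Move Attributes & Values
Move level (move attribute: meaning)
MOVE 1 Background: background introduction
MOVE 2 Purpose: research purpose
MOVE 3 Method: methods and procedures
MOVE 4 Result: results/findings
MOVE 5 Conclusion: conclusion/discussion
Step level (step attribute: meaning)
MOVE 1 Background
Step 1 Topic Generalization: generalizing the central topic
Step 2 Previous Research: related earlier research
Step 3 Research Gap: gap in earlier research
Step 4 Research Question: raising the research question
Step 5 Continuing a Tradition: inheriting and developing earlier research
Step 6 Other
MOVE 2 Purpose
Step 1 Aim/Objective: research purpose
Step 2 Focus: research focus or core
Step 3 Hypothesis/Research Question: stating the hypothesis
Step 4 Other
MOVE 3 Method
Step 1 Empirical (Primary) Research: empirical research
a. Qualitative
b. Quantitative
Step 2 Documentary (Library) Research: documentary research
Step 3 Procedure: specific research steps
a. Sampling (sampling method and subjects/population)
b. Materials
c. Tools (research tools or software)
d. Data Collection
e. Statistics
f. Specific Steps
g. Other
Step 4 Other
MOVE 4 Result
Step 1 Data Presentation: presenting the data
Step 2 Data Interpretation/Analysis: interpreting the data
a. Correlation (whether a correlation exists)
b. Significant difference (whether a significant difference exists)
c. Other
Step 3 Brief Summarization: brief summary
Step 4 Other
MOVE 5 Conclusion
Step 1 Data Analysis: data analysis (general significance)
Step 2 Inference/Implication: inference/implication
Step 3 Question Answering: answering the question raised in the Purpose move
Step 4 Hypothesis Substantiating: confirming the hypothesis
Step 5 Hypothesis-Deflating: rejecting the hypothesis
Step 6 Value/Significance: value of the research
Step 7 Future Work: research still to be done
Step 8 Recommendations: making recommendations
Step 9 Limitations: limitations of the present research
Step 10 Other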
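The move and step attributes listed above can be held in a small lookup table so that annotation labels are checked before they are counted. This is only a sketch under stated assumptions: the dictionary layout and the `is_valid` and `label` helpers are hypothetical and do not describe how MMAX2 or the CEPAC scheme files store the attributes.

```python
# Top-level steps per move, in the order given on the slide.
MOVE_STEPS = {
    "Background": [
        "Topic Generalization", "Previous Research", "Research Gap",
        "Research Question", "Continuing a Tradition", "Other",
    ],
    "Purpose": [
        "Aim/Objective", "Focus", "Hypothesis/Research Question", "Other",
    ],
    "Method": [
        "Empirical (Primary) Research", "Documentary (Library) Research",
        "Procedure", "Other",
    ],
    "Result": [
        "Data Presentation", "Data Interpretation/Analysis",
        "Brief Summarization", "Other",
    ],
    "Conclusion": [
        "Data Analysis", "Inference/Implication", "Question Answering",
        "Hypothesis Substantiating", "Hypothesis-Deflating",
        "Value/Significance", "Future Work", "Recommendations",
        "Limitations", "Other",
    ],
}


def is_valid(move, step):
    """Return True if the 1-based step number exists under the given move."""
    steps = MOVE_STEPS.get(move)
    return steps is not None and 1 <= step <= len(steps)


def label(move, step):
    """Return a readable label such as 'Purpose / Step 2: Focus'."""
    if not is_valid(move, step):
        raise ValueError(f"unknown move/step: {move!r}, {step!r}")
    return f"{move} / Step {step}: {MOVE_STEPS[move][step - 1]}"


if __name__ == "__main__":
    print(label("Conclusion", 9))
    print(is_valid("Purpose", 5))
```

Sub-steps (a, b, c and so on) are left out to keep the sketch short; they could be added as nested lists under the steps that have them.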
TOOLS
Data Retrieval & Statistics: WordSmith Tools, AntConc, self-designed programs, SPSS, etc.
Annotation: MMAX2
POS Tagging: ICTCLAS & CLAWS
Text Processing, Alignment: EditPlus, ParaConc
Annotation Interface: MMAX2
Multi-level Annotation (Stand-alone)
• POS Tagging
• Chinese Word Segmentation
• Move Level: Background, Purpose, Method, Results, Conclusion
• Sentence pairs (Alignment): 1-1, 1-2, 1-3, 2-1, 3-1, 2-2
• Syntax
• Semantic
• Translation
• …
Data Analysis
• Figure 3 Normalized Frequency of Move Distribution (per 10,000 running words)
[Figure: normalized frequency of moves across the 15 sub-corpora (YCC, YCE, YEE, HCC, HCE, HEE, LCC, LCE, LEE, PCC, PCE, PEE, SCC, SCE, SEE); scale 0 to 4500; series: INTRODUCTION, PURPOSE, METHOD, RESULT, CONCLUSION, COMPOUND MOVES, OTHER]
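Figure 3 reports move frequencies normalized per 10,000 running words, which lets sub-corpora of different sizes be compared. A minimal sketch of that normalization follows. The function name and the example numbers are hypothetical, since the slides give only the chart and not the raw counts behind it.

```python
def normalized_frequency(count, corpus_words, base=10_000):
    """Scale a raw count to a rate per `base` running words."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return count * base / corpus_words


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    print(normalized_frequency(150, 50_000))
```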
Completeness (Wholeness) of MOVEs in Texts: Move Number Distribution
• Figure 2 MOVE Numbers and Percentage in Texts in Three Sub-corpora (unit: text)
MOVE Distribution and Percentage in Three Sub-corpora
Sub-corpus   One Move      Two Moves      Three Moves    Four Moves     Five Moves     Total Number of Texts
CC           70 (5.56%)    241 (19.13%)   368 (29.21%)   521 (41.35%)   60 (4.76%)     1260
CE           68 (5.40%)    252 (20.00%)   348 (27.62%)   508 (40.32%)   84 (6.67%)     1260
EE           30 (2.38%)    139 (11.03%)   349 (27.70%)   508 (40.32%)   234 (18.57%)   1260
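The percentages in this table are each count divided by the 1,260 texts of the sub-corpus. The short Python sketch below recomputes them from the counts on the slide; the variable and function names are hypothetical.

```python
# Number of texts containing one, two, three, four or five moves (from the slide).
MOVE_NUMBER_COUNTS = {
    "CC": [70, 241, 368, 521, 60],
    "CE": [68, 252, 348, 508, 84],
    "EE": [30, 139, 349, 508, 234],
}


def percentages(counts):
    """Return each count as a percentage of the row total, rounded to 2 places."""
    total = sum(counts)
    return [round(100 * c / total, 2) for c in counts]


def share_with_at_least(counts, k):
    """Percentage of texts that contain at least k moves (k is 1-based)."""
    return round(100 * sum(counts[k - 1:]) / sum(counts), 2)


if __name__ == "__main__":
    for name, counts in MOVE_NUMBER_COUNTS.items():
        print(name, sum(counts), percentages(counts))
```

Each row sums to 1,260, so the recomputed percentages can be checked directly against the table.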
Significant Difference

SCE vs. SEE
MOVEs          SCE   SEE   Chi-Square   Significance (P)   Log-likelihood   Sig.       +/-
Introduction   154   133   1.7341       0.188              1.54             0.215      +
Purpose        75    203   66.2430      0.000***           61.22            0.000***   -
Method         61    149   40.2286      0.000***           38.04            0.000***
Result         132   177   7.4693       0.006**            6.58             0.010*     -
Conclusion     162   188   2.2429       0.134              1.93             0.164      -

YCE vs. YEE
MOVEs          YCE   YEE   Chi-Square   Significance (P)   Log-likelihood   Sig.       +/-
Introduction   156   122   4.6739       0.031*             4.17             0.041*     +
Purpose        106   183   23.1731      0.000***           20.77            0.000***
Method         47    165   71.7122      0.000***           69.58            0.000***
Result         66    172   52.1338      0.000***           48.91            0.000***
Conclusion     189   182   0.1549       0.694              0.13             0.716      +

The asterisks (*) indicate the significance level, and the "+" and "-" signs in the right-hand column indicate "overuse" and "underuse" respectively.
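The two tests reported above can be sketched in plain Python. The log-likelihood function uses the two-term form common in corpus comparison, and the chi-square function is the Pearson statistic on a 2x2 table without continuity correction. Both choices are assumptions, because the slides do not name the exact variants or the corpus sizes that were used. The size of 1260 per sub-corpus in the example is likewise an assumption; it happens to reproduce the Purpose row for SCE vs. SEE.

```python
import math


def log_likelihood(a, n1, b, n2):
    """Two-term log-likelihood for counts a (corpus size n1) and b (size n2)."""
    e1 = n1 * (a + b) / (n1 + n2)  # expected count in corpus 1
    e2 = n2 * (a + b) / (n1 + n2)  # expected count in corpus 2
    ll = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing
            ll += observed * math.log(observed / expected)
    return 2 * ll


def chi_square(a, n1, b, n2):
    """Pearson chi-square on the 2x2 table [[a, n1 - a], [b, n2 - b]]."""
    c, d = n1 - a, n2 - b
    denominator = (a + b) * (c + d) * n1 * n2
    if denominator == 0:
        raise ValueError("degenerate table")
    return (n1 + n2) * (a * d - b * c) ** 2 / denominator


def direction(a, n1, b, n2):
    """'+' if corpus 1 overuses the item relative to corpus 2, '-' if it underuses."""
    r1, r2 = a / n1, b / n2
    return "+" if r1 > r2 else "-" if r1 < r2 else ""


if __name__ == "__main__":
    # Purpose move, SCE vs. SEE; the size 1260 is an assumed value.
    print(round(log_likelihood(75, 1260, 203, 1260), 2))
    print(round(chi_square(75, 1260, 203, 1260), 4))
    print(direction(75, 1260, 203, 1260))
```

With equal corpus sizes the log-likelihood value does not depend on the size chosen, so only the chi-square result relies on the assumed 1260.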
Proportionality (Balance) of MOVEs in Texts: Text Coverage

             Independent Moves                                       Compound Moves
Sub-corpus   Background   Purpose   Method    Result    Conclusion   Regular   Irregular   Other
YCC          27.17%       6.97%     2.31%     11.17%    34.39%       10.01%    7.63%       0.34%
YCE          28.97%       10.41%    4.23%     11.83%    35.68%       4.45%     4.03%       0.40%
YEE          18.21%       14.49%    20.26%    23.75%    19.48%       2.87%     0.82%       0.11%
HCC          4.78%        11.39%    27.24%    37.21%    15.84%       3.35%     0.19%       -
HCE          5.44%        12.42%    27.71%    37.30%    16.49%       0.65%     -           -
HEE          20.00%       10.30%    16.36%    36.84%    15.48%       0.96%     0.06%       -
LCC          16.45%       6.94%     11.84%    38.33%    13.16%       8.76%     4.53%       -
LCE          17.52%       10.41%    15.89%    38.64%    13.30%       3.17%     0.97%       0.10%
LEE          17.67%       11.52%    12.50%    39.12%    15.21%       2.93%     0.92%       0.13%
PCC          10.47%       7.77%     25.01%    23.11%    10.40%       21.50%    1.74%       -
PCE          12.96%       9.27%     30.69%    27.60%    11.88%       6.54%     1.05%       -
PEE          21.82%       14.57%    17.47%    25.29%    18.61%       1.66%     0.57%       -
SCC          26.57%       3.60%     5.24%     23.08%    22.10%       13.69%    5.67%       0.05%
SCE          28.44%       5.83%     6.80%     24.65%    24.30%       6.59%     3.34%       0.05%
SEE          14.71%       17.61%    20.09%    24.43%    17.77%       3.41%     1.99%       -
Average      18.08%       10.23%    16.24%    28.16%    18.94%       6.04%     2.23%       0.08%

Notes:
1. Text coverage = number of words in the move / total number of words in the abstract that contains the move.
2. Independent move: a sentence contains the content of only one move.
3. Compound move: a sentence contains the content of two or more moves. A regular compound move is one in which the moves within the sentence follow the normal order "background, purpose, method, result, conclusion". An irregular compound move is one in which they do not follow that order, for example when the method move is placed before the purpose move, or the conclusion move before the result move.
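The definitions in the notes can be sketched in Python as follows. The function names and the input format (a list of move labels in the order they occur within one sentence) are hypothetical; the slides do not describe how the annotated data were actually processed.

```python
MOVE_ORDER = ["Background", "Purpose", "Method", "Result", "Conclusion"]


def text_coverage(move_words, abstract_words):
    """Words in the move divided by total words in the abstract containing it."""
    if abstract_words <= 0:
        raise ValueError("abstract_words must be positive")
    return move_words / abstract_words


def classify_sentence(moves):
    """Classify one sentence from the sequence of moves it contains.

    Returns 'independent' for a single move, 'regular compound' when several
    moves follow the canonical order, and 'irregular compound' otherwise.
    """
    if not moves:
        raise ValueError("a sentence needs at least one move label")
    ranks = [MOVE_ORDER.index(m) for m in moves]  # ValueError on unknown labels
    if len(set(ranks)) == 1:
        return "independent"
    return "regular compound" if ranks == sorted(ranks) else "irregular compound"


if __name__ == "__main__":
    print(text_coverage(30, 120))
    print(classify_sentence(["Purpose", "Method"]))
    print(classify_sentence(["Method", "Purpose"]))
```

A sentence that returns to an earlier move, such as Purpose, Method, Purpose, is treated as irregular in this sketch, which is one possible reading of the definition.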
Text Coverage (Bar-Chart)
[Figure: bar chart of text coverage; scale 0.00% to 45.00%; groups Y, H, L, P, S and Average; sub-corpora YCC, YCE, YEE, HCC, HCE, HEE, LCC, LCE, LEE, PCC, PCE, PEE, SCC, SCE, SEE; series: INTRODUCTION (background move), PURPOSE, METHOD, RESULT, CONCLUSION]
Data Query
Data Query_KWIC
Future Work
Short-term: Online Retrieval
Data Retrieval & Application: Genre Analysis, Translation, Hedging, Grammar, Online Retrieval, Pragmatics, Semantics, Discourse Analysis, World Englishes, etc.