Readability Assessment for Text Simplification

Sandra Aluisio (1), Lucia Specia (2), Caroline Gasperin (1), Carolina Scarton (1)
(1) University of São Paulo, Brazil
(2) University of Wolverhampton, UK
The 5th Workshop on Innovative Use of
NLP for Building Educational Applications
Motivation
• Develop technology to benefit low literacy readers
[Figure: percentage (%) of people at each INAF level (Illiterate, Rudimentary, Basic, Advanced) per survey period: 2001-2002, 2002-2003, 2003-2004, 2004-2005, 2007, 2009]
– Rudimentary: studied up to 4 years; can find explicit information in short and familiar texts
– Basic: studied between 4 and 8 years; can read and understand texts of average length, and find information even when it is necessary to make some inference
Readability Assessment
• To assess the readability level of a text
– Three levels of readability: INAF levels
Rudimentary – Basic – Advanced
• To supplement our text simplification technology
– Two levels of simplification: degree of application of
simplification operations
• STRONG: operations are applied to all complex syntactic
phenomena present → RUDIMENTARY
• NATURAL: operations are applied selectively, only when the
resulting text remains “natural” → BASIC
Text Simplification Scenario
• Authoring tool for creating simplified texts
SIMPLIFICA
1. Author inputs text
2. Author receives suggestions of possible
simplifications and may accept or reject them:
• Lexical substitutions
• Syntactic simplification
3. Author does not know if the text is simple
enough for his audience
• Feedback: Readability assessment
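The feedback step can be pictured as a comparison between the level predicted for the current draft and the level of the intended audience. The sketch below is only illustrative: the function name `feedback` and the messages are hypothetical, and it assumes the three levels are ordered from simplest to most complex.

```python
# Readability levels ordered from simplest to most complex (assumed ordering).
LEVELS = ["rudimentary", "basic", "advanced"]

def feedback(predicted_level: str, target_level: str) -> str:
    """Tell the author whether the draft is simple enough for the target audience.

    Hypothetical helper: the predicted level would come from the
    readability assessment model, which is not implemented here.
    """
    gap = LEVELS.index(predicted_level) - LEVELS.index(target_level)
    if gap <= 0:
        return "ok: text is simple enough for the target audience"
    return f"simplify further: text is {gap} level(s) above the target"

print(feedback("basic", "rudimentary"))
print(feedback("rudimentary", "basic"))
```

The author would keep accepting simplification suggestions until the message changes to "ok".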
Readability Assessment System
• Machine learning
– Classes = 3 INAF levels
– Trained on corpus of manually simplified texts
• Original text + natural and strong simplifications
– Extensive set of features
• Cognitively-motivated: Coh-Metrix [Graesser et al., 2004]
• Syntactic: occurrence of complex phenomena
• Language model: up to trigrams
– 3 paradigms: Classification, Ordinal Classification,
Regression [Heilman et al., 2007]
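The three paradigms differ mainly in how the same three-level label is presented to the learner. The sketch below shows one possible encoding for each. The threshold decomposition used for the ordinal case (one binary question per cut point, in the style of Frank and Hall) is an assumption for illustration; the slides do not say how the ordinal model is built internally. All function names are hypothetical.

```python
LEVELS = ["rudimentary", "basic", "advanced"]  # ordered, simplest first

def nominal_target(level):
    """Classification: the label is treated as an unordered category."""
    return level

def ordinal_targets(level):
    """Ordinal classification (assumed scheme): one binary target per
    threshold, answering 'is the level above LEVELS[k]?'."""
    rank = LEVELS.index(level)
    return [rank > k for k in range(len(LEVELS) - 1)]

def regression_target(level):
    """Regression: the label is a number on a 1..3 scale."""
    return float(LEVELS.index(level) + 1)

def decode_ordinal(p_above):
    """Turn threshold probabilities P(level > k) into a single level:
    P(first) = 1 - p0, P(middle) = p0 - p1, P(last) = p1."""
    probs = [1 - p_above[0]]
    probs += [p_above[k - 1] - p_above[k] for k in range(1, len(p_above))]
    probs.append(p_above[-1])
    best = max(range(len(LEVELS)), key=probs.__getitem__)
    return LEVELS[best]

def decode_regression(score):
    """Round a numeric prediction to the nearest valid level."""
    rank = min(max(round(score), 1), len(LEVELS)) - 1
    return LEVELS[rank]

print(ordinal_targets("basic"))
print(decode_ordinal([0.9, 0.2]))
print(decode_regression(2.6))
```

Only the ordinal and regression encodings carry the information that "basic" lies between the other two levels; the nominal encoding discards it.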
Corpora
• Training and testing corpora
– General news: Zero Hora (ZH) newspaper
– Popular science news: Caderno Ciencia (CC)
– 3 versions for each text: original, natural, strong
Corpus        Documents  Sentences  Words  Avg. words per text (std. deviation)  Avg. words per sentence
ZH original   104        2184       46190  444.1 (133.7)                         21.1
ZH natural    104        3234       47296  454.7 (134.2)                         14.6
ZH strong     104        3668       47938  460.9 (137.5)                         13.0
CC original   50         882        20263  405.2 (175.6)                         22.9
CC natural    50         975        19603  392.0 (176.0)                         20.1
CC strong     50         1454       20518  410.3 (169.6)                         14.1
CH            130        3624       95866  737.4 (226.1)                         26.4
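The two average columns can be recomputed from the raw counts, which is a quick sanity check on the corpus statistics. In the sketch below the counts are copied from this slide; the use of truncation (rather than rounding) to one decimal place is an inference from the reported numbers, not something the slides state.

```python
import math

# name: (documents, sentences, words), copied from the corpus statistics
corpus = {
    "ZH original": (104, 2184, 46190),
    "ZH natural": (104, 3234, 47296),
    "ZH strong": (104, 3668, 47938),
    "CC original": (50, 882, 20263),
    "CC natural": (50, 975, 19603),
    "CC strong": (50, 1454, 20518),
    "CH": (130, 3624, 95866),
}

def truncate1(value):
    """Truncate (not round) to one decimal place."""
    return math.floor(value * 10) / 10

def averages(documents, sentences, words):
    """Return (avg. words per text, avg. words per sentence)."""
    return truncate1(words / documents), truncate1(words / sentences)

for name, counts in corpus.items():
    per_text, per_sentence = averages(*counts)
    print(f"{name:<12} {per_text:>6.1f} {per_sentence:>5.1f}")
```

The sentence counts grow from original to natural to strong while the word counts stay nearly constant, which is why words per sentence drops so sharply in the simplified versions.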
Features
1. Number of words
2. Number of sentences
3. Number of paragraphs
4. Number of verbs
5. Number of nouns
6. Number of adjectives
7. Number of adverbs
8. Number of pronouns
9. Average number of words per sentence
10. Average number of sentences per paragraph
11. Average number of syllables per word
12. Flesch index for Portuguese
13. Incidence of content words
14. Incidence of functional words
15. Raw frequency of content words
16. Minimal frequency of content words
17. Average number of verb hypernyms
18. Incidence of NPs
19. Number of NP modifiers
20. Number of words before the main verb
21. Number of high level constituents
22. Number of personal pronouns
23. Type-token ratio
24. Pronoun-NP ratio
25. Number of “e” (and)
26. Number of “ou” (or)
27. Number of “se” (if)
28. Number of negations
29. Number of logic operators
30. Number of connectives
31. Number of positive additive connectives
32. Number of negative additive connectives
33. Number of positive temporal connectives
34. Number of negative temporal connectives
35. Number of positive causal connectives
36. Number of negative causal connectives
37. Number of positive logic connectives
38. Number of negative logic connectives
39. Verb ambiguity ratio
40. Noun ambiguity ratio
41. Adverb ambiguity ratio
42. Adjective ambiguity ratio
43. Incidence of clauses
44. Incidence of adverbial phrases
45. Incidence of apposition
46. Incidence of passive voice
47. Incidence of relative clauses
48. Incidence of coordination
49. Incidence of subordination
50. Out-of-vocabulary words
51. LM probability of unigrams
52. LM perplexity of unigrams
53. LM perplexity of unigrams, no line break
54. LM probability of bigrams
55. LM perplexity of bigrams
56. LM perplexity of bigrams, no line break
57. LM probability of trigrams
58. LM perplexity of trigrams
59. LM perplexity of trigrams, no line break
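A few of the surface features in this list (word, sentence and paragraph counts, the two averages, type-token ratio, and the counts of “e”, “ou” and “se”) need no linguistic resources and can be approximated with regular expressions. The sketch below is a rough illustration, not the feature extractor used in this work: the tokenisation rules, the blank-line paragraph convention, and the function name `surface_features` are all assumptions.

```python
import re

def surface_features(text: str) -> dict:
    """Approximate a handful of surface features for a plain-text document.

    Assumptions: paragraphs are separated by blank lines, sentences end
    in '.', '!' or '?', and a word is any run of letters.
    """
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[^\W\d_]+", text.lower())
    return {
        "n_words": len(words),
        "n_sentences": len(sentences),
        "n_paragraphs": len(paragraphs),
        "words_per_sentence": len(words) / len(sentences),
        "sentences_per_paragraph": len(sentences) / len(paragraphs),
        "type_token_ratio": len(set(words)) / len(words),
        "n_e": words.count("e"),    # "e" (and)
        "n_ou": words.count("ou"),  # "ou" (or)
        "n_se": words.count("se"),  # "se" (if)
    }

sample = "O menino correu. Ele caiu e chorou.\n\nSe chover, o jogo para ou atrasa."
for name, value in surface_features(sample).items():
    print(name, value)
```

Features that depend on part-of-speech tags, parse trees, word-frequency lists or language models would need the corresponding external resources and are not covered here.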
Feature Analysis
• Pearson correlation between features and literacy levels

Rank  Feature                                   Correlation
1     Words per sentence                        0.693
2     Incidence of apposition                   0.688
3     Incidence of clauses                      0.614
4     Flesch index                              0.580
5     Words before main verb                    0.516
6     Sentences per paragraph                   0.509
7     Incidence of relative clauses             0.417
8     Syllables per word                        0.414
9     Number of positive additive connectives   0.397
10    Number of negative causal connectives     0.388
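For reference, the Pearson coefficient between one feature and the literacy level can be computed directly from its definition. The sketch below uses made-up values and assumes the levels are coded as the integers 1 to 3; it does not reproduce any number from the ranking.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical data: literacy level (1..3) and words per sentence for six texts.
levels = [1, 1, 2, 2, 3, 3]
words_per_sentence = [12.0, 14.0, 15.0, 19.0, 21.0, 24.0]
print(round(pearson(levels, words_per_sentence), 3))
```

Ranking the features then amounts to computing this value once per feature column and sorting by magnitude.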
Predicting Readability Levels

Classification (Weka SVM)
Features          F (original)  F (natural)  F (strong)  Corr.  MAE
All               0.913         0.483        0.732       0.84   0.276
Language Model    0.669         0.025        0.221       0.25   0.381
Basic             0.846         0.149        0.707       0.76   0.302
Syntactic         0.891         0.32         0.74        0.82   0.285
Coh-Metrix-PORT   0.873         0.381        0.712       0.79   0.290
Flesch            0.751         0.152        0.546       0.52   0.348

Ordinal Classification (Weka Pairwise SVM)
Features          F (original)  F (natural)  F (strong)  Corr.  MAE
All               0.904         0.484        0.731       0.83   0.163
Language Model    0.634         0.497        0.05        0.49   0.344
Basic             0.83          0.334        0.637       0.73   0.231
Syntactic         0.891         0.382        0.714       0.81   0.180
Coh-Metrix-PORT   0.878         0.432        0.709       0.8    0.183
Flesch            0.746         0.489        0           0.56   0.310
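The F column is reported per class (original, natural, strong). As a reminder of what that number measures, the sketch below computes a per-class F-measure as the harmonic mean of precision and recall. The gold and predicted labels are invented for illustration, and the function name `f_measure` is hypothetical.

```python
def f_measure(gold, predicted, label):
    """Per-class F-measure: harmonic mean of precision and recall for one label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0  # the class is never predicted correctly
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical predictions in which "natural" is always confused
# with one of its neighbours.
gold = ["original", "natural", "strong", "natural", "strong", "original"]
pred = ["original", "strong", "strong", "original", "strong", "original"]
for label in ("original", "natural", "strong"):
    print(label, round(f_measure(gold, pred, label), 3))
```

A class that sits between two others, as "natural" does here, is the easiest one to confuse with its neighbours, which is one way to read the consistently lower F values for that class.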
Predicting Readability Levels

Regression (Weka SVM-reg, RBF Kernel)
Features          Corr.   MAE
All               0.8502  0.3478
Language Model    0.6245  0.5448
Basic             0.7266  0.4538
Syntactic         0.8063  0.3878
Coh-Metrix-PORT   0.8051  0.3895
Flesch            0.5772  0.5492
• Best correlation: Regression
• Lowest MAE: Ordinal Classification
• Combination of all features consistently yields better
results: more robust
• Among the individual feature groups, syntactic features achieve the best correlation scores
• Language model features performed the poorest
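Mean absolute error (MAE) is the average distance between the predicted and the true level once the levels are placed on a numeric scale. The sketch below assumes the levels are coded 1, 2, 3 (the slides do not state the coding) and uses invented predictions, so the printed values are unrelated to the reported results.

```python
def mean_absolute_error(gold, predicted):
    """Average absolute difference between gold and predicted levels."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(g - p) for g, p in zip(gold, predicted)) / len(gold)

# Hypothetical gold levels and two kinds of output:
# discrete levels from a classifier, continuous scores from a regressor.
gold = [1, 2, 3, 2]
classifier_output = [1, 3, 3, 2]
regressor_output = [1.2, 2.4, 2.7, 2.1]

print(round(mean_absolute_error(gold, classifier_output), 3))
print(round(mean_absolute_error(gold, regressor_output), 3))
```

A classifier contributes either 0 or a whole level to the error for each text, while a regressor contributes a small amount almost everywhere, so the two can reach similar MAE values in quite different ways.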
Conclusions
• The readability level of a text can be predicted
with satisfactory performance for our three
classes of interest
• Ordinal Classification seems to be the most
appropriate model to use
– High correlation, lowest error rate (MAE)
• Combination of all features is best
SIMPLIFICA Tool
• Integration of classification model
– Simplest model, highest F-measure, comparable
correlation scores
Future Work
• Add deeper cognitive features, e.g. semantic,
coreference, latent semantics metrics
• User evaluation: authors
Thanks!
• Project: http://caravelas.icmc.usp.br/