Readability Assessment for Text Simplification

Sandra Aluisio (1), Lucia Specia (2), Caroline Gasperin (1), Carolina Scarton (1)
(1) University of São Paulo, Brazil
(2) University of Wolverhampton, UK

The 5th Workshop on Innovative Use of NLP for Building Educational Applications


Motivation

• Develop technology to benefit low literacy readers

[Figure: INAF literacy levels (Illiterate, Rudimentary, Basic, Advanced) as percentages of the population per survey period, 2001-2002 to 2009; callout: 68 %]

– Rudimentary: studied up to 4 years; can find explicit information in short and familiar texts
– Basic: studied between 4 and 8 years; can read and understand texts of average length, and find information even when some inference is necessary


Readability Assessment

• To assess the readability level of a text
  – Three levels of readability, following the INAF levels: Rudimentary, Basic, Advanced
• To supplement our text simplification technology
  – Two levels of simplification, differing in the degree of application of simplification operations
    • STRONG: operations are applied to all complex syntactic phenomena present → RUDIMENTARY
    • NATURAL: operations are applied selectively, only when the resulting text remains "natural" → BASIC


Text Simplification Scenario

• Authoring tool for creating simplified texts: SIMPLIFICA
  1. Author inputs text
  2. Author receives suggestions of possible simplifications, which may be accepted or rejected
     • Lexical substitutions
     • Syntactic simplification
  3. Author does not know whether the text is simple enough for the intended audience
     • Feedback: readability assessment


Readability Assessment System

• Machine learning
  – Classes = 3 INAF levels
  – Trained on a corpus of manually simplified texts
    • Original text + natural and strong simplifications
  – Extensive set of features
    • Cognitively motivated: Coh-Metrix [Graesser et al., 2004]
    • Syntactic: occurrence of complex phenomena
    • Language model: up to trigrams
  – 3 paradigms: Classification, Ordinal Classification, Regression [Heilman et al., 2007]


Corpora

• Training and testing corpora
  – General news: Zero Hora (ZH) newspaper
  – Popular science news: Caderno Ciencia (CC)
  – 3 versions of each text: original, natural, strong

Corpus        Documents  Sentences  Words   Avg. words per text (std. dev.)  Avg. words per sentence
ZH original   104        2184       46190   444.1 (133.7)                    21.1
ZH natural    104        3234       47296   454.7 (134.2)                    14.6
ZH strong     104        3668       47938   460.9 (137.5)                    13.0
CC original   50         882        20263   405.2 (175.6)                    22.9
CC natural    50         975        19603   392.0 (176.0)                    20.1
CC strong     50         1454       20518   410.3 (169.6)                    14.1
CH            130        3624       95866   737.4 (226.1)                    26.4


Features

 1  Number of words
 2  Number of sentences
 3  Number of paragraphs
 4  Number of verbs
 5  Number of nouns
 6  Number of adjectives
 7  Number of adverbs
 8  Number of pronouns
 9  Average number of words per sentence
10  Average number of sentences per paragraph
11  Average number of syllables per word
12  Flesch index for Portuguese
13  Incidence of content words
14  Incidence of functional words
15  Raw frequency of content words
16  Minimal frequency of content words
17  Average number of verb hypernyms
18  Incidence of NPs
19  Number of NP modifiers
20  Number of words before the main verb
21  Number of high level constituents
22  Number of personal pronouns
23  Type-token ratio
24  Pronoun-NP ratio
25  Number of "e" (and)
26  Number of "ou" (or)
27  Number of "se" (if)
28  Number of negations
29  Number of logic operators
30  Number of connectives
31  Number of positive additive connectives
32  Number of negative additive connectives
33  Number of positive temporal connectives
34  Number of negative temporal connectives
35  Number of positive causal connectives
36  Number of negative causal connectives
37  Number of positive logic connectives
38  Number of negative logic connectives
39  Verb ambiguity ratio
40  Noun ambiguity ratio
41  Adverb ambiguity ratio
42  Adjective ambiguity ratio
43  Incidence of clauses
44  Incidence of adverbial phrases
45  Incidence of apposition
46  Incidence of passive voice
47  Incidence of relative clauses
48  Incidence of coordination
49  Incidence of subordination
50  Out-of-vocabulary words
51  LM probability of unigrams
52  LM perplexity of unigrams
53  LM perplexity of unigrams, no line break
54  LM probability of bigrams
55  LM perplexity of bigrams
56  LM perplexity of bigrams, no line break
57  LM probability of trigrams
58  LM perplexity of trigrams
59  LM perplexity of trigrams, no line break


Feature Analysis

• Pearson correlation between features and literacy levels

Rank  Feature                                   Correlation
 1    Words per sentence                        0.693
 2    Incidence of apposition                   0.688
 3    Incidence of clauses                      0.614
 4    Flesch index                              0.580
 5    Words before main verb                    0.516
 6    Sentences per paragraph                   0.509
 7    Incidence of relative clauses             0.417
 8    Syllables per word                        0.414
 9    Number of positive additive connectives   0.397
10    Number of negative causal connectives     0.388


Predicting Readability Levels

Classification (Weka SVM)

Features          F (original)  F (natural)  F (strong)  Corr.  MAE
All               0.913         0.483        0.732       0.84   0.276
Language Model    0.669         0.025        0.221       0.25   0.381
Basic             0.846         0.149        0.707       0.76   0.302
Syntactic         0.891         0.32         0.74        0.82   0.285
Coh-Metrix-PORT   0.873         0.381        0.712       0.79   0.290
Flesch            0.751         0.152        0.546       0.52   0.348

Ordinal Classification (Weka pairwise SVM)

Features          F (original)  F (natural)  F (strong)  Corr.  MAE
All               0.904         0.484        0.731       0.83   0.163
Language Model    0.634         0.497        0.05        0.49   0.344
Basic             0.83          0.334        0.637       0.73   0.231
Syntactic         0.891         0.382        0.714       0.81   0.180
Coh-Metrix-PORT   0.878         0.432        0.709       0.8    0.183
Flesch            0.746         0.489        0           0.56   0.310


Predicting Readability Levels

Regression (Weka SVM-reg, RBF kernel)

Features          Corr.   MAE
All               0.8502  0.3478
Language Model    0.6245  0.5448
Basic             0.7266  0.4538
Syntactic         0.8063  0.3878
Coh-Metrix-PORT   0.8051  0.3895
Flesch            0.5772  0.5492

• Best correlation: Regression
• Lowest MAE: Ordinal Classification
• Combination of all features consistently yields better results: more robust
• Among the individual feature groups, syntactic features achieve the best correlation scores
• Language model features performed the poorest


Conclusions

• The readability level of texts can be predicted with satisfactory performance for our three classes of interest
• Ordinal Classification seems to be the most appropriate model to use
  – High correlation, lowest error rate (MAE)
• Combination of all features is best


SIMPLIFICA Tool

• Integration of classification model
  – Simplest model, highest F-measure, comparable correlation scores


Future Work

• Add deeper cognitive features, e.g. semantic, coreference, latent semantics metrics
• User evaluation: authors


Thanks!

• Project: http://caravelas.icmc.usp.br/
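

Supplementary sketch: correlation and MAE

The results slides compare models by Pearson correlation and mean absolute error (MAE) between predicted and true readability levels. The following is a minimal sketch of how those two measures can be computed, not the authors' evaluation code (the slides state only that Weka was used). The numeric encoding of the classes (strong = 1, natural = 2, original = 3) and all function names are assumptions made for illustration.

```python
from math import sqrt

# Assumed ordinal encoding of the three classes (hypothetical; the slides
# do not state which numeric values were used).
LEVEL = {"strong": 1, "natural": 2, "original": 3}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def mean_absolute_error(gold, pred):
    """Average absolute difference between gold and predicted values."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)


def evaluate(gold_labels, predicted_labels):
    """Map class labels to the assumed numeric levels and score them."""
    gold = [LEVEL[label] for label in gold_labels]
    pred = [LEVEL[label] for label in predicted_labels]
    return pearson(gold, pred), mean_absolute_error(gold, pred)
```

For a classifier, `evaluate` takes the gold and predicted class labels of the test texts; for a regression model, `pearson` and `mean_absolute_error` can be applied directly to the real-valued predictions. Under this encoding, confusing "original" with "strong" costs twice as much MAE as confusing adjacent levels, which is one reason an ordinal model can reach a lower MAE than a plain classifier with similar F-measure.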