Computational Models of Text Quality Ani Nenkova University of Pennsylvania ESSLLI 2010, Copenhagen 1 The ultimate text quality application Imagine your favorite text editor With spell-checker and grammar checker But also functions that tell you "Word W is repeated too many times" "Fill the gap is a cliché" "You might consider using this more figurative expression" "This sentence is unclear and hard to read" "What is the connection between these two sentences?" 2 Currently It is our friends who give such feedback Often conflicting We might agree that a text is good, but find it hard to explain exactly why Computational linguistics should have some answers Though far from offering a complete solution yet 3 In this course We will overview research dealing with various aspects of text quality A unified approach does not yet exist, but many proposals have been tested on corpus data and integrated in applications 4 Current applications: education Grading student writing Is this a good essay? One of the graders of SAT and GRE essays is in fact a machine! [1] http://www.ets.org/research/capabilities/automated_scoring Providing appropriate reading material Is this text good for a particular user? Appropriate grade level Appropriate language competency in L2 [2,3] http://reap.cs.cmu.edu/ 5 Current applications: information retrieval Particularly user-generated content Questions and answers on the web Blogs and comments Searching over such content poses new problems [4] What is a good question/answer/comment?
http://answers.yahoo.com/ Relevant for general IR as well Of the many relevant documents, some are better written 6 Current applications: NLP Models of text quality lead to improved systems [5] offer possibilities for automatic evaluation [6] Automatic summarization Select important content and organize it as well-written text Language generation Select, organize and present content at the document, paragraph, sentence and phrase level Machine translation 7 Text quality factors Interesting Style (clichés, figurative language) Vocabulary use Grammatical and fluent sentences Coherent and easy to understand In most types of writing, well-written means clear and easy to understand. Not necessarily so in literary works. Problems with clarity of instructions motivated a fair amount of early work. 8 Early work: keep in mind these predate modern computers! Common words are easier to understand stentorian vs. loud myocardial infarction vs. heart attack Common words are short o Standard readability metrics o percentage of words not among the N most frequent o average number of syllables per word Syntactically simple sentences are easier to understand o average number of words per sentence [Flesch-Kincaid, Automated Readability Index, Gunning-Fog, SMOG, Coleman-Liau] 9 Modern equivalents Language models Word probabilities from a large collection http://www.speech.cs.cmu.edu/SLM_info.html Features derived from syntactic parse [2,7,8,9] Parse tree height Number of subordinating conjunctions Number of passive voice constructions Number of noun and verb phrases 10 Language models Unigram and bigram language models Really, just huge tables Smoothing necessary to account for unseen words p(w) = n_w / N p(w1 | w2) = n_{w2 w1} / n_{w2} 11 Features from language models Assessing the readability of text t consisting of m words, for intended audience class c Number of out of vocabulary words in the text with respect to the language model for c Text likelihood and perplexity L(t) = P(c) P(w1 | c) ... P(wm | c) PP = 2^{H(t|c)} H(t | c) = -(1/m) log2 P(t | c) 12 Application to grade level prediction Collins-Thompson and Callan, NAACL 2004 [10] 13 14 Results on predicting grade level Schwarm and Ostendorf, ACL 2005 [11] Flesch-Kincaid Grade Level index: number of syllables per word, sentence length Lexile: word frequency, sentence length SVM: features from language models and syntax 15 Models of text coherence Global coherence Overall document organization Local coherence Adjacent sentences 16 Text structure can be learnt in an unsupervised manner from human-written examples in a domain (e.g., earthquake report topics such as location, time, damage, magnitude, relief efforts) 17 Content model Barzilay & Lee '04 [5] Hidden Markov Model (HMM)-based States - clusters of related sentences, "topics" Transition prob. - sentence precedence in corpus Emission prob. - bigram language model generating sentences in the current topic p(s_{i+1}, h_{i+1} | s_i, h_i) = p_t(h_{i+1} | h_i) p_e(s_{i+1} | h_{i+1}) Earthquake reports: transitions between topics such as location, magnitude, relief efforts, casualties 18 Generating Wikipedia articles Sauper and Barzilay, 2009 [12] Articles on diseases and American film actors Create templates of subtopics Focus only on subtopic level structure ◦ Use paragraphs from documents on the web 19 Template creation Cluster similar headings signs and symptoms, symptoms, early symptoms… Choose k clusters, k = average number of subtopics in that domain Find majority ordering for the clusters Biography: Early life, Career, Personal life, Death Diseases: Symptoms, Causes, Diagnosis, Treatment 20 Extraction of excerpts and ranking Candidates for a subtopic Paragraphs from top 10 pages of search results Measure relevance of candidates for that subtopic Features ~ unigrams, bigrams, number of sentences… 21 Need to control redundancy across subtopics Integer Linear Program Variables One per excerpt (value 1 = chosen, 0 = not), for each subtopic (causes, symptoms, diagnosis,
treatment) Objective Minimize sum of the ranks of the excerpts chosen Constraints ◦ Cosine similarity between any selected pair <= 0.5 ◦ One excerpt per subtopic 22 Linguistic models of coherence [Halliday and Hasan, 1976] [13] Coherent text is characterized by the presence of various types of cohesive links that facilitate text comprehension Reference and lexical reiteration Pronouns, definite descriptions, semantically related words Discourse relations (conjunction) I closed the window because it started raining. Substitution (one) or ellipsis (do) 23 Referential coherence Centering theory tracking focus of attention across adjacent sentences [14, 15, 16, 17] Syntactic form of references Particularly first and subsequent mention [18, 19], pronominalization Lexical chains Identifying and tracking topics within a text [20, 21, 22, 23] 24 Discourse relations Explicit vs. implicit I stayed home because I had a headache o Signaled by a discourse connective o Inferred without the presence of a connective I took my umbrella. [Because] The forecast was for rain in the afternoon. 25 Lexical chains Often discussed as a cohesion indicator and implemented in systems, but not used in text quality tasks Find all words that refer to the same topic Find the correct sense of the words LexChainer Tool: http://www1.cs.columbia.edu/nlp/tools.cgi [23] Applications: summarization, IR, spell checking, hypertext construction John bought a Jaguar. He loves the car.
LC = {jaguar, car, engine, it} 26 Centering theory ingredients (Grosz et al., 1995) Deals with local coherence What happens to the flow from sentence to sentence Does not deal with global structuring of the text (paragraphs/segments) Defines coherence as an estimate of the processing load required to "understand" the text 27 Processing load Upon hearing a sentence, a person Expends cognitive effort to interpret the expressions in the utterance Integrates the meaning of the utterance with that of the previous sentence Creates some expectations on what might come next 28 Example (1) John met his friend Mary today. (2) He was surprised to see her. (3) He thought she was still in Italy. Form of referring expressions Anaphora needs to be resolved "Create" a discourse entity at first mention with full noun phrase Creating expectations 29 Creating and meeting expectations (1) a. John went to his favorite music store to buy a piano. b. He had frequented the store for many years. c. He was excited that he could finally buy a piano. d. He arrived just as the store was closing for the day. (2) a. John went to his favorite music store to buy a piano. b. It was a store John had frequented for many years. c. He was excited that he could finally buy a piano. d. It was closing just as John arrived. 30 Interpreting pronouns a. Terry really goofs sometimes. b. Yesterday was a beautiful day and he was excited about trying out his new sailboat. c. He wanted Tony to join him on a sailing expedition. d. He called him at 6am. e. He was sick and furious at being woken up so early.
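The anaphora-resolution step in the pronoun examples above can be illustrated with a toy resolver that links a pronoun to the most recently mentioned compatible entity. This is purely an illustrative sketch: the entity lists and gender tags are hand-supplied assumptions, where a real resolver would extract them from parses.

```python
# Toy illustration: resolve a pronoun to the most recently introduced
# compatible entity. Entities and gender tags are hand-supplied here.
def resolve(pronoun, mentioned):
    """mentioned: list of (entity, gender) pairs in order of mention."""
    gender = {"he": "m", "him": "m", "she": "f", "her": "f"}[pronoun.lower()]
    # Scan from the most recent mention backwards.
    for entity, g in reversed(mentioned):
        if g == gender:
            return entity
    return None

# (1) John met his friend Mary today. (2) He was surprised to see her.
mentions = [("John", "m"), ("Mary", "f")]
print(resolve("He", mentions))   # John
print(resolve("her", mentions))  # Mary
```

Recency alone is of course too weak a model; the centering machinery introduced next replaces it with salience rankings over whole utterances.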
31 Basic centering definitions Centers of an utterance Set of entities serving to link that utterance to the other utterances in the discourse segment that contains it Not words or phrases themselves Semantic interpretations of noun phrases 32 Types of centers Forward looking centers An ordered set of entities What could we expect to hear about next Ordered by salience as determined by grammatical function Subject > Indirect object > Object > Others John gave the textbook to Mary. Cf = {John, Mary, textbook} Preferred center Cp The highest ranked forward looking center High expectation that the next utterance in the segment will be about Cp 33 Backward looking center Single backward looking center, Cb(U) For each utterance other than the segment-initial one The backward looking center of utterance Un+1 connects with one of the forward looking centers of Un Cb(Un+1) is the most highly ranked element from Cf(Un) that is also realized in Un+1 34 Centering transitions ordering If Cb(Un+1) = Cb(Un), or Cb(Un) is undefined: continue when Cb(Un+1) = Cp(Un+1), retain when Cb(Un+1) != Cp(Un+1) If Cb(Un+1) != Cb(Un): smooth-shift when Cb(Un+1) = Cp(Un+1), rough-shift when Cb(Un+1) != Cp(Un+1) 35 Centering constraints There is precisely one backward-looking center Cb(Un) Cb(Un+1) is the highest-ranked element of Cf(Un) that is realized in Un+1 36 Centering rules If some element of Cf(Un) is realized as a pronoun in Un+1 then so is Cb(Un+1) Transitions are not equally preferred: continue > retain > smooth-shift > rough-shift 37 Centering analysis Terry really goofs sometimes. Cf={Terry}, Cb=?, undef Yesterday was a beautiful day and he was excited about trying out his new sailboat. Cf={Terry,sailboat}, Cb=Terry, continue He wanted Tony to join him in a sailing expedition. Cf={Terry, Tony, expedition}, Cb=Terry, continue He called him at 6am. Cf={Terry,Tony}, Cb=Terry, continue 38 Tony was sick and furious at being woken up so early. Cf={Tony}, Cb=Tony, smooth-shift He told Terry to get lost and hung up.
Cf={Tony,Terry}, Cb=Tony, continue Of course, Terry hadn't intended to upset Tony. Cf={Terry,Tony}, Cb=Tony, retain 39 Rough shifts in evaluation of writing skills (Miltsakaki and Kukich, 2002) Automatic grading of essays by E-rater Syntactic variety Represented by features that quantify the occurrence of clause types Clear transitions Cue phrases in certain syntactic constructions Existence of main and supporting points Appropriateness of the vocabulary content of the essay What about local coherence? 40 Essay score model Human score available E-rater prediction available Percentage of rough-shifts in each essay: analysis done manually Negative correlation between the human score and the percentage of rough-shifts 41 Linear multi-factor regression Approximate the human score as a linear function of the e-rater prediction and the percentage of rough-shifts Adding rough shifts significantly improves the model of the score 0.5 improvement on the 1-6 scale How easy/difficult would it be to fully automate the rough-shift variable? 42 Variants of centering and application to information ordering Karamanis et al., 2009 [16] is the most comprehensive overview of variants of centering theory and an evaluation of centering in a specific task related to text quality 43 Information ordering task Given a set of sentences/clauses, what is the best presentation?
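The four centering transitions defined earlier (continue, retain, smooth-shift, rough-shift) can be sketched as a tiny classifier over the Cb and Cp of adjacent utterances. This is an illustrative sketch, not an implementation from the slides; entity names are strings standing in for discourse entities.

```python
def classify_transition(cb_prev, cb_cur, cp_cur):
    """Classify the centering transition from Un to Un+1.

    cb_prev: Cb(Un), or None if undefined (e.g. segment-initial)
    cb_cur:  Cb(Un+1)
    cp_cur:  Cp(Un+1), the highest-ranked forward-looking center of Un+1
    """
    same_cb = cb_prev is None or cb_cur == cb_prev
    if same_cb:
        return "continue" if cb_cur == cp_cur else "retain"
    return "smooth-shift" if cb_cur == cp_cur else "rough-shift"

# Steps from the Terry/Tony analysis above:
print(classify_transition("Terry", "Terry", "Terry"))  # continue
print(classify_transition("Terry", "Tony", "Tony"))    # smooth-shift
```

A full analyzer would also need to compute Cf orderings from parses and decide which entities are realized in each utterance; only the final table lookup is shown here.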
Take a newspaper article and jumble the sentences---the result will be much more difficult to read than the original Negative examples constructed by randomly permuting the original Criteria for deciding which of two orderings is better Centering would definitely be applicable 44 Centering variations Continuity (NOCB = lack of continuity) Cf(Un) and Cf(Un+1) share at least one element Coherence Cb(Un) = Cb(Un+1) Salience Cb(U) = Cp(U) Cheapness (fulfilled expectations) Cb(Un+1) = Cp(Un) 45 Metrics of coherence M.NOCB (no continuity) M.CHEAP (expectations not met) M.KP sum of the violations of continuity, cheapness, coherence and salience M.BFP seeks to maximize transitions according to Rule 2 46 Experimental methodology Gold-standard ordering The original order of the text (object description, news article) Assume that other orderings are inferior Classification error rate Percentage of orderings that score better than the gold-standard + 0.5 * percentage of the orderings that score the same 47 Results NOCB gives best results Significantly better than the other metrics Consistent results for three different corpora Museum artifact descriptions (2) News Airplane accidents M.BFP is the second best metric 48 49 Entity grid (Barzilay and Lapata, 2005, 2008) Inspired by centering Tracks entities across adjacent sentences, as well as their syntactic positions Much easier to compute from raw text Brown Coherence Toolkit http://www.cs.brown.edu/~melsner/manual.html 50 Entity grid: applications Several applications, with very good results Information ordering Comparing the coherence of pairs of summaries Distinguishing readability levels Child vs. adult Improves over Petersen & Ostendorf [3] 51 Entity grid example 1 [The Justice Department]S is conducting an [anti-trust trial]O against [Microsoft Corp.]X with [evidence]X that [the company]S is increasingly attempting to crush [competitors]O.
2 [Microsoft]O is accused of trying to forcefully buy into [markets]X where [its own products]S are not competitive enough to unseat [established brands]O. 3 [The case]S revolves around [evidence]O of [Microsoft]S aggressively pressuring [Netscape]O into merging [browser software]O. 4 [Microsoft]S claims [its tactics]S are commonplace and good economically. 5 [The government]S may file [a civil suit]O ruling that [conspiracy]S to curb [competition]O through [collusion]X is [a violation of the Sherman Act]O. 6 [Microsoft]S continues to show [increased earnings]O despite [the trial]X. 52 Entity grid representation 53 16 entity grid features The probability of each type of transition in the text Four syntactic distinctions S, O, X, _ 54 Type of reference and info ordering (Elsner and Charniak, 2008) Entity grid features are not concerned with how an entity is mentioned Discourse old vs. discourse new Kent Wells, a BP senior vice president, said on Saturday during a technical briefing that the current cap, which has a looser fit and has been diverting about 15,000 barrels of oil a day to a drillship, will be replaced with a new one in 4 to 7 days. The new cap will take 4 to 7 days to be installed, and in case the new cap is not effective, Mr. Wells said engineers were prepared to replace it with an improved version of the current cap. 55 The probability of a given sequence of discourse-new and discourse-old realizations gives a further indication about ordering Similarly, pronouns should have reasonable antecedents Adding both models to the entity grid improves performance on the information ordering task 56 Sentence Ordering n sentences Output from a generation or summarization system Find most coherent ordering n!
permutations With local coherence metrics ◦ Adjacent sentence flow ◦ Finding the best ordering is NP-complete Reduction from the Traveling Salesman Problem 57 Word co-occurrence model (Lapata, ACL 2003; Soricut and Marcu, 2006) [23,24] Idea from statistical machine translation Alignment models over parallel sentences: John went to a restaurant. / John est allé à un restaurant. He ordered fish. / Il ordonna de poisson. The waiter was very attentive. / Le garçon était très attentif. Analogously, align words in each sentence with words in the sentence that follows it: John went to a restaurant. He ordered fish. The waiter was very attentive. John gave him a huge tip. We ate at a restaurant yesterday. We also ordered some take away. P(ordered | restaurant) P(fish | poisson) P(waiter | ordered) P(tip | waiter) … 58 Discourse (coherence) relations Only recently have empirical results shown that discourse relations are predictive of text quality (Pitler and Nenkova, 2008) 59 PDTB discourse relations annotations Largest corpus of annotated discourse relations http://www.seas.upenn.edu/~pdtb/ Four broad classes of relations Contingency Comparison Temporal Expansion Explicit and implicit 60 Implicit and explicit relations (E1) He is very tired because he played tennis all morning. (E2) He is not very strong but he can run amazingly fast. (E3) We had some tea in the afternoon and later went to a restaurant for a big dinner. (I1) I took my umbrella this morning. [because] The forecast was for rain. (I2) She is never late for meetings. [but] He always arrives 10 minutes late. (I3) She woke up early. [afterwards] She had breakfast and went for a walk in the park. 61 What is the relative importance of factors in determining text quality? Competent readers (native English speakers), graduate students at Penn Wall Street Journal texts 30 texts ranked on a scale of 1 to 5 How well-written is this article? How well does the text fit together? How easy was it to understand? How interesting is the article?
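Given averaged quality ratings like those just described and a per-text feature value, the strength of association between the two can be measured with a Pearson correlation coefficient. A minimal sketch follows; the feature and rating values are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between a feature and quality ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-text feature values and averaged quality ratings
feature = [3.1, 2.4, 4.0, 3.3, 2.8]
rating = [3.5, 2.0, 4.2, 3.0, 2.5]
print(round(pearson_r(feature, rating), 3))  # 0.941
```

With only 30 texts, the p-values reported alongside each r in the slides matter as much as the coefficients themselves.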
62 Several judgments for each text Final quality score was the average Scores range from 1.5 to 4.33 Mean 3.2 63 Which of the many indicators will work best? Most research studies focus on only one or two How do indicators combine? Metrics Correlation coefficient Accuracy of pair-wise ranking prediction 64 Correlation coefficients between assessor ratings and different features 65 Baseline measures Average Characters/Word r = -.0859 (p = .6519) Average Words/Sentence r = .1637 (p = .3874) Max Words/Sentence r = .0866 (p = .6489) Article length r = -.3713 (p = .0434) 66 Vocabulary factors Language model probability of the article LogLik(t | M) = Σ_w C(w) log p(w | M), where C(w) is the count of w in the article and p(w | M) is the word's probability under model M M estimated from PTB (WSJ) M estimated from general news (NEWS) 67 Correlations with 'well-written' assessment Log likelihood, WSJ r = .3723 (p = .0428) Log likelihood, NEWS r = .4497 (p = .0127) Log likelihood with length, WSJ r = .3732 (p = .0422) Log likelihood with length, NEWS r = .6359 (p = .0002) 68 Syntactic features Average parse tree height r = -.0634 (p = .7439) Avr. number of noun phrases per sentence r = .2189 (p = .2539) Average SBARs r = .3405 (p = .0707) Avr. number of verb phrases per sentence r = .4213 (p = .0228) 69 Elements of lexical cohesion Avr. cosine similarity between adjacent sents r = -.1012 (p = .5947) Avr. word overlap between adjacent sentences r = -.0531 (p = .7806) Avr. Noun+Pronoun overlap r = .0905 (p = .6345) Avr. # Pronouns/Sent r = .2381 (p = .2051) Avr. # Definite articles r = .2309 (p = .2196) 70 Correlation with 'well-written' score Prob. of S-S transition r = -.1287 (p = .5059) Prob. of S-O transition r = -.0427 (p = .8261) Prob. of S-X transition r = -.1450 (p = .4529) Prob. of S-N transition r = .3116 (p = .0999) Prob. of O-S transition r = .1131 (p = .5591) Prob. of O-O transition r = .0825 (p = .6706) Prob. of O-X transition r = .0744 (p = .7014) Prob. of O-N transition r = .2590 (p = .1749) 71 Prob. of X-S transition r = .1732 (p = .3688) Prob.
of X-O transition r = .0098 (p = .9598) Prob. of X-X transition r = -.0655 (p = .7357) Prob. of X-N transition r = .1319 (p = .4953) Prob. of N-S transition r = .1898 (p = .3242) Prob. of N-O transition r = .2577 (p = .1772) Prob. of N-X transition r = .1854 (p = .3355) Prob. of N-N transition r = -.2349 (p = .2200) 72 Well-writtenness and discourse Log likelihood of discourse rels r = .4835 (p = .0068) # of discourse relations r = -.2729 (p = .1445) Log likelihood of rels with # of rels r = .5409 (p = .0020) # of relations with # of words r = .3819 (p = .0373) Explicit relations only r = .1528 (p = .4203) Implicit relations only r = .2403 (p = .2009) 73 Summary: significant factors Log likelihood of discourse relations r = .4835 Log likelihood, NEWS r = .4497 Average verb phrases per sentence r = .4213 Log likelihood, WSJ r = .3723 Number of words r = -.3713 74 Text quality prediction as ranking Every pair of texts with ratings differing by 0.5 Features are the difference of feature values for the two texts Task: predict which of the two articles has the higher text quality score 75 Prediction accuracy (10-fold cross validation) None (Majority Class) 50.21% number of words 65.84% ALL 88.88% Grid only 79.42% log l discourse rels 77.77% Avg VPs sen 69.54% log l NEWS 66.25% 76 Findings Complex interplay between features Entity grid features not significantly correlated with the 'well-written' score but very useful for the ranking task Discourse information is very helpful But here we used gold-standard annotations Development of an automatic classifier is underway 77 Implicit and explicit discourse relations Comparison: 69% explicit, 31% implicit Contingency: 47% explicit, 53% implicit Temporal: 80% explicit, 20% implicit Expansion: 42% explicit, 58% implicit 78 Sense classification based on connectives only Four-way classification Explicit relations only 93% accuracy All relations (implicit+explicit) 75% accuracy Implicit relations are the real challenge 79 Explicit discourse relations, tasks Pitler and Nenkova, 2009 [25] Discourse vs.
non-discourse use I will be happier once the semester is over. I have been to Ohio once. Relation sense Contingency, comparison, temporal, expansion I haven't been to Paris since I went there on a school trip in 1998. [Temporal] I haven't been to Antarctica since it is very far away. [Contingency] 80 Penn Discourse Treebank Largest available annotated corpus of discourse relations Penn Treebank WSJ articles 18,459 explicit discourse relations 100 connectives "although" 91% discourse vs. "or" 3% discourse 81 Discourse Usage Experiments Positive examples: discourse connectives Negative examples: same strings in the PDTB, unannotated 10-fold cross validation Maximum Entropy classifier 82 Discourse Usage Results 83 84 Sense Disambiguation: Comparison, Contingency, Expansion, or Temporal? Connective only: 93.67% accuracy Connective + Syntax: 94.15% accuracy Interannotator agreement: 94% 85 Tool Automatic annotation of discourse use and sense of discourse connectives Discourse Connectives Tagger http://www.cis.upenn.edu/~epitler/discourse.html 86 What about implicit relations? Is there hope to have a usable tool soon? Early studies on unannotated data gave reason for optimism But when recently tested on the PDTB, their performance is poor Accuracy on contingency, comparison and temporal is below 50% 87 Classify implicits and explicits together Not easy to infer from combined results how early systems performed on implicits As we saw, one can get reasonable overall performance on the strength of the explicits alone Same sentence [26] Graphbank corpus: doesn't distinguish implicit and explicit [27] 88 Classify on large unannotated corpus Create artificial implicits by deleting the connective [28, 29, 30] I am in Europe, but I live in the United States.
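Creating a synthetic implicit by deleting an explicit connective, as in the example above, can be sketched as follows. The connective-to-relation mapping and the regex handling are simplified assumptions, not the full PDTB inventory or any system's actual preprocessing.

```python
import re

# A few explicit connectives and the relation they roughly signal;
# a simplified assumption, not the full PDTB connective inventory.
CONNECTIVES = {"because": "Cause", "but": "Contrast", "although": "Contrast"}

def make_synthetic_implicit(sentence):
    """Delete the first known connective; keep its relation as the label."""
    for conn, label in CONNECTIVES.items():
        pattern = r",?\s*\b" + conn + r"\b\s*"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            stripped = re.sub(pattern, " ", sentence, count=1, flags=re.IGNORECASE)
            return " ".join(stripped.split()), label
    return sentence, "None"

print(make_synthetic_implicit("I am in Europe, but I live in the United States."))
```

As the following slides point out, examples manufactured this way are not distributed like genuine implicit relations, which limits what training on them can achieve.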
First proposed by Marcu and Echihabi, 2002 Very good initial results Accuracy of distinguishing between two relations, >75% But these were on balanced classes Not the case in real text Not tested on real implicits (but see [30, 29]) 89 Experiments with PDTB Pitler et al, ACL 2009 [31] Wide variety of features to capture semantic opposition and parallelism Lin et al, EMNLP 2009 [32] (Lexicalized) syntactic features Results improve over baselines, better understanding of features, but the classifiers are not suitable for application in real tasks 90 Word pairs as features Most basic feature for implicits Span 1: I am a little tired Span 2: there is a 13 hour time difference Word pairs: I_there, I_is, …, tired_time, tired_difference Marcu and Echihabi, 2002 91 Intuition: with large amounts of data, will find semantically-related pairs The recent explosion of country funds mirrors the "closed-end fund mania" of the 1920s, Mr. Foot says, when narrowly focused funds grew wildly popular. They fell into oblivion after the 1929 crash. 92 Meta error analysis of prior work Using just content words reduces performance (but has a steeper learning curve) Marcu and Echihabi, 2002 Nouns and adjectives don't help at all Lapata and Lascarides, 2004 [33] Filtering out stopwords lowers results Blair-Goldensohn et al., 2007 93 Word pairs experiments Pitler et al 2009 Synthetic implicits: Cause/Contrast/None Explicit instances from Gigaword with connective deleted Because → Cause, But → Contrast At least 3 sentences apart → None [Blair-Goldensohn et al., 2007] Random selection 5,000 Cause 5,000 Other Computed information gain of word pairs 94 Function words have highest information gain But… Didn't we remove the connective? 95 "but" signals "Not-Comparison" in synthetic data The government says it has reached most isolated townships by now, but because roads are blocked, getting anything but basic food supplies to people remains difficult.
(but → Comparison, because → Contingency) 96 Results: Word pairs Pairs of words from the two text spans What doesn't work Training on synthetic implicits What really works Use synthetic implicits for feature selection Train on the PDTB 97 Best results: f-scores, with baselines in parentheses Comparison 21.96 (17.13) Contingency 47.13 (31.10) Expansion 76.41 (63.84) Temporal 16.76 (16.21) Comparison/Contingency baseline: synthetic implicits word pairs Expansion/Temporal baseline: real implicits word pairs 98 Further experiments using context Results from classifying each relation independently Naïve Bayes, MaxEnt, AdaBoost Since context features were helpful, tried CRF 6-way classification, word pairs as features Naïve Bayes accuracy: 43.27% CRF accuracy: 44.58% 99 Do we need more coherence factors? Louis and Nenkova, 2010 [34] If we had perfect co-reference and discourse relation information, would we be able to explain local discourse coherence? Our recent corpus study indicates the answer is NO 30% of adjacent sentences in the same paragraph in the PDTB neither share an entity nor have an implicit comparison, contingency or temporal relation Lexical chains? 100 References [1] Burstein, J. & Chodorow, M. (in press). Progress and new directions in technology for automated essay evaluation. In R. Kaplan (Ed.), The Oxford handbook of applied linguistics (2nd Ed.). New York: Oxford University Press. [2] Heilman, M., Collins-Thompson, K., Callan, J., and Eskenazi, M. (2007). Combining Lexical and Grammatical Features to Improve Readability Measures for First and Second Language Texts. Proceedings of the Human Language Technology Conference. Rochester, NY. [3] S. Petersen and M. Ostendorf, "A machine learning approach to reading level assessment," Computer, Speech and Language, vol. 23, no. 1, pp.
89-106, 2009. [4] Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis, Gilad Mishne. Finding High Quality Content in Social Media. ACM Web Search and Data Mining Conference (WSDM), 2008. [5] Regina Barzilay and Lillian Lee. Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization. HLT-NAACL 2004: Proceedings of the Main Conference, pp. 113-120, 2004. 101 References [6] Emily Pitler, Annie Louis and Ani Nenkova. Automatic Evaluation of Linguistic Quality in Multi-Document Summarization. Proceedings of ACL 2010. [7] Schwarm, S. E. and Ostendorf, M. 2005. Reading level assessment using support vector machines and statistical language models. In Proceedings of ACL 2005. [8] Jieun Chae, Ani Nenkova. Predicting the Fluency of Text with Shallow Structural Features: Case Studies of Machine Translation and Human-Written Text. In Proceedings of EACL 2009: 139-147. [9] Charniak, E. and Johnson, M. 2005. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of ACL 2005. [10] K. Collins-Thompson and J. Callan. (2004). A language modeling approach to predicting reading difficulty. Proceedings of HLT/NAACL 2004. [11] Sarah E. Schwarm and Mari Ostendorf. Reading Level Assessment Using Support Vector Machines and Statistical Language Models. In Proceedings of ACL, 2005. 102 References [12] C. Sauper and R. Barzilay. Automatically generating Wikipedia articles: A structure-aware approach. ACL-IJCNLP 2009. [13] Halliday, M. A. K., and Ruqaiya Hasan. 1976. Cohesion in English. London: Longman. [14] B. Grosz, A. Joshi, and S. Weinstein. 1995. Centering: a framework for modelling the local coherence of discourse. Computational Linguistics, 21(2):203-226. [15] E. Miltsakaki and K. Kukich. 2000. The role of centering theory's rough-shift in the teaching and evaluation of writing skills. In Proceedings of ACL'00, pages 408-415. [16] Karamanis, N., Mellish, C., Poesio, M., and Oberlander, J. 2009.
Evaluating centering for information ordering using corpora. Computational Linguistics 35, 1 (Mar. 2009), 29-46. [17] Regina Barzilay, Mirella Lapata. "Modeling Local Coherence: An Entity-based Approach". Computational Linguistics, 2008. [18] Ani Nenkova, Kathleen McKeown. References to Named Entities: a Corpus Study. HLT-NAACL 2003. 103 References [19] Micha Elsner, Eugene Charniak. Coreference-inspired Coherence Modeling. ACL (Short Papers) 2008: 41-44. [20] Morris, J. and Hirst, G. 1991. Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics 17, 1 (Mar. 1991), 21-48. [21] Regina Barzilay and Michael Elhadad. "Text summarizations with lexical chains". In Inderjeet Mani and Mark Maybury, editors, Advances in Automatic Text Summarization. MIT Press, 1999. [22] Silber, H. G. and McCoy, K. F. 2002. Efficiently computed lexical chains as an intermediate representation for automatic text summarization. Computational Linguistics 28, 4 (Dec. 2002), 487-496. [23] Mirella Lapata. Probabilistic Text Structuring: Experiments with Sentence Ordering. Proceedings of ACL 2003. [24] R. Soricut & D. Marcu. Discourse generation using utility-trained coherence models. COLING-ACL 2006. 104 References [25] Emily Pitler and Ani Nenkova. Using Syntax to Disambiguate Explicit Discourse Connectives in Text. Proceedings of ACL, short paper, 2009. [26] Radu Soricut and Daniel Marcu. 2003. Sentence Level Discourse Parsing using Syntactic and Lexical Information. Proceedings of the Human Language Technology and North American Association for Computational Linguistics Conference (HLT/NAACL-2003). [27] Ben Wellner, James Pustejovsky, Catherine Havasi, Roser Sauri and Anna Rumshisky. Classification of Discourse Coherence Relations: An Exploratory Study using Multiple Knowledge Sources. In Proceedings of the 7th SIGDIAL Workshop on Discourse and Dialogue. [28] Daniel Marcu and Abdessamad Echihabi (2002). An Unsupervised Approach to Recognizing Discourse Relations.
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-2002). [29] Sasha Blair-Goldensohn, Kathleen McKeown, Owen Rambow. Building and Refining Rhetorical-Semantic Relation Models. HLT-NAACL 2007: 428-435. 105 References [30] Sporleder, C. and Lascarides, A. 2008. Using automatically labelled examples to classify rhetorical relations: An assessment. Natural Language Engineering 14, 3 (Jul. 2008), 369-416. [31] Emily Pitler, Annie Louis, and Ani Nenkova. Automatic Sense Prediction for Implicit Discourse Relations in Text. Proceedings of ACL, 2009. [32] Ziheng Lin, Min-Yen Kan and Hwee Tou Ng (2009). Recognizing Implicit Discourse Relations in the Penn Discourse Treebank. In Proceedings of EMNLP. [33] Lapata, Mirella and Alex Lascarides. 2004. Inferring Sentence-internal Temporal Relations. In Proceedings of the North American Chapter of the Association for Computational Linguistics, 153-160. [34] Annie Louis and Ani Nenkova. Creating Local Coherence: An Empirical Assessment. Proceedings of NAACL-HLT 2010. 106