Stylistics – case study
(See the last slide for the websites used to get numerical information from the texts.)

Stylistic analysis
• Literary vs linguistic stylistics
  – Literary criticism focuses on the effect on the reader, intended or otherwise, so it is largely intuitive and subjective
  – Linguistic stylistics looks for characterisations of style (including literary style) in terms of linguistic phenomena at the various levels of linguistic description
2/20

Stylistic analysis
• Inventory of linguistic devices and their effect
  – usually in a contrastive way:
    • in contrast with other texts of a similar genre
    • in contrast with other genres
• Linguistic devices are described in terms of the usual linguistic levels of description: phonology, morphology, lexis, grammar, etc.
3/20

Example
• Newspaper reporting of a similar story
• Sun vs Independent
  – readership by social class
  – Sun: widely read (c. 5m), mostly by lower class and lower middle class
  – Independent: circulation 0.25m, educated middle class
• How would you expect this difference in readership to be reflected in the styles?
4/20

Sun vs Independent
• The targeted readership largely dictates subject matter and the angle of coverage
• From a purely linguistic point of view we might expect differences in …
  – vocabulary
  – complexity of sentence structure
• Other differences might include literary …
• But (compared with other texts) features of the genre (newspaper story) may be shared
5/20

Independent: http://www.independent.co.uk/news/world/asia/hawker-family-make-new-plea-799964.html
6/20

Sun: http://www.thesun.co.uk/sol/homepage/news/justice/article952630.ece
7/20

Some differences
Sun: Family bid for Lindsay's killer
Independent: Hawker family make new plea

Sun: THE family of an English teacher murdered in Japan today appealed for her killer to be found – a year after her death.
Independent: The family of a young British teacher murdered in Japan were yesterday flying to Tokyo to launch a fresh appeal on the first anniversary of her death.

Sun: Lindsay Ann Hawker, 22, was found dead in a sand-filled bath on a balcony of a flat belonging to one of her students.
Independent: Lindsay Ann Hawker, 22, was found dead in a bath filled with sand on the balcony of a flat in Ichikawa, east of Tokyo, on March 27 last year.

Sun: Parents Bill and Julia and their daughters Lisa, 26, and Louise, 23 have flown to capital Tokyo to “get justice for Lindsay”.
Independent: Miss Hawker's parents, Bill and Julia, and her two sisters, Lisa and Louise, will leave London's Heathrow Airport this morning to travel to the Japanese capital to renew their appeal to find her killer.

Sun: A poster campaign aims to help catch suspect Tatsuya Ichihashi, 29, who fled from cops.
Independent: Detectives are still hunting 29-year-old suspect Tatsuya Ichihashi, who lived at the flat and fled when approached by officers for questioning.

Sun: More than 20,000 people joined a tribute page on website Facebook
Independent: A webpage set up on social networking site Facebook, called "Don't forget Lindsay Hawker, Please remember this Face", now has more than 20,000 members.
8/20

Some differences
• Differences of detail
  – [Some are due to slightly different publication times, before or after the press conference]
  – Which elements are of interest?
• Differences of vocabulary
  – cops vs officers, dad vs father, year after vs anniversary
• Differences of explication
  – capital of Japan, Facebook
• Differences of syntax
  – surprisingly few
  – but a possible stylistic trademark of the redtop is the internal structure of noun phrases …
9/20

Appositive noun phrases
Sun                                  Independent
a sand-filled bath                   a bath filled with sand
Parents Bill and Julia               Miss Hawker's parents, Bill and Julia
capital Tokyo                        Tokyo; the Japanese capital
suspect Tatsuya Ichihashi, 29        29-year-old suspect Tatsuya Ichihashi
website Facebook                     [the] social networking site Facebook
10/20

Numerical comparison
• Thanks to computers it is now (relatively) easy to count things
• What should we count?
  – it is easy to count the number of paragraphs, sentences, words and letters
  – these may give a measure of complexity:
    • average sentence length (words/sentence)
    • average word length
    • percentage of long words
  – type:token ratio = types ÷ tokens (vocabulary richness)
    • number of types = number of different words
    • number of tokens = total number of words
    • hapax legomena = number of words that occur only once in the text
11/20

Normalization and significance
• Always important to compare like with like
  – When counting things it is usual to “normalize” over the length of the text (e.g. frequency per 1,000 words)
  – If one text is longer than the other, you would of course expect higher frequencies of everything
• Issue of statistical significance
  – Small differences may not really tell you anything
  – Statistical tests (e.g. the chi-square test) can indicate whether a difference is likely to be significant or just due to random fluctuation
12/20

How to count
• How to recognize paragraph breaks?
• How to recognize sentence breaks?
  – Headlines don’t end in a full stop
  – Not all sentences end in a full stop
  – Not all full stops end a sentence (abbreviations)
• How to count words?
  – hyphenated words, contractions, e.g. don’t
• How to measure word length/complexity?
  – length only roughly corresponds to complexity
  – number of characters vs number of syllables, cf.
    “through” (7 characters, 1 syllable) vs “idea” (4 characters, 3 syllables)
  – counting syllables implies either a dictionary or an algorithm
13/20

Numerical comparison
                                       Sun          Indy
sentences                              13           10
words                                  262          257
letters/numbers                        1166         1213
complex words                          19 (7%)      36 (14%)
syllables                              356          378
average word length (characters)       4.45         4.72
average word length (syllables)        1.36         1.47
average sentence length (words)        20.15        25.7
short sentences                        6 (42%)      4 (40%)
long sentences                         2 (14%)      1 (10%)
types                                  156          165
type-token ratio                       0.60         0.62
hapax legomena                         110          128

• The texts are roughly the same length
• It is hard to know whether any differences are statistically significant with such a small amount of data, but …
• the Indy does have more complex words
• … and higher average word length (AWL) and average sentence length (ASL)
• … and a higher ratio of short to long sentences
• … and richer vocabulary
14/20

• Comparing the distribution of words by length only tells us that the two texts are very similar
  – correlation ρ = 0.977
[Chart: Word length – total number of words at each word length (1–11), Sun vs Indy]
15/20

Syntactic information
                                       Sun          Indy
questions                              0            0
passives                               8 (57%)      6 (60%)
longest sentence                       33 words     43 words
shortest sentence                      4 words      7 words
use of verb to be                      8            8
use of auxiliary                       1            3
conjunctions                           3 (8%)       4 (10%)
pronouns                               7 (19%)      4 (11%)
prepositions                           13 (34%)     14 (38%)
nominalizations                        0            1 (2%)
Sentence beginnings:
  pronoun                              6            1
  article                              1            4
  conjunction                          0            0
  preposition                          0            0

• Again, it is hard to know whether the differences are significant
• This kind of measure is more useful for distinguishing different genres
16/20

Readability
• Teachers, publishers and researchers take a big interest in quantifying the appropriate reading age for a text
  – i.e. what level of education do you need to understand this text? (reader-oriented view)
  – or: for what age of readership is this text appropriate? (text-oriented view)
• Most measures are based on a combination of average word length (measured in characters or syllables) and average sentence length
  – some additionally take into account the proportion of long/short words
17/20

Readability indexes
• Kincaid (Flesch-Kincaid Grade Level): $0.39\,\frac{\text{words}}{\text{sentences}} + 11.8\,\frac{\text{syllables}}{\text{words}} - 15.59$
• ARI: $4.71\,\frac{\text{characters}}{\text{words}} + 0.5\,\frac{\text{words}}{\text{sentences}} - 21.43$
• CLI (Coleman-Liau): $5.89\,\frac{\text{characters}}{\text{words}} - 0.3\,\frac{100 \times \text{sentences}}{\text{words}} - 15.8$
• Flesch Reading Ease: $206.835 - 1.015\,\frac{\text{words}}{\text{sentences}} - 84.6\,\frac{\text{syllables}}{\text{words}}$
• FOG: $0.4\left(\frac{\text{words}}{\text{sentences}} + 100\,\frac{\text{words of 3+ syllables}}{\text{words}}\right)$
• Lix: $\frac{\text{words}}{\text{sentences}} + 100\,\frac{\text{words of more than 6 characters}}{\text{words}}$
• SMOG: $1.0430\sqrt{30\,\frac{\text{words of 3+ syllables}}{\text{sentences}}} + 3.1291$
18/20

Readability indexes
• Most give a (US) school grade:
  – Kincaid – best for technical material; short sentences, e.g. in dialogue, will lower the score
  – ARI (Automated Readability Index)
  – Coleman-Liau – counts characters rather than syllables, so easier to implement
  – SMOG (Simple Measure of Gobbledygook) (McLaughlin 1969) – can be estimated by sampling, e.g. three 10-sentence segments; said to give the best correlation with its criterion. See http://www.harrymclaughlin.com/SMOG.htm
  – FOG (Gunning 1952) – a score above 12 means “too hard to read”!
• A few give a raw score:
  – Flesch Reading Ease (sometimes loosely called Flesch-Kincaid) – widely used, simple calculation; the higher the score, the easier the text is to read. The highest possible score is 121.22 (a text made up of one-word, one-syllable sentences). A score of around 100 means OK for an 11-year-old; Time magazine scores ~52, the Harvard Law Review in the low 30s.
  – Lix (Björnsson) – originally developed for Swedish; a raw score below 24 is suitable for children, above 55 very hard.
19/20

Readability
                         Sun        Indy
Kincaid                  9          11.1
ARI                      10.4       13.3
Coleman-Liau             9          11.1
Flesch Reading Ease      69.7       62.8
Gunning FOG              11.6       13.8
Lix                      38.9       46.1
SMOG                     10.4       13.1

Conversion: add 1 to the US grade to give the British school year, e.g. 11th grade = Year 12
Note: with Flesch Reading Ease, a lower score means harder to read
Tool used: http://www.editcentral.com/gwt/com.editcentral.EC/EC.html (it also suggests where improvements can be made!)
Also used (they give slightly different figures, probably depending on how they count things):
  – http://www.readability.info/
  – http://www.online-utility.org/english/readability_test_and_improve.jsp
20/20
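The websites listed above each apply their own counting rules, which is why their figures differ. The Python sketch below is a rough illustration of what such a tool does. It counts sentences, words, types and hapax legomena and normalises counts per 1,000 words. It then applies the seven formulas from the readability-index slide. Several things are assumptions made for this example, not what the cited tools actually do:
• the tokenisation rules
• the vowel-group syllable heuristic
• using "3+ syllable words" as FOG's complex words
• the function names (`count_syllables`, `text_counts`, `readability`, `per_thousand`), which are hypothetical

```python
import math
import re
from collections import Counter

# Assumed, deliberately naive tokenisation rules (illustration only).
# Sentences end at . ! or ? followed by whitespace: abbreviations such as
# "Mr." will be mis-split, and a headline with no full stop is merged with
# the sentence after it.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# A word is a run of letters/digits, optionally joined by internal hyphens
# or apostrophes, so "sand-filled" and "don't" each count as one token.
WORD = re.compile(r"[A-Za-z0-9]+(?:[-'’][A-Za-z0-9]+)*")


def count_syllables(word):
    """Hypothetical heuristic: count vowel groups, drop a silent final 'e'.

    Gets "through" right (1) but undercounts "idea" (2 instead of 3),
    which is why a pronunciation dictionary is more reliable.
    """
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def text_counts(text):
    """Raw counts of the kind shown in the 'Numerical comparison' table."""
    sentences = [s for s in SENTENCE_END.split(text.strip()) if s]
    tokens = [w.lower() for w in WORD.findall(text)]
    freq = Counter(tokens)
    syllables = [count_syllables(w) for w in tokens]
    return {
        "sentences": len(sentences),
        "words": len(tokens),
        "characters": sum(len(w) for w in tokens),
        "syllables": sum(syllables),
        "polysyllables": sum(1 for s in syllables if s >= 3),
        "long_words": sum(1 for w in tokens if len(w) > 6),
        "types": len(freq),
        "type_token_ratio": len(freq) / len(tokens),
        "hapax_legomena": sum(1 for c in freq.values() if c == 1),
    }


def readability(c):
    """Apply the seven formulas from the 'Readability indexes' slide."""
    w, s = c["words"], c["sentences"]
    wps = w / s                    # average sentence length (words)
    cpw = c["characters"] / w      # average word length (characters)
    spw = c["syllables"] / w       # average word length (syllables)
    return {
        "Kincaid": 0.39 * wps + 11.8 * spw - 15.59,
        "ARI": 4.71 * cpw + 0.5 * wps - 21.43,
        "Coleman-Liau": 5.89 * cpw - 0.3 * (100 * s / w) - 15.8,
        "Flesch Reading Ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "FOG": 0.4 * (wps + 100 * c["polysyllables"] / w),
        "Lix": wps + 100 * c["long_words"] / w,
        "SMOG": 1.0430 * math.sqrt(c["polysyllables"] * 30 / s) + 3.1291,
    }


def per_thousand(count, words):
    """Normalise a raw count to a rate per 1,000 words."""
    return 1000 * count / words
```

Plugging in the Sun's counts from the comparison table gives roughly 8.3 for Kincaid and 8.9 for Coleman-Liau. These counts are 262 words, 13 sentences, 1166 characters and 356 syllables. The online tool reported 9 for both. This is a reminder that small differences in counting rules shift the scores.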