Carnegie Mellon

How often are prefixes useful cues to word meaning? Less than you might think!
Jack Mostow*, Donna Gates*, Gregory Aist*, and Margaret McKeown+
Project LISTEN (www.cs.cmu.edu/~listen)
*Carnegie Mellon University
+LRDC, University of Pittsburgh
Funding: IES
15th Annual Meeting of the Society for the Scientific Study of Reading, June 2009
Project LISTEN, 3/22/2016

Research question
• Conventional wisdom is to not give instruction on morphology until perhaps grade four
• However, kids do encounter words with prefixes
• As part of the IES-funded vocabulary grant, we wanted to take opportunistic advantage of prefixes: when prefixes occur, explain them to help vocabulary
1. How often do such opportunities occur? That is, how often are prefixes good cues to meaning?
2. What happens when they do? That is, what is the effect of reliable prefixes on reading times?

Outline
• What's a prefix?
    Linguistically
    Instructionally
    For this talk
• How reliable are prefixes as cues to meaning?
• What is the effect of prefixes on reading times?

What's a prefix?
A linguistic definition
affix: Any element in the morphological structure of a word other than a *root(1). E.g. unkinder consists of the root kind plus the affixes un- and -er. … Affixes are traditionally divided into prefixes, which come before the form to which they are joined; *suffixes, which come after; and *infixes, which are inserted within it. Others commonly distinguished are *circumfixes and *superfixes.
P.H. Matthews, The Concise Oxford Dictionary of Linguistics, Oxford UP, 2007. p. 11.

What's a prefix?
An instructional definition
White, Sowell, and Yanagihara (1989) suggest the following definition of a prefix:
• it is a group of letters at the beginning of a word (misspell)
• it changes the meaning of the word (mis- = incorrectly; misspell = spell incorrectly)
• when you remove it, a word is left (misspell → spell)

What's a prefix?
For this talk: The ones to teach
White et al. (1989) analyzed English words in printed school materials. They found that the 20 most common prefixes make up 97% of prefixed words in English school texts. The 9 most frequent prefixes make up 76% of these words.
Stahl and Nagy (2006) advise teaching the 9 most common prefixes:
1. un-
2. re-
3. in- (im-, il-, ir-) "not"
4. dis-
5. en- (em-)
6. non-
7. in- (im-) "into"
8. over- "too much"
9. mis-

A note on terminology
In some places in this talk we will use these terms to avoid undesired implications of "prefix" and "stem" / "root":
• Head: letters at the beginning of a word
• Tail: the rest of the letters in the word
• Semantically reliable: the meaning of the head is represented in the definition of the word

How reliable are those nine prefixes as cues to word meaning?
Materials:
• WordNet definitions and relations
• Project LISTEN story vocabulary
• American National Corpus vocabulary
Methods:
• Calculate the percentage of word types for which one of the nine most frequent prefixes is semantically reliable in the word's definition
• Head: NONswimmer
• Tail: nonSWIMMER

A head that looks like a prefix may not be one
displeased: not pleased; experiencing or manifesting displeasure
dismay: fear resulting from the awareness of danger; the feeling of despair in the face of obstacles; fill with apprehension or alarm; …

Prefix            Prefixed example      Non-prefixed example   Meaning of prefix
dis               displeased            distance               not, undo
en                encourage             enough                 give some property to, or cause
in (il, ir, im)   immigrate, illegal    innocent, illness      a) into  b) not
mis               misspell              mister                 incorrect
non               nonfat                (none)                 not
over              overgrow              overtly                too much
re                repaint               really                 again
un (um)           unnecessary           unite                  not, undo

Semantic Cues Operationalized: Match Patterns in Definitions
inanimate: … denoting nonliving things
rename: assign a new name to
overproduction: too much production or more than expected

Prefix   Patterns in the definition that indicate that the prefix helps explain the meaning
dis      not, undo, discontinue, no
en       include, give, contribute, make, provoke, compel, bring, cause, bestow
in       cannot, lack, not, no, add, embed, attach, inner, non-, dis-, un-, without, into, contain, …
mis      wrong, incorrect, error, mistake, wrongly, fail, failure, …
non      not, no, without, dis-, un-, in-
over     overly, beyond, too much, too, excessive, large, …
re       new, again, return, change, changing, changed, anew, different, differently, alter, altering, do over, newly, …
un       lack, lacking, not, no, opposite, dis-, without, cancel, reverse, remove

Initial letters: How semantically reliable are they?
Numbers range from ~5-50%, shockingly low:

Prefix       Positive example     Negative example    LISTEN (Kids)   ANC (Adults)
9 prefixes                                            34.37%          18.04%
dis          displeased           distance            11.86%          4.85%
en           encourage            enough              22.01%          5.78%
in           immigrate, illegal   innocent, illness   51.8%           22.04%
mis          misspell             mister              20%             16.72%
non          nonfat               (none)              100% (1/1)      12.97%
over         overgrow             overtly             17.24%          15.78%
re           repaint              really              16.6%           10.57%
un           unnecessary          unite               54.79%          36.26%

What is the effect of prefixes on reading time?
Compare reading time (letters per second) on reliable vs. not reliable words
Materials
• Best case: head and tail are both cues to meaning (unnatural)
• Worst case: neither head nor tail is a cue to meaning (uncle)
The next two slides detail the best and worst cases

Head is cue?: Already discussed
Tail is cue?: Two questions enough
1. Is the remainder a word?
   Rule out: infidel, distortion, …
2. Is the remainder of the letters an antonym of the original word? (only relevant for negative prefixes)
   Rule in: unjustly (defined as "unjust manner"), since justly is an antonym of unjustly

Best, worst, in between
Only 28.85%* – 37.39%** of words with one of the nine head strings are prefixed words!
Example                     Initial letters are   Rest of letters   Rest of letters an antonym   Type percentage
                            cue to meaning        are a word        of the original word         in LISTEN data
unnatural                   Y                     Y                 Y                            12.59%*
unseemly                    Y                     Y                 N                            5.79%*
recount                     Y                     Y                 N/A                          5.33%*
untruth (false statement)   N                     Y                 Y                            5.14%*
infidel                     Y                     N                 N                            8.54%**
-                           Y                     N                 Y                            Not possible
repeating                   Y                     N                 N/A                          2.11%
discuss                     N                     Y                 N                            11.12%
research                    N                     Y                 N/A                          13.6%
-                           N                     N                 Y                            Not possible
uncle                       N                     N                 N                            15.9%
remedy                      N                     N                 N/A                          19.85%
* The four rows marked * sum to 28.85%.
** Adding the row marked ** (8.54%) gives 37.39%.

Measures
• Reading times (milliseconds / letter)
• Data were logged by the Reading Tutor, an automated tutor that uses automatic speech recognition to listen to children read aloud
• Words were displayed in authentic contexts: complete sentences in children's texts
• Children read aloud from modern and antebellum texts into a noise-cancelling microphone, which helps guard against speech recognition errors
• Compare best case vs. worst case: unnatural vs. uncle

What is the effect of prefixes on reading times?
Predictions:
• For students who don't read very well, whether the word is best case or worst case shouldn't matter
• Prefixes should help better readers
• That is, for students at higher reading levels, reading times should be faster for best-case words than for worst-case words

Results
Reading times were slower for best-case words than for worst-case words by 18.6 msec (19%)

             N (encounters)   mean         95% c.i.   Example
Best-case    8013             97.1 msec    0.956      unnatural
Worst-case   3783             115.7 msec   1.756      uncle

Due to practice, length, frequency? No.
Reading times were slower for best-case for first encounters by 17.4 msec (17%):
Practice: First encounters only
        N      mean    95% c.i.
best    2863   103.3   1.731
worst   1416   120.7   3.210

Due to practice, length, frequency? No.
Reading times were still slower for best-case for matched length range by 27.0 msec (27%):
Word length: > 5 & < 8 letters
        N      mean    95% c.i.
best    4063   98.3    1.382
worst   2625   125.3   2.189

Due to practice, length, frequency? No.
Reading times were still slower for best-case for matched freq. range by 28.4 msec (30%):
Frequency (SUBTLEX): > 10 & < 500 / million
        N      mean    95% c.i.
best    5838   93.5    1.090
worst   2515   121.9   2.216

Summary: Not due to practice, length, frequency
Reading times were still slower for best-case when looking at various subsets:

Subset                                                N      mean    95% c.i.
Practice: First encounters only               best    2863   103.3   1.731
                                              worst   1416   120.7   3.210
Word length: > 5 & < 8 letters                best    4063   98.3    1.382
                                              worst   2625   125.3   2.189
Frequency (SUBTLEX): > 10 & < 500 / million   best    5838   93.5    1.090
                                              worst   2515   121.9   2.216

Not due to practice, length, frequency when looking at all 3 combined
Reading times were still slower for best-case than for worst-case words by 48.8 msec (51%)

             N (encounters)   mean    95% c.i.   Example
Best-case    890              96.3    3.009      unnatural
Worst-case   507              145.1   5.679      uncle

Students had different numbers of encounters. Was that it? No.
Per-student average differs by 19.8 msec (18%), p < 0.001

Filtering by frequency (LISTEN) yields similar results
Per-student average differs by 21.1 msec (19%), p < 0.001

What was the effect by reading level?
Prediction: effect for higher-level readers, no effect for lower-level readers

Best case slower across reading levels! (Frequency in LISTEN corpus)
[Chart by reading level. Sig.? yes, yes, almost, no, no, yes, no (in the order extracted)]

Best case slower across reading levels! (Frequency in SUBTLEX)
Best case slower for more students, p = 0.023

Reading level       K         A   B    C    D    E    F    G
Best case slower    No data   3   20   10   12   19   28   2
Worst case slower   No data   0   9    6    5    10   14   5

Potential explanation(s)
• Neighborhood effects? encourage --- entourage
• Context?
• Competition with tail: disagree vs. agree?
• Competition with head: disagree vs. dis-?
• Processing: dis+agree takes more steps than distance
At least some of these explanations rely on reading time being affected by sublexical structure.

What about neighborhood effects?
Currently investigating. Sample:

Reliable        1-away N.       Unreliable   1-away N.
displeased      1               distance     0
encourage       1 (entourage)   enough       0
immigrate       0               innocent     0
illegal         0               illness      0
misspell        1               mister       4
nonfat          6               (none)       -
overgrow        0               overtly      1
repaint         2               really       1
unnecessary     0               unite        2

Medler, D.A. & Binder, J.R. (2005) MCWord: An On-Line Orthographic Database of the English Language. http://neuro.mcw.edu/mcword

Conclusions
• Initial letter sequences (heads) aren't all that reliable as cues to meaning
• Yet reading times appear to be sensitive to real vs. fake prefix, even for low reading levels
• Cliffhanger: Does this sensitivity provide a hint that we could teach prefixes earlier?
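The millisecond and percentage gaps quoted on the results slides can be recomputed from the tabulated best-case and worst-case means. The sketch below does only that arithmetic. It assumes each percentage is the gap divided by the best-case mean; the slides do not state the denominator, but this choice reproduces all five reported figures. The names `RESULTS` and `gap` are illustrative.

```python
# Best-case and worst-case mean reading times (msec), copied from the results slides.
RESULTS = {
    "all encounters": (97.1, 115.7),
    "first encounters only": (103.3, 120.7),
    "word length > 5 & < 8 letters": (98.3, 125.3),
    "SUBTLEX frequency > 10 & < 500 / million": (93.5, 121.9),
    "all three filters combined": (96.3, 145.1),
}


def gap(best_mean, worst_mean):
    """Return (absolute gap in msec, gap as a percentage of the best-case mean).

    Dividing by the best-case mean is an assumption, not something the slides state.
    """
    diff = abs(worst_mean - best_mean)
    return round(diff, 1), round(100 * diff / best_mean)


if __name__ == "__main__":
    for label, (best, worst) in RESULTS.items():
        msec, pct = gap(best, worst)
        print(f"{label}: {msec} msec ({pct}%)")
```

The function reports only the size of the gap between the two means, not which condition counts as slower.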
Announcements: Gregory Aist joins the Iowa State faculty in fall 2009 and co-founds the journal Dialogue and Discourse, on "language beyond the single sentence", launching summer 2009: www.dialogue-and-discourse.org

Thank you

Initial letters: How good is the operationalization?
Sample of 100 Project LISTEN words that are also in WordNet

        Positive           Negative        Total
True    30 (unnecessary)   43 (refuge)     73
False   11 (relate)        16 (unjustly)   27
Total   41                 59              100

Project LISTEN's Reading Tutor
An automated tutor that helps children learn to read
• See www.cs.cmu.edu/~listen
• Displays stories and listens to children read them aloud
• Provides help when necessary
• Uses automatic speech recognition to analyze oral reading
• Logs sessions in detail, including speech recognizer output
• Millions of read words in the aggregated database
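As a rough illustration of the operationalization checked in the 100-word sample, the sketch below tests whether a definition contains one of the cue patterns listed for a word's head, then summarizes a 2x2 table of hits and misses. The cue lists are copied from the slides, with entries elided by "…" left out. The matching rule (whole-word matches, with cues such as "non-" matching any word that starts with those letters) is an assumption, as are all function names; the talk does not say how matching was implemented. Definitions are passed in as plain strings rather than looked up in WordNet.

```python
import re

# Cue patterns per head, copied from the "Match Patterns in Definitions" slide.
# Lists that end in "…" on the slide are left incomplete here on purpose.
PATTERNS = {
    "dis": ["not", "undo", "discontinue", "no"],
    "en": ["include", "give", "contribute", "make", "provoke", "compel",
           "bring", "cause", "bestow"],
    "in": ["cannot", "lack", "not", "no", "add", "embed", "attach", "inner",
           "non-", "dis-", "un-", "without", "into", "contain"],
    "mis": ["wrong", "incorrect", "error", "mistake", "wrongly", "fail", "failure"],
    "non": ["not", "no", "without", "dis-", "un-", "in-"],
    "over": ["overly", "beyond", "too much", "too", "excessive", "large"],
    "re": ["new", "again", "return", "change", "changing", "changed", "anew",
           "different", "differently", "alter", "altering", "do over", "newly"],
    "un": ["lack", "lacking", "not", "no", "opposite", "dis-", "without",
           "cancel", "reverse", "remove"],
}
# Spelling variants listed on the slides, mapped to their base head.
VARIANTS = {"im": "in", "il": "in", "ir": "in", "em": "en"}


def head_of(word):
    """Return the base head a word starts with, or None (longest head wins)."""
    w = word.lower()
    for head in sorted(list(PATTERNS) + list(VARIANTS), key=len, reverse=True):
        if w.startswith(head) and len(w) > len(head):
            return VARIANTS.get(head, head)
    return None


def cue_regex(pattern):
    """Assumed matching rule: whole words, or word-initial for 'non-' style cues."""
    if pattern.endswith("-"):
        return re.compile(r"\b" + re.escape(pattern[:-1]) + r"-?\w+", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(pattern) + r"\b", re.IGNORECASE)


def is_semantically_reliable(word, definition):
    """True if any cue pattern for the word's head occurs in its definition."""
    head = head_of(word)
    if head is None:
        return False
    return any(cue_regex(p).search(definition) for p in PATTERNS[head])


def precision_recall_accuracy(tp, fp, fn, tn):
    """Standard summary scores for a 2x2 table of hits and misses."""
    return tp / (tp + fp), tp / (tp + fn), (tp + tn) / (tp + fp + fn + tn)


if __name__ == "__main__":
    print(is_semantically_reliable("rename", "assign a new name to"))
    # Counts from the 100-word sample: 30 true positives, 11 false positives,
    # 16 false negatives, 43 true negatives.
    print(precision_recall_accuracy(30, 11, 16, 43))
```

A rule this crude will over-match: the "in-" cue, for example, fires on any definition word beginning with "in". Reading the sample table as 30 true positives, 11 false positives, 16 false negatives, and 43 true negatives gives a precision of about 0.73, a recall of about 0.65, and an accuracy of 0.73.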