2009-06-25_1040_gsa_.. - Carnegie Mellon University

Carnegie
Mellon
How often are prefixes useful cues to word meaning?
Less than you might think!
Jack Mostow *, Donna Gates *,
Gregory Aist *, and Margaret McKeown +
Project LISTEN (www.cs.cmu.edu/~listen)
*Carnegie Mellon University
+LRDC, University of Pittsburgh
Funding: IES
15th Annual Meeting of the Society for the Scientific Study of Reading, June, 2009
Project LISTEN
1
3/22/2016
Research question
Conventional wisdom is not to give instruction on morphology until perhaps grade four.
However, kids do encounter words with prefixes.
As part of the IES-funded vocabulary grant, we wanted to take opportunistic advantage of prefixes: when prefixes occur, explain them to help build vocabulary.
1. How often do such opportunities occur? That is, how often are prefixes good cues to meaning?
2. What happens when they do? That is, what is the effect of reliable prefixes on reading times?
Outline
What’s a prefix?
Linguistically
Instructionally
For this talk
How reliable are prefixes as cues to meaning?
What is the effect of prefixes on reading times?
What’s a prefix?
A linguistic definition
affix: Any element in the morphological structure of a word other than a *root(1). E.g. unkinder consists of the root kind plus the affixes un- and -er. … Affixes are traditionally divided into prefixes, which come before the form to which they are joined; *suffixes, which come after; and *infixes, which are inserted within it. Others commonly distinguished are *circumfixes and *superfixes.
P.H. Matthews, The Concise Oxford Dictionary of
Linguistics, Oxford UP, 2007. p. 11.
What’s a prefix?
An instructional definition
White, Sowell, and Yanagihara (1989) suggest the following definition of prefix:
• It is a group of letters at the beginning of a word (mis in misspell).
• It changes the meaning of the word (mis- = incorrectly, so misspell = spell incorrectly).
• When you remove it, a word is left (removing mis- from misspell leaves spell).
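The three criteria can be checked mechanically except the second, which needs a semantic judgment. A minimal sketch, assuming a hypothetical toy lexicon and a caller-supplied `changes_meaning` flag (the function name `white_prefix_test` is also hypothetical):

```python
# Hypothetical toy lexicon; a real check would use a full dictionary.
LEXICON = {"spell", "misspell", "mister", "kind", "unkind"}

def white_prefix_test(word, prefix, changes_meaning, lexicon=LEXICON):
    """Apply the three White, Sowell, and Yanagihara (1989) criteria.

    `changes_meaning` is supplied by the caller, because deciding whether
    the letters change the word's meaning is a semantic judgment.
    """
    # Criterion 1: the letters sit at the beginning of a longer word.
    at_start = word.startswith(prefix) and len(word) > len(prefix)
    # Criterion 3: removing the letters leaves a word.
    word_left = word[len(prefix):] in lexicon
    return at_start and changes_meaning and word_left

print(white_prefix_test("misspell", "mis", changes_meaning=True))   # True
print(white_prefix_test("mister", "mis", changes_meaning=False))    # False
```

In mister, the letters mis fail both the meaning criterion and the word-left criterion ("ter" is not a word).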
What’s a prefix?
For this talk: The ones to teach
White et al. (1989) analyzed English words in printed school materials.
They found that the 20 most common prefixes make up 97% of prefixed words in English school texts.
The 9 most frequent prefixes make up 76% of these words.
Stahl and Nagy (2006) advise teaching the 9 most common prefixes:
1. un-
2. re-
3. in- (im-, il-, ir-) = not
4. dis-
5. en- (em-)
6. non-
7. in- (im-) = into
8. over- = too much
9. mis-
A note on terminology
In some places in this talk we use these terms to avoid undesired implications of “prefix” and “stem” / “root”:
Head: the letters at the beginning of a word.
Tail: the rest of the letters in the word.
Semantically reliable: the meaning of the head is represented in the definition of the word.
Outline
What’s a prefix?
Linguistically
Instructionally
For this talk
How reliable are prefixes as cues to meaning?
What is the effect of prefixes on reading times?
How reliable are those nine prefixes as cues to word meaning?
Materials:
WordNet definitions and relations
Project LISTEN story vocabulary
American National Corpus (ANC) vocabulary
Method: calculate the percentage of word types for which one of the nine most frequent prefixes is semantically reliable, i.e., its meaning is represented in the word’s definition.
Head: NONswimmer
Tail: nonSWIMMER
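The percentage described above can be sketched as follows. This assumes the denominator is the set of word types that begin with one of the head strings (as the “Best, worst, in between” slide’s wording suggests) and that some separate judgment decides reliability; the head list with spelling variants and both function names are hypothetical.

```python
# Head strings for the nine prefixes, including spelling variants (assumed list).
HEADS = ["un", "re", "in", "im", "il", "ir", "dis", "en", "em", "non", "over", "mis"]

def split_head_tail(word, heads=HEADS):
    """Split off the longest matching head; return None if no head matches."""
    for head in sorted(heads, key=len, reverse=True):
        if word.startswith(head) and len(word) > len(head):
            return word[:len(head)], word[len(head):]
    return None

def percent_reliable(words, head_is_reliable):
    """Percentage of headed word *types* whose head is semantically reliable."""
    types = {w.lower() for w in words}   # count each word type once
    headed = [w for w in types if split_head_tail(w) is not None]
    if not headed:
        return 0.0
    reliable = [w for w in headed if head_is_reliable(w)]
    return 100.0 * len(reliable) / len(headed)

print(split_head_tail("nonswimmer"))   # ('non', 'swimmer')
```

Working with types rather than tokens means a frequent word such as really counts once, no matter how often it appears in the stories.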
A head that looks like a prefix may not be one
displeased: not pleased; experiencing or manifesting displeasure
dismay: fear resulting from the awareness of danger; the feeling of despair in the face of obstacles; fill with apprehension or alarm; …

Prefix          | Prefixed example   | Non-prefixed example | Meaning of prefix
dis             | displeased         | distance             | not, undo
en              | encourage          | enough               | give some property to or cause
in (il, ir, im) | immigrate; illegal | innocent; illness    | (a) into; (b) not
mis             | misspell           | mister               | incorrect
non             | nonfat             | (none)               | not
over            | overgrow           | overtly              | too much
re              | repaint            | really               | again
un (um)         | unnecessary        | unite                | not, undo
Semantic cues operationalized: match patterns in definitions
inanimate: … denoting nonliving things
rename: assign a new name to
overproduction: too much production or more than expected

Prefix | Patterns in the definition that indicate that the prefix helps explain the meaning
dis    | not, undo, discontinue, no
en     | include, give, contribute, make, provoke, compel, bring, cause, bestow
in     | cannot, lack, not, no, add, embed, attach, inner, non-, dis-, un-, without, into, contain, …
mis    | wrong, incorrect, error, mistake, wrongly, fail, failure, …
non    | not, no, without, dis-, un-, in-
over   | overly, beyond, too much, too, excessive, large, …
re     | new, again, return, change, changing, changed, anew, different, differently, alter, altering, do over, newly, …
un     | lack, lacking, not, no, opposite, dis-, without, cancel, reverse, remove
Initial letters: How semantically reliable are they?
Numbers range from about 5% to 55% (setting aside non- in the LISTEN data, which has a single word), shockingly low:

Prefix     | Positive example   | Negative example  | LISTEN (kids) | ANC (adults)
9 prefixes |                    |                   | 34.37%        | 18.04%
dis        | displeased         | distance          | 11.86%        | 4.85%
en         | encourage          | enough            | 22.01%        | 5.78%
in         | immigrate, illegal | innocent, illness | 51.8%         | 22.04%
mis        | misspell           | mister            | 20%           | 16.72%
non        | nonfat             | (none)            | 100% (1/1)    | 12.97%
over       | overgrow           | overtly           | 17.24%        | 15.78%
re         | repaint            | really            | 16.6%         | 10.57%
un         | unnecessary        | unite             | 54.79%        | 36.26%
Outline
What’s a prefix?
Linguistically
Instructionally
For this talk
How reliable are prefixes as cues to meaning?
What is the effect of prefixes on reading times?
What is the effect of prefixes on reading time?
Compare reading time (milliseconds per letter) on reliable vs. unreliable words.
Materials:
Best case: both head and tail are cues to meaning (unnatural)
Worst case: neither head nor tail is a cue to meaning (uncle)
The next two slides detail the best and worst cases.
Is the head a cue? Already discussed.
Is the tail a cue? Two questions suffice:
Is the remainder a word? This rules out infidel, distortion, …
Is the remainder an antonym of the original word? (Only relevant for negative prefixes.) This rules in unjustly (defined as “unjust manner”), since justly is an antonym of unjustly.
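The two tail questions can be sketched as a function returning the Y / N / N/A values that the “Best, worst, in between” table uses. This is a sketch under stated assumptions: the set of negative heads is hypothetical (heads like in- that have both a “not” and an “into” sense are treated as negative, matching infidel’s N rather than N/A), and the lexicon and antonym pairs would come from WordNet in practice but are supplied by the caller here.

```python
# Assumption: heads treated as negative, so the antonym question applies.
NEGATIVE_HEADS = {"un", "non", "dis", "in", "im", "il", "ir"}

def tail_cues(word, head, lexicon, antonym_pairs):
    """Return (tail_is_word, tail_is_antonym); the second is None when N/A."""
    tail = word[len(head):]
    tail_is_word = tail in lexicon                  # question 1
    if head not in NEGATIVE_HEADS:
        return tail_is_word, None                   # antonym question is N/A
    tail_is_antonym = frozenset((word, tail)) in antonym_pairs   # question 2
    return tail_is_word, tail_is_antonym

# Hypothetical toy data standing in for WordNet.
lexicon = {"justly", "natural", "seemly", "count"}
antonyms = {frozenset(("unjustly", "justly")), frozenset(("unnatural", "natural"))}
print(tail_cues("unjustly", "un", lexicon, antonyms))   # (True, True)
print(tail_cues("infidel", "in", lexicon, antonyms))    # (False, False)
```

Storing antonym pairs as frozensets makes the lookup order-independent, since antonymy is symmetric.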
Best, worst, in between
Only 28.85%* to 37.39%** of words with one of the nine head strings are prefixed words!

Example                   | Initial letters are cue to meaning | Rest of letters are a word | Rest of letters an antonym of the original word | Type percentage in LISTEN data
unnatural                 | Y | Y | Y   | 12.59%*
unseemly                  | Y | Y | N   | 5.79%*
recount                   | Y | Y | N/A | 5.33%*
untruth (false statement) | N | Y | Y   | 5.14%*
infidel                   | Y | N | N   | 8.54%**
-                         | Y | N | Y   | Not possible
repeating                 | Y | N | N/A | 2.11%
discuss                   | N | Y | N   | 11.12%
research                  | N | Y | N/A | 13.6%
-                         | N | N | Y   | Not possible
uncle                     | N | N | N   | 15.9%
remedy                    | N | N | N/A | 19.85%
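The starred headline figures appear to be sums of table rows: the four rows marked * give the lower bound, and adding the ** row (infidel) gives the upper bound. A quick arithmetic check, under that reading of the markers:

```python
# Type percentages from the table above, keyed by example word.
ROWS = {
    "unnatural": 12.59, "unseemly": 5.79, "recount": 5.33, "untruth": 5.14,
    "infidel": 8.54, "repeating": 2.11, "discuss": 11.12, "research": 13.6,
    "uncle": 15.9, "remedy": 19.85,
}

# Assumed reading of the markers: * rows form the lower bound, ** adds infidel.
lower = sum(ROWS[w] for w in ("unnatural", "unseemly", "recount", "untruth"))
upper = lower + ROWS["infidel"]
total = sum(ROWS.values())   # all possible rows; close to 100% up to rounding

print(f"{lower:.2f} {upper:.2f} {total:.2f}")   # 28.85 37.39 99.97
```

The total of 99.97% suggests the rows partition the word types, with the shortfall coming from rounding each row to two decimals.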
Measures
Reading times (milliseconds per letter)
Data were logged by the Reading Tutor, an automated tutor that uses automatic speech recognition to listen to children read aloud.
Words were displayed in authentic contexts: complete sentences in children’s texts.
Children read aloud from modern and antebellum texts into a noise-cancelling microphone, intended to reduce speech recognition errors.
Compare best case vs. worst case: unnatural vs. uncle
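Normalizing by word length is what makes unnatural (9 letters) comparable with uncle (5 letters). A minimal sketch of the measure, assuming a hypothetical log record with start and end times per word encounter; the Reading Tutor’s actual log schema and its definition of a word’s start and end are not given here.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Encounter:
    """One logged reading of a word (hypothetical record layout)."""
    student: str
    word: str
    condition: str   # "best" or "worst"
    start_ms: int    # assumed: when the child starts the word
    end_ms: int      # assumed: when the child finishes the word

def ms_per_letter(e):
    """Reading time for the word divided by its number of letters."""
    letters = sum(ch.isalpha() for ch in e.word)
    return (e.end_ms - e.start_ms) / letters

def mean_by_condition(encounters, condition):
    return mean(ms_per_letter(e) for e in encounters if e.condition == condition)

log = [Encounter("s1", "unnatural", "best", 0, 900),
       Encounter("s1", "uncle", "worst", 0, 600)]
print(mean_by_condition(log, "best"), mean_by_condition(log, "worst"))   # 100.0 120.0
```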
What is the effect of prefixes on reading times? Predictions:
For weaker readers, whether a word is best case or worst case shouldn’t matter.
Prefixes should help better readers. That is, for students at higher reading levels, reading times should be faster for best-case words than for worst-case words.
Results
Reading times were slower for best-case words than for worst-case words by 18.6 msec (19%)

Condition  | Example   | N (encounters) | Mean      | 95% c.i.
Best case  | unnatural | 8013           | 97.1 msec | 0.956
Worst case | uncle     | 3783           | 115.7 msec | 1.756
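The “95% c.i.” column is not defined on the slide. A common convention, assumed here rather than confirmed by the source, is the half-width of a normal-approximation interval, 1.96 × SD / √N. The sketch below computes that from raw per-letter times; it treats every encounter as independent, which is a simplification, since each student contributes many encounters.

```python
import math
from statistics import mean, stdev

def mean_and_ci95(times_ms):
    """Mean and normal-approximation 95% half-width (1.96 * SD / sqrt(N))."""
    m = mean(times_ms)
    half_width = 1.96 * stdev(times_ms) / math.sqrt(len(times_ms))
    return m, half_width

m, hw = mean_and_ci95([90, 100, 110])
print(m, round(hw, 2))   # 100 11.32
```

With N in the thousands, as in the table, the half-width shrinks to around one millisecond per letter even for a sizable standard deviation.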
Due to practice, length, frequency?
[Charts: N, mean, and 95% c.i. for best-case vs. worst-case words]
Due to practice, length, frequency?
No. Reading times were slower for best-case words on first encounters, by 17.4 msec (17%):
Practice: first encounters only

Condition | N    | Mean  | 95% c.i.
Best      | 2863 | 103.3 | 1.731
Worst     | 1416 | 120.7 | 3.210
Due to practice, length, frequency?
[Chart: N, mean, and 95% c.i. for best-case vs. worst-case words]
Due to practice, length, frequency?
No. Reading times were still slower for best-case words in a matched length range, by 27.0 msec (27%):
Word length: > 5 & < 8 letters

Condition | N    | Mean  | 95% c.i.
Best      | 4063 | 98.3  | 1.382
Worst     | 2625 | 125.3 | 2.189
Due to practice, length, frequency?
[Chart: N, mean, and 95% c.i. for best-case vs. worst-case words]
Due to practice, length, frequency?
No. Reading times were still slower for best-case words in a matched frequency range, by 28.4 msec (30%):
Frequency (SUBTLEX): > 10 & < 500 per million

Condition | N    | Mean  | 95% c.i.
Best      | 5838 | 93.5  | 1.090
Worst     | 2515 | 121.9 | 2.216
Summary: Not due to practice, length, frequency
Reading times were still slower for best-case words when looking at various subsets:

Subset                                        | Condition | N    | Mean  | 95% c.i.
Practice: first encounters only               | Best      | 2863 | 103.3 | 1.731
                                              | Worst     | 1416 | 120.7 | 3.210
Word length: > 5 & < 8 letters                | Best      | 4063 | 98.3  | 1.382
                                              | Worst     | 2625 | 125.3 | 2.189
Frequency (SUBTLEX): > 10 & < 500 per million | Best      | 5838 | 93.5  | 1.090
                                              | Worst     | 2515 | 121.9 | 2.216
Not due to practice, length, frequency when looking at all 3 combined
Reading times were still slower for best-case than for worst-case words by 48.8 msec (51%)

Condition  | Example   | N (encounters) | Mean  | 95% c.i.
Best case  | unnatural | 890            | 96.3  | 3.009
Worst case | uncle     | 507            | 145.1 | 5.679
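The reported differences and percentages can be reproduced from the means in these tables if each percentage is taken as the difference divided by the best-case mean (an inference from the numbers, not stated on the slides). A quick check over all five comparisons:

```python
# (best-case mean, worst-case mean) from the result tables.
COMPARISONS = {
    "all encounters": (97.1, 115.7),
    "first encounters only": (103.3, 120.7),
    "length > 5 & < 8 letters": (98.3, 125.3),
    "SUBTLEX > 10 & < 500 per million": (93.5, 121.9),
    "all three combined": (96.3, 145.1),
}

def difference_and_percent(best_mean, worst_mean):
    """Difference in msec, and that difference as a % of the best-case mean."""
    diff = worst_mean - best_mean
    return round(diff, 1), round(100 * diff / best_mean)

for name, (best, worst) in COMPARISONS.items():
    diff, pct = difference_and_percent(best, worst)
    print(f"{name}: {diff} msec ({pct}%)")
```

The outputs match the slides: 18.6 (19%), 17.4 (17%), 27.0 (27%), 28.4 (30%), and 48.8 (51%).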
Students had different numbers of encounters. Was that it? No.
Per-student averages differ by 19.8 msec (18%), p < 0.001
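Averaging within each student first keeps students with many encounters from dominating the comparison. A minimal sketch, assuming encounters arrive as hypothetical `(student_id, condition, ms_per_letter)` tuples and that students lacking either condition are skipped; the significance test behind p < 0.001 is not specified and is not reproduced here.

```python
from collections import defaultdict
from statistics import mean

def per_student_difference(encounters):
    """Average over students of (worst-case mean - best-case mean)."""
    times = defaultdict(lambda: {"best": [], "worst": []})
    for student, condition, ms in encounters:
        times[student][condition].append(ms)
    # Assumption: only students with data in both conditions are compared.
    diffs = [mean(t["worst"]) - mean(t["best"])
             for t in times.values() if t["best"] and t["worst"]]
    return mean(diffs)

data = [("s1", "best", 100), ("s1", "best", 110), ("s1", "worst", 120),
        ("s2", "best", 90), ("s2", "worst", 100), ("s2", "worst", 110),
        ("s2", "worst", 120), ("s3", "best", 95)]
print(per_student_difference(data))   # 17.5
```

Here student s1 contributes a difference of 15 and s2 a difference of 20, each weighted equally regardless of how many words they read; s3 has no worst-case data and is skipped.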
Filtering by frequency (LISTEN) yields similar results
Per-student averages differ by 21.1 msec (19%), p < 0.001
What was the effect by reading level?
Prediction: an effect for higher-level readers, no effect for lower-level readers
Best case slower across reading levels!
(Frequency in LISTEN corpus)
[Chart: best-case vs. worst-case reading times by reading level, with a significance marker (yes / almost / no) for each level]
Best case slower across reading levels!
(Frequency in SUBTLEX)
Best case slower for more students, p = 0.023

Reading level | Students with best case slower | Students with worst case slower
K             | No data                        | No data
A             | 3                              | 0
B             | 20                             | 9
C             | 10                             | 6
D             | 12                             | 5
E             | 19                             | 10
F             | 28                             | 14
G             | 2                              | 5
Potential explanation(s)
Neighborhood effects? (e.g., encourage vs. entourage)
Context?
Competition with the tail: disagree vs. agree?
Competition with the head: disagree vs. dis-?
Processing: dis+agree takes more steps than distance
At least some of these explanations rely on reading time being affected by sublexical structure.
What about neighborhood effects?
Currently investigating. Sample:

Reliable           | 1-away N.     | Unreliable        | 1-away N.
displeased         | 1             | distance          | 0
encourage          | 1 (entourage) | enough            | 0
immigrate, illegal | 0, 0          | innocent, illness | 0, 0
misspell           | 1             | mister            | 4
nonfat             | 6             | (none)            | -
overgrow           | 0             | overtly           | 1
repaint            | 2             | really            | 1
unnecessary        | 0             | unite             | 2

Medler, D.A. & Binder, J.R. (2005). MCWord: An On-Line Orthographic Database of the English Language. http://neuro.mcw.edu/mcword
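The encourage / entourage pair differs by a single letter substitution. A minimal sketch of counting such “1-away” neighbors, assuming the standard definition (same length, exactly one letter position differs) and a hypothetical toy lexicon; MCWord’s counts come from its own lexicon and may use a different definition.

```python
def substitution_neighbors(word, lexicon):
    """Words of the same length that differ from `word` in exactly one position."""
    word = word.lower()
    neighbors = []
    for cand in lexicon:
        cand = cand.lower()
        if len(cand) == len(word) and cand != word:
            if sum(a != b for a, b in zip(word, cand)) == 1:
                neighbors.append(cand)
    return sorted(neighbors)

toy_lexicon = {"entourage", "encourage", "encourages", "discourage"}
print(substitution_neighbors("encourage", toy_lexicon))   # ['entourage']
```

Additions and deletions (encourages) do not count under this definition, which is one reason different neighborhood databases can report different N for the same word.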
Conclusions
Initial letter sequences (heads) aren’t very reliable as cues to meaning.
Yet reading times appear to be sensitive to real vs. fake prefixes, even at low reading levels.
Cliffhanger: Does this sensitivity hint that we could teach prefixes earlier?
Announcements:
Gregory Aist joins the Iowa State faculty in fall 2009 and co-founds a journal, Dialogue and Discourse, on “language beyond the single sentence,” launching summer 2009: www.dialogue-and-discourse.org
Thank you
Initial letters: How good is the operationalization?
Sample of 100 Project LISTEN words that are also in WordNet

      | Positive         | Negative      | Total
True  | 30 (unnecessary) | 43 (refuge)   | 73
False | 11 (relate)      | 16 (unjustly) | 27
Total | 41               | 59            | 100
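Standard summary scores follow directly from the 2x2 counts. This sketch assumes the usual reading of the table: columns are the pattern matcher’s judgment (positive = head judged reliable) and rows say whether that judgment was correct, which fits unjustly appearing as a false negative. Accuracy, precision, and recall are not reported on the slide; they are derived here.

```python
# Counts from the 2x2 table above (assumed reading: rows = correct or not,
# columns = the pattern matcher's judgment).
TP, TN = 30, 43   # true positive (e.g. unnecessary), true negative (e.g. refuge)
FP, FN = 11, 16   # false positive (e.g. relate), false negative (e.g. unjustly)

total = TP + TN + FP + FN
accuracy = (TP + TN) / total
precision = TP / (TP + FP)   # of words judged reliable, share that really are
recall = TP / (TP + FN)      # of truly reliable words, share that were found

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# accuracy=0.73 precision=0.73 recall=0.65
```

Recall is the weakest of the three, which fits the observation that definitions like “unjust manner” lack an explicit “not” pattern.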
Project LISTEN’s Reading Tutor
An automated tutor that helps children learn to read
• See www.cs.cmu.edu/~listen
• Displays stories and listens to
children read them aloud
• Provides help when necessary
• Uses automatic speech recognition to analyze oral reading
• Logs sessions in detail, including speech recognizer output
• Millions of read words in the aggregated database