Are all questions created equal?: Factors that influence cloze question difficulty.

Brooke Soden Hensler
Carnegie Mellon University
(starting graduate school at Florida Center for Reading Research this Fall)
Joseph E. Beck
Carnegie Mellon University
Society for the Scientific Study of Reading – July 2006
Funding: National Science Foundation
Why Look at Multiple Choice Cloze Questions?

Multiple-choice cloze questions are widely used assessments of comprehension.

Problem: the outcome measure is typically binary (little information about the student).

Goal: use multiple-choice cloze questions to…
  More accurately assess students
  Track student reading development
  Better understand what makes cloze questions hard
Project LISTEN’s Computer Reading Tutor
(Mostow & Aist, 2001)

Automated

Students use throughout year

Accompanying paper standardized test scores (pre & post)
Student is reading a story aloud to the Reading Tutor…

A question appears…
*Reading Tutor reads both Question and Response Choices. (Mostow et al., 2004)

Student resumes reading story aloud to the Reading Tutor…
Reading Tutor Advantages

Well-specified & unbiased question construction (randomly generated)

Questions automatically administered, scored, & recorded

Longitudinal collection over school year

Large N (students & questions)
How many Q’s from Whom?
Data Description

81,175 Questions

1042 Students

Median number of questions answered per student: 11 (many students were infrequent users of the tutor)

2001-02 & 2002-03 school years

Diverse population in Pittsburgh area
Research Questions

Is a particular part of speech (e.g., nouns, verbs, etc.) more difficult for students?
  If nouns are learned first (Gentner, 1982; Golinkoff et al., 2000), might students be more proficient at answering noun questions?

Which factors influence question difficulty?

How can we better assess students using multiple choice cloze questions?
  Vocabulary researchers have given partial credit for correct part of speech (e.g., Schwanenflugel et al., 1997)
Approach

Build logistic regression model to predict individual question performance
  Terms in model: student identity, part of speech of answer, properties of question (e.g., question length)

Advantages of modeling approach
  Simultaneously estimates impact of question properties and student proficiency on question performance
  Makes use of all ~80k questions
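
The form of such a model can be sketched as follows. This is a minimal illustration of how a logistic regression combines a per-student term, a part-of-speech term, and question covariates into a probability of a correct answer. It is not the fitted model: the function name `p_correct`, the covariate names, and every coefficient value below are hypothetical and chosen only to show the mechanics.

```python
import math

def p_correct(student_beta, pos_beta, covariates, weights, intercept=0.0):
    """Logistic model: P(correct) = sigmoid(intercept + student + POS + sum(w_i * x_i))."""
    logit = intercept + student_beta + pos_beta
    logit += sum(weights[name] * value for name, value in covariates.items())
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical coefficients for illustration only (NOT values from the study).
weights = {"question_length": -0.005, "deletion_location": 0.4}

# Same hypothetical student and part of speech, two different questions.
short_late = p_correct(0.2, 0.3, {"question_length": 40, "deletion_location": 0.9}, weights)
long_early = p_correct(0.2, 0.3, {"question_length": 120, "deletion_location": 0.1}, weights)

print(f"short question, late blank:  {short_late:.3f}")
print(f"long question, early blank: {long_early:.3f}")
```

Because the student term and the question terms are added on the same log-odds scale, fitting all of them together separates how proficient a student is from how hard the questions they happened to see were.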
Effect of Parts of Speech

(easier)  Nouns < Verbs (p < 0.001) < Adjectives (p < 0.001) < Adverbs (p < 0.05)  (harder)
Impact of other Part of Speech terms

Term                              Difficulty                                   Significance
Most Common Part of Speech        more common POS = easier                     p < 0.01
# of Choices with Answer’s POS    more choices with correct POS = harder       p < 0.001

“Sally had to _______ her lips when she heard the news.” (cloud, purse, holds, magnificent)
“Henry read his _______ under the tree.” (cup, dog, book, hair)
Impact of other Part of Speech terms
Most Common Part of Speech (p < 0.01)

“Henry read his _______ under the tree.” (cup, dog, book, hair): more common POS = easier
“Sally had to _______ her lips when she heard the news.” (lamp, purse, beautiful, magnificent): less common POS = harder
Impact of other Part of Speech terms
# of Choices with Answer’s POS (p < 0.001)

(noun) “Henry read his _______ under the tree.” (cup, dog, book, hair): more choices with correct POS = harder
(verb) “Sally had to _______ her lips when she heard the news.” (lamp, purse, beautiful, magnificent): fewer choices with correct POS = easier
Impact of other terms

Term                 Difficulty                 Significance
Question Length      longer = harder            p < 0.001
Deletion Location    blank earlier = harder     p < 0.001

“We can _______ the stars in the sky despite the bright city lights around us.” (at, with, most, see)
“They rode their _______ .” (farmer, bikes, play, blue)
Impact of other terms
Question Length (p < 0.001)

“We can _______ the stars in the sky despite the bright city lights around us.” (at, with, most, see): longer = harder
“They rode their _______ .” (farmer, bikes, play, blue): shorter = easier
Impact of other terms
Deletion Location (p < 0.001)

“We can _______ the stars in the sky despite the bright city lights around us.” (at, with, most, see): blank earlier = harder
“They rode their _______ .” (farmer, bikes, play, blue): blank later = easier
Using model to assess student reading comprehension

Model estimates Beta parameter for each student
  Represents how well student did at answering cloze questions (controlling for difficulty factors)
  Should correlate with external comprehension measure

Compare Beta vs. percent correct for predicting WRMT (Woodcock Reading Mastery Test) comprehension composite*
  Student Beta: r = .644, p < .001
  Percent correct: r = .507, p < .001
  Reliability of difference in correlations, p < .01
  Also provides check on validity of regression model

*N = 465, 1 extreme outlier was eliminated from analyses.
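
The comparison above rests on Pearson correlations between each candidate predictor and the external composite score. A minimal sketch of that computation is given here, assuming per-student values are available as plain lists. The helper `pearson_r` and all of the toy numbers are invented for illustration; they are not the study's data and will not reproduce the reported r values. The test for the difference between the two correlations is not covered.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Made-up values for five hypothetical students (illustration only).
student_beta = [-0.8, -0.2, 0.1, 0.5, 1.1]
percent_correct = [0.40, 0.55, 0.45, 0.70, 0.65]
external_composite = [88, 95, 99, 104, 112]

r_beta = pearson_r(student_beta, external_composite)
r_pct = pearson_r(percent_correct, external_composite)
print(f"Student Beta vs. composite:    r = {r_beta:.3f}")
print(f"Percent correct vs. composite: r = {r_pct:.3f}")
```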
Conclusions

Length of question, location of deleted word, and part of speech of correct answer affect question difficulty.

Logistic regression is a strong choice for analyzing cloze data.

Multiple-choice cloze questions can assess a student at a more accurate level than current practice.
Questions?

Nominated for Best Paper Award:
Soden Hensler, B., & Beck, J. E. (2006). Better student assessing by finding difficulty factors in a fully automated comprehension measure. Intelligent Tutoring Systems.

Brooke Soden Hensler
bsodenhensler@gmail.com

Joseph E. Beck
joseph.beck@gmail.com

Project LISTEN & The Reading Tutor
http://www.cs.cmu.edu/~listen/
References

Gentner, D. (1981). Some interesting differences between verbs and nouns. Cognition and Brain Theory, 4(2).

Golinkoff, R. M., Hirsh-Pasek, K., Bloom, L., Smith, L. B., Woodward, A. L., Akhtar, N., Tomasello, M., & Hollich, G. (2000). Becoming a word learner: A debate on lexical acquisition. New York: Oxford University Press.

Mostow, J., & Aist, G. (2001). Evaluating tutors that listen: An overview of Project LISTEN. In K. Forbus & P. Feltovich (Eds.), Smart Machines in Education (pp. 169-234). Menlo Park, CA: MIT/AAAI Press.

Mostow, J., Beck, J. E., Bey, J., Cuneo, A., Sison, J., Tobin, B., & Valeri, J. (2004). Using automated questions to assess reading comprehension, vocabulary, and effects of tutorial interventions. Technology, Instruction, Cognition and Learning, 2, 97-134.

Schwanenflugel, P. J., Stahl, S. A., & McFalls, E. L. (1997). Partial word knowledge and vocabulary growth during reading comprehension. Journal of Literacy Research, 29(4).
Additional Slides
Terms in Model

Factors (Description of Term)
Part of Speech: Simplified part of speech classification of the correct answer as Noun, Verb, Adjective, Adverb, or Function Word.
Most Common Part of Speech: Whether or not the correct answer’s POS is the most common POS the word could take on.
POS Confusability: The number of POS the correct answer can take on.
Level of Difficulty: 4 Levels of Difficulty based on frequency in English or special annotation.
Student Identity: Unique Identification for each student.

Covariates (Description of Term)
Question Length: Number of characters of the cloze question and the corresponding response choices.
Deletion Location: Proportion of the sentence that is before the blank (location of word deletion).
# Choices with Answer's POS: Probability that the student could have answered the question using only part of speech information.
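
The three covariates can be illustrated with a short sketch. The definitions above leave some details open, so the code makes explicit assumptions: length is counted in raw characters with no separators, deletion location is measured in characters rather than words, and the part-of-speech probability is taken as one over the number of choices sharing the answer's POS. The function names and the hand-assigned POS tags are hypothetical.

```python
BLANK = "_" * 7  # blank marker as it appears in the example questions

def question_length(question, choices):
    """Characters in the question text plus all response choices (assumed: no separators)."""
    return len(question) + sum(len(choice) for choice in choices)

def deletion_location(question, blank=BLANK):
    """Proportion of the non-blank sentence characters that precede the blank (assumed: char-based)."""
    before = question.index(blank)
    total = len(question) - len(blank)
    return before / total

def pos_guess_probability(choice_pos_tags, answer_pos):
    """Chance of guessing correctly among choices sharing the answer's POS.

    The answer is always one of the choices, so at least one tag matches.
    """
    same_pos = sum(1 for tag in choice_pos_tags if tag == answer_pos)
    return 1.0 / same_pos

henry = f"Henry read his {BLANK} under the tree."
henry_choices = ["cup", "dog", "book", "hair"]
henry_tags = ["noun", "noun", "noun", "noun"]  # hand-assigned tags (assumption)

sally = f"Sally had to {BLANK} her lips when she heard the news."
sally_choices = ["lamp", "purse", "beautiful", "magnificent"]
sally_tags = ["noun", "verb", "adjective", "adjective"]  # "purse" tagged as the verb answer

print(question_length(henry, henry_choices), round(deletion_location(henry), 2))
print(pos_guess_probability(henry_tags, "noun"), pos_guess_probability(sally_tags, "verb"))
```

Under these assumptions, a question whose choices all share the answer's part of speech gives no help from syntax alone, while a question with a single choice of the right part of speech can be answered from part of speech information by itself.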
Developmental Trends in Learning Parts of Speech

[Figure: likelihood of answering question correctly (y-axis) by Reading Proficiency (x-axis: <=2, 2…3, 3…4, 4…5, >=5), plotted for Nouns, Verbs, Adjectives, Adverbs, and Function Words.]
Developmental Trends in Learning Parts of Speech

[Figure: likelihood of answering question correctly (y-axis) by Reading Proficiency (x-axis: <=2, 2…3, 3…4, 4…5, >=5), plotted for Nouns and Verbs. p-values shown on the figure: p = .52, p = .64, p = .99, p = .71, p < .001.]
Syntactic Awareness

[Figure: Relative Impact (y-axis, 0.1 to -0.6) of the number of POS a word can take on, by Reading Proficiency (x-axis: <=2, 2…3, 3…4, 4…5, >=5). p-values shown on the figure: p = .73, p = .48, p = .01, p = .02, p < .001.]
Effect of Part of Speech

Part of Speech    Beta                  Significance
Noun              0.39                  p < .001
Verb              0.29                  p < .001
Adjective         0.19                  p < .001
Adverb            0.12                  p < .001
Function Words    (comparison point)    ---

Noun < Verb < Adjective < Adverb < Function Words

*Interpretation: positive Beta means student is more likely to answer question correctly
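
Since the Betas come from a logistic regression, they are on the log-odds scale, so exponentiating each one gives an odds ratio relative to the comparison point. The sketch below does that arithmetic. It assumes the Beta values pair with the parts of speech in the order Noun, Verb, Adjective, Adverb, with Function Words as the zero reference; that pairing is read off the slide layout and is an assumption.

```python
import math

# Betas as read from the slide; the pairing with each part of speech is an assumption.
pos_beta = {
    "Noun": 0.39,
    "Verb": 0.29,
    "Adjective": 0.19,
    "Adverb": 0.12,
    "Function Words": 0.0,  # comparison point
}

# exp(Beta) = odds of a correct answer relative to a function-word question.
odds_ratio = {pos: math.exp(beta) for pos, beta in pos_beta.items()}

for pos, ratio in odds_ratio.items():
    print(f"{pos}: odds ratio vs. function words = {ratio:.2f}")
# Noun 1.48, Verb 1.34, Adjective 1.21, Adverb 1.13, Function Words 1.00
```

Read this way, the odds of a correct answer on a noun question are roughly one and a half times the odds on a function-word question, with other terms in the model held fixed.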