Introduction
Classification based on functional role in classroom instruction
Placement assessment: administered at the beginning of instruction
Formative assessment: monitor learning progress during instruction
Diagnostic assessment: diagnose learning difficulties during instruction
Summative assessment: assess achievement at the end of instruction
How are the results of tests and assessments interpreted?
Norm-referenced: performance described in terms of relative position in a known group
Criterion-referenced: performance described in terms of specific performance criteria
(e.g., type 40 words per minute without error)
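A minimal sketch of the two interpretations, assuming a made-up reference group of typing speeds. The function names, the group data, and the default thresholds are illustrative assumptions; only the 40 words per minute, zero-error criterion comes from the example above.

```python
def percentile_rank(score, group_scores):
    """Norm-referenced view: percentage of the reference group scoring below `score`."""
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)


def meets_criterion(words_per_minute, errors, min_wpm=40, max_errors=0):
    """Criterion-referenced view: compare performance against a fixed standard."""
    return words_per_minute >= min_wpm and errors <= max_errors


# Made-up typing speeds (words per minute) for a hypothetical reference group.
group = [28, 31, 35, 38, 42, 45, 47, 52]

print(percentile_rank(42, group))     # position relative to the group
print(meets_criterion(42, errors=0))  # pass/fail against the fixed criterion
```

The same raw result (42 words per minute) gets a relative reading in the first case and an absolute pass/fail reading in the second.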
Fixed-choice / complex-performance assessment
Continuum: fixed-choice, short answer, essay, complex-performance
Fixed-choice end
• Factual knowledge
• Low-level skills (recall)
• Objective assessment
• Highly reliable
Complex-performance end
• Critical thinking skills
• May extend beyond classroom
• Inferential skills
• Subjective assessment
Essay type questions
Freedom of response
▪ Free to construct, relate and present ideas in own words
Assess higher order skills
▪ Critical thinking
Freedom comes at the cost of
▪ reliability in scoring
▪ time for evaluation
Prompt of an essay
A topic around which the writer starts jotting down ideas
Can be a single word, a short phrase, a complete paragraph, or even a picture
Traits of an essay
Characteristics of an essay on which it is evaluated
Scoring rubrics depend on traits
Ideas or content
Organization
Voice
Word choice
Sentence fluency
Automated essay evaluation/scoring (AEE/AES)
“the process of evaluating and scoring written prose via computer programs”
NLP has helped to go beyond numeric scoring to qualitative feedback
Multi-disciplinary
AEE/AES systems
PEG
E-rater
Intelligent Essay Assessor
C-rater
E-rater: commercial AES by Educational Testing Service (ETS), 1999
Employed in high-stakes assessment in the Graduate Management Admission Test (GMAT)
Shown to agree with expert raters
Scoring depends on tangible markers related to writing constructs
Organization and development of ideas
Variation in syntactic constructs
Vocabulary usage
Technical correctness in terms of grammar, usage and mechanics
Grammatical errors
Automatic grammatical error detection
Article and preposition errors
Discourse structure and organization
Rhetorical Structure Theory motivated features
Topic relevant word usage
Content Vector Analysis (CVA)
Style-related word usage
Overly repetitious word usage
Grammatical error detection
Rule-based approach
▪ Rules are defined over syntactic parse
Statistical approach
▪ Word n-gram and POS n-grams
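A toy sketch of the statistical approach, assuming word bigrams counted from a tiny hand-made reference corpus. A bigram that never occurs in the reference corpus is flagged as a possible error. The corpus, function names, and threshold are assumptions for illustration; a real system would use a large corpus, POS n-grams as well, and smoothed statistics rather than raw counts.

```python
from collections import Counter


def bigrams(tokens):
    """Return the list of adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))


def train_bigram_counts(sentences):
    """Count word bigrams (with sentence boundary markers) in well-formed text."""
    counts = Counter()
    for sentence in sentences:
        counts.update(bigrams(["<s>"] + sentence.lower().split() + ["</s>"]))
    return counts


def flag_rare_bigrams(sentence, counts, min_count=1):
    """Return bigrams seen fewer than `min_count` times in the reference corpus."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return [bg for bg in bigrams(tokens) if counts[bg] < min_count]


# Hypothetical reference corpus of correct sentences.
corpus = ["she has a book", "he has an apple", "she has an idea", "he reads a book"]
counts = train_bigram_counts(corpus)

print(flag_rare_bigrams("she has a apple", counts))  # the article error surfaces as an unseen bigram
```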
Discourse analysis
Linear representation of essay sentences
Segment essay into
▪ Introductory material
▪ Thesis statement
▪ Main ideas
▪ Supporting ideas
▪ Conclusion
Content Vector Analysis (CVA)
[Figure: the essay to be graded is compared (≈) with higher-quality and lower-quality essays and receives a higher or lower grade according to which it resembles]
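A minimal sketch of the vector space idea behind CVA, assuming raw word counts as vector weights and cosine similarity as the comparison. The essay to be graded receives the score of the most similar human-scored essay. The essays, scores, and function names are made up for illustration; production systems typically use weighted terms and many reference essays per score point.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: word -> raw count (a simplification of weighted terms)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cva_score(essay, scored_essays):
    """Return the score of the reference essay most similar to `essay`."""
    vec = vectorize(essay)
    best_score, _ = max(
        ((score, cosine(vec, vectorize(text))) for score, text in scored_essays),
        key=lambda pair: pair[1],
    )
    return best_score


# Hypothetical human-scored reference essays: (score, text).
scored = [
    (5, "photosynthesis converts light energy into chemical energy stored in glucose"),
    (2, "plants are green and grow in the garden"),
]
essay = "plants use light energy to make glucose by photosynthesis"

print(cva_score(essay, scored))
```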
Collocation detection
Tests proper usage of words that depend on other words
Collocation patterns
▪ Noun-of-noun (swarm of bees)
▪ Adjective+noun (strong tea)
▪ Noun+noun (house arrest)
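One common way to score candidate collocations is pointwise mutual information (PMI) over adjacent word pairs. The sketch below assumes that technique and a tiny made-up corpus; it is not necessarily the measure used by any particular AES system, and the helper name `pmi_table` is hypothetical.

```python
import math
from collections import Counter


def pmi_table(tokens):
    """Build a PMI scorer for adjacent word pairs from a token list."""
    unigrams = Counter(tokens)
    pairs = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_pairs = len(tokens) - 1

    def pmi(w1, w2):
        # Unseen pairs get -inf: there is no evidence that the words go together.
        if pairs[(w1, w2)] == 0:
            return float("-inf")
        p_pair = pairs[(w1, w2)] / n_pairs
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        return math.log2(p_pair / (p_w1 * p_w2))

    return pmi


# Hypothetical reference text; a real system would use a large corpus.
text = ("i drink strong tea every morning she likes strong tea too "
        "he bought a powerful computer a powerful engine needs fuel")
pmi = pmi_table(text.split())

print(pmi("strong", "tea"))    # positive: the pair co-occurs more often than chance
print(pmi("powerful", "tea"))  # -inf: never observed together
```

A writer who produces "powerful tea" can then be flagged because the pair scores far below the attested "strong tea".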
Model is trained with human-scored essays
Training
Converting essay to vector of linguistic features
Learning of weights through regression
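A sketch of the training step, assuming each essay has already been converted to a numeric feature vector and that weights are learned by ordinary linear regression fitted with gradient descent. The two features (vocabulary variety, grammar errors per 100 words), the data, and the hyperparameters are invented for illustration and are not taken from any real scoring model.

```python
def fit_linear(features, scores, lr=0.1, epochs=20000):
    """Learn weights and bias by batch gradient descent on half the mean squared error."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, scores):
            err = bias + sum(w * xi for w, xi in zip(weights, x)) - y
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * x[j] / n
        bias -= lr * grad_b
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
    return weights, bias


def predict(x, weights, bias):
    """Score a new essay from its feature vector."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


# Made-up training set: (vocabulary variety, grammar errors per 100 words) -> human score.
X = [(1.0, 0.2), (2.0, 0.5), (1.5, 0.1), (0.5, 0.4), (2.5, 0.3)]
y = [2.4, 3.5, 3.7, 0.8, 5.1]

weights, bias = fit_linear(X, y)
print(weights, bias)
print(predict((2.0, 0.1), weights, bias))
```

In this toy data the learned weight for vocabulary variety is positive and the weight for grammar errors is negative, which mirrors how human raters reward one and penalize the other.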
Different models
Topic-specific model
▪ Training is done by drawing human-scored essays on a given topic
Generic model
▪ Topic agnostic
Hybrid model
▪ Some feature weights are trained on generic essays while others are from prompt-specific essays.
Intelligent Essay Assessor (IEA): commercial AES by Pearson Knowledge Technologies, 1998
Features
Automated scoring and feedback of paragraphs
Grading summary writing to improve reading comprehension
Performance task scoring
Short answer scoring for students
Essay score is derived from the following feature groups
Content
▪ LSA similarity
▪ Vector length
Style, organization, development
▪ Essay coherence
▪ Topic development
▪ Inter-sentence coherence
Grammar
▪ N-gram features
▪ Grammatical errors
Lexical sophistication
▪ Word maturity
▪ Word variety
▪ Confusable words
Mechanics
▪ Spelling
▪ Punctuation
▪ Capitalization
Short answers are not short essays
Evaluation of essays focuses on traits like grammar, style, vocabulary, organization etc.
▪ Computational syntax and stylistics
Evaluation of short answers emphasizes content
▪ Computational semantics
Short answers are harder to evaluate
Smaller amount of exploitable information
C-rater by ETS
Grades free-text responses ranging in length from a single word or phrase to 4-5 sentences
Supports both summative and formative assessment
Performs well on tests that solicit specific information from students
Performs poorly on open-ended tasks
C-rater goal: map the student response onto the model of the correct answer provided by the content expert
Building the model is manual, but the mapping is automatic
The difficulty
The question is designed to elicit from students one or more concepts that constitute the correct answer
A concept can be realized in natural language in many different ways
The solution
Treat correct responses as paraphrases of the model answer
Try to model human graders with the following normalizations
Syntactic variation
Pronoun reference
Morphological variation
Synonymous words
Typographical and spelling errors
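A rough sketch of three of these normalizations (morphological variation, spelling errors, synonymous words) followed by a simple concept match. This is an illustrative assumption, not C-rater's actual implementation: the suffix list, synonym table, vocabulary, and function names are all hypothetical, and syntactic variation and pronoun reference are not handled here.

```python
import difflib
import re

# Hypothetical resources supplied for one question.
SYNONYMS = {"sunlight": "light", "take": "absorb", "soak": "absorb"}
CONCEPT = {"plant", "absorb", "light"}  # words that make up the model answer concept
VOCAB = sorted(CONCEPT | set(SYNONYMS))


def stem(word):
    """Crude suffix stripping to reduce morphological variation."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(response):
    """Lowercase, stem, correct near-miss spellings, and map synonyms to canonical words."""
    tokens = []
    for word in re.findall(r"[a-z]+", response.lower()):
        word = stem(word)
        if word not in VOCAB:
            close = difflib.get_close_matches(word, VOCAB, n=1, cutoff=0.75)
            if close:
                word = close[0]
        tokens.append(SYNONYMS.get(word, word))
    return tokens


def matches_concept(response, concept=CONCEPT):
    """True if every concept word appears in the normalized response."""
    return concept <= set(normalize(response))


print(normalize("The plants soaked up sunlite."))
print(matches_concept("The plants soaked up sunlite."))
print(matches_concept("The plant reflected light."))
```

The bag-of-words match at the end is deliberately naive; it ignores word order and negation, which is where the syntactic normalization listed above would be needed.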
Content assessment
Content Vector Analysis
▪ Vector space model
Semantics based assessment
▪ Latent Semantic Analysis
Meaning/Concept assessment
Paraphrasing and textual entailment
Organizational assessment
Argument structure mining
Discourse structure analysis