Automated Essay Grading


Introduction

• Classification based on functional role in classroom instruction

▪ Placement assessment: administered at the beginning of instruction

▪ Formative assessment: monitors learning progress during instruction

▪ Diagnostic assessment: diagnoses learning difficulties during instruction

▪ Summative assessment: assesses achievement at the end of instruction

• How are the results of tests and assessments interpreted?

▪ Norm-referenced: performance is described in terms of relative position in a known group

▪ Criterion-referenced: performance is described against specific performance criteria (e.g., type 40 words per minute without error)
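
The two interpretations can be contrasted with a minimal sketch. The function names, the sample group of typing speeds, and the choice of "percentage of the group scoring strictly below" as the norm-referenced measure are illustrative assumptions, not part of the slides.

```python
def percentile_rank(score, group_scores):
    """Norm-referenced: percentage of a known group scoring strictly below `score`."""
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)


def meets_criterion(words_per_minute, errors, min_wpm=40, max_errors=0):
    """Criterion-referenced: pass/fail against a fixed performance standard."""
    return words_per_minute >= min_wpm and errors <= max_errors


# Hypothetical typing speeds (words per minute) of a reference group.
group = [28, 35, 40, 42, 55]

print(percentile_rank(42, group))      # 60.0 (three of five scores are lower)
print(meets_criterion(42, errors=0))   # True
print(meets_criterion(42, errors=2))   # False: fast enough, but not error-free
```

The same raw result (42 words per minute) gets a relative reading in the first case and an absolute pass/fail reading in the second.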

 Fixed-Choice/ Complex Performance assessment

Spectrum of item types: Fixed-choice → Short answer → Essay → Complex-performance

Fixed-choice end of the spectrum:

• Factual knowledge

• Low level skills (recall)

• Objective assessment

• Highly reliable

Complex-performance end of the spectrum:

• Critical thinking skills

• May extend beyond classroom

• Inferential skills

• Subjective assessment

 Essay type questions

 Freedom of response

▪ Free to construct, relate and present ideas in own words

 Assess higher order skills

▪ Critical thinking

• Freedom comes at the cost of

▪ reliability in scoring

▪ time for evaluation

Prompt of an essay

• A topic around which the writer starts jotting down ideas

• May be a single word, a short phrase, a complete paragraph or even a picture

Trait of essay

 Characteristics of essay on which it is evaluated

 Scoring rubrics depend on traits

Ideas or content

Organization

Voice

Word choice

Sentence fluency

• Automated essay evaluation or scoring (AEE/AES): “the process of evaluating and scoring written prose via computer programs”

• NLP has helped AES go beyond numeric scoring to qualitative feedback

• AES is a multi-disciplinary field

AEE/AES systems

 PEG

 E-rater

 Intelligent Essay Assessor

 C-rater

E-rater: commercial AES by Educational Testing Service (ETS), 1999

Employed in high-stakes assessment in the Graduate Management Admission Test (GMAT)

Shown to agree with expert raters

Scoring depends on tangible markers related to writing constructs

 Organization and development of ideas

 Variation in syntactic constructs

 Vocabulary usage

 Technical correctness in terms of grammar, usage and mechanics

Grammatical errors

 Automatic grammatical error detection

 Article and preposition errors

Discourse structure and organization

 Rhetorical Structure Theory motivated features

Topic relevant word usage

 Content Vector Analysis (CVA)

Style-related word usage

 Overly repetitious word usage

Grammatical error detection

 Rule-based approach

▪ Rules are defined over syntactic parse

 Statistical approach

▪ Word n-gram and POS n-grams
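
The statistical approach can be illustrated with a minimal word-bigram sketch: bigrams that never occur in a reference corpus of well-formed text are flagged as suspicious. The tiny corpus, the function names and the "count equals zero" rule are assumptions made for illustration; a production system would use large corpora, smoothing and POS n-grams as well.

```python
from collections import Counter


def bigrams(tokens):
    """Return adjacent word pairs, with sentence boundary markers."""
    padded = ["<s>"] + tokens + ["</s>"]
    return list(zip(padded, padded[1:]))


def train_bigram_counts(sentences):
    """Count bigrams in a reference corpus of well-formed sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(bigrams(sentence.lower().split()))
    return counts


def flag_unseen_bigrams(sentence, counts):
    """Flag bigrams never observed in the reference corpus."""
    return [bg for bg in bigrams(sentence.lower().split()) if counts[bg] == 0]


# Hypothetical reference corpus.
reference = [
    "she reads a book",
    "he reads an article",
    "she wrote an essay",
    "he wrote a book",
]
counts = train_bigram_counts(reference)
print(flag_unseen_bigrams("she reads an book", counts))  # [('an', 'book')]
```

Here the article error surfaces because "an book" is unattested, while every other bigram in the sentence was seen in the reference text.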

Discourse analysis

 Linear representation of essay sentences

 Segment essay into

▪ Introductory material

▪ Thesis statement

▪ Main ideas

▪ Supporting ideas

▪ Conclusion

 Content Vector Analysis (CVA)

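[Figure: CVA compares the essay to be graded with higher-quality and lower-quality essays; greater similarity to the higher-quality essays indicates a higher grade, greater similarity to the lower-quality essays a lower grade]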
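
A minimal sketch of this comparison, assuming raw word counts as vector components and cosine similarity as the closeness measure (actual systems typically weight terms, for example by tf-idf). The function names and the two-level set of scored essays are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words vector: word -> raw count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cva_score(essay, scored_essays):
    """Return the score level whose pooled essays are most similar to `essay`.

    `scored_essays` maps a score to a list of human-scored essays.
    """
    essay_vec = term_vector(essay)
    best_score, best_sim = None, -1.0
    for score, essays in scored_essays.items():
        pooled = Counter()
        for text in essays:
            pooled.update(term_vector(text))
        sim = cosine(essay_vec, pooled)
        if sim > best_sim:
            best_score, best_sim = score, sim
    return best_score


# Hypothetical human-scored essays on a recycling prompt.
scored = {
    5: ["recycling reduces waste and saves energy and resources"],
    2: ["i like my school and my friends"],
}
print(cva_score("recycling saves energy", scored))  # 5
```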

 Collocation detection

 Tests proper usage of words that depend on other words

 Collocation patterns

▪ Noun-of-noun (swarm of bees)

▪ Adjective+noun (strong tea)

▪ Noun+noun (house arrest)
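
One common way to judge whether a word pair is a well-formed collocation is pointwise mutual information (PMI) estimated from a reference corpus. The slides do not say which statistic is used, so PMI, the toy corpus and the function names below are assumptions for illustration only.

```python
import math
from collections import Counter


def make_pmi(tokens):
    """Build a PMI function for adjacent word pairs from a token list."""
    unigrams = Counter(tokens)
    pairs = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_pairs = len(tokens) - 1

    def pmi(w1, w2):
        # Pairs never seen together get minus infinity (no evidence of collocation).
        if pairs[(w1, w2)] == 0:
            return float("-inf")
        p_pair = pairs[(w1, w2)] / n_pairs
        p_w1 = unigrams[w1] / n_tokens
        p_w2 = unigrams[w2] / n_tokens
        return math.log2(p_pair / (p_w1 * p_w2))

    return pmi


# Hypothetical reference corpus.
corpus = (
    "strong tea is nice . strong tea is hot . "
    "powerful engine is loud . powerful engine is fast . "
    "strong coffee is nice"
).split()

pmi = make_pmi(corpus)
print(pmi("strong", "tea") > 0)    # True: attested adjective+noun collocation
print(pmi("powerful", "tea"))      # -inf: never observed together
```

A student's "powerful tea" would score far below "strong tea", which is the signal a collocation checker looks for.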

Model is trained with human-scored essays

Training

 Converting essay to vector of linguistic features

 Learning of weights through regression

Different models

 Topic-specific model

▪ Trained on human-scored essays written on a given topic

 Generic model

▪ Topic agnostic

 Hybrid model

▪ Some feature weights are trained on generic essays while others are from prompt-specific essays.
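
The two training steps (essay to feature vector, then weights learned by regression) can be sketched as follows. The three features, their scaling, the gradient-descent fitting routine and the sample essays with their scores are all assumptions chosen for brevity; they are not the features or the procedure of any particular scoring engine.

```python
def extract_features(essay):
    """Turn an essay into a small feature vector (bias term first)."""
    words = essay.split()
    sentences = [
        s for s in essay.replace("!", ".").replace("?", ".").split(".") if s.strip()
    ]
    n_words = len(words)
    distinct = {w.lower().strip(".,;:!?") for w in words}
    type_token_ratio = len(distinct) / n_words if n_words else 0.0
    avg_sentence_len = n_words / len(sentences) if sentences else 0.0
    # Features are scaled so that all components have a similar magnitude.
    return [1.0, n_words / 100.0, type_token_ratio, avg_sentence_len / 10.0]


def fit_linear_regression(X, y, lr=0.05, epochs=5000):
    """Least-squares weights via batch gradient descent on mean squared error."""
    weights = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for xi, yi in zip(X, y):
            err = sum(w * x for w, x in zip(weights, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += 2.0 * err * xj / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def predict(weights, features):
    """Predicted score is the weighted sum of the features."""
    return sum(w * x for w, x in zip(weights, features))


# Hypothetical human-scored training essays (text, score).
training = [
    ("Dogs are good. I like dogs.", 2.0),
    ("Recycling reduces waste, and it also saves energy that factories "
     "would otherwise consume.", 4.0),
    ("Cities should invest in public transport because it lowers pollution, "
     "eases congestion and widens access to jobs.", 5.0),
]
X = [extract_features(text) for text, _ in training]
y = [score for _, score in training]
weights = fit_linear_regression(X, y)

new_essay = "Schools should teach coding because it builds problem solving skills."
print(round(predict(weights, extract_features(new_essay)), 2))
```

A topic-specific model would fit the weights on essays from one prompt, a generic model on essays pooled across prompts, and a hybrid model would combine weights from both sources.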

Intelligent Essay Assessor (IEA): commercial AES by Pearson Knowledge Technologies, 1998

Features

 Automated scoring and feedback of paragraphs

 Grading summary writing to improve reading comprehension

 Performance task scoring

 Short answer scoring for students

[Figure: feature groups that contribute to the essay score]

Content: LSA similarity, vector length

Lexical sophistication: word maturity, word variety, confusable words

Grammar: n-gram features, grammatical errors

Mechanics: spelling, punctuation, capitalization

Style, organization, development: essay coherence, topic development, inter-sentence coherence

Short answers are not short essays

• Evaluation of essays focuses on traits such as grammar, style, vocabulary and organization

▪ Computational syntax and stylistics

• Evaluation of short answers emphasizes content

▪ Computational semantics

Short answers are harder to evaluate

• Smaller amount of exploitable information

• C-rater by ETS

▪ Grades free-text responses ranging in length from a single word or phrase to 4-5 sentences

▪ Supports both summative and formative assessment

▪ Performs well on tests that solicit specific information from the student

▪ Performs poorly on open-ended tasks

Model of the correct answer is provided by a content expert

C-rater goal

• Student response → model

The model is built manually, but the mapping is automatic
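
A minimal sketch of such an automatic mapping, assuming the model answer is a list of short concept phrases and that a response covers a concept when all of the concept's normalized words appear in the normalized response. The synonym table, the crude suffix stripper and the spelling repair via `difflib` are illustrative stand-ins; syntactic variation and pronoun reference are not handled here, and this is not C-rater's actual pipeline.

```python
import difflib
import string

# Hypothetical synonym table mapping variants to a canonical word.
SYNONYMS = {"quicker": "faster", "rapidly": "faster"}


def stem(word):
    """Very crude suffix stripping (morphological variation)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text, vocabulary=()):
    """Lowercase, drop punctuation, map synonyms, stem, repair near-miss spellings."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    result = set()
    for token in cleaned.split():
        token = stem(SYNONYMS.get(token, token))
        if vocabulary and token not in vocabulary:
            # Spelling repair: snap to the closest model-answer word, if any.
            close = difflib.get_close_matches(token, vocabulary, n=1, cutoff=0.8)
            if close:
                token = close[0]
        result.add(token)
    return result


def concepts_matched(response, model_concepts):
    """Return the model concepts whose words all occur in the response."""
    vocabulary = sorted({t for c in model_concepts for t in normalize(c)})
    response_tokens = normalize(response, vocabulary)
    return [c for c in model_concepts if normalize(c) <= response_tokens]


# Hypothetical model answer written by a content expert.
model = ["plants need sunlight", "plants release oxygen"]
print(concepts_matched("A plant needs sunlihgt to grow.", model))
# ['plants need sunlight']
```

The response earns credit for the first concept despite the singular "plant", the inflected "needs" and the misspelled "sunlihgt", and earns none for the second concept, which it never mentions.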

The difficulty

• The question is designed to elicit from students one or more concepts that constitute the correct answer

• There are many ways in which a concept can be realized in natural language

The solution

• Treat correct responses as paraphrases of the model answer

• Try to model human graders by normalizing the following:

▪ Syntactic variation

▪ Pronoun reference

▪ Morphological variation

▪ Synonymous words

▪ Typographical and spelling errors

Content assessment

 Content Vector Analysis

▪ Vector space model

 Semantics based assessment

▪ Latent Semantic Analysis

Meaning/Concept assessment

 Paraphrasing and textual entailment

Organizational assessment

 Argument structure mining

 Discourse structure analysis
