Linguistic Credibility Assessment - U

Linguistic Credibility Assessment
• Emma – general comments on language
• Matt – tools for linguistic analysis
• Mary – case study
The Federalist Papers
• Series of 85 short essays
– urged ratification of US Constitution
• Published pseudonymously; most were
eventually claimed
– Alexander Hamilton
– James Madison
– John Jay
• 12 remain of disputed authorship
– Presumed to be by Madison or Hamilton
Automated Text Analysis
• Fung (2003)
• Classification problem, using SVM
• Used relative frequency of 70 most common
words as features
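The feature step above can be sketched as follows. This is a minimal illustration, not Fung's code: the five-word `FUNCTION_WORDS` list, the tokenizer, and the sample sentence are assumptions made for the example (the slides refer to the 70 most common words).

```python
import re
from collections import Counter

# Hypothetical short vocabulary standing in for the 70 most common words.
FUNCTION_WORDS = ["to", "upon", "would", "the", "of"]

def relative_frequencies(text, vocabulary=FUNCTION_WORDS):
    """Return, for each vocabulary word, its count divided by the total token count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[word] / total for word in vocabulary]

# Invented sample sentence, used only to exercise the function.
sample = "It would depend upon the will of the people to decide."
features = relative_frequencies(sample)
```

Each document becomes one fixed-length vector, which is the form a classifier such as an SVM expects as input.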
[Figure: the 70 most common words]
Classification
• Used machine learning to find the 3 features that
best separate Madison & Hamilton in
documents of known authorship
– to, upon, would
• Plot the 12 unknown documents along those 3
dimensions
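The idea of placing an unknown document in the three-dimensional space and seeing which author it falls closer to can be sketched as below. Two loud assumptions: the rates are made-up toy numbers, not Fung's data, and a nearest-centroid rule is swapped in as a simple stand-in for the SVM named on the earlier slide.

```python
import math

FEATURES = ("to", "upon", "would")  # the three dimensions named on the slide

# Made-up rates per 1,000 words, for illustration only (not Fung's data).
known = {
    "Hamilton": [[40.0, 3.2, 8.0], [38.0, 2.9, 9.0], [41.0, 3.5, 7.0]],
    "Madison": [[35.0, 0.2, 5.0], [34.0, 0.4, 6.0], [36.0, 0.1, 4.0]],
}

def centroid(points):
    """Average each feature over one author's documents."""
    return [sum(column) / len(points) for column in zip(*points)]

centroids = {author: centroid(docs) for author, docs in known.items()}

def classify(document, centroids=centroids):
    """Return the author whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda author: math.dist(document, centroids[author]))

disputed = [35.5, 0.3, 5.5]  # hypothetical disputed paper
print(classify(disputed))
```

An SVM would instead learn a separating boundary between the two authors' known documents; the nearest-centroid rule is used here only because it needs no external library.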
Fung’s results:
Disputed papers were by Madison
Language
• Complex system of communication unique to
humanity
– used for expressing thoughts
– systematic
– flexible
• allows for infinite combinations
• multiple ways to convey the same idea
– not completely predictable
Patterns in
Language & Language Use
• We make use of patterns in language for our
purposes of communication
– e.g. statement vs. question
• Mary sang at the concert.
• Did Mary sing at the concert?
• Mary sang at the concert?
– e.g. Word order in conversation vs. poetry
• Soldiers brave were on the march.
• This information is used to classify types of
language usage
– e.g. genre, style, dialect, etc
Similarities & Differences
• What factors affect how language is
used?
– language in use (or dialect)
– culture, social identity
– situation
• purpose, topic domain, genre, social relationship between
speakers, conversation type, etc
– medium
• oral: in-person, by phone
• written: letter, chat, texts, financial documents
– deceptive or truthful
From Theory to Cue
• Use theoretical predictions as basis for
selecting cues to explore
• 5 domains
– Arousal: e.g. expect quick rate of speech
– Emotion: e.g. (for nervousness) expect more stuttering
– Memory: e.g. expect fewer descriptive words
– Cognition: e.g. expect less complex sentences
– Communication: e.g. less likely to admit forgotten information
Manual Coding Systems
• Criteria-Based Content Analysis (CBCA) & Statement
Validity Analysis (SVA)
– Assumes statements derived from real memories will differ
from invented ones in both content and quality
– Score statements on the presence or absence of 19 criteria
• Reality Monitoring (RM)
– Truthful memories are more likely to contain perceptual,
contextual, & affective information
• Scientific Content Analysis (SCAN)
– Used in criminal investigation statements
Some generalized linguistic cues
(from DePaulo 2003)

Compared with truth tellers, liars:
• Are less forthcoming
– respond less (shorter responses, less elaboration), seem to hold back, speak at slower rate,
longer response latency (less if planned)
• Tell stories that are less plausible
– more discrepancies, less engaging (more repetitions), behavior is less immediate (more
indirect, fewer self-references), more uncertain, less fluent (more hesitations, errors,
pauses), less active (gestures)
– want story to be without error (fewer spontaneous
corrections, less likely to admit can’t remember detail)
• Make a more negative impression
– less cooperative, use more negative statements, words denoting anger & fear, offensive
language, smile less, seem more defensive
• Are more tense
– higher pitch, fidget more, pupils dilated for longer periods
BUT! Remember cues are affected by context.
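A few of the verbal cues above (response length, self-references, negative statements, uncertainty) can be counted automatically. The sketch below is only an illustration: the word lists and the function name `cue_profile` are hypothetical, and the resulting rates are descriptive measurements, not a deception verdict.

```python
import re

# Hypothetical, deliberately tiny word lists chosen for this example.
SELF_REFERENCES = {"i", "me", "my", "mine", "myself"}
NEGATIONS = {"no", "not", "never", "nothing", "nobody"}
HEDGES = {"maybe", "perhaps", "possibly", "probably"}

def cue_profile(statement):
    """Return the word count and per-token rates for three cue categories."""
    tokens = re.findall(r"[a-z']+", statement.lower())
    total = len(tokens) or 1  # avoid division by zero on empty input

    def rate(words):
        return sum(token in words for token in tokens) / total

    return {
        "word_count": len(tokens),
        "self_reference_rate": rate(SELF_REFERENCES),
        "negation_rate": rate(NEGATIONS),
        "hedge_rate": rate(HEDGES),
    }

# Invented statement, used only to exercise the function.
profile = cue_profile("I did not see anything. Maybe someone else was there.")
```

Because cues are affected by context, such rates are only meaningful when compared against statements of the same type, speaker, and situation.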
Research aims of CMI
• Discovering linguistic cues that are
– reliable indicators of deception (or truthfulness)
– as context-independent as possible
• Note: Beware unrealistic claims of accuracy in detection
– e.g. Cain’s Innocence “proved” through LVA
– Consider the perspective & intentions:
• Researcher
• User’s understanding
• Business’ marketing
• Politician’s results-reporting
– risk assessment vs. authoritative decision-making