Linguistic Credibility Assessment
• Emma – general comments on language
• Matt – tools for linguistic analysis
• Mary – case study

The Federalist Papers
• Series of 85 short essays
  – urged ratification of the US Constitution
• Published pseudonymously; most were eventually claimed
  – Alexander Hamilton
  – James Madison
  – John Jay
• 12 remain of disputed authorship
  – Presumed to be by Madison or Hamilton

Automated Text Analysis
• Fung (2003)
• Classification problem, using SVM
• Used relative frequency of the 70 most common words as features

70 most common words
[Table of the 70 words; not recoverable from the slide text]

Classification
• Used machine learning to find the 3 features that best separate Madison & Hamilton in documents with known author
  – to, upon, would
• Plot the 12 unknown documents along those 3 dimensions

Fung’s results: Disputed papers were by Madison

Language
• Complex system of communication unique to humanity
  – used for expressing thoughts
  – systematic
  – flexible
    • allows for infinite combinations
    • multiple ways to convey the same idea
  – not completely predictable

Patterns in Language & Language Use
• We make use of patterns in language for our purposes of communication
  – e.g. statement vs. question
    • Mary sang at the concert.
    • Did Mary sing at the concert?
    • Mary sang at the concert?
  – e.g. word order in conversation vs. poetry
    • Soldiers brave were on the march.
• This information is used to classify types of language usage
  – e.g. genre, style, dialect, etc.

Similarities & Differences
• What factors affect how language is used?
  – language in use (or dialect)
  – culture, social identity
  – situation
    • purpose, topic domain, genre, social relationship between speakers, conversation type, etc.
  – medium
    • oral: in-person, by phone
    • written: letter, chat, texts, financial documents
  – deceptive or truthful

From Theory to Cue
• Use theoretical predictions as the basis for selecting cues to explore
• 5 domains
  – Arousal: e.g. expect quick rate of speech
  – Emotion: e.g. (for nervousness) expect more stuttering
  – Memory: e.g. expect fewer descriptive words
  – Cognition: e.g. expect less complex sentences
  – Communication: e.g. less likely to admit forgotten information

Manual Coding Systems
• Criteria-Based Content Analysis (CBCA) & Statement Validity Analysis (SVA)
  – Assume statements derived from real memories will differ from invented ones in both content and quality
  – Score statements on the presence or absence of 19 criteria
• Reality Monitoring (RM)
  – Truthful memories are more likely to contain perceptual, contextual, & affective information
• Scientific Content Analysis (SCAN)
  – Used in criminal investigation statements

Some generalized linguistic cues (from DePaulo 2003)
• Less forthcoming than truth tellers
  – Respond less (shorter responses, less elaboration), seem to hold back, speak at slower rate, longer response latency (less if planned)
• Tell stories that are less plausible
  – More discrepancies, less engaging (more repetitions), behavior is less immediate (more indirect, fewer self-references), more uncertain, less fluent (more hesitations, errors, pauses), less active (gestures)
• Make a more negative impression
  – Less cooperative, use more negative statements, words denoting anger & fear, offensive language, smile less, seem more defensive
• Are more tense
  – Higher pitch, fidget more, pupils dilated for longer periods
• Want story to be without error
  – Fewer spontaneous corrections, less likely to admit can’t remember detail
• BUT! Remember cues are affected by context.

Research aims of CMI
• Discovering linguistic cues that are
  – reliable indicators of deception (or truthfulness)
  – as context-independent as possible
• Note: Beware unrealistic claims of accuracy in detection
  – e.g. Cain’s Innocence “proved” through LVA
  – Consider the perspective & intentions:
    • Researcher
    • User’s understanding
    • Business’ marketing
    • Politician’s results-reporting
  – risk assessment vs. authoritative decision-making
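
Sketch: word-frequency features for authorship
The Automated Text Analysis and Classification slides describe representing each document by the relative frequency of common words and separating the two authors along three of them (to, upon, would). The Python sketch below shows only that feature step, under loud assumptions: the sample sentences are invented for illustration and are not Federalist text, the labels author_a and author_b are hypothetical, and a nearest-centroid rule stands in for the SVM that Fung (2003) used. It is not a reproduction of that study.

```python
import re
from collections import Counter

# The three words the slides report as best separating Madison from Hamilton.
FEATURE_WORDS = ("to", "upon", "would")


def relative_frequencies(text, words=FEATURE_WORDS):
    """Return, for each feature word, its count divided by the token count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return [0.0] * len(words)
    counts = Counter(tokens)
    return [counts[w] / total for w in words]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def nearest_author(vector, centroids):
    """Pick the author whose centroid is closest in squared Euclidean distance.

    ASSUMPTION: this simple rule is a stand-in for the SVM named in the slides.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda name: squared_distance(vector, centroids[name]))


# ASSUMPTION: invented toy sentences with hypothetical author labels.
known_documents = {
    "author_a": [
        "It would depend upon the will of the people to decide.",
        "Upon this point it would be rash to insist.",
    ],
    "author_b": [
        "The power to tax belongs to the states.",
        "We ought to look to the plan itself.",
    ],
}
disputed_document = "The right to vote is left to the states to regulate."

centroids = {
    name: centroid([relative_frequencies(doc) for doc in docs])
    for name, docs in known_documents.items()
}
prediction = nearest_author(relative_frequencies(disputed_document), centroids)
print(prediction)
```

Relative rather than raw counts are used so that documents of different lengths remain comparable, which is the point of the feature choice on the slide. With real data, the toy sentences would be replaced by the essays of known authorship and the nearest-centroid rule by a trained classifier.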