Analysis of Spontaneous Speech in Dementia of Alzheimer Type: Experiments with Morphological and Lexical Analysis
Nick Cercone
Vlado Keselj
Calvin Thomas
Kenneth Rockwood
Medicine, Dalhousie University
Computer Science
Dalhousie University
Elissa Asp
English Department
Saint Mary’s University
PUL Workshop, Dalhousie University, Halifax, 23 Apr 2004
Overview
Introduction
Related work: Bucks et al., authorship attribution
CNG discrimination Pt/other
rating dementia levels
use of attribute sets: MA-A, MA-B
CNG and Ordinal CNG
Conclusion
Introduction
Effects of Alzheimer’s disease (AD)
reduced communicative ability
deterioration of linguistic performance
Can we detect it?
Current methods rely on structured interviews
confrontation naming
single word production
word generation given context
word generation given first letter
picture description
Analysis of spontaneous speech
drawbacks of structured interviews:
sometimes insensitive to early signs of dementia observed by family
low scores are not reliable unless difficulty is also observed in natural conversation
break “natural speech” into components
subjective, i.e., designed by a researcher
alternative solution: objective automatic analysis of spontaneous, i.e., natural, speech
Speech characteristics in Dementia of Alzheimer Type (DAT)
frequent use of functional words (closed class)
less rich vocabulary
difficulty with constructing longer coherent phrases
more difficulties at lexical and morphological level than phonetic and syntactic levels
Related work: Bucks et al. (BSCW)
Bucks, Singh, Cuerden, Wilcock 2000, 2001:
Analysis of spontaneous conversational speech in dementia of Alzheimer type (DAT)
use eight linguistic measures to analyze transcribed spontaneous speech:
1) noun rate
2) pronoun rate
3) verb rate
4) adjective rate
5) clause-like semantic unit (CSU) rate
6) Brunet’s index (W)
7) type-token ratio (TTR)
8) Honoré’s statistic (R)
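The three lexical richness measures in this list (TTR, W, R) can be sketched as follows. This is a minimal illustration, not the BSCW implementation: it uses the commonly cited definitions W = N^(V^-0.165) and R = 100 log N / (1 - V1/V), where N is the number of word tokens, V the number of distinct word types, and V1 the number of types used exactly once. The function name, the whitespace tokenization, and the sample utterance are assumptions made for the example.

```python
import math
from collections import Counter

def lexical_richness(tokens):
    """Return TTR, Brunet's W and Honore's R for a non-empty list of word tokens."""
    if not tokens:
        raise ValueError("need at least one token")
    n = len(tokens)                                   # N: number of word tokens
    counts = Counter(t.lower() for t in tokens)
    v = len(counts)                                   # V: number of distinct word types
    v1 = sum(1 for c in counts.values() if c == 1)    # V1: types that occur exactly once
    ttr = v / n                                       # type-token ratio
    brunet_w = n ** (v ** -0.165)                     # lower W suggests richer vocabulary
    # R is undefined when every type occurs exactly once (V1 == V).
    honore_r = 100 * math.log(n) / (1 - v1 / v) if v1 < v else float("inf")
    return {"TTR": ttr, "W": brunet_w, "R": honore_r}

# Invented sample utterance, split on whitespace (an assumption, not the study's tokenizer).
sample = "well I do not know I do not really know what it was".split()
measures = lexical_richness(sample)
```

All three measures depend on text length, so they are best compared across samples of similar size.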
Bucks et al.: Experiment design
• experiment with 24 participants:
• 8 patients and 16 healthy individuals
• discriminating between demented and healthy individuals:
• 100% on training data
• 87.5% with cross-validation
Related work: Automated authorship attribution
Problem of identifying the author of an anonymous text
One of the text categorization problems:
• Spam detection
• Language and encoding identification
• Authorship attribution and plagiarism detection
• Text genre classification
• Topic detection
• Sentiment classification
Related work (authorship attribution)
1. style analysis
   using style markers (features)
   relying on non-trivial NL analysis
   Stamatatos et al. 2000-02
2. language modeling
   Peng et al. 2003, EACL ’03
   Khmelev and Teahan 2003, SIGIR ’03
3. N-gram-based text categorization
   Cavnar and Trenkle 1994
Shortcomings of style analysis
• difficult to automatically extract some features
• feature selection is critical
• language dependent
• task dependent, i.e., does not generalize well to other types of classification
Character N-gram-based Methods
Text can be considered as a concatenated sequence of characters instead of words.
Advantages
1. small vocabulary
2. language independence
3. avoids the word segmentation problems of many Asian languages such as Chinese and Thai
How do character n-grams work?
Marley was dead: to begin with. There is no doubt whatever about that. …
(from Christmas Carol by Charles Dickens)
n = 3, L = 5
character n-grams, with “_” standing for a space: Mar, arl, rle, ley, ey_, y_w, _wa, was, …
sort by frequency:
_th 0.015
___ 0.013
the 0.013
he_ 0.011
and 0.007
_an 0.007
nd_ 0.007
ed_ 0.006
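The extraction step on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' tool: the function name `ngram_profile` is made up, frequencies are normalized by the total n-gram count, and runs of whitespace are collapsed to a single "_" (a simplification; the "___" entry above suggests the original processing did not collapse them).

```python
from collections import Counter

def ngram_profile(text, n=3, L=5):
    """Return the L most frequent character n-grams with their relative frequencies."""
    # Treat the text as one character sequence; whitespace is shown as "_".
    chars = "_".join(text.split())
    grams = Counter(chars[i:i + n] for i in range(len(chars) - n + 1))
    total = sum(grams.values())
    # Sort by frequency and keep only the top L entries.
    return {g: c / total for g, c in grams.most_common(L)}

text = "Marley was dead: to begin with. There is no doubt whatever about that."
profile = ngram_profile(text, n=3, L=5)

# The first few trigrams, in reading order, match the ones listed on the slide.
first_grams = ["_".join(text.split())[i:i + 3] for i in range(8)]
```

On a two-sentence excerpt the resulting frequencies are not meaningful; the values on the slide come from the full book.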
How do we compare two profiles?
Dickens: Christmas Carol
_th 0.015
___ 0.013
the 0.013
he_ 0.011
and 0.007
?
Dickens: A Tale of Two Cities
_th 0.016
the 0.014
he_ 0.012
and 0.007
nd_ 0.007
?
Carroll: Alice’s Adventures in Wonderland
_th 0.017
___ 0.017
the 0.014
he_ 0.014
ing 0.007
N-gram distribution
(From Dickens: Christmas Carol)
[Figure: frequencies of the most frequent 6-grams in rank order; x-axis: rank, 1 to 34; y-axis: frequency, 0.00E+00 to 5.00E-03]
CNG profile similarity measure
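• a profile = the set of the L most frequent n-grams
• profile dissimilarity measure:

$$\sum_{n \in \text{profile}} \left( \frac{f_1(n) - f_2(n)}{\frac{f_1(n) + f_2(n)}{2}} \right)^2 = \sum_{n \in \text{profile}} \left( \frac{2\,(f_1(n) - f_2(n))}{f_1(n) + f_2(n)} \right)^2$$

where $f_i(n)$ is the frequency of n-gram $n$ in profile $i$
• weight: each difference is divided by the average of the two frequencies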
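A minimal sketch of this dissimilarity measure follows. It makes two assumptions: the sum runs over the union of the two profiles, and an n-gram missing from a profile counts as frequency 0. The function name is made up. The three toy profiles reuse the five-entry excerpts shown earlier in the talk, which are far too short for a meaningful comparison.

```python
def cng_dissimilarity(p1, p2):
    """Sum of squared, normalized frequency differences over the union of two profiles."""
    total = 0.0
    for gram in set(p1) | set(p2):
        f1 = p1.get(gram, 0.0)   # assumption: missing n-gram has frequency 0
        f2 = p2.get(gram, 0.0)
        # Each difference is divided by the average of the two frequencies.
        total += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return total

# Truncated five-entry profiles, for illustration only.
carol = {"_th": 0.015, "___": 0.013, "the": 0.013, "he_": 0.011, "and": 0.007}
two_cities = {"_th": 0.016, "the": 0.014, "he_": 0.012, "and": 0.007, "nd_": 0.007}
alice = {"_th": 0.017, "___": 0.017, "the": 0.014, "he_": 0.014, "ing": 0.007}

d_same_author = cng_dissimilarity(carol, two_cities)
d_other_author = cng_dissimilarity(carol, alice)
```

An n-gram that appears in only one profile contributes the maximum value of 4, so short profiles are dominated by missing entries; realistic profile lengths L range from tens to thousands.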
Authorship Attribution Evaluation
[Figure: bar chart comparing three methods (Style, Lang. M, CNG) on the English, Greek A, Greek B, Greek B+, and Chinese data sets; vertical axis from 0 to 100]
ACADIE Data Set
• 189 GAS interviews (Goal Attainment Scaling)
• 95 patients (2 interviews per patient, except 1 patient)
• 6 sites; 17 MB of data (3.2 million words)
• interview participants:
• FR – field researcher
• Pt – patient
• Cg – caregiver
• other people
Experiment set-up
• preprocessing
• patients divided into two groups
• 85 patients in the training group (169 interviews)
• 10 patients in the testing group (20 interviews)
• patient speech in the training group is used to build the Alzheimer profile
• non-patient speech in the training group is used to build the non-Alzheimer profile
• two experiments:
• classification
• improvement detection
Classification
• from each test interview patient and non-patient speech is extracted
• this produces 40 speech extracts
• each speech extract is labelled by the classifier as Alzheimer or non-Alzheimer
• accuracy is reported
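The labelling step can be sketched as a nearest-profile decision: build a profile for the test extract and assign the label of the class profile with the smallest dissimilarity. This is an illustrative sketch only. The helper names, the lower-casing, the whitespace handling, and the tiny invented training strings are all assumptions; no ACADIE data is used.

```python
from collections import Counter

def profile(text, n=3, L=50):
    """L most frequent character n-grams with relative frequencies."""
    chars = "_".join(text.lower().split())
    grams = Counter(chars[i:i + n] for i in range(len(chars) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.most_common(L)}

def dissimilarity(p1, p2):
    """CNG-style dissimilarity over the union of two profiles (missing n-gram = 0)."""
    return sum(
        (2 * (p1.get(g, 0.0) - p2.get(g, 0.0)) / (p1.get(g, 0.0) + p2.get(g, 0.0))) ** 2
        for g in set(p1) | set(p2)
    )

def classify(extract, class_profiles, n=3, L=50):
    """Return the label of the class profile closest to the extract's profile."""
    test = profile(extract, n, L)
    return min(class_profiles, key=lambda label: dissimilarity(test, class_profiles[label]))

# Invented toy training text, NOT taken from the interviews.
training = {
    "Alzheimer": "i don't know. sometimes i don't know. i can not remember. "
                 "i don't know what it was.",
    "non-Alzheimer": "how have you been since our last visit? "
                     "has the medication helped with the daily activities?",
}
class_profiles = {label: profile(text) for label, text in training.items()}
label = classify("sometimes i don't know, i can not remember", class_profiles)
```

Accuracy is then the fraction of extracts whose predicted label matches the speaker's actual group.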
Experiment 1.1
training and testing part (90:10)
use all speakers to generate profiles
use both interviews
ACADIE: Classification accuracy
(rows: profile length L; columns: n-gram size n)
L \ n 1 2 3 4 5 6 7 8 9 10
20 88% 85% 83% 88% 98% 93% 95% 80% 85% 85%
50 73% 80% 78% 95% 95% 85% 93% 93% 95% 100%
100 73% 78% 95% 95% 98% 98% 100% 98% 98% 100%
200 73% 93% 98% 100% 98% 100% 100% 100% 100% 100%
500 73% 80% 95% 100% 100% 98% 98% 100% 100% 100%
1000 73% 95% 100% 100% 100% 100% 100% 100% 100% 100%
1500 73% 98% 98% 100% 100% 100% 100% 100% 100% 100%
2000 73% 98% 93% 100% 100% 100% 100% 100% 100% 100%
3000 73% 98% 93% 100% 100% 100% 100% 100% 100% 100%
4000 73% 98% 98% 100% 100% 100% 100% 100% 100% 100%
5000 73% 98% 98% 98% 100% 100% 100% 100% 100% 100%
Improvement detection
$S_a$: similarity with Alzheimer profile
$S_b$: similarity with non-Alzheimer profile
$S = \frac{S_a}{S_a + S_b}$: normalized similarity with Alzheimer profile (0.5 threshold)
improvement is detected by observing an increase in S value between the first and second interview
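The normalization and the comparison between interviews amount to a few arithmetic steps, sketched below. The function names and the numeric values are hypothetical; the sketch assumes that S_a and S_b are the two CNG profile measures computed for the same patient extract.

```python
def normalized_s(s_a, s_b):
    """S = S_a / (S_a + S_b); lies in [0, 1] for non-negative inputs."""
    return s_a / (s_a + s_b)

def improvement_detected(first, second):
    """first and second are (S_a, S_b) pairs from a patient's two interviews."""
    return normalized_s(*second) > normalized_s(*first)

# Hypothetical measurements, for illustration only.
first_interview = (12.0, 20.0)    # S = 12 / 32 = 0.375
second_interview = (15.0, 18.0)   # S = 15 / 33, about 0.455
detected = improvement_detected(first_interview, second_interview)
```

Because S is a ratio, it stays comparable between interviews even when the raw measures change scale with the amount of speech.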
ACADIE: Detected improvement
(rows: profile length L; columns: n-gram size n)
L \ n 1 2 3 4 5 6 7 8 9 10
20 50% 60% 70% 80% 70% 50% 50% 40% 60% 50%
50 50% 70% 60% 30% 60% 30% 30% 60% 50% 70%
100 40% 60% 40% 40% 40% 40% 80% 60% 70% 60%
200 40% 30% 30% 40% 50% 70% 40% 70% 50% 60%
500 40% 80% 60% 80% 60% 50% 40% 60% 80% 70%
1000 40% 50% 90% 60% 70% 70% 70% 90% 60% 60%
1500 40% 70% 80% 70% 80% 60% 80% 80% 60% 50%
2000 40% 60% 90% 70% 70% 70% 70% 70% 60% 60%
3000 40% 60% 70% 70% 70% 60% 60% 70% 60% 70%
4000 40% 60% 70% 90% 80% 80% 70% 60% 70% 70%
5000 40% 60% 70% 80% 80% 70% 60% 70% 70% 70%
Experiment 1.2
use only first interviews to create Alzheimer and Non-Alzheimer profiles
Exp. 1.2: Classification accuracy
(rows: profile length L; columns: n-gram size n)
L \ n 1 2 3 4 5 6 7 8 9 10
20 85% 85% 83% 88% 93% 90% 95% 80% 80% 83%
50 70% 90% 83% 98% 95% 85% 93% 95% 95% 90%
100 73% 98% 98% 98% 90% 98% 98% 98% 95% 98%
200 73% 88% 98% 100% 100% 98% 100% 95% 100% 100%
500 73% 83% 98% 100% 95% 98% 95% 100% 98% 100%
1000 73% 95% 100% 100% 100% 100% 100% 100% 100% 100%
1500 73% 95% 95% 100% 100% 100% 100% 100% 100% 100%
2000 73% 95% 93% 100% 100% 100% 100% 100% 100% 100%
3000 73% 95% 95% 100% 100% 100% 100% 100% 100% 100%
4000 73% 95% 98% 100% 100% 100% 100% 100% 100% 100%
5000 73% 95% 100% 100% 100% 100% 100% 100% 100% 100%
Improvement detection: 0.6-0.9
Experiment 1.3
use only first interviews
use only speech produced by patients, caregivers, and other people (not field researchers)
Exp. 1.3: Classification accuracy
(rows: profile length L; columns: n-gram size n)
L \ n 1 2 3 4 5 6 7 8 9 10
20 75% 90% 85% 80% 65% 75% 80% 70% 75% 70%
50 73% 88% 68% 80% 90% 75% 75% 75% 80% 83%
100 73% 85% 88% 85% 88% 80% 83% 88% 88% 88%
200 73% 83% 90% 90% 95% 88% 93% 88% 98% 93%
500 73% 65% 95% 95% 95% 95% 98% 90% 93% 95%
1000 73% 83% 93% 93% 98% 93% 98% 98% 98% 95%
1500 73% 78% 80% 95% 95% 100% 98% 98% 98% 95%
2000 73% 80% 75% 95% 100% 98% 98% 98% 98% 95%
3000 73% 83% 83% 88% 95% 98% 95% 98% 95% 93%
4000 73% 83% 90% 95% 95% 95% 98% 98% 95% 95%
5000 73% 83% 93% 98% 95% 98% 98% 98% 98% 93%
Improvement detection: 0.6-0.8
Some experiment observations
the Alzheimer n-gram profile captures many indefinite and negated terms (e.g., sometimes, don’t know, cannot, …)
the profile captures reduced lexical richness
[Figure: n-gram rank in the Alzheimer and non-Alzheimer profiles]
Second set of experiments
rating dementia levels
implement the BSCW method (by Bucks et al.), with analysis and extension
comparison with CNG
application of a wider set of machine learning algorithms
MMSE – Mini-Mental State Exam
MMSE – a standard test for identifying cognitive impairment in a clinical setting
17 questions, 5-10 minutes
introduced in 1975 by Folstein et al.
scores range from 0 to 30
a variety of cut points suggested over the years: 17.5, 21.5, 23.5, 25.5
MMSE Score Gradation
we use the following gradation
four classes: severe (0 to 14.5), moderate (14.5 to 20.5), mild (20.5 to 24.5), normal (24.5 to 30)
two classes: low, high
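The four-class gradation can be written as a small lookup, sketched here. The cut points 14.5, 20.5 and 24.5 come from the slide; the function and constant names are made up. Since the cut points fall on half values, an integer MMSE score never lands exactly on a boundary.

```python
from bisect import bisect_right

CUT_POINTS = [14.5, 20.5, 24.5]                      # boundaries from the slide
CLASSES = ["severe", "moderate", "mild", "normal"]   # ordered from lowest to highest score

def mmse_class(score):
    """Map an MMSE score (0 to 30) to one of the four dementia classes."""
    if not 0 <= score <= 30:
        raise ValueError("MMSE scores range from 0 to 30")
    # bisect_right counts how many cut points lie at or below the score.
    return CLASSES[bisect_right(CUT_POINTS, score)]

example = [mmse_class(s) for s in (10, 18, 23, 28)]
```

The boundary between the two-class labels (low, high) is not stated on the slide, so it is left out of the sketch.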
MMSE Score distribution in data set
[Figure: distribution of MMSE scores in the data set over the severe, moderate, mild, and normal classes]
Part-of-speech tagging, MA-A
1. following the BSCW method, applied two part-of-speech taggers: Hepple (from NL GATE) and Connexor
   Hepple is based on Brill’s tagger
   Connexor performed better
2. set of attributes MA-A: attributes similar to BSCW
   excluded CSU rate: it is manually annotated, and BSCW reported a non-significant impact
34
Morphological Attribute Set: MA-B
start with all POS attributes
regression-based attribute selection
7 POS attributes selected (conjunctions included)
add TTR and Honoré statistics
Brunet statistic shown to be non-significant
use several machine learning algorithms with cross-validation, using the software tool WEKA
Ordinal CNG Method
• use two extreme groups to build profiles:
  normal level: profile normal
  severe dementia level: profile severe
• compare the test speech profile with both: CNG similarity $S_{\text{severe}}$ and $S_{\text{normal}}$
• classify according to $S = \frac{S_{\text{severe}}}{S_{\text{severe}} + S_{\text{normal}}}$
Ordinal CNG: Thresholds
range of values: [0,1]
0 corresponds to severe, 1 to normal
what are good thresholds?
interesting observation:
the optimal threshold is very close to the “natural threshold” of 0.5 (varies from 0.5 to 0.512)
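A minimal sketch of thresholding the Ordinal CNG value follows. It assumes that the two inputs are CNG dissimilarities to the severe and normal profiles, so that S = 0 means the test speech matches the severe profile exactly. The function names, the label names, and the numeric inputs are hypothetical; only the default threshold of 0.5 comes from the slide.

```python
from bisect import bisect_right

def ordinal_score(d_severe, d_normal):
    """S in [0, 1]: 0 corresponds to the severe profile, 1 to the normal profile."""
    return d_severe / (d_severe + d_normal)

def rate(d_severe, d_normal, thresholds=(0.5,), labels=("low", "high")):
    """Map S to an ordered label; thresholds must be sorted, one fewer than labels."""
    if len(thresholds) != len(labels) - 1:
        raise ValueError("need exactly len(labels) - 1 thresholds")
    s = ordinal_score(d_severe, d_normal)
    # Count how many thresholds lie at or below S to find the label index.
    return labels[bisect_right(list(thresholds), s)]

# Hypothetical dissimilarities: the extract is closer to the severe profile.
rating = rate(d_severe=10.0, d_normal=30.0)   # S = 0.25
```

The same function covers the four-class case once three thresholds are supplied; their values would have to be tuned on training data.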
Conclusions
extensive experiments on morphological and lexical analysis of spontaneous speech for detecting dementia of Alzheimer type
methods:
CNG and Ordinal CNG
extension of the method proposed by BSCW
use of POS tags as suggested by BSCW
positive results in classification and detecting dementia level:
100% discrimination accuracy (Pt and other)
93% - severe/normal
70% - two-class accuracy
46% - four-class accuracy
Future work
improvement detection
use of word CNG method
stop-word frequency-based classifier
syntactic analysis
semantic analysis