Computational analysis of language used by Alzheimer


Analysis of Spontaneous Speech in Dementia of Alzheimer Type: Experiments with Morphological and Lexical Analysis

Nick Cercone

Vlado Keselj

Calvin Thomas

Kenneth Rockwood

Medicine, Dalhousie University

Computer Science

Dalhousie University

Elissa Asp

English Department

Saint Mary’s University

PUL Workshop, Dalhousie University, Halifax, 23 Apr 2004


Overview

• Introduction

• Related work: Bucks et al., authorship attribution

• CNG discrimination Pt/other

• rating dementia levels

  • use of attribute sets: MA-A, MA-B

  • CNG and Ordinal CNG

• Conclusion


Introduction

• Effects of Alzheimer’s disease (AD)

  • reduced communicative ability

  • deterioration of linguistic performance

• Can we detect it?

• Current methods rely on structured interviews

  • confrontation naming

  • single word production

  • word generation given context

  • word generation given first letter

  • picture description


Analysis of spontaneous speech

• drawbacks of structured interviews:

  • sometimes insensitive to early signs of dementia observed by family

  • low scores are not reliable unless difficulty is observed in natural conversation

  • break “natural speech” into components

  • subjective, i.e., designed by a researcher

• alternative solution: objective automatic analysis of spontaneous, i.e., natural, speech


Speech characteristics in Dementia of Alzheimer Type (DAT)

• frequent use of functional words (closed class)

• less rich vocabulary

• difficulty with constructing longer coherent phrases

• more difficulties at lexical and morphological level than phonetic and syntactic levels


Related work: Bucks et al. (BSCW)

• Bucks, Singh, Cuerden, Wilcock 2000, 2001: Analysis of spontaneous conversational speech in dementia of Alzheimer type (DAT)

• use eight linguistic measures to analyze transcribed spontaneous speech:

1) noun rate

2) pronoun rate

3) verb rate

4) adjective rate

5) clause-like semantic unit rate (CSU)

6) Brunet’s index (W)

7) type-token ratio (TTR)

8) Honoré’s statistic (R)
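The three lexical-richness measures in this list (6 to 8) can be sketched in a few lines. The sketch assumes the forms in which they are commonly given: TTR = V/N, W = N^(V^-0.165) and R = 100 log N / (1 - V1/V), where N is the number of word tokens, V the number of distinct words and V1 the number of words used exactly once. The function name and the whitespace tokenization are illustrative assumptions, not the authors' code.

```python
import math
from collections import Counter


def lexical_richness(tokens):
    """Return (TTR, Brunet's W, Honore's R) for a non-empty list of word tokens."""
    counts = Counter(t.lower() for t in tokens)
    n = sum(counts.values())                          # N: total word tokens
    v = len(counts)                                   # V: distinct words
    v1 = sum(1 for c in counts.values() if c == 1)    # V1: words used exactly once

    ttr = v / n
    brunet_w = n ** (v ** -0.165)
    # R is undefined when every word occurs once (V1 == V); report infinity then.
    honore_r = math.inf if v1 == v else 100 * math.log(n) / (1 - v1 / v)
    return ttr, brunet_w, honore_r


# Toy input: 6 tokens, 5 distinct words, 4 of them used once.
ttr, w, r = lexical_richness("the cat sat on the mat".split())
```

A richer vocabulary raises TTR and R and lowers W, which is why the three are read together as indicators of lexical richness.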


Bucks et al.: Experiment design

• experiment with 24 participants:

  • 8 patients and 16 healthy individuals

• discriminating between demented and healthy individuals:

  • 100% on training data

  • 87.5% with cross-validation


Related work: Automated authorship attribution

• Problem of identifying the author of an anonymous text

• One of the text categorization problems:

  • Spam detection

  • Language and encoding identification

  • Authorship attribution and plagiarism detection

  • Text genre classification

  • Topic detection

  • Sentiment classification


Related work (authorship attribution)

1. style analysis

  • using style markers (features)

  • relying on non-trivial NL analysis

  • Stamatatos et al. 2000-02

2. language modeling

  • Peng et al. 2003, EACL’03

  • Khmelev and Teahan 2003, SIGIR’03

3. N-gram-based text categorization

  • Cavnar and Trenkle 1994


Shortcomings of style analysis

• difficult to automatically extract some features

• feature selection is critical

• language dependent

• task dependent, i.e., does not generalize well to other types of classification


Character N-gram-based Methods

• Text can be considered as a concatenated sequence of characters instead of words.

• Advantages

  1. small vocabulary

  2. language independence

  3. no word segmentation problems, which arise in many Asian languages such as Chinese and Thai


How do character n-grams work?

Marley was dead: to begin with. There is no doubt whatever about that. …

(from Christmas Carol by Charles Dickens)

n = 3: Mar, arl, rle, ley, ey_, y_w, _wa, was, …

sort by frequency (L = 5 keeps the first five rows):

_th 0.015
___ 0.013
the 0.013
he_ 0.011
and 0.007
_an 0.007
nd_ 0.007
ed_ 0.006
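A minimal sketch of the profile construction shown on this slide: slide a window of n characters over the text, count the n-grams, normalize the counts to frequencies and keep the L most frequent. Mapping every non-letter character to "_" is an assumption made here to mimic the slide's notation; the function name is hypothetical and this is not the tool used in the experiments.

```python
import re
from collections import Counter


def ngram_profile(text, n=3, L=5):
    """Return the L most frequent character n-grams with normalized frequencies."""
    # Assumption: every non-letter character is shown as "_", as on the slide.
    chars = re.sub(r"[^A-Za-z]", "_", text)
    grams = Counter(chars[i:i + n] for i in range(len(chars) - n + 1))
    total = sum(grams.values())
    # most_common(L) yields nothing for texts shorter than n characters.
    return {gram: count / total for gram, count in grams.most_common(L)}


profile = ngram_profile("Marley was dead: to begin with.", n=3, L=5)
```

On a short sentence most trigrams occur once, so the frequencies only become informative on longer texts such as a full interview transcript.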


How do we compare two profiles?

Dickens: Christmas Carol

_th 0.015
___ 0.013
the 0.013
he_ 0.011
and 0.007

?

Dickens: A Tale of Two Cities

_th 0.016
the 0.014
he_ 0.012
and 0.007
nd_ 0.007

?

Carroll: Alice’s Adventures in Wonderland

_th 0.017
___ 0.017
the 0.014
he_ 0.014
ing 0.007


N-gram distribution

(From Dickens: Christmas Carol)

[Figure: frequency distribution of 6-grams; vertical axis from 0.00E+00 to 5.00E-03, horizontal axis from 1 to 34]


CNG profile similarity measure

• a profile = the set of the L most frequent n-grams

• profile dissimilarity measure:

$$
\sum_{n \in \text{profile}} \left( \frac{f_1(n) - f_2(n)}{\underbrace{\tfrac{f_1(n) + f_2(n)}{2}}_{\text{weight}}} \right)^{2}
= \sum_{n \in \text{profile}} \left( \frac{2\,\bigl(f_1(n) - f_2(n)\bigr)}{f_1(n) + f_2(n)} \right)^{2}
$$
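The dissimilarity measure can be sketched directly from its definition. Two choices below are assumptions of this sketch rather than something the slide states: the sum runs over every n-gram that appears in either profile, and an n-gram missing from one profile is given frequency 0 there. Profiles are plain dictionaries from n-gram to positive frequency, and the function name is hypothetical.

```python
def cng_dissimilarity(p1, p2):
    """Sum of squared relative frequency differences between two profiles."""
    total = 0.0
    for gram in set(p1) | set(p2):       # assumption: union of both profiles
        f1 = p1.get(gram, 0.0)           # assumption: absent n-gram has frequency 0
        f2 = p2.get(gram, 0.0)
        total += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return total


# Toy profiles: "ab" matches exactly, "ba" and "bc" each appear in one profile only.
d = cng_dissimilarity({"ab": 0.5, "ba": 0.5}, {"ab": 0.5, "bc": 0.5})
```

Under these assumptions an n-gram present in only one profile always contributes 4, the maximum per-term value, so the measure grows with the number of unshared n-grams as well as with frequency differences.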


Authorship Attribution Evaluation

[Bar chart: vertical axis from 0 to 100; data sets English, Greek A, Greek B, Greek B+, Chinese; methods compared: Style, Lang. M, CNG]


ACADIE Data Set

• 189 GAS interviews (Goal Attainment Scaling)

• 95 patients (2 interviews per patient, except 1 patient)

• 6 sites; 17 MB of data (3.2 million words)

• interview participants:

  • FR – field researcher

  • Pt – patient

  • Cg – caregiver

  • other people


Experiment set-up

• preprocessing

• patients divided into two groups

  • training group: 85 patients (169 interviews)

  • testing group: 10 patients (20 interviews)

• patient speech in the training group is used to build the Alzheimer profile

• non-patient speech in the training group is used to build the non-Alzheimer profile

• two experiments:

  • classification

  • improvement detection


Classification

• from each test interview, patient and non-patient speech is extracted

• this produces 40 speech extracts

• each speech extract is labelled by the classifier as Alzheimer or non-Alzheimer

• accuracy is reported
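The slides do not spell out the decision rule. A common choice with profile-based methods, assumed in this sketch, is to label each extract with the class whose profile is less dissimilar, and to report the share of correct labels as accuracy. The profiles below are made-up toy values, and all function names are hypothetical.

```python
def cng_dissimilarity(p1, p2):
    """Squared relative frequency differences, summed over n-grams in either profile."""
    grams = set(p1) | set(p2)
    return sum(
        (2 * (p1.get(g, 0.0) - p2.get(g, 0.0)) / (p1.get(g, 0.0) + p2.get(g, 0.0))) ** 2
        for g in grams
    )


def classify(extract, alz_profile, non_alz_profile):
    """Assumed rule: pick the closer profile; ties go to non-Alzheimer (arbitrary)."""
    d_alz = cng_dissimilarity(extract, alz_profile)
    d_non = cng_dissimilarity(extract, non_alz_profile)
    return "Alzheimer" if d_alz < d_non else "non-Alzheimer"


def accuracy(labelled_extracts, alz_profile, non_alz_profile):
    """labelled_extracts: list of (profile, true_label) pairs."""
    correct = sum(
        classify(profile, alz_profile, non_alz_profile) == label
        for profile, label in labelled_extracts
    )
    return correct / len(labelled_extracts)


# Made-up toy profiles, for illustration only.
alz = {"n_t": 0.6, "_kn": 0.4}
non_alz = {"the": 0.7, "ing": 0.3}
extracts = [
    ({"n_t": 0.5, "_kn": 0.5}, "Alzheimer"),
    ({"the": 0.6, "ing": 0.4}, "non-Alzheimer"),
]
acc = accuracy(extracts, alz, non_alz)
```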


Experiment 1.1

• training and testing parts (90:10)

• use all speakers to generate profiles

• use both interviews


ACADIE: Classification accuracy

L \ n      1     2     3     4     5     6     7     8     9    10
20       88%   85%   83%   88%   98%   93%   95%   80%   85%   85%
50       73%   80%   78%   95%   95%   85%   93%   93%   95%  100%
100      73%   78%   95%   95%   98%   98%  100%   98%   98%  100%
200      73%   93%   98%  100%   98%  100%  100%  100%  100%  100%
500      73%   80%   95%  100%  100%   98%   98%  100%  100%  100%
1000     73%   95%  100%  100%  100%  100%  100%  100%  100%  100%
1500     73%   98%   98%  100%  100%  100%  100%  100%  100%  100%
2000     73%   98%   93%  100%  100%  100%  100%  100%  100%  100%
3000     73%   98%   93%  100%  100%  100%  100%  100%  100%  100%
4000     73%   98%   98%  100%  100%  100%  100%  100%  100%  100%
5000     73%   98%   98%   98%  100%  100%  100%  100%  100%  100%


Improvement detection

• $S_a$ = similarity with Alzheimer profile

• $S_b$ = similarity with non-Alzheimer profile

• $S = \frac{S_a}{S_a + S_b}$ = normalized similarity with Alzheimer profile (0.5 threshold)

• improvement is detected by observing an increase in S value between the first and second interview
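The improvement test reduces to comparing the normalized score S of a patient's two interviews. The sketch below takes the two profile scores of each interview as given numbers; how they are obtained (for example from the CNG measure) is outside the sketch, and the function names are hypothetical.

```python
def normalized_score(s_a, s_b):
    """S = S_a / (S_a + S_b); lies in [0, 1] for non-negative scores, not both zero."""
    return s_a / (s_a + s_b)


def improvement_detected(first, second):
    """first, second: (S_a, S_b) pairs for the first and second interview."""
    return normalized_score(*second) > normalized_score(*first)


# Made-up scores: S rises from 0.25 to 0.375, so improvement is reported.
improved = improvement_detected((2.0, 6.0), (3.0, 5.0))
```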


ACADIE: Detected improvement

L \ n      1     2     3     4     5     6     7     8     9    10
20       50%   60%   70%   80%   70%   50%   50%   40%   60%   50%
50       50%   70%   60%   30%   60%   30%   30%   60%   50%   70%
100      40%   60%   40%   40%   40%   40%   80%   60%   70%   60%
200      40%   30%   30%   40%   50%   70%   40%   70%   50%   60%
500      40%   80%   60%   80%   60%   50%   40%   60%   80%   70%
1000     40%   50%   90%   60%   70%   70%   70%   90%   60%   60%
1500     40%   70%   80%   70%   80%   60%   80%   80%   60%   50%
2000     40%   60%   90%   70%   70%   70%   70%   70%   60%   60%
3000     40%   60%   70%   70%   70%   60%   60%   70%   60%   70%
4000     40%   60%   70%   90%   80%   80%   70%   60%   70%   70%
5000     40%   60%   70%   80%   80%   70%   60%   70%   70%   70%


Experiment 1.2

• use only first interviews to create Alzheimer and non-Alzheimer profiles


Exp. 1.2: Classification accuracy

L \ n      1     2     3     4     5     6     7     8     9    10
20       85%   85%   83%   88%   93%   90%   95%   80%   80%   83%
50       70%   90%   83%   98%   95%   85%   93%   95%   95%   90%
100      73%   98%   98%   98%   90%   98%   98%   98%   95%   98%
200      73%   88%   98%  100%  100%   98%  100%   95%  100%  100%
500      73%   83%   98%  100%   95%   98%   95%  100%   98%  100%
1000     73%   95%  100%  100%  100%  100%  100%  100%  100%  100%
1500     73%   95%   95%  100%  100%  100%  100%  100%  100%  100%
2000     73%   95%   93%  100%  100%  100%  100%  100%  100%  100%
3000     73%   95%   95%  100%  100%  100%  100%  100%  100%  100%
4000     73%   95%   98%  100%  100%  100%  100%  100%  100%  100%
5000     73%   95%  100%  100%  100%  100%  100%  100%  100%  100%

Improvement detection: 0.6-0.9


Experiment 1.3

• use only first interviews

• use only speech produced by patients, caregivers, and others (not field researchers)


Exp. 1.3: Classification accuracy

L \ n      1     2     3     4     5     6     7     8     9    10
20       75%   90%   85%   80%   65%   75%   80%   70%   75%   70%
50       73%   88%   68%   80%   90%   75%   75%   75%   80%   83%
100      73%   85%   88%   85%   88%   80%   83%   88%   88%   88%
200      73%   83%   90%   90%   95%   88%   93%   88%   98%   93%
500      73%   65%   95%   95%   95%   95%   98%   90%   93%   95%
1000     73%   83%   93%   93%   98%   93%   98%   98%   98%   95%
1500     73%   78%   80%   95%   95%  100%   98%   98%   98%   95%
2000     73%   80%   75%   95%  100%   98%   98%   98%   98%   95%
3000     73%   83%   83%   88%   95%   98%   95%   98%   95%   93%
4000     73%   83%   90%   95%   95%   95%   98%   98%   95%   95%
5000     73%   83%   93%   98%   95%   98%   98%   98%   98%   93%

Improvement detection: 0.6-0.8


Some experiment observations

• the Alzheimer n-gram profile captures many indefinite and negated terms (e.g., sometimes, don’t know, cannot, …)

• the profile captures reduced lexical richness

[Figure: Alzheimer vs. non-Alzheimer profiles; labels: Alzheimer, non-Alzheimer, n-gram rank]


Second set of experiments

• rating dementia levels

• implement the BSCW method (by Bucks et al.)

  • analysis and extension

  • comparison with CNG

• application of a wider set of machine learning algorithms


MMSE – Mini-Mental State Exam

• MMSE – a standard test for identifying cognitive impairment in a clinical setting

• 17 questions, 5-10 minutes

• introduced in 1975 by Folstein et al.

• scores range from 0 to 30

• a variety of cut points suggested over the years: 17.5, 21.5, 23.5, 25.5


MMSE Score Gradation

• we use the following gradation:

score boundaries:   0        14.5        20.5        24.5        30
four classes:         severe     moderate      mild       normal
two classes:          low, high
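The four-class gradation amounts to a lookup against the three interior cut points 14.5, 20.5 and 24.5. A minimal sketch follows, with hypothetical names; it covers only the four-class split, because the slide text does not say where the boundary between the two classes (low, high) lies.

```python
from bisect import bisect_right

CUT_POINTS = [14.5, 20.5, 24.5]                     # interior boundaries from the slide
CLASSES = ["severe", "moderate", "mild", "normal"]  # ordered from low to high score


def mmse_class(score):
    """Map an MMSE score in [0, 30] to one of the four classes."""
    if not 0 <= score <= 30:
        raise ValueError("MMSE scores range from 0 to 30")
    # bisect_right counts how many cut points lie at or below the score.
    return CLASSES[bisect_right(CUT_POINTS, score)]


labels = [mmse_class(s) for s in (14, 15, 20, 21, 24, 25)]
```

Because MMSE scores are whole numbers and the cut points sit at half points, no score ever falls exactly on a boundary.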


MMSE Score distribution in data set

[Figure: distribution of MMSE scores; classes marked: severe, moderate, mild, normal]

Part-of-speech tagging, MA-A

• following the BSCW method, applied two part-of-speech taggers: Hepple (from NL GATE) and Connexor

  • Hepple is based on Brill’s tagger

  • Connexor performed better

• set of attributes MA-A: attributes similar to BSCW

  • excluded CSU-rate:

    1. manually annotated

    2. BSCW reported its impact as non-significant


Morphological Attribute Set: MA-B

• start with all POS attributes

• regression-based attribute selection

  • 7 POS attributes selected (conjunctions included)

• add TTR and Honoré statistics

  • Brunet statistic shown to be non-significant

• use several machine learning algorithms with cross-validation, using the software tool WEKA


Ordinal CNG Method

• use two extreme groups to build profiles:

  • normal level: profile_normal

  • severe dementia level: profile_severe

• CNG similarity of the test speech profile to each: $S_{severe}$, $S_{normal}$

• classify according to $S = \frac{S_{severe}}{S_{severe} + S_{normal}}$


Ordinal CNG: Thresholds

• range of values: [0,1]

  • 0 corresponds to severe, 1 to normal

• what are good thresholds?

• interesting observation:

  • the optimal threshold is very close to the “natural threshold” of 0.5 (varies from 0.5 to 0.512)
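A sketch of the Ordinal CNG decision: compute S from the two profile scores and cut the [0, 1] range at one or more thresholds. With the single natural threshold 0.5 it separates severe from normal. The thresholds in the four-level call are purely hypothetical placeholders, since the slides report only the severe/normal threshold; function and parameter names are likewise assumptions.

```python
from bisect import bisect_right


def ordinal_cng_level(s_severe, s_normal, thresholds, levels):
    """Return the level whose interval of [0, 1] contains S.

    thresholds must be sorted ascending, with len(levels) == len(thresholds) + 1;
    levels are ordered from the S = 0 end (severe) to the S = 1 end (normal).
    """
    s = s_severe / (s_severe + s_normal)
    return levels[bisect_right(thresholds, s)]


# Two extreme classes with the natural threshold 0.5.
two_way = ordinal_cng_level(1.0, 3.0, [0.5], ["severe", "normal"])

# Four levels; these three thresholds are hypothetical, not values from the talk.
four_way = ordinal_cng_level(
    3.0, 1.0, [0.3, 0.5, 0.7], ["severe", "moderate", "mild", "normal"]
)
```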


Conclusions

• extensive experiments on morphological and lexical analysis of spontaneous speech for detecting dementia of Alzheimer type

• methods:

  • CNG and Ordinal CNG

  • extension of the method proposed by BSCW

  • use of POS tags as suggested by BSCW

• positive results in classification and detecting dementia level:

  • 100% discrimination accuracy (Pt and other)

  • 93% - severe/normal

  • 70% - two-class accuracy

  • 46% - four-class accuracy


Future work

• improvement detection

• use of word CNG method

• stop-word frequency-based classifier

• syntactic analysis

• semantic analysis
