Background: The paper-based FDA
• The type and severity of a given instance of dysarthria (disordered speech arising from impaired articulator control) can be diagnosed using an assessment procedure known as the Frenchay Dysarthria Assessment (FDA), a battery of tests.
• Two of the three FDA intelligibility tests are concerned with the measurement of intelligibility… but what exactly is intelligibility anyway?
“The degree of success in establishing communication between the sender and intended recipient of a message”
Intelligibility: a very variable percept
• Are both of these speech samples equally intelligible? (samples from the ABI Corpus, University of Birmingham)
• Initially, a listener will find it more difficult to understand a newly encountered accent than a familiar one. Nonetheless, increased exposure to the initially unfamiliar speaking style will usually invoke a subconscious adaptation, a learning effect, making that speech easier to understand. This holds true even for dysarthric speech.
[Chart: Learning Effect from Repeated Exposure to Dysarthric Speech Data - Mean Score Improvement (%), Round 1 vs. Rounds 2-5, for Judges 1-10, naïve vs. expert listeners; y-axis 0-50%]
Modelling the Naïve Listener
• If the learning effect alters a listener’s perception of a particular individual’s speaking style, is that listener’s judgement still representative of the naïve listener?
• If the learning effect introduces an inevitable bias, can a computer model be built which behaves like an “eternal” naïve listener (i.e. never adapting to an unfamiliar speaking style and therefore always consistent in assessment)?
Possible Solution: Using HMMs to Emulate the Naïve Listener
• A hidden Markov model (HMM) is, essentially, a statistical representation of a speech unit at the phone/word/utterance level. HMMs are “trained” by analysing the acoustic features of multiple utterances representing the specified speech unit.
[Diagram: multiple speech samples from multiple speakers feed the HMM training process]
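By way of illustration, here is a minimal training sketch in Python. The toolkit choice (hmmlearn for the HMM, librosa for the acoustic features), the file names and the parameter values are assumptions made for the example, not the technology actually used in the CFDA work.

```python
# Sketch only: train one word-level HMM on MFCC features extracted from
# several recordings of the same word. Library choice, file names and
# parameter values are illustrative assumptions.
import numpy as np
import librosa
from hmmlearn import hmm

def mfcc_features(wav_path, sr=16000, n_mfcc=13):
    """Load a recording and return its MFCC frames (one row per frame)."""
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

# Multiple speech samples of the same word from multiple speakers.
training_files = ["window_spk01.wav", "window_spk02.wav", "window_spk03.wav"]
feature_blocks = [mfcc_features(f) for f in training_files]

# Stack the observations and record each utterance's length so the HMM
# knows where one training sample ends and the next begins.
X = np.vstack(feature_blocks)
lengths = [len(block) for block in feature_blocks]

# A Gaussian HMM acting as the "everyman" model for this vocabulary item.
word_model = hmm.GaussianHMM(n_components=8, covariance_type="diag", n_iter=25)
word_model.fit(X, lengths)
```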
Goodness of Fit
• Once trained, an HMM word model can be used to estimate the likelihood that a given speech sound could actually have been produced by that word model. This likelihood is called a goodness of fit (GOF) and can be expressed as a log likelihood, e.g. 10⁻³⁵ (or simply -35).
[Cartoon: an utterance asks “Mr. HMM Model, could you have been my daddy?”; the model replies “Hmm, with a log likelihood of 10⁻⁵⁵, I’m not so sure…”]
The more acoustically dissimilar an utterance is from what the IE (intelligibility estimator, introduced below) has been trained on, the lower the GOF score.
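Continuing the training sketch above, a trained model’s score() method in hmmlearn returns a log-likelihood that can serve as the GOF. One caveat, noted in the comments: hmmlearn reports natural-log likelihoods, whereas the figures quoted on this poster are base-10 logs, so a conversion is applied. The file name is illustrative.

```python
# Sketch only, continuing the training example above (word_model, mfcc_features).
# hmmlearn's score() returns a natural-log likelihood; dividing by ln(10)
# converts it to the base-10 log-likelihood scale quoted on the poster.
import numpy as np

test_features = mfcc_features("test_word_recording.wav")   # illustrative file name

log_likelihood = word_model.score(test_features)            # natural-log likelihood
gof = log_likelihood / np.log(10)                           # base-10 log likelihood

print(f"GOF (log10 likelihood): {gof:.1f}")
```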
Using Forced-Alignment GOF Scoring to Measure Intelligibility
• Since two of the FDA intelligibility tests require the repetition of words/phrases from a pre-selected vocabulary, HMM utterance models can be built for these words/phrases.
• Furthermore, the incoming speech can be matched to the corresponding utterance model to determine the goodness of fit. This matching of a speech sample to a specific utterance model, and only that model, is called forced alignment.
• We hypothesise that force-aligning a speech sample with its corresponding “everyman” word model will yield GOF scores which are systematically related to that speech sample’s intelligibility. When HMMs are used in this way, we call them intelligibility estimators (IEs).
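A sketch of this forced-alignment scoring, again in Python and continuing the earlier examples: each vocabulary item is assumed to have its own pre-trained model (built as in the first sketch), and an incoming sample is scored against the model for the prompted word only, with no recognition search across the rest of the vocabulary. The dictionary, function and file names are illustrative assumptions.

```python
# Sketch only: one HMM per FDA vocabulary item, each trained as in the
# first sketch. The prompted word selects the single model the sample is
# scored against -- the "one model and only that model" matching described
# above. All names here are illustrative.
import numpy as np

word_models = {
    "window": word_model,   # trained as in the first sketch; in practice one
                            # model per word/phrase in the FDA vocabulary
}

def forced_alignment_gof(wav_path, prompted_word):
    """Score a recording against the model of the word the speaker was asked to say."""
    feats = mfcc_features(wav_path)
    model = word_models[prompted_word]        # forced: no search over other models
    return model.score(feats) / np.log(10)    # base-10 log-likelihood GOF

gof = forced_alignment_gof("patient07_window.wav", "window")
```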
…so, how does it work in practice?
• IE utterance models are trained on normal speech from a variety of speakers, and a range of GOF scores for normal-speech test data is established: typically between -5 and -10.
• Ranges have also been established for moderate- and low-intelligibility speech (which, in an FDA diagnostic context, means dysarthric speech): typically GOF scores between -11 and -20 for moderately intelligible speech, and below -20 for low intelligibility. These scores are relative to the maximum likelihood utterance (MLU), i.e. the speech file with the highest GOF score in the IE’s training set.
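As a rough illustration of how these bands might be applied, here is a self-contained Python sketch: the helper names and the example numbers are assumptions, and the band edges simply follow the ranges quoted above.

```python
# Sketch only: normalise a GOF score relative to the maximum likelihood
# utterance (MLU) in the IE's training set and map it onto the rough
# intelligibility bands quoted on the poster. Names and example values
# are illustrative.
def normalise_to_mlu(gof, mlu_gof):
    """Express a GOF score relative to the best-scoring training utterance."""
    return gof - mlu_gof

def intelligibility_band(normalised_gof):
    if normalised_gof >= -10:       # normal speech: typically -5 to -10
        return "normal"
    if normalised_gof >= -20:       # typically -11 to -20
        return "moderately intelligible"
    return "low intelligibility"    # below -20

# Example: a test utterance with a GOF of -62.0 against an MLU GOF of -38.5.
print(intelligibility_band(normalise_to_mlu(-62.0, -38.5)))   # -> low intelligibility
```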
Sample GOF scores
[Charts: normalised GOF scores (relative to the MLU) for isolated single words and for short sentence utterances; speakers plotted: Normal Speakers 1-10 and Dysarthric Speakers 1-3; y-axis roughly +5 to -45]
Problem: How do we make IEs truly naïve?
• “Everyman” HMM utterance models are not really “everyman”: it is not feasible to train them on speech data representing all the world’s anglophone accents. In this experiment the utterance models were trained on speech principally from the South Yorkshire region, so accents not represented in the HMM training data could receive GOF scores which do not truly reflect that speech sample’s intelligibility as perceived by a naïve listener.
• A non-trivial problem: certain anglophone accents, owing to their prestige, are more universally intelligible than others (e.g. Estuary English and RP), while others are far less intelligible internationally (e.g. the Glaswegian accent). What mix of accents should be used to train an HMM word model to make it truly representative of a “typical” naïve listener?
Objective #2: Overall Diagnosis
• After collecting data from all 28 FDA sub-tests, how do we arrive at a dysarthria sub-type diagnosis?
• Usually by template matching and symptom categorisation (e.g. “At-rest tasks performed better than in-speech tasks? If so, spastic dysarthria most likely”).
• Can these processes be automated? Yes, via a neural network combined with an expert system. The neural network does the basic pattern matching, while the rule-based expert system attempts to disambiguate diagnostic information not directly represented in the FDA letter grades (see the sketch after the flowchart below).
Example of CFDA expert-system rule-based data disambiguation:
Uncontrollably rapid speech rate?
  Yes → Hypokinetic dysarthria most likely of the 5 types.
  No → Slow speech rate?
    No → Flaccid dysarthria most likely of the 5 types.
    Yes → Extrapyramidal dysarthria less likely than the other 4 types.
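As an illustration only, the speech-rate rules in the flowchart could be expressed as a small rule layer over the neural network’s per-subtype scores, along the following lines. The subtype list, the size of the score adjustments and the placeholder MLP output are all assumptions for the sketch, not the CFDA rule base itself.

```python
# Sketch only: a rule layer of the kind shown in the flowchart, nudging the
# neural network's per-subtype scores. Subtype names, adjustment sizes and
# the placeholder MLP output are illustrative assumptions.
SUBTYPES = ["ataxic", "extrapyramidal", "flaccid", "hypokinetic", "spastic"]

def apply_rate_rules(mlp_scores, uncontrollably_rapid, slow_rate):
    """Adjust MLP sub-type scores using the speech-rate rules from the flowchart."""
    scores = dict(mlp_scores)
    if uncontrollably_rapid:
        scores["hypokinetic"] += 0.30      # rapid rate: hypokinetic most likely
    elif slow_rate:
        scores["extrapyramidal"] -= 0.30   # slow rate: extrapyramidal less likely
    else:
        scores["flaccid"] += 0.30          # neither: flaccid most likely
    return scores

mlp_scores = {s: 0.2 for s in SUBTYPES}    # placeholder neural-network output
ranked = sorted(apply_rate_rules(mlp_scores, False, True).items(),
                key=lambda kv: kv[1], reverse=True)
print(ranked[:2])                          # 1st- and 2nd-choice diagnoses
```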
Diagnostic Accuracy of Hybrid System
[Chart: classification accuracy (%) by dysarthria sub-type (ataxic, extrapyramidal, flaccid, mixed, spastic) for FDT classification, MLP classification, hybrid system (1st choice) and hybrid system (1st or 2nd choice)]
The automated diagnostic system will even
tell you why it came to a given decision…
Future Work
• Acquisition of HMM technology which (for the Intelligibility Estimator) doesn’t have prohibitively high license fees.
• Collection of dysarthric data to build an FDA-specific dysarthric speech database.
• More interviews with experienced speech therapists to grow the diagnostic expert system’s knowledge base.
• Results of NHS field trials of the CFDA application.