An Expert System for Interpreting Speech Patterns

From: AAAI-82 Proceedings. Copyright ©1982, AAAI (www.aaai.org). All rights reserved.
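Renato De Mori°, Attilio Giordana°, Pietro Laface§, Lorenza Saitta°
°Istituto di Scienze dell'Informazione, Università di Torino, Corso Massimo d'Azeglio 42, 10142 Torino, Italy
§Istituto di Elettrotecnica Generale, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy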
has
novel
been
solution
is
cues
acoustic
GiordanaO,Pietro
Lafaces,
far
42
proposed
extraction
is performed
an expert
Torino
to
im-
the
of
the
organization
erstanding
System
structured
and
straction,
the
duces
been
the
a general
and
on
P
is
control
phonetic
strategy
for
a set
applied
that
This
has
paper
features
of rules
in
a
a
to
set
of
speech
which
tion
between
cues
the
have
task-independent
giving
Japanese
system
be
results
were
that
as
new
is
presently
is
as
knowledge
hoped
introduced
care
has
2.
A
tic
by the
automatic
language
system
them
con-
is
can
of
ving
also
rules
be
can
enriched
Knowledge
updating
designers,
learning
but
will
BETWEEN
CUES.
proposed
PHONETIC
is represented
For
is
and
"slopes
of
algebraic relathe
acou-
it is associated
with
"lax".
and
In
general
both
example,
rule
describes
in the
relation
by
the
the
form:
+
. "pal-pseudo-loci".
the
and
importance
by
a dot
of
the
) follow-
+ indicates
logical
acoustic
Let
F2B
and
third
F3A
be
burst,
and
descriptions
F3B
formant
the
be
the
before
pseudo-loci
intervocalic
with
the
nonsonorant
of the acoustic
pseudo-loci
the
just
"pal-pseudo-loci"
in conjunction
it
be
of the
plosive
after
p6."high-pseudo-loci
where
a
"high-pseudo-loci
defined
in the plane
and "high-pseudo-loci
defined
AND
a set of phonephoneme
/g/ is
consonant:
before"
.
after"
;
before"
is
the
after"
"pal-slopes"
"pal-slopes"
= p7."rising
consonant
is defined
as follows,
feature "lax" in a single
before"
a
coordinates
is another
of the coordinates
Analogously,
SB
in the plane
of
cues.
second
and let F2A,
the
= p4."high-pseudo-loci
p5."high-pseudo-loci
after" +
application.
FEATURES
cases
+
(indicated
"high-pseudo-loci
which
involved
"compact-
cues
"pal-pseudo-loci"
and
are defined by other relations
involjudgements
expressed
on parameters
contained
cues.
for controlling
ACOUSTIC
phoneme
an
"pal-slopes"
been taken in selecting rules
easily detectable
and possibly
RELATION
features.
sake
with
cues
"pal-pseudo-loci"
acoustic
speaker-invariant
planning
the
disjunction.
in thefuture.
robust,
frame
set
which
acquired.
some
that
present
kernel
is
which
A
a
performed
Particular
use
the
or
the following
Pl, p2,
logical conjunction
tests
on the English
and
The interesting
aspect of
Languages.
considered
For
feature
on wheter
"pal-slopes"
;
p3 are measures
of
ing
good
limited
is
related
introduced
acoustic
phonetic
"tense"
in the detailed
the
are
= pl."pal-pseudo-loci"."pal-slopes"
of
which have been tested exresults
for the Italian
good
Surprisingly,
the
L
"palatal"
the
P2. "compact-burst"
P3. "compact-burst"
cues.
the system
the
has
"palatal"
hypotheses.
These
relations
between
implementation
Language.
with
be
"pseudo-loci"
transition".The
depends
feature
relation
rules
tensively
obtained
and
account.
will
feature
with
formant
stic
The
a
NI
into
rules
phonetic
relation
any speaker in any lansentence
it produces
a
and acoustic
In the present
tains
taken
the
-burst-spectrum",
intro-
interpreting
structured
phonetic
using
are obtained
hypotheses
is
brevity,
in
of
ab-
system for speech decoding.
should
be capable
the system
principle,
palatal(P)"
features
cues
by
context-independent
rules
involved
in a relation
in which also
context
the second
accepting
any sentence
of
guage. For every analyzed
of
Und-
are the representation
several
levels
of
framework
succesfully
lattice
of a Speech
efficiently.
describes
multi-speaker
In
(SUS)
knowledge
patterns
set:
example.
knowledge
use
24
by the following
phonetic
The
to
Abruzzi
Italy
acoustic
while
by a gram-
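ABSTRACT
Efficient syllabic hypothesization in continuous speech is so far an unsolved problem. A novel solution is proposed, based on the extraction of acoustic cues by an expert system structured as a society of experts that operate in parallel. Knowledge is organized at several levels of abstraction and is represented by frames and by a grammar.

1. INTRODUCTION.
This paper describes a multi-speaker system for speech decoding within a Speech Understanding System (SUS). In principle, the system should be capable of accepting any sentence of any speaker in any language. For every analyzed sentence it produces a lattice of hypotheses. Phonetic features are related to acoustic cues by context-independent rules and by rules in which the context is also involved. Particular care has been taken in selecting rules involving robust, easily detectable and possibly speaker-invariant acoustic cues. Knowledge updating is presently performed by the designers, but it is hoped that automatic learning will be introduced in the future.

2. RELATION BETWEEN PHONETIC FEATURES AND ACOUSTIC CUES.
Knowledge about the relation between phonetic features and acoustic cues is represented by rules. For example, the following rule describes the intervocalic nonsonorant plosive consonant /g/:

$$/\mathrm{g}/ = f_1 \cdot \mathrm{NI} \cdot \mathrm{P} \cdot \mathrm{L}$$

where NI, P and L denote the features nonsonorant-interrupted-consonant, palatal and lax. The feature palatal is in turn defined as:

$$\mathrm{P} = p_1 \cdot \text{pal-pseudo-loci} \cdot \text{pal-slopes} + p_2 \cdot \text{compact-burst} \cdot \text{pal-pseudo-loci} + p_3 \cdot \text{compact-burst} \cdot \text{pal-slopes}$$

where p1, p2 and p3 are measures of importance, a dot indicates logical conjunction and + indicates logical disjunction. Let F2B and F3B be the second and third formant frequencies just before the consonant, and let F2A and F3A be the second and third formant frequencies just after the burst. Then:

$$\text{pal-pseudo-loci} = p_4 \cdot \text{high-pseudo-loci before} + p_5 \cdot \text{high-pseudo-loci after} + p_6 \cdot \text{high-pseudo-loci before} \cdot \text{high-pseudo-loci after}$$

where "high-pseudo-loci before" is a fuzzy set defined in the plane of the coordinates F2B, F3B, and "high-pseudo-loci after" is a fuzzy set defined in the plane of the coordinates F2A, F3A. Analogously, "pal-slopes" is defined as follows:

$$\text{pal-slopes} = p_7 \cdot \text{rising SB} + p_8 \cdot \text{falling SA} + p_9 \cdot \text{rising SB} \cdot \text{falling SA}$$

where SB is the slope of the second formant transition before the consonant and SA is the slope of the second formant transition after the burst.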
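The /g/ rule and the definitions of palatal, pal-pseudo-loci and pal-slopes combine degrees of evidence through weighted conjunctions and disjunctions. This section does not say how "." and "+" are computed, so the following is only a minimal sketch under stated assumptions:

- conjunction is taken as min and disjunction as max;
- each importance measure p_i multiplies the term it weights;
- f1 is treated as an overall weight.

The cue degrees in the usage lines are invented for illustration, and the function names are hypothetical.

```python
"""Sketch: scoring the /g/ rule with fuzzy logic.

Assumed semantics (not stated in the text): "." (conjunction) is min,
"+" (disjunction) is max, and an importance measure p_i multiplies the
degree of the term it weights. Cue degrees are values in [0, 1].
"""
from typing import Dict

Degrees = Dict[str, float]


def conj(*mu: float) -> float:
    """Logical conjunction ("."), taken here as the fuzzy minimum."""
    return min(mu)


def disj(*mu: float) -> float:
    """Logical disjunction ("+"), taken here as the fuzzy maximum."""
    return max(mu)


def pal_pseudo_loci(c: Degrees, p: Degrees) -> float:
    before, after = c["high-pseudo-loci before"], c["high-pseudo-loci after"]
    return disj(p["p4"] * before, p["p5"] * after, p["p6"] * conj(before, after))


def pal_slopes(c: Degrees, p: Degrees) -> float:
    rise_b, fall_a = c["rising SB"], c["falling SA"]
    return disj(p["p7"] * rise_b, p["p8"] * fall_a, p["p9"] * conj(rise_b, fall_a))


def palatal(c: Degrees, p: Degrees) -> float:
    loci, slopes = pal_pseudo_loci(c, p), pal_slopes(c, p)
    burst = c["compact-burst"]
    return disj(p["p1"] * conj(loci, slopes),
                p["p2"] * conj(burst, loci),
                p["p3"] * conj(burst, slopes))


def g_score(c: Degrees, p: Degrees, f1: float = 1.0) -> float:
    """/g/ = f1 . NI . P . L, with f1 treated as an overall weight (assumption)."""
    return f1 * conj(c["NI"], palatal(c, p), c["lax"])


# Usage with made-up cue degrees and all importance measures set to 1.
cues = {"high-pseudo-loci before": 0.8, "high-pseudo-loci after": 0.6,
        "rising SB": 0.7, "falling SA": 0.4, "compact-burst": 0.9,
        "NI": 1.0, "lax": 0.8}
weights = {f"p{i}": 1.0 for i in range(1, 10)}
print(palatal(cues, weights), g_score(cues, weights))  # 0.8 0.8
```

Min and max are one common choice of fuzzy connectives. A product for conjunction or a probabilistic sum for disjunction would change only `conj` and `disj`.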
Context
syllabic
dependencies
segments.
segments
Semantic
Directed
Fu [1]) which
ses
from
is
phonetic
Primary phonetic features:
VB   : back vowel
VC   : central vowel
VF   : front vowel
SON  : sonorant consonant
SNCL : cluster of sonorant consonants
NI   : nonsonorant interrupted consonant
NA   : nonsonorant affricate
NC   : nonsonorant continuant
of
of
a
Tai and
hypothe-
hypotheses
can
be
are:
vowel
vowel
Primary
phonetic
similarity
consonants
consonant
with
Fig.
consonant
for
.
are
as
lexical
access
work
by Shipman
a recent
a
Actions
with
Term
and
lexical
of brevity,
hypotheses
this
paper.
Nevertheless
that
lexical
hypotheses
cation
of
labic
level
involving
of
PARALLEL
The
ledge
ted
of gene-
discussed
worth
relations
at
used
by
of an
of
expert
rela-
by
expert
well
as
system.
the
in a collection
cues
the
frames
one
are
The
the expert
system
to be
tem has been
ted
problem
Parallel
number
of
vation
Some
by a fra-
work
a very
or
the
of
of
accomplished
can
facing
of possible
some
by
is
may
decominto
The
make
a
by
sent
is
the
ambiguous
that
is
by a society
the
the
signal
the
same
system
The
data and
tains
written
TE
in a
into
a
descriptions
difficult
of auditory
task
and
the
end-point
LTM
of
a grammar
to
which
of
the
is
-syllabic
are
computed
spectral
description
trans-
of
the
of the
(TE-
of descri-
total
energy
of
and valleys.
At
on, transforms
and
signal
signal
energy
has the task
sends
a
another
message
is repeated
TE-DESCRIPTOR,
GTEDES
that
denoted
controls
and valleys,
of
to
until
a
expert
signal
called
conof
This grammar
and
energy
are
(TE)
"GSF-DESCRIPTOR1l,
the
acoustic
cues
acoustic
cues
and
LTM2,
a coding
in [3].
the
These
bounds
a freis sto-
is detected.
"SYLLABIC-CUE-EXTRACTOR"
experts.
which
signal
operation
described
provides
sentence.
and
the
goes
of peak
another
it
of peaks
signal
This
Descriptions
Sent
reliable
AEDPST
starts
to obtain
(GSF)
total
in terms
time,
in terms
a
for
the
evolution
(TE)
its use were
hypotheses
computations
time
detection
AEDPST
enough part of the
a synchronization
Expert
of
of a sentence
into the "GSF-STM".
denoted
LTMl
contains
detection
long
TE-DESCRIPTOR
of
sentence
(STM).
of
bing
of
features
transformed,
to the
point
end-points
in order
AEDPST,
a
TE-DESCRIPTOR.
solutions.
write
of
evolution
portion
main moti-
model
signal
end-point
been
time
on a
accomplished
knowledge
and
intermediate
signal
are
experts.
tasks
(LTM)
extraction
speech
has
sys-
simulated
distributed
uses
Term Memory
The
LTM
for
the STM.
End-Points
De(AEPDST).
detected,
and stored
formation.After
of distribu-
generation
Subtasks
a
Memory
results
of
of paral-
conceived
hypothesis
variety
expert
Term
Short
are
to real-time
large
been
for
been
spectral
DESCRIPTOR).
called
execution
Each
Long
of
using
close
has
the
for
starting
has
the spectra
rules
computer.
programs
parallel
degree
decoding.The
the
of rules
point
gross
from
into
Expert
representation
into the "SPECTRA-STM".
red
know-
structure
in a framework
algorithms
task
for
the
a certain
and
subtasks.
reasoning
the
in speech
conceived
solving
the
and
allow
achieved
DEC VAX 11/780
posing
language
for
a set
The
frame
Shortarrows.
Transformation"
Signal
looks
this
of writing
quency-domain
the
integra-
described
from
dashed
experts is represented
by
contains
pointers
to a
message
"Auditory
transforming
ap-
under
The procedural
structural
of
and
a
using
When
performed
by
signal
and
AEDPST
FOR GENERATING
is
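The speech signal is sampled, quantized, stored into the "SIGNAL-STM" and transformed by an expert called "Auditory Transformation" into a frequency-domain representation, which is stored into the "SPECTRA-STM".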
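In the society of experts, each expert runs as a separate process and exchanges data with the others through Short Term Memories (STMs). The code below is a minimal sketch of that organization, assuming each STM is a thread-safe queue and each expert is a thread. It chains two hypothetical experts: one computes the total energy of fixed-length frames of samples, and the other marks the frames whose energy is a local peak. The function names, the frame length of 4 samples and the peak test are illustrative assumptions, not the paper's Auditory Transformation or TE-DESCRIPTOR algorithms.

```python
"""Sketch: experts running as parallel threads that communicate through STMs.

Assumptions: each Short Term Memory is a queue.Queue, each expert a thread,
and None is the end-of-sentence message. Both experts are simplified,
hypothetical stand-ins, not the paper's algorithms.
"""
import queue
import threading
from typing import List, Optional


def energy_expert(signal_stm: queue.Queue, energy_stm: queue.Queue, frame_len: int = 4) -> None:
    """Read samples from the signal STM and write one total-energy value per frame."""
    frame: List[float] = []
    while (x := signal_stm.get()) is not None:
        frame.append(x)
        if len(frame) == frame_len:
            energy_stm.put(sum(s * s for s in frame))
            frame.clear()          # an incomplete last frame is discarded
    energy_stm.put(None)           # pass the end-of-sentence message on


def peak_expert(energy_stm: queue.Queue, peaks: List[int]) -> None:
    """Record the index of every frame whose energy is a local maximum."""
    prev2: Optional[float] = None
    prev1: Optional[float] = None
    i = 0
    while (e := energy_stm.get()) is not None:
        if prev2 is not None and prev1 > prev2 and prev1 >= e:
            peaks.append(i - 1)
        prev2, prev1 = prev1, e
        i += 1


def run(samples: List[float], frame_len: int = 4) -> List[int]:
    signal_stm: queue.Queue = queue.Queue()
    energy_stm: queue.Queue = queue.Queue()
    peaks: List[int] = []
    experts = [
        threading.Thread(target=energy_expert, args=(signal_stm, energy_stm, frame_len)),
        threading.Thread(target=peak_expert, args=(energy_stm, peaks)),
    ]
    for t in experts:
        t.start()
    for s in samples:              # samples are fed while the experts already run
        signal_stm.put(s)
    signal_stm.put(None)
    for t in experts:
        t.join()
    return peaks


# Frame energies are 0, 4, 36, 4, 0, 16, 0, so the peaks are at frames 2 and 5.
samples = [0] * 4 + [1] * 4 + [3] * 4 + [1] * 4 + [0] * 4 + [2] * 4 + [0] * 4
print(run(samples))  # [2, 5]
```

CPython threads interleave under the GIL, so this illustrates the message-passing structure rather than real parallel speed.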
tection
predictions.
acoustic
between
a
experts.
and
represented
the action
into
red
The
me language.
lelism
When
The speech
the syl-
these
into
are
passing
presenting
appli-
of articulation.
cues
relations
docie-ty
between
the arrow
is established
STM,
a link
representing
a message
passing and the arrow re-
in
mentioning
the
the auditory
writing
Memories
Message
HYPOTHESES
of
as
is
on top-down
extraction
plication
be
constrain
places
ALGORITHMS
SYLLABIC
control
it
may
acoustic
can be based
the problem
won't
context-dependent
extraction
3. PARALLEL ALGORITHMS FOR GENERATING SYLLABIC HYPOTHESES.
of
arrows.
For the sake
tions
1 shows
Fig.
used
Zue [2].
rating
d - Expektd 0,j the auditoty
consonant
hypotheses
constraint
preliminary
some
effect
hypotheses
VB
pseudo-
bounds
phonetic
These
VC
NC
side
primary
cues.
Primary
ambiguous.
VF
: front vowel
a
to
of
Translation
generates
acoustic
limited
detection
pseudo-syllabic
Syntax
are
The
of the
for
segmenting
are sent
which
determines
extracts,
sometimes
to the
pseudo-
upon
re-
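Table I
⟨FRAME⟩        := (⟨NAME⟩ ⟨SLOT-LIST⟩)
⟨SLOT-LIST⟩    := (⟨SLOT⟩ ⟨SLOT⟩*)
⟨SLOT⟩         := (⟨NAME⟩ [⟨DESCRIPTION⟩])
⟨DESCRIPTION⟩  := (described-as ⟨NAME⟩)
               := (⟨CONNECTIVE⟩ ⟨DESCRIPTION⟩^k)
               := (not ⟨DESCRIPTION⟩)
               := (filled-by ⟨NAME⟩)
               := ⟨CONDITIONAL⟩
               := (result-of ⟨PROC⟩)
⟨CONDITIONAL⟩  := (when ⟨NAME⟩ ⟨DESCRIPTION⟩ ⟨DESCRIPTION⟩ [(else ⟨DESCRIPTION⟩)])
               := (when ⟨PREDICATE EXPRESSION⟩ ⟨DESCRIPTION⟩ [(else ⟨DESCRIPTION⟩)])
               := (unless ⟨DESCRIPTION⟩ ⟨DESCRIPTION⟩)
⟨CONNECTIVE⟩   := or
               := and
               := xor
⟨PREDICATE EXPRESSION⟩ := ⟨PREDICATE⟩
               := (not ⟨PREDICATE⟩)
               := (⟨CONNECTIVE⟩ ⟨PREDICATE⟩^k)
⟨PROC⟩         := ⟨F- function⟩
               := ⟨P- procedure⟩

The asterisk means that the item can be absent, present, or repeated any number of times. The exponent k (k>1) of an expression means that the expression can be rewritten any number k of times. Predicates are indicated by names ending with -P, functions by names starting with F-, and procedures by names starting with P-.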
from
cues
to
tion.
a
the
instantiation
Table
be
used
Syllabic
k)'
times
of
)
)
EXPERT
tations
and
cues
lexical
degrees
(SE) which
an
is
unambiguous
sends
syllabic
These
hypotheses
performed
by
lexical
expec-
ted
any
of
of
LTM
of
the next
the
GSF-DESCRIPTOR
stored
is
4. INTEGRATION OF STRUCTURAL AND PROCEDURAL KNOWLEDGE IN THE LONG TERM MEMORIES OF THE AUDITORY EXPERTS.
an
dural
integration
knowledge
the
gross
time
evolution
TE
for
spectral
: the
energy
R12
: the
ratio
work
obtaining
features
energy
El2
The
denoted
Bl
a
knowledge
of plans
con-
tion
description
represented
in LTM3
represented
grammar
B2
a
part
of
the
the
LTM
of
description
letters
Whenever a description of a total energy peak or dip is received by the GSF-DESCRIPTOR, an instantiation of the frame GSFDFR is created and an attempt is made to fill its slots sequentially. Filling the slot FRSTR causes the instantiation of the frame PKTE or of the frame DPTE, depending on whether the input describes a peak or a dip. The execution of the corresponding plan is then initiated. The resulting description is written in the output queue QUOUT.
plan
to
PKTE
by
(INPUT result-of
the
network
ing
to
represents
a grammar
which
acoustic
A
knowledge
cues
from
frame
is
a frame-name
a control
frequency
are
Knowledge in the LTM of each expert is represented by frames. A frame is a structure made of a number of slots. A slot is the holder of information concerning a particular item, called the "slot-filler" (Minsky [4]). Slot-fillers may be descriptions, relations, or results of procedures. Attempts to fill the slots are made during a frame instantiation. The LTM of GSF-DESCRIPTOR is a hierarchical network of frames, part of which is shown in Table II.

Table II - The LTM of GSF-DESCRIPTOR

(GSFDFR
   (INPUT (result-of P-READ(PARAMETERS)))
   (FRSTR (or (when PK-P(INPUT) (filled-by PKTE))
              (when DP-P(INPUT) (filled-by DPTE))))
   (TERM (result-of P-APPEND(QUOUT))))

(PKTE                                  ; peak of total energy
   (INTPTE (result-of F-INT(INPUT)))
   (PEAKE12 (result-of F-DESCRPEAK(Fa,Fb,INTPTE)))
   (HR (result-of F-FHGR12(INTPTE)))
   (VINT (result-of F-CVINT(PEAKE12,HR)))
   (PCONT (unless (filled-by (or (VOCPEAK)(SONPEAK)(NSPEAK)(BRSTPEAK)))
                  (described-as UPK(INTPTE)))))

(VOCPEAK                               ; vocalic peak
   (WCONT (when (and (HDURPKTE-P)(HPR12-P))
                (filled-by (or (VOCCUESET) ((LEFT~~W)(CONSVOW)))))))

(VOCCUESET
   (LOWR (result-of F-FLOWR(INTPTE)))
   (TRNINT (result-of F-TRNF(INTPTE)))
   (VWINT (result-of F-INT(VCINT)))
   (HGR (result-of F-CONSHR(INTPTE,VWINT)))
   (VCONT (filled-by (or (VOW)(CONSVOW)(VOWCONS)))))
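The GSFDFR and PKTE frames of Table II suggest how slot filling proceeds when the GSF-DESCRIPTOR receives a total energy description. The code below is a minimal, hypothetical interpreter for a subset of the Table I description forms (result-of, filled-by, described-as, or, when, unless), applied to a cut-down version of those frames. The following are assumptions, not the paper's implementation:

- the dictionary encoding of frames;
- the stand-in F-/P- procedures and predicates;
- the slot names VOCINT and INTDP;
- the threshold TH1 = 1.0;
- the rule that "or" returns the first alternative that succeeds.

SONPEAK, NSPEAK, BRSTPEAK and the cue frames are omitted.

```python
"""Sketch of slot filling for a cut-down version of the Table II frames.

Assumptions: frames are lists of (slot, description) pairs; descriptions are
nested tuples mirroring Table I; None means "slot could not be filled".
All procedures, predicates, VOCINT, INTDP and TH1 are hypothetical.
"""
from typing import Any, Callable, Dict, List, Optional, Tuple

Ctx = Dict[str, Any]
TH1 = 1.0  # hypothetical threshold on the energy ratio R12


def p_append(ctx: Ctx) -> bool:
    ctx["QUOUT"].append(ctx["FRSTR"])  # write the description to the output queue
    return True


PROCS: Dict[str, Callable[[Ctx], Any]] = {
    "P-READ": lambda ctx: ctx["segment"],           # incoming TE description
    "F-INT": lambda ctx: ctx["INPUT"]["interval"],  # time interval of the event
    "P-APPEND": p_append,
}
PREDS: Dict[str, Callable[[Ctx], bool]] = {
    "PK-P": lambda ctx: ctx["INPUT"]["kind"] == "peak",
    "DP-P": lambda ctx: ctx["INPUT"]["kind"] == "dip",
    "HPR12-P": lambda ctx: ctx["INPUT"]["r12"] > TH1,
}
FRAMES: Dict[str, List[Tuple[str, Any]]] = {
    "GSFDFR": [
        ("INPUT", ("result-of", "P-READ")),
        ("FRSTR", ("or", ("when", "PK-P", ("filled-by", "PKTE")),
                         ("when", "DP-P", ("filled-by", "DPTE")))),
        ("TERM", ("result-of", "P-APPEND")),
    ],
    "PKTE": [
        ("INTPTE", ("result-of", "F-INT")),
        ("PCONT", ("unless", ("filled-by", ("or", "VOCPEAK")),
                             ("described-as", "UPK(INTPTE)"))),
    ],
    "VOCPEAK": [("VOCINT", ("when", "HPR12-P", ("result-of", "F-INT")))],
    "DPTE": [("INTDP", ("result-of", "F-INT"))],
}


def instantiate(name: str, ctx: Ctx) -> Optional[Ctx]:
    """Fill the slots of frame `name` sequentially; None if any slot fails."""
    local = dict(ctx)              # slots already filled are visible to later ones
    inst: Ctx = {"frame": name}
    for slot, desc in FRAMES[name]:
        value = fill(desc, local)
        if value is None:
            return None
        local[slot] = inst[slot] = value
    return inst


def fill(desc: Any, ctx: Ctx) -> Any:
    """Try to produce a slot-filler for one description; None means failure."""
    if isinstance(desc, str):      # a bare frame name: try to instantiate it
        return instantiate(desc, ctx)
    op, *args = desc
    if op == "result-of":
        return PROCS[args[0]](ctx)
    if op == "filled-by":
        return fill(args[0], ctx)
    if op == "described-as":
        return ("described-as", args[0])
    if op == "or":                 # first alternative that succeeds
        for alt in args:
            value = fill(alt, ctx)
            if value is not None:
                return value
        return None
    if op == "when":               # (when PREDICATE DESCRIPTION [else-DESCRIPTION])
        if PREDS[args[0]](ctx):
            return fill(args[1], ctx)
        return fill(args[2], ctx) if len(args) > 2 else None
    if op == "unless":             # first description, otherwise the fallback
        value = fill(args[0], ctx)
        return value if value is not None else fill(args[1], ctx)
    raise ValueError(f"unknown description form: {op}")


quout: List[Any] = []
instantiate("GSFDFR", {"segment": {"kind": "peak", "interval": (12, 31), "r12": 2.4},
                       "QUOUT": quout})
print(quout[0]["PCONT"])  # {'frame': 'VOCPEAK', 'VOCINT': (12, 31)}
```

When R12 is low, VOCPEAK cannot be instantiated, so the "unless" form falls back to describing the peak as UPK(INTPTE). Compound predicates such as `(and (HDURPKTE-P)(HPR12-P))` are not handled by this sketch.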
in the
KHz,
in Table
can
filling
a node
parameters:
energies
0.9
Brackets
which
and proce-
3 - 5 KHz frequency
the
0.2 -
relations.
frame-structure
indicated
in capital
with -P and are defined
which will be described
for
with
of the signal,
in the
of
=
LTM3,
structural
of the following
: the total
bands
the
se-
indicated
by names
starting
with
are indicated
by names
starting
P-.Whenever
the frame GSFDFR is instantiated
attempts
of
slots
are
responding
OF THE
EXPERTS.
The LTM of GSF-DESCRIPTOR,
attibute
this
are
Receiving
tains
the
LTM knowledge.
stored
ending
tiation
OF STRUCTURAL
the
of
times.
items
Whenever
peak
Section.
INTEGRATION
fill
in
created
1. The asterisk means that
absent, present, or repea-
be
contains
a process
of
4.
slots
Procedures
with
into
introduced
II
Functions
F-.
acou-
to the
affected
by
knowledge
the
which
of times.
frames
along
of
number
optional
attachments
a
hypotheses
are
all
expert
to
of
rules
than
can
words
of plausibility.
organization
greater
expression
Predicates
acoustic
description
and
the
ge for representing
hypothesiza-
receives
level.
The
the
pseudo-syllable
is instanis created
structures
are precisely
defiof a grammar
defining
all the
the
Table
detailed
hypothesization
SYLLABIC
stic
for
the
attempts
Frame
rules
a frame
structure
beginning
and
The frame-structure
( not
PREDICATE
)
( CONNECTIVE
PREDICATE
F- function
P- procedure
LTM
The exponent
K>l
of an expression means
grammar.
that the expression
can be rewritten
any number
number
expert,
the
empty
shows
After
its
composition
1
tain
syllabic
are
of
At
STM
the
quest
STM.
the
acceptable
PREDICATE
:=
:=
:=
:=
the
copy
quentially.
ned by the
:= or
:= and
:= xor
PREDICATE
EXPRESSION
a message.
, a
tiated
1) )
complex
Attempting
structures.
the extraction
INTPTE
is filled
the
cation
of
INPUT.
This
time
the
the
PEAKTE
time
in the
than
in
time
of the applithe
argument
of beginning,
duration ot the peak
is written
into the
fills
of
which
the
the
of PKTE.
by the result
which
F-FHGR12
describes
ratio
THl).The
of
5.
the peak
slot HR.It
R12
is
high
be
filled
instantiations
called
Each
invoked
more
detailed
tempting
frame
to
the
is
F-CVINT computhe
peaks
in
of
an
A
SONPEAK,
the
are
frame
consisting
UPK
peak
is the
detected
in
at-
of
with
descriptime
the
in more
refer
to
hundred
similar
network
of
plans
is
used
the slots of DPTE.
of more
detailed
execution
has
been
show
of
the
slots
verification
tes
HDURPKTE-P
the
duration
of
VOCPEAK
of the
and
of
HPR12-P
is true
R12
INTPTE
in
threshold
filled
is
truth
HPR12-P.
the
for
at-
plans
6.
if there
whose
the
appear
least
conditionned
energy
is at
maximum
least
TR-EE
if
at
day
analy-
a general
solving.
an average
syllabic
Predegree
hypothesization,
can
be
done
in
multi-microprocessors
one peak
is higher
for Pictorial
,Purdue
1981.
81-38,
Large
of
slots
acoustic
of
which
cues
are
-
which
a total energy peak containing
F-TRNF(INTPTE)
extracts
an
VINT.
The
been
found
Implication
Recognition
time
interval
fills
the
The
maximum
default
cues
energy
interval
when
the
two
in
time
peak
and the description
Comments
cues
F-FLOWR
De
Mori,
R.,"Computer
Models
Fuzzy
Algorithms."
New
interval
for
in
both
HGVINT-P
which
R12
is
true
when
have
1982.
cognition
on
after
If
the
VWINT
and
above
is
is
the
is described
INTPTE
are
predicates
are
as a vocalic
one
is VOC(VWINT).
the
colons
in
Table
II
IEEE
and Machine
the band Fa-Fb is high in the
AEQ-P(VWINT,INTPTE)
is true
intervals
Speech
Plenum
5 - De Mori,R.,Giordana,A.,Laface,P.
extra-
the functions
of
York:
and
Vi-
Sait-
in
extracts
is low, F-CONSHR
Systems",Proc.
- Minsky,M., "A Framework for Representing
Knowledge"
In The Psychology
of Computer
sion, Winston,P.
Ed.,McGraw
Hill, 1975.
VINT.
coincident.
the
these
VWINT.
R12
value
predicate
time
true,
vocalic
in which
slot
in which
consonantal
The
of
of
for Advanced
1982,pp.546-549.
ta L., "Parallel
the intervals
the
description
Report
Using
Press
vowel.
the
University
and Zue,V.W.,"Properties
Lexicons:
Isolated-Word
the beginning
for
Syntax-DirecPattern
than
of a peak where cues of
typical
for example of
transient,
F-INT(VCINT)
found.
been
has
sounds,
plosive
both
with
and Fu,K.S.,"Semantic
- Shipman,D.W.
is high,
consonantal
almost
ex-
by
is true
peak
value
has
extraction
in
one
interval
zero.
were
90%
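REFERENCES.
1 - Tai, J.W. and Fu, K.S., "Semantic Syntax-Directed Translation for Pictorial Pattern Recognition", Report TR-EE 81-38, Purdue University, 1981.
2 - Shipman, D.W. and Zue, V.W., "Properties of Large Lexicons: Implications for Advanced Isolated-Word Recognition Systems", Proc. ICASSP-82, Paris, 1982, pp. 546-549.
3 - De Mori, R., "Computer Models of Speech Using Fuzzy Algorithms", New York: Plenum Press, 1982.
4 - Minsky, M., "A Framework for Representing Knowledge", in The Psychology of Computer Vision, Winston, P., Ed., McGraw Hill, 1975.
5 - De Mori, R., Giordana, A., Laface, P. and Saitta, L., "Parallel Algorithms for Syllable Recognition in Continuous Speech", to appear in IEEE Transactions on Pattern Analysis and Machine Intelligence.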
for
of the two predica-
HDURPKTE-P
signal
TH2.VOCCUESET
by
usually
cts
every
with
problem
standard
ICASSP-82,Paris
high.
syllables
processing,
using
Recognition"
the
looks
the
simulated
that
12,
signal
ted Translation
filling
a
of
than
syllables
architectures.
to fill
The
at
and one
interpretation
value
sentences
syllable
INTPTE.
tempting
a
introduced.
of
four male
right
for parallel
parallelism
real-time
by
the
Sys-
talker.
results
excluding
the
using
results
than
program
liminary
has been
hundreds
speech
spoken
More
and
Understanding
evidence
The
system
purpose
condi-
PCONT
from
cues
several
gave
highest
cases.
The
frame
a default
in filling
UPK(INTPTE).
uncertain
no
the
the
a Speech
continuous
zed for each
of
for
slots.If
speakers
language.
BRSTPEAK.
executed
female
tracted
frame
to a hierarchy
can be completed,
assumed
interval
VOCPEAK,
of
in
of
representing
of
on
uttered
with
for
of acoustic
Experiments
(greater
disjunction
which
fill
description
tion
a
corresponds
plans
instantiations
tion
by
model
knowledge
tem in terms
inside
in which R12 is high.The
last plan of the
attempts
to fill the slot PCONT. This
sequence
can
new
syllabic
PEAKE12
slot
CONCLUSIONS.
A
gives
intervals
function
inside
intervals
time
the
time
the
the
a threshold
tes
causes
frequency
band Fa-Fb and in the
INTPTE.Successively,
written
in
description
INTPTE
on
is filled
F-DESCRPEAK
function
the
slots
in the instantiation
interval
the
result
F-INT
and the
INTPTE
slot of PKTE
energy
the
ending
function
of
by
function
INPUT.
their
cues.
gives
in
STM after
fill
function
of
described
The next
to
of acoustic
help
in
Algorithms
Continuous
Transactions
Intelligence.
for
Syllable
Speech",
on
Pattern
To
Re-
appear
Analysis