ch6 (Synthesis).ppt

5-Text To Speech (TTS)
Speech Synthesis
Speech Synthesis Concept
Phone Units
Phone Sequence To Speech
Speech Naturalness
– Concatenative Approaches
– Rule-Based Approaches
Speech Synthesis Concept
Text → [Text to Phone Sequence] → Phone Sequence → [Phone Sequence to Speech] → Speech
The first stage is Natural Language Processing (NLP); the second stage is Speech Processing.
Phone Units
)
Sentence ( )
Paragraph (
Word (Depends on the language. Usually more than 100,000)
Syllable
Diphone & Triphone
Phoneme (Between 10 , 100)
Phone Units (Cont’d)
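Diphone: models the transition between two adjacent phonemes.
Figure: a phoneme sequence p1 p2 p3 p4 p5 . . ., with the phonemes and the diphones spanning each pair of neighboring phonemes marked.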
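As an illustration, the sketch below turns a phoneme sequence such as p1 p2 p3 into its diphones and, for comparison, its triphones. Padding the sequence with a silence symbol `sil` and the `a-b` / `a-b+c` labels (HTK-style notation) are assumptions of this sketch, not part of the slides.

```python
def to_diphones(phonemes, silence="sil"):
    """Return the diphones of a phoneme sequence as 'left-right' labels.

    The sequence is padded with a (hypothetical) silence symbol so that the
    transitions into the first phoneme and out of the last one are included.
    """
    padded = [silence, *phonemes, silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def to_triphones(phonemes, silence="sil"):
    """Return triphones (a phoneme with its left and right context) as 'left-center+right'."""
    padded = [silence, *phonemes, silence]
    return [f"{a}-{b}+{c}" for a, b, c in zip(padded, padded[1:], padded[2:])]


if __name__ == "__main__":
    seq = ["p1", "p2", "p3", "p4", "p5"]
    print(to_diphones(seq))
    print(to_triphones(seq))
```

A sequence of n phonemes yields n + 1 diphones once both ends are padded, so every transition, including those to and from silence, has its own stored unit.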
Phone Units (Cont’d)
Farsi has 30 phonemes, so theoretically there are 30 × 30 = 900 diphones.
In practice, the only diphone that does not occur in Farsi is /zho/.
Theoretically there are 30 × 30 × 30 = 27,000 triphones,
but in practice Farsi has only about 15,000 triphones.
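The gap between the theoretical and the practical counts comes from which phoneme combinations actually occur in the language. A minimal sketch, assuming each pronunciation is a list of phoneme symbols, computes the theoretical counts and the distinct units observed in a lexicon. The toy lexicon is hypothetical (romanized, not real Farsi data) and far too small to reproduce the 15,000 figure.

```python
def theoretical_counts(n_phonemes):
    """Every ordered pair (triple) of phonemes is a candidate diphone (triphone)."""
    return n_phonemes ** 2, n_phonemes ** 3


def observed_units(lexicon, k):
    """Distinct k-phoneme sequences that actually occur inside the given pronunciations."""
    units = set()
    for pron in lexicon:
        for i in range(len(pron) - k + 1):
            units.add(tuple(pron[i:i + k]))
    return units


if __name__ == "__main__":
    print(theoretical_counts(30))  # (900, 27000), the figures quoted for Farsi
    # Hypothetical toy lexicon in a romanized transcription
    lexicon = [["s", "a", "l", "a", "m"], ["b", "a", "b", "a"]]
    print(len(observed_units(lexicon, 2)), len(observed_units(lexicon, 3)))
```

Run over a large pronunciation lexicon or corpus, the size of `observed_units(..., 3)` is the kind of count behind the practical triphone estimate.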
Phone Units (Cont’d)
Syllable = Onset (consonant) + Rhyme
A syllable is a group of phonemes that contains exactly one vowel.
Syllable structures in Farsi (C = consonant, V = vowel): CV, CVC, CVCC
Farsi has about 4,000 syllables.
Syllable structures in English: V, CV, CVC, CCVC, CCVCC, CCCVC, CCCVCC, . . .
The number of syllables in English is very large.
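The syllable definition and the Farsi templates above can be checked mechanically. This sketch assumes a toy romanized transcription with one character per phoneme and a hypothetical vowel set; a real system would work on a proper phoneme inventory.

```python
# Assumption: one character per phoneme and this hypothetical vowel set.
VOWELS = set("aeiou")
FARSI_TEMPLATES = {"CV", "CVC", "CVCC"}


def cv_pattern(syllable):
    """Map every phoneme to C (consonant) or V (vowel)."""
    return "".join("V" if p in VOWELS else "C" for p in syllable)


def is_valid_syllable(syllable, templates=FARSI_TEMPLATES):
    """A syllable has exactly one vowel and must match one of the allowed templates."""
    pattern = cv_pattern(syllable)
    return pattern.count("V") == 1 and pattern in templates


def onset_rhyme(syllable):
    """Split a syllable into onset (consonants before the vowel) and rhyme (vowel + coda).

    Raises ValueError if the syllable contains no vowel.
    """
    v = cv_pattern(syllable).index("V")
    return syllable[:v], syllable[v:]


if __name__ == "__main__":
    for s in ["ba", "dar", "dast", "ab", "stra"]:
        print(s, cv_pattern(s), is_valid_syllable(s))
    print(onset_rhyme("dast"))  # ('d', 'ast')
```

Passing an English template set (V, CV, CVC, CCVC, ...) instead of `FARSI_TEMPLATES` shows why the English syllable inventory grows so much larger.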
Phone Sequence To Speech
Concatenative Approaches: a trade-off between naturalness, memory usage,
and the variety of desired functions
Rule-Based Approaches: the most important rule-based approach is the Klatt
method
Phone Sequence To Speech (Cont’d)
Text → [Text to Phone Sequence] → Phone Sequence → [Phone Sequence to Primitive Utterance] → [Primitive Utterance to Natural Speech] → Speech
The first stage is NLP; the remaining stages are Speech Processing.
Speech Naturalness
Removal of undesirable noise, distortion, and discontinuities from the speech
Prosody generation
– Speech energy
– Duration
– Pitch
– Intonation
– Stress
Speech Naturalness (Cont’d)
Intonation and stress have a strong effect on speech naturalness.
Intonation: variation of the pitch frequency over the course of an utterance
Stress: an increase of the pitch frequency at a specific point in time
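The two definitions above can be turned into a simple pitch (F0) contour generator. This is a minimal sketch under assumed modeling choices: intonation as a linearly declining F0 baseline, and stress as a Gaussian-shaped local rise in F0. All numeric defaults (100 frames/s, 130 to 100 Hz, a 30 Hz boost, 50 ms width) are hypothetical illustration values.

```python
import numpy as np


def f0_contour(duration_s, frame_rate=100, start_hz=130.0, end_hz=100.0,
               stresses=(), boost_hz=30.0, width_s=0.05):
    """Return (times, F0 values) for a simple prosody model.

    Intonation: F0 declines linearly from start_hz to end_hz over the utterance.
    Stress: around each stressed time (seconds) F0 is raised by a Gaussian bump
    of height boost_hz and standard deviation width_s.
    """
    t = np.arange(int(round(duration_s * frame_rate))) / frame_rate
    f0 = np.linspace(start_hz, end_hz, len(t))
    for center in stresses:
        f0 += boost_hz * np.exp(-0.5 * ((t - center) / width_s) ** 2)
    return t, f0


if __name__ == "__main__":
    t, flat = f0_contour(1.0)
    _, stressed = f0_contour(1.0, stresses=(0.5,))
    # Difference at the stressed instant: approximately boost_hz
    print(stressed[50] - flat[50])
```

Keeping intonation and stress as separate terms mirrors the slide: the baseline carries the utterance-level pitch variation, and each stress adds a local rise on top of it.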
Concatenative Approaches
In these approaches we store units of natural speech and use them to
reconstruct the desired speech.
We can select the phone unit that is appropriate for the synthesis task.
We can store compressed parameters instead of the original waveform.
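A minimal sketch of the concatenative idea, assuming the stored units are raw waveforms (the original-waveform case) and that a short linear cross-fade is enough to hide each joint; real systems smooth joints with methods such as PSOLA. The unit labels and the sine-tone inventory are hypothetical stand-ins for recorded speech.

```python
import numpy as np


def concatenate_units(unit_names, inventory, crossfade=80):
    """Join stored waveforms in order, cross-fading `crossfade` samples at every joint.

    inventory maps a unit label (e.g. a diphone) to a 1-D array of samples;
    every unit must be longer than `crossfade` samples.
    """
    if crossfade < 1:
        raise ValueError("crossfade must be at least 1 sample")
    fade_in = np.linspace(0.0, 1.0, crossfade)
    out = np.asarray(inventory[unit_names[0]], dtype=float)
    for name in unit_names[1:]:
        unit = np.asarray(inventory[name], dtype=float)
        # Fade the end of the previous unit out while the next unit fades in
        joint = out[-crossfade:] * (1.0 - fade_in) + unit[:crossfade] * fade_in
        out = np.concatenate([out[:-crossfade], joint, unit[crossfade:]])
    return out


if __name__ == "__main__":
    fs = 16_000  # assumed sampling rate
    t = np.arange(800) / fs
    # Hypothetical inventory: sine tones stand in for recorded diphones
    inventory = {"sil-s": 0.1 * np.sin(2 * np.pi * 300 * t),
                 "s-a": 0.5 * np.sin(2 * np.pi * 200 * t),
                 "a-sil": 0.2 * np.sin(2 * np.pi * 150 * t)}
    speech = concatenate_units(["sil-s", "s-a", "a-sil"], inventory)
    print(len(speech))  # 3 * 800 - 2 * 80 = 2240 samples
```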
Concatenative Approaches (Cont’d)
Benefits of storing compressed parameters instead of the original waveform:
– Less memory usage
– A general representation instead of one specific stored utterance
– Easier prosody generation
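The slides do not name a coding scheme; linear prediction (LPC) is one common choice and is used here purely as an assumption. The sketch estimates, for each 20 ms frame (320 samples at an assumed 16 kHz), 16 predictor coefficients with the Levinson-Durbin recursion plus one gain value, and compares that storage with 16-bit samples. Frame length, order, and float32 storage are illustrative choices, and resynthesis (which also needs an excitation such as pitch and voicing) is omitted.

```python
import numpy as np


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Returns (a, err) for the predictor x[n] ~ sum_k a[k-1] * x[n-k], k = 1..order,
    where err is the final prediction-error energy.
    """
    a = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])
        k = acc / err                          # reflection coefficient
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:], err


def encode_lpc(signal, frame_len=320, order=16):
    """Replace each frame of `frame_len` samples by `order` LPC coefficients plus one gain."""
    window = np.hamming(frame_len)
    params = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len] * window
        r = np.array([np.dot(frame[:frame_len - k], frame[k:]) for k in range(order + 1)])
        if r[0] == 0.0:                        # silent frame: nothing to predict
            params.append(np.zeros(order + 1))
            continue
        a, err = levinson_durbin(r, order)
        params.append(np.append(a, np.sqrt(max(err, 0.0))))
    return np.array(params, dtype=np.float32)


if __name__ == "__main__":
    fs = 16_000                                # assumed sampling rate
    rng = np.random.default_rng(0)
    t = np.arange(fs) / fs                     # one second of a synthetic test signal
    x = np.sin(2 * np.pi * 200 * t) + 0.01 * rng.standard_normal(fs)
    pcm = (x / np.max(np.abs(x)) * 32767).astype(np.int16)
    params = encode_lpc(pcm.astype(float))
    print(pcm.nbytes, params.nbytes)           # 32000 bytes of samples vs 3400 bytes of parameters
```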
Concatenative Approaches (Cont’d)
Phone Unit    Type of Storage
Paragraph     Original waveform
Sentence      Original waveform
Word          Original waveform
Syllable      Coded or original waveform
Diphone       Coded waveform
Phoneme       Coded waveform
Concatenative Approaches (Cont’d)
Pitch Synchronous Overlap-Add (PSOLA) is a well-known method for smoothing
the transitions between phonemes.
Overlap-add is a standard DSP method.
PSOLA is also a basic building block for voice conversion.
In the analysis stage of this method, frames are selected synchronously
with the pitch markers.
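The sketch below is a simplified time-domain PSOLA (TD-PSOLA) pitch modification under strong assumptions: the analysis pitch marks are already known, the pitch period is constant (`period` samples), and the output keeps the original duration. Each analysis frame is two periods long, Hann-windowed, and centered on a pitch mark; the frames are then overlap-added at synthesis marks spaced by the new period. Function and parameter names are illustrative.

```python
import numpy as np


def td_psola(x, marks, period, new_period):
    """Change the pitch period of x from `period` to `new_period` samples, keeping its length.

    marks: analysis pitch marks (sample indices), assumed to be about `period` apart.
    Returns (y, synthesis_marks).
    """
    marks = np.asarray(marks)
    window = np.hanning(2 * period + 1)               # two-period analysis window
    y = np.zeros(len(x))
    # Synthesis marks span the same interval, spaced by the new period
    syn_marks = np.arange(marks[0], marks[-1] + 1, new_period)
    for t in syn_marks:
        m = marks[np.argmin(np.abs(marks - t))]       # nearest analysis mark
        lo, hi = m - period, m + period + 1
        if lo < 0 or hi > len(x):
            continue                                  # frame would run off the signal
        frame = x[lo:hi] * window                     # pitch-synchronous frame centered on m
        start = t - period                            # place the frame centered on t
        s0, s1 = max(start, 0), min(start + len(frame), len(y))
        y[s0:s1] += frame[s0 - start:s1 - start]      # overlap-add
    return y, syn_marks


if __name__ == "__main__":
    x = np.zeros(2100)
    marks = np.arange(100, 1901, 100)                 # period of 100 samples
    x[marks] = 1.0                                    # idealized glottal pulse train
    y, syn = td_psola(x, marks, period=100, new_period=80)
    print(np.flatnonzero(y)[:4])                      # [100 180 260 340]
```

In a full system the marks come from a pitch tracker, the period changes from mark to mark, and duration changes are made by repeating or skipping frames; none of that is modeled here.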
Rule-Based Approach Stages
Determine the speech model and its parameters.
Determine the type of phone units.
Determine the parameter values for each phone unit.
Replace the sequence of phone units with its equivalent parameter sequence.
Feed the parameter sequence into the speech model.
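The five stages above can be sketched with a toy formant synthesizer. This is not the Klatt synthesizer: it is a minimal sketch assuming an impulse-train source, a cascade of three second-order resonators of the form y[n] = A x[n] + B y[n-1] + C y[n-2] (the resonator form used in Klatt's formant synthesizers), a few vowels with rough, hypothetical formant targets, fixed bandwidths, an assumed 10 kHz sampling rate, and no interpolation between phones.

```python
import numpy as np

FS = 10_000  # sampling rate in Hz (an assumption of this sketch)

# Stage 1: speech model = impulse-train source + cascade of 3 formant resonators.
# Stages 2-3: phone units are vowels; each gets (F1, F2, F3) targets in Hz and a
# duration in ms. The values are rough, illustrative numbers (hypothetical).
PHONE_PARAMS = {
    "a": ((700, 1200, 2500), 150),
    "i": ((300, 2300, 3000), 150),
    "u": ((300, 900, 2300), 150),
}
BANDWIDTHS = (60, 90, 150)  # formant bandwidths in Hz (assumed, fixed for all phones)


def resonator(x, freq, bw, fs=FS):
    """Second-order digital resonator y[n] = A x[n] + B y[n-1] + C y[n-2] (unity gain at DC)."""
    T = 1.0 / fs
    C = -np.exp(-2.0 * np.pi * bw * T)
    B = 2.0 * np.exp(-np.pi * bw * T) * np.cos(2.0 * np.pi * freq * T)
    A = 1.0 - B - C
    y = np.zeros(len(x))
    y1 = y2 = 0.0
    for n, xn in enumerate(x):
        y[n] = A * xn + B * y1 + C * y2
        y2, y1 = y1, y[n]
    return y


def phones_to_params(phones):
    """Stage 4: substitute each phone unit by its parameter values."""
    return [PHONE_PARAMS[p] for p in phones]


def synthesize(phones, f0=120):
    """Stage 5: feed the parameter sequence into the speech model, one segment per phone."""
    segments = []
    for formants, dur_ms in phones_to_params(phones):
        n = int(FS * dur_ms / 1000)
        source = np.zeros(n)
        source[::FS // f0] = 1.0              # crude impulsive voicing source
        out = source
        for freq, bw in zip(formants, BANDWIDTHS):
            out = resonator(out, freq, bw)    # cascade: each resonator filters the previous output
        segments.append(out)
    return np.concatenate(segments)


if __name__ == "__main__":
    audio = synthesize(["a", "i", "u"])
    print(len(audio))  # 3 phones * 1500 samples = 4500
```

Because each phone is filtered separately, the resonator state restarts at every phone boundary; a real rule-based system interpolates parameters between phone targets and runs the model continuously.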
KLATT 80 Model
KLATT 88 Model
Figure: The KLSYN88 cascade/parallel formant synthesizer (block diagram). Laryngeal sound sources: filtered impulse train, KLGLOTT88 model (default), modified LF model, spectral-tilt low-pass resonator, aspiration noise generator. Cascade vocal tract model: nasal and tracheal pole-zero pairs, first to fifth formant resonators. Parallel vocal tract model for frication sound sources: frication noise generator, second to sixth formant resonators, bypass path. Parallel vocal tract model for laryngeal sound sources (normally not used). First-difference preemphasis.
Three Voicing Source Models in KLATT 88
The old KLSYN impulsive source
The KLGLOTT88 model
The modified LF model
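As a rough illustration of the difference between the old impulsive source and the KLGLOTT88 model, the sketch below generates one pitch period of each. It assumes the commonly cited KLGLOTT88 open-phase flow shape U(t) = a*t^2 - b*t^3, with b chosen so the flow returns to zero at the end of the open phase (OQ × T0). Scaling the peak flow to `av` and leaving out the spectral-tilt filter are simplifications of this sketch; the modified LF model is not sketched.

```python
import numpy as np


def klglott88_pulse(t0_samples, oq, av=1.0):
    """One period of KLGLOTT88-style glottal flow: U(t) = a t^2 - b t^3 in the open phase.

    t0_samples: pitch period in samples; oq: open quotient (0 < oq <= 1);
    av: peak flow (this normalization is an assumption of the sketch).
    """
    te = oq * t0_samples                  # open-phase length in samples
    t = np.arange(t0_samples, dtype=float)
    a = 27.0 * av / (4.0 * te ** 2)       # puts the peak (at t = 2*te/3) at av
    b = a / te                            # makes the flow return to zero at t = te
    u = a * t ** 2 - b * t ** 3
    u[t > te] = 0.0                       # closed phase: no flow
    return u


def impulse_source(t0_samples):
    """One period of the old impulsive source: a single unit impulse."""
    p = np.zeros(t0_samples)
    p[0] = 1.0
    return p


if __name__ == "__main__":
    pulse = klglott88_pulse(t0_samples=100, oq=0.6)
    print(int(np.argmax(pulse)))  # 40: the peak falls at 2/3 of the 60-sample open phase
    train = np.tile(pulse, 5)     # five periods of voicing at a constant pitch
```

Raising `oq` lengthens the open phase and moves the flow peak later in the period, which is the kind of voice-quality control the impulsive source cannot provide.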