Digital Systems: Hardware Organization and Design

advertisement
Speech Processing
Introduction
Syllabus


ECE 5525 Speech Processing
Contact Info:

CRN
81888


Instructor(s): Veton Këpuska
Textbook(s):
“Discrete-Time Speech Signal Processing: Principles and Practice”,
Thomas F.Quatieri, Prentice Hall, 2002

Reference Material:
Këpuska, Veton
Olin Engineering Building, Office # 353
Tel. (321) 674-7183
E-mail: vkepuska@fit.edu
http://my.fit.edu/~vkepuska/ece5525



Subj
ECE
Crse
5525
Sec
E1
Credits
3.00
Title
Speech Processing
“Digital Processing of Speech Signals”,
L.R. Rabiner and R.W. Schafer, Prentice Hall, 1978
“Digital Signal Processing”,
Alan V. Oppenheim and Ronald W. Schafer, Prentice Hall, 1975
“Digital Signal Processing A Practical Approach”,
Emmanuel C. Ifeachor and Barrie W. Jervis, Second Edition, Prentice Hall, 2002
23 March 2016
Veton Këpuska
2
Syllabus

Course Goals:
Teach modern methods that are used to process speech signals.

Subject Area: Digital Speech Processing

Prerequisites by Topic:

Topics Covered:


















Discrete-Time Speech Signal Processing,
Production and Classification of Speech Sounds,
Acoustic Theory of Speech Production,
Speech Perception,
Speech Analysis,
Speech Synthesis,
Homomorphic Speech Processing,
Short-Time Fourier Transform Analysis and Synthesis,
Filter-Bank Analysis/Synthesis,
Sinusoidal Analysis/Synthesis,
Frequency Domain-Pitch Estimation,
Nonlinear Measurement and Modeling Techniques
Coding of Speech Signals,
Speech Enhancement,
Speaker Recognition,
Methods for Speech Recognition,
Digital Speech Processing for Man-Machine Communication by Voice
Recommended Grading




Digital Signal Processing
Homework:
Exams:
Project(s):
20%
30%
50%
MATLAB Exercises and Homework Problems
23 March 2016
Veton Këpuska
3
Course Information
 http://my.fit.edu/~vkepuska/web
 or Directly
 http://my.fit.edu/~vkepuska/ece5525/
23 March 2016
Veton Këpuska
4
Discrete-Time Speech Signal
Processing
Introduction
23 March 2016
Veton Këpuska
6
Introduction
 The topic of this study is as old as our
human language and as new the
newest as computer chip.
 Goal of creation of more efficient and
effective systems for:
 human-to-human communication, as
well as
 More recently human-machine
communication, is studied.
23 March 2016
Veton Këpuska
7
History
1960’s Digital Signal Processing (DSP):
 Assumed a central role in speech
communication studies:
 It enabled development of a large
number of applications
 Advances in IC technology
 DSP algorithms
 Computer architecture
23 March 2016
Veton Këpuska
8
History
 Created environment with virtually
limitless opportunities for innovation
in






Speech Processing
Image Processing
Video Processing
Radar & Sonar
Medical Diagnostics, and
Consumer Electronics
23 March 2016
Veton Këpuska
9
History
 Three levels of understanding are
required to appreciate the
technological advancement of DSP
 Theoretical
 Conceptual
 Practical
Practice
Concepts
Theory
23 March 2016
Veton Këpuska
Ability to implement theory
and concepts in working code
(MATLAB, C, C++, JAVA)
Basic understanding of
how theory is applied
Mathematics,
derivations, signal
processing
10
Speech Technology
 Theoretical:
 Acoustic Theory of Speech Production
 The basic Mathematics of Speech Signal
representation.
 Derivation of Various Properties of Speech
associated with each representation
 Basic Signal Processing mathematics:




23 March 2016
Speech signal
Sampling
Aliasing
Filtering and other DSP methods
Veton Këpuska
11
Speech Technology
 Conceptual
 Hos speech Processing theory is applied
in order to make various speech
measurements and to estimate and
quantify various attributes of the speech
signal
23 March 2016
Veton Këpuska
12
Speech Technology
 Practical


For technology to realize its full potential, it is essential to
be able to convert theory and conceptual understanding to
practice.
This process involves constraints:
 Knowledge of the goals of the application
 Knowledge of the engering tradeoffs and judgments, and
 Ability to provide implementations in working computer code
(e.g., MATLAB, C/C++ or java),
 Specialized code running in real-time signal-processing chips
 Specialized languages and technologies such as:



23 March 2016
ASICs
FPGAs (Field Programmable Gate Arrays)
DSP chips
Veton Këpuska
13
Speech Technology
 What is the nature of speech signal?
 How do DPS techniques play a role in
learning about the speech signal?
 What are the basic digital
representations of speech signal and
how are they used in algorithms for
speech processing?
 What are the important applications that
are enabled by digital speech processing
methods?
23 March 2016
Veton Këpuska
14
Discrete-Time Speech Signal
Processing
 Speech has evolved as a primary form of
communication between humans.
 Technological advancement has enhanced
our ability to communicate:
 One early case is the transduction by a
telephone handset of the continuously-varying
speech pressure signal at the lips output to
continuously-varying (analog) electric voltage
signal.
 Digital technology has brought communication
to a new level. This technology requires
additional transduction of the signals:
 Analog-to-digital (A/D) and Digital-to-Analog
converter (D/A), vs
 Continuous Analog Signal
23 March 2016
Veton Këpuska
15
Speech Communication Chain
23 March 2016
Veton Këpuska
16
Discrete-Time Speech Signal
Processing
 The topic of this course can be loosely defined as:
 The manipulation of sampled speech signals by a
digital processor to obtain a new signal with some
desired properties.
 Example: Changing a speaker’s rate of articulation
with the use of digital computer.
 Modification of articulation rate (referred to as timescale modification of speech) has as an objective to:


23 March 2016
generate a new speech waveform that corresponds to a
person talking faster or slower than the original rate,
maintain the character of the speaker’s voice (i.e., there
should be little change in the pitch and spectrum of the
original utterance).
Veton Këpuska
17
Time-Scale Modification Example
 Useful applications:
 Fast Scanning of a
long recording in a
message playback
system.
 Slowing Down
difficult to
understand speech.
23 March 2016
Veton Këpuska
18
The Speech-Communication
Pathway
 The linguistic level of communication:
 Idea is first formed in the mind of the speaker.
 This idea is then transformed to words, phrases, and
sentences according to the grammatical rules of the
language.
 The physiological level of communication:
 The brain creates electric signals that move along the
motor nerves these electric signals activate muscles
in the vocal tract and vocal cords.
 The acoustic level in speech communication pathway:
 This vocal tract and vocal cord movement results in
pressure changes within the vocal tract, and, in
particular, at the lips initiating a sound wave that
propagates in space.
 Pressure changes at the ear canal cause vibrations at
the ear drum of the listener.
23 March 2016
Veton Këpuska
19
The Speech-Communication
Pathway
 The physiological level of communication pathway:
 Eardrum vibrations induce electric signals that move
along the sensory nerves to the brain.
 The linguistic level of the listener:
 The brain performs speech recognition and
understanding.
 The linguistic and physiological activity of:
 The speaker
 The listener
23 March 2016
=>
=>
“transmitter”
“receiver”
Veton Këpuska
20
The Speech-Communication
Pathway
 Transmitter and Receiver of the speechcommunication system have other functions besides
basic communication:
 Monitoring and correction of one’s own speech via the
feedback through speakers ear (importance of the
feedback studied in the speech of the deaf).
 Control of articulation rate
 Adaptation of speech production to mimic voices, etc.
 Receiver performs voice recognition:
 Robust to noise and other interferences
 Able to focus on a single low-volume speaker in a room
full with louder interfering multiple speakers (cocktail
party effect).
23 March 2016
Veton Këpuska
21
The Speech-Communication
Pathway
 Significant advances in reproducing
parts of this communication system
by synthetic means.
 Far from emulating the human
communication system.
23 March 2016
Veton Këpuska
22
Analysis/Synthesis Based on
Speech Production and Perception
 This class does not cover entire speech
communication pathway:
 Signal measurements of the acoustic waveform.
 From these measurements and current
understanding of speech of how the vocal tract
and vocal cords produce sound waves
production models are build.
 Starting from analog representations which are
then transformed to discrete-time
representations (A/D).
 From the receiver side the signal processing of
the ear and higher auditory levels are covered
however to significantly lesser extend only to
account for the effect of speech processing on
perception.
23 March 2016
Veton Këpuska
23
Analysis/Synthesis Based on
Speech Production and Perception

Preview of a speech model.
Figure in the next slide shows a model of vowel production.

In a vowel production:
 Air is forced from the lungs (by contraction of the muscles
around the lung cavity).
 Air then flows pass the vocal cord/folds (two masses of
flesh) causing periodic vibration of the cords.
 The rate of vibration of the cords determines the pitch of
the sound.
 Periodic puffs caused by vibration of cords act as an
excitation input, or source, to the vocal tract.
 The vocal tract is the cavity between the vocal cords and
the lips:
 Vocal tract acts as a resonator that spectrally shapes the
periodic input (much like the cavity of the musical wind
instrument).
23 March 2016
Veton Këpuska
24
Analysis/Synthesis Based on
Speech Production and Perception
23 March 2016
Veton Këpuska
25
Analysis/Synthesis Based on
Speech Production and Perception
 From this basic understanding of the
speech production mechanism a simple
engineering model can be build, referred to
source/filter model.
 In this model the following is assumed:
 Vocal tract is a liner time-invariant system (or
filter),
 This linear time-invariant system is driven by a
periodic impulse-like input.
 Those assumptions imply that the output at the
lips that is itself periodic.
23 March 2016
Veton Këpuska
26
Analysis/Synthesis Based on
Speech Production and Perception
 Example of a vowel is “a” as in the word “father”.
 Vowel “a” is one of many basic sounds of a language called phonemes.
 For each phoneme a different production model is built.
 Typical speech utterance consists of a string of vowel and
consonant phonemes whose temporal and spectral
characteristics change with time.
 This change corresponds to the changes in


excitation source and
vocal tract system.


A time-varying source and system,
Furthermore, in realty there is a complex non-linear interaction
of both.
 This fact implies:
 Therefore, even though a simple linear time-invariant
model seems plausible, it does not always represent well
the real system.
23 March 2016
Veton Këpuska
27
Analysis/Synthesis Based on
Speech Production and Perception

Using discrete-time modeling of speech
production the course will cover the design
of speech analysis/synthesis systems as
depicted in Figure 1.3.

Analysis part extracts underlying
parameters of time-varying model from
speech waveform.

Synthesis takes extracted parameters and
models to put back together the speech
waveform.

An objective in this development is to
achieve an identity system for which the
output equals to input when no
manipulation is performed.

A number of other analysis/synthesis
methods can be derived based on various
useful mathematical representations in
time or frequency.

These analysis/synthesis methods are the
backbone for applications that transform
the speech waveform into some desirable
form.
23 March 2016
Veton Këpuska
28
Applications (Project Areas)

Speech Modification:

time-scale manipulations:

Fitting the speech waveform 






Message playback
Voice mail
Reading machines and books for the blind

Learning a foreign language
Slowing down speech –
Voice disguise
Entertainment
Speech synthesis
Spectral change of frequency compression and expansion:


Speeding up speech –
Voice transformations using Pitch and spectral changes of speech
signal:




In Radio and TV commercials into an allocated time slot and the
synchronization of audio and video presentation.
may be useful in transforming speech as an aid to the partially deaf.
Many methods can be applied to music and special effects.
23 March 2016
Veton Këpuska
29
Applications (Project Areas)
 Speech Coding
 Goal is to reduce the information rate measured in
bits per second while maintaining the quality of the
original waveform.
 Waveform coders:


Represent the speech waveform directly and do not rely on
a speech production model.
Operate in a high range of 16-64 kbps
 Vocoders:



Largely are speech model-based and rely on a small set of
model parameters.
Operate at the low bit range of 1.2-4.8 kbps
Lower quality then waveform coders.
 Hybrid coders:


23 March 2016
Partly waveform based and partly speech model-based
Operate in the 4.8 – 16 kbps range
Veton Këpuska
30
Applications (Project Areas)
 Applications of speech coders include:
 Digital telephony over constrained
bandwidth channels





23 March 2016
Cellular
Satellite
Voice over IP (Internet)
Video phones
Storage of Voice messages for computer voice
mail applications.
Veton Këpuska
31
Applications (Project Areas)
 Speech Enhancement
 Goal is to improve the quality of degraded speech.
 Preprocess speech before is degraded:

Increasing the broadcast range of transmitters constrained
by a peak power transmission limits (e.g., AM radio and TV
transmissions).
 Enhancing the speech waveform after it is degraded.




23 March 2016
Reduction of additive noise in
 (Digital) telephony
 Vehicle and aircraft communications
Reduction of interfering backgrounds and speakers for the
hearing impaired,
Removal of unwanted convolutional channel distortion and
reverberation
Restoration of old phonograph recordings degraded by:
 Acoustic horns
 Impulse-like scratches from age and wear
Veton Këpuska
32
Applications (Project Areas)
 Speaker Recognition
 Speech signal processing exploits the variability of
speech model parameters across speakers.
 Verifying a person’s identity (Biometrics)
 Voice identification in forensic investigation.
 Understanding of the speech model features that cue
a person’s identity is also important in speech
modification where model parameters can be
transformed for the study of specific voice
characteristics:
 Speech modification and speaker recognition can be
developed synergistically.
 Speech (Voice) Recognition is covered in ECE 5526
23 March 2016
Veton Këpuska
33
End
Download