CS 582 Intro to Speech Processing
Spring 2014
Credits: 3 units
Contact Hours: Monday and Wednesday 1400-1515
Instructor: Chuck Konopka; Office: GMCS 562; Email: CKonopka@mail.SDSU.edu
Office Hours: Mondays and Wednesdays 1530 – 1630 (and by appointment)
Course Materials
1. Required text: Huang, Acero & Hon, Spoken Language Processing: A Guide to Theory,
Algorithm, and System Development, Prentice Hall PTR, 2001. ISBN-10: 0-13-022616-5
2. CS 582 lecture notes/slides (available on Blackboard)
Course Information for CS 582
Description from the Official Course Catalog
Fundamentals of speech processing and speech recognition. Physical aspects of speech
production and perception. Mathematical models for speech recognition. Corpus development:
data collection, processing, and evaluation. Applications of speech processing and associated
research topics.
Prerequisites: Computer Science 310
Course Type: Selected elective course in the program
Specific Goals for CS 582
Course-Level Student Learning Outcomes
1. Ability to explain and apply the general concept of modeling as well as specific modeling
techniques presented in class.
2. Ability to explain the physical (biological) model of speech perception and production
and how these concepts are used to implement corresponding computational models.
3. Ability to explain and apply concepts related to speech data collection, including the
ethical aspects of the treatment of human subjects.
4. Ability to explain the concept of phono-acoustic modeling.
5. Ability to explain the concept of language grammar, its various forms, and their relevance
to speech recognition and production.
6. Ability to apply relevant grammar modeling techniques to the problem of evaluating
token strings for admissibility to a language.
7. Ability to explain and apply various signal processing concepts to implement speech
signal (and/or other) data feature extraction.
8. Ability to explain the pattern recognition algorithms presented in class, how they differ,
when and how to apply them.
9. Ability to apply relevant pattern recognition algorithms to implement classifiers of
speech and/or other data.
10. Ability to integrate and apply concepts from previous CS core courses with concepts
learned in CS 582 to complete a team project.
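Outcome 6 (evaluating token strings for admissibility to a language) can be illustrated with a small recognizer. The sketch below is one possible approach, assuming a hypothetical toy grammar in Chomsky Normal Form; the rule tables `BINARY_RULES` and `LEXICAL_RULES` and the function `cyk_accepts` are invented for illustration. It uses the CYK algorithm, a standard technique for deciding whether a string is derivable from a context-free grammar.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky Normal Form (CNF), for illustration only.
# Binary rules A -> B C are keyed by their right-hand side (B, C).
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("Det", "N"): {"NP"},
}
# Lexical rules A -> word, keyed by the word.
LEXICAL_RULES = {
    "the": {"Det"}, "a": {"Det"},
    "dog": {"N"}, "cat": {"N"},
    "sees": {"V"}, "chased": {"V"},
}


def cyk_accepts(tokens, start="S"):
    """Return True if the token string is admissible (derivable from `start`)."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds every nonterminal that can derive tokens[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, token in enumerate(tokens):
        table[i][i] = set(LEXICAL_RULES.get(token, set()))
    # Fill longer spans from shorter ones, trying every split point k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n - 1]


print(cyk_accepts("the dog sees a cat".split()))  # True
print(cyk_accepts("dog the sees cat a".split()))  # False
```

The well-formed sentence is accepted and the scrambled one is rejected. Because CYK considers every span and split point, its running time grows cubically with the number of tokens, which is acceptable for short utterances.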
Relationship to CS Program Course Outcomes
CS 582 addresses the following CS Program course outcomes:
1. An ability to apply knowledge of computing and mathematics
2. An ability to analyze a problem, and identify and define the computing requirements
appropriate to its solution
3. An ability to design, implement, and evaluate a computer-based system, process,
component, or program to meet desired needs
4. An ability to function effectively on teams to accomplish a common goal
5. An understanding of professional, ethical, legal, security and social issues and
responsibilities
6. Recognition of the need for and an ability to engage in continuing professional
development
7. An ability to use current techniques, skills, and tools necessary for computing practice
8. An ability to apply mathematical foundations, algorithmic principles, and computer
science theory in the modeling and design of computer-based systems in a way that
demonstrates comprehension of the tradeoffs involved in design choices.
9. An ability to apply design and development principles in the construction of software
systems of varying complexity.
Topics Covered
The following topics are covered in CS 582:
1. Course Introduction: A High-Level Introduction to Speech Processing
2. The Concept of Modeling from Observation and Data
3. The “Natural Model”: A Review of the Biological Model of Speech Production and
Perception
4. Interpreting the Natural Model: Deriving a Mathematical Model of Speech Data
Processing from the Biological Model
5. Signal Processing: An Introduction to Signal Processing Concepts and Tools: Sampling,
Buffering, Filtering, Transforms, etc.
6. Feature Extraction: Implementing the Mathematical Model Using Signal Processing
Concepts to Extract Features for Classification: Mel-Frequency Cepstral
Coefficients (MFCCs), Linear Predictive Coding (LPC)
7. A Review of Classification Techniques: Supervised vs. Unsupervised Learning,
Gaussian Classifiers, Bayes Theory, Neural Networks, Clustering, Hidden Markov
Models, Support Vector Machines
8. Phonetics and Phonology: Concepts, Phonetic Modeling
9. Grammatical Concepts: Rule-based vs. Stochastic, Syntax, Semantics, Language
Modeling, Chomsky Hierarchy, Context-Free Grammars
10. N-Gram Modeling: Concept and Application to Speech Processing, Modeling
Grammar, Practical Issues and Solutions
11. Collecting Speech Data: Study Design, Speech Corpora, Ethics in Data Collection
from Human Subjects
12. Constructing a Simple Speech Recognition System: Putting it all together, Practical
Issues and Solutions
13. Performance Evaluation: Complexity
14. Advanced Topics in Support of Semester Projects
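Topics 5 and 6 come together in the usual MFCC pipeline: window a frame, take its power spectrum, pool it through mel-spaced triangular filters, take logs, and apply a DCT. The following is a minimal pure-Python sketch for a single frame, assuming a Hamming window, a naive DFT, 20 filters, the common 2595·log10(1 + f/700) mel formula, and 13 output coefficients. Pre-emphasis, overlapping frames, and liftering are omitted, and all function names are illustrative rather than taken from any particular toolkit.

```python
import math


def hz_to_mel(f):
    """Common mel-scale formula (one of several in use)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def hamming(n):
    """Hamming window; tapering the frame edges reduces spectral leakage."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (an FFT would be used in practice)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2.0 * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append((re * re + im * im) / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [i * top / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    filters = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):  # rising edge
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            weights[k] = (right - k) / (right - center)
        filters.append(weights)
    return filters


def mfcc_frame(frame, sample_rate, num_filters=20, num_ceps=13):
    """Window -> power spectrum -> mel filterbank -> log -> DCT-II."""
    windowed = [x * w for x, w in zip(frame, hamming(len(frame)))]
    spectrum = power_spectrum(windowed)
    log_energies = []
    for weights in mel_filterbank(num_filters, len(frame), sample_rate):
        energy = sum(w * p for w, p in zip(weights, spectrum))
        log_energies.append(math.log(energy + 1e-10))  # floor avoids log(0)
    return [
        sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
            for m, e in enumerate(log_energies))
        for c in range(num_ceps)
    ]


# Usage: one 256-sample frame (32 ms at 8 kHz) of a 440 Hz tone.
frame = [math.sin(2.0 * math.pi * 440.0 * t / 8000.0) for t in range(256)]
coeffs = mfcc_frame(frame, 8000)
print(len(coeffs))  # 13
```

The naive DFT keeps the sketch dependency-free but costs O(N²) per frame; a practical front end would use an FFT and process a sequence of overlapping frames.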
Course Schedule and Grading Policies
The following schedule is subject to adjustments according to actual class progress.

Weeks 1-2 (1/19, 1/26)
Topics:
Class organization:
o Syllabus
o Grading
o Assignments & Tests
o Semester Project
o Project Teams
Introduction to Key Concepts:
o What is Speech Processing?
o What is a Model?
o A Model of Human Speech Production
o A Model of Human Speech Perception
o Deriving a Computational Model from the Biological Model
o Phonetics (The Acoustic-Phonetic Model of Speech)
o The Challenges of Speech Recognition:
   - Data Variability
   - Acoustic, Syntactic and Semantic Context
Reading: Text 1.1-1.7; Text 7.1-7.5

Weeks 3-4 (2/2, 2/9)
Topics:
Classification Algorithms:
o A review of some basic statistics
o Supervised vs. Unsupervised learning
o Clustering
   - k-means
   - Vector Quantization
   - Self-Organizing Maps
   - Bayesian classifier Clustering
   - Hierarchical Clustering
o Neural Networks, Hebbian Learning
o Univariate Gaussian Classifiers
o Gaussian Mixture Models
o Support Vector Machines
o “Deep Learning” Algorithms
Reading: Text 9.4
Assignment: Classification Algorithms (due Mon. 2/16)

Weeks 5-7 (2/16, 2/23, 3/2)
Topics:
Classification Algorithms, Continued
Discrete Hidden Markov Models:
o Motivation for Use
o Evaluation
o Traversal
o Training
o HMM Variations
o Applications to Speech Recognition
o Other Applications
Semi-Continuous HMMs
Continuous HMMs
HMMs, Continued
Midterm Review
Reading: Text 6.1-6.5; Text 9.4
Assignment: Discrete Hidden Markov Models (due Mon. 3/2)

Weeks 8-9 (3/9, 3/16)
Topics:
Feature Extraction/Signal Processing:
o What is feature extraction? Why is it needed?
o A Review of Needed Concepts
o Sampling Theory/Nyquist Theory
   - Sufficient sampling for Speech
o Windowing Functions
   - Boxcar, Hann, Hamming, etc.
   - Spectral Leakage as Motivation
o Mel-Frequency Cepstral Coefficients (MFCCs)
o The Linear Predictive Coding Algorithm
Feature Extraction/Signal Processing, continued
Implementing Speech Recognition:
o Implementing a Speech Recognition System Using MFCCs and Semi-Continuous HMMs
Assignments:
o Midterm (due Mon. 3/16)
o Building a Simple Speech Recognition System using MFCCs and SC/HMMs (due Mon. 4/6)

Weeks 10-13 (3/23, 3/30, 4/6, 4/13)
Topics:
Language Models:
o Syntax:
   - Rule-based Grammar
   - Context-Free Grammar
   - Chomsky’s “Universal Language Theory”
   - “Grammar Hierarchy”
o Stochastic Grammar
   - The Concordance
   - N-Grams
   - PCFGs
o Semantic Models/Deriving Meaning From Speech
   - Thematic Roles
   - The Proposition Bank
   - WordNet
   - Metaphor, Simile
Spring Break
Language Models continued
Discourse:
o Segmentation
o Coherence
o Reference Resolution
o Anaphora
o Turn cues and Turn-taking
Reading: Text 18.1-18.7; 20.1-20.10; 21.1-21.10; 22.1-22.5; 24.1; 17.1-17.6; 19.1-19.6
Assignment: Mimicking Writing Style with N-Grams (due 4/27)

Weeks 14-16 (4/20, 4/27, 5/4)
Topics:
Discourse continued
Speech Synthesis:
o Approaches
   - Phoneme/Syllable Synthesis
   - Concatenative Synthesis
   - Formant Synthesis
o Implementation
Final Week:
o Review for Final
o Semester Project Presentation
Assignments: Final; Semester Project
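The Discrete Hidden Markov Models unit lists evaluation and traversal among its topics. A minimal sketch of both follows: the forward algorithm computes P(O | λ) by summing over every hidden state path, and the Viterbi algorithm finds the single most likely path. The two-state model, its symbols a/b/c, and every probability below are hypothetical values chosen only for illustration; `path_probability` is an illustrative helper that scores one explicit path.

```python
# Hypothetical two-state discrete HMM; states, symbols and probabilities are made up.
STATES = ["S1", "S2"]
START = {"S1": 0.6, "S2": 0.4}
TRANS = {"S1": {"S1": 0.7, "S2": 0.3},
         "S2": {"S1": 0.4, "S2": 0.6}}
EMIT = {"S1": {"a": 0.5, "b": 0.4, "c": 0.1},
        "S2": {"a": 0.1, "b": 0.3, "c": 0.6}}


def forward(obs):
    """Evaluation: P(obs | model), summed over all hidden state paths."""
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * TRANS[r][s] for r in STATES) * EMIT[s][o]
                 for s in STATES}
    return sum(alpha.values())


def viterbi(obs):
    """Traversal: the single most likely state path and its joint probability."""
    delta = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    backpointers = []
    for o in obs[1:]:
        best_prev, new_delta = {}, {}
        for s in STATES:
            # Best predecessor of state s at this time step.
            r = max(STATES, key=lambda q: delta[q] * TRANS[q][s])
            best_prev[s] = r
            new_delta[s] = delta[r] * TRANS[r][s] * EMIT[s][o]
        backpointers.append(best_prev)
        delta = new_delta
    # Backtrack from the best final state.
    last = max(STATES, key=lambda s: delta[s])
    path = [last]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    path.reverse()
    return path, delta[last]


def path_probability(path, obs):
    """Joint probability of one explicit state path and the observations."""
    p = START[path[0]] * EMIT[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= TRANS[path[t - 1]][path[t]] * EMIT[path[t]][obs[t]]
    return p


obs = ["a", "b", "c", "c"]
print(forward(obs))
print(viterbi(obs))
```

Both algorithms replace an exponential enumeration of paths with a pass that is linear in the sequence length. On short sequences they can be checked against brute force using `path_probability`; for long sequences a practical implementation would work with log probabilities to avoid underflow.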
Important Dates:
o Last class meeting is May 6th
o Last day of classes is May 7th
o Finals begin May 8th.
o Commencement begins May 15th.
Major Assignments: Team Projects
The class will be divided into teams of 3-5 students. Each team will propose and refine a project topic
that utilizes Speech Processing theory, tools and methods learned in class. Activities will begin
with project planning and culminate in a review of the team results. Each team will be required
to submit 4 reports related to its project:
1. The first report will state and analyze the objectives and requirements for the system as
well as describe the approach and schedule to completion.
2. The second and third submissions will report progress.
3. The fourth report will describe the completed project, including a summary of the
proposal, how Speech Processing theory and tools were applied, a presentation of the
work product (software, etc.), all data used to train and test the system, and a
demonstration of the project.
Scored activities and weighting by percentage of total score:
Activity                   Percentage
Homework                   20
Exercises/Participation    10
Midterm                    20
Final                      25
Semester Project           25
Grading Scale:
Grade   Percentage
A       >90%
B       >=80%
C       >=70%
D       >=60%
F       <60%
Other Course Policies
Special Assistance: If you are a student with a disability and believe you will need
accommodations for this class, it is your responsibility to contact Student Disability Services
at (619) 594-6473. To avoid any delay in the receipt of your accommodations, you should
contact Student Disability Services as soon as possible. Please note that accommodations are not
retroactive, and that accommodations based upon disability cannot be provided until you have
presented your instructor with an accommodation letter from Student Disability Services. Your
cooperation is appreciated.