CS 582 Intro to Speech Processing
Spring 2014

Credits: 3 units
Contact Hours: Monday and Wednesday 1400-1515
Instructor: Chuck Konopka
Office: GMCS 562
Email: CKonopka@mail.SDSU.edu
Office Hours: Mondays and Wednesdays 1530 – 1630 (and by appointment)

Course Materials

1. Required text: Huang, Acero & Hon, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, Prentice Hall PTR, 2001. ISBN-10: 0-13-022616-5
2. CS 582 lecture notes/slides (available on Blackboard)

Course Information for CS 582

Description from the Official Course Catalog

Fundamentals of speech processing and speech recognition. Physical aspects of speech production and perception. Mathematical models for speech recognition. Corpus development: data collection, processing, and evaluation. Applications of speech processing and associated research topics.

Prerequisites: Computer Science 310
Course Type: Selected elective course in the program

Specific Goals for CS 582

Course-Level Student Learning Outcomes

1. Ability to explain and apply the general concept of modeling as well as specific modeling techniques presented in class.
2. Ability to explain the physical (biological) model of speech perception and production and how these concepts are used to implement corresponding computational models.
3. Ability to explain and apply concepts related to speech data collection, including the ethical aspects of the treatment of human subjects.
4. Ability to explain the concept of phono-acoustic modeling.
5. Ability to explain the concept of language grammar, its various forms, and their relevance to speech recognition and production.
6. Ability to apply relevant grammar modeling techniques to the problem of evaluating token strings for admissibility to a language.
7. Ability to explain and apply various signal processing concepts to implement feature extraction from speech signals and/or other data.
8. Ability to explain the pattern recognition algorithms presented in class, how they differ, and when and how to apply them.
9. Ability to apply relevant pattern recognition algorithms to implement classifiers of speech and/or other data.
10. Ability to integrate and apply concepts from previous CS core courses with concepts learned in CS 582 to complete a team project.

Relationship to CS Program Course Outcomes

CS 582 addresses the following CS Program course outcomes:

1. An ability to apply knowledge of computing and mathematics
2. An ability to analyze a problem, and identify and define the computing requirements appropriate to its solution
3. An ability to design, implement, and evaluate a computer-based system, process, component, or program to meet desired needs
4. An ability to function effectively on teams to accomplish a common goal
5. An understanding of professional, ethical, legal, security and social issues and responsibilities
6. Recognition of the need for and an ability to engage in continuing professional development
7. An ability to use current techniques, skills, and tools necessary for computing practice
8. An ability to apply mathematical foundations, algorithmic principles, and computer science theory in the modeling and design of computer-based systems in a way that demonstrates comprehension of the tradeoffs involved in design choices
9. An ability to apply design and development principles in the construction of software systems of varying complexity

Topics Covered

The following topics are covered in CS 582:

1. Course Introduction: A High-Level Introduction to Speech Processing
2. The Concept of Modeling from Observation and Data
3. The “Natural Model”: A Review of the Biological Model of Speech Production and Perception
4. Interpreting the Natural Model: Deriving a Mathematical Model of Speech Data Processing from the Biological Model
5. Signal Processing: An Introduction to Signal Processing Concepts: Sampling, Buffering, Filtering, Transforms, Concepts, Tools, etc.
6. Feature Extraction: Implementing the Mathematical Model Using Signal Processing Concepts to Extract Features for Classification, the Mel-Frequency Cepstral Transform (MFCC), Linear Predictive Coding (LPC)
7. A Review of Classification Techniques: Supervised vs. Unsupervised Learning, Gaussian Classifiers, Bayes Theory, Neural Networks, Clustering, Hidden Markov Models, Support Vector Machines
8. Phonetics and Phonology: Concepts, Phonetic Modeling
9. Grammatical Concepts: Rule-based vs. Stochastic, Syntax, Semantics, Language Modeling, Chomsky Hierarchy, Context-Free Grammars
10. N-Gram Modeling: Concept and Application to Speech Processing, Modeling Grammar, Practical Issues and Solutions
11. Collecting Speech Data: Study Design, Speech Corpora, Ethics in Data Collection from Human Subjects
12. Constructing a Simple Speech Recognition System: Putting It All Together, Practical Issues and Solutions
13. Performance Evaluation: Complexity
14. Advanced Topics in Support of Semester Projects

Course Schedule and Grading Policies

The following schedule is subject to adjustments according to actual class progress.

Week   Date
1      1/19
2      1/26
3      2/2
4      2/9
5      2/16
6      2/23
7      3/2
8      3/9
9      3/16
10     3/23
11     3/30
12     4/6
13     4/13
14     4/20
15     4/27
16     5/4

Topics

Class organization:
o Syllabus
o Grading
o Assignments & Tests
o Semester Project
o Project Teams

Introduction to Key Concepts:
o What is Speech Processing?
o What is a Model?
o A Model of Human Speech Production
o A Model of Human Speech Perception
o Deriving a Computational Model from the Biological Model
o Phonetics (The Acoustic-Phonetic Model of Speech)
o The Challenges of Speech Recognition:
     Data Variability
     Acoustic, Syntactic and Semantic Context

Classification Algorithms:
o A review of some basic statistics
o Supervised vs. Unsupervised learning
o Clustering
     k-means, Vector Quantization
     Self-Organizing Maps
     Bayesian classifier
     Hierarchical Clustering
o Neural Networks, Hebbian Learning
o Univariate Gaussian Classifiers
o Gaussian Mixture Models
o Support Vector Machines
o “Deep Learning” Algorithms

Classification Algorithms, Continued

Discrete Hidden Markov Models
o Motivation for Use
o Evaluation
o Traversal
o Training
o HMM Variations
o Applications to Speech Recognition
o Other Applications

Semi-Continuous HMMs
Continuous HMMs

HMMs, Continued

Midterm Review

Feature Extraction/Signal Processing
o What is feature extraction? Why is it needed?
o A Review of Needed Concepts
o Sampling Theory/Nyquist Theory
     Sufficient sampling for Speech
o Windowing Functions
     Boxcar, Hann, Hamming, etc.
     Spectral Leakage as Motivation
o The Mel-Frequency Cepstral Coefficient Transform (MFCC)
o The Linear Predictive Coding Algorithm

Feature Extraction/Signal Processing, continued

Implementing Speech Recognition:
o Implementing a Speech Recognition System Using MFCCs and Semi-Continuous HMMs

Language Models
o Syntax: Rule-based Grammar
     Context-Free Grammar
     Chomsky’s “Universal Language Theory”
     “Grammar Hierarchy”
o Stochastic Grammar
     The Concordance
     N-Grams
     PCFGs
o Semantic Models/Deriving Meaning From Speech
     Thematic Roles
     The Proposition Bank
     WordNet
     Metaphor, Simile

Spring Break

Language Models continued

Discourse
o Segmentation
o Coherence
o Reference Resolution
o Anaphora
o Turn cues and Turn-taking

Discourse continued

Speech Synthesis
o Approaches
     Phoneme/Syllable Synthesis
     Concatenative Synthesis
     Formant Synthesis
o Implementation

Final Week:
o Review for Final
o Semester Project Presentation
o Final Semester Project

Reading

Text: 1.1-1.7
Text: 7.1-7.5
Text: 9.4
6.1-6.5
9.4
18.1-18.7
20.1-20.10
21.1-21.10
22.1-22.5
24.1
17.1-17.6
19.1-19.6

Assignment

Classification Algorithms: Due: Mon. 2/16
Discrete Hidden Markov Models: Due: Mon. 3/2
Midterm: Due: Mon. 3/16
Building a Simple Speech Recognition System using MFCCs and SC/HMMs: Due: Mon. 4/6
Mimicking Writing Style with NGrams: Due: 4/27

Important Dates:
o Last class is May 6th
o Last day of class is May 7th
o Finals begin May 8th
o Commencement begins May 15th

Major Assignments: Team Projects

The class will be divided into teams of 3-5 people. Each team will propose and refine a project topic that uses the Speech Processing theory, tools, and methods learned in class. Activities will begin with project planning and culminate in a review of the team's results. Each team will be required to submit 4 reports related to its project:

1. The first report will state and analyze the objectives and requirements for the system and describe the approach and schedule to completion.
2. The second and third submissions will report progress.
3. The fourth report will describe the completed project, including a summary of the proposal, how Speech Processing theory and tools were applied, a presentation of the work product (software, etc.), all data used to train and test the system, and a demonstration of the project.

Scored activities and weighting by percentage of total score

Activity                    Percentage
Homework                    20
Exercises/Participation     10
Midterm                     20
Final                       25
Semester Project            25

Grading Scale:

Grade   Percentage
A       >90%
B       >=80%
C       >=70%
D       >=60%
F       <60%

Other Course Policies

Special Assistance: If you are a student with a disability and believe you will need accommodations for this class, it is your responsibility to contact Student Disability Services at (619) 594-6473. To avoid any delay in the receipt of your accommodations, you should contact Student Disability Services as soon as possible. Please note that accommodations are not retroactive, and that accommodations based upon disability cannot be provided until you have presented your instructor with an accommodation letter from Student Disability Services. Your cooperation is appreciated.