EE 225D
Audio Signal Processing in Humans and Machines
Prof. N. Morgan and friends
MW 4:00-5:30
http://www.icsi.berkeley.edu/eecs225d/spr12/overview.html

Textbook: Speech and Audio Signal Processing
Gold, Morgan, and Ellis
Wiley & Sons, 2nd edition, 2011

Prerequisites
EE123 or equivalent, and Stat 200A or equivalent; or grad standing and consent of instructor

Speech and audio signal processing: why does this material matter?
• Speech w/o visual vs visual w/o speech
• Requires DSP, machine learning
• Multidisciplinary tasks are good training
• Many applications!

What should we be able to do (automatically)?
• Human example suggests, plenty
• What was said
• Who said it
• When they said it
• What it meant
• How to respond

Why is it hard?
• Speaker variability (within and between)
• Noise, reverberation, channel
• Confusable vocabulary
• Meaning and tone

Course Philosophy I
• People can do these tasks effortlessly
• Include psychoacoustics and physiology
• Also some acoustics
• But of course, also DSP and machine learning

Course Philosophy II
• First part of the course is basic stuff
• The rest is applications
• Much of the course grade based on an original project
• Some practice in oral presentation

Section I: Broad background
• Synthesis/vocoding history (chaps 2 & 3)
• Recognition history (chap 4)
• Machine recognition basics (chap 5)
• Human recognition basics (chap 18)

Section II: Scientific background
• Pattern classification (chaps 8 and 9)
• Ear physiology (chap 14)
• Acoustics (chaps 10 and 13)
• Linguistic sound categories (chap 23)

Section IIIa: Engineering Apps
• Signal processing “front end” (chaps 19-22)
• Perceptual audio coding (chap 35)
• Music signal analysis (chap 37)
• Source separation (chap 39)

Section IIIb: Engineering Apps
• Deterministic sequence recognition (chap 24)
• Statistical modeling and inference (chaps 25, 26)
• Discriminant methods and adaptation (chaps 27, 28)

Section IIIc: Engineering Apps
• Speech synthesis (chap 30)
• Spoken dialog systems (chap 29++)
• Speaker verification (chap 41)
• Speaker diarization (chap 42)

Course grading
• Quizzes/assignments (for first half): 30%
• Project proposal: 10%
• Project oral presentation: 20%
• Project write-up & results: 40%

Course location
• After today, 6th floor, ICSI
• 1947 Center Street, between Milvia and MLK
• Class will start at 4:15 instead of 4:10 (15-minute walk from Cory)
• Office hour, one hour before each class