Class Intro

EE 225D
Audio Signal Processing in Humans and Machines
Prof. N. Morgan and friends
MW 4:00-5:30
http://www.icsi.berkeley.edu/eecs225d/spr12/overview.html
Textbook:
Speech and Audio Signal Processing
Gold, Morgan, and Ellis
Wiley & Sons, 2nd edition, 2011
Prerequisites
EE123 or equivalent, and Stat 200A or equivalent; or grad standing and consent of instructor
Speech and audio signal processing: why does this material matter?
• Speech without visuals vs. visuals without speech
• Requires DSP, machine learning
• Multidisciplinary tasks are good training
• Many applications!
What should we be able to do (automatically)?
• The human example suggests: plenty
• What was said
• Who said it
• When they said it
• What it meant
• How to respond
Why is it hard?
• Speaker variability (within and between)
• Noise, reverberation, channel
• Confusable vocabulary
• Meaning and tone
Course Philosophy I
• People can do these tasks effortlessly
• Include psychoacoustics and physiology
• Also some acoustics
• But of course, also DSP and machine learning
Course Philosophy II
• First part of the course is basic stuff
• The rest is applications
• Much of the course grade is based on an original project
• Some practice in oral presentation
Section I: Broad background
• Synthesis/vocoding history (chaps 2&3)
• Recognition history (chap 4)
• Machine recognition basics (chap 5)
• Human recognition basics (chap 18)
Section II: Scientific background
• Pattern classification (chaps 8 and 9)
• Ear physiology (chap 14)
• Acoustics (chaps 10 and 13)
• Linguistic sound categories (chap 23)
Section IIIa: Engineering Apps
• Signal processing “front end” (chaps 19-22)
• Perceptual audio coding (chap 35)
• Music signal analysis (chap 37)
• Source separation (chap 39)
Section IIIb: Engineering Apps
• Deterministic sequence recognition (chap 24)
• Statistical modeling and inference (chaps 25, 26)
• Discriminant methods and adaptation (chaps 27, 28)
Section IIIc: Engineering Apps
• Speech synthesis (chap 30)
• Spoken dialog systems (chap 29++)
• Speaker verification (chap 41)
• Speaker diarization (chap 42)
Course grading
• Quizzes/assignments (for first half): 30%
• Project proposal: 10%
• Project oral presentation: 20%
• Project write-up & results: 40%
Course location
• After today, 6th floor ICSI
• 1947 Center Street, between Milvia and MLK
• Class will start at 4:15 instead of 4:10 (15-minute walk from Cory)
• Office hour, one hour before each class