Write-up on the FDP held on 27th December, 2014
Presenter: Ms. Shweta Bansal
(Asst. Prof. of Engineering Department)
The presenter spoke on the topic “Speech Processing”. She defined speech as follows: “When we
as humans speak, air passes from our lungs through our mouth and nasal cavity, and this air
stream is restricted and changed with our tongue and lips. This produces contractions and
expansions of the air, an acoustic wave, a sound. The sounds we form, the vowels and
consonants, are usually called phones. The phones are combined together into words. Speech
signal refers to the analog electrical representation of the contractions and expansions of
air.”
She divided speech processing into the following areas:
• Speech recognition (speech to text)
• Speech synthesis (text to speech)
• Speaker identification (identifying the person who is speaking)
Speech recognition is hard for the following reasons:
• Digitization: converting the analogue signal into a digital representation
• Signal processing: separating speech from background noise
• Phonetics: variability in human speech
• Phonology: recognizing individual sound distinctions (similar phonemes)
• Lexicology and syntax: disambiguating homophones and handling features of continuous speech
• Variation among speakers due to
  ◦ Vocal range (f0 and pitch range)
  ◦ Voice quality (growl, whisper, physiological elements such as nasality, adenoidality, etc.)
  ◦ Accent (vowel systems, consonants, allophones, etc.)
• Variation within speakers due to
  ◦ Health, emotional state
  ◦ Ambient conditions
• Speech style: formal (read) vs. spontaneous
• Identifying phonemes:
  ◦ Differences between some phonemes are sometimes very small
  ◦ Mismatched phonemes
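
The digitization step listed above can be illustrated with a minimal sketch. It is not taken from the presentation: the function names, the 16 kHz sampling rate, the 16-bit depth and the 440 Hz test tone are all assumptions chosen for illustration. The analogue signal is modelled as a function of time, sampled at regular intervals, and each sample is quantized to a signed integer.

```python
import math


def digitize(signal, duration_s, sample_rate_hz=16000, bits=16):
    """Sample a continuous signal and quantize it to signed integers.

    `signal` is a function of time (seconds) returning an amplitude in
    [-1.0, 1.0]. The sampling rate and bit depth are illustrative defaults,
    not values stated in the presentation.
    """
    max_level = 2 ** (bits - 1) - 1              # 32767 for 16 bits
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                   # sampling: discrete time points
        value = max(-1.0, min(1.0, signal(t)))   # clip to the valid range
        samples.append(round(value * max_level)) # quantization to an integer level
    return samples


def tone(t, freq_hz=440.0):
    """Hypothetical stand-in for an analogue speech signal: a pure sine tone."""
    return math.sin(2 * math.pi * freq_hz * t)


pcm = digitize(tone, duration_s=0.01)
```

Here `pcm` holds 10 ms of the tone as a list of integers, which is the kind of digital representation a recognizer would work on in place of the analogue signal.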
Parameters of ASR
ASR tasks differ in type and in difficulty:
• Speaking mode (isolated words / continuous speech)
• Speaking style (read / spontaneous)
• Enrollment (speaker-independent / speaker-dependent)
• Vocabulary (small: < 20 words / large: > 20,000 words)
Speaking Mode
• Isolated speech - the speaker has to speak word by word into the system.
• Connected speech - the speaker can speak a number of words without stopping.
• Continuous speech - the speaker talks naturally, as in human conversation.
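
One way to make the task parameters and speaking modes above concrete is to write them down as a small data structure. This is only a sketch: the class and field names are hypothetical, and the "medium" label for vocabularies between the two thresholds is an assumption, since the write-up names only the small and large cases.

```python
from dataclasses import dataclass
from enum import Enum


class SpeakingMode(Enum):
    ISOLATED = "isolated"
    CONNECTED = "connected"
    CONTINUOUS = "continuous"


class SpeakingStyle(Enum):
    READ = "read"
    SPONTANEOUS = "spontaneous"


class Enrollment(Enum):
    SPEAKER_DEPENDENT = "speaker-dependent"
    SPEAKER_INDEPENDENT = "speaker-independent"


@dataclass(frozen=True)
class AsrTask:
    """Hypothetical description of a speech recognition task."""
    mode: SpeakingMode
    style: SpeakingStyle
    enrollment: Enrollment
    vocabulary_size: int  # number of distinct words the system must recognize

    def vocabulary_class(self) -> str:
        # Thresholds follow the write-up: small < 20 words, large > 20,000 words.
        if self.vocabulary_size < 20:
            return "small"
        if self.vocabulary_size > 20_000:
            return "large"
        return "medium"  # assumption: the write-up does not name this range


# Example: a dictation-style task, generally the hardest combination listed.
dictation = AsrTask(
    mode=SpeakingMode.CONTINUOUS,
    style=SpeakingStyle.SPONTANEOUS,
    enrollment=Enrollment.SPEAKER_INDEPENDENT,
    vocabulary_size=50_000,
)
```

A digit dialler, by contrast, would be closer to isolated, read, speaker-dependent speech with a small vocabulary.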
Audible Range (Hearing Range)
• Humans can generally hear sounds with frequencies between 20 Hz and 20,000 Hz (the
audio range or hearing range), although this range varies significantly with age,
occupational hearing damage, and gender. The majority of people can no longer hear
20,000 Hz by the time they are teenagers, and progressively lose the ability to hear higher
frequencies as they get older.
• Most human speech communication takes place between 200 and 8,000 Hz, and the
human ear is most sensitive to frequencies around 1,000-3,500 Hz.
• Sound above the hearing range is known as ultrasound, and sound below the hearing range
as infrasound.
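
The ranges above can be summarized in a short sketch. The function and constant names are hypothetical; the numeric limits are the nominal figures quoted above and, as noted, vary from person to person.

```python
AUDIBLE_MIN_HZ = 20.0       # nominal lower limit of human hearing
AUDIBLE_MAX_HZ = 20_000.0   # nominal upper limit of human hearing
SPEECH_MIN_HZ = 200.0       # lower edge of the band carrying most speech
SPEECH_MAX_HZ = 8_000.0     # upper edge of the band carrying most speech


def classify_frequency(freq_hz):
    """Label a frequency as infrasound, audible, or ultrasound."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    if freq_hz < AUDIBLE_MIN_HZ:
        return "infrasound"
    if freq_hz > AUDIBLE_MAX_HZ:
        return "ultrasound"
    return "audible"


def in_speech_band(freq_hz):
    """True if the frequency lies in the band where most speech takes place."""
    return SPEECH_MIN_HZ <= freq_hz <= SPEECH_MAX_HZ


for f in (10, 440, 3_000, 15_000, 40_000):
    print(f, "Hz:", classify_frequency(f), "| speech band:", in_speech_band(f))
```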
Some speech recognition software
• Dragon NaturallySpeaking
• SpeakQ
• Microsoft Accessibility
• Dictate (Mac product)