Write-up on the FDP held on 27th December, 2014

Presenter: Ms. Shweta Bansal (Asst. Prof., Engineering Department)

The presenter gave her presentation on the topic “Speech Processing”. She defined speech as follows: “When we as humans speak, air passes from our lungs through our mouth and nasal cavity, and this air stream is restricted and changed with our tongue and lips. This produces contractions and expansions of the air, an acoustic wave, a sound. The sounds we form, the vowels and consonants, are usually called phones. The phones are combined together into words. Speech signal refers to the analog electrical representation of the contractions and expansions of air.”

She divided speech processing into the following areas:
- Speech recognition (speech to text)
- Speech synthesis (text to speech)
- Speaker identification (identifying the person who is speaking)

Why speech recognition is hard
- Digitization: converting the analogue signal into a digital representation
- Signal processing: separating speech from background noise
- Phonetics: variability in human speech
- Phonology: recognizing individual sound distinctions (similar phonemes)
- Lexicology and syntax: disambiguating homophones and handling features of continuous speech
- Variation among speakers, due to:
  - Vocal range (f0 and pitch range)
  - Voice quality (growl, whisper, physiological elements such as nasality, adenoidality, etc.)
  - Accent (vowel systems, consonants, allophones, etc.)
- Variation within speakers, due to:
  - Health and emotional state
  - Ambient conditions
  - Speech style: formal read vs. spontaneous
- Identifying phonemes: differences between some phonemes are sometimes very small, which leads to mismatched phonemes

Parameters of ASR
Automatic speech recognition (ASR) covers different types of tasks with different levels of difficulty, depending on:
- Speaking mode (isolated words / continuous speech)
- Speaking style (read / spontaneous)
- Enrollment (speaker-independent / speaker-dependent)
- Vocabulary (small: fewer than 20 words / large: more than 20,000 words)

Speaking Mode
- Isolated speech: the speaker has to speak word by word into the system.
- Connected speech: the speaker can speak a number of words without stopping.
- Continuous speech: the speaker talks naturally, as in ordinary human conversation.

Audible Range (Hearing Range)
Humans can generally hear sounds with frequencies between 20 Hz and 20,000 Hz (the audio range or hearing range), although this range varies significantly with age, occupational hearing damage, and gender. The majority of people can no longer hear 20,000 Hz by the time they are teenagers, and they progressively lose the ability to hear higher frequencies as they get older. Most human speech communication takes place between 200 Hz and 8,000 Hz, and the human ear is most sensitive to frequencies around 1,000-3,500 Hz. Sound above the hearing range is known as ultrasound, and sound below the hearing range as infrasound.

Some speech recognition software
- Dragon NaturallySpeaking
- SpeakQ
- Microsoft Accessibility
- Dictate (Mac product)
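
The frequency figures quoted in the Audible Range section can be illustrated with a short Python sketch. This is not part of the presentation; it is a minimal illustration under stated assumptions. The function names (classify_frequency, in_speech_band, nyquist_rate) are hypothetical, and the sampling rule used in nyquist_rate is the standard Nyquist criterion, which the write-up itself does not mention but which underlies the digitization step listed among the difficulties.

```python
# Frequency limits as quoted in the write-up (all values in Hz).
HEARING_MIN_HZ = 20.0
HEARING_MAX_HZ = 20_000.0
SPEECH_MIN_HZ = 200.0
SPEECH_MAX_HZ = 8_000.0


def classify_frequency(hz: float) -> str:
    """Label a frequency as infrasound, audible, or ultrasound.

    Hypothetical helper: the 20 Hz and 20,000 Hz limits come from the
    write-up and are treated here as hard boundaries, although real
    hearing limits vary from person to person.
    """
    if hz <= 0:
        raise ValueError("frequency must be positive")
    if hz < HEARING_MIN_HZ:
        return "infrasound"
    if hz > HEARING_MAX_HZ:
        return "ultrasound"
    return "audible"


def in_speech_band(hz: float) -> bool:
    """Return True if hz lies in the 200-8,000 Hz band quoted for speech."""
    return SPEECH_MIN_HZ <= hz <= SPEECH_MAX_HZ


def nyquist_rate(max_hz: float) -> float:
    """Return twice the highest frequency of interest.

    Assumption not taken from the write-up: by the Nyquist criterion,
    a signal must be sampled faster than this rate for content up to
    max_hz to be representable after digitization.
    """
    return 2.0 * max_hz


if __name__ == "__main__":
    for f in (5.0, 1_000.0, 15_000.0, 40_000.0):
        print(f, classify_frequency(f), in_speech_band(f))
    print("Nyquist rate for the speech band:", nyquist_rate(SPEECH_MAX_HZ), "Hz")
```

Under these assumptions, covering the full 8,000 Hz speech band calls for a sampling rate above 16,000 Hz, which is one reason 16 kHz audio is commonly used as input to speech recognizers.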