Ch5: Recognition Principles

7-Speech Recognition (Cont’d)
HMM Calculating Approaches
Neural Components
Three Basic HMM Problems
Viterbi Algorithm
State Duration Modeling
Training In HMM
Speech Recognition Concepts
Speech recognition is the inverse of speech synthesis
[Figure: speech synthesis pipeline: Text → NLP → Phone Sequence → Speech Processing → Speech; speech recognition pipeline (the inverse): Speech → Speech Processing → NLP → Text / Speech Understanding.]
Speech Recognition Approaches
Bottom-Up Approach
Top-Down Approach
Blackboard Approach
Bottom-Up Approach
[Figure: bottom-up flow: Signal Processing → Feature Extraction → Segmentation → ... → Recognized Utterance, guided by knowledge sources such as voiced/unvoiced/silence detection, sound classification rules, phonotactic rules, lexical access, and a language model.]
Top-Down Approach
[Figure: top-down flow: Feature Analysis → Unit Matching System → Lexical Hypothesis → Syntactic Hypothesis → Semantic Hypothesis → Utterance Verifier/Matcher → Recognized Utterance, drawing on an inventory of speech recognition units, a word dictionary, a grammar, and a task model.]
Blackboard Approach
[Figure: acoustic, environmental, lexical, syntactic, and semantic processes communicating through a shared blackboard.]
[Figure: an overall view of a speech recognition system, combining top-down and bottom-up processing. From Ladefoged 2001.]
Recognition Theories
Articulatory-Based Recognition
– Uses the articulatory system for recognition
– So far the most successful theory
Auditory-Based Recognition
– Uses the auditory system for recognition
Hybrid Recognition
– A hybrid of the two theories above
Motor Theory
– Models the intended gestures of the speaker
Recognition Problem
We have a sequence of acoustic symbols and want to find the words expressed by the speaker.
Solution: find the most probable word sequence given the acoustic symbols.
Recognition Problem
A: acoustic symbols
W: word sequence
We should find $\hat{w}$ such that
$$P(\hat{w} \mid A) = \max_{w} P(w \mid A)$$
Bayes' Rule
$$P(x \mid y)\,P(y) = P(x, y) = P(y \mid x)\,P(x)$$
$$P(x \mid y) = \frac{P(y \mid x)\,P(x)}{P(y)}$$
$$\Rightarrow\; P(w \mid A) = \frac{P(A \mid w)\,P(w)}{P(A)}$$
Bayes' Rule (Cont'd)
$$P(\hat{w} \mid A) = \max_{w} P(w \mid A) = \max_{w} \frac{P(A \mid w)\,P(w)}{P(A)}$$
$$\hat{w} = \operatorname*{arg\,max}_{w} P(w \mid A) = \operatorname*{arg\,max}_{w} P(A \mid w)\,P(w)$$
Since $P(A)$ does not depend on $w$, it can be dropped from the maximization.
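To make the decision rule concrete, here is a minimal sketch in Python of decoding over a small hypothesis set. The score functions acoustic_log_likelihood and lm_log_prob are hypothetical placeholders, not part of the slides; any acoustic model and language model could stand behind them.

```python
import math

def decode(acoustic, hypotheses, acoustic_log_likelihood, lm_log_prob):
    """Return the word sequence w maximizing log P(A|w) + log P(w).

    P(A) is constant across hypotheses, so it is dropped from the
    argmax, exactly as in the derivation above.
    """
    best, best_score = None, -math.inf
    for w in hypotheses:
        score = acoustic_log_likelihood(acoustic, w) + lm_log_prob(w)
        if score > best_score:
            best, best_score = w, score
    return best
```

Working in log space avoids numeric underflow when the individual probabilities are tiny.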
Simple Language Model
$$w = w_1 w_2 w_3 \cdots w_n$$
$$P(w) = \prod_{i=1}^{n} P(w_i \mid w_{i-1} w_{i-2} \cdots w_1)$$
Computing this probability directly is very difficult and requires a very large corpus, so trigram and bigram models are used instead.
Simple Language Model (Cont'd)
Trigram: $P(w) = \prod_{i=1}^{n} P(w_i \mid w_{i-1} w_{i-2})$
Bigram: $P(w) = \prod_{i=1}^{n} P(w_i \mid w_{i-1})$
Unigram (monogram): $P(w) = \prod_{i=1}^{n} P(w_i)$
Simple Language Model (Cont'd)
Computing method:
$$P(w_3 \mid w_1 w_2) = \frac{\text{number of occurrences of } w_3 \text{ after } w_1 w_2}{\text{total number of occurrences of } w_1 w_2}$$
Ad hoc (interpolation) method:
$$P(w_3 \mid w_1 w_2) = \lambda_1 f(w_3 \mid w_1 w_2) + \lambda_2 f(w_3 \mid w_2) + \lambda_3 f(w_3)$$
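As a concrete illustration of these estimates, below is a minimal Python sketch that collects n-gram counts and combines the relative frequencies with the interpolation weights above. The corpus format (one sentence per string) and the weight values are assumptions made for the example.

```python
from collections import Counter

def train_ngrams(corpus):
    """Collect unigram, bigram, and trigram counts from a list of sentences."""
    uni, bi, tri = Counter(), Counter(), Counter()
    total = 0
    for sentence in corpus:
        words = sentence.split()
        total += len(words)
        uni.update(words)
        bi.update(zip(words, words[1:]))
        tri.update(zip(words, words[1:], words[2:]))
    return uni, bi, tri, total

def interp_prob(w1, w2, w3, uni, bi, tri, total, lams=(0.6, 0.3, 0.1)):
    """P(w3 | w1 w2) = l1*f(w3|w1 w2) + l2*f(w3|w2) + l3*f(w3).

    The relative frequencies f are the count ratios above; the weights
    lams are illustrative and would normally be tuned on held-out data.
    """
    f3 = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    f2 = bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0
    f1 = uni[w3] / total if total else 0.0
    l1, l2, l3 = lams
    return l1 * f3 + l2 * f2 + l3 * f1
```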
7-Speech Recognition
Speech Recognition Concepts
Speech Recognition Approaches
Recognition Theories
Bayes' Rule
Simple Language Model
P(A|W) Network Types
[Figure from Ladefoged 2001.]
P(A|W) Computing Approaches
Dynamic Time Warping (DTW)
Hidden Markov Model (HMM)
Artificial Neural Network (ANN)
Hybrid Systems
Dynamic Time Warping (DTW)
To obtain a global distance between two speech patterns, a time alignment must be performed.
[Figure: a time alignment path between a template pattern "SPEECH" and a noisy input "SsPEEhH".]
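A minimal Python sketch of the DTW recurrence follows; the absolute-difference local distance is an assumption, and a real recognizer would compare feature vectors (e.g., with a Euclidean distance) rather than scalars.

```python
import math

def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Global DTW distance between sequences a and b.

    D[i][j] accumulates the cheapest alignment of a[:i] with b[:j]
    via insertion, deletion, or match moves.
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],       # insertion
                                 D[i][j - 1],       # deletion
                                 D[i - 1][j - 1])   # match
    return D[n][m]

# Example: aligning a template with a slightly time-warped input.
print(dtw([1, 2, 3, 4], [1, 2, 2, 3, 4]))
```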
Recognition Tasks
Isolated Word Recognition (IWR) and Continuous Speech Recognition (CSR)
Speaker-Dependent and Speaker-Independent
Vocabulary Size
– Small: < 20 words
– Medium: 100-1000 words
– Large: 1000-10000 words
– Very Large: > 10000 words
Error-Producing Factors
Prosody (recognition should be prosody-independent)
Noise (noise should be suppressed)
Spontaneous speech
Artificial Neural Network
[Figure: a simple computational element of a neural network, with inputs $x_0, x_1, \ldots, x_{N-1}$, weights $w_0, w_1, \ldots, w_{N-1}$, and threshold $\theta$.]
$$y = \varphi\!\left(\sum_{i=0}^{N-1} w_i x_i - \theta\right)$$
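A minimal sketch of this computational element in Python; the unit-step activation is an assumption, since the slide only shows a generic nonlinearity $\varphi$.

```python
def neuron(x, w, theta):
    """y = phi(sum_i w_i * x_i - theta), with a unit-step function
    standing in for the generic nonlinearity phi."""
    s = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return 1.0 if s >= 0.0 else 0.0

# Example: a 3-input unit.
print(neuron([1.0, 0.0, 1.0], [0.5, 0.2, 0.4], theta=0.6))
```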
Artificial Neural Network (Cont'd)
Neural Network Types
– Perceptron
– Time Delay Neural Network (TDNN)
Artificial Neural Network (Cont'd)
Single-Layer Perceptron
[Figure: single-layer perceptron mapping inputs $x_0, \ldots, x_{N-1}$ to outputs $y_0, \ldots, y_{M-1}$.]
Artificial Neural Network (Cont'd)
Three-Layer Perceptron
[Figure: three-layer perceptron.]
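As a sketch of how the perceptrons above compute their outputs, here is a small forward pass in Python. The sigmoid activation, the layer sizes, and the random weights are assumptions chosen for illustration.

```python
import math
import random

def layer(x, W, b):
    """One fully connected layer with sigmoid activation."""
    return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + bj)))
            for row, bj in zip(W, b)]

def mlp_forward(x, layers):
    """Forward pass through a stack of (W, b) layers."""
    for W, b in layers:
        x = layer(x, W, b)
    return x

# Illustrative network: 4 inputs -> 3 hidden units -> 2 outputs.
random.seed(0)
def rand_layer(n_in, n_out):
    W = [[random.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

net = [rand_layer(4, 3), rand_layer(3, 2)]
print(mlp_forward([0.5, -0.2, 0.1, 0.9], net))
```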
Hybrid Methods
Hybrid neural network and matched filter for recognition
[Figure: speech features pass through delay lines into a pattern classifier that produces acoustic output units.]
Neural Network Properties
The system is simple, but training is highly iterative
Does not impose a specific structure on the problem
Despite its simplicity, the results are good
The training set is large, so training should be done offline
Accuracy is relatively good
Hidden Markov Model
[Figure: two states $S_i$ and $S_j$ with transition probabilities $a_{ij}$ and $a_{ji}$.]
Observations: $O_1, O_2, O_3, \ldots, O_t$
States in time: $q_1, q_2, q_3, \ldots, q_t$
All states: $s_1, s_2, \ldots$
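To ground the notation, here is a minimal Python sketch of an HMM with a forward-probability computation, one standard way of evaluating $P(O \mid \text{model})$ (the evaluation problem among the three basic HMM problems listed in the agenda). The 2-state parameter values are illustrative assumptions.

```python
def forward(A, B, pi, obs):
    """Forward algorithm: P(O_1 .. O_t | model) for an HMM with
    transition probabilities A (a_ij), emission probabilities B,
    and initial state distribution pi; states and observation
    symbols are integer indices."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

# Illustrative 2-state, 2-symbol model.
A  = [[0.7, 0.3], [0.4, 0.6]]   # a_ij: transition i -> j
B  = [[0.9, 0.1], [0.2, 0.8]]   # b_j(o): symbol emission
pi = [0.6, 0.4]                 # initial distribution
print(forward(A, B, pi, [0, 1, 0]))
```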