7 - Speech Recognition
Outline:
- Speech Recognition Concepts
- Speech Recognition Approaches
- Recognition Theories
- Bayes Rule
- Simple Language Model
- P(A|W) Network Types

7 - Speech Recognition (Cont'd)
Outline:
- HMM Calculating Approaches
- Neural Components
- Three Basic HMM Problems
- Viterbi Algorithm
- State Duration Modeling
- Training in HMM

Recognition Tasks
- Isolated Word Recognition (IWR), Connected Word (CW), and Continuous Speech Recognition (CSR)
- Speaker Dependent, Multiple Speaker, and Speaker Independent
- Vocabulary size:
  - Small: < 20 words
  - Medium: > 100 and < 1,000 words
  - Large: > 1,000 and < 10,000 words
  - Very Large: > 10,000 words

Speech Recognition Concepts
Speech recognition is the inverse of speech synthesis:
- Speech synthesis: Text -> NLP -> Phone Sequence -> Speech Processing -> Speech
- Speech recognition / speech understanding: Speech -> Speech Processing -> Phone Sequence -> NLP -> Text

Speech Recognition Approaches
- Bottom-Up Approach
- Top-Down Approach
- Blackboard Approach

Bottom-Up Approach
(Diagram: the signal flows bottom-up through processing stages, each guided by a knowledge source.)
- Processing stages: Signal Processing -> Feature Extraction -> Segmentation -> Lexical Access -> Recognized Utterance
- Knowledge sources: Voiced/Unvoiced/Silence detection, Sound Classification Rules, Phonotactic Rules, Language Model

Top-Down Approach
(Diagram: knowledge sources, namely an inventory of speech recognition units, a word dictionary, a grammar, and a task model, drive the hypotheses.)
- Feature Analysis -> Unit Matching System -> Lexical Hypothesis -> Syntactic Hypothesis -> Semantic Hypothesis -> Utterance Verifier/Matcher -> Recognized Utterance

Blackboard Approach
(Diagram: independent knowledge sources communicate through a shared blackboard.)
- Acoustic, Environmental, Lexical, Syntactic, and Semantic Processes all read from and write to the Blackboard.

Recognition Theories
- Articulatory Based Recognition
  - Uses the articulatory system for recognition
  - This theory has been the most successful so far
- Auditory Based Recognition
  - Uses the auditory system for recognition
- Hybrid Based Recognition
  - A hybrid of the theories above
- Motor Theory
  - Models the intended gestures of the speaker

Recognition Problem
We have a sequence of acoustic symbols and we want to find the words expressed by the speaker.
Solution: find the most probable word sequence given the acoustic symbols.

Recognition Problem (Cont'd)
A: acoustic symbols, W: word sequence. We should find \hat{W} such that
P(\hat{W} \mid A) = \max_W P(W \mid A)

Bayes Rule
P(x \mid y) P(y) = P(x, y) = P(y \mid x) P(x)
P(W \mid A) = \frac{P(A \mid W) P(W)}{P(A)}

Bayes Rule (Cont'd)
P(\hat{W} \mid A) = \max_W P(W \mid A) = \max_W \frac{P(A \mid W) P(W)}{P(A)}
\hat{W} = \arg\max_W P(W \mid A) = \arg\max_W P(A \mid W) P(W)
Since P(A) does not depend on W, it can be dropped from the maximization.

Simple Language Model
w = w_1 w_2 w_3 \dots w_n
P(w) = \prod_{i=1}^{n} P(w_i \mid w_{i-1} w_{i-2} \dots w_1)
     = P(w_1) P(w_2 \mid w_1) P(w_3 \mid w_2, w_1) \dots P(w_n \mid w_{n-1}, \dots, w_1)
     = P(w_n, w_{n-1}, \dots, w_1)
Computing this probability directly is very difficult and requires a very large corpus, so trigram and bigram models are used instead (see the sketch below).
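The following is a minimal sketch, not part of the original slides, of how bigram probabilities can be estimated from counts and used to score a word sequence; the toy corpus, the function names, and the <s>/</s> sentence-boundary markers are illustrative assumptions.

from collections import Counter

def train_bigram(corpus):
    """Estimate P(w_i | w_{i-1}) from raw counts in a toy corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    # P(w2 | w1) = count(w1 w2) / count(w1)
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

def sentence_prob(model, sentence):
    """P(w) = product of P(w_i | w_{i-1}); an unseen bigram gets probability 0 here.
    A real system would smooth or interpolate, as in the ad-hoc method below."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        p *= model.get((w1, w2), 0.0)
    return p

corpus = ["turn the light on", "turn the fan on", "turn the light off"]
model = train_bigram(corpus)
print(sentence_prob(model, "turn the light on"))  # 1 * 1 * 2/3 * 1/2 * 1 = 0.333...

The point of the approximation is visible in the counts: estimating a bigram table only requires counting word pairs, whereas conditioning each word on its full history would require counts for every distinct prefix.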
Simple Language Model (Cont'd)
Trigram:  P(w) = \prod_{i=1}^{n} P(w_i \mid w_{i-1} w_{i-2})
Bigram:   P(w) = \prod_{i=1}^{n} P(w_i \mid w_{i-1})
Monogram (unigram): P(w) = \prod_{i=1}^{n} P(w_i)

Simple Language Model (Cont'd)
Computing method:
P(w_3 \mid w_2 w_1) = (number of occurrences of w_3 after w_1 w_2) / (total number of occurrences of w_1 w_2)
Ad hoc method (linear interpolation of relative frequencies):
P(w_3 \mid w_2 w_1) = p_1 f(w_3 \mid w_2 w_1) + p_2 f(w_3 \mid w_2) + p_3 f(w_3)

Error Production Factors
- Prosody (recognition should be prosody independent)
- Noise (noise should be prevented or suppressed)
- Spontaneous speech

P(A|W) Computing Approaches
- Dynamic Time Warping (DTW)
- Hidden Markov Model (HMM)
- Artificial Neural Network (ANN)
- Hybrid systems

Dynamic Time Warping
(Several slides of alignment-path figures.)
Search limitations:
- Start and end point constraints (first and last intervals)
- Global limitation (figure: global path constraints)
- Local limitation (figure: local path constraints)

Artificial Neural Network
Simple computation element of a neural network: inputs x_0, x_1, \dots, x_{N-1} with weights w_0, w_1, \dots, w_{N-1} produce the output
y = \sum_{i=0}^{N-1} w_i x_i

Artificial Neural Network (Cont'd)
Neural network types:
- Perceptron
- Time Delay Neural Network (TDNN)

Artificial Neural Network (Cont'd)
(Figure: single layer perceptron with inputs x_0 ... x_{N-1} and outputs y_0 ... y_{M-1}.)

Artificial Neural Network (Cont'd)
(Figure: three layer perceptron.)

2.5.4.2 Neural Network Topologies (figure)
TDNN (figure)
2.5.4.6 Neural Network Structures for Speech Recognition (figures)

Hybrid Methods
Hybrid neural network and matched filter for recognition
(Figure: speech features pass through delays into a pattern classifier that produces acoustic output units.)

Neural Network Properties
- The system is simple, but many training iterations are needed
- Does not impose a specific structure on the problem
- Despite its simplicity, the results are good
- The training set is large, so training should be done offline
- Accuracy is relatively good

Pre-processing
- Different preprocessing techniques are employed as the front end of speech recognition systems.
- The choice of preprocessing method depends on the task, the noise level, the modeling tool, etc.

The MFCC Method
- MFCC is based on how the human ear perceives sounds.
- MFCC performs better than other features in noisy environments.
- MFCC was originally proposed for speech recognition applications, but it also performs well for speaker recognition.
- The auditory unit of the human ear is the mel; it is obtained from frequency by the mel-scale relation (commonly mel(f) = 2595 \log_{10}(1 + f/700)).

Steps of the MFCC Method
Step 1: map the signal from the time domain to the frequency domain with a short-time FFT.
- Z(n): the speech signal
- W(n): a window function, such as the Hamming window
- W_F = e^{-j 2\pi / F}, m = 0, \dots, F-1
- F: length of the speech frame

Steps of the MFCC Method (Cont'd)
Step 2: find the energy of each filter-bank channel.
- M is the number of mel-scale filter banks, k = 0, 1, \dots, M-1
- W_j(k) is the filter function of the j-th filter in the bank

(Figure: distribution of the filters on the mel scale.)

Steps of the MFCC Method (Cont'd)
Step 4: compress the spectrum and apply the DCT to obtain the MFCC coefficients.
Here n = 0, \dots, L indexes the order of the MFCC coefficients.

Mel-Cepstrum Method
Processing pipeline:
Time signal -> Framing -> |FFT|^2 -> Mel scaling -> Logarithm -> IDCT -> Cepstra (keep the low-order coefficients) -> Differentiator -> Delta and Delta-Delta Cepstra

Mel-Cepstral Coefficients (MFCC)
(Figure.)

Properties of Mel-Cepstral Coefficients (MFCC)
- The mel filter-bank energies are mapped (via the DCT) onto the directions in which their variance is maximal.
- The resulting speech features are partially, though not completely, decorrelated from one another (an effect of the DCT).
- Good performance in clean environments.
- Reduced performance in noisy environments.

Time-Frequency Analysis
Short-term Fourier Transform
- Standard way of frequency analysis: decompose the incoming signal into its constituent frequency components.
- W(n): windowing function
- N: frame length
- p: step size
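To connect the preprocessing slides, the following is a minimal numpy sketch, not from the original slides, of the mel-cepstrum front end described above (framing, Hamming window, |FFT|^2, mel filter bank, logarithm, DCT); the sampling rate, frame length, step size, filter count, and cepstral order are illustrative assumptions.

import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced uniformly on the mel scale (step 2 above)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        for k in range(left, center):
            fbank[j - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[j - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, step=160, n_filters=26, n_ceps=13):
    """Framing -> Hamming window -> |FFT|^2 -> mel filter bank -> log -> DCT-II."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // step)
    idx = np.arange(frame_len)[None, :] + step * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)                # step 1: short-time analysis
    power = np.abs(np.fft.rfft(frames, n=frame_len)) ** 2       # |FFT|^2
    energies = power @ mel_filterbank(n_filters, frame_len, sample_rate).T  # step 2
    log_energies = np.log(energies + 1e-10)                     # logarithm
    n = np.arange(n_filters)
    dct_basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2 * n_filters))
    return log_energies @ dct_basis.T                           # step 4: DCT, keep low-order cepstra

# Toy usage on one second of synthetic audio:
t = np.linspace(0, 1, 16000, endpoint=False)
features = mfcc(np.sin(2 * np.pi * 440 * t))
print(features.shape)  # (number of frames, n_ceps)

Delta and delta-delta coefficients, shown at the end of the pipeline above, would be obtained by differencing these cepstra across frames.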
Critical Band Integration
- Related to the masking phenomenon: the threshold of a sinusoid is elevated when its frequency is close to the center frequency of a narrow-band noise.
- Frequency components within a critical band are not resolved.
- The auditory system interprets the signals within a critical band as a whole.

Bark Scale
(Figure.)

Feature Orthogonalization
- Spectral values in adjacent frequency channels are highly correlated.
- This correlation leads to a Gaussian model with many parameters: all elements of the covariance matrix have to be estimated.
- Decorrelation is useful to improve the parameter estimation (see the sketch below).
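As a rough illustration of this point, the following sketch, not from the original slides, builds synthetic channel-correlated log filter-bank features and shows that an orthonormal DCT (the decorrelating transform used in MFCC) concentrates the covariance on the diagonal, so a diagonal-covariance Gaussian needs far fewer parameters; the frame count, channel count, and smoothing kernel are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Simulate log filter-bank energies for 1000 frames of 26 channels.
# Smoothing across channels mimics the high correlation between
# adjacent frequency channels described above (values are synthetic).
raw = rng.normal(size=(1000, 26))
kernel = np.array([0.25, 0.5, 0.25])
features = np.apply_along_axis(lambda row: np.convolve(row, kernel, mode="same"), 1, raw)

def offdiag_energy(x):
    """Fraction of covariance energy that lies off the diagonal."""
    c = np.cov(x, rowvar=False)
    off = c - np.diag(np.diag(c))
    return np.sum(off ** 2) / np.sum(c ** 2)

# Orthonormal DCT-II basis, one basis vector per row.
N = features.shape[1]
n = np.arange(N)
dct = np.cos(np.pi * np.arange(N)[:, None] * (2 * n + 1) / (2 * N)) * np.sqrt(2.0 / N)
dct[0] /= np.sqrt(2.0)

decorrelated = features @ dct.T

print("off-diagonal covariance fraction, filter-bank domain:", offdiag_energy(features))
print("off-diagonal covariance fraction, after DCT:         ", offdiag_energy(decorrelated))
# The second number should be much smaller: after the DCT, most of the
# covariance sits on the diagonal, which is why diagonal-covariance models
# are commonly used with cepstral features.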