International Journal of Engineering Trends and Technology (IJETT) – Volume 22 Number 1 - April 2015

Speech Signal Analysis and Synthesis by Using LPC Technique

Ravindrbabu A, Rambabu A, Santhoshkumar G
M.Tech students, Department of EIE, GITAM University, Visakhapatnam, Andhra Pradesh, INDIA

Abstract - LPC analysis and synthesis is a primary task in techniques such as speech coding, speech recognition and speech enhancement. It involves processing a speech signal and reconstructing it. Speech analysis is the task of extracting the parameters encoded in the speech signal: it involves sampling the speech signal, framing the samples, overlapping the frames, windowing each frame, calculating the linear predictive coefficients (LPC) and extracting the residual, i.e. the encoded excitation signal. The filter used in analysis is an all-zero (FIR) filter. Speech synthesis involves computing the gain and reconstructing the speech signal from the encoded signal; the filter used for synthesis is an all-pole (IIR) filter. The performance of speech analysis and synthesis is evaluated by computing the signal-to-noise ratio (SNR) and the mean opinion score (MOS). The software used is MATLAB.

I. INTRODUCTION

Linear predictive coding (LPC) is a digital method for encoding an analog signal in which a particular value is predicted by a linear function of the past values of the signal. LPC is one of the compression methods that model the process of speech production: a linear prediction filter attempts to predict future values of the input signal from past samples. A speech coder converts a digitized speech signal into a coded representation and transmits it in the form of frames.

II. SPEECH CODING SYSTEM

In a speech coding system, the input speech signal, which is analog in nature, is first digitized using a filter, a sampler and an analog-to-digital (A/D) converter. The filter is an anti-aliasing filter: a low-pass filter placed before the sampler to remove all signal frequencies above the Nyquist frequency. This filtering avoids aliasing, which occurs when the sampling frequency is less than twice the bandwidth of the sampled speech signal. According to the Nyquist theorem, the sampling frequency must be at least twice the bandwidth of the continuous-time signal; in practice a sampling frequency of about 2.5 times the bandwidth of the analog speech signal is preferred.

A value of 8 kHz is commonly selected as the standard sampling frequency for telephone speech, since the telephone speech band extends from 300 to 3400 Hz; most speech coding systems were designed to support telecommunication applications by limiting the frequency content to this band. The sampler converts the analog speech signal into discrete form and feeds the A/D converter, whose output is the digitized speech signal. To maintain perceptual quality and make the digital speech signal indistinguishable from the input, each sample is quantized with 8 bits. The block diagram of a speech coding system is shown in Fig. 1. Throughout this paper the digital speech signal is assumed to have a sampling frequency of 8 kHz and 8 bits per sample; hence the input speech signal has a bit-rate of 64 kbps.

Fig. 1: Speech coding system.
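As a concrete illustration of this front end, the following is a minimal MATLAB sketch (MATLAB being the software used in this work) of converting a recording to the 8 kHz, 8-bit format assumed throughout. The file name speech.wav and all variable names are illustrative assumptions, not part of the original system.

    % Digitize speech for an 8 kHz / 8-bit coding system (illustrative sketch).
    [x, fsIn] = audioread('speech.wav');  % source recording, arbitrary rate
    x = mean(x, 2);                       % collapse to mono if stereo

    fsOut = 8000;                         % standard telephone sampling rate
    % resample() applies its own low-pass anti-aliasing filter before rate
    % conversion, keeping content below the new Nyquist frequency of 4 kHz.
    y = resample(x, fsOut, fsIn);

    % Quantize to 8 bits per sample, giving the 64 kbps input bit-rate
    % (8000 samples/s x 8 bits/sample) discussed above.
    yq = round(y * 127) / 127;
    bitRate = fsOut * 8;                  % = 64000 bps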
A. CLASSIFICATION OF SPEECH CODERS

Speech coders are classified based on the bit-rate at which they produce output of reasonable quality, and on the type of coding technique used to code the speech signal.

B. CLASSIFICATION BY BIT-RATE

The classification of speech coders based on bit-rate is shown in the table below.

Table: Classification of speech coders based on bit-rate

Type of coder               Bit-rate range
High bit-rate coders        > 15 kbps
Medium bit-rate coders      5 to 15 kbps
Low bit-rate coders         2 to 5 kbps
Very low bit-rate coders    < 2 kbps

Speech coders are thus classified as high, medium, low and very low bit-rate coders, depending on the bit-rate range at which they produce reasonable quality.

III. CLASSIFICATION BY CODING TECHNIQUES

Based on the coding technique used, speech coders are classified into three types, explained below: waveform coders, parametric coders and hybrid coders.

A. WAVEFORM CODERS

Waveform coders digitize the speech signal on a sample-by-sample basis. Their main goal is to make the output waveform resemble the input waveform, so waveform coders retain good speech quality. They are low-complexity coders that produce high-quality speech at data rates around and above 16 kbps; when the data rate is lowered below this value, the quality of the reconstructed speech degrades. Waveform coders are not specific to speech signals and can be used for any type of signal. There are two types: time-domain and frequency-domain waveform coders. Time-domain waveform coders use digitization schemes based on the time-domain properties of the speech signal; examples are Pulse Code Modulation (PCM), Differential Pulse Code Modulation (DPCM), Adaptive Differential Pulse Code Modulation (ADPCM) and Delta Modulation. Frequency-domain waveform coders segment the speech signal into small frequency bands and code each band separately using a waveform coder; in these coders the accuracy of encoding is varied dynamically between the bands to suit the requirements of the speech waveform. Examples are sub-band coders and adaptive transform coders.

B. PARAMETRIC CODERS

In parametric coders the speech signal is assumed to be generated by a model controlled by a small number of parameters corresponding to the speech production mechanism. These parameters are obtained by analysing the speech signal and are quantized before transmission. At the receiving end the decoder uses them to reconstruct the speech signal. The output is not an exact replica of the input, and the waveform resemblance is lost; however, the output speech sounds the same as the input speech. Using parametric coders it is possible to obtain very low bit-rates (below about 2.4 kbps) at reasonable quality.
To retain waveform-level quality one has to use waveform coders, taking advantage of the properties of both the speech production and auditory mechanisms, so that the resulting quality is good at the cost of increased bit-rates compared to parametric coders; at lower bit-rates, however, the quality attained by waveform coders is less than that of parametric coders. Examples of parametric coders are Linear Predictive Coding (LPC) and Mixed Excitation Linear Prediction (MELP) coders. This class of coders works well at low bit-rates, but increasing the bit-rate does not increase the quality, which is limited by the chosen model; for this type of coder the bit-rate is typically in the range of 2 to 5 kbps.

C. HYBRID CODERS

Hybrid coders try to fill the gap between waveform coders and parametric coders. They operate at medium bit-rates, between those of waveform and parametric coders, and produce higher-quality speech than parametric coders. A hybrid coder is a combination of a waveform coder and a parametric coder: like parametric coders it relies on a speech production model, and as in waveform coders an attempt is made to match the decoded signal to the original signal in the time domain. There are a number of hybrid encoding schemes, which differ in the way the excitation parameters are generated: some quantize the residual signal directly, while others substitute approximated quantized waveforms selected from an available set of waveforms. An example of a hybrid coder is the Code Excited Linear Prediction (CELP) coder.

Human speech generally contains two types of sounds: voiced sounds and unvoiced sounds.

D. VOICED SOUNDS

Voiced sounds are usually vowels. They often have high average energy levels and very distinct resonant (formant) frequencies. Voiced sounds are generated by air from the lungs being forced over the vibrating vocal cords.

E. UNVOICED SOUNDS

Unvoiced sounds are usually consonants. They generally have less energy and higher frequencies than voiced sounds. The production of an unvoiced sound involves air being forced through the vocal tract in a turbulent flow.

F. MATHEMATICAL MODEL OF SPEECH PRODUCTION

In LPC analysis, a decision is made for each frame as to whether it is voiced or unvoiced. If the frame is voiced, it is represented by an impulse train with non-zero taps occurring at intervals of the pitch period; in this work the pitch period is estimated by the autocorrelation method. If the frame is unvoiced, it is represented by white noise and the pitch period is set to zero, since the vocal cords do not vibrate. The excitation to the LPC synthesis filter is therefore either an impulse train or white noise, as sketched below.
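To make the per-frame decision concrete, the following is a minimal MATLAB sketch of a voiced/unvoiced classifier with autocorrelation pitch estimation. The voicing threshold of 0.3, the 50-400 Hz pitch search range and the function name are illustrative assumptions, not values fixed by the paper.

    function [voiced, pitchPeriod] = vuvPitch(frame, fs)
    % VUVPITCH  Voiced/unvoiced decision and pitch period by autocorrelation.
        frame = frame(:) - mean(frame(:));   % column vector, DC removed
        r = xcorr(frame, 'coeff');           % normalized autocorrelation
        r = r(numel(frame):end);             % keep lags 0, 1, 2, ...

        % Search for a peak over lags corresponding to 50-400 Hz pitch.
        minLag = round(fs / 400);
        maxLag = min(round(fs / 50), numel(r) - 1);
        [peak, idx] = max(r(minLag + 1:maxLag + 1));

        voiced = peak > 0.3;                 % assumed voicing threshold
        if voiced
            pitchPeriod = minLag + idx - 1;  % pitch period in samples
        else
            pitchPeriod = 0;                 % unvoiced: excitation is white noise
        end
    end

A voiced frame is then excited with an impulse train of period pitchPeriod samples, and an unvoiced frame with white noise, exactly as described above.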
IV. LPC SPEECH ENCODING/ANALYSIS

Analysis of the speech signal is employed in a variety of systems, such as voice recognition systems and digital speech coding systems. The encoding technique uses sampling, framing, overlapping and windowing, described below.

A. SAMPLING

Sampling converts the continuous signal into a discrete signal, for example the conversion of a sound wave into a sequence of samples.

Fig. 2: Sampling of a signal.

B. FRAMING

Normally a speech signal is not stationary, but seen from a short-time point of view it is; this results from the fact that the glottal system cannot change immediately. If a frame covers a stationary interval of duration t_st and the sampling frequency is f_s, the number of samples per frame is n = t_st * f_s.

C. OVERLAPPING

The overlap-add method is an efficient way to evaluate the discrete convolution of a very long signal x[n] with a filter h[n]: the problem is divided into multiple convolutions of h[n] with short segments of x[n]. Here the frames are taken with overlap so that the processed segments can later be recombined; the method preferred for overlapping the frames is the circular overlap-add method.

D. WINDOWING

For windowing we mainly use the Hamming window, because it has fewer disadvantages than the rectangular and Hanning windows. The Hamming window is a "raised cosine"; the raised cosine with these particular coefficients was proposed by Richard W. Hamming. It is given by

w(n) = 0.54 - 0.46 cos(2πn / (N - 1)), 0 ≤ n ≤ N - 1.

E. CALCULATING LPC COEFFICIENTS

LPC is based on linear equations that formulate a mathematical model of the human vocal tract, together with the ability to predict a speech sample from previous ones. The values of the reflection coefficients k1-k10 define the digital lattice filter that acts as the vocal tract in this speech synthesis system; new values of k1-k10 are needed for every 25 ms block of the utterance, and they are derived from the LPC coefficients, so the LPC values must be calculated first. The vocal tract is modelled by an all-pole transfer function of the form

H(z) = G / (1 - Σ_{k=1}^{p} a_k z^{-k}),

where p is the number of LPC coefficients (in our case p = 10), G is the gain and the a_k are the LPC coefficient values. To calculate the LPC values for each 25 ms block of the utterance, MATLAB's built-in lpc function is used:

a = lpc(Y, P)

where Y is a 25 ms segment of the utterance and P is the number of LPC coefficients (poles).

V. LPC SYNTHESIS/DECODING

The process of decoding a sequence of speech segments is the reverse of the encoding process: each segment is decoded individually, and the sequence of reproduced sound segments is joined together to represent the entire input speech signal. Synthesis uses an IIR (all-pole) filter, the inverse of the analysis filter, with transfer function

1 / A(z) = 1 / (1 - Σ_{k=1}^{p} a_k z^{-k}).

A. LPC MODEL

The particular source-filter model used here is the linear predictive coding model. It has two key components: analysis (encoding) and synthesis (decoding). The analysis part of LPC examines the speech signal and breaks it down into segments or blocks. The receiver performs LPC synthesis by using the received parameters to build a filter that, when provided with the correct input source, can accurately reproduce the original speech signal. Essentially, LPC synthesis tries to imitate human speech production.

B. ANALYSIS/SYNTHESIS TECHNIQUES

The transmitter analyses the original signal and acquires the parameters of the model, which are sent to the receiver. The receiver then uses the model and the parameters it receives to synthesize an approximation of the original signal, as in the sketch below.
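Putting the two halves together, the following is a minimal end-to-end MATLAB sketch of the analysis/synthesis loop described in Sections IV and V: framing with 50% overlap, Hamming windowing, LPC computation with lpc(), residual extraction through the FIR (all-zero) filter A(z), resynthesis through the IIR (all-pole) filter 1/A(z), overlap-add reconstruction, and the SNR measure mentioned in the abstract. The frame length, hop size and the assumption that x is a column vector of speech samples at fs = 8000 Hz are illustrative choices.

    fs       = 8000;
    frameLen = round(0.025 * fs);          % 25 ms frames, as in Section IV.E
    hop      = frameLen / 2;               % 50% overlap between frames
    p        = 10;                         % number of LPC coefficients
    w        = hamming(frameLen);

    xhat = zeros(size(x));                 % reconstructed signal (overlap-add)
    for start = 1:hop:numel(x) - frameLen + 1
        idx   = (start:start + frameLen - 1).';
        frame = x(idx) .* w;               % windowed analysis frame

        a        = lpc(frame, p);          % a = [1, -a1, ..., -ap]
        residual = filter(a, 1, frame);    % FIR (all-zero) analysis filter A(z)
        synth    = filter(1, a, residual); % IIR (all-pole) synthesis filter 1/A(z)

        xhat(idx) = xhat(idx) + synth;     % overlap-add reconstruction
    end
    xhat = xhat / 1.08;                    % approximate 50%-overlap Hamming gain (2 * 0.54)

    % Objective quality measure: signal-to-noise ratio in dB.
    snrdB = 10 * log10(sum(x.^2) / sum((x - xhat).^2));

In a full coder the residual is not transmitted directly; it is replaced by the impulse-train or white-noise excitation plus a gain, which is what makes the very low bit-rates of Section II.B possible.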
VI. LPC APPLICATION

In general, the most common use of speech compression is in standard telephone systems. Further applications of LPC and other speech compression schemes are voice mail systems, telephone answering machines and multimedia applications. Most multimedia applications, unlike telephone applications, involve one-way communication and involve storing the data.

VII. RESULTS

Fig. 3: Input signal before overlap. This is the input signal, plotted with time on the X-axis and amplitude on the Y-axis. The blue shaded portion of the signal represents the energy present in the signal.

Fig. 4: Input signal after overlap. It is obtained by overlapping the frames of the input signal; due to overlapping, the number of frames in the signal is increased.

Fig. 5: Analysis signal. This is the signal obtained after analysis (encoding), also called the residual signal. It is obtained by passing the input signal through the FIR analysis filter, also called the all-zero filter.

Fig. 6: Synthesis signal before removing overlap. This is the synthesis signal, also called the decoded signal. It is obtained by passing the analysis (residual) signal through the IIR synthesis filter, which is an all-pole filter.

Fig. 7: Synthesis signal after removing overlap. This is the synthesis signal after the overlap is removed; the number of frames is decreased accordingly.

Fig. 8: Window signal. This is the Hamming window signal, in which the main lobe dominates and the side lobes are strongly suppressed.

VIII. CONCLUSION

Linear predictive coding encoders break a sound signal into segments and send information about each segment to the decoder. The encoder sends information on whether a segment is voiced or unvoiced, and for voiced segments the pitch period, which is used to create the excitation signal in the decoder. The encoder also sends information about the vocal tract, which is used to build a filter on the decoder side; given the excitation signal as input, this filter reproduces the original speech.