Discriminating between Nasal and Mouth Breathing Peng Yuan B00541592 BSc(Hon’s) Computing Science Supervisor: Dr. Kevin Curran Interim Report, April 2010 1 Acknowledgements I would like to extend sincere thanks and appreciation to my project supervisor, Dr. Kevin Curran, for the initial idea of the project and his invaluable help and guidance throughout this year. I would also like to show my appreciation to all the staff in the library and labs, for their kind help and assistance during the last few months. Lastly I would like to thank my parents for their unending support during my studying abroad and all my fellow classmates for helping me in many ways throughout the project. 2 Abbreviations 3D Three Dimension AFS Amplitude Frequency Spectrum ANN Artificial Neural Network BPNN Back-propagation Neural Network CDSS Clinical Decision Support System CORSA Computerized Respiratory Sound Analysis DFT Discrete Fourier Transform DTFT Discrete Time domain Fourier Transform EPD End-point Detection FFT Fast Fourier Transform LPC Linear Predictive Coding LPCC Linear Predictive Cepstrum Coefficients MFCC Mel Frequency Cepstrum Coefficients UML Unified Modeling Language HOD High-order Difference HR Heart Rate RIP Respiratory Inductive Plethysmography RR Respiratory Rate STFT Short Time Fourier Transform ZCR Zero Crossing Rate 3 Table of Contents Acknowledgements ................................................................................................ 2 Abbreviations ........................................................................................................ 3 Table of Contents .................................................................................................. 4 Declaration ........................................................................................................... 8 Abstract ................................................................................................................. 9 1. Introduction ...............................................................................................10 1.1 Aims and Objectives ............................................................................................................................................. 10 1.2 Outline of Thesis ................................................................................................................................................... 10 2. Literature Review ......................................................................................12 2.1 Introduction to the exploitation of biological sound ............................................................................................... 12 2.2 The study of breathing pattern .............................................................................................................................. 12 2.3 Respiratory rate monitoring .................................................................................................................................. 13 2.4 Sound Analysis and Feature Extraction ............................................................................................................... 13 2.4.1 Filter Analysis of Sound Signal .................................................................................................................. 14 2.4.2 Feature Extraction .......................................................................................................................................... 
14 2.5 Digital signal Characteristics in Time Domain ...................................................................................................... 14 2.5.1 Short-time Energy .......................................................................................................................................... 15 2.5.2 the Average Zero Crossing Rate of Short-time .................................................................................... 15 2.6 Digital signal Characteristics in Frequency Domain ....................................................................................... 15 2.6.1 the Linear Predictive Coding (LPC) parameters .................................................................................. 15 2.6.2 the algorithm for LPC parameters ............................................................................................................. 16 2.6.3 the Linear Predictive Cepstrum Coefficients (LPCC) ......................................................................... 18 2.6.4 the Mel Frequency Cepstrum Coefficients (MFCC) ............................................................................ 18 2.7 3. Artificial Neural Network ....................................................................................................................................... 20 Requirements Analysis and Functional Specification ...............................22 3.1 Functional requirements ....................................................................................................................................... 22 3.2 Non-functional requirements ................................................................................................................................ 24 3.2.1 The frequency range to use ........................................................................................................................ 25 3.2.2 Placement of the sensor .............................................................................................................................. 25 3.3 4. Summary .............................................................................................................................................................. 25 Design ........................................................................................................27 4.1 MATLAB ............................................................................................................................................................... 27 4.1.1 Introduction of the related Matlab function ............................................................................................. 27 4.1.1.1 Short time spectrum analysis ..................................................................................................................... 27 4.2 Equipment ............................................................................................................................................................ 28 4.2.1 Acoustic sensor ............................................................................................................................................... 28 4.2.2 Sound Recorder .............................................................................................................................................. 29 4 4.3 System architecture .............................................................................................................................................. 
29 4.4 Data modeling ...................................................................................................................................................... 31 4.5 Analyzing methods ............................................................................................................................................... 32 4.6 Sound Signal Pre-processing ............................................................................................................................... 33 4.6.1 Cutting off frequency band .......................................................................................................................... 33 4.6.2 Filter Design ..................................................................................................................................................... 34 4.6.3 End-point Detection of the Signal ............................................................................................................. 36 4.6.3.1 Short-time average amplitude method .................................................................................................... 37 4.6.3.2 Short-time energy method ........................................................................................................................... 37 4.6.3.3 Short-time average zero crossing rate method .................................................................................... 37 4.7 the principle of Back-propagation Neural Network .............................................................................. 38 4.7.1 Feed-forward Calculation ............................................................................................................................. 39 4.7.2 the rules of weights adjustment in BP Neural Network ...................................................................... 39 4.7.3 5. The breath pattern classification flowchart ............................................................................................. 40 Implementation ..........................................................................................41 5.1 Pre-processing ................................................................................................................................................ 42 5.1.1 Digital Filter Applications .............................................................................................................................. 42 5.1.2 Apply filter to the digital signal .................................................................................................................... 43 5.2 Principles of Spectrum Analysis and Display ........................................................................................ 44 5.2.1 Short Time FFT Spectrum Analysis of Discrete Signal ...................................................................... 44 5.2.2 the dynamic spectrum display of the Pseudo-color coded Mapping ............................................. 45 5.2.3 Broad-band spectrum and Narrow-band spectrum ............................................................................. 46 5.2.4 Pseudo-color mapping and display of the spectrum ........................................................................... 46 5.2.5 Implementation within Matlab ..................................................................................................................... 
47 5.2.5.1 function specgram(FileName, Winsiz, Shift, Base, Coltype); ........................................................... 47 5.2.5.2 display the pseudo-color mapping graph ................................................................................................ 48 5.3 Feature Extraction .......................................................................................................................................... 49 5.3.1 Introduction to End-point Detection .......................................................................................................... 49 5.3.2 End-point Detection Error ............................................................................................................................ 50 5.3.3 the Zero Crossing Rate (ZRO) ................................................................................................................... 50 5.3.4 High-order Difference .................................................................................................................................... 51 5.4 Back-propagation Neural Network Algorithm and Implementation ................................................ 53 5.4.1 design of the artificial neural network ...................................................................................................... 53 5.4.2 Back-propagation neural network implementation .............................................................................. 53 5.4.2.1 initialization of the network .......................................................................................................................... 53 5.4.2.2 training samples .............................................................................................................................................. 54 5.4.2.3 calculate the actual output of the network .............................................................................................. 54 5.4.2.4 adjust the weights........................................................................................................................................... 54 6. 6.1 Evaluation ..................................................................................................58 Interface and Controls .................................................................................................................................. 58 5 6.2 End-point Detection Evaluation ................................................................................................................. 61 6.3 Breath Pattern Detection Evaluation ........................................................................................................ 63 7. 7.1 Conclusion .................................................................................................65 Future Work...................................................................................................................................................... 
65 References ...........................................................................................................66 Appendix A ..........................................................................................................68 Appendix B ..........................................................................................................69 Appendix C ..........................................................................................................70 Appendix D ..........................................................................................................72 Appendix E ..........................................................................................................73 Appendix F ..........................................................................................................76 6 Table of Figures Figure 1 Requirements analysis is in the first stage (Wikipedia) .......................................................... 22 Figure 2 System Use Case Diagram .................................................................................................... 23 Figure 3 System Sequence Diagram .................................................................................................... 24 Figure 4 An example of MATLAB simulation ........................................................................................ 27 Figure 5 Acoustic Sensor ...................................................................................................................... 29 Figure 6 Recorder ................................................................................................................................. 29 Figure 7 Proposed System Architecture ............................................................................................... 30 Figure 8 Proposed System Collaboration Diagram .............................................................................. 31 Figure 9 Data Flow Diagram ................................................................................................................. 32 Figure 10 Audio Signal Analysis ........................................................................................................... 33 Figure 11 Low Pass Filter at cutoff frequency 1000 (Hz) ...................................................................... 35 Figure 12 Bandpass filter at frequency range from 110 to 800 (Hz) ..................................................... 35 Figure 13 Frequency Response of several low-pass filters .................................................................. 36 Figure 14 Classify the breathing pattern ............................................................................................... 40 Figure 15 the overall procedure flowchart ............................................................................................. 41 Figure 16 Pre-processing flowchart ...................................................................................................... 42 Figure 17 the signal pass through a low-pass filter .............................................................................. 44 Figure 18 the spectrum of the audio signal ........................................................................................... 48 Figure 19 the spectrum of the audio signal ........................................................................................... 
48 Figure 20 Feature Extraction flowchart ................................................................................................. 49 Figure 21 Zero Crossing Rate ............................................................................................................... 51 Figure 22 End-point Detection using the ZCR and HOD ...................................................................... 52 Figure 23 design of the tow-layers artificial back-propagation neural network ..................................... 53 Figure 24 BP Neural Network training algorithm flowchart ................................................................... 55 Figure 25 The main interface ................................................................................................................ 58 Figure 26 The Control part of the interface ........................................................................................... 58 Figure 27 One sound file opened by the program. ............................................................................... 59 Figure 28 Prompt that inf ....................................................................................................................... 59 Figure 29 Displaying spectrum ............................................................................................................. 60 Figure 30 Result displaying text box ..................................................................................................... 60 Figure 31 Detection result showing hint ................................................................................................ 61 Figure 32 Original signal wave display ................................................................................................. 61 Figure 33 Mouth sound End-point Detection ........................................................................................ 62 Figure 34 Nasal sound End-point Detection ......................................................................................... 62 Figure 35 Mixed breath pattern sound End-point Detection ................................................................. 63 Figure 36 Nasal breath only breathing pattern detection ...................................................................... 64 Figure 37 Breath pattern detection for mixed breathing ....................................................................... 64 7 Declaration “I hereby declare that for a period of two years following the date on which the dissertation is deposited in the Library of the University of Ulster, the dissertation shall remain confidential with access or copying prohibited. Following the expiry of this period I permit the Librarian of the University of Ulster to allow the dissertation to be copied in whole or in part without reference to me on the understanding that such authority applies to the provision of single copies made for study purposes or for inclusion within the stock of another library. This restriction does not apply to the copying or publication of the title and abstract of the dissertation. 
IT IS A CONDITION OF USE OF THIS DISSERTATION THAT ANYONE WHO CONSULTS IT MUST RECOGNISE THAT THE COPYRIGHT RESTS WITH THE AUTHOR AND THAT NO QUOTATION FROM THE DISSERTATION, AND NO INFORMATION DERIVED FROM IT, MAY BE PUBLISHED UNLESS THE SOURCE IS PROPERLY ACKNOWLEDGED.”

[Peng Yuan]

Abstract

The advice to switch from mouth breathing to nose breathing is widely accepted by both patients and healthy people, and following it has a positive impact on daily life. This project attempts to discriminate nasal and mouth breathing patterns in sound files pre-recorded with an acoustic sensor, and further aims to detect and classify the two patterns with an artificial back-propagation neural network. Two participants took part in the experiment; each recording lasted approximately half a minute and was made while the participant sat on a chair in a quiet room. The first aim of the project is to establish the recognition rate achievable when classifying the breathing patterns; if this rate is high enough to separate the two patterns reliably, the second aim is to detect them and integrate the result into an intelligent device with an alarm and motivational feedback to help patients change their breathing from mouth to nose. The results show that the breathing pattern can be discriminated at certain positions on the body, both by visual inspection of the spectrum and by the purpose-built BP neural network classifier. Recordings made with the sensor placed on the hollow give the most promising accuracy, above 90%. However, performance for mixed breathing is not as good as for a single pattern (nasal only or mouth only), and the reasons for this are analysed theoretically.

1. Introduction

It is well known, and widely recommended by doctors to both healthy people and patients, that changing the breathing pattern from the mouth to the nose benefits individual health. The purpose of this project is first to investigate the principles of automated discrimination of breathing patterns using an acoustic sensor. If the two breathing types can be classified with high accuracy at certain sensor locations, the second goal is to integrate the classifier into a decision-support device that can discriminate between them, and to optimise the algorithms so that the motivational feedback system becomes more intelligent and works in a range of environments with improved classification accuracy.

1.1 Aims and Objectives

Two participants took part in the experiment; each recording lasted approximately half a minute and was made while the participant sat on a chair in a quiet room. The first aim of the project is to establish the recognition rate achievable when classifying the breathing patterns; if this rate is high enough to separate the two patterns reliably, the second aim is to detect them and integrate the result into an intelligent device with an alarm and motivational feedback to help patients change their breathing pattern from mouth to nasal.
1.2 Outline of Thesis

The final report contains seven chapters; an overview of each is given below.

Chapter 2 reviews the use of biological sounds as a reference for diagnosing disease and for monitoring the state of the body to assist treatment. It also presents background relevant to this project, including research on sound-signal analysis and processing methods and on artificial neural networks.

Chapter 3 performs the requirements analysis, covering functional and non-functional requirements together with use case and sequence diagrams derived from the user requirements. A summary of the project specification is also presented.

Chapter 4 gives a brief introduction to the programming language, analysis tools and equipment used in this project. It then presents the initial overall system architecture and data flow diagram along with the proposed implementation specification; the digital sound-signal analysis methods and the BP neural network are also designed at this stage.

Chapter 5 details the implementation of each technique designed for this project, such as the band-pass filter, end-point detection, pseudo-colour display and the back-propagation neural network.

Chapter 6 is the evaluation stage, which tests the performance of the application. The interface is briefly introduced and the detection results are displayed and analysed with several figures.

Chapter 7 concludes the report with a summary of the project and the work proposed for the future. The summary covers everything done so far, briefly discusses the results obtained, and identifies the remaining problems as topics for future work.

2. Literature Review

2.1 Introduction to the exploitation of biological sound

It is well known that normal lung sounds vary between individuals, and they also vary within a single day and across consecutive days (Mahagna and Gavriely, 1994). Against these background variations, changes specific to nasal or mouth breathing only become apparent when a larger number of subjects is studied. The aim of analysing breathing patterns with the help of a computer is to characterise and store them objectively. Because human hearing is attenuated at low frequencies, especially below 300 Hz, a computer-aided device that can record sound in that low-frequency range is essential for sound-based recognition. Since the equipment is non-invasive and the whole procedure is inexpensive, it has the potential to be used both for healthy people and for pneumonia patients, who are regarded as a high-risk group. Analysis of the lung-sound spectrogram could therefore be applied to incipient pneumonia patients before any radiologic abnormality appears or the condition worsens.

2.2 The study of breathing pattern

(Forgacs et al., 1971) showed that the intensity of breath sounds at the mouth is related to the forced expiratory volume in one second in people with chronic bronchitis and asthma.
Thanks to modern signal-processing technology, there is considerable potential for assessing the state of the ventilatory system from the respiratory sound signal. Much of the work has focused on wheezing sounds, whose frequency-band distribution can be seen clearly in the lung-sound spectrogram. (Malmberg et al., 1994) investigated the connection between the ventilatory system and the frequency spectrum of the breathing sound in healthy people and asthmatic patients, and found that in asthmatics the frequency distribution of the breath sound, expressed as the median frequency, reflected acute changes in airway obstruction with high sensitivity and specificity. Patient emergency care relies on a large amount of vital-sign information (Brown, 1997). The respiratory rate is a vital sign that accurately reflects potential illness and, if used correctly, is an essential marker of metabolic dysfunction that supports decision making in hospital (Gravelyn and Weg, 1980; Krieger et al., 1986). Basic clinical practice shows the importance of changes in breathing rate, so there has long been a need for reliable respiratory monitoring devices suitable for all-day, non-invasive use.

2.3 Respiratory rate monitoring

The Respi-Check Respiratory Rate Monitor (Intersurgical, 2009) is suitable for adults and older children. This electronic device uses an infrared sensor attached to the Respi-Check breathing mask as an indicator (Breakell and Townsend-Rose, 2001). The respiratory rate is detected continuously and the result shown on a digital display. Built-in audio and visual alarms are activated if no breathing is detected for 10 seconds, and separate alarms indicate cable disconnection and the need to change the battery.

Researchers at the University of Arkansas developed and evaluated two similar biosensors, differing only slightly, that can detect important physiological signs. Smart vests and fabrics based on organic semiconductors allow manufacturers to make light, flexible devices that are easily integrated into biomedical applications.

2.4 Sound Analysis and Feature Extraction

Based on research into the characteristics of the human voice and hearing, researchers have developed many theoretical models for sound-signal analysis, such as the Short Time Fourier Transform, filter analysis, LPC analysis and wavelet analysis (Jedruszek; Walker, 2003). These theories are widely used for sound coding, sound synthesis and feature extraction. In Linear Predictive Coding (LPC), features of the sound are extracted by computing linear-prediction coefficients of different orders; filter analysis first separates the frequency content of the signal with band-pass filters and then extracts frequency features by simulating the behaviour of the biological auditory nerve cells.

2.4.1 Filter Analysis of Sound Signal

The main component of filter analysis is a band-pass filter, which is used to separate and extract the useful frequency bands of the signal.
However, a complete filter-analysis model consists of a band-pass filter followed by non-linear processing, low-pass filtering, resampling at a lower rate and compression of the signal amplitude. Common choices for the non-linear stage are the sine function and a triangular windowing function. To smooth the abrupt changes introduced by the non-linear processing, the signal is then passed through a low-pass filter. The optional steps of resampling at a lower rate or compressing the amplitude are aimed at reducing the amount of computation in later stages.

2.4.2 Feature Extraction

The fundamental problem in sound-signal recognition is which features of the signal to choose and how to extract them. A sound signal is a typical time-varying signal; however, if it is examined on a millisecond scale it appears approximately stationary over short periods, so features are extracted from a sequence of these quasi-stationary segments to represent the original signal. In general, the characteristic parameters of a sound signal fall into two types: features in the time domain and features in the frequency domain obtained after transforming the signal. Time-domain parameters, such as the short-time average amplitude or the short-time average zero-crossing rate, are usually computed directly from the sample values in one frame. Frequency-domain features are obtained by transforming the original signal, for example with the Fast Fourier Transform, to produce LPCC or MFCC parameters. The former type is simple to compute but has a large number of dimensions and is not well suited to representing the amplitude spectrum; the latter requires a more complex transformation but can characterise the amplitude spectrum of the signal from several different angles.

2.5 Digital Signal Characteristics in the Time Domain

2.5.1 Short-time Energy

The short-time energy of a sound signal reflects how its amplitude varies over time and is defined as:

$$E_n = \sum_{m=-\infty}^{\infty} \left[ x(m)\, w(n-m) \right]^2$$

2.5.2 The Short-time Average Zero-Crossing Rate

In a discrete time-domain signal, a zero crossing occurs when two adjacent samples have different algebraic signs, for example 3 followed by -1, or -2 followed by 2 (Mocree and Barnwell; Molla and Keikichi, 2004). Because the sound signal is broad-band, a short-time version of the measure is used to extract the feature precisely. The short-time zero-crossing rate is defined as:

$$Z_n = \sum_{m=-\infty}^{\infty} \left| \mathrm{sgn}[x(m)] - \mathrm{sgn}[x(m-1)] \right| \, w(n-m)$$

where

$$\mathrm{sgn}[x(n)] = \begin{cases} 1, & x(n) \ge 0 \\ 0, & x(n) < 0 \end{cases}$$

and

$$w(n) = \begin{cases} \dfrac{1}{2N}, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
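As a concrete illustration of the two time-domain features defined above, the following MATLAB sketch computes the short-time energy and zero-crossing rate frame by frame. It assumes the recording has already been loaded into a column vector x; the frame length N = 512 and the non-overlapping framing are illustrative choices, not the parameters used in the project.

% Short-time energy and zero-crossing rate per frame (sections 2.5.1 and 2.5.2).
% Assumes x is the sampled breath sound as a column vector; N is the frame length.
N = 512;
nFrames = floor(length(x) / N);
E = zeros(nFrames, 1);                    % short-time energy En
Z = zeros(nFrames, 1);                    % short-time zero-crossing rate Zn
w = 1 / (2 * N);                          % rectangular window w(n) = 1/(2N)
for m = 1:nFrames
    frame = x((m-1)*N + 1 : m*N);
    E(m) = sum((w .* frame) .^ 2);        % En = sum of [x(m) w(n-m)]^2
    s = double(frame >= 0);               % sgn[x(n)]: 1 if x(n) >= 0, otherwise 0
    Z(m) = w * sum(abs(diff(s)));         % Zn = sum of |sgn x(m) - sgn x(m-1)| w(n-m)
end
plot(E); figure; plot(Z);                 % inspect how both features evolve over time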
2.6 Digital Signal Characteristics in the Frequency Domain

2.6.1 The Linear Predictive Coding (LPC) parameters

LPC analysis is based on the idea that the current sample of a signal can be approximated by a linear combination of the preceding samples. The LPC parameters are obtained by minimising the mean squared error between the actual samples and the linearly predicted samples.

2.6.2 The algorithm for the LPC parameters

In a linear predictive model, the value of the signal at point $n$, $s(n)$, is expressed as a linear combination of the $p$ samples preceding it:

$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \dots + a_p s(n-p)$$

where $a_1, a_2, \dots, a_p$ are constants. The model can then be written as:

$$s(n) = \sum_{k=1}^{p} a_k s(n-k) + G u(n)$$

where $G u(n)$ is the product of the gain coefficient and the normalised excitation (impulse). The predicted output of the system is:

$$\tilde{s}(n) = \sum_{k=1}^{p} a_k s(n-k)$$

so the prediction error of the system is:

$$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} a_k s(n-k)$$

Since the linear prediction error equals the product of the excitation and the gain coefficient:

$$e(n) = G u(n)$$

Define the short-time signal and error as:

$$s_n(m) = s(n+m), \qquad e_n(m) = e(n+m)$$

Then the sum of squared errors is:

$$E_n = \sum_m e_n^2(m) = \sum_m \left[ s_n(m) - \sum_{k=1}^{p} a_k s_n(m-k) \right]^2$$

Differentiating this expression with respect to each LPC parameter and setting the results to zero gives:

$$\sum_m s_n(m-i)\, s_n(m) = \sum_{k=1}^{p} \hat{a}_k \sum_m s_n(m-i)\, s_n(m-k), \qquad i = 1, 2, \dots, p$$

Defining the correlation function:

$$\varphi_n(i,k) = \sum_m s_n(m-i)\, s_n(m-k)$$

this becomes:

$$\varphi_n(i,0) = \sum_{k=1}^{p} \hat{a}_k\, \varphi_n(i,k), \qquad i = 1, 2, \dots, p$$

This is a system of $p$ equations in $p$ unknowns, and the LPC parameters are obtained by solving it. The minimum mean squared error of the system is:

$$\hat{E}_n = \sum_m s_n^2(m) - \sum_{k=1}^{p} \hat{a}_k \sum_m s_n(m)\, s_n(m-k) = \varphi_n(0,0) - \sum_{k=1}^{p} \hat{a}_k\, \varphi_n(0,k)$$

There are several ways to solve these equations, such as the autocorrelation method (solved with Durbin's recursion) and the covariance method. Durbin's recursion is:

$$E_n^{(0)} = R_n(0)$$

$$k_i = \frac{R_n(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R_n(i-j)}{E_n^{(i-1)}}$$

$$a_i^{(i)} = k_i$$

$$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \qquad 1 \le j < i$$

$$E_n^{(i)} = \left(1 - k_i^2\right) E_n^{(i-1)}$$

where the superscript $(i)$ denotes the $i$-th iteration; in each iteration only $a_1, a_2, \dots, a_i$ are calculated and updated, and the recursion continues until $i = p$.

2.6.3 The Linear Predictive Cepstrum Coefficients (LPCC)

In sound recognition systems the LPC parameters are seldom used directly; instead the Linear Predictive Cepstrum Coefficients (LPCC) derived from them are used, because the cepstrum improves the stability of the coefficients. The recurrence relation between the LPC and LPCC parameters is:

$$c_0 = \log G^2$$

$$c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad 1 \le m \le p$$

$$c_m = \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad m > p$$

where $c_0$ is the DC component and $a_j = 0$ for $j > p$.
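The sketch below illustrates sections 2.6.2 and 2.6.3 using MATLAB's built-in lpc function, which applies the autocorrelation method solved by Durbin's recursion, followed by the LPCC recurrence. The variable frame (one windowed frame of the breath signal) and the order p = 12 are assumptions for illustration; note that lpc returns the prediction-error filter, so the signs of its coefficients are flipped to match the convention used above.

% LPC and LPCC parameters for one frame (sections 2.6.2 and 2.6.3).
% 'frame' is one windowed frame of the breath signal; p = 12 is an assumed order.
p = 12;
[A, G2] = lpc(frame, p);          % autocorrelation method solved by Durbin's recursion
a = -A(2:end);                    % predictor coefficients a_k in the convention above
c0 = log(G2);                     % c0, taking G2 as the prediction-error power
c  = zeros(1, p);                 % LPCC coefficients c_1 ... c_p
for m = 1:p
    acc = 0;
    for k = 1:m-1
        acc = acc + (k / m) * c(k) * a(m - k);
    end
    c(m) = a(m) + acc;            % c_m = a_m + sum_{k=1}^{m-1} (k/m) c_k a_{m-k}
end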
2.6.4 The Mel Frequency Cepstrum Coefficients (MFCC)

The LPC parameters are acoustic features derived from research on the human voice-production mechanism, whereas the Mel Frequency Cepstrum Coefficients (MFCC) come from research on the human hearing system. The underlying observation is that when two tones of similar frequency occur at the same moment, only one of them is heard. The critical bandwidth is the boundary at which a listener perceives a sudden change: when the frequency separation of the tones is less than this boundary, the two tones are usually mistaken for one, which is called the masking (shielding) effect. The Mel scale is one way of measuring the critical bandwidth, and the MFCC calculation is based on the Mel frequency, whose relation to linear frequency is:

$$f_{mel} = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right)$$

The MFCC is calculated per frame: first the power spectrum $S(n)$ of the frame is obtained with the Fast Fourier Transform, and it is then converted to the power spectrum on the Mel frequency scale. Before doing so, the signal is passed through a bank of band-pass filters:

$$H_m(n), \qquad m = 0, 1, \dots, M-1, \quad n = 0, 1, \dots, \frac{N}{2}-1$$

where $M$ is the number of filters and $N$ is the frame length. The MFCC calculation process is (Mocree, 1995):

a. Choose the frame length $N$, pre-emphasise each frame $s(n)$ and apply the discrete FFT; the power spectrum $S(n)$ is obtained by squaring the modulus of the result.

b. Compute the power of $S(n)$ passing through each of the $M$ filters $H_m(n)$, that is, the sum of the products of $S(n)$ and $H_m(n)$ over the discrete frequencies; the results are the parameters $P_m$, $m = 0, 1, \dots, M-1$.

c. Take the natural logarithm of each $P_m$ to obtain $L_m$, $m = 0, 1, \dots, M-1$, then apply the discrete cosine transform to obtain $D_m$, $m = 0, 1, \dots, M-1$.

d. Discard $D_0$, which represents the DC component; $D_1, D_2, \dots, D_k$ are taken as the MFCC.

However, the standard MFCC only captures the static features of the sound signal. To obtain the dynamic features, to which human hearing is more sensitive, the differential cepstrum is introduced:

$$d(n) = \frac{\sum_{i=-k}^{k} i \cdot c(n+i)}{\sqrt{\sum_{i=-k}^{k} i^2}}$$

where $c$ and $d$ are the parameters of one frame and $k$ is a constant; the differential coefficient of the current frame is thus a linear combination of the coefficients of the two preceding and two following frames, and the equation above is called the first-order differential of the MFCC. Applying the same equation to the first-order differential gives the second-order differential, and so on. In practice the MFCC and its differentials of different orders are merged into one vector as the actual MFCC of a frame. In the description of acoustic features, low-order coefficients cannot represent the sound signal precisely, while higher orders lead to more complicated calculations, so it is essential to choose an appropriate order; most sound recognition systems use an order in the range 10 to 15 for the LPC, LPCC or MFCC.

2.7 Artificial Neural Network

The Artificial Neural Network (ANN), composed of a large number of simple processing units, is an interconnected, non-linear, self-adaptive information processing system. Inspired by the achievements of modern neuroscience, an ANN tries to imitate the way a biological neural network processes and stores large amounts of information in parallel. In an artificial neural network each processing unit may represent a different object, such as a feature, a letter, a sentence or a meaningful abstract pattern. The network is built from three main components: the input layer, the hidden layer and the output layer. The input layer receives signals and data from the outside world and the output layer gives the result after processing by the network. The hidden layers lie between the input and output layers and cannot be observed directly from outside the system; their number is flexible, and the more hidden layers the network has, the more complex its computation becomes and the larger the amount of intricate information it can handle. The weights between neurons reflect the connection strength of the processing units, and the pattern of connections represents how the input information is expressed and processed. In general, an artificial neural network is a non-procedural, self-adaptive information processing system that works in a brain-like style; in essence, it adjusts the connections and weights between units to achieve parallel and distributed information processing. It is an interdisciplinary field involving neuroscience, cognitive science, artificial intelligence, computer science and other areas.

3. Requirements Analysis and Functional Specification

User requirements analysis determines the product-specific behaviour and documents the customer needs from which the functional and non-functional requirements are generated. An initial analysis attempts to define the project clearly and to specify the proposed solution approach.

Figure 1 Requirements analysis is the first stage (Wikipedia)

Requirements analysis is the first stage in the systems development process, followed by functional specification, software architecture, software design, implementation, software testing, software deployment and software maintenance.

3.1 Functional requirements

Functional requirements define the functions of the software based on the user requirements analysis, specifying the inputs, outputs, external interfaces and other special management information needs. The basic functional requirements are listed below:

a. Collect recordings of breathing, with a single nasal pattern, a single mouth pattern or mixed patterns, at different places on the body using the acoustic sensor.

b. Collect recordings of breathing with single nasal, single mouth or mixed patterns from two participants.

c. Find the sensor placement that gives the best performance and the highest accuracy in pattern detection.
It is an interdisciplinary fields that involves neural science, thinking science, artificial intelligence, computer science, etc. 21 3. Requirements Analysis and Functional Specification The user requirements analysis means to determine the product-specific performance and document the customer needs that generate the functional and non-functional requirements. An initial analysis is in a trying to define the project clearly and specify the proposed solution approach. Figure 1 Requirements analysis is in the first stage (Wikipedia) Requirements analysis is the first stage in the systems development process followed by function specification, software architecture, software design, implementation, software testing, software deployment and software maintenance. 3.1 Functional requirements Functional requirements define the function of certain software based on the user requirement analysis which specifies the inputs, outputs, external interface and other special management information needs. The basic functional requirements are listed as bellow: a. Collect the record sound of breath either with single nasal pattern or mouth pattern or mixed patterns in different place of the body with acoustic sensor. b. Collect the record sound of breath either with single nasal pattern or mouth pattern or mixed patterns in two participants. c. Find the place that has the best performance with the sensor and the highest accuracy in the pattern detection. 22 d. Try to find whether the time when record the sound file has an effect to the detection result, say in the morning or in the late after, etc. e. Discriminate the nasal and mouth breath with mixed patterns in a short time to identify the shortest time needed for the analysis. f. Improve the of algorithm to detect the nasal and mouth breath pattern in different of situations with better performance. Here is the system Use Case diagram: Figure 2 System Use Case Diagram 23 Here is the System Sequence Diagram: Figure 3 System Sequence Diagram 3.2 Non-functional requirements The non-functional requirements means the requirement does not relate to the functionality but how it performs its task concerning the attributes such as reliability, efficiency, usability, maintainability and portability. For this project the non-functional requirement is shown as bellow: a. The pre-recorded sound should be discriminated no matter when it was recorded and who it was recorded from. 24 b. The discrimination should perform within the specific time such as when breathe with a certain pattern last for 3 seconds it can then be detected c. The device should be easy to use for end users, for non-invasive to human body and nonobstruct to the daily life. d. Easy to transfer the program from one device to another. e. Facilitate to maintenance in the follow-up usage. 3.2.1 The frequency range to use For a healthy person, the frequency band of the vesicular breathing sounds is range from 0 all the way to 1000 (Hz), meanwhile the power spectrum shows the main energy lies between the frequency 60 and 600 (Hz) (Pasterkamp et al., 1997). (Gross et al., 2000) also illustrate some other sound like wheezing has been carried by the frequency over 2000 (Hz). The general lung sound detection using the frequency in low, middle and high band which is 100 to 300, 300 to 600, 600 to 1200 (Hz) respectively (Gross et al., 2000; Sanchez and Pasterkamp, 199; Soufflet et al., 1990; Shykoff et al., 1988). So this project focus on the frequency band 1 to 1200 (Hz). 
3.2.2 Placement of the sensor

This project focuses on frequencies below 1200 Hz. To explore the different sounds at different locations, the sensor was placed at five areas of the body: the chest, the chin, the hollow, the right shoulder and the left shoulder. The performance at each location should be assessed before feature extraction and classification with the BP neural network. Because a very sensitive sensor is used, noise can easily mislead the detection, so removing the noise is the first consideration before pre-processing.

3.3 Summary

The first step is to find the best place for the sensor and to build the recording system that captures the breath sound inside the body as a digital file for analysis. Before the sound analysis there is a pre-processing stage to remove noise and so ease detection at a later stage. Some locations perform well for the analysis system and others do not, so this project also assesses recordings from different people made at different times of day. Different frequency bands are used in different steps of the project, and deciding how to use them is the main problem to be solved. Discriminating the breathing pattern in daily life outside the laboratory is the final goal of this work but, as far as is known (Baird and Neuman, 1992), no such frequently worn device has yet reached product level.

4. Design

4.1 MATLAB

MATLAB is a numerical computing environment and fourth-generation programming language (Wikipedia, MATLAB). It is a software package for engineering analysis that is powerful enough to meet almost any need an engineer may have. MATLAB has strong graphics capabilities that make it simple to draw almost anything required, and a powerful simulation capability: the analysis toolboxes contain hundreds of functions with which an engineer can simulate a program to see how it will perform, and then modify and improve it.

Figure 4 An example of MATLAB simulation

4.1.1 Introduction to the related MATLAB functions

This section introduces several MATLAB functions related to spectrum analysis and display, together with the derivation of the relevant equations and the choice of appropriate parameters.

4.1.1.1 Short-time spectrum analysis

A. Framing and windowing functions (Eva and Anders, 1999)

The windowing functions provided by MATLAB include hamming(N), hanning(N), blackman(N) and bartlett(N), where N is the window length (frame length). Each windowing function has its own characteristics and is used in different situations as required; in this case a Hamming window is applied to the original audio signal. The frame length N is normally chosen as a power of 2, such as 512 or 1024, to ease the calculation of the Fast Fourier Transform (FFT), although any constant could be used.

B. The Fast Fourier Transform (FFT) function

MATLAB provides the function fft(S), where the parameter S is one frame of the windowed signal. Note that the frequency-domain samples of a real-valued signal after the FFT are symmetrical about the mid-point (half the sampling frequency), so only the first half of the result of fft(S) is needed.

C. Obtaining the conjugate of a complex number

MATLAB provides the function conj(Z) to obtain the conjugate of the complex value Z; here the parameter Z is the result matrix of fft(S). This function can also be used to calculate the amplitude |X(m, k)| of the complex quantity X(m, k).
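Putting parts A to C together, a minimal sketch of one frame of short-time spectrum analysis might look as follows; x and fs are assumed to hold the loaded recording and its sampling rate, and the frame length of 1024 is only an example.

% One frame of short-time spectrum analysis using the functions described above.
% Assumes x is the recorded signal as a column vector and fs its sampling rate.
N = 1024;                           % frame length, a power of two
frame = x(1:N) .* hamming(N);       % apply a Hamming window to the first frame
X = fft(frame);                     % Fast Fourier Transform of the windowed frame
X = X(1:N/2);                       % the spectrum is symmetric, keep the first half
mag = sqrt(X .* conj(X));           % amplitude |X(m,k)| obtained via the conjugate
f = (0:N/2 - 1)' * fs / N;          % frequency axis in Hz
plot(f, mag); xlabel('Frequency (Hz)'); ylabel('|X(m,k)|');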
4.2 Equipment

The equipment needed to build the system was provided by Axelis Ltd. A brief introduction to it follows.

4.2.1 Acoustic sensor

The acoustic sensor used was supplied by Axelis Ltd. and is covered by United States Patent No. US 6,937,736, filed by Measurement Specialties Inc. in 2005.

Figure 5 Acoustic Sensor

4.2.2 Sound Recorder

Connected to the acoustic sensor is a recorder that can store the sound sensed by the sensor directly onto a plugged-in flash drive.

Figure 6 Recorder

The recorder is designed to be easy to use, with a flash drive connected through a USB port and a rechargeable battery that allows the user to record anywhere convenient. When recording begins, it automatically saves the sound to the flash drive as a high-quality Windows audio file, ready for analysis.

4.3 System architecture

The system architecture is the overall hardware/software configuration and database design, including the subsystem components that support a particular application. It usually includes a mapping of functionality onto hardware as well as software components. By creating the system architecture, the system is decomposed into small structural elements and subsystems, which simplifies the problem by dividing the whole system into reasonably independent pieces that can be solved separately.

Figure 7 Proposed System Architecture

Figure 7 shows how the proposed system works. First the acoustic sensor is worn by the user; it senses the sound inside the body and passes the sensed data to a sound recorder. After the sound has been converted to a suitable format, the sound analyser analyses the file, and if an inappropriate breathing pattern is detected the analyser informs the alarm system, which gives appropriate recommendations.

Figure 8 Proposed System Collaboration Diagram

4.4 Data modeling

Data modeling creates a data model that describes the data flow in the software. It defines and analyses the data requirements needed to support the software, presents the associated data and defines the relationships between the data components and structures. Here is the data flow diagram:

Figure 9 Data Flow Diagram

Figure 9 shows how data flows through the whole process. After the sound has been sensed by the acoustic sensor it is stored in the sound recorder ready for processing, and the processed data is then passed to the analyser for analysis.

4.5 Analyzing methods

In the last few years MATLAB has become the main tool for processing data and mathematical models; it is widely used in academic research and in commercial products, and its power in handling mathematics is well established (Brandt, 2005). One great advantage of using MATLAB for audio-signal analysis is that the user is forced to understand the processing and its results more thoroughly than with menu-based software, which can be operated without such understanding.

Figure 10 Audio Signal Analysis

This also means that once the user passes the initial threshold, he or she becomes specialised in a particular field of analysis with MATLAB. In addition, MATLAB's processing path is automatically traceable, which means the user has a clear view of what happens at any point in the middle of the processing; this is especially important in some analyses.
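As a concrete first step for such an analysis, the sketch below simply loads one pre-recorded breath-sound file and plots its waveform; the file name breath_nasal.wav is a placeholder for an actual recording.

% Loading one pre-recorded breath sound for analysis; the file name is a placeholder.
[x, fs] = wavread('breath_nasal.wav');   % use audioread on newer MATLAB releases
x = x(:, 1);                              % keep a single channel
t = (0:length(x) - 1)' / fs;              % time axis in seconds
plot(t, x); xlabel('Time (s)'); ylabel('Amplitude');
title('Recorded breath sound');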
4.6 Sound Signal Pre-processing

4.6.1 Cutting off frequency bands

Research shows that the frequency content of lung sounds lies mainly below 1000 Hz, and this project focuses on frequencies under 1200 Hz. Different frequency bands serve different purposes in this experiment: the low band under 100 Hz is separated out for extracting acoustic features, frequencies above 1200 Hz carry a lot of noise and are filtered out at the first stage, and the band in between is used for end-point detection. The original signal is therefore passed through several band-pass filters to cut out the required frequency bands. However, unlike a speech signal radiated from the lips, which has an attenuation of about 6 dB/oct, the pre-recorded signal does not need to be pre-emphasised, because it comes from the acoustic sensor attached to the hollow.

4.6.2 Filter Design

Based on the theory above, a Butterworth filter can be designed with the MATLAB function 'butter':

[v, u] = butter(order, Wn, function)

where the parameter 'order' is the order of the filter; a larger order gives a better filtering effect but also a larger amount of computation. The length L of the parameter vectors 'u' and 'v' is related to the order by:

$$L_{u,v} = \text{order} + 1$$

The parameter 'Wn' is the normalised value of the frequency to be filtered. If the sampling frequency is $f_s$, the highest frequency that can be processed is $f_s/2$, so if the frequency to be filtered out is $f$ (for example 2000 Hz), then:

$$W_n = \frac{f}{f_s / 2}$$

The parameter 'function' is a string indicating the type of filter; for example, function = 'low' gives a low-pass filter and function = 'high' a high-pass filter.

Figure 11 Low Pass Filter at cutoff frequency 1000 (Hz)

As the frequency response shown above illustrates, when the original signal passes through the filter, each frequency is attenuated by a factor between 1 and 0 accordingly; it is clearly a low-pass filter with a cutoff frequency of 1000 Hz.

Figure 12 Bandpass filter at frequency range from 110 to 800 (Hz)

From the relation $L_{u,v} = \text{order} + 1$, the higher the order of the filter, the more effective it is, since the parameter vectors 'u' and 'v' become longer, but it also requires more computation. Conversely, decreasing the order means shorter 'u' and 'v' vectors and less computation, but a poorer filtering effect.

Figure 13 Frequency Response of several low-pass filters

It is evident from the figure above that the filter becomes increasingly effective as the order is raised from 1 to 8.
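As an illustration of the filter design described above, the following sketch builds the 110 to 800 Hz band-pass filter of Figure 12 and applies it to a recording; the sampling rate of 44100 Hz and the order of 4 are assumptions, and the output vectors [b, a] of butter correspond to 'v' and 'u' in the text.

% Band-pass Butterworth filter as in Figure 12 (pass band 110 to 800 Hz).
% fs = 44100 is an assumed sampling rate; x is the recorded signal.
fs    = 44100;
order = 4;
Wn    = [110 800] / (fs / 2);       % normalised cut-off frequencies
[b, a] = butter(order, Wn);         % a two-element Wn gives a band-pass filter
freqz(b, a, 1024, fs);              % inspect the frequency response of the filter
y = filter(b, a, x);                % apply the filter to the recorded signal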
4.6.3 End-point Detection of the Signal

The features of the sound signal affect the performance of the whole recognition system, and end-point detection, that is, detecting the beginning and the end of the meaningful part of the sound signal, is the prerequisite for feature extraction. Many time-domain methods can be used for end-point detection; typical ones are the short-time average amplitude, the short-time average zero-crossing rate and the short-time energy.

4.6.3.1 Short-time average amplitude method

When the meaningful part of the signal appears, the short-time average amplitude changes markedly. End points can be detected from this change, and the short-time average amplitude is calculated as:

$$y(i) = \sum_{n=1}^{N} |x_i(n)|$$

4.6.3.2 Short-time energy method

In most practical experiments the short-time average amplitude is replaced by the short-time energy to describe the amplitude features of the sound signal. Several ways to calculate the energy are:

$$e(i) = \sum_{n=1}^{N} |x_i(n)|$$

which is called the absolute energy,

$$e(i) = \sum_{n=1}^{N} x_i^2(n)$$

which is called the square energy, and

$$e(i) = \sum_{n=1}^{N} \log x_i^2(n)$$

which is called the logarithm energy. The short-time energy rises sharply when the useful part of the signal begins and falls gradually after it ends, so, as discussed, it is also a good basis for end-point detection.

4.6.3.3 Short-time average zero crossing rate method

In general acoustic signals most of the energy lies in the higher frequency bands, and a higher frequency means a higher zero-crossing rate, so the energy is related to the zero-crossing rate. The breathing sound, however, is very unlike a normal speech signal. Firstly, a large part of the recorded file is noise, because the equipment is a very sensitive sensor that picks up the breathing sound inside the body together with noise from the skin and from airflow across the skin; sometimes the noise is much larger than the useful breathing sound. Secondly, most of the energy lies in the frequency band below 100 Hz, which human hearing can barely perceive, and this is also the band from which the most useful features are extracted. The usual assumption that the band carrying most of the energy also has a high zero-crossing rate therefore cannot be relied on for this signal.
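A simplified sketch of how these measures might be combined for end-point detection is given below; it reuses the per-frame energy E and zero-crossing rate Z from section 2.5, and the thresholds are illustrative assumptions rather than the values actually tuned for this project.

% Simplified end-point detection combining short-time energy and zero-crossing rate.
% E and Z are the per-frame features computed as in section 2.5; the thresholds
% are illustrative assumptions, not the tuned values used in the project.
eTh = 2.0 * median(E);                       % assumed energy threshold
zTh = 1.5 * median(Z);                       % assumed zero-crossing threshold
active = (E > eTh) | (Z > zTh);              % frames likely to contain breath sound
d = diff([0; active(:); 0]);
starts = find(d == 1);                       % first frame of each detected segment
stops  = find(d == -1) - 1;                  % last frame of each detected segment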
4.7 The principle of the Back-propagation Neural Network

The back-propagation learning algorithm is also called the BP algorithm, and the artificial neural network trained with it is known as a BP network. BP learning is a supervised algorithm for multilayered feed-forward neural networks. In a single-layer ANN without a hidden layer, the δ learning rule can be applied directly to the input and output sample data to train the network. A multilayered feed-forward perceptron, however, introduces one or more hidden layers whose target outputs are unknown to the network, so the output error of a hidden layer cannot be calculated directly and the supervised learning algorithm used for training a single-layer perceptron no longer works. It is vitally important to note that back propagation refers to propagating the output errors backwards, not to feeding the output of the network back to the hidden or input layers. The network itself has no feedback connections; it only back-propagates the output errors in order to adjust the connection weights of the hidden layers and the output layer, so the BP network should be regarded not as a nonlinear dynamic system but as a nonlinear mapping system.

4.7.1 Feed-forward Calculation

Consider a two-layer neural network that introduces one hidden layer between the input and output layers; the direction of the arrows indicates the way the information flows through the network. The node pointed to by an arrow belongs to the lower layer and the node at the arrow's tail belongs to the upper layer. The input to node j for a given training sample can then be expressed as:

net_j = Σ_i o_i · w_ij

where o_i is the output of node i in the upper layer and w_ij is the connection weight between node i in the upper layer and node j in the current layer; for the input layer the output of a node is always equal to its input. The output o_j of node j is the transformation of its input given by:

o_j = f_s(net_j) = 1 / (1 + e^(-net_j))

and this output o_j is taken as the input of the nodes in the lower layer. Abstracting the expression above into the function:

f_s(a) = 1 / (1 + e^(-a))

the derivative of the output o_j is:

f_s'(a) = -1/(1 + e^(-a))^2 · (-e^(-a)) = f_s(a)[1 - f_s(a)]

4.7.2 The rules of weight adjustment in the BP Neural Network

If the target output of node j in the output layer is t_j, the output error is t_j - o_j. This error value is propagated back from the output layer to the hidden layers, and the weights are adjusted continually so as to decrease the error. The error function for the network is:

e = (1/2) Σ_j (t_j - o_j)^2

In order to make the error e decrease, the weight adjustment should follow the gradient descent of the error function, that is:

Δw_ij = -η ∂e/∂w_ij

where η is a gain coefficient greater than zero.

4.7.3 The breath pattern classification flowchart

Figure 14 Classify the breathing pattern

5. Implementation

The whole procedure of this project involves five stages. In the user stage the audio files that contain the breathing sound are recorded with the acoustic sensor by two people, and there are three types of sound file: one with mouth breathing only, one with nose breathing only, and a third that mixes mouth and nose breathing. Both of us recorded the files in a quiet room, sitting on a chair and breathing smoothly for about half a minute. The second stage is pre-processing: before feature extraction the pre-recorded sound file is filtered to remove certain frequency bands, windowed to smooth the signal, and transformed with the Fast Fourier Transform. Feature extraction is the third stage, which involves processes such as end-point detection and passing the signal through a band-pass filter. The Back-propagation Neural Network is built after the feature extraction; about fifty training samples, including mouth breathing and nose breathing, are used to train the network and adjust its weights. In the recognition stage the testing data are fed into the trained network, which uses the weights obtained during training, and some expert experience is added to the detection result manually. The whole process is shown in the flowchart below:

Figure 15 the overall procedure flowchart

5.1 Pre-processing

The pre-processing procedure consists of several steps. After loading the data from the audio file, the first step is to filter out the noise at frequencies above 1200 Hz; the signal is then cut into smaller frames and each frame is windowed, the Fast Fourier Transform is applied to each windowed frame, and finally the spectrum map is obtained by pseudo-color mapping. The steps are illustrated by the flowchart below:

Figure 16 Pre-processing flowchart
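A condensed sketch of this pre-processing chain is given below; it is a simplified version of the full script in Appendix C, and the file name 'both.wav', the frame length and the frame shift are assumptions chosen only for illustration:

[x, fs] = wavread('both.wav');             % assumed file name; stereo recording
x = x(:,2);                                % use the second channel, as in the appendices
[b, a] = butter(5, 1200/(fs/2), 'low');    % remove the noise above 1200 Hz
x = filter(b, a, x);
winSize = 1024;  shift = winSize/2;        % assumed frame length and frame shift
frames = buffer(x, winSize, winSize - shift, 'nodelay');          % cut the signal into overlapping frames
frames = frames .* repmat(hamming(winSize), 1, size(frames,2));   % window each frame
spec = abs(fft(frames));                   % FFT of every windowed frame
spec = spec(1:winSize/2 + 1, :);           % keep the useful half of the spectrum for display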
5.1.1 Digital Filter Applications

Theoretically, a filter is constituted by two vectors 'u' and 'v' of length m and n respectively:

u = [u_1, u_2, ..., u_m],  u_1 = 1
v = [v_1, v_2, ..., v_n]

Applying the digital filter with parameter vectors 'u' and 'v' to a discrete audio signal s(t) gives the filtered signal S(t) defined by:

u * S(t) = v * s(t)

and after expanding the polynomial, S(t) can be written as:

S(t) = v_1 s(t) + v_2 s(t-1) + ... + v_n s(t-n+1) - u_2 S(t-1) - u_3 S(t-2) - ... - u_m S(t-m+1)

For instance, choosing the particular values u = [1] and v = [1/4, 1/4, 1/4, 1/4], the output of the filter is:

S(t) = [s(t) + s(t-1) + s(t-2) + s(t-3)] / 4

This customized filter takes the average of the previous four points and therefore has a low-pass effect: it attenuates the high-frequency content of the original signal by averaging it out while leaving the low-frequency content relatively untouched. Such a filter is called a Low Pass Filter.

5.1.2 Apply filter to the digital signal

In order to remove the noise from the sound signal, the signal is passed through a low-pass filter in which frequencies below 1200 Hz pass while frequencies above 1200 Hz are filtered out. The original and filtered signals are shown below:

Figure 17 the signal pass through a low-pass filter

5.2 Principles of Spectrum Analysis and Display

5.2.1 Short Time FFT Spectrum Analysis of a Discrete Signal

The spectrum analysis of the signal is based on the Short Time Fourier Transform (STFT) of the discrete-time signal. The sampled discrete-time signal can be expressed as x(n), where n = 0, 1, ..., N-1 is the sample index and N is the signal length. In digital signal processing the signal is usually framed by applying a window to it, so x(n) can be expressed as x_m(n), where n = 0, 1, ..., N-1, 'm' is the frame number, 'n' is the time index within the frame, and N is the number of samples within one frame, known as the frame length. The Discrete Time domain Fourier Transform (DTFT) of the windowed signal x_m(n) is:

X(m, e^(jω)) = Σ_{n=0}^{N-1} w_m(n) · x_m(n) · e^(-jωn)

In order to simplify the discrete calculation, the Discrete Fourier Transform (DFT) of w_m(n) · x_m(n) is usually used instead:

X(m, k) = Σ_{n=0}^{N-1} w_m(n) · x_m(n) · e^(-j2πnk/N),  k = 0, 1, ..., N-1

Then |X(m, k)| is the estimated short-time amplitude spectrum of the frame x_m(n). Taking m as the time variable and k as the frequency variable, |X(m, k)| is the dynamic spectrum of the signal x(n). Since the level in decibels (dB) can be calculated as:

dB(x(n)) = 20 · log10(|X(m, k)|)

the dynamic spectrum of the signal can be displayed in dB. The calculation of |X(m, k)| is again simplified by using the Fast Fourier Transform (FFT) (Cooley and Tukey, 1965).
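For a single frame, the dB spectrum used for display can be obtained along these lines (a sketch only, assuming 'frame' holds one windowed frame of even length taken from the pre-processing step above):

Xk = fft(frame);                      % DFT of one windowed frame of length N
Xk = Xk(1:length(frame)/2 + 1);       % keep the non-redundant bins, k = 0 ... N/2
magdB = 20*log10(abs(Xk) + eps);      % short-time amplitude |X(m,k)| expressed in dB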
5.2.2 The dynamic spectrum display with pseudo-color coded mapping

Taking 'm' as the abscissa, 'k' as the ordinate and the value of |X(m, k)| as a pseudo-color mapped onto the two-dimensional plane, we get the dynamic spectrum of the signal x(n). Mapping the value of |X(m, k)| to a pseudo-color gives better resolution and visual effect of the dynamic spectrum and improves the readability of the diagram. The method first maps the minimum value X_min of |X(m, k)| to the normalized value 0 and the maximum value X_max of |X(m, k)| to the normalized value 1, mapping the remaining values linearly to values C_i between 0 and 1; the C_i are then displayed on the monitor with the colors they map to. In order to make full use of the dynamic range of the color space, an appropriate base spectrum value should be chosen: values less than the base are limited to the base, and values greater than the base are normalized linearly. If the color value matrix is expressed as C = {c(m, k)}, the mapping from |X(m, k)| to c(m, k) is:

c(m, k) = (B(m, k) - Base) / (max_{(m,k)} B(m, k) - Base)

where:

B(m, k) = |X(m, k)|  if |X(m, k)| > Base
B(m, k) = Base       if |X(m, k)| ≤ Base

5.2.3 Broad-band spectrum and Narrow-band spectrum

According to the principles of Discrete Fourier Transform (DFT) analysis, the frequency resolution of the spectrum is the interval between the discrete frequencies, that is, the frequency interval f_0 represented by the variable 'k' in the expression X(m, k). Its value depends on the frame length N and the sampling frequency f_s of the signal. Based on the Nyquist sampling theorem, f_0, f_s and N are related by:

f_0 = f_s / N

As the formula suggests, the frequency interval f_0 has nothing to do with the frequencies contained in the signal. As long as the sampling frequency is constant, increasing the frame length N gives a higher frequency resolution of the spectrum, or a smaller bandwidth represented by 'k' in X(m, k), in which case the spectrum tends to be a narrow-band one; otherwise it is a broad-band spectrum. Increasing the resolution in the frequency domain by using a larger N results in a lower resolution in the time domain of the spectrum. The way to resolve this contradiction is to introduce sub-frames with a frame shift N_1 (N_1 < N) while choosing a larger but still appropriate frame length N; in this way a spectrum with balanced resolution in the frequency and time domains is obtained. The sub-frame shift can be written as:

x_m(n) = x(n + N_1 · m),  n = 0, 1, ..., N-1,  N_1 < N

5.2.4 Pseudo-color mapping and display of the spectrum

The pseudo-color mapping function colormap(MAP) is built into Matlab. The parameter 'MAP' is the color map used for the pseudo-color mapping, a matrix of 64 rows by 3 columns in which the columns represent the saturation of red, green and blue respectively. For instance, MAP = [0 0 0] gives a pure black mapping, MAP = [1 1 1] a pure white mapping and MAP = [1 0 0] a pure red mapping. The parameter 'MAP' can also be one of the color maps registered in Matlab: MAP = gray gives a linear grey mapping, MAP = hot gives a mapping whose color saturation increases progressively from black through red and yellow to white, and MAP = copper gives a linear bronze mapping.
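The base clipping and linear normalization described above can be written compactly in Matlab; this is a small sketch in the spirit of Appendix C, where A is assumed to hold the dB values |X(m, k)| and base is the chosen base value:

B = max(A, base);                      % limit every value below the base to the base
C = (B - base) ./ (max(B(:)) - base);  % normalize linearly so that C lies between 0 and 1
colormap(hot);                         % choose a registered pseudo-color map
% C can then be passed to imagesc(t, f, C) to display the dynamic spectrum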
The function imagesc(t, f, C) is also built into Matlab for displaying the spectrum, where parameter 't' is the time coordinate, 'f' is the frequency coordinate and 'C' contains the color values to which the frequency amplitudes have been mapped. If the whole audio signal has been divided into M frames, parameter 't' is an M-dimensional row vector whose values are the starting times of the frames. As the number of useful sampling points in the frequency domain is only half the frame length (N/2), parameter 'f' is an N/2-dimensional row vector whose values are the corresponding frequencies. Accordingly, parameter 'C' is an (N/2) by M matrix.

5.2.5 Implementation within Matlab

5.2.5.1 function specgram(FileName, Winsiz, Shift, Base, Coltype)

FileName: indicates the digital file that contains the audio signal to be processed. The audio file is pre-recorded and saved in 'wav' format; the sample values are assigned to the matrix 'Signal', which corresponds to the expression x(n) introduced before, and the sampling frequency is assigned to f_s as described in the section above.

Winsiz: defines the length of the frame, normally chosen as a power of 2 (for example 1024, the default value) to simplify the FFT calculation. The broad-band or narrow-band spectrum can be obtained by choosing a suitable value for 'Winsiz'.

Shift: gives the frame shift value N_1. Generally N_1 is less than or equal to 'Winsiz'; the smaller N_1 is, the finer the time resolution of the spectrum.

Base: sets the spectrum base value. This value depends on practical experience; there is no fixed value, and an appropriate one is chosen according to the visual and resolution effect of the spectra obtained with different base values in the experiments.

Coltype: selects the pseudo-color map used for the display. By default the function uses 'hot'. Other values for the parameter 'MAP' are cool, hsv, bone, prism, jet, copper, etc.

5.2.5.2 Display of the pseudo-color mapping graph

Figure 18 the spectrum of the audio signal, where Winsiz = 1024, Shift = 256, Base = -10, Coltype = 'hot'

This spectrum maps the frequency amplitudes from maximum to minimum onto the pseudo-colors from brightest to darkest accordingly, and only the specified frequency band from 0 to 1000 Hz is shown.

Figure 19 the spectrum of the audio signal, where Winsiz = 2^16 (about 3 sec.), Shift = 2^15 (half the window size), Base = -50, Coltype = 'jet'

As can be seen clearly, a larger frame length results in a much higher resolution in the frequency domain but a lower resolution in the time domain.
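For example, the two spectra above can be reproduced with calls along the following lines (the file name is only an assumption; specgram here is the project's own display function described above, not a library routine):

specgram('mouth', 1024, 256, -10, 'hot');   % broad-band spectrum of the recording
specgram('mouth', 2^16, 2^15, -50, 'jet');  % narrow-band spectrum of the same file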
5.3 Feature Extraction

After the pre-processing step the signal is visible as a pseudo-color mapped spectrum and is ready for feature extraction. The main process in this step is end-point detection. Because the major part of the energy lies in a certain low-frequency band, below 110 Hz, as can be seen clearly from the red color in the spectrum graphs above, the signal has to pass through another filter that removes all frequencies above 110 Hz, and the features are then extracted from that particular frequency band. The flowchart for this stage is displayed below:

Figure 20 Feature Extraction flowchart

5.3.1 Introduction to End-point Detection

The aim of End-point Detection (EPD) is to find the starting point and the ending point of the part of the digital signal that is meaningful for the signal processing. There are typically two groups of end-point detection methods, according to the characteristic parameters they use.

Features in the time domain: Volume and Zero Crossing Rate (ZCR)
a. Volume: the simplest way to detect the end points, but minor airflow noise will lead to misjudgment.
b. Volume and ZCR: the ZCR helps to get rid of the minor airflow noise when the volume is used as the feature for EPD; the combination of the two characteristics can handle most of the detection with much better precision.

Features in the frequency domain: Variance and Entropy of the spectrum
a. Variance of the spectrum: the effective part of the signal has a regular variation of the spectra and thus a smaller variance, which can be used as the criterion for EPD.
b. Entropy of the spectrum: the entropy of the digital signal is also much lower between the end points, which contributes to the detection.

5.3.2 End-point Detection Errors

End-point detection is not always successful, and there are two types of error.
a. False rejection: mistaking the meaningful part of the signal for noise or silence, which decreases the detection rate of the audio signal.
b. False acceptance: mistaking the silence or noise part of the signal for the useful part, which also brings down the detection rate.

To avoid or reduce the detection errors, the features of the silence or noise part of the signal can be used as a reference when designing the detector, and this again is very much experience-dependent work.

5.3.3 The Zero Crossing Rate (ZCR)

The ZCR is the number of zero-crossing points within one frame. In general the zero crossing rate of the voiced sound is somewhat larger than that of the silence (once the noise has been removed), and it is therefore used in this project to detect the starting and ending points. When calculating the zero crossing rate, the importance of values that are exactly zero should not be ignored: a zero-valued point could be the starting point or the ending point of a meaningful frame, or nothing but an ordinary value inside a frame, and making good use of the zero values always contributes greatly to the end-point detection. Because the sample values retrieved from the digital audio file have been normalized, and to avoid the possibility of inflating the ZCR through the bias of the floating-point calculation, they should be un-normalized by multiplying by the bit resolution to recover the original values.

Figure 21 Zero Crossing Rate

Method one does not count the zero values as zero crossings but method two does. There is not much difference between them in this project: the graph generated by method one is entirely covered by that of method two because, as is obvious from the first plot in the figure above, the variation of the sample values is very large.

5.3.4 High-order Difference

In an arithmetic progression a_1, a_2, ..., a_n, ... with a_{n+1} = a_n + d, n = 1, 2, ..., the so-called common difference d is a constant. For a general series expressed as y = y(t):

Δy(t) = y(t + 1) - y(t)

is called the first-order difference of y(t) at the point t, so

Δy_t = y_{t+1} - y_t

is defined as the first-order difference of the expression y(t), where Δ is the difference operator (Allaberen and Pavel, 2004). Combining the zero crossing rate with the High-order Difference (HOD) operation enables the end-point detection to achieve a very high level of precision. Of course, the order should be adjusted in the practical experiment, deciding whether the first, second or even third order gives the best performance.
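A rough sketch of this combination is given below, in the spirit of Appendices D and E; the frame length and the threshold are assumptions chosen only for illustration and would have to be tuned in practice:

frames = buffer(sig, 2048, 0);                          % 'sig' assumed already band-pass filtered
zcr = sum(frames(1:end-1,:) .* frames(2:end,:) <= 0);   % zero crossing rate of every frame, counting exact zeros
dzcr = [diff(zcr) 0];                                   % first-order difference of the ZCR curve
threshold = 5;                                          % assumed threshold, tuned empirically per recording
candidates = find(abs(dzcr) >= threshold);              % frames where the ZCR changes sharply
% candidate frames would then be grouped into breath start and end points as in Appendix E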
Figure 22 End-point Detection using the ZCR and HOD

It is obvious from the figure above that the method described has a fine end-point detection performance after the high-frequency noise (above 1200 Hz) has been filtered out, along with the other information carrying a large amount of energy at very low frequencies (below 110 Hz).

5.4 Back-propagation Neural Network Algorithm and Implementation

5.4.1 Design of the artificial neural network

The number of input neurons depends on the dimension of the features extracted after the pre-processing stage, and there are two output neurons, representing mouth breathing and nasal breathing respectively. The number of hidden-layer neurons is typically twice the number of input neurons, but it remains adjustable in practice to achieve the best performance. The neural network can then be designed as below:

Figure 23 design of the two-layer artificial back-propagation neural network

5.4.2 Back-propagation neural network implementation

5.4.2.1 Initialization of the network

As designed above, the single hidden layer is initialized and the connection weights of the hidden layer and the output layer are assigned random numbers between 0 and 1.

5.4.2.2 Training samples

The feature values extracted in the previous stage are normalized to form a vector of dimension n:

X = (x_1, x_2, ..., x_n)

and the two types of target output are t_1 = (1, 0), representing mouth breathing, and t_2 = (0, 1) for nasal breathing, so each training sample takes one of the forms:

I_1 = (x_1, x_2, ..., x_n; 1, 0)   or   I_2 = (x_1, x_2, ..., x_n; 0, 1)

5.4.2.3 Calculating the actual output of the network

The non-linear function:

y_j = [1 + exp(-Σ_i w_ij · x_i)]^(-1)

is used to calculate the output of each node, layer by layer, excluding the input layer, until the output vector is obtained:

O = (o_1, o_2, ..., o_m)

5.4.2.4 Adjusting the weights

The weights from the output layer back to the hidden layer are adjusted using:

w_ij(k + 1) = w_ij(k) - η δ_j o_i

where o_i is the output of node i in the upper layer. If j is a node in the output layer, then

δ_j = o_j (1 - o_j) · (o_j - t_j)

and if j is a node in a hidden layer, then

δ_j = o_j (1 - o_j) · Σ_k δ_k · w_jk

where k runs over all the nodes in the layer below the layer in which node j is located. The features of each breath-pattern cycle in the sound file are extracted right after the end-point detection process, and the training data are stored in Matlab format, ready to be passed through the training of the neural network. The flowchart for the Back-propagation Neural Network training algorithm is shown below:

Figure 24 BP Neural Network training algorithm flowchart
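A condensed sketch of one training iteration for this design is given below; it is a simplified view of the full loop in Appendix F, the feature dimension of 10 and the 20 hidden nodes are assumptions matching that appendix, and the bias (threshold) terms are omitted for brevity:

x = rand(10,1);  t = [1; 0];                % one hypothetical normalized feature vector and its target
w_ji = rand(20,10);  w_kj = rand(2,20);     % random initial weights for hidden and output layers
eta = 0.1;                                  % learning rate
o_i = x;                                    % the input layer simply passes its input through
o_j = 1./(1 + exp(-w_ji*o_i));              % hidden-layer output (sigmoid of the net input)
o_k = 1./(1 + exp(-w_kj*o_j));              % output-layer output
delta_k = (t - o_k).*o_k.*(1 - o_k);        % output-layer error term
delta_j = o_j.*(1 - o_j).*(w_kj'*delta_k);  % back-propagated hidden-layer error term
w_kj = w_kj + eta*delta_k*o_j';             % gradient-descent weight updates
w_ji = w_ji + eta*delta_j*o_i';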
The non-linear 'S'-shaped function used in the BP Neural Network algorithm is:

f(x) = 1 / (1 + e^(-x))

The input to the input layer is:

I_p = {x_p1, x_p2, ..., x_pi, ..., x_pN}

and the output of the input layer is:

O_p = {o_p1, o_p2, ..., o_pi, ..., o_pN}

obtained by applying the 'S'-shaped function to the input:

O_p = f(I_p)

The input to the hidden layer is:

net_pj = Σ_{i=1}^{N} w_ji o_pi - θ_j,  j = 1, 2, ..., M

The output of the hidden layer is:

o_pj = f(net_pj),  j = 1, 2, ..., M

The input to the output layer is:

net_pk = Σ_{j=1}^{M} w_kj o_pj - θ_k,  k = 1, 2, ..., L

The output of the output layer is:

y_pk = o_pk = f(net_pk),  k = 1, 2, ..., L

The average square error function is defined as:

E_p = (1/2) Σ_{k=1}^{L} (y_pk - d_pk)^2

Calculating E_p for each training sample, the total error is:

E = Σ_{p=1}^{P} E_p

The weights of the hidden and output layers are adjusted with the gradient descent method, and the thresholds are amended in every loop as well. The adjustment of the weights and the threshold between the output layer and the hidden layer is:

ΔW_kj = η (d_pk - y_pk) · f'(net_pk) · O_pj = η δ_pk O_pj
Δθ_k = η δ_pk

The adjustment of the weights and the threshold between the hidden layer and the input layer is:

ΔW_ji = η O_pi · f'(net_pj) · Σ_{k=1}^{L} δ_pk W_kj = η δ_pj O_pi
Δθ_j = η δ_pj

where:

δ_pk = (d_pk - y_pk) y_pk (1 - y_pk)
δ_pj = O_pj (1 - O_pj) Σ_{k=1}^{L} δ_pk W_kj

6. Evaluation

6.1 Interface and Controls

The final interface is a simple one, but there is a complex calculation behind it, as the graph below shows:

Figure 25 The main interface

There are four main parts to this interface: the upper left part holds the controls, the upper right part shows the results, the lower left part holds the spectrum graph, and the fourth part, which draws the signal wave, is located at the lower right.

Figure 26 The Control part of the interface

In the control part the 'Choose File' button allows the user to choose an audio file in 'wav' format in which to detect the breath pattern and to show its spectrum and signal wave in the lower graph areas. The 'Detect Breath' button to the right integrates several functions: it first detects the end points in the signal file, then extracts the features of the signal in the frequency band below 110 Hz, and finally passes the features to the Back-propagation Neural Network, which has already been trained before the detection, to detect the breath pattern.

Figure 27 One sound file opened by the program

As displayed in the figure above, after one sound file has been chosen the spectrum and the signal plot are drawn in the lower part of the interface. Before the graphs are displayed, the pre-processing, including framing, windowing and filtering, is carried out in the background, so the complex calculation takes a little while; a small symbol next to the file name label gives the user a clue that this is happening.

Figure 28 Prompt that informs the user

Figure 29 Displaying spectrum

The lower left part is the spectrum display area, which shows the pseudo-color mapped spectrum graph. The slider at the bottom moves along the time axis, allowing the user to jump to a specific time in the playing sound file; it also slides along while the sound is playing, giving the user a visual indication of how far the playback has progressed. The slider on the right adjusts the frequency on the y-coordinate, which in effect lets the user zoom the spectrum in or out to get either an overview or a more detailed view, over a range from the sampling frequency down to as low as 20 Hz.
Figure 30 Result displaying text box

The text box shown above is used for displaying the results. The detection results are displayed in this box; before the 'Detect Breath' button has been pressed, it gives the user a hint to push the button in order to have the results displayed here.

Figure 31 Detection result showing hint

Because the detection procedure is complicated and a large amount of data has to be processed, it takes a while before the result appears in the box; so, in order not to give the user the impression that the program has stopped responding, a dynamic text prompt is shown at the upper right side of the box to indicate that the data is still being processed.

Figure 32 Original signal wave display

The plotting area at the lower right corner shows the signal wave after it has been pre-processed. This area is also used to display the end-point detection result, and the small 'Play' button lets the user control the playing of the sound file, with the option to pause or stop during playback.

6.2 End-point Detection Evaluation

The end-point detection process takes place after the 'Detect Breath' button has been pressed.

Figure 33 Mouth sound End-point Detection

As the result above shows, the end-point detection works well for the mouth-breathing-only signal. The blue line indicates the starting point where one breath begins and the dark red line marks the ending point where that breath finishes.

Figure 34 Nasal sound End-point Detection

The figure above shows the end-point detection result for the nasal-breathing-only sound; the function also performs very well for this breath pattern.

Figure 35 Mixed breath pattern sound End-point Detection

As shown clearly above, the end-point detection result is not as good as the previous ones, which had a single breath pattern: for the mixed breath pattern sound the end-point detection only finds a little more than half of the breath cycles. The reason is that when a person breathes through the mouth for a while and then changes to nasal breathing, the breath sound is usually quieter than in a single breath pattern. If the threshold for the end-point detection is too small it detects more false end-points than desired; on the contrary, if the threshold is larger, many true end-points are missed. This conflict arises here, and adjusting the threshold to fit every situation remains a good topic for future work.

6.3 Breath Pattern Detection Evaluation

The breath pattern detection follows the end-point detection and the subsequent feature extraction. The detection result is shown in the text box as illustrated before.

Figure 36 Nasal breath only breathing pattern detection

As the result above shows, the breath pattern detection works very well for the nasal-breathing-only pattern: only one mistake occurs, in the second breath cycle, giving a classification rate of about 90% for this detection.

Figure 37 Breath pattern detection for mixed breathing

As with the end-point detection, the result for the mixed breath pattern does not reach the same performance as for a single breath pattern. The user has to judge the result, since they have the chance to hear the breath sound clearly enough to tell whether it is mouth breathing or not.
7. Conclusion

It is well known that a change of breathing pattern from mouth to nose has a vital impact on patients with respiratory disease and even on healthy people. The proposed concept relates to a new training and monitoring device that will monitor end-users' breathing status in relation to their nose-breathing versus mouth-breathing activity, advise them of it, and deliver instructions on the action required, i.e. reversion to proper breathing. The initial step in the delivery of such a system is to investigate whether the discriminatory information needed to separate nasal from mouth breathing can be obtained from acoustic sensors placed at various positions on the body. The experimental results show that the difference between nasal and mouth breathing can be discriminated successfully with a high enough accuracy; a suitable application was therefore programmed, with the aim of integrating the code into a device that gives appropriate feedback to end-users.

7.1 Future Work

The program does not perform at the same level over the whole procedure. For the end-point detection part, breaths with a single pattern, both mouth and nasal, were detected with 100% accuracy; however, the mixed breath pattern, breathing with the mouth first and then changing to the nose for a while, did not reach that accuracy. The reason is that when changing the breath pattern people usually alter the strength and rate of their breathing, that is, they breathe harder or more quickly. As end-point detection is the premise of classifying the breath pattern, the best issue for future work lies in improving the end-point detection algorithm so that it works in various situations with better performance.

References

Brandt, A., Tuma, J., Lago, T. and Ahlin, K. (2005) Toolboxes for analysis of sound and vibration signals within Matlab. Axiom EduTech AB, Technical Univ. of Ostrava, Blekinge Institute of Technology. p. 1

Ashyralyev, A. and Sobolevskii, P. I. (2004) New difference schemes for partial differential equations. Birkhäuser. pp. 1-3

Baird, T. and Neuman, M. (1992) A thin film temperature sensor for measuring nasal and oral breathing in neonates. Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vol. 14, 29 Oct-1 Nov 1992, pp. 2511-2512

Brown, L. and Prasad, N. (1997) Effect of vital signs on advanced life support interventions for prehospital patients. Prehosp. Emerg. Care 1997 Jul-Sep; 1(3):145-8

Breakell, A. and Townsend-Rose, C. (2001) Clinical evaluation of the Respi-check mask: a new oxygen mask incorporating a breathing indicator. Emerg Med J 2001 Sep; 18(5):366-9

Chiarugi, F., Sakkalis, V., Emmanouilidou, D., Krontiris, T., Varanini, M. and Tollis, I. G. (2007) Adaptive threshold QRS detector with best channel selection based on a noise rating system. Computers in Cardiology 2007; 34:157-160

Cooley, J. W. and Tukey, J. W. (1965) An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19:297-301

Part-Enander, E. and Sjoberg, A. (1999) The MATLAB Handbook. Harlow: Addison-Wesley

Forgacs, P., Nathoo, A. and Richardson, H. (1971) Breath sounds. Thorax 1971; 26:288-95

Gravelyn, T. and Weg, J. (1980) Respiratory rate as an indication of acute respiratory dysfunction. JAMA 1980 Sep; 244(10):1123-5

Gavriely, N., Nissan, M., Rubin, A. and Cugell, D. (1995) Spectral characteristics of chest wall breath sounds in normal subjects. Thorax 50:1292-1300

Jedruszek, J. (2003) Speech recognition. Alcatel Telecommunications Review, 2003, pp. 128-135
Mahagna, M. and Gavriely, N. (1994) Repeatability of measurements of normal lung sounds. Am. J. Respir. Crit. Care Med. 149:477-481

Malmberg, L., Sovijärvi, A., Paajanen, E., Piirilä, P., Haahtela, T. and Katila, T. (1994) Changes in frequency spectra of breath sounds during histamine challenge test in adult asthmatics and healthy control subjects. Chest, Vol. 105, No. 1, pp. 122-133

McCree, A. V. and Barnwell, T. P. (1995) A mixed excitation LPC vocoder model for low bit rate speech coding. IEEE Transactions on Speech and Audio Processing, 3:242-250

Molla, Md. Khademul Islam and Hirose, Keikichi (2004) On the effectiveness of MFCCs and their statistical distribution properties in speaker identification. 2004 IEEE Symposium on Virtual Environments, Human-Computer Interfaces and Measurement Systems (VECIMS), pp. 136-141

Pasterkamp, H., Consunji-Araneta, R., Oh, Y. and Holbrow, J. (1997) Chest surface mapping of lung sounds during methacholine challenge. Pediatr Pulmonol 23:21-30

Roger Jang. Audio Signal Processing and Recognition. http://neural.cs.nthu.edu.tw/jang/books/audioSignalProcessing/filterApplication.asp?title=111FilterApplications

Sun, Java SE Desktop Technologies. http://java.sun.com/javase/technologies/desktop/media/jmf/

Walker, Shonda Lachelle (2004) Wavelet-based feature extraction for robust speech recognition [dissertation]. The Florida State University, pp. 50-58

Wikipedia, Requirements analysis. http://en.wikipedia.org/wiki/Requirements_analysis

Wikipedia, MATLAB. http://en.wikipedia.org/wiki/Matlab

Appendix A

%% design of low pass filter
% ==============================================================
fs = 8000;                              % Sampling rate
filterOrder = 5;                        % Order of filter
cutOff = 1200;                          % Cutoff frequency
[b, a] = butter(filterOrder, cutOff/(fs/2), 'low');
% === Plot frequency response
[h, w] = freqz(b, a);
plot(w/pi*fs/2, abs(h), '-');
title('Magnitude frequency response');
xlabel('Frequency (Hz)'); ylabel('Magnitude');
legend('low pass filter');
grid on

%% design of bandpass filter
% ==============================================================
fs = 3000;                              % Sampling rate
filterOrder = 8;                        % Order of filter
cutOff = [110 800];                     % Cutoff frequencies
[b, a] = butter(filterOrder, cutOff/(fs/2), 'bandpass');
% === Plot frequency response
[h, w] = freqz(b, a);
plot(w/pi*fs/2, abs(h), '-');
title('Magnitude frequency response');
set(gca, 'XTick', (0:300:1500));        % set the X tick
xlabel('Frequency (Hz)'); ylabel('Magnitude');
legend('bandpass filter');
grid on

Appendix B

%% compare several low pass filters in one graph
% ==============================================================
fs = 8000;                              % Sampling rate
cutOff = 1000;                          % Cutoff frequency
allH = [];
for filterOrder = 1:8
    [b, a] = butter(filterOrder, cutOff/(fs/2), 'low');
    [h, w] = freqz(b, a);               % frequency response for this order
    allH = [allH, h];
end
plot(w/pi*fs/2, abs(allH));
title('Frequency response of a low-pass Butterworth filter');
xlabel('Frequency (Hz)'); ylabel('Magnitude');
legend('order=1', 'order=2', 'order=3', 'order=4', 'order=5', 'order=6', 'order=7', 'order=8');

%% apply filter to signal
% ==============================================================
cutOff = 1200;                          % Cutoff frequency
filterOrder = 5;                        % Order of filter
[x, fs, nbits] = wavread('sample.wav');
[b, a] = butter(filterOrder, cutOff/(fs/2), 'low');
x = x(:,2);                             % 30-second signal, second channel
y = filter(b, a, x);
% ====== Plot the result
time = (1:length(x))/fs;
subplot(2,1,1); plot(time, x);
xlabel('Time (sec.)'); ylabel('Energy');
legend('original signal');
grid on
subplot(2,1,2); plot(time, y, 'k');
xlabel('Time (sec.)'); ylabel('Energy');
legend('filtered signal');
grid on

Appendix C

%% pseudo-color mapping
% ==============================================================
[signal, fs, nbits] = wavread('both');
signal = signal(:,2);
lengthSignal = length(signal);
winSize = 2^(nextpow2(fs*3) - 1);       % window size
shift = winSize/2;                      % frame shift
base = -50;                             % base value
colorType = 0;                          % color type
frameNum = floor((lengthSignal - winSize)/shift) + 1;
A = zeros(winSize/2 + 1, frameNum);
for i = 1:frameNum
    n1 = (i - 1)*shift + 1;             % start point of one frame
    n2 = n1 + (winSize - 1);            % end point of one frame
    frame = signal(n1:n2);              % one frame
    frame = frame.*hamming(winSize);    % window the frame
    y = fft(frame);
    y = y(1:winSize/2 + 1);
    y = y.*conj(y);
    y = 10*log10(y);                    % amplitude of frequency in dB
    A(:,i) = y;
end
B1 = (A > base);
B0 = (A < base);
B = A.*B1 + base*B0;                    % limit values below the base to the base
C = (B - base)./(max(max(B)) - base);   % normalize to the range 0..1
y = (0:winSize/2)*fs/winSize;           % frequency coordinate
x = (0:frameNum - 1)*shift/fs;          % time coordinate
if colorType == 1
    colormap(hot);
else
    mycoltype = jet;
    mycoltype = mycoltype(64:-1:1,:);   % reverse the color map from bottom to top
    colormap(mycoltype);
end
imagesc(x, y, C);
axis xy;
colorbar;
colorbar('YTick', 0:0.2:1);
title('Spectrum Analysis');
ylim([0 1000]);
xlabel('Time (sec.)'); ylabel('Frequency (Hz)');
set(gca, 'YTick', (0:200:1000));          % set the Y tick on the Y coordinate
set(gca, 'XTick', (0:3:lengthSignal/fs)); % set the X tick on the X coordinate

Appendix D

%% zero crossing rate
% ==============================================================
clc; clear;
fileName = 'mouth';
frameSize = 2^11;
overlap = 0;
cutOff = 110;                           % Cutoff frequency
filterOrder = 5;                        % Order of filter
[y, fs, nbits] = wavread(fileName);
[b, a] = butter(filterOrder, cutOff/(fs/2), 'high'); % design the filter
y = y(:,2);
y = filter(b, a, y);                    % filter out the frequency below 110 Hz
y = y*(2^(nbits - 1));                  % un-normalize the signal sample values
frameSeg = buffer(y, frameSize, overlap);
frameNumber = size(frameSeg, 2);
for i = 1:frameNumber
    frameSeg(:,i) = frameSeg(:,i) - round(mean(frameSeg(:,i))); % zero justification
end
zcrOne = sum(frameSeg(1:end-1, :).*frameSeg(2:end, :) < 0);  % Method one
zcrTwo = sum(frameSeg(1:end-1, :).*frameSeg(2:end, :) <= 0); % Method two
time = (1:length(y))/fs;
frameNumber = size(frameSeg, 2);
frameTime = ((0:frameNumber-1)*(frameSize - overlap) + 0.5*frameSize)/fs;
subplot(2,1,1); plot(time, y);
title('Signal Wave');
xlabel('Time (sec.)'); ylabel('Volume');
subplot(2,1,2); plot(frameTime, zcrOne, '-', frameTime, zcrTwo, '-');
title('Zero Crossing Rate');
xlabel('Time (sec.)'); ylabel('Zero crossing rate');
legend('Method one', 'Method two');

Appendix E

%% End-point Detection
% ============================================================
clc; clear;
fileName = 'both';
cutRange = [110 1200];                  % Cutoff frequencies
filterOrder = 5;                        % Order of filter
[y, fs, nbits] = wavread(fileName);
[b, a] = butter(filterOrder, cutRange/(fs/2), 'bandpass'); % bandpass filter to detect the breathing points
y = y(:,2);
signal = filter(b, a, y);               % keep the frequency band between 110 and 1200 Hz
winSize = 1024*4;                       % window size
shift = winSize/2;                      % frame shift
base = -10;
frameNum = floor((length(signal) - winSize)/shift) + 1;
A = zeros(winSize/2 + 1, frameNum);
for i = 1:frameNum
    n1 = (i - 1)*shift + 1;
    n2 = n1 + (winSize - 1);
    frame = signal(n1:n2);
    frame = frame.*hamming(winSize);
    y = fft(frame);
    y = y(1:winSize/2 + 1);
    y = y.*conj(y);
    A(:,i) = y;
end
% apply the base value
L1 = (A > base);
L0 = (A <= base);
B = A.*L1 + base*L0;
D = (B - base)./(max(max(B)) - base);   % normalize the values
%y = (0:winSize/2)*fs/winSize;
time = (0:frameNum - 1)*shift/fs;
energy = abs(sum(D));
energy = [diff(energy) 0];              % first-order difference
threshold = 0.06;
dd1 = abs(energy) < threshold;
dd2 = abs(energy) >= threshold;
energy = 0.*dd1 + energy.*dd2;          % set values below the threshold to zero
endPoint = zeros(size(energy));
for i = 2:length(energy)-2              % from begin to end, find out the points
    sh = energy(i:i+2);                 % one small frame
    if isequal(sh, [0,0,0])
        if energy(i-1) ~= 0
            endPoint(i) = 2;            % identified one end-point
        end
    end
end
for i = length(energy)-1:-1:3           % from end to begin, find out the points
    sh = energy(i-2:i);                 % one small frame
    if isequal(sh, [0,0,0])
        if energy(i+1) ~= 0
            endPoint(i) = 2;            % identified one end-point
        end
    end
end
for i = 1:length(endPoint)              % delete the points whose breathing duration is less than 0.5 second
    if endPoint(i) ~= 0
        for j = i+1:length(endPoint)    % find the nearest next end-point
            if endPoint(j) ~= 0
                break;
            end
        end
        if energy(i+1) ~= 0 && time(j) - time(i) < 0.5
            endPoint(i) = 0;            % delete the wrong end-point
            endPoint(j) = 0;            % delete the wrong end-point
        end
    end
end
x = (1:length(signal))/fs;
subplot(211); plot(x, signal);
title('End-point Detection');
set(gca, 'XTick', (0:1:max(time)));     % set the X tick on the X coordinate
ylim([-0.05 0.1]);
xlabel('Time (sec)'); ylabel('Volume (Energy)');
hold on; grid on;
flag = 0;                               % alternate the line color
for i = 1:length(endPoint)              % plot a line at each detected point
    if endPoint(i) ~= 0
        if flag == 0
            plot([time(i) time(i)], [min(signal) max(signal)], 'k');
            flag = 1;
        else
            plot([time(i) time(i)], [min(signal) max(signal)], 'r');
            flag = 0;
        end
    end
end
legend('signal wave', 'breath start point', 'breath end point');
subplot(212); plot(time, energy);
set(gca, 'XTick', (0:1:max(time)));     % set the X tick on the X coordinate
ylim([-2 3]);
xlabel('Time (sec)'); ylabel('Volume (Energy)');
hold on; grid on;
flag = 0;                               % alternate the line color
for i = 1:length(endPoint)              % plot a line at each detected point
    if endPoint(i) ~= 0
        if flag == 0
            plot([time(i) time(i)], [min(energy)*0.7 max(energy)*0.7], 'k');
            flag = 1;
        else
            plot([time(i) time(i)], [min(energy)*0.7 max(energy)*0.7], 'r');
            flag = 0;
        end
    end
end
legend('signal wave', 'breath start point', 'breath end point');

Appendix F

%% train BP Neural Network
% ============================================================
w_ji = rand(20,10);                     % ten nodes in input layer, twenty nodes in hidden layer
w_kj = rand(2,20);                      % two nodes in output layer
theta_j = rand(20,1);                   % initial random thresholds for hidden layer
theta_k = rand(2,1);                    % initial random thresholds for output layer
train_num = 10000;                      % maximum number of training iterations
train_file = 100;                       % number of training samples
yita = 0.1;                             % learning rate
precise = 0.01;                         % target error
% train samples
num = 1;
while num < train_num
    file_num = 1;                       % training sample index
    e = 0;                              % initialize the error to 0
    while file_num <= train_file
        for t = 1:2
            switch t
                case 1, x = ta; y = [1 0]';   % ta: mouth-breathing feature matrix, assumed in the workspace
                case 2, x = tb; y = [0 1]';   % tb: nasal-breathing feature matrix, assumed in the workspace
            end
            % reading data
            x = x(:,file_num);
            minx = min(x);
            maxx = max(x);
            for i = 1:10                % normalizing the data
                x(i) = (x(i) - minx)/(maxx - minx);
            end
            % feed-forward algorithm
            o_i = 1./(1 + exp(-x));     % output of input layer (10x1)
            x_j = w_ji*o_i - theta_j;   % input to hidden layer (20x1)
            o_j = 1./(1 + exp(-x_j));   % output of hidden layer (20x1)
            x_k = w_kj*o_j - theta_k;   % input to output layer (2x1)
            o_k = 1./(1 + exp(-x_k));   % output of output layer (2x1)
            % back-propagation algorithm
            delta_k = (y - o_k).*o_k.*(1 - o_k);       % error term of output layer (2x1)
            delta_wkj = yita*delta_k*o_j';             % weight adjustment, learning rate yita = 0.1
            delta_thetak = yita*delta_k;               % threshold adjustment for output layer
            delta_j = o_j.*(1 - o_j).*(w_kj'*delta_k); % error term of hidden layer
            delta_wji = yita*delta_j*o_i';             % weight adjustment for hidden layer
            delta_thetaj = yita*delta_j;               % threshold adjustment for hidden layer
            w_ji = delta_wji + w_ji;
            w_kj = delta_wkj + w_kj;
            theta_k = delta_thetak + theta_k;
            theta_j = delta_thetaj + theta_j;
            e = 0.5*sum((y - o_k).^2);
            %ex(:,t) = e;               % keep the error for plotting
            if e < precise
                num = train_num;
            end
        end
        file_num = file_num + 1;
    end
    num = num + 1;
end
%plot(ex, '.');

%% recognise testing data
% ============================================================
file_num = 1;
recog_num = 0;
total_num = 30;
while file_num <= total_num
    x = testa(:,file_num);              % testa: testing feature matrix, assumed in the workspace
    minx = min(x);
    maxx = max(x);
    for i = 1:10                        % normalizing the data
        x(i) = (x(i) - minx)/(maxx - minx);
    end
    o_i = 1./(1 + exp(-x));             % output of input layer (10x1)
    x_j = w_ji*o_i - theta_j;           % input to hidden layer (20x1)
    o_j = 1./(1 + exp(-x_j));           % output of hidden layer (20x1)
    x_k = w_kj*o_j - theta_k;           % input to output layer (2x1)
    o_k = 1./(1 + exp(-x_k));           % output of output layer (2x1)
    [y, n] = max(o_k);
    if n == 1                           % recognised successfully
        recog_num = recog_num + 1;
    end
    file_num = file_num + 1;
end
rate = recog_num/total_num*100;         % detection rate