Discriminating between Nasal and Mouth
Breathing
Peng Yuan
B00541592
BSc (Hons) Computing Science
Supervisor: Dr. Kevin Curran
Interim Report, April 2010
Acknowledgements
I would like to extend sincere thanks and appreciation to my project supervisor, Dr. Kevin
Curran, for the initial idea of the project and his invaluable help and guidance throughout this
year.
I would also like to show my appreciation to all the staff in the library and labs, for their kind
help and assistance during the last few months. Lastly I would like to thank my parents for
their unending support during my studying abroad and all my fellow classmates for helping
me in many ways throughout the project.
Abbreviations
3D      Three-Dimensional
AFS     Amplitude Frequency Spectrum
ANN     Artificial Neural Network
BPNN    Back-propagation Neural Network
CDSS    Clinical Decision Support System
CORSA   Computerized Respiratory Sound Analysis
DFT     Discrete Fourier Transform
DTFT    Discrete-Time Fourier Transform
EPD     End-point Detection
FFT     Fast Fourier Transform
HOD     High-order Difference
HR      Heart Rate
LPC     Linear Predictive Coding
LPCC    Linear Predictive Cepstrum Coefficients
MFCC    Mel Frequency Cepstrum Coefficients
RIP     Respiratory Inductive Plethysmography
RR      Respiratory Rate
STFT    Short Time Fourier Transform
UML     Unified Modeling Language
ZCR     Zero Crossing Rate
Table of Contents
Acknowledgements ................................................................................................ 2
Abbreviations ........................................................................................................ 3
Table of Contents .................................................................................................. 4
Declaration ........................................................................................................... 8
Abstract ................................................................................................................. 9
1. Introduction ................................................................................................ 10
1.1 Aims and Objectives .................................................................................. 10
1.2 Outline of Thesis ........................................................................................ 10
2. Literature Review ........................................................................................ 12
2.1 Introduction to the exploitation of biological sound .................................... 12
2.2 The study of breathing pattern ................................................................... 12
2.3 Respiratory rate monitoring ........................................................................ 13
2.4 Sound Analysis and Feature Extraction ..................................................... 13
2.4.1 Filter Analysis of Sound Signal ............................................................... 14
2.4.2 Feature Extraction ................................................................................... 14
2.5 Digital signal Characteristics in Time Domain ............................................ 14
2.5.1 Short-time Energy ................................................................................... 15
2.5.2 The Average Zero Crossing Rate of Short-time ...................................... 15
2.6 Digital signal Characteristics in Frequency Domain ................................... 15
2.6.1 The Linear Predictive Coding (LPC) parameters .................................... 15
2.6.2 The algorithm for LPC parameters .......................................................... 16
2.6.3 The Linear Predictive Cepstrum Coefficients (LPCC) ............................. 18
2.6.4 The Mel Frequency Cepstrum Coefficients (MFCC) ............................... 18
2.7 Artificial Neural Network ............................................................................. 20
3. Requirements Analysis and Functional Specification .................................. 22
3.1 Functional requirements ............................................................................. 22
3.2 Non-functional requirements ...................................................................... 24
3.2.1 The frequency range to use .................................................................... 25
3.2.2 Placement of the sensor ......................................................................... 25
3.3 Summary .................................................................................................... 25
4. Design .......................................................................................................... 27
4.1 MATLAB ..................................................................................................... 27
4.1.1 Introduction to the related MATLAB functions ........................................ 27
4.1.1.1 Short-time spectrum analysis ............................................................... 27
4.2 Equipment .................................................................................................. 28
4.2.1 Acoustic sensor ....................................................................................... 28
4.2.2 Sound Recorder ...................................................................................... 29
4.3 System architecture .................................................................................... 29
4.4 Data modeling ............................................................................................ 31
4.5 Analyzing methods ..................................................................................... 32
4.6 Sound Signal Pre-processing ..................................................................... 33
4.6.1 Cutting off frequency band ...................................................................... 33
4.6.2 Filter Design ............................................................................................ 34
4.6.3 End-point Detection of the Signal ........................................................... 36
4.6.3.1 Short-time average amplitude method ................................................. 37
4.6.3.2 Short-time energy method ................................................................... 37
4.6.3.3 Short-time average zero crossing rate method .................................... 37
4.7 The principle of the Back-propagation Neural Network ............................. 38
4.7.1 Feed-forward Calculation ........................................................................ 39
4.7.2 The rules of weights adjustment in the BP Neural Network .................... 39
4.7.3 The breath pattern classification flowchart .............................................. 40
5. Implementation ............................................................................................. 41
5.1 Pre-processing ........................................................................................... 42
5.1.1 Digital Filter Applications ......................................................................... 42
5.1.2 Apply filter to the digital signal ................................................................. 43
5.2 Principles of Spectrum Analysis and Display ............................................. 44
5.2.1 Short Time FFT Spectrum Analysis of Discrete Signal ........................... 44
5.2.2 the dynamic spectrum display of the Pseudo-color coded Mapping ....... 45
5.2.3 Broad-band spectrum and Narrow-band spectrum ................................. 46
5.2.4 Pseudo-color mapping and display of the spectrum ............................... 46
5.2.5 Implementation within Matlab .................................................................. 47
5.2.5.1 function specgram(FileName, Winsiz, Shift, Base, Coltype) ................ 47
5.2.5.2 display the pseudo-color mapping graph ............................................. 48
5.3 Feature Extraction ...................................................................................... 49
5.3.1 Introduction to End-point Detection ......................................................... 49
5.3.2 End-point Detection Error ........................................................................ 50
5.3.3 the Zero Crossing Rate (ZCR) ................................................................ 50
5.3.4 High-order Difference .............................................................................. 51
5.4 Back-propagation Neural Network Algorithm and Implementation ............ 53
5.4.1 design of the artificial neural network ...................................................... 53
5.4.2 Back-propagation neural network implementation .................................. 53
5.4.2.1 initialization of the network ................................................................... 53
5.4.2.2 training samples ................................................................................... 54
5.4.2.3 calculate the actual output of the network ............................................ 54
5.4.2.4 adjust the weights ................................................................................. 54
6. Evaluation ..................................................................................................... 58
6.1 Interface and Controls ................................................................................ 58
6.2 End-point Detection Evaluation .................................................................. 61
6.3 Breath Pattern Detection Evaluation .......................................................... 63
7. Conclusion .................................................................................................... 65
7.1 Future Work ................................................................................................ 65
References ....................................................................................................... 66
Appendix A ....................................................................................................... 68
Appendix B ....................................................................................................... 69
Appendix C ....................................................................................................... 70
Appendix D ....................................................................................................... 72
Appendix E ....................................................................................................... 73
Appendix F ....................................................................................................... 76
Table of Figures
Figure 1 Requirements analysis is in the first stage (Wikipedia) .......................................................... 22
Figure 2 System Use Case Diagram .................................................................................................... 23
Figure 3 System Sequence Diagram .................................................................................................... 24
Figure 4 An example of MATLAB simulation ........................................................................................ 27
Figure 5 Acoustic Sensor ...................................................................................................................... 29
Figure 6 Recorder ................................................................................................................................. 29
Figure 7 Proposed System Architecture ............................................................................................... 30
Figure 8 Proposed System Collaboration Diagram .............................................................................. 31
Figure 9 Data Flow Diagram ................................................................................................................. 32
Figure 10 Audio Signal Analysis ........................................................................................................... 33
Figure 11 Low Pass Filter at cutoff frequency 1000 (Hz) ...................................................................... 35
Figure 12 Bandpass filter at frequency range from 110 to 800 (Hz) ..................................................... 35
Figure 13 Frequency Response of several low-pass filters .................................................................. 36
Figure 14 Classify the breathing pattern ............................................................................................... 40
Figure 15 the overall procedure flowchart ............................................................................................. 41
Figure 16 Pre-processing flowchart ...................................................................................................... 42
Figure 17 the signal passed through a low-pass filter .............................................................. 44
Figure 18 the spectrum of the audio signal ........................................................................................... 48
Figure 19 the spectrum of the audio signal ........................................................................................... 48
Figure 20 Feature Extraction flowchart ................................................................................................. 49
Figure 21 Zero Crossing Rate ............................................................................................................... 51
Figure 22 End-point Detection using the ZCR and HOD ...................................................................... 52
Figure 23 design of the two-layer artificial back-propagation neural network ..................................... 53
Figure 24 BP Neural Network training algorithm flowchart ................................................................... 55
Figure 25 The main interface ................................................................................................................ 58
Figure 26 The Control part of the interface ........................................................................................... 58
Figure 27 One sound file opened by the program. ............................................................................... 59
Figure 28 Prompt that inf ....................................................................................................................... 59
Figure 29 Displaying spectrum ............................................................................................................. 60
Figure 30 Result displaying text box ..................................................................................................... 60
Figure 31 Detection result showing hint ................................................................................................ 61
Figure 32 Original signal wave display ................................................................................................. 61
Figure 33 Mouth sound End-point Detection ........................................................................................ 62
Figure 34 Nasal sound End-point Detection ......................................................................................... 62
Figure 35 Mixed breath pattern sound End-point Detection ................................................................. 63
Figure 36 Nasal breath only breathing pattern detection ...................................................................... 64
Figure 37 Breath pattern detection for mixed breathing ....................................................................... 64
Declaration
“I hereby declare that for a period of two years following the date on which the dissertation is
deposited in the Library of the University of Ulster, the dissertation shall remain confidential
with access or copying prohibited.
Following the expiry of this period I permit the Librarian of the University of Ulster to allow
the dissertation to be copied in whole or in part without reference to me on the understanding
that such authority applies to the provision of single copies made for study purposes or for
inclusion within the stock of another library. This restriction does not apply to the copying or
publication of the title and abstract of the dissertation. IT IS A CONDITION OF USE OF
THIS DISSERTATION THAT ANYONE WHO CONSULTS IT MUST RECOGNISE
THAT THE COPYRIGHT RESTS WITH THE AUTHOR AND THAT NO QUOTATION
FROM THE DISSERTATION NOR ANY INFORMATION DERIVED FROM IT MAY BE
PUBLISHED UNLESS THE SOURCE IS PROPERLY ACKNOWLEDGED.”
[Peng Yuan]
Abstract
The advice to change from mouth breathing to nasal breathing is well recognised by both patients and healthy people, and following it has a positive impact on daily life. This project attempts to discriminate nasal and mouth breathing patterns in sound files pre-recorded with an acoustic sensor, and further aims to detect and classify the mouth and nasal breathing patterns using an artificial back-propagation neural network.
Two participants were involved in the experiment to record the breath sound files; several recordings were made, each lasting approximately half a minute, with the participant sitting on a chair in a quiet room.
The first purpose of this project is to investigate the recognition rate achievable when classifying the breathing patterns. If the rate is high enough to identify the differences between the two patterns, the second aim is to detect them and to integrate the result into an intelligent device with an alarm system and motivational feedback that helps patients change their pattern from mouth to nasal breathing.
The results of this project illustrate that the breathing pattern can be discriminated at certain places on the body, both by visual inspection of the spectrum and by the self-built BP neural network classifier. The sound files recorded from the sensor placed on the hollow show the most promising accuracy, above 90%. However, the performance for the mixed breathing pattern is not as good as for a single pattern (either nasal or mouth), and the reasons for this have also been analysed theoretically.
1. Introduction
It is well known that changing the breathing pattern from the mouth to the nose is good for an individual's health, and it is recommended by doctors for healthy people and patients alike. The purpose of this project is firstly to investigate the principles of automated discrimination of breathing patterns using an acoustic sensor. If the two breathing types can be classified with high accuracy at certain sensor locations, the project secondly attempts to program and integrate the classifier into a decision support system device so that it can discriminate those differences, and to optimise the algorithms so that the motivational feedback system becomes more intelligent and works in various environments with improved classification accuracy.
1.1 Aims and Objectives
Two participants were involved in this experiment to record the breath sound files; several recordings were made, each lasting approximately half a minute, with the participant sitting on a chair in a quiet room.
The first purpose of this project is to investigate the recognition rate achievable when classifying the breathing patterns. If that rate is high enough to identify the differences between the two patterns, the second aim is to detect them and to integrate the result into an intelligent device with an alarm system and motivational feedback that helps patients change their pattern from mouth to nasal breathing.
1.2 Outline of Thesis
The final report comprises seven chapters in all; an overview of the content of each chapter is given below:
Chapter 2 presents the development of using biological sounds as a reference for diagnosing disease, along with attempts to monitor the status inside the body to assist treatment. It also gives background information related to this project, including relevant research on sound signal analysis and processing methods and the artificial neural network.
Chapter 3 performs the requirements analysis, which covers the functional and non-functional requirements as well as the use case and sequence diagrams based on the user requirements. A summary of the specification of this project is also presented.
Chapter 4 gives a brief introduction to the programming language, analysis tools and equipment used in this project. It then presents the initial overall system architecture design and data flow diagram along with the proposed implementation specification; the digital sound signal analysis methods and the BP neural network are also designed at this stage.
Chapter 5 details the implementation of each technology designed for this project, such as the band-pass filter, end-point detection, pseudo-colour display and the back-propagation neural network.
Chapter 6 is the evaluation stage, which tests the performance of the application. The interface is briefly introduced in this section and the detection results are displayed and analysed with several figures.
Chapter 7 is the final chapter of this report. It summarises the work done so far, briefly discusses the results obtained, and identifies the remaining problems as issues for future work.
2. Literature Review
2.1 Introduction to the exploitation of biological sound
It is well known that normal lung sounds vary between subjects. Lung sounds also vary within a single day and across several consecutive days (Mahagna and Gavriely, 1994). Against this background of variation, changes specific to nasal and mouth breathing only become apparent when a larger number of subjects is studied. The aim of analysing breathing patterns with the help of a computer is to characterise and store them objectively. As the human hearing system attenuates sounds at low frequencies, especially below 300 Hz, a computer-aided device that can record sound in that low-frequency range is essential for sound recognition. Since the equipment is non-invasive and the whole procedure does not cost much, it has the potential to be used for healthy people and even for pneumonia patients, who are regarded as a high-risk group. Thus the analysis of lung sound spectrograms could be used for incipient pneumonia patients before any radiologic abnormality appears or the condition worsens in daily life.
2.2 The study of breathing pattern
Forgacs et al. (1971) showed that the intensity of breath sounds at the mouth is related to the forced expiratory volume in one second in people with chronic bronchitis and asthma. With the development of modern signal processing technology there is great potential to assess whether the state of the ventilatory system is reflected in the respiratory sound signal. A major line of work focuses on variable wheezing sounds, whose frequency-band distribution can be seen clearly in the lung sound spectrogram. Malmberg et al. (1994) studied the connection between the ventilatory system and the frequency spectrum of the breathing sound in healthy subjects and asthmatic patients. They found that in asthmatics the breath sound frequency distribution, expressed as the median frequency, reflected acute changes in airway obstruction with high sensitivity and specificity. Patient emergency care systems are informed by a large amount of vital-sign information (Brown, 1997). The respiratory rate is a vital sign that accurately reflects potential illness and, when correctly used, is an essential marker of metabolic dysfunction to support decision making in hospital (Gravelyn and Weg, 1980; Krieger et al., 1986). Primary clinical courses emphasise the importance of changes in breathing rate, and the requirement for reliable, non-invasive respiratory monitoring devices suitable for long-term daily use has long been recognised.
2.3 Respiratory rate monitoring
The Respi-Check Respiratory Rate Monitor (Intersurgical, 2009) is suitable for adults and older children. This electronic device applies an infrared sensor to the Respi-Check breathing mask as an indicator (Breakell and Townsend-Rose, 2001). The respiratory rate is detected continuously and the result is shown on a digital screen. Pre-installed audio and visual alarms are activated if no breathing has been detected for 10 continuous seconds; further alarms cover cable disconnection and low battery. Researchers at the University of Arkansas developed and evaluated two similar, slightly different biosensors capable of detecting important physiological signs. Smart vests and fabrics built from organic semiconductors enable manufacturers to make light, flexible devices that can be integrated easily with biomedical applications.
2.4 Sound Analysis and Feature Extraction
Based on research into the characteristics of the human voice and hearing, many theoretical models for sound signal analysis have been developed, such as the Short Time Fourier Transform, Filter Analysis, LPC Analysis and Wavelet Analysis (Jedruszek; Walker, 2003). These theories have been widely used for sound coding, sound synthesis and feature extraction. In Linear Predictive Coding (LPC), features of the sound can be extracted by calculating linear prediction coefficients of different orders. Filter Analysis first separates frequency bands of the sound signal using bandpass filters and then extracts frequency features by simulating the behaviour of the biological auditory nerve cells.
2.4.1 Filter Analysis of Sound Signal
The main component of Filter Analysis is a bandpass filter, which is used to separate and extract the useful frequency bands of the signal. A complete Filter Analysis model, however, consists of a bandpass filter followed by non-linear processing, a low-pass filter, resampling at a lower sampling rate and compression of the signal's amplitude. Common choices for the non-linear processing stage are the sine function and the triangular windowing function. In order to smooth sudden changes in the signal, it is passed through a low-pass filter after the non-linear processing. The optional final steps, resampling at a lower rate and compressing the amplitude, aim to reduce the amount of calculation at later stages.
2.4.2 Feature Extraction
The fundamental problem of sound signal recognition lies in choosing reasonable features and characteristics of the signal. A sound signal is a typical time-varying signal; however, if it is observed at the millisecond level it shows short segments that can be treated as approximately stationary, and features are extracted from these stationary segments to represent the original signal.
In general, the characteristic parameters of a sound signal fall into two types: features in the time domain and features in the frequency domain obtained after transforming the signal. Time-domain parameters, such as the short-time average amplitude or the short-time average zero crossing rate, are usually computed directly from the sample values within one frame. The other type of feature is obtained by transforming the original signal into the frequency domain, for example with the Fast Fourier Transform, to derive the LPCC or MFCC features of the signal. The former type has the advantage of simple calculation but has a large number of feature dimensions and is not well suited to representing the amplitude spectrum. The latter type requires a more complex transform calculation, but can characterise the amplitude spectrum of the signal from several different angles.
2.5 Digital signal Characteristics in Time Domain
2.5.1 Short-time Energy
The short-time energy of the sound signal reflects the characteristics of its amplitude over time; its mathematical description is:

E_n = \sum_{m=-\infty}^{\infty} [x(m)\, w(n-m)]^2
2.5.2 The Average Zero Crossing Rate of Short-time
In a discrete time-domain signal, if two adjacent sample values have different algebraic signs, for example 3 followed by -1 or -2 followed by 2, a zero crossing is said to occur (Mocree and Barnwell; Molla and Keikichi, 2004). As the sound signal is a broad-band signal, a short-time transform should be applied to the original signal in order to extract the feature precisely; the resulting Short-time Zero Crossing Rate is defined as:

Z_n = \sum_{m=-\infty}^{\infty} |\mathrm{sgn}[x(m)] - \mathrm{sgn}[x(m-1)]|\, w(n-m)

where

\mathrm{sgn}[x(n)] = \begin{cases} 1, & x(n) \ge 0 \\ 0, & x(n) < 0 \end{cases}

and

w(n) = \begin{cases} \dfrac{1}{2N}, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}
2.6 Digital signal Characteristics in Frequency Domain
2.6.1 The Linear Predictive Coding (LPC) parameters
LPC analysis is based on the theory that the current sample of a signal can be approximated by a linear combination of several preceding samples. By minimising the mean squared error between the actual sample values and the linearly predicted values, the LPC parameters can be obtained.
2.6.2 The algorithm for LPC parameters
In a linear predictive model, the value of the signal at point n, s(n), is expressed as a linear combination of the p sample points before n:

s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p)

where a_1, a_2, ..., a_p are constants. The model can then be written as:

s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\, u(n)

where u(n) is the normalised excitation and G is the gain coefficient. The approximate (predicted) system output is defined as:

\bar{s}(n) = \sum_{k=1}^{p} a_k\, s(n-k)

so the prediction error of the system is:

e(n) = s(n) - \bar{s}(n) = s(n) - \sum_{k=1}^{p} a_k\, s(n-k)

As the linear prediction error equals the product of the excitation and the gain coefficient:

e(n) = G\, u(n)

Define the short-time sound signal and error as:

s_n(m) = s(n+m), \qquad e_n(m) = e(n+m)

Then the sum of the squared errors is:

E_n = \sum_{m} e_n^2(m) = \sum_{m} \left[ s_n(m) - \sum_{k=1}^{p} a_k\, s_n(m-k) \right]^2
Differentiating the above expression with respect to each LPC parameter and setting the results to zero gives:

\sum_{m} s_n(m-i)\, s_n(m) = \sum_{k=1}^{p} \bar{a}_k \sum_{m} s_n(m-i)\, s_n(m-k)

Defining the correlation function:

\varphi_n(i,k) = \sum_{m} s_n(m-i)\, s_n(m-k)

this becomes:

\varphi_n(i,0) = \sum_{k=1}^{p} \bar{a}_k\, \varphi_n(i,k), \qquad i = 1, 2, \ldots, p

The above is a set of p equations in p unknowns, and the LPC parameters can be obtained by solving them.
The minimum mean squared error of the system is expressed as:

\bar{E}_n = \sum_{m} s_n^2(m) - \sum_{k=1}^{p} \bar{a}_k \sum_{m} s_n(m)\, s_n(m-k) = \varphi_n(0,0) - \sum_{k=1}^{p} \bar{a}_k\, \varphi_n(0,k)
There are many ways to solve the above equations, such as the autocorrelation method (Durbin's recursion) and the covariance method. The recurrence formulas of Durbin's method are:

E_n^{(0)} = R_n(0)

k_i = \frac{R_n(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R_n(i-j)}{E_n^{(i-1)}}

a_i^{(i)} = k_i

a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \qquad 1 \le j < i

E_n^{(i)} = (1 - k_i^2)\, E_n^{(i-1)}

where the superscript (i) denotes the i-th iteration; only a_1, a_2, ..., a_i are calculated and updated in each iteration, and the recursion runs until i = p.
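As an illustration of the recursion above, the following is a minimal MATLAB sketch of Durbin's method that computes the LPC coefficients from a single windowed frame. The function name, the frame variable x and the order p are illustrative assumptions; MATLAB's built-in lpc function implements the same idea and can be used instead.

% Minimal sketch of Durbin's recursion for the LPC coefficients.
% Assumptions: x is one windowed frame (vector), p is the LPC order.
function a = lpc_durbin(x, p)
    x = x(:);
    R = zeros(p+1, 1);                       % autocorrelation R_n(0..p)
    for i = 0:p
        R(i+1) = sum(x(1:end-i) .* x(1+i:end));
    end
    a = zeros(p, 1);                         % LPC coefficients a_1 .. a_p
    E = R(1);                                % E_n^(0) = R_n(0)
    for i = 1:p
        acc = R(i+1);                        % numerator of k_i
        for j = 1:i-1
            acc = acc - a(j) * R(i-j+1);
        end
        k = acc / E;                         % reflection coefficient k_i
        a_new = a;
        a_new(i) = k;                        % a_i^(i) = k_i
        for j = 1:i-1
            a_new(j) = a(j) - k * a(i-j);    % a_j^(i) = a_j^(i-1) - k_i a_{i-j}^(i-1)
        end
        a = a_new;
        E = (1 - k^2) * E;                   % E_n^(i) = (1 - k_i^2) E_n^(i-1)
    end
end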
2.6.3 The Linear Predictive Cepstrum Coefficients (LPCC)
In sound recognition systems the LPC parameters are seldom used directly; instead, the Linear Predictive Cepstrum Coefficients (LPCC) derived from the LPC are used, since the cepstrum increases the stability of the coefficients. The recurrence relations between LPC and LPCC are:

c_0 = \log G^2

c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad 1 \le m \le p

c_m = \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad m > p

where c_0 is the DC component.
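A minimal MATLAB sketch of this recurrence is given below, assuming a is the vector of LPC coefficients a_1..a_p, G the model gain and Q the number of cepstral coefficients required; the function name is illustrative and a_m is treated as zero for m > p.

% Minimal sketch: convert LPC coefficients to LPCC via the recurrence above.
function c = lpc2lpcc(a, G, Q)
    p = numel(a);
    c = zeros(Q+1, 1);               % c(1) holds c_0, c(m+1) holds c_m
    c(1) = log(G^2);                 % c_0, the DC term
    a_pad = [a(:); zeros(Q, 1)];     % a_m = 0 for m > p
    for m = 1:Q
        acc = a_pad(m);
        for k = 1:m-1
            acc = acc + (k/m) * c(k+1) * a_pad(m-k);
        end
        c(m+1) = acc;
    end
end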
2.6.4 The Mel Frequency Cepstrum Coefficients (MFCC)
The LPC parameters are acoustic features derived from research on the human voice production mechanism, whereas the Mel Frequency Cepstrum Coefficients (MFCC) are based on research on the human hearing system. The underlying observation is that when two tones of similar frequency occur at the same moment, only one tone is perceived. The critical bandwidth is exactly the bandwidth boundary at which this sudden perceptual change occurs: when the frequency separation of the tones is less than the critical bandwidth, people usually hear the two tones as one, which is called the masking effect.
The Mel scale is one way of measuring the critical bandwidth, and the calculation of the MFCC is based on the Mel frequency. Its relation to linear frequency is:

f_{mel} = 2595\, \log_{10}\!\left(1 + \frac{f}{700}\right)

MFCC is calculated per frame. First the power spectrum S(n) of the frame is obtained by the Fast Fourier Transform, and it is then transformed to the power spectrum on the Mel frequency scale. Before doing that, the spectrum is passed through a bank of bandpass filters:

H_m(n), \qquad m = 0, 1, \ldots, M-1, \quad n = 0, 1, \ldots, \frac{N}{2} - 1

where M is the number of filters and N is the frame length.
The process of MFCC calculation is (Mocree, 1995):
a. Choose the frame length N, pre-emphasise and window each frame s(n), and apply the discrete FFT to it; the power spectrum S(n) is obtained by squaring the modulus of the result.
b. Pass S(n) through the M filters H_m(n); that is, at each discrete frequency compute the product of S(n) and H_m(n) and sum over frequency. The result is the parameter P_m, m = 0, 1, ..., M-1.
c. Calculate the natural logarithm of P_m to get L_m, m = 0, 1, ..., M-1, then apply the discrete cosine transform to L_m to get D_m, m = 0, 1, ..., M-1.
d. Leave out D_0, which represents the DC component; then D_1, D_2, ..., D_k are taken as the MFCC.
However, the standard MFCC only captures the static features of the sound signal. To obtain the dynamic features, to which human hearing is more sensitive, the differential cepstrum should be used, defined as:

d(n) = \frac{1}{\sqrt{\sum_{i=-k}^{k} i^2}} \sum_{i=-k}^{k} i \cdot c(n+i)

where c and d are cepstral parameters of one frame and k is a constant; the differential coefficient of the current frame is thus a linear combination of the cepstra of the preceding and following k frames. The equation above gives the first-order differential of the MFCC; applying the same equation to the first-order differential yields the second-order differential, and so on. In practice, the MFCC and its differences of different orders are concatenated into a vector that serves as the full MFCC feature of one frame.
In the description of acoustic features, low-order coefficients cannot represent the sound signal precisely, while higher orders lead to more complicated calculations, so it is essential to choose an appropriate order; most sound recognition systems use orders in the range 10 to 15 for the LPC, LPCC or MFCC.
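The following MATLAB sketch illustrates steps a-d for a single frame under simplifying assumptions (a linearly-spaced triangular mel filterbank and no liftering); the function name and parameters are illustrative rather than the exact implementation used in this project.

% Minimal MFCC sketch for one pre-emphasised, windowed frame s.
% Assumptions: fs is the sampling rate, M the number of mel filters,
% K the number of coefficients kept after dropping the DC term D_0.
function mfcc = frame_mfcc(s, fs, M, K)
    N  = numel(s);
    S  = abs(fft(s(:))).^2;          % power spectrum of the frame (step a)
    S  = S(1:floor(N/2)+1);          % keep the non-redundant half
    f  = (0:floor(N/2)) * fs / N;    % linear frequency of each FFT bin
    % mel-spaced centre frequencies: f_mel = 2595*log10(1 + f/700)
    melMax = 2595 * log10(1 + (fs/2)/700);
    melPts = linspace(0, melMax, M + 2);
    fPts   = 700 * (10.^(melPts/2595) - 1);   % back to Hz
    P = zeros(M, 1);
    for m = 1:M
        lo = fPts(m); ce = fPts(m+1); hi = fPts(m+2);
        H  = max(0, min((f - lo)/(ce - lo), (hi - f)/(hi - ce)));  % triangle H_m
        P(m) = sum(S(:) .* H(:));    % energy through the m-th mel filter (step b)
    end
    L = log(max(P, eps));            % natural logarithm of filter outputs (step c)
    D = dct(L);                      % discrete cosine transform
    mfcc = D(2:K+1);                 % drop D_0 and keep K coefficients (step d)
end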
2.7 Artificial Neural Network
The Artificial Neural Network (ANN), which is composed of a large number of simple processing units, is an interconnected, non-linear, self-adaptive information processing system. Inspired by modern neuroscience, the ANN tries to simulate the way a biological neural network processes and stores large amounts of information simultaneously.
In an artificial neural network, each processing unit can represent a different object such as a feature, a letter, a sentence or a meaningful abstract pattern. The whole network is built from three main components: the input layer, the hidden layer(s) and the output layer. The input layer receives signals and data from the outside world, and the output layer gives the result after processing by the network. The hidden layers are placed between the input and output layers and cannot be observed directly from outside the system; their number is flexible, and the more hidden layers the network has, the more complex the computation becomes and the larger the amount of intricate information it can deal with.
The weights between the neurons reflect the connection strength of the processing units, and the connection relationships between them represent the expression and processing of the input information. The Artificial Neural Network is, in general, a non-procedural, self-adaptive, brain-like information processing system. Its essence is to adjust the connections and weights between the units so as to achieve parallel and distributed information processing. It is an interdisciplinary field involving neural science, cognitive science, artificial intelligence, computer science and more.
3. Requirements Analysis and Functional Specification
User requirements analysis means determining the product-specific performance and documenting the customer needs from which the functional and non-functional requirements are generated. An initial analysis attempts to define the project clearly and to specify the proposed solution approach.
Figure 1 Requirements analysis is in the first stage (Wikipedia)
Requirements analysis is the first stage in the systems development process, followed by functional specification, software architecture, software design, implementation, software testing, software deployment and software maintenance.
3.1 Functional requirements
Functional requirements define the functions of the software based on the user requirements analysis, specifying the inputs, outputs, external interfaces and other special management information needs. The basic functional requirements are listed below:
a. Record breath sounds, with a single nasal pattern, a single mouth pattern or mixed patterns, at different places on the body using the acoustic sensor.
b. Record breath sounds, with a single nasal pattern, a single mouth pattern or mixed patterns, from two participants.
c. Find the sensor placement that gives the best performance and the highest accuracy in pattern detection.
d. Investigate whether the time of day at which the sound file is recorded affects the detection result, for example morning versus late afternoon.
e. Discriminate between nasal and mouth breaths in a recording with mixed patterns within a short time, to identify the shortest time needed for the analysis.
f. Improve the algorithm so that it detects the nasal and mouth breathing patterns with better performance in different situations.
Here is the system Use Case diagram:
Figure 2 System Use Case Diagram
Here is the System Sequence Diagram:
Figure 3 System Sequence Diagram
3.2 Non-functional requirements
Non-functional requirements are requirements that do not relate to the functionality itself but to how the system performs its task, covering attributes such as reliability, efficiency, usability, maintainability and portability. For this project the non-functional requirements are:
a. The pre-recorded sound should be discriminated regardless of when it was recorded and who it was recorded from.
b. The discrimination should be performed within a specific time, for example when a certain breathing pattern lasts for 3 seconds it can then be detected.
c. The device should be easy for end users to use, non-invasive to the human body and unobtrusive in daily life.
d. It should be easy to transfer the program from one device to another.
e. It should be easy to maintain during follow-up usage.
3.2.1 The frequency range to use
For a healthy person, the frequency band of vesicular breathing sounds ranges from 0 up to 1000 Hz, while the power spectrum shows that the main energy lies between 60 and 600 Hz (Pasterkamp et al., 1997). Gross et al. (2000) also showed that other sounds, such as wheezing, are carried at frequencies above 2000 Hz. General lung sound detection uses the low, middle and high frequency bands of 100 to 300 Hz, 300 to 600 Hz and 600 to 1200 Hz respectively (Gross et al., 2000; Sanchez and Pasterkamp, 199; Soufflet et al., 1990; Shykoff et al., 1988). This project therefore focuses on the frequency band from 1 to 1200 Hz.
3.2.2 Placement of the sensor
This project focuses on frequencies below 1200 Hz. In order to explore the different sounds at different locations, the sensor was placed on five areas of the body: the chest, the chin, the hollow, the right shoulder and the left shoulder. The performance at each place should be assessed before features are extracted and passed through the BP neural network. As a very sensitive sensor is used, noise can easily mislead the detection, so removing the noise is the first consideration before pre-processing.
3.3 Summary
The first step is to find the best place for the sensor and build the recording system that records the breath sound inside the body as a digital file for analysis. Before the sound analysis there is a pre-processing stage to remove noise and facilitate the detection at a later stage. Some sensor placements perform well for the analysis system and others do not; this project also assesses recordings from different people made at different times of the day. Different frequency bands are used at different steps of the project, and working out which band to use where is a major problem to be solved. Discriminating the breathing pattern in daily life outside the laboratory is the final goal, but as far as is known (Baird and Neuman, 1992), such a frequently-usable device has not yet reached product level.
4. Design
4.1 MATLAB
MATLAB is a numerical computing environment and fourth-generation programming language (Wikipedia, MATLAB). It is a software package for engineering analysis that is powerful enough to meet almost anything an engineer could desire. MATLAB has strong graphics capabilities that make it simple for users to plot whatever is required, and a powerful simulation capability: the analysis toolboxes contain hundreds of functions with which an engineer can simulate a program, see how it performs, and then modify and improve it.
Figure 4 An example of MATLAB simulation
4.1.1 Introduction to the related MATLAB functions
This section introduces several MATLAB functions related to spectrum analysis and display, together with the derivation of the relevant equations and the way to choose appropriate parameters.
4.1.1.1 Short-time spectrum analysis
A. Frame and windowing functions (Eva and Anders, 1999)
The windowing functions provided by MATLAB include hamming(N), hanning(N), blackman(N) and bartlett(N), where N is the window length (frame length). Each windowing function has its own characteristics and is used in different situations as required; in this case the Hamming window is applied to the original audio signal. Normally a power of 2, such as 512 or 1024, is chosen as the frame length N to speed up the Fast Fourier Transform (FFT), though any constant could be used.
B. Fast Fourier Transform (FFT) function
MATLAB provides the function fft(S), where the parameter S is one frame of the windowed signal. Note that for a real-valued signal the frequency-domain samples after the FFT are symmetrical about the mid-point (half the sampling frequency), so only the first half of the result of fft(S) is useful.
C. Conjugate of a complex number
MATLAB provides the function conj(Z) to obtain the conjugate of the complex value Z; here the parameter Z is the result of fft(S). This can be used when calculating the magnitude |X(m, k)| of the complex short-time spectrum X(m, k).
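Putting A-C together, the following is a minimal MATLAB sketch of a short-time magnitude spectrum computation for one recording; the file name, the 512-sample Hamming window and the 50% overlap are illustrative assumptions.

% Minimal sketch: short-time magnitude spectrum of an audio file.
% Assumptions: breath.wav exists; 512-sample Hamming window, 50% overlap.
[x, fs] = audioread('breath.wav');      % read the recorded breath sound
x = x(:, 1);                            % use a single channel
N = 512;                                % frame length (a power of 2)
shift = N/2;                            % hop size (50% overlap)
w = hamming(N);                         % windowing function
numFrames = floor((length(x) - N) / shift) + 1;
X = zeros(N/2, numFrames);              % keep only the first half of each FFT
for m = 1:numFrames
    frame = x((m-1)*shift + (1:N));
    F = fft(frame(:) .* w);             % FFT of the windowed frame
    X(:, m) = abs(F(1:N/2));            % magnitude |X(m,k)|, i.e. sqrt(F .* conj(F))
end
f = (0:N/2-1) * fs / N;                 % frequency axis in Hz for display

MATLAB's built-in spectrogram function performs an equivalent computation and can be used to cross-check the result.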
4.2 Equipment
The equipment needed to build the system was provided by Axelis Ltd. A brief introduction to this equipment is given below.
4.2.1 Acoustic sensor
The acoustic sensor used was supplied by Axelis Ltd. and is covered by United States Patent No. US 6,937,736, filed by Measurement Specialties Inc. in 2005.
Figure 5 Acoustic Sensor
4.2.2 Sound Recorder
Along with the acoustic sensor there is a recorder connected to it, which records the sound sensed by the acoustic sensor directly onto a plugged-in flash drive.
Figure 6 Recorder
The recorder is designed to be easy to use, with a flash drive plugged in through a USB port and a rechargeable battery that enables the user to record anywhere convenient. When recording starts it automatically saves the sound as a high-quality Windows audio file on the flash drive, ready for analysis.
4.3 System architecture
System architecture is the overall hardware/software configuration and database design
including subsystem components supporting a particular application. There usually is a
mapping of functionality onto the hardware as well as software components.
By creating the system architecture, the system is decomposed into small structural elements and subsystems; this simplifies the problem by dividing the whole system into reasonably independent pieces that can be solved separately.
Figure 7 Proposed System Architecture
Figure 7 shows how the proposed system works. First the acoustic sensor is worn by the user; the sensor picks up the sound inside the user's body and passes the sensed data to a sound recorder. After the sound has been converted into a suitable format, the sound analyser analyses the sound file, and if an inappropriate breathing pattern is detected the analyser informs the alarm system, which gives appropriate recommendations.
Figure 8 Proposed System Collaboration Diagram
4.4 Data modeling
Data modeling creates a data model that describes the data flow in the software. It defines and analyzes the data requirements needed to support the software, presents the associated data, and defines the relationships between the data components and structures.
Here is the data flow diagram:
Figure 9 Data Flow Diagram
Figure 9 presents how the data flows through the whole process. After the sound has been sensed by the acoustic sensor it is stored on the sound recorder ready for processing, and the processed data is then passed to the analyser for analysis.
4.5 Analyzing methods
In the last few years MATLAB has become the main tool for processing data and mathematical models; it is widely used in university research as well as commercial products, and its power in dealing with mathematics is well proven (Brandt, 2005). One great advantage of using MATLAB for audio signal analysis is that the user is forced to understand the processing more thoroughly than a beginner who only knows how to operate menu-based software.
Figure 10 Audio Signal Analysis
This also means that once the user passes the initial threshold, he or she becomes specialised in a particular field of analysis with MATLAB. Meanwhile, MATLAB's processing path is automatically traceable, which means the user has a clear record of what happened at any point in the middle of processing; this is especially important for some analysis requirements.
4.6 Sound Signal Pre-processing
4.6.1 Cutting off frequency band
Research shows that the frequency content of lung sounds lies mainly below 1000 Hz, and this project focuses on frequencies under 1200 Hz. Different frequency bands serve different functions in this experiment: the very low band below 100 Hz is isolated to extract acoustic features, frequencies above 1200 Hz carry a lot of noise and are filtered out at the first stage, and the band in between is used for end-point detection. The original signal is therefore passed through several bandpass filters to cut out the specified frequency bands.
However, unlike a speech signal radiated from the lips, which has an attenuation of about 6 dB/octave, the pre-recorded signal does not have to be pre-emphasised, since it comes from the acoustic sensor attached to the hollow.
4.6.2 Filter Design
Based on the theory above, a Butterworth filter can be designed with the MATLAB function butter:

[v, u] = butter(order, Wn, function)

where the parameter 'order' is the order of the filter; a larger order gives a better filtering effect but also requires more calculation. The length L of the parameter vectors 'u' and 'v' is related to 'order' by:

L_{u,v} = order + 1

The parameter 'Wn' is the normalised value of the frequency to be filtered. If the sampling frequency is f_s, the highest frequency that can be processed is f_s/2, so if the frequency f = 2000 Hz is to be the cutoff then:

W_n = \frac{f}{f_s / 2}

The parameter 'function' is a string that indicates the type of the filter; for example function = 'low' means a low-pass filter and function = 'high' a high-pass filter.
Figure 11 Low Pass Filter at cutoff frequency 1000 (Hz)
As shown in the frequency response above, when the original signal passes through the filter, each frequency is attenuated by a factor between 1 and 0 accordingly. It can clearly be seen that this is a low-pass filter with a cutoff frequency of 1000 Hz.
Figure 12 Bandpass filter at frequency range from 110 to 800 (Hz)
Based on the formula L_{u,v} = order + 1, the higher the order of the filter, the more effective it is, as the parameter vectors 'u' and 'v' become longer, but more complex calculations are required as well. Conversely, decreasing the order of the filter means shorter 'u' and 'v' vectors and less calculation, leading to a poorer filtering effect.
Figure 13 Frequency Response of several low-pass filters
It is evident from the figure above that the filter becomes increasingly effective as the order is raised gradually from 1 to 8.
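As a worked example of the design just described, the sketch below builds a band-pass Butterworth filter for the 110-800 Hz range of Figure 12 and applies it to a recording; the file name, the 6th-order choice and the comparison plot are assumptions for illustration.

% Minimal sketch: design and apply a Butterworth band-pass filter.
% Assumptions: breath.wav exists; pass band 110-800 Hz; 6th-order filter.
[x, fs] = audioread('breath.wav');
x = x(:, 1);
order = 6;
Wn = [110 800] / (fs/2);            % normalised cutoff frequencies, Wn = f/(fs/2)
[v, u] = butter(order, Wn, 'bandpass');
y = filtfilt(v, u, x);              % zero-phase filtering of the signal
% Optional check: compare the spectra before and after filtering
figure;
plot((0:length(x)-1)*fs/length(x), abs(fft(x)), 'b', ...
     (0:length(y)-1)*fs/length(y), abs(fft(y)), 'r');
xlim([0 1500]); xlabel('Frequency (Hz)'); ylabel('|X(f)|');
legend('original', 'band-pass filtered');

filtfilt is used here rather than filter so that the filtered signal keeps the original timing, which matters later for end-point detection.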
4.6.3 End-point Detection of the Signal
The features of the sound signal affect the performance of the whole recognition system, and end-point detection, that is, detecting the beginning and the end of the meaningful part of the signal, is the prerequisite for feature extraction. Many time-domain methods can be used for end-point detection; typical ones are the short-time average amplitude, the short-time average zero crossing rate and the short-time energy.
4.6.3.1 Short-time average amplitude method
When the meaningful part of the signal appears, the short-time average amplitude changes markedly, and the end points can be detected from this change. The short-time average amplitude is calculated as:

m(i) = \sum_{n=1}^{N} |x_i(n)|
4.6.3.2 Short-time energy method
In most practical experiments the short-time average amplitude is replaced by the short-time energy to describe the amplitude features of the sound signal. Several ways to calculate the energy are:

e(i) = \sum_{n=1}^{N} |x_i(n)|

which is called the absolute energy,

e(i) = \sum_{n=1}^{N} x_i^2(n)

which is called the square energy, and

e(i) = \sum_{n=1}^{N} \log x_i^2(n)

which is called the logarithm energy.
The short-time energy increases sharply when the useful part of the signal begins and falls gradually after it ends, so, as discussed, the short-time energy is also a good basis for end-point detection.
4.6.3.3 Short-time average zero crossing rate method
In a general acoustic signal most of the energy lies in the higher frequency bands, and a high frequency implies a high zero crossing rate, so the energy has some relationship with the zero crossing rate. The breathing sound, however, is very unlike normal speech signals. Firstly, a large part of the recorded file is noise, because the equipment is a very sensitive sensor that picks up the breathing sound inside the body together with noise from the skin and from airflow over the skin, and sometimes the noise is much louder than the useful breathing sound. Secondly, most of the energy lies in the frequency band below 100 Hz, which humans can barely hear, and this band is also the most useful one for feature extraction. So, unusually, the much lower frequency band can show a higher zero crossing rate, as the rate of change is larger in that particular band.
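To illustrate how these time-domain measures can be combined, the sketch below marks frames as "active" when both the short-time energy and the zero crossing rate exceed thresholds derived from the opening frames, reusing the short_time_features sketch from Section 2.5. The frame length, the threshold multipliers and the assumption that the recording starts with background noise only are illustrative choices, not the exact rules used in this project.

% Minimal sketch: frame-based end-point detection from energy and ZCR.
% Assumptions: x is a filtered mono signal, N the frame length, and the
% first 10 frames contain background noise only (used to set thresholds).
function active = endpoint_detect(x, N)
    [E, Z] = short_time_features(x, N);     % per-frame energy and ZCR (Section 2.5)
    eThr = 3 * mean(E(1:10));               % energy threshold from noise frames
    zThr = 1.5 * mean(Z(1:10));             % ZCR threshold from noise frames
    active = (E > eThr) & (Z > zThr);       % frames considered part of a breath
    % Remove isolated single-frame detections (simple smoothing)
    for i = 2:numel(active)-1
        if active(i) && ~active(i-1) && ~active(i+1)
            active(i) = false;
        end
    end
end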
4.7 the principle of Back-propagation Neural Network
Back-propagation learning algorithm is also called BP algorithm, and the artificial neural
network that related is also known as BP network. The BP learning is a supervised multilayered feedforward neural network algorithm.
In a single-layer ANN without a hidden layer, the δ learning rule can be applied to the input and output sample data to train the network. However, the multilayered feedforward perceptron introduces one or several hidden layers in between, whose target outputs are unknown to the network; therefore the output error of a hidden layer cannot be calculated directly, and the supervised learning algorithm used to train the single-layer perceptron cannot be applied either.
It is vitally important to note that "back propagation" refers to propagating the output errors backwards, not to feeding the network output back to the hidden layers or to the input layer. The network itself has no feedback connections; it only propagates the output errors backwards to adjust the connection weights of the hidden layers and the output layer, so the BP network should not be regarded as a nonlinear dynamic system but as a nonlinear mapping system.
4.7.1 Feed-forward Calculation
Consider a two-layer neural network that introduces one hidden layer between the input and output layers; the direction of the arrows indicates the way information flows through the network. The node an arrow points to is said to be in the lower layer, and the node at the arrow's tail in the upper layer. The net input of node j for a given training sample can then be expressed as:
net_j = ∑_i o_i · w_ij
where o_i is the output of node i in the upper layer and w_ij is the connection weight between node i in the upper layer and node j in the current layer; for the input layer, the input always equals the output at every node.
The output o_j of node j is a transformation of its input, given by the expression below:
o_j = f_s(net_j) = 1 / (1 + e^(−net_j))
where the output (oj) is taken as the input of nodes in the lower layers.
Abstracting the above expression gives the function:
f_s(a) = 1 / (1 + e^(−a))
then the derivative of this function, and hence of the output o_j with respect to its net input, is given by:
f_s'(a) = −1 / (1 + e^(−a))² · (−e^(−a)) = f_s(a)[1 − f_s(a)]
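A minimal Matlab sketch of this feed-forward step for one layer, assuming the upper-layer outputs o_upper and a weight matrix W (one row per node j) are already available, is:

% minimal sketch: feed-forward calculation for one node layer
net = W*o_upper;            % net_j = sum_i o_i * w_ij
o   = 1./(1 + exp(-net));   % o_j = f_s(net_j)
df  = o.*(1 - o);           % f_s'(net_j) = f_s(net_j)*(1 - f_s(net_j))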
4.7.2 The rules of weight adjustment in the BP Neural Network
If the target output of node j in the output layer is set as t_j, the output error is obtained as t_j − o_j. This error value is propagated back from the output layer to the hidden layers, and the weights are continually adjusted so as to decrease the error. The error function for the network is:
e = (1/2) ∑_j (t_j − o_j)²
In order to make the error e decrease, the weight adjustment should follow the gradient descent of the error function, that is:
Δω_ij = −η ∂e/∂ω_ij
where η is a gain coefficient greater than zero.
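A minimal Matlab sketch of one such gradient-descent step for the output-layer weights, assuming the actual outputs o_j, the targets t, the upper-layer outputs o_i and the weight matrix W are given (the names here are illustrative only), is:

% minimal sketch: one gradient-descent step for the output-layer weights
eta   = 0.1;                          % gain coefficient (learning rate), assumed value
delta = (o_j - t).*o_j.*(1 - o_j);    % local error term for each output node
gradW = delta*o_i';                   % de/dw_ij for the whole weight matrix
W     = W - eta*gradW;                % delta_w_ij = -eta * de/dw_ij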
4.7.3 The breath pattern classification flowchart
Figure 14 Classify the breathing pattern
5. Implementation
The whole procedure of this project involves five stages. In the user stage, the audio files that contain the breathing sound are recorded with the acoustic sensor by two people. There are three types of sound file: one is mouth breathing only, one is nose breathing only, and the third mixes breathing patterns with mouth and nose. Both of us recorded the files in a quiet room, breathing smoothly while sitting on a chair for about half a minute.
The second stage is pre-processing: before feature extraction, certain frequency bands are filtered out of the pre-recorded sound file, windows are applied to smooth the signal, and the Fast Fourier Transform is taken.
Feature extraction is the third stage, which involves processes such as End-point Detection and passing the signal through a band-pass filter.
The Back-propagation Neural Network is built independently after feature extraction; about fifty training samples, covering both mouth breathing and nose breathing, are used to train the network and adjust the weights.
In the recognition stage, the testing data are input to the Neural Network for detection using the weights obtained during training, and some expert experience is added to the detection result manually. The whole process is shown in the flowchart below:
Figure 15 the overall procedure flowchart
5.1 Pre-processing
The pre-processing procedure also has several steps. After loading the data from the audio file, the first step is to filter out the noise at frequencies above 1200 Hz; after that the signal is cut into smaller frames and a window is applied to each one, then the Fast Fourier Transform is applied to each windowed frame; finally we obtain the spectrum map by pseudo-color mapping. The steps are illustrated by the following graph:
Figure 16 Pre-processing flowchart
5.1.1 Digital Filter Applications
Theoretically, a digital filter is defined by two coefficient vectors 'u' and 'v', of length 'm' and 'n' respectively, expressed as below:
u = [u_1, u_2, …, u_m],  u_1 = 1
v = [v_1, v_2, …, v_n]
When the digital filter with parameter vectors 'u' and 'v' is applied to a discrete audio signal s(t), the result is the filtered signal S(t), defined by:

u ∗ S(t) = v ∗ s(t)
After expanding the polynomial, S(t) is expressed as:

S(t) = v_1 s[t] + v_2 s[t−1] + … + v_n s[t−n+1] − u_2 S[t−1] − u_3 S[t−2] − ⋯ − u_m S[t−m+1]
For instance, choosing the particular values u = [1] and v = [1/4, 1/4, 1/4, 1/4], the output of the filter is:

S(t) = [s(t) + s(t−1) + s(t−2) + s(t−3)]/4
This customised filter takes the average of the current point and the previous three points, and thus has a low-pass effect: it suppresses the high-frequency content of the original signal by averaging it out while leaving the low-frequency content relatively untouched. Such a filter is called a Low Pass Filter.
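A minimal Matlab sketch of this moving-average filter, assuming the signal samples are already loaded into a vector s, is:

% minimal sketch: the four-point moving-average filter applied with filter()
u = 1;                    % u = [1]
v = [1/4 1/4 1/4 1/4];    % averaging coefficients
S = filter(v, u, s);      % S(t) = [s(t)+s(t-1)+s(t-2)+s(t-3)]/4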
5.1.2 Apply filter to the digital signal
In order to filter out the noise in the sound signal, the signal is passed through a low-pass filter: frequencies below 1200 Hz pass through while frequencies above 1200 Hz are filtered out. The original and filtered signals are shown below:
Figure 17 The signal passed through a low-pass filter
5.2 Principles of Spectrum Analysis and Display
5.2.1 Short Time FFT Spectrum Analysis of Discrete Signal
The spectrum analysis of the signal is based on the Short Time Fourier Transform (STFT) of the discrete time-domain signal. A discrete time-domain sampled signal can be expressed as x(n), where n = 0, 1, …, N−1 is the sample index and N is the signal length. In digital signal processing, the signal is usually split into frames by applying a window to it; x(n) can then be expressed as x_m(n), where n = 0, 1, …, N−1, 'm' is the frame number, 'n' is the time index within the frame, and N is the number of samples in one frame, known as the frame length. The Discrete Time domain Fourier Transform (DTFT) of the windowed signal x_m(n) can be written as:
X(m, e^(jω)) = ∑_{n=0}^{N−1} w_m(n) · x_m(n) · e^(−jωn)
In order to simplify the calculation, the Discrete Fourier Transform (DFT) of w_m(n)·x_m(n) is usually used instead:
X(m, k) = ∑_{n=0}^{N−1} w_m(n) · x_m(n) · e^(−j2πnk/N),  k = 0, 1, …, N−1
Then |X(m, k)| is the estimated short-time amplitude of frame x_m(n). Taking m as the time variable and k as the frequency variable, |X(m, k)| is the dynamic spectrum of the signal x(n). Since the decibel (dB) value can be calculated as:
DB(m, k) = 20 · log10(|X(m, k)|)
we can display the dynamic spectrum of the signal in dB. The calculation of |X(m, k)| is further simplified by using the Fast Fourier Transform (FFT) (Cooley and Tukey, 1965).
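A minimal Matlab sketch of this short-time spectrum for a single frame, assuming one frame of samples is already held in a column vector frame with an even length N, is:

% minimal sketch: short-time spectrum of a single frame in dB
N  = length(frame);                  % frame length (assumed even, e.g. a power of two)
Xk = fft(frame.*hamming(N));         % windowed N-point DFT computed via the FFT
Xk = Xk(1:N/2 + 1);                  % keep the non-redundant half of the spectrum
dB = 20*log10(abs(Xk) + eps);        % |X(m,k)| displayed in decibels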
5.2.2 The dynamic spectrum display by pseudo-color coded mapping
Taking 'm' as the abscissa, 'k' as the ordinate and the value of |X(m, k)| as the pseudo-color value on the two-dimensional plane, we obtain the dynamic spectrum of the signal x(n). Mapping the value of |X(m, k)| to a pseudo-color gives better resolution and visual effect for the dynamic spectrum and improves the readability of the diagram. The method first maps the minimum value (Xmin) of |X(m, k)| to the normalized value 0 and the maximum value (Xmax) to the normalized value 1, with the remaining values mapped linearly to values Ci between 0 and 1. Secondly, each Ci is displayed on the monitor using the mapped color. In order to make full use of the dynamic range of the color space, an appropriate base spectrum value should be chosen: values less than the base are clipped to the base, and values greater than the base are normalized linearly. If the color value matrix is expressed as C = {c(m, k)}, then the mapping from |X(m, k)| to c(m, k) is illustrated mathematically below:
c(m, k) = (B(m, k) − Base) / (max_{(m,k)} B(m, k) − Base)

where:

B(m, k) = |X(m, k)| if |X(m, k)| > Base, and B(m, k) = Base if |X(m, k)| ≤ Base.
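A minimal Matlab sketch of this clipping and normalization, assuming a matrix A already holds |X(m, k)| for every frame and frequency bin and an illustrative base value, is:

% minimal sketch: clipping at the base value and normalising to [0, 1]
Base = -10;                            % assumed base value
B = max(A, Base);                      % values below the base are limited to the base
C = (B - Base)./(max(B(:)) - Base);    % c(m,k) normalised between 0 and 1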
5.2.3 Broad-band spectrum and Narrow-band spectrum
According to the Discrete Fourier Transform (DFT) analysis principle, the frequency resolution of the spectrum refers to the interval between the discrete frequencies, that is, the frequency interval (f0) represented by each step of the variable 'k' in the expression X(m, k). Its value depends on the frame length N and the sampling frequency fs of the signal; f0, fs and N satisfy the relationship below:
f0 = fs /N
As the formula suggests, the frequency interval (f0) has nothing to do with the frequencies the signal contains. As long as the sampling frequency is constant, increasing the frame length (N) results in a higher spectral resolution, that is, a smaller bandwidth represented by each 'k' in the expression X(m, k); in that case the spectrum tends to be a narrow-band one, otherwise it will be a broad-band spectrum.
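For illustration, assuming a sampling frequency of fs = 8000 Hz, a frame length of N = 1024 gives f0 = 8000/1024 ≈ 7.8 Hz (a broad-band spectrum), whereas N = 2^16 = 65536 gives f0 = 8000/65536 ≈ 0.12 Hz (a narrow-band spectrum); the sampling rate here is only an assumed figure for the example.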
Increasing the resolution in the frequency domain by using a larger value of N results in a lower resolution in the time domain of the spectrum. The way to resolve this contradiction is to introduce overlapping sub-frames with a frame shift N1 (N1 < N) while choosing a larger but appropriate frame length N; in this way a spectrum with balanced resolution in the frequency and time domains is obtained. The sub-frame shift can be illustrated as:
π‘₯π‘š (𝑛) = π‘₯(𝑛 + 𝑁1 ∗ π‘š),
𝑛 = 0, 1, … , 𝑁 − 1, 𝑁1 < 𝑁
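A minimal Matlab sketch of this framing with a frame shift, assuming the sampled signal is in a vector x and illustrative values of N and N1, is:

% minimal sketch: extracting overlapping sub-frames with frame shift N1
N  = 1024;  N1 = 256;                    % assumed frame length and frame shift
frameNum = floor((length(x) - N)/N1) + 1;
for m = 0:frameNum - 1
    xm = x(m*N1 + 1 : m*N1 + N);         % x_m(n) = x(n + N1*m), n = 0..N-1
    % ... window and transform each sub-frame here
end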
5.2.4 Pseudo-color mapping and display of the spectrum
The pseudo-color mapping function colormap(MAP) is built into Matlab. The parameter 'MAP' is a matrix, typically 64 rows by 3 columns, used for pseudo-color mapping, where the columns represent the intensity of red, green and blue respectively. For instance, MAP = [0 0 0] gives a pure black mapping, MAP = [1 1 1] a pure white mapping and MAP = [1 0 0] a pure red mapping. The parameter 'MAP' can also be one of the colormaps registered in Matlab: MAP = gray gives a linear grayscale mapping, MAP = hot increases the color progressively from black through red and yellow to white, and MAP = copper gives a linear bronze mapping.
The function imagesc(t, f, C), also built into Matlab, is used to display the spectrum, where parameter 't' is the time coordinate, 'f' is the frequency coordinate and 'C' contains the color values to which the frequency amplitudes have been mapped. If the whole audio signal has been divided into M frames, parameter 't' is an M-dimensional row vector whose values are the starting times of the frames. As the useful sample points in the frequency domain amount to only half of the frame length (N/2), parameter 'f' is an N/2-dimensional row vector whose values are the corresponding frequencies. Accordingly, parameter 'C' is an (N/2) by M matrix, with one row per frequency and one column per frame.
5.2.5 Implementation within Matlab
5.2.5.1 function specgram(FileName, Winsiz, Shift, Base, Coltype)
FileName: indicates the file that contains the audio signal to be processed. The audio file is pre-recorded and saved in 'wav' format; the sample values are read into the matrix 'Signal', which corresponds to the expression x(n) introduced before, and the sampling frequency is assigned to fs as mentioned in the above section.
Winsiz: defines the length of the frame; normally a power of 2 is chosen, for example 1024 as the default value, to simplify the FFT calculation. A broad-band or narrow-band spectrum can be obtained by choosing a suitable value of the parameter 'Winsiz'.
Shift: the frame shift value (N1). Generally, N1 is less than or equal to 'Winsiz'; the smaller N1 is, the higher the resolution in the time domain.
Base: sets the spectrum base value. This value depends on practical experience; there is no fixed value, and an appropriate one is chosen based on the visual and resolution effect of the spectra obtained with different base values in the experiments.
Coltype: represents the pseudo-color used for display. By default, the function uses 'hot' as the pseudo-color mapping. Other possible values are cool, hsv, bone, prism, jet, copper, etc.
5.2.5.2 Display of the pseudo-color mapping graph
Figure 18 the spectrum of the audio signal
where Winsiz = 1024, Shift = 256, Base = −10, Coltype = 'hot'. This spectrum maps the frequency amplitudes from maximum to minimum to pseudo-colors from brightest to darkest accordingly, and displays only the frequency band from 0 to 1000 Hz.
Figure 19 the spectrum of the audio signal
where Winsiz = 2^16 (about 3 sec.), Shift = 2^15 (half the window size), Base = −50, Coltype = 'jet'. As can be seen clearly, a larger frame length results in a much higher resolution in the frequency domain but a lower resolution in the time domain.
5.3 Feature Extraction
After the pre-processing step, the signal is visible through the pseudo-color mapped spectrum and is ready for feature extraction. The main process in this step is End-point Detection. As the major energy lies in a certain low frequency band, below 110 Hz, as can be seen clearly from the red color in the above spectrum graph, the signal has to pass through another filter that removes all frequencies above 110 Hz; the features can then be extracted from that particular frequency band. The flowchart for this stage is displayed below:
Figure 20 Feature Extraction flowchart
5.3.1 Introduction to End-point Detection
The aim of End-point Detection (EPD) is to find the starting and ending points of the part of the digital signal that is meaningful for the signal processing. There are typically two categories of end-point detection, according to the characteristic parameters used by the methods.
Features in the time domain: volume and Zero Crossing Rate (ZCR)
a. Volume: the simplest way to detect the end points, but minor airflow noise will lead to misjudgement.
b. Volume and ZCR: the ZCR helps to get rid of minor airflow noise when volume is used as the EPD feature; the combination of the two characteristics can handle most of the detection with much higher precision.
Features in the frequency domain: spectral variance and entropy
a. Spectral variance: the effective part of the signal has a regular spectral variation and thus a smaller variance, which can be used as the criterion for EPD.
b. Spectral entropy: the entropy of the digital signal is also much lower between the end points, which contributes to the detection.
5.3.2 End-point Detection Error
End-point detection is not always successful, and there are two types of errors.
a. False Rejection: mistaking the meaningful part of the signal for noise or silence, which decreases the detection rate.
b. False Acceptance: mistaking the silence or noise part of the signal for the useful part, which also brings down the detection rate.
To avoid or reduce these detection errors, the features of the silence or noise part of the signal can be used as a reference when designing the detector; this is again a fairly experience-dependent task.
5.3.3 The Zero Crossing Rate (ZCR)
The ZCR is the number of zero-crossing points within one frame. In general, the zero-crossing rate of the voiced sound is somewhat larger than that of silence (given that the noise has already been removed), and it is therefore used to detect the starting and ending points in this project.
When calculating the zero-crossing rate, the importance of samples with exactly zero value should not be ignored. A zero-valued point may be the starting or ending point of a meaningful frame, or simply an ordinary value within a frame, and making good use of the zero values always contributes greatly to end-point detection.
Because the sample values retrieved from the digital audio file have been normalized, they should be un-normalized by multiplying by the bit resolution to recover the original integer values; this avoids artificially increasing the ZCR when floating-point numbers are used to calculate the bias.
Figure 21 Zero Crossing Rate
Method one does not count zero values as zero crossings while method two does. There is not much difference between them in this project: the graph generated by method one is entirely covered by that of method two because, as clearly shown in the first plot of the above figure, the variation of the sample values is very large.
5.3.4 High-order Difference
In an arithmetic progression a1, a2, …, an, …, where a_{n+1} = a_n + d, n = 1, 2, …, the so-called common difference 'd' is a constant. For a general series expressed as y = y(t),

Δy(t) = y(t + 1) − y(t)

where Δy(t) is called the first-order difference of y(t) at point t; thus

Δy_t = y_{t+1} − y_t

is defined as the first-order difference of y(t), where Δ is the Difference Operator (Ashyralyev and Sobolevskii, 2004).
Combining the zero crossing rate with the High-order Difference (HOD) operation enables end-point detection to achieve a very high precision level. Of course, the order should be tuned in practical experiments to decide whether the first, second or even third order gives the best performance.
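As an illustration, the difference operation can be combined with a per-frame measure in Matlab roughly as below; the vector zcr of frame-wise zero crossing rates and the threshold value are assumptions for the sketch:

% minimal sketch: differencing the per-frame ZCR to locate end-point candidates
d1 = [diff(zcr) 0];               % first-order difference
d2 = [diff(d1)  0];               % second-order difference, if a higher order is needed
threshold = 0.06;                 % assumed threshold, tuned by experiment
candidate = abs(d1) >= threshold; % frames where the ZCR changes sharply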
Figure 22 End-point Detection using the ZCR and HOD
It is obvious from the above figure that the method described above performs well in End-point Detection, after filtering out the high-frequency noise (above 1200 Hz) and keeping the band that contains a large amount of energy at very low frequencies (below 110 Hz).
5.4 Back-propagation Neural Network Algorithm and Implementation
5.4.1 Design of the artificial neural network
The number of input neurons depends on the dimension of the features extracted after the pre-processing stage, and the number of output neurons is two, representing mouth breathing and nasal breathing respectively. The number of hidden-layer neurons is typically twice the number of input neurons, but it remains adjustable in practice to achieve the best performance. The neural network can then be designed as below:
Figure 23 Design of the two-layer back-propagation artificial neural network
5.4.2 Back-propagation neural network implementation
5.4.2.1 Initialization of the network
As designed above, the single hidden layer is initialized, and the connection weights of the hidden layer and the output layer are assigned random numbers between 0 and 1.
5.4.2.2 Training samples
The feature values extracted in the previous stage are normalized to form a vector of dimension n,

x = (x_1, x_2, …, x_n)

and the two types of target output are t1 = (1, 0), representing mouth breathing, and t2 = (0, 1) for nasal breathing, so each training sample takes one of the forms:

I1 = (x_1, x_2, …, x_n; 1, 0), or
I2 = (x_1, x_2, …, x_n; 0, 1)
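A minimal Matlab sketch of building one such training sample, assuming a raw feature vector feat of one breathing cycle is already available (the names are illustrative only), is:

% minimal sketch: building one normalised training sample with its target
x  = (feat - min(feat))./(max(feat) - min(feat));  % normalise the features to [0, 1]
t1 = [1 0];                                        % target for mouth breathing
t2 = [0 1];                                        % target for nasal breathing
I1 = [x(:)' t1];                                   % one mouth-breathing training sample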
5.4.2.3 Calculate the actual output of the network
The non-linear function

y_j = [1 + exp(−∑_i ω_ij · x_i)]^(−1)

is used to calculate the output of each node layer by layer, excluding the input layer, finally giving the output

O = (o_1, o_2, …, o_m)
5.4.2.4 Adjust the weights
The weights are adjusted from the output layer back to the hidden layer using the following equation:

w_ij(N + 1) = w_ij(N) − η δ_j o_i

where o_i is the output of node i in the upper layer.
If j is a node in the output layer, then
δ_j = o_j(1 − o_j) · (o_j − t_j)
if j is a node in the hidden layer, then
δ_j = o_j(1 − o_j) ∑_k δ_k · ω_jk
where k ranges over all the nodes in the lower layer of the layer in which node j is located.
The features of each breathing cycle in the sound file are extracted right after the end-point detection process, and the training data are stored in a Matlab format, ready to be fed through the neural network for training. The flowchart for the Back-propagation Neural Network training algorithm is shown below:
Figure 24 BP Neural Network training algorithm flowchart
The non-linear 'S'-shaped function used in the BP Neural Network algorithm is:
f(x) = 1 / (1 + e^(−x))
The input to the input layer is:
X_p = {x_p1, x_p2, …, x_pi, …, x_pN}
The output of the input layer is:

O_p = {o_p1, o_p2, …, o_pi, …, o_pN}
Applying the 'S'-shaped function gives:

O_p = f(X_p)
The input to the hidden layer is:
net_pj = ∑_{i=1}^{N} W_ji O_pi − θ_j,  j = 1, 2, …, M
The output of the hidden layer is:
O_pj = f(net_pj),  j = 1, 2, …, M
The input to the output layer is:
net_pk = ∑_{j=1}^{M} W_kj O_pj − θ_k,  k = 1, 2, …, L
The output of the output layer is:
π‘‘π‘π‘˜ = π‘‚π‘π‘˜ = 𝑓(π‘›π‘’π‘‘π‘π‘˜ ),
π‘˜ = 1, 2, … 𝐿
The average square error function is defined as:
E_p = (1/2) ∑_{k=1}^{L} (y_pk − d_pk)²
Calculating E_p for each training sample, the total error is:

E = ∑_{p=1}^{P} E_p
The weights for the hidden and output layers are adjusted using the gradient descent method, and the thresholds are also amended in every loop. The adjustment values for the weights and threshold between the output layer and the hidden layer are:
ΔW_kj^p = η (y_pk − d_pk) · f′(net_pk) · O_pj = η σ_pk O_pj

Δθ_k^p = η σ_pk
The adjustment values for the weights and threshold between the hidden layer and the input layer are:

ΔW_ji^p = η O_pi · f′(net_pj) · ∑_{k=1}^{L} σ_pk W_kj = η σ_pj O_pi

Δθ_j^p = η σ_pj
where:

σ_pk = (y_pk − O_pk) O_pk (1 − O_pk)

σ_pj = O_pj (1 − O_pj) ∑_{k=1}^{L} σ_pk W_kj
6. Evaluation
6.1 Interface and Controls
The final interface is a simple one, but it has complex calculations behind it, as shown in the graph below:
Figure 25 The main interface
There are four main parts in this interface: the upper left part is the control part, the upper right part shows the results, the lower left holds the spectrum graph, and the fourth part, which draws the signal wave, is located at the lower right.
Figure 26 The Control part of the interface
In the control part, the 'Choose File' button allows the user to choose an audio file in 'wav' format in which to detect the breathing pattern and to show its spectrum and signal wave in the lower graph areas. The 'Detect Breath' button to the right has several functions integrated: first it detects the end points in the signal file, then it extracts the features of the signal in the frequency band below 110 Hz, and finally it passes the features to the Back-propagation Neural Network to detect the breathing patterns; the BP Neural Network has already been trained before the detection.
Figure 27 One sound file opened by the program.
As displayed in the figure above, after a sound file is chosen, the spectrum and the waveform plot are computed and shown in the lower part of the interface. Before the graphs are displayed, the pre-processing, including framing, windowing and filtering, is done in the background, so the complex calculation takes a little while; a small symbol next to the file name label gives the user a clue that this is in progress.
Figure 28 Prompt that informs the user the file is being processed
Figure 29 Displaying spectrum
The lower left part is the spectrum display area, which shows the pseudo-colour mapped spectrum graph. The slider at the bottom moves along the time axis, allowing the user to jump to a specific time to hear that part of the sound file; it also slides along while the sound is playing, to give the user a visual indication of how far playback has progressed. The slider to the right adjusts the frequency range on the y-coordinate, which effectively lets the user zoom the spectrum in or out to get either an overview or a more detailed view, over a range from the sampling frequency down to as low as 20 Hz.
Figure 30 Result displaying text box
The text box shown above is used to display results. The detection results are displayed in the box; before the 'Detect Breath' button has been pressed, it gives the user a hint to push the button in order to have the result displayed there.
Figure 31 Detection result showing hint
Because the detection procedure is complicated and a large amount of data has to be processed, it takes a while before the result appears in the box. In order not to give the user the feeling that the program has frozen, a dynamic text prompt is shown at the upper right side of the box to indicate that the data is still being processed before the result comes out.
Figure 32 Original signal wave display
The plotting area at the lower right corner shows the signal wave after pre-processing. This area is also used to display the End-point Detection result; the small 'Play' button allows the user to control playback of the sound file, with the option to pause or stop while playing.
6.2 End-point Detection Evaluation
The end-point detection process happens after the ‘Detect Breath’ button has been pushed
down.
Figure 33 Mouth sound End-point Detection
As the result above shows, the end-point detection function works well for the mouth-breathing-only signal. The blue line indicates the starting point where one breath begins and the dark red line marks the ending point where that breath finishes.
Figure 34 Nasal sound End-point Detection
The above figure shows the end-point detection result for the nasal-breathing-only sound; the function also performs very well for this breathing pattern.
Figure 35 Mixed breath pattern sound End-point Detection
As shown clearly above, the end-point detection result is not as good as the previous ones with a single breathing pattern: for the mixed-pattern sound, the end-point detection function only detects a little more than half of the breath cycles. The reason is that when a person breathes through the mouth for a while and then changes to nasal breathing, the breath sound is usually weaker than for a single breathing pattern. If the threshold for end-point detection is too small, it detects more non-end-points than desired; on the contrary, if the threshold is larger, many true end-points are not detected. This conflict means that adjusting the threshold to fit every situation remains a good topic for future work.
6.3 Breath Pattern Detection Evaluation
The breath pattern detection is right after the end-point detection which followed by the
feature extraction. The detection result will show in the text box as illustrated before.
Figure 36 Nasal breath only breathing pattern detection
As the result above shows, the breath pattern detection works very well for the nasal-breathing-only pattern: only one mistake occurs, in the second breath cycle. The classification rate is about 90% for this detection.
Figure 37 Breath pattern detection for mixed breathing
As with the end-point detection process, the result for the mixed breathing pattern does not reach the same performance as for a single breathing pattern. The user has to judge the result themselves, as they have the chance to hear the breath sound clearly enough to tell whether it is mouth breathing or not.
7. Conclusion
It is well known that changing the breathing pattern from mouth to nose has a vital impact on patients with respiratory disease and even on healthy people. The proposed concept relates to a new training and monitoring device that will monitor and advise end-users of their breathing status in relation to their nose-breathing versus mouth-breathing activity, and also deliver instructions on the action required, i.e. reversion to proper breathing. The initial step in the delivery of such a system is to investigate whether the information needed to discriminate nasal from mouth breathing can be obtained from acoustic sensors placed at various positions on the body. The experimental results show that the difference between nasal and mouth breathing can be discriminated successfully with high enough accuracy; a proper application was therefore programmed, so that the code could later be integrated into a device to give appropriate feedback to end-users.
7.1 Future Work
The program does not perform at the same level over the whole procedure. For end-point detection, single-pattern breathing, both mouth and nasal, is detected with 100% accuracy; however, the mixed pattern, breathing through the mouth first and then changing to the nose for a while, does not reach that accuracy. The reason is that when changing the breathing pattern people usually vary the strength and rate of breathing, that is, they breathe harder or quicker. As end-point detection is the prerequisite of classifying the breathing pattern, the most important future work lies in improving the end-point detection algorithm so that it performs better in various situations.
References
Brandt, A., Tuma, J., Lago, T. and Ahlin, K. (2005) Toolboxes for analysis of sound and vibration signals within Matlab. Axiom EduTech AB, Technical Univ. of Ostrava, Blekinge Institute of Technology, p. 1.
Ashyralyev, A. and Sobolevskii, P. I. (2004) New difference schemes for partial differential equations. Birkhäuser, pp. 1-3.
Baird, T., Neuman, M. (1992) A Thin Film Temperature Sensor For Measuring Nasal And Oral
Breathing In Neonates, Engineering in Medicine and Biology Society, 1992. Vol.14. Proceedings of
the Annual International Conference of the IEEE, Volume 6, Issue , 29 Oct-1 Nov 1992 Page(s):2511
– 2512
Brown L., Prasad N. (1997) Effect of vital signs on advanced life support interventions for prehospital patients. Prehosp.Emerg.Care 1997 Jul-Sep; 1(3):145-8.
Breakell A, Townsend-Rose C. (2001) Clinical evaluation of the Respi-check mask: a new oxygen
mask incorporating a breathing indicator. Emerg Med J 2001 Sep;18(5):366-9.
Chiarugi F, Sakkalis V, Emmanouilidou D, Krontiris T, Varanini M, Tollis IG. Adaptive Threshold
QRS Detector with Best Channel Selection Based on a Noise Rating System. Computers in
Cardiology 2007;34: 157-160.
Cooley, J. W. and Tukey, J. W. (1965) An algorithm for the machine computation of complex Fourier series. Mathematics of Computation, 19: 296-302.
Part-Enander, E. and Sjoberg, A. (1999) The MATLAB Handbook. Harlow: Addison-Wesley.
Forgacs, P., Nathoo, A. and Richardson, H. (1971) Breath sounds. Thorax, 26: 288-295.
Gravelyn T., Weg J. (1980). Respiratory rate as an indication of acute respiratory dysfunction. JAMA
1980 Sep;244(10):1123-5.
Gavriely, N., Nissan, M., Rubin, A., and Cugell, D. (1995). Spectral characteristics of chest wall
breath sound in normal subjects. Thorax 50:1292–1300.
66
Jedruszek, J. (2003) Speech recognition. Alcatel Telecommunications Review, 2003: 128-135.
Mahagna, M., and Gavriely, N. (1994). Repeatability of measurements of normal lung sounds. Am. J.
Respir. Crit. Care Med. 149:477–481.
Malmberg, L., Sovijärvi, A., Paajanen, E., Piirilä, P., Haahtela, T. and Katila, T. (1994) Changes in
Frequency Spectra of Breath Sounds During Histamine Challenge Test in Adult Asthmatics and
Healthy Control Subjects, Chest, Vol. 105, No. 1, pp: 122-133
McCree, A. V. and Barnwell, T. P. (1995) A mixed excitation LPC vocoder model for low bit rate speech coding. IEEE Transactions on Speech and Audio Processing, 3: 242-250.
Molla, M. K. I. and Hirose, K. (2004) On the effectiveness of MFCCs and their statistical distribution properties in speaker identification. 2004 IEEE Symposium on Virtual Environments, Human-Computer Interfaces and Measurement Systems (VECIMS), pp. 136-141.
Pasterkamp H, Consunji-Araneta R, Oh Y, and Holbrow J. (1997) Chest surface mapping of lung
sounds during methacholine challenge. Pediatr Pulmonol 23: 21–30, 1997.
Roger Jang. Audio Signal Processing and Recognition.
http://neural.cs.nthu.edu.tw/jang/books/audioSignalProcessing/filterApplication.asp?title=111FilterApplications
Sun, Java SE Desktop Technologies
http://java.sun.com/javase/technologies/desktop/media/jmf/
Walker Shonda Lachelle. Wavelet-based feature extraction for robust speech recognition;
[dissertation]. The Florida State University, 2004, 50-58
Wikipedia, Requirements analysis http://en.wikipedia.org/wiki/Requirements_analysis
Wikipedia, MATLAB http://en.wikipedia.org/wiki/Matlab
Appendix A
%% design of low pass filter
% ============================================================
fs = 8000;                       % Sampling rate
filterOrder = 5;                 % Order of filter
cutOff = 1200;                   % Cutoff frequency
[b, a] = butter(filterOrder, cutOff/(fs/2), 'low');
% === Plot frequency response
[h, w] = freqz(b, a);
plot(w/pi*fs/2, abs(h), '-');
title('Magnitude frequency response');
xlabel('Frequency (Hz)');
ylabel('Magnitude');
legend('low pass filter');
grid on

%% design of Bandpass filter
% ============================================================
fs = 3000;                       % Sampling rate
filterOrder = 8;                 % Order of filter
cutOff = [110 800];              % Cutoff frequencies
[b, a] = butter(filterOrder, cutOff/(fs/2), 'bandpass');
% === Plot frequency response
[h, w] = freqz(b, a);
plot(w/pi*fs/2, abs(h), '-');
title('Magnitude frequency response');
set(gca, 'XTick', (0:300:1500)); % set the X ticks
xlabel('Frequency (Hz)');
ylabel('Magnitude');
legend('bandpass filter');
grid on
Appendix B
%% compare several low pass filters in one graph
% ============================================================
fs = 8000;                       % Sampling rate
cutOff = 1000;                   % Cutoff frequency
allH = [];
for filterOrder = 1:8
    [b, a] = butter(filterOrder, cutOff/(fs/2), 'low');
    % === Compute frequency response
    [h, w] = freqz(b, a);
    allH = [allH, h];
end
plot(w/pi*fs/2, abs(allH));
title('Frequency response of a low-pass Butterworth filter');
xlabel('Frequency (Hz)');
ylabel('Magnitude');
legend('order=1', 'order=2', 'order=3', 'order=4', 'order=5', ...
       'order=6', 'order=7', 'order=8');

%% apply filter to signal
% ============================================================
cutOff = 1200;                   % Cutoff frequency
filterOrder = 5;                 % Order of filter
[x, fs, nbits] = wavread('sample.wav');
[b, a] = butter(filterOrder, cutOff/(fs/2), 'low');
x = x(:,2);                      % 30-second signal, second channel
y = filter(b, a, x);
% ====== Plot the result
time = (1:length(x))/fs;
subplot(2,1,1);
plot(time, x);
xlabel('Time (sec.)');
ylabel('Energy');
legend('original signal');
grid on
subplot(2,1,2);
plot(time, y, 'k');
xlabel('Time (sec.)');
ylabel('Energy');
legend('filtered signal');
grid on
Appendix C
%% pseudo-color mapping
% ============================================================
[signal, fs, nbits] = wavread('both');
signal = signal(:,2);
lengthSignal = length(signal);
winSize = 2^(nextpow2(fs*3) - 1);       % window size
shift = winSize/2;                      % frame shift
base = -50;                             % base value
colorType = 0;                          % color type
frameNum = floor((lengthSignal - winSize)/shift) + 1;
A = zeros(winSize/2 + 1, frameNum);
for i = 1:frameNum
    n1 = (i - 1)*shift + 1;             % start point of one frame
    n2 = n1 + (winSize - 1);            % end point of one frame
    frame = signal(n1:n2);              % one frame
    frame = frame.*hamming(winSize);    % windowing the frame
    y = fft(frame);
    y = y(1:winSize/2 + 1);
    y = y.*conj(y);
    y = 10*log10(y);                    % get the amplitude of frequency in dB
    A(:,i) = y;
end
B1 = (A > base);
B0 = (A < base);
B = A.*B1 + base*B0;                    % clip values below the base
C = (B - base)./(max(max(B)) - base);   % normalise to [0, 1]
y = (0:winSize/2)*fs/winSize;           % frequency axis
x = (0:frameNum - 1)*shift/fs;          % time axis
if colorType == 1
    colormap(hot);
else
    mycoltype = jet;
    mycoltype = mycoltype(64:-1:1,:);   % reverse the color map from bottom to top
    colormap(mycoltype);
end
imagesc(x, y, C);
axis xy;
colorbar;
colorbar('YTick', 0:0.2:1);
title('Spectrum Analysis');
ylim([0 1000]);
xlabel('Time (sec.)');
ylabel('Frequency (Hz)');
set(gca, 'YTick', (0:200:1000));            % set the Y ticks on the Y coordinate
set(gca, 'XTick', (0:3:lengthSignal/fs));   % set the X ticks on the X coordinate
Appendix D
%% zero crossing rate
% ============================================================
clc; clear;
fileName = 'mouth';
frameSize = 2^11;
overlap = 0;
cutOff = 110;                    % Cutoff frequency
filterOrder = 5;                 % Order of filter
[y, fs, nbits] = wavread(fileName);
[b, a] = butter(filterOrder, cutOff/(fs/2), 'high');   % design the filter
y = y(:,2);
y = filter(b, a, y);             % filter out the frequencies below 110 Hz
y = y*(2^(nbits - 1));           % un-normalize the signal sample values
frameSeg = buffer(y, frameSize, overlap);
frameNumber = size(frameSeg, 2);
for i = 1:frameNumber
    frameSeg(:,i) = frameSeg(:,i) - round(mean(frameSeg(:,i)));  % zero justification
end
zcrOne = sum(frameSeg(1:end-1, :).*frameSeg(2:end, :) < 0);   % Method one
zcrTwo = sum(frameSeg(1:end-1, :).*frameSeg(2:end, :) <= 0);  % Method two
time = (1:length(y))/fs;
frameNumber = size(frameSeg, 2);
frameTime = ((0:frameNumber-1)*(frameSize-overlap) + 0.5*frameSize)/fs;
subplot(2,1,1);
plot(time, y);
title('Signal Wave');
xlabel('Time (sec.)');
ylabel('Volume');
subplot(2,1,2);
plot(frameTime, zcrOne, '-', frameTime, zcrTwo, '-');
title('Zero Crossing Rate');
xlabel('Time (sec.)');
ylabel('Volume');
legend('Method one', 'Method two');
Appendix E
%% End-point Detection
% ============================================================
clc; clear;
fileName = 'both';
cutRange = [110 1200];           % Cutoff frequencies
filterOrder = 5;                 % Order of filter
[y, fs, nbits] = wavread(fileName);
[b, a] = butter(filterOrder, cutRange/(fs/2), 'bandpass');  % bandpass filter to detect the breathing points
y = y(:,2);
signal = filter(b, a, y);        % keep the frequencies between 110 and 1200 Hz
winSize = 1024*4;                % window size
shift = winSize/2;               % frame shift
base = -10;
frameNum = floor((length(signal) - winSize)/shift) + 1;
A = zeros(winSize/2 + 1, frameNum);
for i = 1:frameNum
    n1 = (i - 1)*shift + 1;
    n2 = n1 + (winSize - 1);
    frame = signal(n1:n2);
    frame = frame.*hamming(winSize);
    y = fft(frame);
    y = y(1:winSize/2 + 1);
    y = y.*conj(y);
    A(:,i) = y;
end
% apply the base value
L1 = (A > base);
L0 = (A <= base);
B = A.*L1 + base*L0;
D = (B - base)./(max(max(B)) - base);   % normalise the values
% y = (0:winSize/2)*fs/winSize;
time = (0:frameNum - 1)*shift/fs;
energy = abs(sum(D));
energy = [diff(energy) 0];              % first-order difference
threshold = 0.06;
dd1 = abs(energy) < threshold;
dd2 = abs(energy) >= threshold;
energy = 0.*dd1 + energy.*dd2;
endPoint = zeros(size(energy));
for i = 2:length(energy)-2              % from beginning to end, find the points
    sh = energy(i:i+2);                 % one small frame
    if isequal(sh, [0,0,0])
        if energy(i-1) ~= 0
            endPoint(i) = 2;            % identified one end-point
        end
    end
end
for i = length(energy)-1:-1:3           % from end to beginning, find the points
    sh = energy(i-2:i);                 % one small frame
    if isequal(sh, [0,0,0])
        if energy(i+1) ~= 0
            endPoint(i) = 2;            % identified one end-point
        end
    end
end
for i = 1:length(endPoint)              % delete points whose breathing duration is less than 0.5 second
    if endPoint(i) ~= 0
        for j = i+1:length(endPoint)    % find the nearest next end-point
            if endPoint(j) ~= 0
                break;
            end
        end
        if energy(i+1) ~= 0 && time(j) - time(i) < 0.5
            endPoint(i) = 0;            % delete the wrong end-point
            endPoint(j) = 0;            % delete the wrong end-point
        end
    end
end
x = (1:length(signal))/fs;
subplot(211);
plot(x, signal);
title('End-point Detection');
set(gca, 'XTick', (0:1:max(time)));     % set the X ticks on the X coordinate
ylim([-0.05 0.1]);
xlabel('Time (sec)');
ylabel('Volume (Energy)');
hold on;
grid on;
flag = 0;                               % to plot lines in different colors
for i = 1:length(endPoint)              % plot a line at each detected point
    if endPoint(i) ~= 0
        if flag == 0
            plot([time(i) time(i)], [min(signal) max(signal)], 'k');
            flag = 1;
        else
            plot([time(i) time(i)], [min(signal) max(signal)], 'r');
            flag = 0;
        end
    end
end
legend('signal wave', 'breath start point', 'breath end point');
subplot(212);
plot(time, energy);
set(gca, 'XTick', (0:1:max(time)));     % set the X ticks on the X coordinate
ylim([-2 3]);
xlabel('Time (sec)');
ylabel('Volume (Energy)');
hold on;
grid on;
flag = 0;                               % to plot lines in different colors
for i = 1:length(endPoint)              % plot a line at each detected point
    if endPoint(i) ~= 0
        if flag == 0
            plot([time(i) time(i)], [min(energy)*0.7 max(energy)*0.7], 'k');
            flag = 1;
        else
            plot([time(i) time(i)], [min(energy)*0.7 max(energy)*0.7], 'r');
            flag = 0;
        end
    end
end
legend('signal wave', 'breath start point', 'breath end point');
Appendix F
%% train BP Neural Network
% ============================================================
% ta, tb: feature matrices (one column per training sample) for mouth
% and nasal breathing respectively, loaded into the workspace beforehand
w_ji = rand(20,10);     % ten nodes in the input layer, twenty nodes in the hidden layer
w_kj = rand(2,20);      % two nodes in the output layer
theta_j = rand(20,1);   % initial random thresholds for the hidden layer
theta_k = rand(2,1);    % initial random thresholds for the output layer
train_num = 10000;      % maximum number of training iterations
train_file = 100;       % number of training samples
yita = 0.1;             % learning rate
precise = 0.01;
% train with the samples
num = 1;
while num < train_num
    file_num = 1;       % index of the training sample
    e = 0;              % initialise the error to 0
    while file_num <= train_file
        for t = 1:2
            switch t
                case 1, x = ta; y = [1 0]';
                case 2, x = tb; y = [0 1]';
            end
            % reading the data
            x = x(:,file_num);
            minx = min(x);
            maxx = max(x);
            for i = 1:10
                % normalising the data
                x(i) = (x(i) - minx)/(maxx - minx);
            end
            % feedforward algorithm
            o_i = 1./(1 + exp(-x));        % output of the input layer  (10*1)
            x_j = w_ji*o_i - theta_j;      % input to the hidden layer  (20*1)
            o_j = 1./(1 + exp(-x_j));      % output of the hidden layer (20*1)
            x_k = w_kj*o_j - theta_k;      % input to the output layer  (2*1)
            o_k = 1./(1 + exp(-x_k));      % output of the output layer (2*1)
            % back-propagation algorithm
            delta_k = (y - o_k).*o_k.*(1 - o_k);        % error of the output layer (2*1)
            delta_wkj = yita*delta_k*o_j';              % weight adjustment for w_kj
            delta_thetak = yita*delta_k;                % threshold adjustment for the output layer
            delta_j = o_j.*(1 - o_j).*(w_kj'*delta_k);  % error of the hidden layer
            delta_wji = yita*delta_j*o_i';              % weight adjustment for w_ji
            delta_thetaj = yita*delta_j;                % threshold adjustment for the hidden layer
            w_ji = delta_wji + w_ji;
            w_kj = delta_wkj + w_kj;
            theta_k = delta_thetak + theta_k;
            theta_j = delta_thetaj + theta_j;
            e = 0.5*sum((y - o_k).^2);
            % ex(:,t) = e;  % for plotting
            if e < precise
                num = train_num;
            end
        end
        file_num = file_num + 1;
    end
    num = num + 1;
end
% plot(exx,'.');

%% recognise the testing data
% ============================================================
% testa: feature matrix of the testing samples, one column per sample
file_num = 1;
recog_num = 0;
total_num = 30;
while file_num <= total_num
    x = testa(:,file_num);
    minx = min(x);
    maxx = max(x);
    for i = 1:10
        % normalising the data
        x(i) = (x(i) - minx)/(maxx - minx);
    end
    o_i = 1./(1 + exp(-x));      % output of the input layer  (10*1)
    x_j = w_ji*o_i - theta_j;    % input to the hidden layer  (20*1)
    o_j = 1./(1 + exp(-x_j));    % output of the hidden layer (20*1)
    x_k = w_kj*o_j - theta_k;    % input to the output layer  (2*1)
    o_k = 1./(1 + exp(-x_k));    % output of the output layer (2*1)
    [y, n] = max(o_k);
    if n == 1                    % recognised successfully
        recog_num = recog_num + 1;
    end
    file_num = file_num + 1;
end
rate = recog_num/total_num*100;  % detection rate