PART A – Confirmation of Cooperation*
I declare that this thesis entitled “Computer-Based Malay Stuttering Assessment
System” is the result of my own research except as cited in the references. The thesis
has not been accepted for any degree and is not concurrently submitted in
candidature of any other degree.
Ooi Chia Ai
15 May 2007
To my dearest parents, siblings and friends
for their love and support to make the journey
I would like to take this opportunity to extend my sincere gratitude and
appreciation to many people who made this thesis possible. Special thanks are due to
my supervisor, Associate Professor Dr. Jasmy bin Yunus for his invaluable guidance,
suggestions and full support in all aspects during the research.
Many thanks go to the Faculty of Electrical Engineering of Universiti
Teknologi Malaysia for its full support of this project. Thanks are also due to UTMPTP for the research funding.
I would like to express my sincere thanks to Hospital Sultanah Aminah and
to the primary schools in the Skudai district for their assistance in conducting
the numerous experiments during the research.
Finally, special thanks to all my friends for their unwavering support,
concerns and encouragement.
Stuttering has attracted extensive research interest over the past decades.
However, far less effort has been devoted to computer-based stuttering assessment
systems. Stutterers respond in unique ways to different therapy techniques. A
technique that works dramatically for one stutterer does not necessarily work
dramatically, or at all, for other stutterers. Stuttering is so variable and so highly
individualized that, few would disagree, no one method works for all stutterers.
Normally, 2 to 3 months are required to determine suitable techniques for each client.
The effectiveness of each approach depends on the receptiveness of the client. This
thesis explains the development of a computer-based Malay stuttering assessment
system. The system assists Speech-Language Pathologists (SLPs) in determining
suitable therapy techniques for each client. The software was developed based on
fluency shaping techniques used in fluency rehabilitation regimens. Digital Signal
Processing techniques were implemented to analyze speech signals. The maximum
magnitudes of the clients’ and the SLPs’ speech signals, corresponding to the
Average Magnitude Profiles (AMPs), were determined and compared. The
maximum magnitude was determined by summing a total of 15 neighbouring samples
to obtain a maximum value. Start location, end location, maximum
magnitude and duration were compared between the clients’ and SLPs’ AMPs to
generate scores; these computational analyses help the SLP determine suitable
techniques more quickly. The software was developed using Microsoft Visual
C++ 6.0 to run under Windows XP. Three therapy techniques were introduced in the
proposed computer-based method.
These techniques were Shadowing,
Metronome and Delayed Auditory Feedback. Ten test subjects were selected from six
primary schools in Skudai. Measurements of percent syllables stuttered for the
test subjects were made by an SLP from Hospital Sultanah Aminah. The experimental
results showed that the SLP agreed with the analysis results generated by the software.
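The AMP computation summarized above can be sketched in standard C++ (not the Visual C++ 6.0/MFC code of the actual software). This is a minimal sketch under stated assumptions: the 15 neighbouring samples are taken as non-overlapping blocks, their absolute values are summed, and the start and end locations are taken as the first and last profile frames that exceed a threshold, for example one derived from a measured background noise level. The function and type names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Builds an amplitude profile by summing the absolute values of each block of
// `window` neighbouring samples (15 in this work). Non-overlapping blocks are an
// assumption; a trailing partial block is ignored.
std::vector<double> averageMagnitudeProfile(const std::vector<double>& samples,
                                            std::size_t window = 15) {
    assert(window > 0);
    std::vector<double> profile;
    for (std::size_t start = 0; start + window <= samples.size(); start += window) {
        double sum = 0.0;
        for (std::size_t i = start; i < start + window; ++i)
            sum += std::fabs(samples[i]);
        profile.push_back(sum);
    }
    return profile;
}

// Parameters read from one profile; indices are profile frames, not samples.
struct ProfileSummary {
    bool voiced;            // true if any frame exceeds the threshold
    std::size_t start;      // first frame above the threshold
    std::size_t end;        // last frame above the threshold
    std::size_t peakIndex;  // frame holding the maximum magnitude
    double maxMagnitude;    // largest profile value
    std::size_t duration() const { return voiced ? end - start + 1 : 0; }
};

// `threshold` is assumed to come from a background noise level measurement.
ProfileSummary summarizeProfile(const std::vector<double>& profile, double threshold) {
    ProfileSummary s = {false, 0, 0, 0, 0.0};
    for (std::size_t i = 0; i < profile.size(); ++i) {
        if (profile[i] > s.maxMagnitude) {
            s.maxMagnitude = profile[i];
            s.peakIndex = i;
        }
        if (profile[i] > threshold) {
            if (!s.voiced) {
                s.voiced = true;
                s.start = i;
            }
            s.end = i;
        }
    }
    return s;
}
```

Dividing each block sum by 15 would give a true average magnitude; this only rescales the profile, so comparisons between the client's and the SLP's profiles are unaffected as long as both profiles and the threshold use the same convention.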
Stuttering has attracted the attention of researchers over the past few decades.
However, research on the implementation of computer-based stuttering
assessment systems is very limited. Stuttering patients react differently to
different therapy techniques. A therapy technique that is effective for one
stutterer does not necessarily have the same effect on others. Usually, two to
three months are needed to identify the most suitable therapy technique for a
stuttering patient. The effectiveness of a technique depends on the patient's
level of acceptance. This thesis describes the development and implementation
of a computer-based stuttering assessment system. The assessment system helps
Speech-Language Pathologists (SLPs) identify the most suitable therapy technique
for stuttering patients. The software is designed based on the fluency shaping
techniques used in speech rehabilitation. Digital signal processing techniques
are applied to analyze the speech signals. The maximum magnitudes of the
patients' and SLPs' speech signals, i.e. the "Average Magnitude Profiles" (AMPs),
are determined and compared. The maximum magnitude is computed by summing 15
samples to obtain a maximum value. The start point, end point, maximum
magnitude and duration are compared between the patients' and SLPs' AMPs to
produce scores that help SLPs identify suitable therapy techniques more quickly.
The software was designed using Microsoft Visual C++ 6.0 on the Windows XP
platform. Three computer-based therapy techniques were developed, namely
Shadowing, Metronome and Delayed Auditory Feedback. Ten test subjects were
selected from six schools in Skudai. Measurements of the percentage of stuttered
syllables were made on the test subjects' samples by an SLP from Hospital
Sultanah Aminah. The experimental results showed that the SLP agreed with the
software's analysis results.
Speech Waveforms and Sound Spectrograms of a Male Client Saying “PLoS Biology”
Comparisons between Fluency Shaping and Stuttering
Five Steps to Implementing EBP
System Block Diagram
Flowchart: DC Offset Removal
Common Window Functions
Flowchart: Background Noise Level Detection
Dialog Box: The Loading of Wave Files
Dialog Box: Client Identification
Wave File Information
Speex File Information
Encoding Process
Decoding Process
Flowchart: The Scoring of Start Location
Flowchart: The Scoring of End Location
Flowchart: The Scoring of Maximum Magnitude Comparison
Flowchart: The Scoring of Duration Comparison
Shadowing Task
Metronome Task
DAF Task
Detection of Background Noise Level
End Detection of Background Noise Level
Selection of Five Pre-recorded Wave Files
Selection of Text File
Input of User Name
Input of History File and Its Location
The Enabling of Buttons
The AMP of SLP
The AMP of Client superimposed on SLP's AMP
The File Saving of Recorded Utterances
The File Playing of Both SLP and Subject's Utterances
The Display of Fireworks
The Display of the Information of Attempted Utterances
The Scoring Comparison for Start Location Parameter
The Scoring Comparison for End Location Parameter
The Scoring Comparison for Maximum Magnitude Location
The Scoring Comparison for Maximum Magnitude Location
The Average Score of Each Therapy Technique
Communication Attitude Test-Revised (CAT-R)
A-19 Scale for Children Who Stutter
Stuttering Prediction Instrument for Young Children (SPI)
Physician’s Screening Procedure for Children Who May Stutter
Background and Motivation
Stuttering has attracted extensive research interest over the past decades.
Stuttering research is exploring ways to improve the diagnosis and treatment of
stuttering as well as to identify its causes. Emphasis is being placed on improving
the ability to determine which children will outgrow their stuttering and which
children will stutter the rest of their lives.
Recent research has focused on therapy programs such as the Lidcombe
Program, the Camperdown Program, Prolonged Speech-based Stuttering Treatment and
others. Many classical manual inventories, scales, and procedures have been
developed for assessing the quality of both the surface and the deep structure
of stuttering, such as the Stuttering Severity Instrument (SSI-3), the Modified Erickson Scale
of Communication Attitudes (S-24) and the Perception of Stuttering Inventory (PSI).
However, far less effort has been devoted to computer-based stuttering
assessment systems.
Studies [1] indicated that stuttering therapy was eventually helpful in
reaching the goals of successful management for persons who stutter (PWS). However,
PWS had difficulty in identifying specific therapy techniques to which they could
attribute their success. Findings [2] suggest that a system that can identify suitable
therapy techniques is important, because any rational and empirically informed
procedure that enables PWS to systematically modify speech behaviours and the
associated cognitive features is likely to facilitate fluency successfully.
Numerous therapy techniques exist to treat stuttering, yet there remains a
paucity of empirically motivated stuttering treatment outcomes research. Despite
repeated calls for increased outcome documentation on stuttering treatment, the
stuttering literature remains characterized by primarily “assertion-based” or
“opinion-based” treatments, which by definition are based on unverified treatment
techniques and/or procedures [3].
Conversely, “evidence-based” treatments, based on well-researched and
scientifically validated techniques, remain relatively rare in the field of stuttering and
are usually limited to behavioural and fluency shaping (FS) approaches. The objective
assessment of specific stuttering treatment approaches is also important to identify the
therapy techniques that contribute to desired outcomes. However, identifying the
appropriate therapy techniques manually can be difficult, given the multidimensional
nature of stuttering [4].
In one study [5], 98 children who stutter (CWS), aged 9 to 14
years, were divided into four groups:
1. The first group was treated by speech-language pathologist (SLP) in a speech
2. In the second group, the parents were trained to administer the stuttering
therapy to their children, but the children did not see an SLP.
3. In the third group, the children used speech biofeedback computers designed
for treating stuttering. They were not treated by SLPs, and their parents were
not involved.
4. The control group received no therapy.
One year after the therapy program ended:
1. 48% of the children treated by SLPs were fluent.
2. 63% of the children treated by their parents were fluent.
3. 71% of the children treated by computers were fluent.
4. The control group's speech did not improve.
The results showed that computers were the most effective, parents were
the next most effective, and SLPs were the least effective. At the 1% disfluency
level, computers and parents were about four times more effective than
SLPs, a finding that encourages the implementation of a computer-based stuttering
assessment system to improve clients’ fluency.
A computer-based stuttering assessment system can implement the functions of
complicated and expensive acoustic equipment that is available only at well-equipped
speech-language pathology clinics.
Problem Statement
Why should a computer-based stuttering assessment system be implemented to
replace the classical manual assessment approach? What is possible during a clinical
session is often determined by treatment variables such as the availability, setting,
and cost of services. The drawbacks and insufficiencies of classical manual
stuttering assessment include the following:
Time - There are at least 115 therapy techniques that markedly decrease
stuttering [6]. Each PWS exhibits a unique response to different therapeutic
approaches. A technique that works dramatically for one stutterer does not
necessarily work dramatically, or at all, for other PWS. Stuttering is so variable and
so highly individualized that, few would disagree, no one method works for all PWS.
Without a computer-based system, the SLP has to try each therapy
technique in turn, depending on the needs and response of the client, which may take months
of repeated, costly and overly generalized procedures. The longer
an SLP takes to assess a client, the more tedious and troublesome the process becomes for the client.
For PWS, practice with therapy techniques must take place for many months and
years before the techniques become functional. The uniqueness of each individual
PWS prevents any specific recommendation of therapy techniques from being
universally applicable [7]. Being able to move systematically and persistently
toward distant goals is essential, since treatment of PWS takes a considerable
length of time.
Cost - Due to the typical length of treatment and the usual lack of
reimbursement by insurance companies, the cost of successful treatment can quickly
become prohibitive for many clients [8]. The longer the SLP takes to assess a client,
the higher the cost of diagnosis. The cost of clinical sessions is a major consideration
for nearly every client. Some individuals simply cannot afford professional help
unless the services are covered by insurance (which is not typically the case for
fluency disorders) or are available at a reduced rate. A computer-based stuttering
assessment can reduce the diagnosis duration and thus the cost of
clinical sessions for the client.
Effectiveness - The success of treatment is closely tied to the ability of an
experienced SLP to determine a client’s readiness for change and adjust treatment
techniques accordingly. Thus the utility of the techniques depends on the SLP’s
ability to apply the right technique(s) at the right time. Often, an approach is chosen
because it coincides with the personality of the SLP and her view of reality. The
SLP’s perception of the client should be as accurate as possible. If an SLP applies the
wrong therapy technique to a particular client, not only will the client show no
improvement, but the assessment process may also take longer.
One of the features of stuttering is that it tends to change with time. This
requires the SLP to give full attention to each client’s progress, which is impossible with
traditional clinical treatment alone, without the assistance of a computer-based
diagnosis tool. Research [5] indicates that computers were the most effective,
parents the next most effective, and SLPs the least effective.
Uniformity - Scores from manual measures provide quantitative
information that is highly subjective because of differences in human perception. These
measurements may vary from one SLP to another, which leads to
scoring inconsistencies. Scores generated by a computer-based system enable SLPs
in different locations to be on “the same page” regarding the general severity,
suitable therapy techniques and overall characteristics of clients.
Motivation - PWS often find traditional clinical sessions to be a very tedious
and undesirable process. Successful intervention also requires continued
commitment and motivation by the client. The problem could be eliminated if SLPs
used a "user-friendly" approach to stuttering relief, which could be enhanced by the
use of a computer-based system. A knowledgeable and engaging guide, such as
computer-based scoring analyses, can show the way or, at the very least, make the
journey more efficient and often more pleasant. It is essential that the child enjoys
the assessment process and finds it to be a positive experience.
Research Objectives
The objectives of the research work are to develop a computer-based Malay
stuttering assessment system and to verify its operation in a real clinical session. The
computer-based assessment system is intended to assist SLPs in determining suitable
stuttering therapy techniques for each client. The application can reduce the
time required to determine a suitable therapy technique for each client.
The effectiveness of the stuttering assessment process is improved, as a computer-based
approach has been reported to be the most effective in treating PWS [5]. Scores
generated by the computer-based system ensure consistency and uniformity
regarding the general stuttering severity of a particular client. The computer-based
system also provides an interesting guide that motivates clients to enjoy the assessment
process.
Scope of Work
The scope of this research work includes the development of a computer-based Malay
stuttering assessment system running on the Windows operating
platform. This research work focuses on school-age children because stuttering
should be treated as early as possible, primarily because it becomes less tractable as
children get older. The application is developed to run with a graphical user
interface (GUI) to provide a user-friendly environment. Digital Signal Processing
(DSP) techniques are implemented to analyze speech signals. The software is designed
based on standard FS techniques that can be easily incorporated into current
fluency rehabilitation regimens. Three stuttering therapy techniques are implemented in
computer-based form: Shadowing, Metronome and Delayed Auditory Feedback
(DAF). These three techniques were chosen because they are the most commonly used
therapy techniques in Malaysia, based on discussions with SLPs in Malaysian
hospitals. Once the computer-based stuttering assessment system is developed, it is
verified on real stuttering clients through clinical trials carried out among primary
school students. Hospital Sultanah Aminah assisted in the clinical trials and
gave professional feedback.
Research Approach
The development process can be broken into six phases, as shown in Figure
1.1. During the first phase, the hardware and software required for the computer-based
stuttering assessment system are identified. Hardware includes a microphone,
earphones and a desktop computer equipped with a sound card, while software comprises the operating
system (OS), the development tools and the necessary drivers. Windows XP is
chosen due to its availability and familiarity. The sound card must be able to work with
Windows XP.
Prior to the design and coding stage, the basic principles and strategies of
stuttering assessment systems are analyzed. Both the FS and stuttering modification
(SM) treatment approaches are reviewed and compared. Digital audio processing
theories and principles are studied in detail. Algorithms are selected and analyzed
for the recording and playback procedures, such as windowing and filtering.
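As an illustration of the pre-processing mentioned above, the sketch below removes the DC offset from a recording by subtracting its mean and then applies a Hamming window, w[n] = 0.54 - 0.46 cos(2πn/(N-1)), to a frame. It is written in standard C++ rather than the MFC code of the actual software; the choice of the Hamming window among the common window functions and the function names are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Removes the DC offset by subtracting the mean value of the signal.
void removeDcOffset(std::vector<double>& x) {
    if (x.empty()) return;
    double mean = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) mean += x[i];
    mean /= static_cast<double>(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) x[i] -= mean;
}

// Multiplies a frame in place by a Hamming window of the same length.
void applyHamming(std::vector<double>& frame) {
    const std::size_t n = frame.size();
    assert(n >= 2);  // the window formula divides by N - 1
    const double pi = 3.14159265358979323846;
    for (std::size_t i = 0; i < n; ++i)
        frame[i] *= 0.54 - 0.46 * std::cos(2.0 * pi * i / (n - 1));
}
```

Removing the offset first keeps a constant bias from the microphone or sound card from inflating magnitude-based measures such as the AMP.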
Next, the program flowcharts of the stuttering assessment operation are
constructed for each module. Coding is done in Microsoft Visual C++ on the
Windows platform. Clinical trials on control data and test subjects are carried
out to make sure that the system runs properly before its practicality is verified.
Figure 1.1: Research Development Process (stages shown: Literature Review and Problem Specification; Programming Language Learning (C, C++, MFC); Algorithm Selection and Analysis; Clinical Trial)
Significance of Research Work
Implementing stuttering assessment in a computer-based system is
important because the use of computer technology in stuttering assessment is still
new in Malaysia, and currently no computer-based assessment tool is available to
assist SLPs in determining a suitable therapy technique for each client. A clean
implementation not only helps in better understanding the available
stuttering therapy techniques, but also allows further stuttering therapy techniques
to be introduced in computer-based form.
The implementation of the computer-based Malay stuttering assessment system
helps SLPs determine suitable therapy techniques for each client during the clinical
assessment process. Each client exhibits a unique response to the treatment
strategies. The uniqueness of each individual PWS prevents any specific
recommendations of therapy techniques from being universally applicable.
Without a computer-based assessment system, 2 to 3 months are normally
required to determine a suitable technique for each client. The computer-based
assessment system can reduce the amount of time needed to determine therapy
techniques, as well as the possibility of errors in the manual calculation of the
percentage of stuttered syllables (%SS).
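The %SS measure itself is simple arithmetic: the number of stuttered syllables divided by the total number of syllables in the speech sample, multiplied by 100. The hypothetical function below assumes that the per-syllable judgements (stuttered or fluent) are still made by the SLP and only automates the calculation, which is the step where manual arithmetic errors can occur.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Percentage of syllables stuttered (%SS). Each element marks one syllable of
// the speech sample as stuttered (true) or fluent (false), as judged by the SLP.
double percentSyllablesStuttered(const std::vector<bool>& stuttered) {
    assert(!stuttered.empty());
    std::size_t count = 0;
    for (std::size_t i = 0; i < stuttered.size(); ++i)
        if (stuttered[i]) ++count;
    return 100.0 * static_cast<double>(count) /
           static_cast<double>(stuttered.size());
}
```

For example, 6 stuttered syllables in a 200-syllable sample give 3.0 %SS.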
The algorithms involved in the design of the assessment procedures and the
computer-based implementation of the three therapy techniques are original contributions of this work.
Thesis Layout
The content of the thesis is organized as follows. Chapter two describes the
generic characteristics of stuttering assessment systems and surveys the current
assessment systems used in clinical sessions. Not all systems are covered;
only those that are relevant to our computer-based
assessment system are discussed.
Chapter three depicts the problem formulation and underlying design
principles of computer-based Malay stuttering assessment system, by pointing out
some relevant topics such as the basic stuttering treatment approaches including FS
and SM, and the criteria for selection of scoring parameters.
Clinical trials on control data and test subjects have been carried out to make
sure that the system runs properly before its practicality is verified. The details of
the development of the computer-based Malay stuttering assessment system are
presented in Chapter four, while Chapter five presents the results obtained during the
clinical evaluation of the developed assessment system. Chapter six presents the
conclusions of the thesis and identifies some areas for future work.
Many inventories, scales and procedures have been developed for assessing
the quality of both the surface and the deep structure of stuttering. Many of
these measures are helpful for obtaining both data-based and criterion-referenced
information for assessing stuttering. However, the variety of surface behaviours and
intrinsic features that come together in stuttering do not always lend themselves to a
realistic or valid analysis. Individual PWS rarely fit all the descriptions, situations,
and categories associated with any single measure [9].
Many of these scales do provide helpful information that will prove useful as
SLPs make decisions concerning the initiation of treatment, the selection of a
treatment strategy, and the phasing out or termination of formal treatment. However,
assessment must go beyond all these procedures. No matter how many formal
measures SLPs administer or what they are able to discover during the initial meeting,
SLPs must recognize that they are viewing the client and his problem through a small
window. As much as with any other human communication disorder, the assessment
of fluency disorders including stuttering is an ongoing process [8].
The process of assessment – while most intense during the initial stages of
treatment – continues through treatment and into the post-treatment period, when the
client may experience relapse. In order to continue making good clinical decisions,
the SLP must continue to obtain data concerning all aspects of the syndrome. This
continual process could be enhanced by a computer-based stuttering assessment
system.
Features of Stuttering Assessment
At the initial assessment of a child thought to be stuttering, an SLP ought to be
thinking about two related judgments: Is the child actually stuttering and, if so, what
is the prognosis for recovery? When young children are diagnosed, speech patterns
are usually evaluated and compared to the normal fluency of their age group [10].
SLPs also take into consideration developmental and emotional factors that can
disrupt a child’s speech.
To begin a stuttering assessment, valid numeric measures of the factors
believed to be relevant are needed. A validated measurement does not necessarily
have predictive value. To establish predictive value, statistical models for
diagnosis and prognosis need to be constructed. If diagnosis and treatment prognosis are
each determined by several factors, they can be represented by a multivariate
equation [11].
Using Rustin's assessment instrument [12] as a guide, together with a
comprehensive review of the research and clinical literature, ten factors that might
be relevant to diagnosis and prognosis in CWS were identified. These factors
are: (1) parental attitude, (2) social skills, (3) family history of stuttering, (4)
cerebral dominance, (5) language development, (6) client attitude, (7) general
health, (8) motor skills, (9) auditory skills and (10) speech scores. A survey of SLPs
and researchers in the UK, mainland Europe and the USA indicated that these
factors are currently thought to be comprehensive [11]. Post hoc application of such
models to validated numeric measures will establish exactly which
factors can be included as predictors and which item scores should be used to assess
these factors.
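The form of the multivariate equation is not spelled out here. Assuming the simplest linear form, y = b0 + b1x1 + ... + b10x10, with one term per factor, a prediction from the ten factor scores could be sketched as below. The weights are placeholders that would have to be estimated from validated clinical data; nothing in this sketch reflects fitted values from [11], and the enumeration names are ours.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The ten candidate factors listed above, in the same order (names are ours).
enum Factor { ParentalAttitude, SocialSkills, FamilyHistory, CerebralDominance,
              LanguageDevelopment, ClientAttitude, GeneralHealth, MotorSkills,
              AuditorySkills, SpeechScores, FactorCount };

// Hypothetical linear multivariate model: y = b0 + sum_k b_k * x_k.
// `weights[k]` is b_k and `scores[k]` is the numeric score x_k for factor k.
double linearPrediction(double intercept, const std::vector<double>& weights,
                        const std::vector<double>& scores) {
    assert(weights.size() == scores.size());
    double y = intercept;
    for (std::size_t k = 0; k < scores.size(); ++k)
        y += weights[k] * scores[k];
    return y;
}
```

Under this assumption, a factor whose estimated weight is indistinguishable from zero contributes nothing to the prediction, which is the kind of result the post hoc analysis described above is meant to establish.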
Many SLPs make a wide-ranging, multi-faceted assessment of a child [13].
A fundamental decision is which factors to include in this assessment. Many
authors err on the side of caution and attempt to be as comprehensive as possible so
as to ensure nothing of potential value is missed. Another reason for developing
such extensive assessment instruments is that they are intended to be used for other
purposes too.
Because the stuttering assessment process is so complicated, a computer-based
stuttering assessment system could make the ongoing assessment
process easier for the SLP, who need only look at the generated scores
to evaluate the client's progress. As stated earlier, many factors must be considered
in developing a stuttering assessment, including parental attitude and family
history of stuttering. Therefore, a computer-based system alone cannot
assess a client completely; it also requires the SLP's personal assessment experience
and the information provided by the client's family [14].
Problems of Classical Manual Assessment System
The research literature contains extensive discussion suggesting that factors involved in
assessments lack reliability and validity. Research [11] has repeatedly warned that
certain manual stuttering assessment tools are unreliable. This is because manual
assessment depends entirely on the SLP's perception of each client's performance. To
date, relatively little work has been directed at establishing whether there is validity
or accuracy in the information collected manually.
It is necessary to measure the incidence of stuttering in samples of speech to
help SLPs decide whom to treat (diagnosis), to assess what changes in speech occur
after clients have been treated (treatment outcome) and to help establish which
individuals are likely to be treated most successfully (prognosis). Unfortunately, it
has been found that the manual measuring techniques traditionally used in
clinics result in variable estimates of stuttering incidence by different
judges on the same samples of speech [15]. The questionable validity of questionnaires
and performance tests is the main reason why SLPs should not depend solely on
classical manual stuttering assessment tools.
Besides this, the classical manual tools used to assess
stuttered speech are time-consuming. Short assessments are preferred over lengthy
ones because the time a client spends in the clinic is always limited. The problem could
be ameliorated if reliable computer-based classification schemes were available.
Ideally, what is required for each factor is a good, easy-to-use, off-the-shelf
instrument that will elicit scores that can be used in a multivariate equation to
establish whether that factor contributes to diagnosis and/or to the prediction of therapy
outcome. An example of how such an instrument can be constructed is a
computer-based stuttering assessment system [16].
Computer-based Assessment System
Currently, no computer-based assessment tool exists to help SLPs
determine a suitable stuttering therapy technique for each stutterer, but DAF devices,
which are sometimes recommended for permanent use, are available to facilitate speech
fluency. In DAF, the person speaks into a microphone connected to a DAF
unit. This unit plays the speech back to the individual through headphones. The "fed
back" speech is delayed so that the individual experiences an echo while speaking.
When most stutterers talk under DAF, they tend to slow their speech down and increase
the loudness of their voice. The auditory feedback causes speakers to alter their
speech rate, effort, or rhythm [8].
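On a PC, the DAF effect described above reduces to a fixed delay between the microphone input and the headphone output. The sketch below assumes mono floating-point samples and implements the delay as a circular buffer; audio capture and playback are omitted, and the class and function names are hypothetical rather than taken from the software described in this thesis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed delay line: each input sample is returned `delaySamples` calls later.
class DafDelayLine {
public:
    explicit DafDelayLine(std::size_t delaySamples)
        : buffer_(delaySamples, 0.0), pos_(0) {
        assert(delaySamples > 0);
    }

    // Takes one microphone sample and returns the sample to send to the
    // headphones. The first `delaySamples` outputs are silence.
    double process(double in) {
        double out = buffer_[pos_];  // written delaySamples calls ago
        buffer_[pos_] = in;
        pos_ = (pos_ + 1) % buffer_.size();
        return out;
    }

private:
    std::vector<double> buffer_;
    std::size_t pos_;
};

// Converts a delay in milliseconds to the nearest whole number of samples.
std::size_t delayInSamples(double delayMs, double sampleRate) {
    return static_cast<std::size_t>(delayMs * sampleRate / 1000.0 + 0.5);
}
```

For example, a 50 ms delay at an 8 kHz sampling rate corresponds to 400 samples (illustrative values only, not settings taken from this work).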
It is clear that the fluency effects of the SpeechEasy and Fluency Master devices are
not simple or uniform [3]. Many more evidence-based, empirically and
scientifically controlled studies are needed, and there is no proof of long-term
effectiveness. Furthermore, these devices are expensive, with prices ranging from USD
3,500 to USD 5,000, which discourages most PWS from acquiring one. However, the
concept of DAF remains a viable and important clinical tool in the total management
of stuttering, and it continues to be a valuable clinical adjunct to a total treatment
program for a significant number of PWS. The DAF concept can be easily implemented
in a computer-based stuttering assessment tool.
A computer-based cluttering assessment system has been developed.
Cluttering or tachyphemia is a disorder of both speech and language processing that
frequently results in rapid, dysrhythmic, sporadic, unorganized, and often
unintelligible speech. Cluttering is a more general confusion of speech than
stuttering. While the repetition that is characteristic of stuttering may be present, it
need not fall on opening words and key words in a sentence. In cluttering, meaning
may also be interrupted by irrelevant terms and by using multiple phrases to describe
the same thing [16].
This computer-based cluttering program is a dual-event counter/timer
designed for the assessment of cluttering severity. It helps to determine how often
one clutters, and how much one's speech is affected by cluttering. The main option
provided by the cluttering program involves online quantified assessment, such as the
percentage of talking time cluttered. The second option collects a user's perceptual
ratings (visual analogue scale response format) on features thought to be
relevant for a qualitative description of a client's cluttering severity [16].
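As we read the description, a dual-event counter/timer records two kinds of events: periods of talking and periods judged to be cluttered. Assuming the rater marks both as time intervals, the percentage of talking time cluttered could be computed as in the sketch below. The interval representation and the non-overlap assumption are ours, not those of the cluttering program in [16].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A time interval in seconds marked by the rater.
struct Interval {
    double start;
    double end;
};

// Sum of the lengths of all intervals.
double totalDuration(const std::vector<Interval>& intervals) {
    double total = 0.0;
    for (std::size_t i = 0; i < intervals.size(); ++i) {
        assert(intervals[i].end >= intervals[i].start);
        total += intervals[i].end - intervals[i].start;
    }
    return total;
}

// Percentage of talking time judged cluttered. Assumes the cluttered intervals
// lie inside the talking intervals and do not overlap one another.
double percentTalkingTimeCluttered(const std::vector<Interval>& talking,
                                   const std::vector<Interval>& cluttered) {
    const double talkTime = totalDuration(talking);
    if (talkTime <= 0.0) return 0.0;
    return 100.0 * totalDuration(cluttered) / talkTime;
}
```

For example, 5 s of cluttered speech within 20 s of talking gives 25% of talking time cluttered.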
The cluttering software does not allow audio files or results to be saved upon
completion of practice; it only allows numerical results to be printed.
Since stuttering assessment is an ongoing process, the program should be able to generate
a history file summarizing the client's attempts, including the total number of times
each utterance was practiced. The SLP can use this information to determine suitable
therapy techniques for each client. Moreover, it enables the SLP to assess or monitor
the client's progress and observe how much time the client spends practicing. Based on
the client's progress record, the supervising SLP is well informed about which displayed
signals and measured parameters would be useful for improving the speech
rehabilitation process. These features would enable successful future therapy.
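A history file of the kind proposed above could be as simple as one line per practised utterance. The sketch below assumes hypothetical record fields (technique, utterance identifier and a 0 to 100 score) and summarizes, for each technique and utterance pair, the number of attempts and the best score as comma-separated lines; writing the returned string to disk, for example with std::ofstream, is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One practice attempt; the field names are hypothetical.
struct Attempt {
    std::string technique;  // e.g. "Shadowing", "Metronome" or "DAF"
    std::string utterance;  // identifier of the SLP's pre-recorded utterance
    int score;              // score from 0 to 100
};

// Returns one CSV line per technique/utterance pair, sorted by key:
// technique,utterance,number of attempts,best score
std::string historySummary(const std::vector<Attempt>& attempts) {
    std::map<std::string, int> count;
    std::map<std::string, int> best;
    for (std::size_t i = 0; i < attempts.size(); ++i) {
        const Attempt& a = attempts[i];
        assert(a.score >= 0 && a.score <= 100);
        const std::string key = a.technique + "," + a.utterance;
        if (++count[key] == 1 || a.score > best[key])
            best[key] = a.score;
    }
    std::ostringstream out;
    for (std::map<std::string, int>::const_iterator it = count.begin();
         it != count.end(); ++it)
        out << it->first << "," << it->second << "," << best[it->first] << "\n";
    return out.str();
}
```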
The cluttering program does not display speech
amplitude on screen. Speech amplitude should be displayed for the stuttering client to
copy as closely as possible, and to show the client the locations where the
client's spoken utterance differed from the SLP’s signal in amplitude,
duration, onset, and end location. Real-time audio and visual modes give immediate
feedback on the client’s performance relative to the goal, and allow the client to
anticipate what amplitude or rate change is needed to reach the goal. The client
can then, if necessary, alter their speech as required to closely match the SLP’s
signal.
No therapy technique or reward is implemented in the above computer-based
cluttering system. In the present work, three stuttering therapy techniques are
implemented, and clients who manage to obtain scores of 80 and above are rewarded
with a fireworks display and applause.
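The comparison of the client's utterance with the SLP's on start location, end location, maximum magnitude and duration, together with the reward threshold of 80, can be sketched as follows. The per-parameter scoring rule here (100 for an exact match, falling linearly to 0 as the difference reaches a tolerance), the choice of tolerances and the equal-weight average are assumptions made for illustration; the scoring flowcharts of the actual software may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Parameters of one utterance taken from its AMP (positions and duration in
// profile frames; maxMagnitude in profile units).
struct UtteranceParams {
    double start;
    double end;
    double maxMagnitude;
    double duration;
};

// Hypothetical rule: 100 for an exact match, falling linearly to 0 when the
// absolute difference reaches `tolerance`.
double parameterScore(double client, double slp, double tolerance) {
    assert(tolerance > 0.0);
    return std::max(0.0, 100.0 * (1.0 - std::fabs(client - slp) / tolerance));
}

// Mean of the four parameter scores. Using the SLP's duration as the tolerance
// for positions and duration, and the SLP's peak for magnitude, is an assumption.
double overallScore(const UtteranceParams& client, const UtteranceParams& slp) {
    assert(slp.duration > 0.0 && slp.maxMagnitude > 0.0);
    const double total =
        parameterScore(client.start, slp.start, slp.duration) +
        parameterScore(client.end, slp.end, slp.duration) +
        parameterScore(client.maxMagnitude, slp.maxMagnitude, slp.maxMagnitude) +
        parameterScore(client.duration, slp.duration, slp.duration);
    return total / 4.0;
}

// Reward threshold stated in this work: scores of 80 and above.
bool earnsReward(double score) { return score >= 80.0; }
```

For example, a client whose start location and duration each differ from the SLP's by 10% of the SLP's duration, with matching end location and peak, would score (90 + 100 + 100 + 90) / 4 = 95 under these assumptions and receive the reward.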
The Variability of Fluency
The level of fluency varies widely across time and location. Most of the time,
human beings speak nearly automatically, with words flowing smoothly and
effortlessly. Normally, no attention is given to the manner of communicating. On
other occasions, particularly during communicative or emotional stress, the
smoothness begins to disappear and breaks in speech occur – or at least they become
more obvious.
Although fluency varies for all speakers, its variability is even more
pronounced for PWS. In most instances, a stutterer is more likely than a normally
fluent speaker to react sooner and to a greater degree to fluency-disrupting stimuli such
as time pressure and difficult communication situations. At the other extreme, PWS
are sometimes able to “turn on their fluency”. By avoiding feared sounds and words
or – with heightened energy and emotion – momentarily “rising to the occasion”,
clients who typically stutter are able to become uncharacteristically fluent.
The variability of stuttering behaviours is one of the well-established facts about
stuttering and contributes significantly to the mystery of the disorder. It is difficult
for listeners to understand how a client can speak fluently one moment and, a
word or two later, struggle dramatically when attempting something as common
as saying their own name [17]. The variability of stuttering behaviour also makes it
difficult for listeners to become accustomed to a client, because it is not always possible
to predict whether or not a person will stutter. Such variability also presents a
predicament for the stutterer, who cannot always be
certain of the amount and degree of difficulty he will have in any given speaking
situation. It is difficult for the client to compensate for a problem that is so
Of the many communication handicaps that people may suffer, perhaps none
is more variable than stuttering [8]. The inconsistent nature of stuttering requires
that its assessment be an ongoing process that takes place over several
assessment or treatment sessions [18]. Stuttering assessment is therefore a
complicated process, and without the aid of a computer-based tool it is
far more complex and difficult than it may first appear.
Definition of Stuttering
Stuttering is a disorder affecting the fluency of speech. It usually starts in the
third or fourth year of life, after a period of apparently normal speech development.
Prevalence rates of stuttering are approximately 1% but can be as high as 5% during
childhood. The onset of stuttering usually occurs between the ages of 2 and 6 and is
more common in males. The male to female gender ratio is 3 to 1 during childhood
and increases to 5 to 1 in adulthood. Although a specific cause for stuttering is not
known, it is believed to be a heritable disorder [19].
Stuttering occurs when the forward flow of speech is interrupted abnormally
by repetitions or prolongations of a sound, syllable, or articulatory posture, or by
avoidance and struggle behaviours. The term “stuttering”, referring to the
developmental condition, is defined as “a disorder in the rhythm of speech in which the
individual knows precisely what he wishes to say but at the time is unable to say [it]
because of an involuntary repetition, prolongation, or cessation of a sound” [20].
Stuttering is the intermittent impairment of fluency and speech rate, pitch,
loudness, inflectional patterns, articulation, facial expression and postural
adjustments of the client in the absence of word-finding problems, speech motor
disorders or voice problems. It is also known as stammering in Britain; the term
stuttering is used throughout this thesis. The earliest age of onset of stuttering is about 18
months (when speech emerges), and the latest age of onset varies from 7 to 13 years of
age. The incidence of stuttering (the approximate percentage of the
population who have stuttered at any time in their lives) can be as high as about 10%.
Stuttering is a disruption in the fluency of verbal expression characterized by
involuntary, audible or silent repetitions or prolongations of sounds or syllables, as
shown in Figure 2.1. These are not readily controllable and may be accompanied by
other movements and by negative emotions such as fear, embarrassment, or
irritation [22]. Strictly speaking, stuttering is a symptom, not a disease, but the term
stuttering usually refers to both the disorder and the symptom.
Figure 2.1: Speech Waveforms and Sound Spectrograms of a Male Client Saying
“PLoS Biology”
The left column shows speech waveforms (amplitude as a function of time);
the right column shows a time–frequency plot using a wavelet decomposition of
these data. In the top row, speech is fluent; in the bottom row, typical stuttering
repetitions occur at the “B” in “Biology.” Four repetitions can be clearly identified
(arrows) in the spectrogram (lower right).
Successful stuttering therapy cannot be equated with complete fluency or a
"cure" for stuttering. Because stuttering is not a disease caused by viruses, bacteria,
or physical injury, there is no cure. Stuttering is a unique and specific developmental
and psycho-neurological condition involving a physical predisposition that is
triggered and neurologically conditioned by internal reactions to an extended cycle of
repeated performance failures, environmental stressors, and psychological injury.
The extended duration of the neurological conditioning involved in the
development of stuttering (from several months to many decades) places it in the
category of disorders (including post-traumatic stress disorder) that can be
extinguished to varying degrees, but never totally erased. Some people (particularly
pre-school children) may be able to totally recover their original fluency. Other
people may experience relatively little change in fluency, but achieve an increased
ease of communication. Others may attain a remarkable degree of fluency and
communicative ease in many or most situations [23].
Characteristics of Stuttering
Stuttering can be mild, moderate or severe, and can even vary within the
same individual from one day to the next, particularly with children.
The fact that stuttering tends to run in families indicates that genetics is
somehow involved in the condition. Studies of stuttering in twins have also found
that both twins are more likely to stutter if they are identical rather than fraternal.
Adults who stutter (AWS) often fail to achieve their full occupational potential and often
experience significant anxiety in social situations [24].
Stuttering is graded by its degree of severity. Most researchers manually rate
stuttering by the percentage of stuttered syllables (%SS). While the child speaks, the
researcher counts all the stuttered and non-stuttered syllables. One classification
method is [24]:
Mild – below 5 per cent of SS.
Mild to moderate – 5 to 10 per cent of SS.
Moderate – 10 to 15 per cent of SS.
Moderate to severe – 15 to 20 per cent of SS.
Severe – above 20 per cent of SS.
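As an illustration, the %SS calculation and the five-band classification listed above can be written as a small Python function. The function names are hypothetical. The list's ranges share their end points (for example, 5 appears in two bands), so this sketch assumes a tie-breaking rule: 5, 10 and 15 are assigned to the higher band, and exactly 20 is treated as "moderate to severe" because only values above 20 are listed as severe.

```python
def percent_ss(stuttered: int, total_syllables: int) -> float:
    """Percentage of stuttered syllables (%SS) in a speech sample."""
    if total_syllables <= 0:
        raise ValueError("the sample must contain at least one syllable")
    if not 0 <= stuttered <= total_syllables:
        raise ValueError("stuttered count must lie between 0 and the total")
    return 100.0 * stuttered / total_syllables


def severity(pss: float) -> str:
    """Map %SS to the five severity bands of the classification above.

    Boundary handling is an assumption: 5, 10 and 15 go to the higher band,
    and exactly 20 stays 'moderate to severe'.
    """
    if pss < 5:
        return "mild"
    if pss < 10:
        return "mild to moderate"
    if pss < 15:
        return "moderate"
    if pss <= 20:
        return "moderate to severe"
    return "severe"


# Example: 12 stuttered syllables in a 200-syllable sample (6 %SS).
print(severity(percent_ss(12, 200)))  # mild to moderate
```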
Research [25] has suggested that a person who stutters on more than 2% of
syllables spoken should be designated as stuttering. Evesham and Fransella
stated that 2%SS together with a speech rate of less than 130 syllables per minute should be
regarded as stuttered speech. In contrast, Boberg indicated that 2-4%SS should only
be regarded as marginal stuttering, and that speech should contain 4%SS or more
before it could be described as stuttering behaviour [25].
Attempts have also been made to specify which proportions of the different
dysfluency types indicate stuttering. Research [25] has suggested that prolongations
exceeding 25% of total dysfluencies define stuttering behaviour, and it has been
proposed that the presence of articulation errors fulfils the same role.
Furthermore, the probability of chronic stuttering is stated to increase with the
amount of effort that a child puts into speech production.
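The prolongation criterion can be computed directly from a list of labelled dysfluencies. The Python sketch below shows one way to do this. The function names and the label string "prolongation" are hypothetical conventions. It assumes each dysfluency has already been counted once under a single type by the SLP.

```python
from collections import Counter
from typing import Iterable


def prolongation_share(dysfluencies: Iterable[str]) -> float:
    """Percentage of all recorded dysfluencies that are prolongations."""
    counts = Counter(dysfluencies)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no dysfluencies were recorded")
    return 100.0 * counts["prolongation"] / total


def prolongation_criterion_met(dysfluencies: Iterable[str]) -> bool:
    """True when prolongations exceed 25% of total dysfluencies."""
    return prolongation_share(dysfluencies) > 25.0


sample = ["prolongation"] * 3 + ["part-word repetition"] * 7
print(prolongation_share(sample), prolongation_criterion_met(sample))  # 30.0 True
```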
When the behaviours of a stutterer are infrequent, brief, and are not
accompanied by substantial avoidance behaviour, the stutterer is usually classified as
a mild or a non-chronic stutterer. Non-chronic stuttering is often called "situational
stuttering" because the afflicted person generally has difficulty speaking only in
isolated situations—usually during public speaking or other stressful activities—and
outside of these situations the person generally does not stutter.
When the behaviours are frequent, long in duration, or when there are visible
signs of struggle and avoidance behaviour, the stutterer is classified as a severe or
chronic stutterer. Unlike mild or situational stuttering, chronic stuttering is present in
most situations, but can be either exacerbated or eased depending on the
conditions. Severe stuttering is often, but not always, accompanied by strong feelings
and emotions in reaction to the problem, such as anxiety, shame, fear, or self-hatred.
These reactions are usually less obvious in mild stuttering, and their presence serves as
another criterion by which to classify PWS as mild or severe.
It is worth noting that the severity of a stutterer is not constant and that PWS
often go through weeks or months of substantially increased or decreased fluency.
PWS universally report having "good days" and "bad days" and report dramatically
increased or decreased fluency in specific situations.
There are some behaviours that occur during moments of stuttering that can
be seen and/or heard as elaborated below. Some of these also occur in the speech of
normal speakers, though they may differ qualitatively from those present during
moments of stuttering.
Interjections are extra sounds, syllables, or words that add no meaning to the
message. Probably the most common interjections are "uh" and "um" ("The uh baby
ate the soup" or "The baby um ate the soup.") Words or phrases such as "well,"
"like," and "you know" are considered to be interjections. These sound units, which
occur between words, usually do not perform a linguistic function in messages – that
is to say, the denotative meanings of messages usually are not affected by their
presence. They may be accompanied by audible and/or visible signs of tensing, may
be voluntary or involuntary, and may vary with how aware the client is of their
occurrence [26].
Revisions occur when children change what they have just said, which they may do often.
They may stop in midstream and start over in a new direction. Revisions may be in
pronunciation ("The bady-baby ate the soup"); grammar ("The baby eated-ate the
soup"); or word choice ("The daddy-The baby ate the soup."). A child also may go
back to add a word ("The baby-The hungry baby ate the soup.").
Mistiming occurs when the timing of a word is disrupted as it is spoken. Sounds or
syllables may be prolonged ("The baby ate the s-s-soup" or "The baaaby ate the
soup."). There could also be a break in the word ("The ba-by ate the soup.").
Varying amounts of tension in the speech muscles (lips, tongue, vocal cords) may
accompany these mistimed words. Sometimes, the voice sounds strained or the
coordination of breathing and speaking breaks down.
Part-word (syllable) repetitions are sound and syllable repetitions. They
occur most often at the beginnings of words and almost never at the ends of the
words. Though the number of times a particular sound or syllable is repeated can be
relatively high, it is usually once or twice [27]. The repetitions may be accompanied
by audible and/or visible signs of tensing. The level of awareness of their occurrence
can be relatively high or relatively low. The repetitions may be voluntary, though
they are usually involuntary.
Word repetitions are repetitions of an entire word. In most cases, it is a
single-syllable word. While a word may be repeated a relatively large number of
times, it is usually repeated only once or twice [28]. These repetitions, like part-word
ones, may be accompanied by audible and/or visible signs of tensing, may be
voluntary or involuntary, and may vary with how aware the client is of their occurrence.
Phrase repetitions are repetitions of units consisting of two or more words.
Such units usually are repeated only once or twice. They may be accompanied by
audible and/or visible signs of tensing, may be voluntary or involuntary, and may
vary with how aware the client is of their occurrence.
Incomplete phrases include instances in which the client becomes aware of
making an error and corrects it. The error may be in how a word was pronounced or
it may be related to the meaning of the word(s) that were said. Also included are
instances in which the client begins an utterance, but obviously does not complete it.
Disrhythmic phonations are disturbances in the normal rhythm of words. The
disturbance may be attributable to a prolonged sound, an accent or timing that is
notably unusual, an improper stress, a break (usually between syllables), or any other
speaking behaviour not compatible with fluent speech. Included here are phenomena
that some investigators have referred to as “broken” words. Disrhythmic phonations
may be accompanied by audible and/or visible signs of tensing, may be voluntary or
involuntary, and may vary with how aware the client is of their occurrence.
Tense pauses are phenomena that occur between words, part words, and
interjections. They consist of pauses in which there are barely audible manifestations
of heavy breathing or muscle tightening. The same phenomena within a word would
place the word in the category of disrhythmic phonations. Tense pauses vary with
how aware the client is of their occurrence.
Physical behaviours are referred to as secondary behaviours because they are
acquired as the client strives to live, adapt and cope with their stuttering. Secondary
stuttering behaviours are unrelated to speech production. Associated behaviours
observed in PWS include facial grimaces, eye blinks, lip tremors and head jerks.
Secondary behaviours are elaborated in the following paragraphs.
To hide stuttering, PWS may use another word, or put off saying a certain
word, until the stuttering feeling goes away. They may even avoid certain speaking
situations where they think they might stutter, such as going to a party with friends or
ordering a meal at a restaurant.
PWS may have a feeling of tightness in some parts of their face or body, such
as jaw, cheeks, lips, forehead and upper chest. PWS may move their head forward or
back; move their arm, leg or hand; close or blink eyes or move other parts of body in
an effort to help them get the sounds out. When PWS expect to stutter, they may
hold their breath, take several breaths or show other types of unusual breathing
There are many stuttering characteristics, as elaborated above. Some may
exist in a particular individual while others may not. A computer-based
assessment tool enables the repeated playback and display of audio files, so that the SLP
can listen to or observe the client's speech sample as many times as needed. This could
help the SLP determine a suitable therapy technique for each client more quickly and in a
better-informed way during the assessment process.
Types of Stuttering
There are three types of stuttering. The most common form of stuttering is
thought to be developmental, that is, occurring in children between the ages of 2
and 6 who are in the process of developing speech and language. This relaxed type
of stuttering occurs when a child's speech and language abilities are unable to meet
his or her verbal demands. Stuttering happens when the child searches for the correct
word. Developmental stuttering is usually outgrown [2, 29].
Another common form of stuttering is neurogenic. Neurogenic disorders
arise from signal problems between the brain and nerves or muscles. In neurogenic
stuttering, the brain is unable to coordinate adequately the different components of
the speech mechanism. Neurogenic stuttering may also occur following a stroke or
other type of brain injury. It is caused by damage in the brain, which produces
signal problems between nerves in the brain or between the brain and the muscles, so
that the brain is not able to direct the speech signals correctly. Neurogenic
stuttering is most common in stroke patients and in cases of trauma to the brain [2].
Other forms of stuttering are classified as psychogenic, that is, originating in the
mind or in mental activity of the brain such as thought and reasoning. Whereas at one
time the major cause of stuttering was thought to be psychogenic, this type of
stuttering is now known to account for only a minority of the PWS. Although PWS
may develop emotional problems such as fear of meeting new people or speaking on
the telephone, these problems often result from stuttering rather than causing the
Stuttering cannot be permanently cured. However, it may go into remission
for a time, or clients can learn to shape their speech into fluent speech with the FS
approach, the most commonly used technique for facilitating speech fluency. The present
work combines the FS approach with DSP algorithms to analyze the speech signal.
Basic Considerations When Assessing Young Children
The approach for assessing children who may be in the early stages of
stuttering evolved from Riley’s clinical observations of sub-groupings of children
based on risk factors for stuttering onsets and development [30].
There are, of course, many salient distinctions between intervention with
children and with older clients [31]. Fluency is often variable with young children, a
fact that makes both assessment and therapeutic progress somewhat more difficult to
track for this age group. There is always the question of how much behavioural
change is due to treatment and how much is due to the natural variability of the
behaviour [32]. Other important differences when working with younger clients are
the following [33]:
Children are functioning with neurophysiological systems that are far from
adult-like and are still in the process of maturation.
Depending on the child’s level of awareness and reaction to the stuttering
experience, the SLP may select therapy techniques that are less direct than
those used with adults.
Parents and a variety of other professionals, and particularly the child’s
classroom teacher, play essential roles in the assessment process.
The SLP will more likely place greater emphasis on the evaluation and
possible treatment of the child’s other communication abilities, including
language, phonology, and voice. On occasion, some children will also
present with a variety of other learning or behavioural problems.
The likelihood of achieving spontaneous or automatic fluency is much greater
for young children than for adults.
There tends to be somewhat less effort needed for helping the child to transfer
and maintain treatment gains into extra-treatment environments.
Relapse following formal treatment is not usually a serious problem, as it is
with adult clients.
An overemphasis on fluent speech as the only goal of treatment can easily
lead to the child trying hard not to stutter, something he is already doing. During the
assessment of CWS, there will be occasions when the particular behaviours the SLP
wants to observe and evaluate are not present. For example, some children will
never speak if there is someone present who is not a member of their immediate
household. Although this also may occur during the assessment of AWS, it is more
often the case with children. On the day and time of evaluation, the child may fail
to exhibit the behaviours that concern the parents or the teachers [34].
In some instances, despite the SLPs' best attempts within several speaking
situations and environments, SLPs are unable to obtain samples of the fluency breaks
that the child is apparently producing at home. It is usually possible to reschedule
another assessment during a time when the child is experiencing more difficulty, or
SLPs may observe the child in a more natural setting at home or in school. An
alternative is to ask the parents to make an audio- or videotape of the child at home
as he is experiencing the fluency breaks they are concerned about. Audio and
video recording can easily be implemented in a computer-based
assessment tool.
Formal Measures of Assessment System
Despite the problems associated with analysis, speech is a dominant factor, as
shown by the fact that it is included in all the assessments mentioned at the outset of this
section, and it offers the possibility of an objective measure. Researchers and SLPs
seeking a diagnosis of stuttering have also used questionnaires
and scales that measure a variety of factors to enhance a client's assessment.
Erickson, for example, submitted that PWS differ from non-stutterers in their
attitudes to communication and that as a function of such differences, the responses
of PWS to inventory items about inter-personal communication would differ from
the responses of non-stutterers [8]. These formal measures can only be used to
assess whether or not a person stutters. They are not designed
for determining a suitable therapy technique for each client, as in the present work.
There are a number of assessment devices that the SLP may use to obtain a
formal measure of the nature and severity of stuttering. In the following section,
some of the assessment instruments that seem to be particularly useful are described.
Stuttering Severity Instrument (SSI-3)
It is perhaps the most widely used of all scales for determining stuttering severity.
It was designed by Glyndon D. Riley [8] for both children and adults. The newest
edition provides scale values for stuttering severity for PWS. PWS who can read are
asked to describe their job or school and read a short passage. Non-readers are given
a picture task to which they respond. Scoring is accomplished across three areas.
The frequency of the fluency breaks is tabulated and the percentage of
stuttering is converted to a task score. The duration of the three longest stuttering
moments is tabulated and converted to a scale score. Lastly, physical concomitants
in four categories are each rated on a 0-to-5 scale and totalled. The total overall score
is computed by adding the scores for the three sections. The scale is attractive because it
can be used with virtually all age ranges and is easy to administer and score [35].
The reading task differs slightly from the shadowing task, in which the text is not
shown to the client. A drawback of the scale is that there is no
guideline for determining the score for each physical concomitant; it is based entirely on
the SLP's perception. In addition, the SLP has to calculate the task score manually, which
may take a lot of time.
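The manual arithmetic of the SSI-3 total described above could be automated in a computer-based tool. The Python sketch below shows one way to do this, under stated assumptions. It follows the three-section structure given in the text (frequency, duration of the three longest moments, and four physical-concomitant ratings from 0 to 5). However, it does not reproduce the published conversion tables, so `frequency_table` and `duration_table` are caller-supplied placeholders. Averaging the three longest moments before conversion is also an assumption. All names are hypothetical.

```python
from typing import Callable, Sequence


def ssi3_total(stuttered: int, total_syllables: int,
               event_durations_s: Sequence[float],
               concomitants: Sequence[int],
               frequency_table: Callable[[float], int],
               duration_table: Callable[[float], int]) -> int:
    """Sum of the three SSI-3 section scores.

    frequency_table and duration_table stand in for the published
    conversion tables, which are not reproduced here.
    """
    if total_syllables <= 0 or not 0 <= stuttered <= total_syllables:
        raise ValueError("invalid syllable counts")
    if not event_durations_s:
        raise ValueError("at least one timed stuttering moment is required")
    if len(concomitants) != 4 or any(not 0 <= c <= 5 for c in concomitants):
        raise ValueError("expected four physical-concomitant ratings, each 0-5")

    pss = 100.0 * stuttered / total_syllables              # frequency as %SS
    longest = sorted(event_durations_s, reverse=True)[:3]  # three longest moments
    mean_longest = sum(longest) / len(longest)             # averaged (assumption)
    return frequency_table(pss) + duration_table(mean_longest) + sum(concomitants)


# Placeholder tables for illustration only; NOT the published SSI-3 values.
def toy_frequency_table(pss: float) -> int:
    return int(pss)


def toy_duration_table(seconds: float) -> int:
    return int(seconds)


print(ssi3_total(10, 200, [1.0, 3.0, 2.0, 0.5], [1, 0, 2, 1],
                 toy_frequency_table, toy_duration_table))  # 11
```

Passing the tables in as functions keeps the arithmetic separate from the licensed norms. The published tables can then be plugged in without changing the scoring logic.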
Modified Erickson Scale of Communication Attitudes (S-24)
This popular and easy-to-administer scale has been used in many clinical
studies and was designed by Erickson [36]. PWS respond to a series of 24
true/false statements according to whether the statements are characteristic of
themselves. The total score is obtained by tabulating one point for each item
answered in the way typical of PWS [37].
This scale is oriented towards human perception and may be inaccurate if a client
does not answer the questions accurately or is not sure of his own
characteristics. It is designed to differentiate between stutterers and non-stutterers, and it cannot be used to determine a suitable therapy technique.
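True/false inventories of this kind are scored by counting answers that agree with a fixed answer key. The same pattern is described later for the CAT-R and the A-19. The following Python sketch is a minimal, hypothetical scorer. It assumes one point per answer that matches the key. The four-item key is invented for illustration and is not the S-24 key.

```python
from typing import Sequence


def key_match_score(answers: Sequence[bool], key: Sequence[bool]) -> int:
    """One point for every true/false answer that matches the answer key."""
    if len(answers) != len(key):
        raise ValueError("answers and key must have the same number of items")
    return sum(1 for given, expected in zip(answers, key) if given == expected)


# Hypothetical 4-item key; the real S-24 has 24 items and its own key.
toy_key = [True, False, True, True]
print(key_match_score([True, True, True, False], toy_key))  # 2
```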
Perception of Stuttering Inventory (PSI)
PSI was designed by Woolf [8] to determine the client’s self-rating of his
degree of avoidance, struggle and expectancy. The subject responds to each of
60 statements according to whether or not he feels they are “characteristic of me”.
Statements that the person feels are not characteristic are left unmarked.
This scale is similar to the S-24 in that the client needs to rate himself, and its
drawbacks are the same as those stated in Section 2.4.2.
Locus of Control Behaviour (LCB)
This scaling procedure was used by Craig et al. [36] to indicate a person's ability
to take responsibility for maintaining new or desired behaviours.
Subjects are asked to indicate their agreement or disagreement with each of the 17
statements about personal beliefs using a six-point scale. The scale has good internal
reliability, and scores are not influenced by age, gender or social desirability of
responses. The scores of the 17 statements are summed to yield a total LCB score.
Since all forms of intervention for stuttering in one way or another ask the
client to gradually assume responsibility for changing his speech, the LCB concept is
intuitively appealing. However, this scale is more a way for the SLP to understand the
client's feelings towards his stuttering than a method of assessing the client.
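Because the LCB total is described here as a plain sum of 17 six-point ratings, the check-and-sum step can be sketched in a few lines of Python. The 0-to-5 coding of the six points is an assumption. The sketch also treats every item the same way, following the text, and does not model any item-specific scoring the published instrument may use. The function name is hypothetical.

```python
from typing import Sequence

LCB_ITEMS = 17


def lcb_total(ratings: Sequence[int], scale_min: int = 0, scale_max: int = 5) -> int:
    """Sum 17 six-point LCB ratings into a total score.

    The 0-5 coding and the uniform treatment of items are assumptions.
    """
    if len(ratings) != LCB_ITEMS:
        raise ValueError(f"expected {LCB_ITEMS} ratings, got {len(ratings)}")
    if any(not scale_min <= r <= scale_max for r in ratings):
        raise ValueError("rating outside the six-point scale")
    return sum(ratings)


print(lcb_total([3] * 17))  # 51
```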
Crowe’s Protocols
This protocol was designed by T. Crowe [39] and provides a three- or
seven-point scaling procedure. Sections of the protocol provide forms for obtaining
case history and cultural information as well as client self-assessment.
Other components include assessment of affective, behavioural and cognitive
features; speech status, stimulability and measures of severity. Several sections and
forms are designed to provide information for counselling during treatment. Forms
are designed to be completed by the client or by the SLP through respondent
This scale combines human-perception-oriented and technically oriented
approaches, as it consists of both self-rating and the measurement of stuttering
Communication Attitude Test-Revised (CAT-R)
This self-administered measure asks the child to respond to 35 true/false
statements. One point is scored for each response that is similar to the manner a
child who stutters would respond [40].
A-19 Scale for Children Who Stutter
Once a secure and trusting relationship is established between SLP and client,
this 19-item scale helps to distinguish between CWS and children who do not stutter. It was
designed by Andre et al. [41]. Once the SLP is assured that the child understands the
task, the scale is administered by the SLP, who asks the child a series of questions
concerning speech and related general attitude. One point is assigned for each
question that is answered as a CWS might respond [42].
Both questionnaires, the A-19 Scale for Children who Stutter and the CAT-R
have been reported as useful in determining whether negative feelings about
stuttering are affecting a child or not. Both questionnaires consist of
statements to which the child gives a true or false response. A
distinct advantage of the CAT-R over the A-19 is that there are norms that guide
SLPs in interpreting a child’s score. The CAT-R consists of 35 statements; for each
response the child gives that matches the answer key, one point is assigned. The
higher the score on the CAT-R, the more negative the child’s emotion is regarding
talking and stuttering.
Stuttering Therapy Techniques
There are many therapeutic techniques that have been shown to help PWS
[43]. Each technique can provide valuable information for the SLP and the clients
and almost any technique helps to reduce stuttering for at least one person.
Sometimes simply having the client tell his story and understand the basic dynamics
of speaking or stuttering more easily, along with decreasing patterns of avoidance, is
enough to bring about progress [44].
SLPs may recommend direct therapy with young children. The target speech
behaviours are similar to those of FS therapy, but various toys and games are used. For
example, a turtle hand puppet may be used to train the goal of slow speech with stretched
syllables. When the child speaks slowly, the turtle slowly walks along. But,
when the child talks too fast, the turtle retreats into its shell [45].
There are many stuttering therapy techniques, and only the most favoured
ones are elaborated in the following sub-sections: shadowing, metronome,
DAF, rate control, regulated breathing (RB), easy onset, counselling and prolonged
speech (PS). Shadowing, metronome and DAF were chosen for implementation in the
computer-based system because they are among the most commonly used stuttering therapy
techniques and they can be implemented in the FS approach. They have been shown to be
effective in reducing stuttering severity in past research, as stated in the following
sub-sections. Minimal or no published treatment efficacy data are available for other
techniques such as counselling.
Shadowing is a stuttering treatment technique in which the client repeats
(shadows) everything the SLP reads from a book, staying a few words
behind the SLP without seeing the text. Shadowing is the spontaneous speech
equivalent of reading in chorus. A variation of shadowing is speaking in chorus with
one’s echo. Some PWS become more fluent while shadowing, or concurrently
repeating another person’s speech. The typical effect is to reduce the frequency of
stuttering [20].
The findings in [46] show that shadowing produces not only stutter-free and
natural-sounding speech but also reliable reductions in speech effort. However, these
reductions do not reach the effort levels of normally fluent speakers, which
qualifies the use of shadowing as a standard of achievable normal fluency for
persistent stutterers.
2.5.2 Metronome
Metronome-paced speech is speech that is regulated by the beats of a
metronome, and is a form of treatment used for stuttering. Syllable or word initiations
may be regulated, and the metronome may be used to slow down or accelerate the rate of speech.
Immediate effects of reduced or eliminated stuttering have been documented.
Many PWS stutter less frequently when they pace their speech with a metronome –
one word or syllable per beat [20]. The metronome beat can be delivered auditorily,
visually, tactilely, or by some combination of these senses. The client is told to pace
his or her speech while reading aloud or doing a spontaneous speech task with the
beats of a metronome. This will cause PWS to concentrate more on how they are
speaking and thus reduce their speaking rate. This technique has been used clinically
for several centuries.
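The auditory form of metronome pacing, with one syllable or word per beat, needs only beat onset times and a short click at each one. The Python sketch below is a minimal, hypothetical generator that uses only the standard library. The click length, tone frequency and sample rate are illustrative assumptions, not values from this work. The resulting samples could be converted to 16-bit integers and written to a WAV file with Python's `wave` module.

```python
import math
from typing import List


def beat_times(n_beats: int, beats_per_minute: float) -> List[float]:
    """Onset time (s) of each beat; the client says one syllable or word per beat."""
    if beats_per_minute <= 0:
        raise ValueError("tempo must be positive")
    interval = 60.0 / beats_per_minute
    return [i * interval for i in range(n_beats)]


def click_track(duration_s: float, beats_per_minute: float, sample_rate: int = 16000,
                click_ms: float = 20.0, tone_hz: float = 1000.0) -> List[float]:
    """Mono samples in [-1, 1] with a short sine click at the start of every beat."""
    if beats_per_minute <= 0:
        raise ValueError("tempo must be positive")
    total = int(duration_s * sample_rate)
    step = max(1, round(60.0 * sample_rate / beats_per_minute))  # samples per beat
    click_len = int(sample_rate * click_ms / 1000.0)
    track = [0.0] * total
    for start in range(0, total, step):
        for n in range(min(click_len, total - start)):
            track[start + n] = math.sin(2.0 * math.pi * tone_hz * n / sample_rate)
    return track


print(beat_times(4, 120))  # [0.0, 0.5, 1.0, 1.5]
```

Lowering `beats_per_minute` slows the paced rate and raising it speeds the rate up. This corresponds to the use of the metronome to slow down or accelerate speech described above.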
Delayed Auditory Feedback (DAF)
A technological aid that has been effective in fluency training is the use of
DAF, in which the client hears an echo of his own speech sounds. For reasons not fully understood,
this disruption, which would make it harder for most people to speak, tends to
produce fluent speech in PWS.
The purpose of DAF techniques is to help PWS focus on the proprioceptive
feel of fluency and away from the sound of his new speech pattern. Once the client
has gradually learned to maintain improved fluency under the distorted feedback, the
delay intervals are varied in the direction of instantaneous or normal feedback, and
the client learns to speak without DAF as he continues to use the slow speech along
with an emphasis on proprioceptive feedback.
Most PWS stutter less severely when speaking under DAF with
a 250-millisecond delay [19]. While doing so, their speaking rate
tends to be slower than usual, and they tend to prolong sounds and syllables. The
most typical effect for a person who hears his own speech after a delay is to slow
down the rate of speech. DAF is a widely used stuttering treatment technique and it
is a component in many programmed or comprehensive treatment approaches.
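At its core, DAF is a fixed delay between the microphone signal and the signal played to the client's ears. The following Python sketch shows a per-sample delay line built on a FIFO buffer. Only the 250 ms delay comes from the text. The class name and the 16 kHz sample rate are assumptions, and real-time audio input and output are deliberately left out.

```python
from collections import deque


class DelayLine:
    """Fixed delay for DAF: each output sample is the input from delay_ms earlier."""

    def __init__(self, delay_ms: float, sample_rate: int):
        self.length = int(round(sample_rate * delay_ms / 1000.0))
        # Pre-filled with silence so the client hears nothing until the delay elapses.
        self._buffer = deque([0.0] * self.length)

    def process(self, sample: float) -> float:
        if self.length == 0:
            return sample  # zero delay: ordinary, immediate feedback
        self._buffer.append(sample)
        return self._buffer.popleft()


daf = DelayLine(delay_ms=250.0, sample_rate=16000)
print(daf.length)  # 4000 samples of delay
```

In a live system, each incoming block of microphone samples would be passed through `process` before being sent to the headphones. Rebuilding the delay line with a smaller `delay_ms` moves the feedback towards normal.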
For a more severe client, the SLP may use DAF initially to help the child
experience some fluency in speaking. When used as the primary therapy technique,
DAF helps the child learn "a new way of talking." Under DAF, the child tends to
stretch the syllables in words and speak in shorter phrases and sentences. In therapy,
the child works to keep speech fluent as the DAF effect is reduced step by step.
When the child is able to use the new pattern in various situations, the SLP helps the
child change to a more natural-sounding pattern [47].
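The step-by-step reduction of the DAF effect can be expressed as a simple rule: shorten the delay only after the child has met a fluency target. The following Python sketch is purely hypothetical. The 50 ms step, the 2 %SS target and the session values are illustrative assumptions and are not taken from [47] or from this work.

```python
from typing import List


def next_delay_ms(current_ms: int, session_pss: float,
                  step_ms: int = 50, target_pss: float = 2.0) -> int:
    """Shorten the DAF delay by one step once the session meets the fluency target."""
    if session_pss <= target_pss:
        return max(0, current_ms - step_ms)
    return current_ms


def fade_schedule(start_ms: int, session_pss: List[float]) -> List[int]:
    """Delay (ms) chosen after each session, starting from start_ms."""
    delay, history = start_ms, []
    for pss in session_pss:
        delay = next_delay_ms(delay, pss)
        history.append(delay)
    return history


# Hypothetical %SS measured in successive sessions, starting at a 250 ms delay.
print(fade_schedule(250, [4.0, 1.5, 1.0, 3.0, 0.5, 1.2, 0.8, 0.0]))
# [250, 200, 150, 150, 100, 50, 0, 0]
```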
Rate Control
In rate control therapies, PWS are trained to speak at a slow rate by
deliberately and consciously prolonging syllables. Rate control procedures may also
include other “ingredients” such as continuous vocalization, soft articulatory contacts
and gentle voice onset.
Typically, in the initial stages of treatment, rate is slowed to less than half the
normal rate of speech (and highly exaggerated continuous vocalization, soft
articulatory contacts and gentle voice onset are used if they are incorporated into
therapy) to eliminate nearly all stuttering. Such changes in articulation and prosody
can be achieved without having to continuously and closely monitor one’s speech
because the changes are predictable, global and quite large. Thus, the initial stages
of rate control therapies utilize a robust form of motorically driven speech
construction to reduce stuttering and this is generally successful.
Speech rate varies throughout an utterance. An important aspect of this view is
that slowing needs to occur in local regions of speech (such as the points where
planning gets out of synchrony with execution). The global speech rate measures
that have invariably been used in studies showing an association between speech
rate and fluency are therefore relatively crude. Depending on the way speech rate
is slowed, it may or may not enhance fluency. For instance, speech rate can be
reduced either by slowing down all of the speech proportionately or by decreasing
only the rate of the slowest stretches. A proportionate decrease across all stretches
also reduces the rate of the problematic fast stretches, so fluency should increase
[48]. Slowing only the slowest stretches leaves the fast stretches unchanged, so
little gain in fluency would be expected.
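The difference between the two ways of slowing can be made concrete. In the sketch below, an utterance is a list of stretches, each given as (syllable count, duration in seconds). The stretch values and the slowing factor of 1.5 are hypothetical illustrations, not measured data.

```python
def local_rates(stretches):
    """Syllables per second for each (syllables, seconds) stretch."""
    return [syl / sec for syl, sec in stretches]


def global_rate(stretches):
    """Overall syllables per second across the whole utterance."""
    total_syl = sum(syl for syl, _ in stretches)
    total_sec = sum(sec for _, sec in stretches)
    return total_syl / total_sec


def slow_all(stretches, factor):
    """Proportional slowing: every stretch takes `factor` times longer."""
    return [(syl, sec * factor) for syl, sec in stretches]


def slow_slowest(stretches, factor):
    """Slow down only the stretch that already has the lowest rate."""
    rates = local_rates(stretches)
    i = rates.index(min(rates))
    return [(syl, sec * factor) if j == i else (syl, sec)
            for j, (syl, sec) in enumerate(stretches)]


# Hypothetical utterance: one fast stretch (6 syl/s) and two slower ones.
utterance = [(6, 1.0), (4, 1.0), (2, 1.0)]
for name, version in [("original", utterance),
                      ("all slowed", slow_all(utterance, 1.5)),
                      ("slowest slowed", slow_slowest(utterance, 1.5))]:
    print(name, round(global_rate(version), 2), max(local_rates(version)))
```

Both manipulations lower the global rate, from 4.0 to about 2.67 and 3.43 syllables per second respectively. Only proportional slowing reduces the fastest stretch, from 6 to 4 syllables per second. This is why a global rate measure alone cannot tell the two cases apart.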
2.5.5 Regulated Breathing (RB)
The regulated use of airflow in the treatment of stuttering is called regulated
breathing. It is effective in inducing stutter-free speech and is often combined with
other treatment targets, including gentle phonatory onset and prolonged speech.
RB is a behavioural treatment for stuttering designed to address airflow
irregularities by teaching breathing patterns that are incompatible with stuttering.
RB consists of several different treatment components, including awareness training,
relaxation, competing response training, motivation training and generalization
training [19].
Easy Onset
Easy onset is a technique that emphasizes proper timing of, and tension in,
the vocal cords. The person learns to start air flowing between the vocal cords and to
bring the vocal cords together easily before starting a word. Sometimes, the
individual is taught to "phonate" shortly before speaking. Also, the person learns to
keep air flowing smoothly through the throat and mouth by using "light contacts."
For light contacts, the person brings the lips, tongue, teeth, and roof of the mouth
together with less effort when saying speech sounds. Pushing the lips together very
hard on the "p" in "pill" means that air cannot flow through to say the rest of the
word [19].
Counselling is a collection of varied approaches to treating stuttering by
giving information, advice and strategies to deal with the problem. There is a range
of counselling techniques and most of them are psychologically oriented. The
recipients are parents of CWS, or AWS themselves. Counselling is often combined
with direct methods of treating stuttering. The efficacy of counselling used
exclusively, with no direct work on stuttering by either the SLP or the parent, is not
established. When counselling is combined with direct work on stuttering, it is
unclear whether it has any additional effect [49].
Prolonged Speech (PS)
PS is speech produced with extended duration of speech sounds, especially
vowels, and particularly those in word-initial position. It is a target behaviour
in stuttering treatment rather than a treatment procedure in itself. It induces
stutter-free speech, but the resulting fluency sounds unnatural and socially
unacceptable. PS is often combined with airflow management and gentle phonatory
onset. It is a common component of many contemporary stuttering treatment
programs supported by clinical evidence [50, 51].
The PS pattern is similar to that of traditional rate control therapy. The latest form
of PS is found in the SpeechEasy device [52], a high-tech successor to earlier
equipment that is worn entirely in the ear by the PWS, usually an adult.
The basic characteristics and formal measures of stuttering assessment are
outlined in this chapter. The implementation designs of the stuttering assessment
systems currently available to the public have been described in detail. These
provide a better understanding of, and fundamental knowledge about, stuttering
assessment procedures. The computer-based Malay stuttering assessment system is
an attempt to improve on the performance of classical manual assessment approaches.
The characteristics of stuttering and its therapy techniques have also been elaborated.
The following chapter delineates the framework design of our stuttering assessment
system.
The previous chapter outlined the problems of the classical manual assessment
system. This chapter describes the framework that supports the organization and
development of the computer-based stuttering assessment system.
This chapter outlines the problem formulation and design principles of
stuttering assessment. Sections 3.2 and 3.3 describe the basic principles of
assessment and the variables in choosing therapy techniques, respectively.
Section 3.4 details the basic stuttering treatment approaches. Section 3.5 elaborates
the problem formulation. The underlying design principles are detailed in Section
3.6. Section 3.7 shows the criteria for selection of scoring parameters. Finally,
Section 3.8 concludes Chapter 3.
Assessment Requirement: Principles and Strategies
The main objective for undertaking an assessment is to determine whether the
amount of repeating (or being disfluent in other ways) that a client is doing is
abnormal. The features of stuttering assessment have been described in Section 2.1.1.
The foundation of assessment is the interplay between the client and the SLP as they
work together through the stages of treatment. Clients are reassessed periodically to
determine whether there is any change in the set of behaviours that define their
stuttering.
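The amount of disfluency is commonly expressed as the percentage of syllables stuttered (%SS). This is the measure used later in this chapter to report treatment outcomes. The sketch below is a minimal illustration of the calculation and of a comparison between two assessment sessions. The function names and the session counts are hypothetical.

```python
def percent_syllables_stuttered(stuttered, total):
    """%SS = stuttered syllables / total syllables spoken * 100."""
    if total <= 0:
        raise ValueError("total syllable count must be positive")
    if not 0 <= stuttered <= total:
        raise ValueError("stuttered count must lie between 0 and total")
    return 100.0 * stuttered / total


def change_between_sessions(before, after):
    """Change in %SS between two assessments (negative means fewer stutters).

    Each argument is a (stuttered, total) pair of syllable counts.
    """
    return (percent_syllables_stuttered(*after)
            - percent_syllables_stuttered(*before))


# Hypothetical counts: 36 of 240 syllables at intake, 9 of 300 on reassessment.
print(percent_syllables_stuttered(36, 240))          # 15.0
print(change_between_sessions((36, 240), (9, 300)))  # -12.0
```

Deciding whether a given %SS is abnormal still depends on the SLP's clinical criteria. This sketch does not encode those criteria.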
While it is important for both adults and children to have speech disruptions
assessed for evidence of stuttering, this is particularly crucial for children in the
critical speech development years of 2 to 4. The consensus now is that stuttering
should be treated as early as possible, primarily because it becomes less tractable as
children get older. This is presumably because neural plasticity decreases with age.
Early assessment is therefore essential. Once stuttering becomes chronic,
communication can be severely impaired, with devastating social, emotional,
educational, and vocational effects [53]. The proposed work focuses on CWS aged
8 to 12 years.
An assessment for stuttering will usually involve 1) the collection of
background speech, developmental and medical information from the parents or the
adult client, 2) evaluation of speech samples collected in a variety of situations,
including oral reading, conversation, and a spontaneous extended monologue, and 3)
preparation of a written report, often using a diagnostic instrument, in which the
severity of stuttering and the severity of the observed secondary behaviours are
documented.
The SLP will often use the results of previously given phonological tests and (if there
are obvious structural issues or voice irregularities) oral peripheral examinations.
Supporting assessments by a psychologist or psychiatrist or a voice specialist may be
used or requested by the SLP in some cases.
Variables in Choosing Therapy Techniques
The therapy techniques an SLP selects will obviously be influenced both by
what he or she is trying to accomplish and by the specific behaviours that have to be
modified. An accomplished SLP needs to be aware of many approaches. Children as
well as adults respond in unique ways to different therapeutic approaches [54].
The influence of any approach will always be somewhat different because of the
characteristics of the SLP who is using it. The technique that works so
dramatically for one child does not necessarily work dramatically, or at all, for other
children. Stuttering is so variable and so highly individualized that, few would disagree,
no one method works for all children. Several factors affect the choice
of techniques. What is possible during treatment is often determined by treatment
variables such as the availability, setting and cost of services, as elaborated in the
subsections below. Ideally, treatment will result in spontaneous fluency.
The Therapy History of Client
A client’s therapy history can influence the choice of intervention strategy. If
a client has tried a particular approach and has concluded rightly or wrongly that it
was not effective, it may not be advisable to try that approach again, particularly if
there is a viable alternative. The reason is that if the SLP does so, the client is likely
to expect it not to “work”, which could reduce the likelihood of it being effective.
The most basic part of any assessment is the SLP’s understanding of the behavioural
and cognitive features of the client’s problem, an understanding that develops as the
SLP comes to appreciate the client’s story. Information is obtained
from many sources during the assessment of clients. Additional speech samples
from representative situations outside the clinic setting should also be obtained either
prior to or following the assessment in the form of audio- or videotapes [55].
The Age and Motivation Level of Client
A client’s age and level of motivation to invest in therapy can also influence
the choice of intervention strategy. Some strategies that are appropriate for use with
adults may not be appropriate for use with young children. Also, some strategies that
would be appropriate for clients who are highly motivated would not be so with
others. Voluntary stuttering is an example of such a strategy [56].
Motivation is the force that impels people to act. As one of the critical
ingredients in successful treatment, motivation is something SLPs have to be
sensitive to. Properly directed, it provides energy that leads to change in therapy.
Clients are unlikely to benefit from therapy if they are not sufficiently motivated to
make the investments required. It is sometimes possible to increase their motivation
for making these investments.
If, for example, the reason for the lack of motivation was that the client did
not expect therapy to be effective because of previous unsuccessful therapy
experiences, proving that change is possible could result in an increase in his or her
motivational level. For clients who believe this, an initial intervention goal could be
to prove to them that they can change, by arranging for them to change some
aspect(s) of their behaviour.
Virtually all clinical authorities agree that motivation of the PWS is a key
feature of a successful treatment outcome [57]. Motivation seldom remains at a
constant level from the outset of treatment. The client will probably have a kind of
motivational reservoir he or she can draw on, but its depth and clarity will vary with
his or her changes in mood, the successes and failures of day-to-day experiences, his
or her sense of progress, and the like.
Economic and Time Constraints
Therapy can be frustrating, and it may take more than a little time and money. An
SLP may reject a therapy technique that is likely to be effective because it is
impractical under economic and time constraints. It may require the client to
invest more time in therapy sessions and/or daily practice than he or she is willing or
able to. Or it may require the client to make a financial investment that he or she
cannot afford.
SLP’s Beliefs
An SLP’s belief about the best way to modify behaviours is also likely to
influence his or her choice of therapy techniques. Authorities do not agree about
what techniques are most likely to be successful for modifying some of the
behaviours exhibited by PWS, particularly the abnormal disfluency. As a result, a
relatively large number of strategies for reducing stuttering severity have been
advocated and used by at least a few SLPs [10]. SLPs must therefore be versatile in
implementing therapy programs to fit the strengths and weaknesses of clients.
The SLP’s beliefs or hypotheses about the cause(s) of the behaviours he or
she seeks to modify also play a role in choosing therapy techniques for clients. The
program developed by the SLP should be based upon his or her “best guess” as to why
the client is exhibiting the behaviour. If the hypothesis is correct, the SLP is more likely to be
successful than otherwise.
Basic Stuttering Treatment Approaches
There are several treatment approaches for stuttering that may provide relief
to varying degrees. The most commonly utilized techniques of facilitating fluency
are fluency shaping (FS) and stuttering modification (SM). Both techniques were
considered by PWS to be better than those strategies that were intuitive on the part of
the speaker such as forcing out the speech or avoidance [58]. It is interesting to note
that FS approaches tend to be favoured by SLPs with no personal history of
stuttering. SM approaches, on the other hand, tend to be the treatment of choice by
SLPs who themselves have experienced stuttering. Figure 3.1 shows the comparison
between FS and SM.
Some SLPs prefer to use the combination of FS and SM elements. This
approach usually begins by teaching the PWS FS strategies to slow down and smooth
out all of their speech. This eliminates most of the overt stuttering behaviour. For
the moments of stuttering that remain, the PWS learn to apply SM strategies. In
addition, the SM phases of motivation, identification, and desensitization are
incorporated into therapy to help the PWS manage the negative emotions that have
built up around the stuttering. This dual approach is applied more intensively to
advanced PWS, especially AWS, than to beginners. It uses a variety of handouts on
topics such as understanding stuttering, being positive about stuttering and using
feared words during treatment sessions [20].
FS tries to help a person speak more easily and fluently while SM helps a
person to stutter more easily. The exact approach is determined by the age of the
client, how long the person has stuttered and the severity of the stuttering, but
measuring the success of treatment is difficult and far from an exact science.
Figure 3.1: Comparisons between Fluency Shaping and Stuttering Modification
Fluency Shaping (FS)
Many fluency-producing activities involve combinations of altered
vocalization or enhancement of the speaking rhythm [59]. FS approaches are also
referred to as fluency modification. The essence of FS is the establishment of fluent
speech in a controlled clinical setting which effectively and durably replaces the
chronic stuttered speech pattern with a newly learned prolonged and rhythmic fluent
speech. Once fluent speech is attained, it is shaped and expanded so that PWS can
gradually maintain fluency in conversational speaking situations both within and
outside the clinical setting [8].
An FS approach is appropriate when stuttering can be easily eliminated with
the use of fluency induction techniques and when the client exhibits very few fears
and avoidances. The FS approach tends to focus on the surface features of the syndrome.
That is, the physical attributes of stuttering in terms of the normal or dysfunctional
use of the respiratory, phonatory, and articulatory systems are central to the treatment
process. This approach might be thought of as physical therapy for the speech
production system. It stresses the use of smooth, slower-than-normal transitions
on the first two sounds of a word or utterance and an easy initiation of phonation
with smooth articulatory movement during the utterance. The primary goal with FS
strategy is to modify the surface features of the syndrome and not to deal directly
with such intrinsic features as the client’s cognitions about loss of control or attitudes
of fear or anxiety associated with stuttering [19].
The ultimate goal of FS is to have the fluent speech replacing the stuttered
speech. One common FS approach begins with establishing fluent speech in short,
one-word utterances, and then gradually increases the length and complexity of the
utterances while maintaining fluency. A second common FS approach requires the
PWS to alter their speaking pattern in a dramatic way and then move that altered
fluent speech closer and closer to normal sounding speech. Some SLPs have
combined these approaches by having clients alter their speaking pattern in an
exaggerated way, for example speaking at 1 syllable per second by stretching out
every sound in a word, establish fluency with this method in single syllable words,
move up to longer words and sentences, and increase the rate to something more
normal, all the while maintaining a high level of fluency [48].
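The combined approach described above is essentially a hierarchy that the client climbs only while fluency is maintained. The sketch below is a minimal illustration of that logic under several assumptions. The step list, the rates above 1 syllable per second and the 98% advancement criterion are all hypothetical. The source fixes only the starting rate of about 1 syllable per second and the general direction of progression.

```python
# Hypothetical fluency-shaping hierarchy: each step pairs an utterance
# length with a target speaking rate in syllables per second.
HIERARCHY = [
    ("single-syllable words", 1.0),
    ("multisyllable words", 1.0),
    ("sentences", 1.0),
    ("sentences", 2.0),
    ("conversation", 3.0),
]


def next_step(step, percent_fluent, criterion=98.0):
    """Advance one step only when fluency meets the criterion.

    `criterion` is an illustrative assumption, not a value from the
    source. The client stays at the last step once it is reached.
    """
    if percent_fluent >= criterion and step < len(HIERARCHY) - 1:
        return step + 1
    return step


step = 0
for score in [99.0, 95.0, 98.5, 100.0]:  # hypothetical session scores
    step = next_step(step, score)
print(step, HIERARCHY[step])
```

With these made-up scores, the client advances three times and stays put once, ending at sentences spoken at 2 syllables per second. A session below the criterion simply repeats the current step rather than moving the client backwards. Moving backwards would be a separate clinical decision.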
Some FS therapy programs have used DAF to help clients alter their speech
[60]. This device makes the person hear their own voice slightly delayed. In order
to overcome the delay, the PWS must talk very slowly and smoothly by stretching
out the vowels and sliding all of the words together. The client begins at an
extremely slow rate, around 50 words per minute (WPM), and then builds up to
something slightly slower than normal, maybe 140 WPM, while maintaining the
smoothness and sliding words together.
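The rate targets in this paragraph can be expressed numerically. The sketch below assumes the rate is measured as words counted over a timed sample. The function names are hypothetical, and the evenly spaced steps between 50 and 140 WPM are an illustrative choice rather than the schedule of any particular program.

```python
def words_per_minute(word_count, seconds):
    """Speaking rate in words per minute (WPM)."""
    return word_count * 60.0 / seconds


def rate_targets(start_wpm=50, end_wpm=140, steps=4):
    """Evenly spaced target rates from start_wpm to end_wpm inclusive.

    Linear spacing is an assumption made for illustration. In practice the
    client moves on only while smoothness is maintained.
    """
    if steps < 2:
        return [float(end_wpm)]
    delta = (end_wpm - start_wpm) / (steps - 1)
    return [start_wpm + i * delta for i in range(steps)]


print(rate_targets())            # [50.0, 80.0, 110.0, 140.0]
print(words_per_minute(35, 30))  # 70.0
```

For example, a client who says 35 words in a 30-second sample is speaking at 70 WPM. That rate lies between the first two hypothetical targets.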
Stuttering Modification (SM)
SM therapies focus on changing individual moments of stuttering to make
them smoother, shorter, less tense and less penalizing. SM approach can be used
when stuttering still persists after fluency induction techniques have been applied and
when the client exhibits significant fear, and uses many postponement and avoidance
behaviours. It tends to recognize the fear and avoidance that build up around
the stuttering and consequently spends a great deal of time helping PWS to work
through those emotions. The objective of SM is an easier and more fluid form of
speaking, which may mean an easier form of stuttering [61].
The SM strategy requires the client not only to evaluate and change
behavioural characteristics, but to self-monitor and self-manage cognitive and
attitudinal features of the syndrome as well. Informal counselling in some form is
typically an integral part of this approach. It is also referred to as the traditional, Van
Riperian, or non-avoidance approach. It is based on the concept that a large part of
the problem is the speaker’s struggle and avoidance of the core moment of stuttering.
SM therapy has four phases: identification, desensitization, modification and
stabilization. Identification phase involves identifying the core behaviours,
secondary behaviours, and feelings and attitudes that characterize stuttering.
Desensitization encompasses three stages: confrontation or accepting
stuttering, freezing of the stuttering moment, and voluntary stuttering. Modification is
the phase where easy stuttering is learnt through stages like cancellations, pull-outs
and preparatory sets. Stabilization phase seeks to stabilize or solidify speech gains
[63]. The Successful Stuttering Management Program (SSMP) [61] is an example of
an SM treatment program for which there is essentially no empirical evidence of effectiveness.
SSMP is an intensive 3-week residential program that is based on an amalgam of
desensitization to stuttering, avoidance reduction therapy and the SM techniques.
Why Fluency Shaping (FS)?
The proposed work is developed based on FS approach to replace the
stuttered speech pattern with newly learned fluent speech. Research [43] indicated
that respondents who had participated in FS treatments were more likely to report
that they had experienced a relapse than those who had participated in SM or
combined treatments. Nevertheless, there is a near absence of empirically motivated
treatment outcome studies for the SM approach, and it is only recommended if stuttering
still persists after the FS approach has been applied.
In a study of a "smooth speech" FS stuttering therapy program, about 95% of
PWS were "very satisfied" or "satisfied" with their speech at the end of the treatment.
A rigorous study [4] was carried out with 42 participants in the three-week program at
the Institute for Stuttering Therapy and Treatment in Edmonton,
Alberta, Canada. The FS program was based on slow, prolonged speech, starting
with a stretch of 1.5 seconds per syllable and ending with slow-normal speech. The
program also works on reducing fears and avoidances, discussing stuttering openly,
and changing social habits to increase speaking. The program includes a
maintenance stage for practising at home. The therapy program reduced stuttering
from about 15-20% SS to 1-2% SS. Twelve to 24 months after therapy, about 70% of the
participants had satisfactory fluency, about 5% were marginally successful and about
25% had unsatisfactory fluency.
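The figures reported for this program can be put into more familiar units. The sketch below converts the initial stretch of 1.5 seconds per syllable into a speaking rate. It also computes the relative reduction in %SS, assuming the lower ends of the two ranges correspond (15% to 1%) and likewise the upper ends (20% to 2%). The study does not state this pairing, and the function names are hypothetical.

```python
def syllables_per_minute(seconds_per_syllable):
    """Convert a per-syllable stretch duration into a speaking rate."""
    return 60.0 / seconds_per_syllable


def relative_reduction(before_pss, after_pss):
    """Fraction by which %SS fell between two measurements."""
    return (before_pss - after_pss) / before_pss


print(syllables_per_minute(1.5))            # 40.0
print(round(relative_reduction(15, 1), 3))  # 0.933
print(round(relative_reduction(20, 2), 3))  # 0.9
```

Under this pairing, the program starts at 40 syllables per minute and reduces stuttering by roughly 90-93% relative to baseline. The three reported outcome groups (70%, 5% and 25%) account for all participants.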
Only one long-term efficacy study of a SM therapy program has been
published in a peer-reviewed journal [61]. This study concluded that the program
appears to be ineffective in producing durable improvements in stuttering behaviours.
SM therapy assumes that PWS will never be able to talk fluently, and so the
best a stutterer can hope for is to be a better communicator while still stuttering. The
effectiveness of other, more recently developed stuttering therapies for producing
fluent speech makes this assumption questionable.
A study [61] indicated that naive or non-professional listeners responded less
well to stuttering combined with SM techniques than they did to stuttering (only). In
other words, listeners may prefer to listen to untreated stuttering than to listen to a
stutterer using SM approach.
3.5 Problem Formulation
The earliest known references to stuttering date back to about 2000 B.C.
From the distant past to the present, stuttering therapy techniques have included
everything from holding pebbles in the mouth to drug therapy. Stuttering therapy
has many variations, yet no treatment method or therapy technique has conclusively
cured stuttering.
3.5.1 The Uniqueness of Each Individual
There are many therapeutic approaches that have been shown to help PWS.
To be sure, the logic and techniques associated with most intervention strategies
provide the SLP with a framework and a sense of direction about the syndrome and
its treatment. Each strategy comes with its own doctrine. Each of these approaches
can provide something of value for the SLP and her client, depending on such
variables as the needs of the client, the stage of treatment, and the talent and
experience of the SLP. Almost any therapy has the power to eliminate stuttering in
someone, sometime and someplace. The uniqueness of each individual SLP and
PWS prevents any specific recommendations of therapy techniques from being
universally applicable [7].
Whatever the structure of the treatment program, the process of change is far
more complicated than applying the doctrine of a treatment method and its associated
criteria. Depending on the client, the SLP may use a variety of techniques and
possibly more than one overall treatment strategy. Even if a single overall strategy is
used, the application will never be quite the same with each client, for individuals
often respond differently to identical techniques. In one investigation a particular
approach proved most effective, while in another a different method came out ahead.
PWS vary a lot and some will improve most with a domineering SLP, others
with more easy-going ones. The success of treatment is closely tied to the ability of
an experienced SLP to determine a client’s readiness for change and adjust treatment
techniques accordingly. Thus the utility of the techniques depends on the SLP’s
ability to apply the right technique(s) at the right time. In this light, the
inclusion of a computer-based tool into stuttering assessment system assists SLPs in
the process of determining suitable therapy techniques for each client.
Difficulty in Identifying Appropriate Therapy Technique
Stuttering is a complex combination of attitude, behavioural, and cognitive
features bound together with degrees of anxiety and fear. Because of its complex
nature, stuttering is resistant to long-term change. Assessment of stuttering is
multidimensional [3]. PWS in different stages of change need to be matched with
different treatment processes, or improvement will be less likely to take place. This
way of considering the process of treatment coincides with the client-centred
approach advocated by the proposed computer-based stuttering assessment system.
Fluency treatment that is based upon motor learning theory is advantageous
to SLPs, particularly those who determine that, due to a client's lack of progress,
alterations in treatment are warranted. However, the difficulty for these SLPs is not
a failure to recognize that changes in the course of treatment are needed. Rather, it is
suggested that the difficulty for many SLPs is how to choose a therapy technique while
remaining consistent with the goals of therapy [65].
The use of computer technology in speech assessment is still new. There are
many clinical approaches to treat PWS. However, normally 2 to 3 months are
required to determine a suitable technique for each client. The advantage of the
proposed system is that it duplicates the function of complicated and expensive
acoustic equipment available only at well-equipped speech-language pathology clinics.
Time Consumption in Classical Manual Assessment System
SLPs may have to try a number of different approaches depending on the needs and
response of the client, which may take months of repeated procedures that are costly
and overly generalized. Being able to move systematically and persistently toward
distant goals is essential since treatment with PWS takes a considerable length of
time. For PWS, practice with therapy techniques must take place for many months
and years before the techniques become functional.
Research [6] has found that there were at least 115 therapy techniques that
decreased stuttering markedly. It is evident that comprehensive treatments for PWS
involve so much more than simply “fixing the stuttering” and making people fluent.
It is much bigger than that and far more tedious. Each treatment strategy requires the
client to monitor and self-manage many aspects of his surface and intrinsic
behaviours. Each strategy dictates that the client systematically learn and practise
techniques, first within the treatment setting and then – gradually – outside the
security of the clinic, in real-world speaking situations.
Each method places great emphasis on the client to take primary
responsibility for his own self-management. In other words, many of these
techniques require a conscious effort on the part of the clients. The success of
therapy usually depends a great deal on the amount of effort the client invests. Moreover,
the longer an SLP takes to assess a client, the more tedious and troublesome the process
becomes for the client [19].
PWS often find traditional therapy to be a very tiresome and undesirable
process. Successful intervention also requires continued commitment and motivation
by the client. Therefore, an element of motivation should be integrated into stuttering
therapy. This problem could be reduced if the SLP uses a "user-friendly" stuttering
relief approach, which could be enhanced by the use of a computer-based system.
An experienced guide such as computer-based scoring analyses can show the way or,
at the very least, make the journey more efficient and often more pleasant.
A problem in relying on speech measures alone is that stuttering can vary
with speaking situation. Because the majority of children referred to SLPs for assessment
are in a phase of development when speech is changing rapidly, research [66] has
suggested that repeated observations should be made, since it is only with the
passing of time that a greater degree of certainty can be given regarding the child’s
dysfluency. A computer-based assessment tool can be used to assist the SLP in these
repeated observations, which in turn makes the job easier by saving much of the
time spent in the manual assessment process.
There is agreement that a reduction in stuttering frequency and severity is
associated with effective treatment outcome [54]. But there is a major problem in the
reliability and validity of clinic-based perceptual measures of stuttering. Of the many
treatment techniques for stuttering, only a few have been tested for efficacy.
Some are questionable; some have uncontrolled clinical support; several are purely
rational [61].
Numerous approaches exist to treat stuttering, yet there remains a paucity of
empirically motivated stuttering treatment outcomes research. Despite repeated calls
for increased outcome documentation on stuttering treatment, the stuttering literature
remains characterized by primarily ‘‘assertion-based’’ or ‘‘opinion-based’’
treatments, which by definition are based on unverified treatment techniques and/or
procedures [61].
Conversely, ‘‘evidence-based’’ treatments, based on well-researched and
scientifically validated techniques, remain relatively rare in the field of stuttering and
are usually limited to behavioural and FS approaches. The objective assessment of specific
stuttering treatment approaches is also important to elucidate therapy techniques that
contribute to desirable outcomes.
Different SLPs will interpret progress differently. A change of SLP during the
assessment process for a particular client may lead to disagreement over the
results, since the new SLP may have different perceptions of the efficacy of a therapy
technique for the client. One way to alleviate this problem is the approach proposed
in the current work. Rather than relying on human perception, the proposed system
takes a technically oriented approach: SLPs assess PWS using standard scoring
parameters, with the scores generated automatically by the software.
Underlying Design Principles
Therapeutic activities afford the client a structured opportunity to perform in
a specific manner. From the SLP’s perspective, therapeutic activities create an
occasion for assessment to measure a client’s performance level while demonstrating
a specific behaviour in a controlled environment. A “good” activity meets the needs
of the client and the SLP. Regardless of the overall treatment strategy chosen by the
SLP, all programs emphasize such factors as enhancing the client’s enjoyment of
speaking, empowering the client to understand and use his “speech helpers”, using
fluency facilitation techniques to achieve and expand fluency and to improve the
client’s self-confidence as a speaker and a person.
Audio and Visual Feedback
The proposed software tool enhances the assessment process by providing
clients with the audio and visual feedback necessary to identify speech properties
while still allowing the SLP to have control of the treatment process. PWS modify
their speech in subtle and variable ways to gain control over stuttering and, in this
respect, these modifications appear similar to a well-known experimental technique for
suppressing stuttering known as response-contingent stimulation [67]. A computer-based assessment
system has the potential for substantially reducing the typical cost and time
requirements especially for chronic or severe stuttering.
As their conceptions of ability develop, PWS become more concerned about their
ability and more sensitive to evaluation, especially negative evaluation. Moreover,
once they have developed a clear and coherent understanding of ability, the particular
conception of ability they adopt will determine a great deal about their motivational
patterns. It will influence such things as whether they seek and enjoy challenges and
how resilient they are in the face of setbacks [68].
Whether the SLP chooses to work indirectly or directly with the PWS,
however, the essence of treatment consists of both facilitating the child’s capacities
to produce easily fluent speech and reducing the demands placed on the child that
result in fluency disruption. The fluency-enhancing activities can provide highly
dramatic results, and such instantaneous improvements tend to have the effect of
making anyone who uses them an “expert” on how to help PWS. SLPs using operant
and FS approaches generally obtain considerably more data and specify explicit
criteria for moving a child from one step of a program to another [8].
Fluency enhancing procedures provide the PWS with techniques for both
initiating and enhancing his fluency. The SLP cannot always assume that because
the child’s speech is non-stuttered, it is necessarily fluent. Speech that is to be
expanded and reinforced should have high-quality fluency, which is characterized by
smooth and effortless production. The FS approach consists of procedures to help the
PWS more efficiently manage the breath stream, produce gradual and relaxed use of
the vocal folds, use a slower rate of articulatory movement, make gradual and
smooth transitions from one sound to another, produce light articulatory contacts,
and keep an open vocal tract in order to counteract constrictions resulting from
tension [8].
The proposed software is based upon the physical analysis of speech sounds
as they are being uttered. It provides real-time measures of sounds, evaluates the
sounds against standards for their production, and immediately signals the results of
the evaluation in graphs plotted on the computer screen. Implementing a
computer-based assessment system provides a faster alternative to traditional assessment.
The assessment process must be facilitated in a structured manner. Intervention
for all clients is likely to be most efficient when a careful analysis of the
speaker’s capacities and responses to demands is factored into the treatment.
Monitoring and Assessment
Good fluency therapy respects the unique personal needs of each PWS. Only
if SLPs adapt to the circumstances or facts presented by a PWS, which may differ
from what they have been taught or what they expect, are they likely, in the
long term, to assess and treat the PWS effectively. Short- and
long-term outcomes of stuttering assessments by different SLPs using different therapy
programs should be evaluated with the same measurement instruments [69].
Progress with PWS can be judged by such things as the improved use of
techniques, decreased reliance on cueing by the SLP, increased control of fluency,
taking risks, and decreased avoidance of speaking and speaking situations [2].
Stuttering therapy can be improved by giving SLPs feedback about the
results of their therapies, so that they are well informed about each client’s
receptiveness to each therapy technique.
This can be supported by a computer-based tool that generates a history
file summarizing the client’s past attempts. A log of the client’s scores is maintained in
a personal file on the computer. In addition, the personal score files maintain a count
of the total number of times each utterance is practiced. The SLP can use this
information to determine suitable therapy techniques for each client. It also
enables the SLP to assess and monitor the client’s progress and to observe how much time
the client spends practicing. Based on the client’s progress record, the supervising SLP is
well informed about which displayed signals and measured parameters would be useful for
improving the speech rehabilitation process. These features help the client
succeed in future therapy sessions.
The combination of feedback, progress records and customizable practice
phrases would be a valuable addition to current assessment systems.
Clinical Evaluation
Any clinical program design for treating PWS must follow, and form a major part of,
current evidence-based practice (EBP) [70]. The concepts of
establishment, transfer (generalization, out-of-clinic) and maintenance (over a long-term
period) must be employed. Follow-up post-treatment data are collected to
determine the positive, long-term effects of the programs. All these procedures have
important contingency management features. EBP comprises five basic steps
for implementing a clinical treatment system, as shown in Figure 3.2.
Once a new system has been developed, it must be evaluated or tested
in real situations through what are called “clinical trials”. Clinical trials involve running
supervised tests to determine the effectiveness and safety of the proposed system, with
the aim of answering scientific questions about stuttering assessment. A clinical trial
may be separated into phases, or steps, with each step designed to answer a separate
research question. Test subjects are assessed carefully to make sure that they
stutter and do not have other problems. A control group is essential so that the treatment
group has to “beat the odds” to provide convincing evidence of treatment
effects. Control groups need to be appropriately matched on key variables
known to be associated with the design objectives [6].
Convert a clinical need into an answerable question
Search for and find the best evidence to answer the question
Critically evaluate the evidence for validity and applicability
Apply the results to clinical practice
Evaluate and audit performance
Figure 3.2: Five Steps to Implementing EBP
Confounding factors in the treatment design must be addressed, monitored
and, if possible, reduced or eliminated, because they can significantly affect
the outcome. During assessment, features of treatment programs need to
be carefully examined, such as the impact of different rewards and punishments,
avoidance behaviours, the impact of attrition on outcome, and the effect of individual
treatments in combined treatment programs [71]. For example, when a client uses a
fluency-facilitating technique and alters his usual tense and fragmented speech into a
more open and forward-moving pattern, the SLP can reward the accomplishment.
Rewards, if used, should be structured so that the PWS can earn them in many
speaking situations, and any changes to treatment should be discussed between the
SLP, the PWS and others who may be involved. Basic speech measurements should be
taken to track treatment progress and to detect a lack of progress or a
progress plateau, so that these potential barriers to treatment can be identified,
discussed and addressed within the treatment period [72].
Perhaps the best perspective in this regard is Sheehan’s comment [73] that
producing stutter-free speech is no more realistic than playing error-free baseball.
He reasoned that just because a person possesses the capacity to function in an
error-free manner, it does not follow that this will always be the case. By decreasing
demands, desensitizing the PWS to fluency-disrupting stimuli, and giving rewards
for open, easy and forward-moving speech, the child is guided step by step toward
increased fluency.
Many of the clinical therapy techniques require a conscious effort on the part
of the PWS. Generally speaking, activities for PWS need to be more engaging, having
more “entertainment value” and being of greater interest to the individual
child. Additionally, a child may need more support in making the connection between
a therapeutic activity and its use in the “real world”. However, the basic elements
of what makes an activity “good” (for the client and SLP) remain the same [74].
Therefore, motivation must be integrated into stuttering therapy.
Rewards are important for motivating clients. Research [6] has shown
that both tangible and verbal rewards were effective in reducing
stuttering. Both forms of reward appeared to be successful, but their separate
contributions could not be measured because treatment involved a number of therapy
procedures. It is essential that the child enjoys the treatment and finds it to be a
positive experience. The proposed computer-based assessment system displays
speech waveforms and amplitude curves graphically, which can
motivate and encourage PWS to practice their therapy more often. Moreover,
applause and firework animations are displayed as compliments for clients who
obtain good scores.
System Design
The proposed work is developed based on standard FS techniques used in
fluency rehabilitation regimen. DSP techniques are implemented to analyze speech
signals. The software is based upon the physical analysis of speech sounds as they
are being uttered.
The software provides real-time visual and audio biofeedback: the client’s
average magnitude profile (AMP) is displayed as the utterance is spoken, superimposed
on the SLP’s AMP. The maximum magnitudes of the client’s and the SLP’s speech
signals, corresponding to their AMPs, are determined and compared. The maximum
magnitude is determined by summing a total of 15 neighbouring samples to
obtain a maximum value. The AMP display is intended to show the client
where the client’s utterance differed from the SLP’s in terms of
start and end alignment, magnitude and duration. The AMP of the spoken utterance is the
primary source used to gauge the fluency and performance of the client.
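The maximum-magnitude step can be sketched as a sliding sum. This is a minimal C++ illustration, assuming that the 15 neighbouring values being summed are consecutive AMP values (the text does not say whether raw samples or AMP frames are meant); the names `maxNeighbourSum` and `MaxMagnitude` are hypothetical, not taken from the thesis code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of the search: the largest windowed sum and where its window starts.
struct MaxMagnitude {
    double value;       // largest sum of `span` consecutive values
    std::size_t index;  // index of the first value in that window
};

// Hypothetical helper: slides a window of `span` values (15 by default) over
// the AMP sequence and returns the window with the largest sum.
MaxMagnitude maxNeighbourSum(const std::vector<double>& amp, std::size_t span = 15) {
    assert(span > 0 && amp.size() >= span);
    double sum = 0.0;
    for (std::size_t i = 0; i < span; ++i) sum += amp[i];  // first window
    MaxMagnitude best{sum, 0};
    for (std::size_t i = span; i < amp.size(); ++i) {
        sum += amp[i] - amp[i - span];  // add newest value, drop oldest
        if (sum > best.value) best = {sum, i - span + 1};
    }
    return best;
}
```

For example, `maxNeighbourSum(clientAmp).value` and `maxNeighbourSum(slpAmp).value` could then be compared directly. Updating the running sum by adding the newest value and subtracting the oldest keeps the cost linear in the number of frames.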
The software provides real-time measures of sounds, evaluates the sounds against
standards for their production, and immediately signals the results of the evaluation in
graphs plotted on the computer screen. Start location, end location, maximum
magnitude and duration are compared between the client’s and the SLP’s AMPs to generate
a score for each therapy technique. A log of the client’s scores is maintained in a
personal file saved on the computer. In addition, the personal score files maintain a
count of the total number of times each utterance was practiced. The SLP can facilitate
each client’s progress by reviewing the client’s history file. These computational analyses
help the SLP determine suitable techniques more quickly.
Criteria for Selection of Scoring Parameters
Assessment refers to the monitoring and evaluation of various aspects of
speech according to certain criteria [6]. The selection of scoring parameters is
important because the parameters influence how clinically significant outcomes of
stuttering treatment are identified within an evidence-based framework. Moreover, it is
important to ensure that the framework leads to outcomes that are meaningful for
the SLPs and clients. A low score means that the client had episodes of stuttering.
Selection of scoring parameters is significant because the scores are used to:
(1) guide implementation of the program from week to week; (2) identify when the
PWS has met criterion speech performance; and (3) check that the PWS’s speech
continues to meet criterion speech performance in the long term. The stuttering
measurements enable the SLP and the parent to communicate effectively about the
severity of the PWS’s stuttering throughout the treatment process. Any departure
from the criterion speech performance results in more frequent clinic visits and
possibly longer therapy.
The score generated by the proposed work consists of four parameters based on the
FS approach. The parameters are start location, end location, maximum magnitude
and duration. The purpose of scoring is for the SLP to evaluate progress towards the
goal of stutter-free, natural-sounding speech and to guide the client accordingly.
Scoring assists SLPs in determining suitable therapy techniques for each client.
PWS differ not only in how often they are disfluent but also in how
long their moments of disfluency tend to last. Although there are several ways to
measure the background noise level, the best way to fully characterize it
is to measure the noise over a period of time. This can be done by
using a microphone and a computer-based data acquisition system that records the
noise level over a few seconds. During the recording, the first location at which the
measured amplitude is greater than the background noise level is identified as the
start location.
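A possible reading of the noise-level and start-location rules is sketched below. It assumes, hypothetically, that the noise level is taken as the largest AMP value measured while recording silence, and that the start location is the first AMP frame above that level; the thesis may use a different noise statistic, and both function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: the background noise level is the largest average-magnitude
// value observed while recording silence for a few seconds.
double noiseLevel(const std::vector<double>& noiseAmp) {
    assert(!noiseAmp.empty());
    return *std::max_element(noiseAmp.begin(), noiseAmp.end());
}

// Start location: index of the first AMP frame whose value exceeds the noise
// level, or -1 if the utterance never rises above it.
int startLocation(const std::vector<double>& amp, double noise) {
    for (std::size_t i = 0; i < amp.size(); ++i)
        if (amp[i] > noise) return static_cast<int>(i);
    return -1;
}
```

Taking the maximum rather than the mean of the silent recording makes the threshold conservative, so brief noise spikes are less likely to be mistaken for speech onset.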
In stuttering assessment, the duration of tense pauses and prolongations is
usually less than five seconds [76]. This is a significant factor in the
determination of the end location: during the recording, the end location is identified
when the measured amplitude remains at the background noise level for 5 seconds.
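The end-location rule can be expressed as a search for a sufficiently long quiet run. In this sketch (function name hypothetical), “equal to the background noise level” is interpreted as “at or below” it, and 5 seconds is converted to 500 frames under the 10 ms frame shift used for the AMP in this work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// End location: first frame (searching from `start`) that begins a run of at
// least `quietFrames` consecutive AMP values at or below the noise level.
// With a 10 ms frame shift, 5 s of quiet corresponds to 500 frames.
// Returns -1 if no sufficiently long quiet run is found.
int endLocation(const std::vector<double>& amp, double noise,
                std::size_t start, std::size_t quietFrames = 500) {
    assert(quietFrames > 0);
    std::size_t run = 0;  // length of the current quiet run
    for (std::size_t i = start; i < amp.size(); ++i) {
        if (amp[i] <= noise) {
            if (++run == quietFrames)
                return static_cast<int>(i + 1 - quietFrames);
        } else {
            run = 0;  // speech resumed, e.g. after a tense pause
        }
    }
    return -1;
}
```

Resetting the run whenever the amplitude rises again is what prevents a pause or block shorter than the threshold from being mistaken for the end of the utterance.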
Maximum magnitude is another important parameter because a stutterer’s
speech is usually abnormally loud or soft, or abnormally high- or low-pitched for his or
her age and sex. While these behaviours may not be related to the
person’s stuttering, they may be devices that he or she is purposefully using to reduce
stuttering severity. Frequent use of such devices indicates greater severity of
stuttering [19].
Moreover, PWS may appear relatively relaxed while being disfluent or may
produce audible and/or visible signs of tensing during at least some of their
disfluencies. Such tensing can be manifested in a number of ways (singly or in
combination), including tense pauses, audible tension (strain) in the voice
while speaking, and abnormal loudness of voice. The presence of a significant degree
of abnormal loudness while being disfluent does appear to differentiate PWS (or children who
are at risk of developing stuttering) from their normal-speaking peers [8]. Therefore,
the parameter of maximum magnitude should be included when assessing PWS.
Duration of total speaking time is a common measure in speech assessment.
Any time-consuming disruption in speech flow, such as syllable and whole-word
repetition, increases the measured duration of speech. Simple
measures of duration can indicate the tension that is occurring.
Measures of duration such as waveform analysis are necessary for clinical evaluation.
Duration may also be reflected in the rate of speech in words or
syllables per minute, with lower rates indicating greater severity [8].
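The duration and speaking-rate measures reduce to simple arithmetic on the start and end locations. The sketch below assumes the 10 ms frame shift used for the AMP and a syllable count supplied separately (for example, counted by the SLP for the goal utterance); both helper names are illustrative.

```cpp
#include <cassert>

// Duration in milliseconds between start and end frames, assuming a 10 ms
// shift between consecutive AMP frames.
double durationMs(int startFrame, int endFrame, double hopMs = 10.0) {
    assert(endFrame >= startFrame);
    return (endFrame - startFrame) * hopMs;
}

// Speaking rate in syllables per minute; lower values suggest greater severity.
double syllablesPerMinute(int syllables, double durationMs) {
    assert(durationMs > 0.0);
    return syllables * 60000.0 / durationMs;
}
```

For instance, an utterance starting at frame 50 and ending at frame 350 lasts 3000 ms, and 10 syllables spoken in that time correspond to 200 syllables per minute.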
The problem formulation and basic principle of stuttering assessment system
are outlined in this chapter. The underlying design principle of the proposed
computer-based stuttering assessment system has been described. The criteria for the
selection of four scoring parameters have been elaborated. The following chapter
delineates the development of a computer-based Malay stuttering assessment system.
This chapter outlines the works involved during the development of
computer-based assessment system. Section 4.2 elaborates the system requirements
of the assessment system. Before developing the software, the available implementation
design approaches were analyzed carefully and the most suitable means
was selected as the design choice. Section 4.3 details the overall system
descriptions. The coding steps and the scoring algorithms are described respectively
in Section 4.4 and Section 4.5. Finally, Section 4.6 concludes Chapter 4.
System Requirements
The assessment system is compatible with any computer running a recent version
of Windows. The screen can be set to any resolution, with 800 × 600
pixels being optimal. The required hard disk space is 2 MB. A printer is
helpful for printing out the results and history files. The assessment system is made
as simple as possible so that it is affordable and easily integrated into the current
speech rehabilitation regimen.
Hardware Requirements
The assessment system uses a computer equipped with a sound card, a
microphone and earphones. A sound card is a device that processes audio data and sends
it to one or more speakers. Most sound cards are also capable of processing audio
input from a microphone for various purposes.
A standard PC sound card (or sound chipset) includes an analogue to digital
converter (ADC) for converting external sound signals to digital bits, a digital to
analogue converter (DAC) for converting digital bits back to sound signals, an
Industry Standard Architecture (ISA) or Peripheral Component Interconnect (PCI)
interface to connect the card to the motherboard, and input and output connections
for a microphone and speakers. Either ISA or PCI sound card can be used.
Microphone is an acoustic transducer that converts sound into an electrical
signal. There are many types of microphones. Dynamic and condenser microphones
are the most popular types, and either can be used in this application.
If a lapel microphone is used, the loudspeakers are not required.
Software Requirements
The software is developed using Microsoft Visual C++ 6.0 running under
Windows XP. It has the potential to substantially reduce the typical cost and time
requirements for chronic or severe stuttering. The software is developed as a
Microsoft Windows GUI application, which makes therapy user-friendly.
System Descriptions
The software is developed based on standard FS techniques used in fluency
rehabilitation regimen. DSP techniques are implemented to analyze speech signals.
The maximum magnitudes of the clients’ and the SLPs’ speech signals,
corresponding to the AMPs [77], are determined and compared. The maximum
magnitude is determined where a total of 15 neighbouring samples are summed to
obtain a maximum value.
Start location, end location, maximum magnitude and duration are compared
between the client’s and the SLP’s AMPs to generate a score; these computational analyses
help the SLP determine suitable techniques more quickly. Three therapy techniques
are implemented in the computer-based method: Shadowing,
Metronome and DAF.
Figure 4.1 shows the system block diagram. Sound recording of 5 goal
utterances is implemented, and the software incorporates functions to record,
play back, open, close and save standard WAVE files. The 5 goal utterances are
customized by the SLP for each client depending on age and language level.
Goal utterances are chosen based on their phonetic characteristics. Each goal
utterance is stored in a separate WAVE file, which can be individually selected for
practice. The duration of each target utterance is six seconds.
Block diagram stages: Background Noise Level Identification; Initiate Practice Session; Practice Goal (the client practices matching the goal utterance, recorded by the clinician as WAVE files, using real-time visual and audio feedback); Analyse and Compare (the attempt utterance is analysed and compared with the goal utterance); Display Scoring; Update History File (a text file used to assess the client’s progress).
Figure 4.1: System Block Diagram
A background noise level is identified for each client’s environment. In the
speech pathology clinic, after identifying the client’s stuttering problem, the SLP
records 5 spoken utterances for the client to practice during the assessment.
During the process, the client selects playback to listen to the SLP’s prerecorded
utterances. The client speaks an utterance into a microphone. The client
practices matching the SLP’s speech pattern via both audio and visual means. The
client can listen to the target utterance repeatedly by selecting playback.
The visual comparison is achieved by displaying the AMPs of both the SLP’s and the
client’s utterances on the same axes. The software is able to calculate and display the
client’s AMP as it is spoken. The SLP’s AMP is first drawn on the screen in red.
The client’s AMP is then drawn as a blue line in real time as the utterance is spoken.
The AMPs are displayed for the client to copy as closely as possible; they show
the client where the client’s spoken utterance differed from the SLP’s
signal in terms of amplitude, duration, onset and end location. The speech
processing is done in real time. Real-time display of the AMP is very important as it
allows clients to instantly evaluate and compare their speech with that of the SLP.
The client can then, if necessary, alter their speech to match the
SLP’s AMP more closely. This gives immediate feedback on the client’s performance
relative to the goal, and allows the client to anticipate what amplitude or rate change is
needed to reach the goal.
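Real-time display requires the AMP to be updated as each recording buffer arrives, rather than after the whole utterance has been captured. One way to do this is sketched below: a hypothetical `StreamingAmp` class buffers incoming 16-bit samples and emits a new average magnitude every 160 samples (10 ms at 16 kHz) once 400 samples (25 ms) are available. For brevity it uses a rectangular window rather than the Blackman window described in this work.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Incremental AMP: samples arrive in arbitrary blocks (as from a recording
// buffer) and one average magnitude is produced per `hop` samples once a full
// `frameLen`-sample window is available.
class StreamingAmp {
public:
    explicit StreamingAmp(std::size_t frameLen = 400, std::size_t hop = 160)
        : frameLen_(frameLen), hop_(hop) {
        assert(frameLen > 0 && hop > 0 && hop <= frameLen);
    }

    // Appends a block and returns the AMP values that became available.
    std::vector<double> push(const std::vector<short>& block) {
        pending_.insert(pending_.end(), block.begin(), block.end());
        std::vector<double> out;
        while (pending_.size() >= frameLen_) {
            double sum = 0.0;
            for (std::size_t i = 0; i < frameLen_; ++i)
                sum += std::fabs(static_cast<double>(pending_[i]));
            out.push_back(sum / frameLen_);
            // Advance by one hop; the overlapping samples stay for the next frame.
            pending_.erase(pending_.begin(),
                           pending_.begin() + static_cast<std::ptrdiff_t>(hop_));
        }
        return out;
    }

private:
    std::size_t frameLen_, hop_;
    std::vector<short> pending_;  // samples not yet consumed by a full frame
};
```

Each call to `push` could be made from the recording callback, and the returned values appended to the blue client curve on screen.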
The scoring algorithms assess the client’s performance where the scoring
routines compare the client's utterance to the reference utterance in four categories.
They are start location identification, end location identification, maximum
magnitude comparison and duration comparison. Upon completion of a practice,
scores are assigned to each trial. The scores are displayed to the client, allowing the
client to observe the progress being made.
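The scoring algorithms themselves are described in Section 4.5. Purely to illustrate the idea of comparing four parameters, the sketch below uses a hypothetical rule in which each parameter scores 100 when the client and SLP values match and falls linearly to 0 at a chosen tolerance, with the overall score taken as their unweighted mean. The tolerances are illustrative defaults, not values from the thesis.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// The four parameters compared between the SLP (reference) and client AMPs.
struct AmpFeatures {
    double startFrame;    // start location
    double endFrame;      // end location
    double maxMagnitude;  // maximum magnitude
    double durationMs;    // duration
};

// Hypothetical per-parameter score: 100 when equal, falling linearly to 0
// when the absolute difference reaches `tolerance`.
double parameterScore(double client, double reference, double tolerance) {
    assert(tolerance > 0.0);
    return 100.0 * std::max(0.0, 1.0 - std::fabs(client - reference) / tolerance);
}

// Hypothetical overall score: unweighted mean of the four parameter scores.
double overallScore(const AmpFeatures& client, const AmpFeatures& ref,
                    double frameTol = 50.0, double magTol = 1000.0,
                    double durTol = 500.0) {
    return (parameterScore(client.startFrame, ref.startFrame, frameTol) +
            parameterScore(client.endFrame, ref.endFrame, frameTol) +
            parameterScore(client.maxMagnitude, ref.maxMagnitude, magTol) +
            parameterScore(client.durationMs, ref.durationMs, durTol)) / 4.0;
}
```

Keeping a separate score per parameter, rather than only the mean, would let the SLP see whether a low result comes from timing (start, end, duration) or from loudness (maximum magnitude).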
The software generates a history file summarizing the client’s attempts. A separate
history file is created for each client. A log of the client’s scores is maintained in the
personal file on the computer. In addition, the personal score files maintain a count of
the total number of times each utterance was practiced. The SLP can use this information
to determine suitable therapy techniques for each client. It also enables the SLP to
assess and monitor the client’s progress and to observe how much time the client spends
practicing. Based on the client’s progress record, the supervising SLP is well informed
about which displayed signals and measured parameters would be useful for improving the
speech rehabilitation process.
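The history file could be as simple as a text file with one line per attempt. The sketch below assumes a hypothetical line format `<utterance-id> <technique> <score>`; it appends attempts and recounts how many times each utterance has been practiced. The actual file layout used by the software is not specified here, and all function names are illustrative.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <sstream>
#include <string>

// Starts a new, empty history file for a client.
void clearHistory(const std::string& path) {
    std::ofstream out(path, std::ios::trunc);
}

// Appends one practice attempt as "<utterance-id> <technique> <score>".
void appendAttempt(const std::string& path, int utterance,
                   const std::string& technique, double score) {
    std::ofstream out(path, std::ios::app);
    out << utterance << ' ' << technique << ' ' << score << '\n';
}

// Reads the history file and counts how many times each utterance was practiced.
std::map<int, int> practiceCounts(const std::string& path) {
    std::map<int, int> counts;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        int utterance = 0;
        if (fields >> utterance) ++counts[utterance];
    }
    return counts;
}
```

A plain-text log has the advantage that the SLP can also open or print it directly, which matches the printer use mentioned in the system requirements.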
This section describes how each step of software development is done in
Microsoft Visual C++ 6.0.
Audio File Format
There are many types of audio file format, and the most commonly used
digital audio file format on computers is the .WAV file. WAVE is a file format for
storing digital audio (waveform) data. This format is widely used in professional
programs that process digital audio waveforms [78]. The WAVE format is used in the present
work; the initialization of the WAVE file format is shown in detail in Appendix I.
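For reference, the sketch below builds the canonical 44-byte RIFF/WAVE header for uncompressed PCM data, which is the standard layout of a simple .WAV file. The function `wavHeader` is a hypothetical helper written for illustration, not the Appendix I code; on Windows the same format fields appear in the `WAVEFORMATEX` structure.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends `value` to `out` as `bytes` little-endian bytes.
static void putLE(std::vector<std::uint8_t>& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

// Builds the 44-byte RIFF/WAVE header for uncompressed PCM data.
std::vector<std::uint8_t> wavHeader(std::uint32_t sampleRate, std::uint16_t channels,
                                    std::uint16_t bitsPerSample, std::uint32_t dataBytes) {
    assert(bitsPerSample % 8 == 0);
    const std::uint16_t blockAlign =
        static_cast<std::uint16_t>(channels * bitsPerSample / 8);  // bytes per sample frame
    std::vector<std::uint8_t> h;
    auto tag = [&h](const char* s) { h.insert(h.end(), s, s + 4); };
    tag("RIFF"); putLE(h, 36 + dataBytes, 4);  // size of everything after this field
    tag("WAVE");
    tag("fmt "); putLE(h, 16, 4);              // fmt chunk size for PCM
    putLE(h, 1, 2);                            // format tag 1 = PCM
    putLE(h, channels, 2);
    putLE(h, sampleRate, 4);
    putLE(h, sampleRate * blockAlign, 4);      // average bytes per second
    putLE(h, blockAlign, 2);
    putLE(h, bitsPerSample, 2);
    tag("data"); putLE(h, dataBytes, 4);       // size of the sample data that follows
    return h;
}
```

For the settings used in this work (16 kHz, mono, 16-bit), a 6-second utterance would be written as `wavHeader(16000, 1, 16, 192000)` followed by 192000 bytes of samples.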
Sampling is the process of converting a signal into a numeric sequence or a
function of discrete time or space. The number of samples taken per second is
known as the sampling rate. The sampling frequency must be more than twice the
highest frequency of interest in the input signal. The telephone
system uses a sampling frequency of 8 kHz and it can capture only information up to
4 kHz. In studying speech recording, normally a sampling frequency of 16 kHz is
used which gives information up to 8 kHz [78]. The choice of an appropriate
sampling setup depends very much on the speech processing task and the amount of
computing power available.
The Nyquist rate is defined as twice the bandwidth of the continuous-time
signal. It should be noted that the sampling frequency must be strictly greater than
the Nyquist rate of the signal to achieve unambiguous representation of the signal.
This constraint is equivalent to requiring that the system's Nyquist frequency, which
is equal to half the sample rate, be strictly greater than the bandwidth of the signal. If
the signal contains a frequency component at precisely the Nyquist frequency, then,
the corresponding component of the sample values cannot have sufficient
information to reconstruct the Nyquist-frequency component in the continuous-time
signal because of phase ambiguity [79]. A 16 kHz sampling rate is a reasonable
target for high quality speech recording and playback [80]. A sampling rate of 16
kHz is used in present work.
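The sampling condition and the resulting window sizes can be checked with a few lines of arithmetic. The helper names below are illustrative.

```cpp
#include <cassert>

// True when `sampleRate` is strictly greater than the Nyquist rate
// (twice the signal bandwidth), the condition stated above.
bool samplesUnambiguously(double sampleRate, double bandwidth) {
    return sampleRate > 2.0 * bandwidth;
}

// Number of samples covering `ms` milliseconds at `sampleRate` Hz,
// rounded to the nearest whole sample.
int samplesFor(double ms, int sampleRate) {
    assert(ms >= 0.0 && sampleRate > 0);
    return static_cast<int>(ms * sampleRate / 1000.0 + 0.5);
}
```

At 16 kHz, a 25 ms window covers 400 samples and a 10 ms shift covers 160 samples, matching the window settings used later in this chapter; a component at exactly 8 kHz fails the strict inequality.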
Resolution Bit
When an acoustic signal is digitized, it is turned into a sequence of binary
numbers by the analogue-to-digital hardware. A fixed number of binary digits is used to
represent each sample, so the size of the smallest change that can be detected in the
input depends on the number of bits used.
Analogue-to-digital hardware uses a fixed sample size to represent the
sampled acoustic signal; typically 12 or 16 bits are used per sample. A little
arithmetic shows that 12 bits give a maximum of 2¹² = 4096 different numbers,
while 16 bits give 2¹⁶ = 65536 values. These numbers are used to represent the
different input voltages taken from the microphone. When the hardware measures
the size of the input voltage from the microphone, instead of calculating a voltage
value, it merely assigns it a number on a scale of 0 to 65535 (for a 16-bit digitizer).
Each sample of an audio signal must be ascribed a numerical value to be
stored in the computer. The numerical value expresses the instantaneous amplitude
of the signal at the moment it was sampled. The range of the numbers must be
sufficiently large to express adequately the entire amplitude range of the sound being
recorded.
The number of bits used to represent the number in the computer is important
because it determines the resolution with which the amplitude of the signal can be
measured. If only one byte is used to represent each sample, then the entire range of
possible amplitudes of the signal must be divided into 256 parts, since there are only
256 ways of describing the amplitude. 16-bit samples are used in the present work to
provide the resolution necessary to calculate AMPs that portray the difference
between an utterance spoken by the SLP and an utterance spoken by the client.
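The relationship between bits per sample, the number of amplitude levels and the theoretical dynamic range (about 6.02 dB per bit) can be computed directly. This is a small illustrative sketch with hypothetical function names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Number of distinct amplitude levels available with `bits` per sample.
std::uint64_t quantisationLevels(unsigned bits) {
    assert(bits > 0 && bits < 64);
    return std::uint64_t{1} << bits;
}

// Theoretical dynamic range in dB: 20*log10(levels), about 6.02 dB per bit.
double dynamicRangeDb(unsigned bits) {
    return 20.0 * std::log10(static_cast<double>(quantisationLevels(bits)));
}
```

With 8 bits there are 256 levels (about 48 dB), whereas 16 bits give 65536 levels (about 96 dB), which is why 16-bit samples leave ample resolution for small amplitude differences between the SLP and client profiles.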
Mono Channel
Sound files can be stereo, with a right channel and a left channel, or they can
be mono with just one channel. Mono or monophonic describes a system where all
the audio signals are mixed together and routed through a single audio channel. The
key is that the signal contains no level or arrival time/phase information that would
replicate or simulate directional cues. Mono systems can be full-bandwidth and full-fidelity and are able to reinforce both voice and music effectively [81].
A mono channel is used in the present work. Its main advantage is that
everyone hears the very same signal and, in properly designed systems, all listeners
hear the system at essentially the same sound level. This makes well-designed
mono systems very well suited for speech reinforcement, as they can provide
excellent speech intelligibility. A mono file is half the size of the equivalent
stereo file [81]. The WAVE file format is initialized for a mono channel as shown in
Appendix I.
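The halving of file size follows from the raw PCM data size, which is the product of duration, sampling rate, channel count and bytes per sample. The sketch below (function name hypothetical) ignores the small file header.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of raw PCM data: seconds x rate x channels x bytes per sample.
std::uint32_t pcmBytes(double seconds, std::uint32_t sampleRate,
                       std::uint16_t channels, std::uint16_t bitsPerSample) {
    assert(seconds >= 0.0 && bitsPerSample % 8 == 0);
    const std::uint32_t frames = static_cast<std::uint32_t>(seconds * sampleRate);
    return frames * channels * (bitsPerSample / 8u);
}
```

For a 6-second goal utterance at 16 kHz and 16 bits, this gives 192000 bytes in mono against 384000 bytes in stereo.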
DC Offset Removal
For some audio files, the direct current (DC) or zero-frequency component is
not zero. This is called DC offset: the average offset of the recorded waveform from
zero amplitude. Every sound card has its own unique DC offset
[82]. Before applying a window function, the time-domain data are corrected for any
DC offset from zero.
To improve silence detection, as needed during the
background noise level identification, the DC offset should be removed from each
sound file after A/D conversion [82]. DC offset is undesirable because it shifts the whole
waveform, so that the positive peaks are more likely to exceed the
maximum level that can be represented. It is a common problem with PC sound
cards. DC offset can cause problems when several messages are concatenated in series.
DC offset also exacerbates background noise problems and causes errors when
trying to measure the noise floor of a recording.
DC offset must be removed in order for a given AMP to appear the
same on all computers. The same speech waveform containing different DC offsets
will result in two different AMPs if the DC offset is included in the calculation of the
average magnitude. The DC offset is calculated by determining the average value
of each 25 ms segment of speech [77]. The DC offset is then subtracted from each
sample value as shown in equation (1), which shifts the speech
signal back to a zero level. The program flowchart is shown in Figure 4.2.
$$\hat{x}(n) = x(n) - \frac{1}{M}\sum_{m=0}^{M-1} x(m), \qquad 0 \le n \le M-1,\; M = 400 \qquad (1)$$
int M = 400; double data[400]; double dc = 0;
i = 0; repeat: dc = dc + data[i]; i = i + 1; while i < M
dc = dc / M
i = 0; repeat: data[i] = data[i] - dc; i = i + 1; while i < M
Figure 4.2: Flowchart: DC Offset Removal
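The DC offset removal shown in the flowchart can be written as a short C++ function. This sketch, with a hypothetical name, applies the mean subtraction to every consecutive 400-sample (25 ms) segment of a recording and divides each segment sum by the number of samples that segment contains.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Removes the DC offset segment by segment: the mean of each `segment`-sample
// block (400 samples = 25 ms at 16 kHz) is subtracted from every sample in it.
// A shorter final block uses its own mean.
std::vector<double> removeDcOffset(const std::vector<short>& data, std::size_t segment = 400) {
    assert(segment > 0);
    std::vector<double> out(data.size());
    for (std::size_t begin = 0; begin < data.size(); begin += segment) {
        const std::size_t end = std::min(begin + segment, data.size());
        double dc = 0.0;
        for (std::size_t i = begin; i < end; ++i) dc += data[i];
        dc /= static_cast<double>(end - begin);  // mean of this block
        for (std::size_t i = begin; i < end; ++i) out[i] = data[i] - dc;
    }
    return out;
}
```

Returning `double` values avoids the truncation that would occur if the fractional mean were subtracted in place from integer samples.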
The basic audio representation is expressed as amplitude change with time.
This is the time-domain representation. Conventionally, an audio signal is assumed to be
stationary over a short interval of time. Most audio signals are far too long to be
processed in their entirety; it is necessary to divide the time-domain signal into
windowed intervals and process each window individually [83]. A window is a
temporal weighting function, which is applied to a signal before some other
operation such as DSP algorithms. The act of windowing can have a dramatic
influence on the results. Three major aspects of windowing are window type, size
and shift.
Windowing can be seen as multiplying a signal by a window that is zero
everywhere except for the region of interest, where it is one. Since signals are
assumed to be infinite, all of the resulting zeros can be discarded and attention
concentrated on the windowed portion of the signal. A number of window types exist, each
with different characteristics that make it better suited to particular purposes. The most
commonly used are the Hanning, Hamming, Blackman and Kaiser windows, among others, as
illustrated in Figure 4.3. The best window length depends on the characteristics of
the signal to be analyzed.
Figure 4.3: Common Window Functions
Any windowing operation causes distortion, since the signal is being
modified by the window. The algorithms for the windowing process are shown in
Appendix I. The WAVE files are processed using analysis and synthesis window
lengths of 400 samples with Blackman windowing. The Blackman window is chosen because it
gives the best attenuation of the windows considered, so that there is a clearly audible
difference between filtered and non-filtered samples [83]. The Blackman window gives much
better stop-band attenuation, with a highest sidelobe level of about -58 dB compared with
about -31 dB for the Hanning window. It offers a weighting function similar to
the Hanning window but narrower in shape, and its dynamic range is sufficient for most
applications. The Blackman window is given in equation (2).
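$$w(n) = 0.42 - 0.5\cos\left(\frac{2\pi n}{N}\right) + 0.08\cos\left(\frac{4\pi n}{N}\right) \qquad (2)$$
where 0 ≤ n ≤ N, so a 400-sample window has N = 399.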
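Equation (2) translates directly into code. The sketch below generates a Blackman window of a given length (N = length - 1) and applies it to one frame; the function names are illustrative rather than those used in Appendix I.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Blackman window of `length` points:
// w(n) = 0.42 - 0.5cos(2*pi*n/N) + 0.08cos(4*pi*n/N), 0 <= n <= N, N = length - 1.
std::vector<double> blackman(std::size_t length) {
    assert(length > 1);
    const double pi = std::acos(-1.0);
    const double N = static_cast<double>(length - 1);
    std::vector<double> w(length);
    for (std::size_t n = 0; n < length; ++n)
        w[n] = 0.42 - 0.5 * std::cos(2.0 * pi * n / N) + 0.08 * std::cos(4.0 * pi * n / N);
    return w;
}

// Multiplies one frame of samples by the window, point by point.
std::vector<double> applyWindow(const std::vector<double>& frame, const std::vector<double>& w) {
    assert(frame.size() == w.size());
    std::vector<double> out(frame.size());
    for (std::size_t n = 0; n < frame.size(); ++n) out[n] = frame[n] * w[n];
    return out;
}
```

Computing the window once with `blackman(400)` and reusing it for every frame avoids recomputing the cosines at each 10 ms step.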
The window w(n) is chosen to have a duration of 25 ms, or 400 samples. The
25 ms window is shifted in 10 ms steps, so a new window of 400 samples is
calculated every 10 ms. A 25 ms window is a common window
duration for time-domain processing [84] and is sufficient to capture all the stuttering
disfluencies. The four types of stuttering disfluency are syllable repetitions, word
repetitions, prolongation of a sound, and blocking or hesitation before word
completion. Calculating the AMP requires 600 average magnitude
values, corresponding to the 600 frames in 6 seconds of speech data. The average
magnitude is calculated for each 25 ms of speech data, with a new average magnitude
calculated every 10 ms.
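The framing described above can be sketched as follows. The function name is hypothetical; it computes the average magnitude of each 400-sample frame with a 160-sample shift, optionally weighting each frame by a window such as the Blackman window. Because incomplete frames at the end are dropped, a 6-second buffer at 16 kHz yields 598 frames in this sketch rather than exactly 600.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Average magnitude profile: for each frame of `frameLen` samples, advanced by
// `hop` samples, the mean absolute value of the (optionally windowed) samples.
// Defaults: 400-sample (25 ms) frames every 160 samples (10 ms) at 16 kHz.
std::vector<double> averageMagnitudeProfile(const std::vector<double>& x,
                                            std::size_t frameLen = 400,
                                            std::size_t hop = 160,
                                            const std::vector<double>& window = {}) {
    assert(frameLen > 0 && hop > 0);
    assert(window.empty() || window.size() == frameLen);
    std::vector<double> amp;
    for (std::size_t start = 0; start + frameLen <= x.size(); start += hop) {
        double sum = 0.0;
        for (std::size_t n = 0; n < frameLen; ++n) {
            const double w = window.empty() ? 1.0 : window[n];  // rectangular if no window
            sum += std::fabs(x[start + n] * w);
        }
        amp.push_back(sum / frameLen);
    }
    return amp;
}
```

Because consecutive frames overlap by 240 samples, the profile changes smoothly from frame to frame, which suits the superimposed red and blue curves described earlier.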
Audio signals (both speech and music) are generally not
stationary, so they cannot always be said to be stationary over each of these windows.
The window length chosen must strike a balance between picking up
important transient details in the audio and recognizing longer-duration,
sustained events. The window length should be small enough that the windowed
signal block is essentially stationary over the window interval. Research [85]
indicated that windows which are too short fail to pick up the important time
structures of the audio signal, whereas windows which are too long cause the
algorithm to miss important transient details.
4.4.7 Time Domain Filtering
Time-domain filtering is favoured because the assessment system is a real-time application in which it is important to process a continuous data stream and to
output filtered values at the same rate as the raw data are received [86]. Time-domain
filters are used when the information is encoded in the shape of the signal's
waveform. Time domain filtering is used for smoothing, DC removal, waveform
shaping and others.
Convolution is a mathematical operation which takes two functions and
produces a third function. If the first is an acoustic signal and the second is
the impulse response of a filter, then the result of convolution is a filtered signal.
Each output sample is a weighted sum of shifted input samples, with the filter
providing the weighting coefficients. Convolution is written y = x*h, where y is
the output signal, x is the input signal and h is the filter impulse response. The input
signal is the output of the Blackman windowing operation, and the impulse
response is sin(2.0*PI*fc*(i-M/2))/(i-M/2), where fc is the cutoff frequency and M sets
the filter length; at i = M/2, where this expression is 0/0, its limiting value 2.0*PI*fc
is used. The code for the above operation is shown in Appendix I.
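The filtering step can be sketched as a windowed-sinc low-pass kernel followed by direct convolution. The values of fc and M are not given in this section, so they are parameters here (fc as a fraction of the sampling rate, M even). In addition, this sketch follows the common windowed-sinc practice of tapering the kernel with a Blackman window and normalizing it to unit gain at 0 Hz, which goes beyond what the text specifies; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Low-pass windowed-sinc kernel with M + 1 taps (M even); fc is the cutoff as a
// fraction of the sampling rate (0 < fc < 0.5). The centre tap uses the limit
// 2*pi*fc, each tap is tapered by a Blackman window, and the taps are scaled
// so that they sum to 1 (unit gain at 0 Hz).
std::vector<double> sincKernel(int M, double fc) {
    assert(M > 0 && M % 2 == 0 && fc > 0.0 && fc < 0.5);
    const double pi = std::acos(-1.0);
    std::vector<double> h(M + 1);
    double sum = 0.0;
    for (int i = 0; i <= M; ++i) {
        const int k = i - M / 2;
        h[i] = (k == 0) ? 2.0 * pi * fc : std::sin(2.0 * pi * fc * k) / k;
        h[i] *= 0.42 - 0.5 * std::cos(2.0 * pi * i / M) + 0.08 * std::cos(4.0 * pi * i / M);
        sum += h[i];
    }
    for (double& v : h) v /= sum;
    return h;
}

// Direct-form convolution y = x * h, returning x.size() + h.size() - 1 samples.
std::vector<double> convolve(const std::vector<double>& x, const std::vector<double>& h) {
    assert(!x.empty() && !h.empty());
    std::vector<double> y(x.size() + h.size() - 1, 0.0);
    for (std::size_t n = 0; n < x.size(); ++n)
        for (std::size_t k = 0; k < h.size(); ++k)
            y[n + k] += x[n] * h[k];  // each input sample adds a weighted copy of h
    return y;
}
```

For example, `convolve(frame, sincKernel(100, 0.25))` would low-pass a frame at one quarter of the sampling rate (4 kHz at 16 kHz). The direct form costs one multiply per tap per sample, which is manageable for short kernels at 16 kHz.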
Recording and Playback of Speech Utterances
Modern operating systems such as Windows provide a useful
Application Programming Interface (API) for programming sound cards. The usual
way of outputting audio is to open a device and write blocks of data to it.
The audio data are generally written to an output buffer, a block of
memory with several constraints. The data in this buffer are usually
transferred to the sound card by the Direct Memory Access (DMA) controller,
a device that can copy data between memory and hardware
devices without involving the CPU. Sound input generally works in the same way as
output, but in the opposite direction. The playback and recording processes are
described in the following two subsections.
Playback
Playback is done in blocks of data. The application reads a block of data from the WAVE file on disk and passes it to the driver for playback via waveOutWrite(). While the driver is playing this block, another block of data is read into a second buffer. When the driver finishes playing the first block, it signals the program that it needs another block. The program then passes the second buffer to the driver via waveOutWrite() and reads the next block of data into the first buffer while the driver plays the second. This continues without interruption until the WAVE file has been fully played [81]. See Appendix I for a detailed description of the playback function.
Recording
The device driver manages the actual recording of data, a process started with waveInStart(). While the driver records digital audio, it stores the data in a small, fixed-size buffer. When that buffer is full, the driver signals the program that the buffer needs to be processed. It then goes on to store the next block of data in a second buffer of the same size. The program is assumed to process the first buffer while the driver records into the second, and to finish processing it before the second buffer is full [87].
When the driver fills the second buffer, it again signals the program, which now processes the second buffer. Meanwhile, the driver stores more audio data in the now-empty first buffer. This happens continuously: the driver keeps filling the two buffers alternately, and the program processes each buffer as soon as it is signalled that the buffer is full. Recording therefore produces a series of blocks of data. See Appendix I for a detailed description of the recording function.
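The ping-pong scheme can be illustrated without any audio hardware. The sketch below is a single-threaded simulation, not the Windows code in Appendix I. A simulated driver fills two fixed-size buffers alternately and hands each full buffer to a processing callback.

In the real waveIn API, the buffers are WAVEHDR structures queued with waveInAddBuffer(). The driver reports a full buffer through a callback or a message such as MM_WIM_DATA. Playback with waveOutWrite() follows the same pattern in the opposite direction. The function name simulateRecording is hypothetical, and a trailing partial buffer is simply not delivered in this simplified model.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Simulated double buffering: the "driver" stores incoming samples in
// buffers[current]; when that buffer is full it is handed to the program's
// callback, cleared for reuse, and the driver switches to the other buffer.
// Returns the order (0 or 1) in which buffers were delivered.
std::vector<int> simulateRecording(
    const std::vector<short>& input, std::size_t bufferSize,
    const std::function<void(const std::vector<short>&)>& process) {
    std::vector<short> buffers[2];
    std::vector<int> order;
    int current = 0;  // buffer currently being filled by the driver
    for (short sample : input) {
        buffers[current].push_back(sample);
        if (buffers[current].size() == bufferSize) {
            process(buffers[current]);  // "signal": program processes the full buffer
            order.push_back(current);
            buffers[current].clear();   // buffer is empty again for the next round
            current = 1 - current;      // driver moves on to the other buffer
        }
    }
    return order;
}
```

In the real driver the two roles run concurrently. The program must therefore finish each callback before the other buffer fills, which is the assumption stated above.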
Background Noise Level Detection
Noise is any unwanted signal mixed with the signal of interest. As shown in Figure 4.4, the background noise level is measured for 5 seconds at the beginning of the assessment process. Each input noise sample, data[i], is squared and added to a running total, which is updated each time the application finishes measuring an interval. After 5 seconds, waveInReset is called to stop input on the device and reset the current position to zero; all pending buffers are marked as done and returned to the application. waveInClose then closes the input device to end the detection process.
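The accumulation in Figure 4.4 amounts to a running sum of squared samples. The following minimal sketch assumes the 5-second measurement arrives as a list of intervals, for example five one-second blocks. That interval split is an assumption suggested by the i < 5 loop in the flowchart. The names accumulateNoise and meanNoisePower are hypothetical, and the per-sample normalisation in meanNoisePower is an optional addition not described in the thesis.

```cpp
#include <cstddef>
#include <vector>

// Sum of squared samples over all measurement intervals:
// noise = noise + data[i] * data[i], as in Figure 4.4.
double accumulateNoise(const std::vector<std::vector<short>>& intervals) {
    double noise = 0.0;
    for (const auto& data : intervals)
        for (short s : data)
            noise += static_cast<double>(s) * s;
    return noise;
}

// Optional (assumption, not in the thesis): mean power per sample.
double meanNoisePower(const std::vector<std::vector<short>>& intervals) {
    std::size_t count = 0;
    for (const auto& data : intervals) count += data.size();
    return count == 0 ? 0.0 : accumulateNoise(intervals) / count;
}
```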
[Flowchart: notify the user that background noise level identification is starting; declare unsigned int i, double noise = 0 and int data[i]; open the audio input device; accumulate noise = noise + data[i]*data[i] while i < 5; close the audio input device; notify the user that detection has ended.]
Figure 4.4: Flowchart: Background Noise Level Detection
4.4.10 History File
If this is the first time a client uses the software, a history file is created and saved at a location of the client's choosing. The history file summarizes the client's attempts. It contains the client's name, the utterances attempted, the date and time of each attempt, and the attempt scores for start of speech, end of speech, maximum magnitude and duration. The data structure of the history file is described in detail in Appendix I.
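The actual data structure of the history file is given in Appendix I. Purely as an illustration of the fields listed above, a record could be held in a struct and appended to the file as one tab-separated line per attempt. The struct layout, the field names and the tab-separated format are all assumptions made for this sketch.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical history record; the real layout is defined in Appendix I.
struct HistoryRecord {
    std::string clientName;  // user name entered at client identification
    std::string utterance;   // utterance (wave file) attempted
    std::string dateTime;    // date and time of the attempt
    int startScore;          // start-of-speech score (%)
    int endScore;            // end-of-speech score (%)
    int maxScore;            // maximum magnitude score (%)
    int durScore;            // duration score (%)
};

// Formats one attempt as a tab-separated line (format is an assumption).
std::string toLine(const HistoryRecord& r) {
    std::ostringstream out;
    out << r.clientName << '\t' << r.utterance << '\t' << r.dateTime << '\t'
        << r.startScore << '\t' << r.endScore << '\t'
        << r.maxScore << '\t' << r.durScore;
    return out.str();
}

// Appends the record to the history file at `path`, creating it if needed.
bool appendRecord(const std::string& path, const HistoryRecord& r) {
    std::ofstream file(path, std::ios::app);
    if (!file) return false;
    file << toLine(r) << '\n';
    return static_cast<bool>(file);
}
```

Appending rather than rewriting matches the choice offered to the client of updating an existing history file instead of creating a new one.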
4.4.11 Client Identification
After the background noise level has been detected, the Setting dialog box prompts the client to load five wave files, pre-recorded and customized by the SLP, as illustrated in Figure 4.5. A counter of technique numbers is incremented each time the client loads a wave file. The client must load a total of five wave files; otherwise, the application shows an error message. Next, the client loads a text file that contains the sentence for each wave file. The sentences are displayed during the Metronome and DAF therapy techniques.
A dialog box then prompts the client to create an individual history file. As shown in Figure 4.5 and Figure 4.6, the client enters a user name and the location at which to save the file, which may be any location. If this is the first time the client uses the software, a new history file is created. Otherwise, the client can choose either to update the previous database or to create a new history file.
Figure 4.5: Dialog Box: The Loading of Wave Files
Figure 4.6: Dialog Box: Client Identification
4.4.12 Compression and Decompression Using Speex
CODEC stands for COmpression DECompression; a codec knows how to compress and decompress a given format. In the current work, speech compression with Speex [88] aims to produce a compact representation of speech sounds that, when reconstructed, is perceived to be close to the original [89].
Each wave file is 192 kB in size, so a client's practice session may require a minimum of 3.84 MB of hard disk storage. There are five sentences for each therapy technique, and a client may record the same sentence several times. The SLP may want to keep a record of the client's recorded utterances. This allows the SLP to review the wave files at any time, evaluate progress and decide which therapy technique gives the best result for that client. Such records may require a large amount of storage, which in turn increases cost. It is therefore important to reduce the size of the wave files by Speex compression [88] without significantly reducing their quality.
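The storage figures above follow from the recording format. This check assumes 16 kHz, 16-bit mono PCM (consistent with the 400 samples per 25 ms used by the scoring algorithms) and 6-second utterances. Under those assumptions, each file holds 16000 × 2 × 6 = 192,000 bytes of audio data, and 3.84 MB corresponds to 20 such files. The file count is inferred from the arithmetic rather than stated in the text, and the helper names are hypothetical.

```cpp
#include <cstdint>

// Assumed recording format: 16 kHz, 16-bit (2-byte), mono PCM; 6-second utterances.
constexpr std::int64_t kSampleRate = 16000;   // samples per second
constexpr std::int64_t kBytesPerSample = 2;   // 16-bit samples
constexpr std::int64_t kSeconds = 6;          // length of one utterance

// Audio payload of one wave file in bytes (the WAV header is not included).
constexpr std::int64_t bytesPerWaveFile() {
    return kSampleRate * kBytesPerSample * kSeconds;  // 192,000 bytes
}

// Storage needed for `files` wave files of this size.
constexpr std::int64_t bytesForFiles(std::int64_t files) {
    return files * bytesPerWaveFile();
}

static_assert(bytesPerWaveFile() == 192000, "one file is 192 kB");
static_assert(bytesForFiles(20) == 3840000, "20 files are 3.84 MB");
```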
Speex is mainly designed for three sampling rates: 8 kHz (the sampling rate used for telephone calls), 16 kHz and 32 kHz. These are referred to as narrowband, wideband and ultra-wideband respectively. Speex offers both very good speech quality and a low bit rate. Very good quality here also means support for wideband (16 kHz sampling rate) in addition to narrowband (telephone quality, 8 kHz sampling rate).
Referring to Figure 4.7 and Figure 4.8, each wave file is 192 kB, while each Speex file is 22 kB. This is an 88.54% reduction in file size, which saves a great deal of hard disk space. The bit rate is reduced from 256 kbit/s to 28 kbit/s.
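The percentages and rates can be checked directly, assuming 1 kB = 1000 bytes and a 6-second utterance. The calculation reproduces the 88.54% reduction and the 256 kbit/s rate of the uncompressed file. For the Speex file, the same calculation gives about 29 kbit/s. That is close to the quoted 28 kbit/s once rounding of the 22 kB file size and container overhead are allowed for. The function names are hypothetical.

```cpp
// Percentage reduction in size from originalKB to compressedKB.
double sizeReductionPercent(double originalKB, double compressedKB) {
    return (1.0 - compressedKB / originalKB) * 100.0;
}

// Average bit rate in kbit/s of `kiloBytes` of data lasting `seconds`
// (1 kB = 1000 bytes, 8 bits per byte).
double averageKbps(double kiloBytes, double seconds) {
    return kiloBytes * 8.0 / seconds;
}
```

For example, sizeReductionPercent(192.0, 22.0) is about 88.54, and averageKbps(192.0, 6.0) is 256.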
Figure 4.7: Wave File Information
Figure 4.8: Speex File Information
The base Speex distribution includes a command-line encoder, speexenc, and a decoder, speexdec. Speex encoding is mostly controlled by a quality parameter that ranges from 0 to 10. In constant bit-rate (CBR) operation the quality parameter is an integer, while in variable bit-rate (VBR) operation it is a float. To encode speech with the Speex library, the header <speex.h> must be included. The speexenc utility creates Speex files from raw PCM or wave files and is invoked as:
speexenc [options] input_file output_file
The value '-' for input_file or output_file corresponds to stdin and stdout respectively. The --wideband option tells Speex to treat the input as wideband (16 kHz) [89]. As shown in Figure 4.9, the encoding process is called each time the client saves the wave files to the hard disk, compressing each wave file to a Speex file.
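How the application invokes the encoder is shown only in Figure 4.9. As a sketch under that uncertainty, the command line above could be assembled and run from C++ with std::system. The function names are hypothetical. --wideband is the speexenc option described above, and the file names are quoted so that paths containing spaces are passed as single arguments.

```cpp
#include <cstdlib>
#include <string>

// Builds "speexenc --wideband <wave> <speex>" with both paths quoted.
std::string speexEncodeCommand(const std::string& waveFile,
                               const std::string& speexFile) {
    return "speexenc --wideband \"" + waveFile + "\" \"" + speexFile + "\"";
}

// Runs the encoder through the shell; an exit status of 0 is taken as success.
bool encodeToSpeex(const std::string& waveFile, const std::string& speexFile) {
    return std::system(speexEncodeCommand(waveFile, speexFile).c_str()) == 0;
}
```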
Figure 4.9: Encoding Process
The speexdec utility decodes Speex files and is invoked as:
speexdec [options] speex_file [output_file]
The value '-' for speex_file or output_file corresponds to stdin and stdout respectively. When no output_file is specified, the file is played to the soundcard. The --mono option forces decoding in mono [89]. As shown in Figure 4.10, the decoding process is called each time the SLP opens a client's recorded files for playback, decompressing the Speex file to a wave file.
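Decoding can be sketched in the same way. When outputWave is left empty, no output file is passed and, as described above, speexdec plays the decoded audio to the soundcard. The function names are hypothetical.

```cpp
#include <cstdlib>
#include <string>

// Builds "speexdec --mono <speex> [<wave>]". With no output file, speexdec
// plays the decoded audio to the soundcard.
std::string speexDecodeCommand(const std::string& speexFile,
                               const std::string& outputWave = "") {
    std::string cmd = "speexdec --mono \"" + speexFile + "\"";
    if (!outputWave.empty()) cmd += " \"" + outputWave + "\"";
    return cmd;
}

// Runs the decoder through the shell; an exit status of 0 is taken as success.
bool decodeSpeex(const std::string& speexFile, const std::string& outputWave = "") {
    return std::system(speexDecodeCommand(speexFile, outputWave).c_str()) == 0;
}
```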
Figure 4.10: Decoding Process
Scoring provides guidance for the SLP in determining suitable therapy techniques for each client. It also enables the SLP to evaluate progress towards the goal of stutter-free speech and to guide the client accordingly.
Many clinical therapy techniques require conscious effort on the part of the client, so motivation must be built into a stuttering assessment system. It is essential that the child enjoys the system and finds it a positive experience. The computer-based assessment system displays speech waveforms and amplitude curves graphically, which can motivate and encourage children to practise their speech for longer periods.
Rewards are important for motivating clients. Both tangible and verbal rewards have been effective in reducing stuttering. Both forms of reward appeared successful, but their individual contributions could not be measured because treatment involved a number of therapy procedures. Therefore, a fireworks display and applause are implemented as rewards for clients who obtain a score of 80 or above.
The selection of scoring parameters is important because the parameters influence the clinically significant outcomes of stuttering assessment within an evidence-based framework. It is also important that the framework leads to outcomes that are meaningful for SLPs and clients.
The client's AMP is compared with the SLP's AMP in four categories, and a score is assigned to each. The categories are start location identification, end location identification, maximum magnitude comparison and duration comparison, as elaborated in the following sub-sections.
Start Location
Starting points can be found by comparing ambient audio levels or acoustic energy with the sample just recorded [90]. The background noise level is identified for each client's environment. The first location at which the measured amplitude is greater than the background noise level is identified as the start location.
There are 400 samples in every 25 ms, and therefore 1600 samples in every 100 ms. As described in Figure 4.11, a score of 100% is assigned if the client can align his or her utterance within 100 ms of the SLP's utterance [8]. The start locations of the SLP and the client are denoted start1 and start2 respectively. The score is reduced by 10% for each additional 100 ms by which the two locations differ. In other words, 10% is deducted for every 100 ms of difference between the two locations. If the difference is 1100 ms or more, a score of 0% is assigned.
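The start-location search and the 10%-per-step rule of Figure 4.11 can be sketched as below. Two details are assumptions: the magnitude of each sample is compared with the noise level, and -1 is returned when no sample exceeds it. The step score uses integer division exactly as in the flowchart, so any difference below 1600 samples (100 ms) scores 100. The names findStart and stepScore are hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

constexpr long kSamplesPer100ms = 1600;  // 400 samples per 25 ms at 16 kHz

// Index of the first sample whose magnitude exceeds the background noise
// level, or -1 if none does (both choices are assumptions).
long findStart(const std::vector<int>& data, int noiseLevel) {
    for (std::size_t i = 0; i < data.size(); ++i)
        if (std::abs(data[i]) > noiseLevel) return static_cast<long>(i);
    return -1;
}

// Step rule of Figure 4.11: steps = |difference| / step (integer division);
// score 0 if steps >= 10, otherwise 100 - 10 * steps.
int stepScore(long difference, long step = kSamplesPer100ms) {
    long steps = std::labs(difference) / step;
    return steps >= 10 ? 0 : static_cast<int>(100 - steps * 10);
}
```

The start score is then stepScore(start1 - start2). A result of -1 from findStart means no speech was detected and should be handled before scoring.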
4.5.2 End Location
End-point detection algorithms identify the sections of an incoming audio signal that contain speech. Accurate end-pointing is a non-trivial task. However, reasonable behaviour can be obtained for inputs that contain only speech surrounded by silence (no other noise). Typical algorithms look at the energy or amplitude of the incoming signal and at the rate of zero-crossings. A zero-crossing is where the audio signal changes from positive to negative or vice versa. When the energy and the zero-crossing rate are at certain levels, it is reasonable to guess that speech is present [91].
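The generic energy and zero-crossing measures mentioned above can be computed per frame as follows. This illustrates the typical approach described in [91], not the end-location method used in this system. The decision rule in looksLikeSpeech (speech if either measure exceeds a threshold) and its thresholds are assumptions that would need calibration.

```cpp
#include <cstddef>
#include <vector>

// Counts sign changes between consecutive samples (positive to negative or
// vice versa); zero-valued samples are treated as non-negative.
std::size_t zeroCrossings(const std::vector<short>& frame) {
    std::size_t count = 0;
    for (std::size_t n = 1; n < frame.size(); ++n)
        if ((frame[n - 1] >= 0) != (frame[n] >= 0)) ++count;
    return count;
}

// Short-time energy: sum of squared samples over the frame.
double frameEnergy(const std::vector<short>& frame) {
    double e = 0.0;
    for (short s : frame) e += static_cast<double>(s) * s;
    return e;
}

// Hypothetical rule: high energy (voiced speech) or many zero-crossings
// (e.g. unvoiced fricatives) marks the frame as speech.
bool looksLikeSpeech(const std::vector<short>& frame,
                     double energyThreshold, std::size_t zcThreshold) {
    return frameEnergy(frame) > energyThreshold || zeroCrossings(frame) > zcThreshold;
}
```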
Endpoint detection is harder in a stuttering application because PWS tend to have tense pauses. The end location is identified when the measured amplitude remains equal to the background noise level for 5 seconds. A duration of 5 seconds is chosen because the tense pauses of PWS last less than 5 seconds [19]. The end alignment of the client is scored as described in Figure 4.12. A score of 100% is assigned if the client can align his or her utterance within 100 ms of the SLP's utterance. The end locations of the SLP and the client are denoted end1 and end2 respectively. The score is reduced by 10% for each additional 100 ms by which the two locations differ [8]. In other words, 10% is deducted for every 100 ms of difference between the two locations. If the difference is 1100 ms or more, a score of 0% is assigned.
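A minimal sketch of this end-location rule follows. It makes three assumptions:

- "Equal to the background noise level" means a magnitude at or below that level.
- The search only begins after speech has been detected.
- A quiet run that reaches the end of the buffer also marks the end.

For a 5-second hold at 16 kHz, holdSamples would be 5 × 16000 = 80000. Quiet runs shorter than the hold, such as tense pauses, are skipped. The name findEnd is hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Returns the first sample of the quiet run that ends the utterance: a run of
// samples with |x| <= noiseLevel that lasts at least holdSamples, or that
// reaches the end of the buffer, found after speech has started.
// Returns -1 if no speech is found or speech continues to the last sample.
long findEnd(const std::vector<int>& data, int noiseLevel, std::size_t holdSamples) {
    bool speechSeen = false;
    std::size_t quietRun = 0;  // length of the current run of quiet samples
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (std::abs(data[i]) > noiseLevel) {
            speechSeen = true;
            quietRun = 0;      // a short pause inside the utterance is ignored
        } else if (speechSeen) {
            ++quietRun;
            if (quietRun >= holdSamples)
                return static_cast<long>(i + 1 - quietRun);
        }
    }
    if (speechSeen && quietRun > 0)
        return static_cast<long>(data.size() - quietRun);
    return -1;
}
```

The end score then follows the same integer-division step rule as the start score, applied to end1 - end2 with a step of 1600 samples.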
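[Flowchart: declare int i, start1, start2, data1, data2; set start1 = 0 and start2 = 0; while i < 6, store the location i at which data1 (SLP) exceeds the noise level as start1 and the location at which data2 (client) exceeds it as start2; compute startscore = abs(start1 - start2)/1600; if startscore >= 10, set startscore = 0, otherwise startscore = 100 - startscore*10; save to the history file.]
Figure 4.11: Flowchart: The Scoring of Start Location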
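[Flowchart: declare int i, end1, end2, data1, data2; set i = j = 0, end1 = 0 and end2 = 0; while i < 6, with an inner count j < 5, store the location j at which data1 (SLP) and data2 (client) remain equal to the noise level as end1 and end2; compute endscore = abs(end1 - end2)/1600; if endscore >= 10, set endscore = 0, otherwise endscore = 100 - endscore*10; save to the history file.]
Figure 4.12: Flowchart: The Scoring of End Location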
Maximum Magnitude
Another important feature of speech is the change in the overall amplitude of a sound over the course of its duration. The shape of this macroscopic change in amplitude is termed the amplitude envelope. It indicates the general evolution of the loudness of the sound over time [87].
The maximum magnitude is measured to determine how the client's maximum compares with that of the SLP. The maximum magnitudes of the client's and the SLP's speech signals, corresponding to the AMP, were determined and compared. As Figure 4.13 shows, the maximum magnitude is found by summing 15 neighbouring AMP samples and taking the largest such sum [77]. Calculating the average magnitude requires 600 average magnitude values, corresponding to the 600 frames in 6 seconds of speech data. The maximum magnitudes of the SLP and the client are denoted max1 and max2 respectively. A score of 100% is assigned if the difference between the client's and the SLP's maximum magnitudes is less than 4000. The score is reduced by 10% for each additional 4000 by which the maximum values differ [8]. The algorithm is shown in Figure 4.13.
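The sliding-window search of Figure 4.13 keeps a running sum. At each step, the newest AMP value is added and the value 15 positions back is subtracted, so each step costs constant time. In this sketch, each speaker's running sum uses only that speaker's own AMP values, and the scoring step mirrors the 4000-unit rule above. The function names are hypothetical, and returning 0 for a sequence shorter than the window is an assumption.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Largest sum of `width` consecutive AMP values (width = 15 in Figure 4.13),
// using a running sum: add the newest value, subtract the one leaving the window.
long maxWindowSum(const std::vector<long>& mag, std::size_t width = 15) {
    if (width == 0 || mag.size() < width) return 0;
    long temp = 0;
    for (std::size_t i = 0; i < width; ++i) temp += mag[i];
    long best = temp;
    for (std::size_t i = width; i < mag.size(); ++i) {
        temp += mag[i];
        temp -= mag[i - width];
        if (temp > best) best = temp;
    }
    return best;
}

// Score: 100 if |max1 - max2| < 4000, minus 10 for each further 4000, floor 0.
int maxScore(long max1, long max2) {
    long steps = std::labs(max1 - max2) / 4000;
    return steps >= 10 ? 0 : static_cast<int>(100 - steps * 10);
}
```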
Duration is defined as the period between the start and end locations, and it is compared between the client's AMP and the SLP's AMP. Referring to Figure 4.14, a score of 100% is assigned if the client's duration differs from that of the SLP by less than 100 ms. The SLP's and the client's durations are denoted dur1 and dur2 respectively. The score is reduced by 10% for each additional 100 ms by which the two durations differ. In other words, 10% is deducted for every 100 ms of difference between the two durations. If the difference is 1100 ms or more, a score of 0% is assigned [8].
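The duration comparison of Figure 4.14 reduces to the same step rule applied to the difference of two durations. The sketch assumes that start and end locations are sample indices at 16 kHz, so 100 ms is 1600 samples. The function names are hypothetical.

```cpp
#include <cstdlib>

constexpr long kSamplesPer100ms = 1600;  // 16 kHz sampling assumed

// Duration in samples between a start and an end location.
long durationSamples(long start, long end) { return end - start; }

// Figure 4.14: durscore = |dur1 - dur2| / 1600 (integer division);
// 0 if durscore >= 10, otherwise 100 - 10 * durscore.
int durationScore(long start1, long end1, long start2, long end2) {
    long diff = durationSamples(start1, end1) - durationSamples(start2, end2);
    long steps = std::labs(diff) / kSamplesPer100ms;
    return steps >= 10 ? 0 : static_cast<int>(100 - steps * 10);
}
```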
[Flowchart: declare int i and long temp1, temp2, max1, max2; set temp1 = temp2 = 0; while i < 15, accumulate temp1 = temp1 + Mag1[i] and temp2 = temp2 + Mag2[i], then set max1 = temp1 and max2 = temp2; from i = 15 while i < 600, update temp1 = temp1 + Mag1[i] - Mag1[i - 15] and temp2 = temp2 + Mag2[i] - Mag2[i - 15], setting max1 = temp1 if temp1 > max1 and max2 = temp2 if temp2 > max2; compute maxscore = abs(max1 - max2)/4000; if maxscore >= 10, set maxscore = 0, otherwise maxscore = 100 - maxscore*10; save to the history file.]
Figure 4.13: Flowchart: The Scoring of Maximum Magnitude Comparison
[Flowchart: declare int start1, start2, end1, end2, dur1, dur2; compute dur1 = end1 - start1 and dur2 = end2 - start2; compute durscore = abs(dur1 - dur2)/1600; if durscore >= 10, set durscore = 0, otherwise durscore = 100 - durscore*10; save to the history file.]
Figure 4.14: Flowchart: The Scoring of Duration Comparison
In this chapter, the hardware and software requirements of a computer-based stuttering assessment system have been discussed. The application is developed in Microsoft Visual C++ 6.0 and runs under Windows XP. The coding steps have been elaborated with the aid of C++ algorithms and dialog box displays to give a better understanding of the assessment system. Based on these design approaches, the computer-based Malay stuttering assessment system was tested and verified in clinical trials, as described in the next chapter.
The previous chapter described the procedures involved in developing the computer-based Malay stuttering assessment system. Clinical trials on control data and test subjects were also carried out to make sure that the system runs properly before its practicality was verified. This chapter outlines the implementation and verification of the computer-based Malay stuttering assessment system at the primary schools and the clinic in Johor Bahru. Section 5.2 describes in detail the prerequisites of the experiments before the system was verified on the test subjects. Section 5.3 introduces the assessment procedures, and data collection is elaborated in Section 5.4. The results of the software and SLP analyses are described and compared in Section 5.5.
Implementing Clinical Trials among School-age Children
Generally, a clinical trial is a research study designed to answer specific questions about vaccines, new therapies or new ways of using known treatments. Here, clinical trials (also known as medical research or clinical research) are carried out to determine whether the developed stuttering assessment system is both safe and effective in assisting the SLP to determine the appropriate therapy technique for each client.
The goal of the clinical trial is to collect speech utterances from school-age children in primary schools. Control data mean different things in different designs; Moscicki regards them as coming from non-treated individuals, as she considered randomized control designs [92]. Research on stuttering has been carried out for decades. Because its objective has been to tell the difference (if any) between stuttered and normal speech, it has often included non-stuttered speech as a control [93].
Research [94] suggests that a control group is essential in such cases, so that the treatment group has to 'beat the odds' to demonstrate convincing treatment effects. Control data were collected from university students who regarded themselves, and were regarded by the SLP, as normally fluent. Their conversational and reading speech samples had to contain fewer than 2%SS.
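For reference, %SS is the number of stuttered syllables expressed as a percentage of the total number of syllables spoken. The sketch below applies the control-group criterion stated above (fewer than 2%SS). The function names are hypothetical, and returning 0 for an empty sample is an assumption.

```cpp
// Percentage of syllables stuttered: %SS = 100 * stuttered / total.
double percentSS(int stutteredSyllables, int totalSyllables) {
    return totalSyllables > 0 ? 100.0 * stutteredSyllables / totalSyllables : 0.0;
}

// Control-group criterion from the text: fewer than 2 %SS.
bool meetsControlCriterion(int stutteredSyllables, int totalSyllables) {
    return percentSS(stutteredSyllables, totalSyllables) < 2.0;
}
```

For example, 3 stuttered syllables in a 200-syllable sample give 1.5%SS, which meets the criterion, whereas 4 give exactly 2%SS, which does not.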
Test Subjects
Since stuttering mostly appears at an early age, most studies on stuttering have focused on children. The clinical trial used a qualitative, small-group research design that involved speech recordings of 11 CWS.
Test subjects were selected from 6 primary schools in Skudai, Johor, and were assessed at the Speech Therapy Unit, Hospital Sultanah Aminah, Johor Bahru. A total of 11 subjects participated, 10 males and 1 female. Ten of the test subjects had been diagnosed by an SLP as having developmental stuttering and had been stuttering for at least six months. One subject was omitted from the analysis because he or she was identified as not being a stuttering client. The subjects were aged between 8 and 12 years, and none of them was familiar with speech technology in any way.
Test subjects were selected according to the following criteria:
Diagnosis of stuttering. Subjects were required to have been given a diagnosis of stuttering by a certified SLP at Hospital Sultanah Aminah, following a formal assessment.
Language. Subjects were required to be fluent in the Malay language, to minimize problems caused by limited language ability, as the speech recording was conducted in Malay.
Age. As this study focused on the perspectives of children, school-age children were preferred.
Subject numbers were not based on power calculations. As is often the case with a low-incidence disorder such as stuttering, the number of clinical subjects required for adequate power was prohibitive in terms of feasibility for an initial study [95].
Experimental Set-Up
All subjects were audio-recorded by the assessment software and audio- and video-recorded by a video camera. The subjects wore a lapel microphone positioned approximately 20 cm from the mouth. The sound level was set at a comfortable level for listening over earphones and was checked to ensure that it remained constant. The test was presented on a notebook computer positioned at a comfortable reading level for each subject.
The speech samples for this experiment were video-recorded using a Panasonic MiniDV Digital Video Camera NV-DS25, with MiniDV tape as the recording medium. The subject was set against a plain background, and the camera was positioned to show only a close-up of the subject's face and hands. The video samples therefore captured any secondary coping behaviours, such as facial grimaces, head turns and eye closure. It was posited that an audiovisual component in conjunction with the speech sample would approximate a more realistic face-to-face listening experience. The speech was recorded in DAT format and transferred digitally to the computer for further processing.
Final Cut Pro HD 4.5 was used to transfer the video data from the video camera to the computer over a FireWire connection. Using a DV camcorder in this way requires the computer to have an IEEE-1394 (also known as FireWire or i.LINK) interface installed [96].
The clinical trial was conducted in a normal room environment. The subjects were tested in a quiet setting, and each session took approximately 5–10 minutes. Subjects were told that the recording was to test new software rather than that it was an evaluation. It has been recommended that speech measures be collected without clients knowing that their speech is being evaluated. This prevents them from reacting to being assessed by trying to create a favourable outcome [6].
In addition, all the subjects were informally screened for the presence of any speech or language problem by a qualified SLP at the time of the fluency assessment. Stuttering was diagnosed by the SLP on the basis of %SS (greater than 5%) and/or the presence of significant speech-related struggle behaviour. At the beginning of the recording session, subjects were given a short practice for each task using sentences similar to those used in the actual experiment. Several sets of speech utterances were recorded before the experiment because the subjects varied in age and language level.
Each subject was given three tasks of five sentences each, which he or she spoke into the microphone. Three stuttering therapy techniques (Shadowing, Metronome and DAF) were implemented as computer-based tasks. Subjects could listen to and/or record each sentence as many times as they wished.
Shadowing Task
In the Shadowing task, the subject first listens to the SLP's pre-recorded wave files and is then required to repeat (shadow) everything the SLP reads, as shown in Figure 5.1. The SLP's pre-recorded amplitude is drawn first in red, followed by the client's amplitude in blue. The client's amplitude is displayed as it is spoken and is superimposed on the SLP's amplitude. The display is intended to show the client where his or her utterance differed from the SLP's in start and end alignment, amplitude and duration. The dialog box shows the word “Shadowing” to indicate that the client is using this therapy technique.
Figure 5.1: Shadowing Task
Metronome Task
In the Metronome task, the subject similarly listens to the SLP's pre-recorded wave files. He or she is then told to pace his or her speech to the beats of a metronome, one word per beat, while reading the sentences aloud. The sentences are shown on screen in a comfortable font size for the subject to read, as shown in Figure 5.2.
Figure 5.2: Metronome Task
DAF Task
In the DAF task, the subject listens to the SLP's pre-recorded wave files. He or she then talks into the microphone, and his or her speech is recorded and played back through earphones with a delay of 250 milliseconds [97]. The sentences are shown on screen in a comfortable font size for the subject to read, as shown in Figure 5.3.
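A fixed delay of this kind is commonly implemented with a circular buffer. The sketch assumes a 16 kHz sampling rate, so a 250 ms delay is 250 × 16000 / 1000 = 4000 samples. It is not the implementation used in the thesis, and the class and function names are hypothetical. Each incoming microphone sample is stored, and the sample received 4000 samples earlier is sent to the earphones.

```cpp
#include <cstddef>
#include <vector>

// Number of samples corresponding to a delay in milliseconds.
std::size_t delaySamples(int delayMs, int sampleRate) {
    return static_cast<std::size_t>(delayMs) * sampleRate / 1000;
}

// Fixed delay line: each call stores the new input sample and returns the
// sample received `delay` calls earlier (silence until the buffer fills).
class DelayLine {
public:
    explicit DelayLine(std::size_t delay) : buf_(delay, 0), pos_(0) {}

    short process(short in) {
        if (buf_.empty()) return in;        // zero delay: pass straight through
        short out = buf_[pos_];             // oldest stored sample
        buf_[pos_] = in;                    // overwrite it with the newest one
        pos_ = (pos_ + 1) % buf_.size();    // advance the circular index
        return out;
    }

private:
    std::vector<short> buf_;
    std::size_t pos_;
};
```

Usage: DelayLine daf(delaySamples(250, 16000)); then, for each recorded sample s, play daf.process(s).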
Figure 5.3: DAF Task
Assessment Procedures
This software is designed for stuttering clients of any age, especially CWS, because stuttering should be treated in the early years: it becomes less tractable as children get older. The program is easy to use, and most users can navigate it easily by their first or second session. An assessment session consists of the following steps:
First, the software prompts the subject to keep silent for 5 seconds so that it can detect the background noise level, as shown in Figure 5.4. Detecting the background noise level is important for the scoring algorithms, because this level may vary with each subject's recording venue, time and conditions. After 5 seconds, the software signals to the subject that background noise level detection has ended, as shown in Figure 5.5.
Figure 5.4: Detection of Background Noise Level
Figure 5.5: End Detection of Background Noise Level
Next, the subject selects five wave files to be loaded into the system, as displayed in Figure 5.6. Normally, the client visits the SLP for an initial evaluation of speech fluency, in which the SLP assesses the client's speech pattern and assigns a set of
Figure 5.6: Selection of Five Pre-recorded Wave Files
After loading the wave files, the subject loads the text file used to display the sentences during recording with the Metronome and DAF techniques, as shown in Figure 5.7.
Figure 5.8 and Figure 5.9 show the client identification process, in which the subject keys in his or her user name and creates a personal history file. The subject can choose where to save the file.
Figure 5.7: Selection of Text File
Figure 5.8: Input of User Name
Figure 5.9: Input of History File and Its Location
Finally, the recording and playback session starts, with the buttons enabled as shown in Figure 5.10. The subject is guided through the techniques from Shadowing to DAF. The Testing button is clicked for the subject to listen to the SLP's pre-recorded wave file. At the same time, the SLP's AMP is shown as a red line on screen, as displayed in Figure 5.11. The subject then starts recording by clicking the Record button. Referring to Figure 5.12, the subject's AMP is shown in blue.
Figure 5.10: The Enabling of Buttons
Figure 5.11: The AMP of SLP
Figure 5.12: The AMP of Client superimposed on SLP's AMP
After completing all three therapy techniques, the recorded wave files can be saved using the Save Wave function. The Save As dialog box prompts the subject to choose the directory path for saving the Speex file, as shown in Figure 5.13.
Figure 5.13: The File Saving of Recorded Utterances
During the assessment session, the SLP may need to refer to the client's past recordings in order to evaluate progress from time to time and to decide on a new set of utterances for that client. This is done with the Play Wave function, which displays both the SLP's and the client's wave files on the same axes at any time, as shown in Figure 5.14. Figure 5.15 shows the fireworks displayed whenever a subject achieves a score of 80 or higher; applause is played at the same time.
Figure 5.14: The File Playing of Both SLP and Subject's Utterances
Figure 5.15: The Display of Fireworks
Data Collection
The data collection process began by engaging each subject in a short conversation about his or her favourite sports or interests. This gauged the subject's language level, so that the author knew precisely which set of sentences to use for that subject's oral-reading task. At the beginning of the recording session, subjects were given a short practice for each sentence before the actual recording.
There are two types of data collection. In the first, the SLP observes the subject's behaviour under conditions that were purposefully created, or structured. The client is asked to do three oral-reading tasks (Shadowing, Metronome, DAF), and the SLP observes certain aspects of his or her behaviour while the tasks are being done. These aspects are body language, physiological functioning, something he or she has written, or some combination of these.
In the second, the SLP observes aspects of the subject's behaviour under conditions that were not structured. In this situation the SLP does not attempt to manipulate the behaviour, whereas in the first the SLP does so by having the subject perform a task. Instead, the SLP observes the subject while the subject is having a conversation with the author.
Each set of speech samples contained three different speaking tasks, corresponding to Shadowing, Metronome and DAF. Each sentence lasted approximately 6 s. A short conversation was included for each subject before the three therapy techniques.
Of the data collected, a total of 356 utterances, including both control data and test subject data, were analysed in the present work. This amounts to 2136 seconds, or 35.6 minutes, of speech. The control data total 450 seconds (7.5 minutes) in 75 utterances, while the test subject data total 1686 seconds (28.1 minutes) in 281 utterances. The speech samples were presented to the SLP in both audiovisual and audio-only modes, and the SLP judged each sample individually.
Results or Quantitative Analyses
Evaluation refers to tests used to measure a person's level of development or to identify a possible disease or disorder. Efficacy has been defined as the extent to which a specific intervention, procedure, regimen or service produces a beneficial result under ideally controlled conditions when administered or monitored by experts [98].
Sampling and satisfactory analysis of speech measures are essential for reliability and consistency. Although it is often recommended that a range of samples from different contexts be obtained, this can be impractical at times. It is also important to remember that a small amount of data analysed properly is worth more than a large amount of data analysed badly. There are different methods of clinical stuttering assessment [44]:
Amount of stuttering. Percentage of syllables stuttered (%SS) and stuttered words per minute (SW/M). In some cases, an increase in these measures may indicate progress in therapy.
Quality and characteristics of stuttering. Accessory behaviours during
stuttering; amount and loci of tension.
Self-reported communication ease. Considers tension and emotions related to
stuttering from the client’s perspective.
Increased communication participation, spontaneity, and risk-taking in
speech. Raising hand in class, using telephone regularly, increased
socialization with peers, increased participation in social groups, increased
self-esteem and confidence in speaking, dating, no longer avoiding certain
Natural affect. Increased eye contact, more relaxed body postures, increased
speech naturalness, effective nonverbal communication skills and gestures.
It was found that some subjects read the sentences without stuttering but stuttered while speaking to the author during the test.
To what extent do the frequencies of these disfluency (hesitation) types allow PWS to be distinguished from normal speakers? Representative data on the frequency of occurrence of these behaviours in the spontaneous speech of the ten test subjects are presented in Table 5.1. For example, subject A was diagnosed as having a stuttering problem with the following characteristics: part-word repetitions, prolongations, broken words, tense pauses, incomplete phrases and abnormal speaking rate.
What appears to distinguish CWS from children who do not stutter is the increased frequency of repetitions of words, phrases and syllables and, to a lesser extent, of prolonged sounds and broken words. It was also observed that the children who stutter had a greater number of repetition units per disfluency. The frequency of occurrence of stuttering is categorized by stuttering characteristic. These results suggest that stuttering is not a dichotomy but a continuous scale. On this scale, children who display an increased frequency of certain types of disfluency are considered to be stuttering.
Table 5.1: Occurrence Frequency of Stuttering Behaviours in Test Subjects
Stuttering Characteristics (columns): Single & Multisyllabic Word Repetitions | Prolongations | Broken Words | Tense Pauses | Incomplete Phrases | Interjections | Revisions | Abnormal Speaking Rate | Abnormal Loudness/Pitch Level
Table 5.2 shows the range and quartile distribution of the frequency indices of
disfluencies for each of the ten stuttering characteristics. The lowest index is the
frequency of disfluencies per total words spoken by the subject who had the fewest
occurrences; the highest index is the frequency found in the subject who had the most
occurrences. Q2 (the median) is the frequency exceeded by 50 percent of the subjects;
Q1 is the frequency exceeded by 75 percent of the subjects; and Q3 is the frequency
exceeded by 25 percent of the subjects.
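The range and quartile summary described above can be computed for each stuttering characteristic as in the following C++ sketch. It assumes that the frequency indices have already been formed as disfluencies per total words. For the quartiles it uses linear interpolation between sorted values, which is one common convention; the thesis does not say which convention was used. The function and struct names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Frequency index of one stuttering characteristic for one subject:
// number of disfluencies divided by the total words the subject spoke.
double frequencyIndex(int disfluencies, int totalWords) {
    assert(totalWords > 0);
    return static_cast<double>(disfluencies) / totalWords;
}

// p-th quantile (0 <= p <= 1) by linear interpolation between sorted values.
// Assumption: the thesis does not name its quartile convention.
double quantile(std::vector<double> v, double p) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    double pos = p * static_cast<double>(v.size() - 1);
    std::size_t lo = static_cast<std::size_t>(std::floor(pos));
    std::size_t hi = static_cast<std::size_t>(std::ceil(pos));
    return v[lo] + (pos - static_cast<double>(lo)) * (v[hi] - v[lo]);
}

// One row of a Table 5.2-style summary (hypothetical type).
struct RangeQuartiles {
    double lowest, q1, q2, q3, highest;
};

// Summarize the indices of all subjects for one characteristic.
RangeQuartiles summarize(const std::vector<double>& indices) {
    return {quantile(indices, 0.0), quantile(indices, 0.25),
            quantile(indices, 0.5), quantile(indices, 0.75),
            quantile(indices, 1.0)};
}
```

Q1 is the 25th percentile because 75 percent of subjects exceed it, and Q3 is the 75th percentile for the same reason.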
Table 5.2: The Range and Quartile Distribution of the Frequency Indices for Stuttering Characteristics
Stuttering Characteristic Categories (rows): Single & Multisyllabic Word Repetitions; Broken Words; Tense Pauses; Incomplete Phrases; Abnormal Speaking Rate; Abnormal Loudness or Pitch Level; Part-word Repetitions
Results Generated by Software
As stated in previous chapters, the SLP can use the scoring displayed in the text file
or personal history file to determine suitable stuttering therapy techniques for each
client. Table 5.3 and Table 5.4 show the scoring generated by the software for the test
subjects and the control data, respectively. Figure 5.16 shows the utterances attempted,
the date and time of each attempt, and the attempt scores for start of speech, end of
speech, maximum magnitude and duration.
Table 5.3: Software Scoring for Test Subjects
Scoring (%): Technique Shadowing | Technique Metronome | Technique DAF
Table 5.4: Software Scoring for Control Data
Scoring (%): Technique Shadowing | Technique Metronome | Technique DAF
Rows: Late Start; Early End; Too Low; Too Loud
Figure 5.16: The Display of the Information of Attempted Utterances
Figure 5.17 compares the scoring of the normal control subjects with that of the
"Late Start" subjects. The blue bars indicate the average start location score of the
normal control subjects, while the red bars show the average start location score of the
subjects who tend to start their speech later than the expected start location. As
described in Chapter 4, a score of 100% is assigned for the start location parameter if
the subject can align his or her utterance within 100 ms of the SLP’s utterance. The
score is reduced by 10% for each additional 100 ms by which the two locations differ.
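The tiered rule used for the start location parameter can be sketched as one function. With 4000 amplitude units in place of 100 ms, the same rule covers maximum magnitude. This is an illustration, not the thesis code. The name tieredScore is hypothetical, and the sketch makes three assumptions: "within" includes the boundary, each started extra step costs 10 percentage points, and scores do not fall below 0.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: score (in percent) for one compared parameter.
//   difference : client value minus SLP value (ms for start/end location,
//                raw amplitude units for maximum magnitude)
//   tolerance  : full-score band, also used as the penalty step
//                (100 ms for locations, 4000 for maximum magnitude)
// Assumptions: |difference| <= tolerance scores 100, every started extra
// step costs 10 percentage points, and the score is clamped at 0.
int tieredScore(double difference, double tolerance) {
    double d = std::fabs(difference);
    if (d <= tolerance) return 100;
    int extraSteps = static_cast<int>(std::ceil((d - tolerance) / tolerance));
    int score = 100 - 10 * extraSteps;
    return score > 0 ? score : 0;
}
```

Under these assumptions, a client who starts 250 ms away from the SLP's start location would score 80%.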
[Bar chart: Scoring Comparison, control vs. Late Start subjects; x-axis: Therapy Techniques]
Figure 5.17: The Scoring Comparison for Start Location Parameter
Figure 5.18 compares the scoring of the normal control subjects with that of the
"Early End" subjects. The blue bars indicate the average end location score of the
normal control subjects, while the red bars show the average end location score of the
subjects who tend to end their speech earlier than the expected end location. As
described in Chapter 4, a score of 100% is assigned for the end location parameter if
the subject can align his or her utterance within 100 ms of the SLP’s utterance. The
score is reduced by 10% for each additional 100 ms by which the two locations differ.
[Bar chart: Scoring Comparison, control vs. Early End subjects; x-axis: Therapy Techniques]
Figure 5.18: The Scoring Comparison for End Location Parameter
Figure 5.19 compares the scoring of the normal control subjects with that of the
"Too Low Amplitude" subjects. The blue bars indicate the average maximum magnitude
score of the normal control subjects, while the red bars show the average maximum
magnitude score of the subjects who tend to speak at an extremely low sound level. As
described in Chapter 4, a score of 100% is assigned if the difference between the
client’s and the SLP’s maximum magnitude is less than 4000. The score is reduced by
10% for each additional 4000 by which the maximum values differ.
[Bar chart: Scoring Comparison, control vs. Too Low subjects; x-axis: Therapy Techniques]
Figure 5.19: The Scoring Comparison for Maximum Magnitude Location Parameter
Figure 5.20 compares the scoring of the normal control subjects with that of the
"Too High Amplitude" subjects. The blue bars indicate the average maximum magnitude
score of the normal control subjects, while the red bars show the average maximum
magnitude score of the subjects who tend to speak at an extremely high sound level. As
described in Chapter 4, a score of 100% is assigned if the difference between the
client’s and the SLP’s maximum magnitude is less than 4000. The score is reduced by
10% for each additional 4000 by which the maximum values differ.
[Bar chart: Scoring Comparison, control vs. Too Loud subjects; x-axis: Therapy Techniques]
Figure 5.20: The Scoring Comparison for Maximum Magnitude Location Parameter
As shown in Figure 5.21, among the three therapy techniques, test subjects on average
achieved the highest score, 75%, with the Metronome technique. The Shadowing and
DAF techniques yielded the same average score.
[Bar chart: Average Score of Test Subjects; y-axis: Average Score; x-axis: Therapy techniques]
Figure 5.21: The Average Score of Each Therapy Technique
Results Analysis by SLP
The SLP carried out the result analyses in the speech therapy room on level 3 of the
Polyclinic, Hospital Sultanah Aminah.
The speech samples were presented to the SLP in both audiovisual and audio-only
modes, and the SLP judged each sample individually. Recording took about 5 to 10
minutes per test subject. Any part-word repetition, prolongation or block was
considered a stuttering episode; %SS does not include counts of normal disfluencies.
Measures of %SS were made by an SLP with 4 years of experience in treating and
measuring stuttering, who was independent of the study and had no knowledge of the
participants.
knowledge of the participants. The SLP knew the topic of the research but not its
%SS is calculated as the number of syllables containing unambiguous stuttering of any
type, divided by the total number of syllables assessed, multiplied by 100 to give a
percentage. It is efficient because, with relatively little practice, the frequency of
breaks can be counted reliably in both reading and conversational speech. Counts can
be obtained by shadowing the subject's syllable production and marking the syllables
on which stuttering occurs. The marking can be done with a keyboard or by hand, using
dots for fluent syllables and dashes for stuttered syllables.
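The dot-and-dash counting procedure maps directly onto the formula %SS = 100 × (stuttered syllables) / (total syllables assessed). The sketch below assumes a hypothetical marking convention in which '.' is a fluent syllable and '-' is a stuttered one. It ignores every other character and returns 0 for an empty transcript, which is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>

// Percentage of stuttered syllables from a hand-marked transcript in which
// each assessed syllable is written as '.' (fluent) or '-' (stuttered).
// Any other character (spaces, line breaks) is ignored.
double percentStutteredSyllables(const std::string& marks) {
    int total = 0, stuttered = 0;
    for (char c : marks) {
        if (c == '.') {
            ++total;
        } else if (c == '-') {
            ++total;
            ++stuttered;
        }
    }
    return total == 0 ? 0.0 : 100.0 * stuttered / total;
}
```

For example, the marking ". . - . . - . ." (8 syllables, 2 stuttered) gives 25 %SS.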
Although %SS is commonly referred to as a stuttering rate or stuttering frequency
measure, in the arithmetical sense it is a proportion. Hence, %SS is always a number
between 0 and 100 inclusive.
Identification of stuttering was based on the stuttering taxonomy of [98].
Words were coded as stuttered if they contained any type of repeated movement
(whole syllable repetitions, incomplete syllable repetitions, or multisyllable unit
repetitions) or any type of fixed articulatory posture (with or without audible airflow).
Each word was coded as stuttered only once, regardless of the number of different
types of stuttering present within the word. Interjections such as “ah” or “um”
were not counted or analyzed.
Following [24], the SLP measured %SS for the three techniques, as shown in Table 5.5.
Based on Table 5.5, two subjects (Subjects G and H) were diagnosed as having mild
stuttering, two (Subjects C and E) mild-to-moderate stuttering, three (Subjects B, D
and F) moderate stuttering, two (Subjects A and I) moderate-to-severe stuttering, and
one (Subject J) severe stuttering. Of the ten subjects, only one had a history of
receiving traditional speech therapy through his school system. The %SS
measurements verified the scoring generated by the software, as discussed in
Section 5.5.3.
Table 5.5: %SS for Each Therapy Technique
Percentage of Stuttered Syllables (%SS): Shadowing | Metronome | DAF
None of the subjects reported any experience with DAF in previous speech therapy.
Nevertheless, the SLP judged the fluent speech produced using DAF in the
experiments to sound natural.
Comparison between Software and SLP Analyses
Figure 5.22 compares the results generated by the developed software with the results
of the SLP's analyses. The data indicate that the SLP agreed with every therapy
technique selected by the software scoring, so the agreement between the software
and the SLP's determination was 100%. The software has demonstrated great potential
to help the SLP determine a suitable therapy technique for each PWS.
For example, the software scoring suggested that the SLP should use the DAF technique
in future speech therapy for Subject A, because Subject A stuttered least while using
DAF. Choosing the right therapy technique may reduce the number of therapy sessions
required, and the subject may recover in a shorter time. During the assessment, the SLP
also agreed on the use of DAF for Subject A, because Subject A had the lowest %SS with
this technique.
For Subject J, the software scoring showed that he scored highest with the Shadowing
technique during the clinical trials. The SLP listened to his speech recordings and
calculated %SS manually. As shown in Table 5.5, his %SS was lowest with Shadowing,
which matches the software scoring. The same method was used to compare the
software scoring with the SLP's manual calculation of %SS for each test subject.
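The comparison procedure above reduces to a simple rule for each subject. The software recommends the technique with the highest score, and the SLP recommends the technique with the lowest %SS. Agreement is the fraction of subjects for whom the two recommendations coincide. The C++ sketch below is a minimal illustration. Its types and function names are hypothetical, and ties are resolved in favour of the first technique in the order Shadowing, Metronome, DAF; the thesis does not specify how ties are handled.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// The three computer-based techniques, in a fixed order.
enum Technique { Shadowing = 0, Metronome = 1, DAF = 2 };

// Per-subject results: software score (%) and SLP-measured %SS per technique.
struct SubjectResult {
    std::array<double, 3> softwareScore;
    std::array<double, 3> percentSS;
};

// Software recommendation: technique with the highest score (first on ties).
Technique softwareChoice(const SubjectResult& r) {
    auto it = std::max_element(r.softwareScore.begin(), r.softwareScore.end());
    return static_cast<Technique>(std::distance(r.softwareScore.begin(), it));
}

// SLP recommendation: technique with the lowest %SS (first on ties).
Technique slpChoice(const SubjectResult& r) {
    auto it = std::min_element(r.percentSS.begin(), r.percentSS.end());
    return static_cast<Technique>(std::distance(r.percentSS.begin(), it));
}

// Fraction (0..1) of subjects for whom both methods pick the same technique.
double agreement(const std::vector<SubjectResult>& subjects) {
    if (subjects.empty()) return 0.0;
    std::size_t same = 0;
    for (const auto& s : subjects)
        if (softwareChoice(s) == slpChoice(s)) ++same;
    return static_cast<double>(same) / subjects.size();
}
```

The 100% figure reported above corresponds to agreement() returning 1.0 over all test subjects.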
Table 5.6: Comparison between the Determination of Therapy Technique for Each Test Subject by Software and SLP
Software: Technique Shadowing | Technique Metronome | Technique DAF
SLP: Technique Shadowing | Technique Metronome | Technique DAF
System effectiveness is the extent to which a system or software employed in the field
does what it is intended to do for a specific population [99]. The software and SLP
analyses led to the following common observations:
Certain sounds are more likely to be stuttered than other sounds, mainly
consonants, but with wide individual variations as to what particular sounds
are problematic, although word-initial sounds are often a major determinant.
Certain parts of speech are more likely to be stuttered than others, namely
words belonging to open word classes, such as adjectives, nouns, adverbs
and verbs.
The position of a word in a sentence affects the degree of difficulty it presents
to the client, the first three words of a sentence being stuttered more often
than words occurring later.
Longer words seem to be stuttered more often than shorter words.
Description of Individual Test Subjects
The ten test subjects are described in Table 5.7. For each subject, the table gives
gender, age, stuttering severity, software scoring, %SS and the suitable therapy
technique. Each subject responds in unique ways to different therapy techniques: for
some subjects one approach worked best, while for others a different method came
first. The uniqueness of each individual PWS
prevents any specific recommendations of therapy techniques from being universally
Table 5.7: Description of Individual Test Subjects
Scoring | %SS (one pair of columns for each therapy technique)
14.86% Shadowing
11.54% Shadowing
11.90% Metronome
13.51% Metronome
11.54% Shadowing
35.87% Shadowing
This chapter describes the implementation and verification of the clinical trial for our
computer-based Malay stuttering assessment system. The assessment system was
tested and verified successfully in the clinical trial. Test subjects were selected from 6
primary schools in Skudai, Johor, and were assessed at the Speech Therapy Unit,
Hospital Sultanah Aminah, Johor Bahru. A total of 11 subjects participated, 10 males
and 1 female. The results generated by the software and the SLP's analyses are
detailed, with tables and graphs, in Section 5.5.1 and Section 5.5.2, respectively, and
the two methods are compared in Section 5.5.3. The data indicate that the SLP agreed
with all the therapy techniques determined by the software scoring. The software has
demonstrated great potential to help the SLP determine a suitable therapy technique
for each PWS.
This thesis outlines the work on developing and implementing the computer-based
Malay stuttering assessment system. The assessment system provides computer-based
versions of stuttering therapy techniques. It generates scoring that helps the SLP
determine suitable therapy techniques for each client more quickly.
The computer-based stuttering therapy techniques developed in this project consist of
Shadowing, Metronome and DAF. The hardware comprises a microphone, earphones
and a desktop computer equipped with a sound card, while the software comprises the
OS, the development tools and the necessary drivers. Windows XP was chosen as the
OS because of its availability and familiarity. The software development involved in
implementing the computer-based assessment system is described in detail in
Chapter 4. In addition, a clinical trial was carried out successfully to verify the
operation of the computer-based stuttering assessment system.
The software was developed based on standard FS techniques used in fluency
rehabilitation regimens. DSP techniques were implemented to analyze the speech
signals. The maximum magnitudes of the clients’ and SLPs’ speech signals,
corresponding to the AMPs, were determined and compared. The maximum magnitude
was obtained by summing 15 neighbouring samples at a time and taking the largest
sum. Start location, end location, maximum magnitude and duration were compared
between the clients’ and SLPs’ AMPs to generate scoring. These computational
analyses help the SLP determine suitable techniques more quickly.
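One reading of the maximum magnitude measure, in which 15 neighbouring samples are summed and the largest sum is kept, is sketched below as a sliding-window sum. Summing absolute values, sliding the window one sample at a time and using 16-bit samples are all assumptions; the text does not confirm these details. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Result of the hypothetical maximum-magnitude search.
struct MaxMagnitude {
    long long value;    // largest windowed sum of |sample|
    std::size_t start;  // index of the first sample in that window
};

// Slide a window of `window` neighbouring samples over the signal, sum the
// absolute sample values in each window, and keep the largest sum.
// Assumptions: absolute values, one-sample hop, 16-bit PCM samples.
MaxMagnitude maxMagnitude(const std::vector<std::int16_t>& x, std::size_t window = 15) {
    MaxMagnitude best{0, 0};
    if (window == 0 || x.size() < window) return best;
    long long sum = 0;
    for (std::size_t i = 0; i < window; ++i) sum += std::abs(static_cast<int>(x[i]));
    best = {sum, 0};
    for (std::size_t i = window; i < x.size(); ++i) {
        // Move the window one sample to the right: add the newest, drop the oldest.
        sum += std::abs(static_cast<int>(x[i])) - std::abs(static_cast<int>(x[i - window]));
        if (sum > best.value) best = {sum, i - window + 1};
    }
    return best;
}
```

A running sum keeps the search linear in the number of samples. Recomputing each 15-sample sum from scratch would multiply the work by the window length.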
Clinical trials on control data and test subjects were also carried out to make sure
that the system runs properly before its practicality was verified. Control data were
collected from university students; participants in the control group were regarded by
themselves and by the SLP as normally fluent. Test subjects were selected from 6
primary schools in Skudai, Johor, and were aged between 8 and 12 years. The subjects
had no prior familiarity with speech technology.
Of the data collected, a total of 356 subject utterances, including both control and test
subject data, were analyzed in the present work. In time, this amounts to 2136
seconds, or 35.6 minutes, of speech. The control data total 450 seconds (7.5 minutes)
over 75 utterances, while the test subject data total 1686 seconds (28.1 minutes) over
281 utterances.
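The durations above are mutually consistent if each utterance lasts 6 s. This per-utterance length is inferred from the totals and is not stated in the text.

$$
\begin{aligned}
75 \times 6\,\text{s} &= 450\,\text{s} = 7.5\,\text{min} && \text{(control data)}\\
281 \times 6\,\text{s} &= 1686\,\text{s} = 28.1\,\text{min} && \text{(test subject data)}\\
356 \times 6\,\text{s} &= 2136\,\text{s} = 35.6\,\text{min} && \text{(total, } 75 + 281 = 356\text{)}
\end{aligned}
$$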
The software scoring was compared with the %SS calculated by the SLP. The data
indicate that the SLP agreed with all the therapy techniques determined by the
software scoring, giving 100% agreement between the software and the SLP. The
software has demonstrated great potential to help the SLP determine a suitable
therapy technique for each PWS.
Our hope is that researchers from different fields will join forces to advance our
knowledge of this disorder and its treatment. Without such collaboration, progress will
remain as slow as it has been over the last several decades. We believe this software
tool will improve the effectiveness and availability of stuttering assessment. We also
hope that it will provide insights into the implementation of computer-based Malay
stuttering assessment systems in Malaysia.
Future Works
This thesis has concentrated on the development and verification of the
computer-based Malay stuttering assessment system. It was written in the hope of
stimulating stuttering assessment ideas that can be incorporated into current best
practice. The existing work can be further improved and enhanced. Suggestions for
future work include:
To introduce more computer-based stuttering therapy techniques, so
that clients are exposed to more therapy techniques, thus increasing
the accuracy of determining a suitable technique for each client.
Future research similar in design to this study but drawing on
different samples is likely to find additional guideposts that will also
help the SLP select suitable therapy techniques for a particular client.
To re-examine the current outcomes with larger and more diverse
samples, and to investigate the topic through innovative methodologies
in an effort to inform treatment directions in stuttering.
Due to the well-documented variability of stuttering within subjects,
speech samples should ideally be obtained under multiple conditions
and on multiple occasions. This can be particularly important for
young children, as stuttering has been reported to fluctuate greatly
over time and sometimes cease entirely. Long-term assessments are
To identify specific therapy procedures used in the stuttering therapy
techniques that contribute the most to successful treatment outcomes
as well as variables that are responsible for treatment failures based on
the scoring generated by software. PWS deserve nothing less than
rigorously tested and empirically supported treatments. Therefore,
future research needs to identify new critical variables for study if
sounder therapy techniques efficacy or evaluations are to become
Beyond finding the suitable therapy technique for each client (“Which
treatment is best?”), the system could be extended to identify and
match the characteristics of a client with the competencies and
therapeutic philosophy of an SLP (or mentor), in order to promote a
working alliance that is likely to result in a successful therapeutic outcome.
Further development of the animation engine that changes animations
to follow client improvement.
Modification of the software for other patient populations, such as
people with multiple sclerosis or hearing impairment.
Wave files have a master RIFF chunk which includes a WAVE identifier
followed by sub-chunks. The data is stored in little-endian byte order.
Field | Length | Contents
ckID | 4 | Chunk ID: "RIFF"
cksize | 4 | Chunk size: 4+n
WAVEID | 4 | WAVE ID: "WAVE"
WAVE chunks | n | Wave chunks containing format information and sampled data
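Checking the master chunk described in the table amounts to testing two four-character codes and reading one little-endian 32-bit size. The following portable C++ sketch is not the thesis code, and its helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Read a 32-bit unsigned integer stored little-endian (lowest byte first).
std::uint32_t readLE32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Validate the 12-byte master chunk: "RIFF", chunk size (4 + n), "WAVE".
// On success the RIFF chunk size is stored in riffSize.
bool checkRiffWaveHeader(const std::vector<unsigned char>& file, std::uint32_t& riffSize) {
    if (file.size() < 12) return false;
    if (std::memcmp(file.data(), "RIFF", 4) != 0) return false;
    if (std::memcmp(file.data() + 8, "WAVE", 4) != 0) return false;
    riffSize = readLE32(file.data() + 4);
    return true;
}
```

The chunk size bytes 0x24 0x00 0x00 0x00 decode to 36: 4 bytes for "WAVE" plus 32 bytes of sub-chunks.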
Format Chunk
The Format chunk specifies the format of the data. There are 3 variants of
the Format chunk for sampled data. These differ in the extensions to the basic
Format chunk.
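Field | Length | Contents
ckID | 4 | Chunk ID: "fmt "
cksize | 4 | Chunk size: 16 or 18 or 40
wFormatTag | 2 | Format code
nChannels | 2 | Number of interleaved channels
nSamplesPerSec | 4 | Sampling rate (blocks per second)
nAvgBytesPerSec | 4 | Data rate
nBlockAlign | 2 | Data block size (bytes)
wBitsPerSample | 2 | Bits per sample
cbSize | 2 | Size of the extension (0 or 22)
wValidBitsPerSample | 2 | Number of valid bits
dwChannelMask | 4 | Speaker position mask
SubFormat | 16 | GUID, including the data format code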
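The byte offsets in the Format chunk table can be read with little-endian helpers, as in this sketch. It parses only the chunk body (the bytes after the chunk ID and chunk size) and accepts the three sizes named above. The field names follow the table; the helper and struct names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Little-endian readers for 16- and 32-bit unsigned fields.
std::uint16_t readLE16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
std::uint32_t readLE32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Fields of the Format chunk body, named as in the table above.
struct FmtChunk {
    std::uint16_t wFormatTag = 0, nChannels = 0;
    std::uint32_t nSamplesPerSec = 0, nAvgBytesPerSec = 0;
    std::uint16_t nBlockAlign = 0, wBitsPerSample = 0;
    std::uint16_t cbSize = 0;               // present when the body is 18 or 40 bytes
    std::uint16_t wValidBitsPerSample = 0;  // extensible (40-byte) body only
    std::uint32_t dwChannelMask = 0;        // extensible body only
    std::uint16_t subFormatCode = 0;        // first two bytes of the SubFormat GUID
};

// Parse a Format chunk body of 16, 18 or 40 bytes; reject any other size.
bool parseFmt(const std::vector<unsigned char>& body, FmtChunk& f) {
    std::size_t n = body.size();
    if (n != 16 && n != 18 && n != 40) return false;
    const unsigned char* p = body.data();
    f.wFormatTag = readLE16(p);
    f.nChannels = readLE16(p + 2);
    f.nSamplesPerSec = readLE32(p + 4);
    f.nAvgBytesPerSec = readLE32(p + 8);
    f.nBlockAlign = readLE16(p + 12);
    f.wBitsPerSample = readLE16(p + 14);
    if (n >= 18) f.cbSize = readLE16(p + 16);
    if (n == 40) {
        f.wValidBitsPerSample = readLE16(p + 18);
        f.dwChannelMask = readLE32(p + 20);
        f.subFormatCode = readLE16(p + 24);
    }
    return true;
}
```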
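The standard format codes for waveform data are given below. Many more format
codes exist for compressed data, a good fraction of which are now obsolete.
Format Code | PreProcessor Symbol | Data
0x0001 | WAVE_FORMAT_PCM | PCM
0x0003 | WAVE_FORMAT_IEEE_FLOAT | IEEE float
0x0006 | WAVE_FORMAT_ALAW | 8-bit ITU-T G.711 A-law
0x0007 | WAVE_FORMAT_MULAW | 8-bit ITU-T G.711 µ-law
0xFFFE | WAVE_FORMAT_EXTENSIBLE | Determined by SubFormat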
PCM Format
The first part of the Format chunk is used to describe PCM data.
For PCM data, the Format chunk in the header declares the number of
bits/sample in each sample (wBitsPerSample). The number of bits per
sample is to be rounded up to the next multiple of 8 bits. This rounded-up
value is the container size. This information is redundant in that the container
size (in bytes) for each sample can also be determined from the block size
divided by the number of channels (nBlockAlign / nChannels).
This redundancy has been appropriated to define new formats. For
instance, Cool Edit uses a format which declares a sample size of 24
bits together with a container size of 4 bytes (32 bits) determined from
the block size and number of channels. With this combination, the
data is actually stored as 32-bit IEEE floats. The normalization (full
scale 2^23) is, however, different from the standard float format.
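The two ways of obtaining the container size can be expressed as two small functions. The Cool Edit case described above is the one in which they disagree. The function names are hypothetical.

```cpp
#include <cassert>

// Container size in bytes from wBitsPerSample: round up to a multiple of 8 bits.
unsigned containerBytesFromBits(unsigned wBitsPerSample) {
    return (wBitsPerSample + 7) / 8;
}

// Container size in bytes from the block size: nBlockAlign / nChannels.
unsigned containerBytesFromBlock(unsigned nBlockAlign, unsigned nChannels) {
    return nChannels == 0 ? 0 : nBlockAlign / nChannels;
}
```

For ordinary PCM files the two results agree. A reader that finds them different (for example, 24 declared bits but a 4-byte container) is looking at a format that reuses this redundancy.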
PCM data is two's-complement except for resolutions of 1-8 bits, which are
represented as offset binary.
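The two PCM encodings can be decoded to a normalized value as follows. Scaling by 128 and 32768, so that negative full scale maps to -1.0, is an assumption made for illustration. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// 8-bit PCM is offset binary: 128 is silence and 0 is the most negative value.
double decode8(std::uint8_t b) {
    return (static_cast<int>(b) - 128) / 128.0;
}

// 16-bit PCM is two's complement, stored little-endian (low byte first).
double decode16(std::uint8_t lo, std::uint8_t hi) {
    int v = (hi << 8) | lo;         // unsigned value 0..65535
    if (v >= 0x8000) v -= 0x10000;  // reinterpret as two's complement
    return v / 32768.0;
}
```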
Non-PCM Formats
An extended Format chunk is used for non-PCM data. The cbSize field gives
the size of the extension.
For all formats other than PCM, the Format chunk must have an extended
portion. The extension can be of zero length, but the size field (with value 0)
must be present.
For float data, full scale is 1. The bits/sample would normally be 32 or 64.
For the log-PCM formats (µ-law and A-law), the bits/sample field
(wBitsPerSample) should be set to 8 bits.
The non-PCM formats must have a Fact chunk.
Extensible Format
The WAVE_FORMAT_EXTENSIBLE format code indicates that there is an
extension to the Format chunk. The extension has one field which declares the
number of "valid" bits/sample (wValidBitsPerSample). Another field
(dwChannelMask) contains bits which indicate the mapping from channels to
loudspeaker positions. The last field (SubFormat) is a 16-byte globally unique
identifier (GUID).
With the WAVE_FORMAT_EXTENSIBLE format, the original bits/sample
field (wBitsPerSample) must match the container size (8 * nBlockAlign /
nChannels). This means that wBitsPerSample must be a multiple of 8.
Reduced precision within the container size is now specified by wValidBitsPerSample.
The number of valid bits (wValidBitsPerSample) is informational only. The
data is correctly represented in the precision of the container size. The
number of valid bits can be any value from 1 to the container size in bits.
The loudspeaker position mask uses 18 bits, each bit corresponding to a
speaker position (from Front Left to Top Back Right), to indicate the channel to
speaker mapping. This field is informational. An all-zero field indicates that
channels are mapped to outputs in order: first channel to first output, second
channel to second output, etc.
The first two bytes of the GUID form the sub-code specifying the data format
code, for example, WAVE_FORMAT_PCM. The remaining 14 bytes
contain a fixed string,
The WAVE_FORMAT_EXTENSIBLE format should be used whenever:
PCM data has more than 16 bits/sample.
The number of channels is more than 2.
The actual number of bits/sample is not equal to the container size.
The mapping from channels to speakers needs to be specified.
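The four conditions above can be combined into a single predicate. This sketch uses a hypothetical name. It interprets "actual number of bits/sample" as the valid bits and takes the container size in bits.

```cpp
#include <cassert>

// True when the rules above call for WAVE_FORMAT_EXTENSIBLE.
//   isPcm              : the data is PCM
//   validBits          : actual number of bits/sample
//   containerBits      : 8 * container size in bytes
//   nChannels          : number of channels
//   needSpeakerMapping : channel-to-speaker mapping must be specified
bool shouldUseExtensible(bool isPcm, unsigned validBits, unsigned containerBits,
                         unsigned nChannels, bool needSpeakerMapping) {
    return (isPcm && validBits > 16) ||
           nChannels > 2 ||
           validBits != containerBits ||
           needSpeakerMapping;
}
```

By this rule, the 16-bit mono PCM recordings used in this work do not need the extensible format.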
Fact Chunk
All (compressed) non-PCM formats must have a Fact chunk. The chunk
contains at least one value, the number of samples in the file.
Field | Length | Contents
ckID | 4 | Chunk ID: "fact"
cksize | 4 | Chunk size: minimum 4
dwSampleLength | 4 | Number of samples (per channel)
The Fact chunk "is required for all new WAVE formats", but "is not
required for the standard WAVE_FORMAT_PCM files". One presumes that
files with IEEE float data need a Fact chunk.
The number of samples field is redundant for sampled data, since the Data
chunk indicates the length of the data. The number of samples can be
determined from the length of the data and the container size as determined
from the Format chunk.
There is an ambiguity as to the meaning of "number of samples" for multichannel data. It should be interpreted as "number of samples per channel".
The statement is:
"The <nSamplesPerSec> field from the wave format header is used in
conjunction with the <dwSampleLength> field to determine the length of the
data in seconds."
With no mention of the number of channels in this computation, this implies
that dwSampleLength is the number of samples per channel.
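Under the per-channel interpretation, the number of samples and the duration can be obtained as shown below. The helper names are hypothetical. The first function uses the Data chunk length and nBlockAlign, because one block holds one sample for every channel.

```cpp
#include <cassert>
#include <cstdint>

// Samples per channel from the Data chunk length: one block (nBlockAlign
// bytes) contains one sample for every channel.
std::uint32_t samplesPerChannel(std::uint32_t dataBytes, std::uint16_t nBlockAlign) {
    return nBlockAlign == 0 ? 0 : dataBytes / nBlockAlign;
}

// Duration in seconds, reading dwSampleLength as samples per channel.
double durationSeconds(std::uint32_t dwSampleLength, std::uint32_t nSamplesPerSec) {
    return nSamplesPerSec == 0 ? 0.0 : static_cast<double>(dwSampleLength) / nSamplesPerSec;
}
```

For a 16-bit stereo file with 64000 data bytes at 16 kHz, this gives 16000 samples per channel and 1 s. Treating the count as a total over both channels would wrongly give 2 s.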
There is a question as to whether the Fact chunk should be used for
WAVE_FORMAT_EXTENSIBLE files (including those with PCM data). One example
of a WAVE_FORMAT_EXTENSIBLE file with PCM data from Microsoft does not have a
Fact chunk.
Data Chunk
The Data chunk contains the sampled data.
Field | Length | Contents
ckID | 4 | Chunk ID: "data"
cksize | 4 | Chunk size: n
sampled data | n | Samples
pad byte | 0 or 1 | Padding byte if n is odd
PCM Data
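(M: bytes per sample; Nc: number of channels; Ns: number of samples per channel; F: sampling rate in blocks per second)
Field | Length | Contents
ckID | 4 | Chunk ID: "RIFF"
cksize | 4 | Chunk size: 4 + 24 + (8 + M * Nc * Ns + (0 or 1))
WAVEID | 4 | WAVE ID: "WAVE"
ckID | 4 | Chunk ID: "fmt "
cksize | 4 | Chunk size: 16
wFormatTag | 2 | WAVE_FORMAT_PCM
nChannels | 2 | Nc
nSamplesPerSec | 4 | F
nAvgBytesPerSec | 4 | F * M * Nc
nBlockAlign | 2 | M * Nc
wBitsPerSample | 2 | rounds up to 8 * M
ckID | 4 | Chunk ID: "data"
cksize | 4 | Chunk size: M * Nc * Ns
sampled data | M * Nc * Ns | Nc * Ns channel-interleaved M-byte samples
pad byte | 0 or 1 | Padding byte if M * Nc * Ns is odd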
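The size expressions in the PCM table can be checked with a short function. The struct and function names are hypothetical. The file size adds the 8 bytes of the RIFF chunk ID and chunk size, which the RIFF chunk size itself does not count.

```cpp
#include <cassert>
#include <cstdint>

// Chunk sizes for a PCM WAVE file with a 16-byte Format chunk and no Fact
// chunk. M = bytes per sample, Nc = channels, Ns = samples per channel.
struct PcmSizes {
    std::uint32_t dataSize;  // "data" chunk size: M * Nc * Ns
    std::uint32_t padBytes;  // 1 if dataSize is odd, else 0
    std::uint32_t riffSize;  // "RIFF" chunk size: 4 + 24 + (8 + dataSize + pad)
    std::uint32_t fileSize;  // total bytes on disk: 8 + riffSize
};

PcmSizes pcmSizes(std::uint32_t M, std::uint32_t Nc, std::uint32_t Ns) {
    PcmSizes s{};
    s.dataSize = M * Nc * Ns;
    s.padBytes = s.dataSize % 2;
    s.riffSize = 4 + 24 + (8 + s.dataSize + s.padBytes);
    s.fileSize = 8 + s.riffSize;
    return s;
}
```

For example, 6 s of 16-bit, 16 kHz mono audio (M = 2, Nc = 1, Ns = 96000) gives a 192000-byte Data chunk and a 192044-byte file, which corresponds to the familiar 44-byte header.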
Non-PCM Data
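Field | Length | Contents
ckID | 4 | Chunk ID: "RIFF"
cksize | 4 | Chunk size: 4 + 26 + 12 + (8 + M * Nc * Ns + (0 or 1))
WAVEID | 4 | WAVE ID: "WAVE"
ckID | 4 | Chunk ID: "fmt "
cksize | 4 | Chunk size: 18
wFormatTag | 2 | Format code
nChannels | 2 | Nc
nSamplesPerSec | 4 | F
nAvgBytesPerSec | 4 | F * M * Nc
nBlockAlign | 2 | M * Nc
wBitsPerSample | 2 | 8 * M (float data) or 16 (log-PCM data)
cbSize | 2 | Size of the extension: 0
ckID | 4 | Chunk ID: "fact"
cksize | 4 | Chunk size: 4
dwSampleLength | 4 | Nc * Ns
ckID | 4 | Chunk ID: "data"
cksize | 4 | Chunk size: M * Nc * Ns
sampled data | M * Nc * Ns | Nc * Ns channel-interleaved M-byte samples
pad byte | 0 or 1 | Padding byte if M * Nc * Ns is odd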
Extensible Format
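Field | Length | Contents
ckID | 4 | Chunk ID: "RIFF"
cksize | 4 | Chunk size: 4 + 48 + 12 + (8 + M * Nc * Ns + (0 or 1))
WAVEID | 4 | WAVE ID: "WAVE"
ckID | 4 | Chunk ID: "fmt "
cksize | 4 | Chunk size: 40
wFormatTag | 2 | WAVE_FORMAT_EXTENSIBLE
nChannels | 2 | Nc
nSamplesPerSec | 4 | F
nAvgBytesPerSec | 4 | F * M * Nc
nBlockAlign | 2 | M * Nc
wBitsPerSample | 2 | 8 * M
cbSize | 2 | Size of the extension: 22
wValidBitsPerSample | 2 | at most 8 * M
dwChannelMask | 4 | Speaker position mask: 0
SubFormat | 16 | GUID (first two bytes are the data format code)
ckID | 4 | Chunk ID: "fact"
cksize | 4 | Chunk size: 4
dwSampleLength | 4 | Nc * Ns
ckID | 4 | Chunk ID: "data"
cksize | 4 | Chunk size: M * Nc * Ns
sampled data | M * Nc * Ns | Nc * Ns channel-interleaved M-byte samples
pad byte | 0 or 1 | Padding byte if M * Nc * Ns is odd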
The Fact chunk can be omitted if the sampled data is in PCM format.
Microsoft Windows Media Player enforces the use of the WAVE_FORMAT_EXTENSIBLE
format code. For instance, a file with 24-bit data declared with the standard
WAVE_FORMAT_PCM format code will not play, but a file with 24-bit data declared as a
WAVE_FORMAT_EXTENSIBLE file with a WAVE_FORMAT_PCM subcode can be played.
This section describes the code developed using Microsoft Visual C++ 6.0.
Audio File Format
Wave files use the standard Resource Interchange File Format (RIFF) structure, which
groups the file's contents, such as the sample format and the digital audio samples,
into separate chunks, each containing its own header and data bytes. RIFF is a
multimedia file format introduced by Microsoft and IBM in the early 1990s. The chunk
header specifies the type and size of the chunk data bytes. Wave files have a master
RIFF chunk which includes a WAVE identifier followed by sub-chunks. The data is
stored in little-endian byte order.
The WAVEFORMATEX structure is initialized for 16-bit, 16 kHz, mono Pulse Code
Modulation (PCM). PCM is a "raw" audio format and is generally the format that audio
hardware interacts with directly. Although some hardware can play other formats
directly, software generally must convert an audio stream to PCM before attempting to
play it.
nBlockAlign is the block alignment in bytes; playback software must process a multiple
of nBlockAlign bytes of data at a time. cbSize is the size, in bytes, of any extra format
information appended to the end of the WAVEFORMATEX structure. For the PCM format,
cbSize is ignored.
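The 16-bit, 16 kHz mono PCM initialization described above can be sketched as follows. To keep the example compilable outside Windows, it uses a hypothetical stand-in struct with the same field names as WAVEFORMATEX rather than the real structure from the Windows headers. The derived fields follow the PCM rules nBlockAlign = nChannels × wBitsPerSample / 8 and nAvgBytesPerSec = nSamplesPerSec × nBlockAlign.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable stand-in with the same field names as WAVEFORMATEX
// (the real structure is declared in the Windows multimedia headers).
struct WaveFormatEx {
    std::uint16_t wFormatTag;
    std::uint16_t nChannels;
    std::uint32_t nSamplesPerSec;
    std::uint32_t nAvgBytesPerSec;
    std::uint16_t nBlockAlign;
    std::uint16_t wBitsPerSample;
    std::uint16_t cbSize;
};

const std::uint16_t kWaveFormatPcm = 1;  // numeric value of WAVE_FORMAT_PCM

// Fill the format for uncompressed PCM; derived fields follow the PCM rules.
WaveFormatEx makePcmFormat(std::uint16_t channels, std::uint32_t rate, std::uint16_t bits) {
    WaveFormatEx wf{};
    wf.wFormatTag = kWaveFormatPcm;
    wf.nChannels = channels;
    wf.nSamplesPerSec = rate;
    wf.wBitsPerSample = bits;
    wf.nBlockAlign = static_cast<std::uint16_t>(channels * bits / 8);  // bytes per block
    wf.nAvgBytesPerSec = rate * wf.nBlockAlign;                        // bytes per second
    wf.cbSize = 0;  // ignored for PCM
    return wf;
}
```

For the settings used here, makePcmFormat(1, 16000, 16) yields a 2-byte block and a data rate of 32000 bytes per second.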
Waveform Display
The x-axis has 13 units, in milliseconds, starting from 500 ms, and TextOut writes each
axis label at a starting point given by logical x- and y-coordinates. Three digits are
allowed after the decimal point. The y-axis has 10 units starting from 5000. MoveTo
moves the current position to the point specified by x and y, and LineTo draws a line
from the current position up to, but not including, the specified point. The device
context (dc) is a Windows data structure containing information about the drawing
attributes of a device such as a display.
The MFC CPen class is used to create a solid pen for drawing solid lines one pixel wide.
The class encapsulates a Windows graphics device interface (GDI) pen. Line 1
represents the SLP’s speech signal, with the amplitude drawn in red, RGB(255, 0, 0),
while line 2 is the client’s amplitude, drawn in blue, RGB(0, 0, 255).
How does the driver "signal" the program? The driver sends messages to the program's
window. For example, the MM_WOM_DONE message is sent each time the driver
finishes playing a given buffer. The parameters of that message include the address of
the buffer (actually the address of the WAVEHDR structure that describes the buffer)
and the device's handle (the handle supplied when the device was opened).
WaveOut sends audio data to a standard Windows audio device in real time. It is
compatible with most popular Windows hardware. The data is sent to the hardware in
uncompressed PCM format and should typically be sampled at one of the standard
Windows audio device rates: 8000, 11025, 16000, 22050 or 44100 Hz.
Since audio devices generate real-time audio output, the software must maintain a
continuous flow of data to the device throughout playback. Delays in passing data to
the audio hardware can result in hardware errors or distorted output. In principle, this
means that the program must supply data to the audio hardware as quickly as the
hardware reads it. However, the program feeding waveOut often cannot match the
throughput rate of the audio hardware, because its execution speed can vary as the host
operating system services other processes. WaveOut must therefore rely on a
buffering strategy to ensure that signal data is available to the hardware on demand.
The code constructs a File Open dialog box for selecting a wave file. The software
checks whether the wave file format is correct and copies the data into a new buffer,
wavedata1, for playback. The CFileDialog class encapsulates the Windows common
file dialog box, which provides an easy way to implement File Open and File Save As
dialog boxes in an application. Playing a wave file requires several steps. First, the
data must be ready to be played. Once the wave data is available, the wave device must
be opened, the wave header prepared, and playback of the wave data started.
The wave output device is opened for playback by calling waveOutOpen. Before
opening the device, though, it is a good idea to query whether the device supports the
format of the wave data. When waveOutOpen is called with the WAVE_FORMAT_QUERY
flag, it checks whether the device supports the given format but does not actually open
the device. If the format is supported, waveOutOpen returns 0 and the process moves on
to opening the device. If the format is not supported, waveOutOpen returns an error
code, usually WAVERR_BADFORMAT.
The address of a variable that receives the handle of the wave output device is passed
as the first parameter: &hWaveOut points to a buffer that receives a handle identifying
the open output device. If waveOutOpen succeeds (returns 0), this handle is used in
subsequent calls to the wave output functions. The second parameter is the
WAVE_MAPPER constant, passed in place of a specific device ID. It tells Windows to
select a wave output device capable of playing the given format. The third parameter,
&waveform, points to the WAVEFORMATEX structure that states the format of the
audio data to be sent to the device. Finally, the CALLBACK_WINDOW flag is passed as
the last parameter. This flag tells Windows to send wave-out messages, such as
MM_WOM_DONE, to the window procedure of the window whose handle is supplied as
the callback parameter. The device is opened as long as the wave format is supported.
The next step in playing a wave file is to prepare a wave header. The wave header is an
instance of the WAVEHDR structure and holds information about the buffer that
contains the wave data: specifically, a pointer to the buffer and the size of the buffer in
bytes. After a wave header structure is created and initialized, waveOutPrepareHeader
is called to prepare it for the currently open wave device. waveOutPrepareHeader
prepares a waveform-audio data block for playback, and it must be called for a buffer
before that buffer is passed to waveOutWrite.
The wave handle hWaveOut obtained from waveOutOpen is passed as the first
parameter of waveOutPrepareHeader, a pointer to the WAVEHDR structure
(pWaveHdr1) as the second, and the size of that structure as the final parameter. At this
point, the wave output device is open, the header has been prepared, and the data in the
buffer is ready to play.
waveOutWrite sends the data block described by a prepared wave header to the output
device. The buffer must be prepared with waveOutPrepareHeader before it is passed to
waveOutWrite. Unless the device has been paused by calling waveOutPause, playback
begins when the first data block is sent to the device. If waveOutWrite succeeds
(returns 0), the wave file starts playing and control returns immediately to the
application. The next step is to detect when the wave file has finished playing, so that
the wave header can be cleaned up and the wave device closed.
Because waveOutWrite returns control immediately, the application remains fully
operational while the wave file is being played. For this reason, the MM_WOM_DONE
message, which is sent to the window procedure when the device has finished with
the data block, must be used to determine when the file has finished playing.
After the wave data has finished playing, the header prepared earlier must be
unprepared and the wave output device closed. waveOutUnprepareHeader cleans
up the preparation performed by waveOutPrepareHeader. As before, the wave
handle is passed as the first parameter, and the second and third parameters are
identical to those passed to waveOutPrepareHeader earlier.
Finally, waveOutClose closes the wave output device. If the device is not closed,
it remains unavailable to other applications and to Windows. Because wave files are
always played asynchronously, the device may still be playing when the application
tries to close it. If the device is still playing a waveform-audio file, the close
operation fails; in other words, waveOutClose fails while buffers are still pending in
the driver. Calling waveOutReset first stops playback and returns all pending buffers
to the application, after which the device can be closed.
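The call order above is easy to get wrong, so the following sketch makes it explicit. It is a toy model written for illustration and is not the Win32 API. ToyWaveOut and its Result codes are hypothetical names. The model rejects writing before the header is prepared. It also refuses to unprepare or close while a block is still queued. The driver's MM_WOM_DONE notification is represented by an explicit driverFinishedBlock() call, and the behaviour of waveOutReset by reset().

```cpp
#include <cassert>

// Toy model (NOT the Win32 API) of the playback sequence:
// open -> prepare header -> write -> MM_WOM_DONE -> unprepare -> close.
enum class Result { Ok, NotOpen, NotPrepared, StillPlaying };

class ToyWaveOut {
public:
    Result open() { open_ = true; return Result::Ok; }

    Result prepare() {
        if (!open_) return Result::NotOpen;
        prepared_ = true;
        return Result::Ok;
    }

    // Like waveOutWrite: queues the block and returns at once.
    Result write() {
        if (!open_) return Result::NotOpen;
        if (!prepared_) return Result::NotPrepared;
        queued_ = true;
        return Result::Ok;
    }

    // Stands in for the driver posting MM_WOM_DONE after playback.
    void driverFinishedBlock() { queued_ = false; }

    // Like waveOutReset: stop playback and hand pending buffers back.
    void reset() { queued_ = false; }

    Result unprepare() {
        if (queued_) return Result::StillPlaying;  // block still in the driver
        prepared_ = false;
        return Result::Ok;
    }

    Result close() {
        if (queued_) return Result::StillPlaying;  // pending buffer: close fails
        open_ = false;
        return Result::Ok;
    }

private:
    bool open_ = false;
    bool prepared_ = false;
    bool queued_ = false;
};
```

In the model, calling close() straight after write() returns StillPlaying. Only after driverFinishedBlock() or reset() do unprepare() and close() succeed. This mirrors the requirement to wait for MM_WOM_DONE before cleaning up.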
PlaySound plays the sound specified by the given file name, “beat.wav”, which
produces a beat every second for the second therapy technique, Metronome. The
nword is defined as 1 second. “beat.wav” is played asynchronously, so PlaySound
returns immediately after beginning the sound. PlaySound searches the current
directory, the Windows directory, the Windows system directory and the directories
listed in the PATH environment variable for the sound file. The sound specified by
“beat.wav” must fit into available physical memory and be playable by the
waveform-audio device driver. For the third therapy technique, DAF, the client's
speech is recorded as he or she talks into a microphone and is played back through
earphones with a delay of 250 milliseconds.
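The 250-millisecond DAF delay can be illustrated at the sample level with a circular delay line. This sketch shows the general technique and is not the thesis implementation, which works with wave-in and wave-out buffers. DelayLine is a hypothetical class, 16-bit mono samples are assumed, and the sampling rate is left as a parameter because it is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Circular delay line for Delayed Auditory Feedback: every input sample
// comes back out delayMs milliseconds later (silence until then).
class DelayLine {
public:
    DelayLine(std::uint32_t sampleRate, std::uint32_t delayMs)
        : buf_(static_cast<std::size_t>(sampleRate) * delayMs / 1000, 0) {
        assert(!buf_.empty());  // a zero-length delay is not meaningful here
    }

    // Push one microphone sample and return the one recorded delayMs ago.
    std::int16_t process(std::int16_t in) {
        std::int16_t out = buf_[pos_];   // oldest sample in the ring
        buf_[pos_] = in;                 // overwrite it with the newest
        pos_ = (pos_ + 1) % buf_.size(); // advance and wrap around
        return out;
    }

    std::size_t delaySamples() const { return buf_.size(); }

private:
    std::vector<std::int16_t> buf_;
    std::size_t pos_ = 0;
};
```

At an assumed 8000 Hz sampling rate, a 250 ms delay corresponds to 8000 × 0.25 = 2000 samples. The first sample spoken therefore reaches the earphones 2000 samples later.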
waveInPrepareHeader prepares a buffer for input. Here hwavein is the handle to the
waveform-audio input device, and pWaveHdr is the pointer to the WAVEHDR
structure, which defines the header used to identify a waveform-audio buffer.
waveInAddBuffer() is used to supply each buffer to the driver. The first two
buffers are supplied to the driver with waveInAddBuffer() before recording starts.
Every time the program is signalled that a buffer has been filled, waveInAddBuffer()
is called again to tell the driver which buffer to use once it finishes filling the buffer it
is currently filling. Like the other header functions, waveInAddBuffer takes the
device handle, the pointer to the WAVEHDR structure and the size of the WAVEHDR
structure in bytes.
waveInStart, given the handle to the waveform-audio input device, starts input
on that device.
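The two-buffer recording scheme described above can be pictured as a queue. The driver always fills the oldest queued buffer, and the application hands each full buffer back after copying its data. The sketch below is a toy simulation for illustration, not the Win32 API. The function simulateRecording is a hypothetical name, and buffers are represented only by their indices.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Toy simulation (not the Win32 API) of double buffering with
// waveInAddBuffer: two buffers are queued before waveInStart, and each
// buffer reported as full is consumed and then re-queued.
std::vector<int> simulateRecording(int buffersToFill) {
    std::deque<int> driverQueue = {0, 1};  // both buffers added before start
    std::vector<int> filledOrder;          // which buffer was filled, in order
    for (int i = 0; i < buffersToFill; ++i) {
        assert(!driverQueue.empty());      // the driver must never run dry
        int full = driverQueue.front();    // driver fills the oldest buffer
        driverQueue.pop_front();
        filledOrder.push_back(full);       // application copies the data...
        driverQueue.push_back(full);       // ...then re-adds the buffer
    }
    return filledOrder;
}
```

With two buffers, the driver alternates between them (0, 1, 0, 1, ...). While the application processes one buffer, the other is already being filled, so no input is lost between buffers.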
Compression and Decompression Using Speex
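LPTSTR is a 32-bit pointer to a TCHAR string. A TCHAR string is a Unicode string when UNICODE is defined and an ANSI string otherwise, so code written with LPTSTR is portable between the two.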
The FindLastOf uses its member function to find the last record that matches the
STARTUPINFO is a structure used with CreateProcess to specify the main
window properties when a new window is created for the new process that runs the
Speex executable. The PROCESS_INFORMATION structure is filled in by
CreateProcess with information about the newly created process and its primary
thread. GetModuleFileName retrieves the full path and file name of the file that
contains the specified module, where szPath is the pointer to the buffer that receives
the path and file name. The length of the buffer is given as MAX_PATH characters. If
the path and file name exceed this limit, the string is truncated.
The &si pointer specifies how the main window of the new process should
appear, and &pi receives the identification information about the newly created
process. WaitForSingleObject returns either when the decoding process has finished
or when the time-out interval elapses, whichever comes first.
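Before CreateProcess can be called, the application needs the directory that contains its executable and a command line for the Speex tool. The sketch below shows that string handling with std::string::find_last_of. The assumptions are stated explicitly here and are not confirmed by the text:
- speexdec.exe is assumed to sit next to the application.
- The file names are hypothetical.
- moduleDirectory and speexDecodeCommand are illustrative helper names.

The command-line order follows the speexdec tool, which takes the input .spx file first and the output file second.

```cpp
#include <cassert>
#include <string>

// Strip the executable name from a module path such as the one returned
// by GetModuleFileName, keeping the trailing separator.
std::string moduleDirectory(const std::string& modulePath) {
    std::string::size_type slash = modulePath.find_last_of("\\/");
    if (slash == std::string::npos) return std::string();  // bare file name
    return modulePath.substr(0, slash + 1);
}

// Build a quoted speexdec command line: input .spx first, output .wav second.
// Quoting keeps paths with spaces intact when passed to CreateProcess.
std::string speexDecodeCommand(const std::string& modulePath,
                               const std::string& spxFile,
                               const std::string& wavFile) {
    return "\"" + moduleDirectory(modulePath) + "speexdec.exe\" \"" +
           spxFile + "\" \"" + wavFile + "\"";
}
```

For a module path of C:\App\Stutter.exe, the resulting command line is "C:\App\speexdec.exe" "a.spx" "a.wav". This string would then be passed to CreateProcess, and WaitForSingleObject would be used on the returned process handle.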
Save Wave Function
The Save Wave function is implemented in the application. The first parameter of
CFileDialog is set to FALSE to construct a File Save As dialog box, which prompts the
client to save the Speex file at the desired location. A temporary path is created for
the wave file: the lstrcat function appends the string wave file.wav to szWavePath,
and the wave file is saved to this temporary location.
A try-catch block is used to stop the operation if the program cannot obtain
the required wave file. An ellipsis (...) is used as the parameter of catch, so the
handler catches any exception regardless of its type. Such a handler can serve as a
default handler that catches all exceptions not caught by other handlers. When an
exception is thrown, control passes to the catch block.
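The following is a minimal illustration of the catch-all handler. obtainWaveFile and saveSpeex are hypothetical stand-ins for the steps of the Save Wave function, and the failure conditions are invented for the example. The point is that catch (...) also traps exception types that do not derive from std::exception.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the step that obtains the wave file.
void obtainWaveFile(const std::string& path) {
    if (path.empty()) throw std::invalid_argument("no path given");
    if (path.find(".wav") == std::string::npos) throw 1;  // non-std type
}

// Returns false instead of continuing when the wave file is unavailable.
bool saveSpeex(const std::string& path) {
    try {
        obtainWaveFile(path);
    } catch (...) {   // catches std::invalid_argument and int alike
        return false;
    }
    // ... the compression step (SpeexEncode) would follow here ...
    return true;
}
```

A handler written as catch (const std::exception&) would miss the thrown int. A catch-all handler is therefore a reasonable last line of defence, although it cannot report what went wrong.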
SpeexEncode is called to compress the wave file. The input file, which is a
wave file, is given by szWavePath and the output file is saved at the full path entered
in the dialog box.
Play Wave Function
The Play Wave function prompts the user twice so that the wave files of both the
SLP and the client can be played in a single screen. The first prompt is for the SLP's
pre-recorded wave file and the second is for the client's recorded Speex file. The
Speex file is decompressed back into a wave file so that it can be displayed on screen.
The first prompt, for the SLP's pre-recorded wave file, appears when the Play Wave
button is clicked. A File Open dialog box prompts the user to load the desired wave
file, and the user browses to the directory where it is saved. The wave file is checked
for mono channel, sampling rate, bits per sample and PCM format to ensure that it is in
the correct format. The data bytes in the wave file are then copied to a new buffer,
wavedata1, before the application displays both the SLP's and the client's AMPs.
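The format check can be sketched against the canonical 44-byte RIFF/WAVE header. This sketch is a simplification with one stated assumption: the "fmt " chunk starts at byte 12 and the "data" chunk follows it directly. That layout holds for simply written files, but it does not hold for files that carry extra chunks. isExpectedWave and readLE are hypothetical names, and the expected sampling rate and sample size are passed in rather than taken from the thesis.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read an n-byte little-endian unsigned integer starting at offset off.
static std::uint32_t readLE(const std::vector<std::uint8_t>& b,
                            std::size_t off, int n) {
    std::uint32_t v = 0;
    for (int i = n - 1; i >= 0; --i) v = (v << 8) | b[off + i];
    return v;
}

// Check a canonical header: RIFF/WAVE ids, PCM, mono, expected
// sampling rate and bits per sample.
bool isExpectedWave(const std::vector<std::uint8_t>& b,
                    std::uint32_t sampleRate, std::uint16_t bits) {
    if (b.size() < 44) return false;
    auto tag = [&](std::size_t off, const char* s) {
        for (int i = 0; i < 4; ++i)
            if (b[off + i] != static_cast<std::uint8_t>(s[i])) return false;
        return true;
    };
    return tag(0, "RIFF") && tag(8, "WAVE") && tag(12, "fmt ") &&
           tag(36, "data") &&
           readLE(b, 20, 2) == 1 &&           // audio format 1 = PCM
           readLE(b, 22, 2) == 1 &&           // one channel (mono)
           readLE(b, 24, 4) == sampleRate &&  // samples per second
           readLE(b, 34, 2) == bits;          // bits per sample
}
```

If the check passes, the samples begin at byte 44, and that is the data that would be copied into a buffer such as wavedata1.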
The second prompt, for the client's Speex file, also follows the click on the Play
Wave button. openfile2 is used to prompt the user for the second audio file to be
played, which is the “.spx” file recorded by the client. The client's Speex file is
decoded into a temporary wave file so that both wave files can be displayed in a
single screen. The wave file is checked to confirm that it is in the defined format, and
its data bytes are then copied to a new buffer, wavedata2, before the application
displays both the SLP's and the client's AMPs.
After having both wave files in the buffers, the application is ready to display
both the SLP’s and client’s wave files on screen. Again, the waveform audio output
device is opened for playback.
DC Offset Removal
Time Domain Windowing and Filtering
Background Noise Level Detection
During the recording of background noise, gcvt converts the floating-point
value of the background noise level to a character string (which includes a decimal
point and a possible sign byte) and stores the string in the buffer buff. buff must be
large enough to hold the converted value plus the terminating null character, which is
appended automatically. The conversion requests 20 significant digits. When the
background noise detection is complete, a message is sent to inform the client that
detection has ended.
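gcvt is not part of standard C++, and it writes into buff without being told how large buff is. A portable alternative is snprintf with the %g conversion, which is sketched below. The function name noiseLevelToString is hypothetical. The output is close to gcvt's but not guaranteed to be byte-for-byte identical. Note also that a double carries only about 15 to 17 significant decimal digits, so when 20 digits are requested the trailing ones are representation artifacts rather than measured precision.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Portable stand-in for gcvt(level, 20, buff): %g style with 20 significant
// digits. snprintf always null-terminates and never writes more than size
// bytes, so an undersized buffer truncates instead of overflowing.
void noiseLevelToString(double level, char* buff, std::size_t size) {
    std::snprintf(buff, size, "%.*g", 20, level);
}
```

With a 32-byte buffer, 0.5 is formatted as "0.5", because %g drops trailing zeros. With a 4-byte buffer, 0.125 is safely cut to "0.1".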
History File
A CFile object is constructed from a path, and three mode flags are combined
when the file is opened. modeCreate directs the constructor to create a new file; if the
file already exists, it is truncated to 0 length. modeNoTruncate is combined with
modeCreate: if the file being created already exists, it is not truncated to 0 length.
Thus the file is guaranteed to open, either as a newly created file or as an existing
file. This is useful, for example, when opening a history file that may or may not
already exist. modeWrite opens the file for writing only.
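The open-or-create behaviour of modeCreate | modeNoTruncate can be approximated portably with the C standard library, as sketched below. The approach first tries "r+b", which fails if the file is missing, and only then falls back to "w+b", which would truncate an existing file. There are two differences from the CFile flags. First, this stream is opened for reading as well as writing, whereas modeWrite is write-only. Second, unlike a true atomic open, there is a brief window between the two fopen calls. openHistory and the file name used in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Open an existing history file without truncating it, or create it if it
// does not exist yet. The position starts at the beginning of the file.
std::FILE* openHistory(const char* path) {
    std::FILE* f = std::fopen(path, "r+b");      // existing file, kept intact
    if (f == nullptr) f = std::fopen(path, "w+b"); // missing: create empty file
    return f;
}
```

Opening the same history file twice keeps the scores written in the first session. To append new records rather than overwrite from the start, the caller would seek to the end, for example with std::fseek(f, 0, SEEK_END).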
A structure is defined for all the members of the history file. Three stuttering
therapy techniques are introduced in the computer-based method: Shadowing,
Metronome and DAF. The history file records the scores of the three therapy
techniques, with each technique name stated together with its score.
fopen opens the file for both reading and writing, on the condition that the file
already exists. fgets reads a string from the stream given by its FILE argument and
stores it in sentence[i]. fgets reads characters from the current stream position up to
and including the first newline character, to the end of the stream, or until the
number of characters read equals n - 1, whichever comes first.
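The sentence-loading loop can be sketched with standard fgets. The limits kMaxSentences and kMaxLen and the function loadSentences are hypothetical. Because fgets keeps the newline it reads, the sketch strips it before the sentence is displayed. Any line longer than kMaxLen - 1 characters would be split across two entries, so the buffer size should exceed the longest sentence.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

const int kMaxSentences = 10;  // hypothetical maximum number of sentences
const int kMaxLen = 128;       // hypothetical maximum line length (incl. '\0')

// Read up to kMaxSentences lines into sentence[i] and strip the trailing
// "\n" (or "\r\n") that fgets keeps. Returns the number of lines read.
int loadSentences(std::FILE* in, char sentence[][kMaxLen]) {
    int i = 0;
    while (i < kMaxSentences &&
           std::fgets(sentence[i], kMaxLen, in) != nullptr) {
        sentence[i][std::strcspn(sentence[i], "\r\n")] = '\0';
        ++i;
    }
    return i;
}
```

If the file holds two lines, the function returns 2, and sentence[0] and sentence[1] contain the lines without their newline characters.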
The Loading of Wave Files
The Loading of Text Sentences
Client Identification
The Next Technique, Next Sentence, Testing, Recording, Save Wave and Play
Wave features are enabled only after the client identification procedure has been
completed. GetDlgItem retrieves a handle to the control, and EnableWindow enables
mouse and keyboard input to that control when its argument is TRUE. When the
argument is FALSE, all input to the control is disabled.
Buttons Initialization
The Next Technique button (IDC_Prev) enables the client to proceed to the next
therapy technique after completing the previous one. The Next Sentence button
(IDC_Next) is used to proceed to the next sentence each time the client finishes
recording the previous sentence. The Testing button (IDC_Testing) is pressed
whenever clients wish to listen to the playback, and they can repeat the playback as
many times as they wish. The same applies to the Record button
(IDC_RecordTest): clients can record as many times as they wish.
Scoring Parameter Definition
noscore is the number of scored recording attempts for each scoring category.
For each therapy technique, this count is incremented by one every time the client
repeats the recording of the same sentence. For example, the score for the start
location parameter is added to its accumulated score after each recording.
The average score of each parameter (grade, grade1, grade2, grade3) is
calculated by dividing the total scores (sscore, escore, mscore, dscore) by the total
number of recording attempts (noscore). The average score of each sentence is
calculated by dividing the sum of the scores of all four parameters (startscore,
endscore, maxscore, durscore) by four.
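The averaging rules above can be sketched as follows. The running totals reuse the names sscore, escore, mscore and dscore, and noscore from the text. The method names are hypothetical. The sketch assumes that the four parameter scores divided by four are the per-parameter averages. An attempt count of zero is guarded so that no division by zero occurs.

```cpp
#include <cassert>

// Sketch of the per-sentence scoring: running totals per parameter, one
// shared attempt counter, per-parameter averages and the sentence score.
struct SentenceScoring {
    double sscore = 0, escore = 0, mscore = 0, dscore = 0;  // running totals
    int noscore = 0;                                         // recording attempts

    // Called once per recording of the same sentence.
    void addAttempt(double start, double end, double maxMag, double duration) {
        sscore += start;
        escore += end;
        mscore += maxMag;
        dscore += duration;
        ++noscore;
    }

    // Per-parameter averages (grade, grade1, grade2, grade3 in the text).
    double avgStart() const    { return noscore ? sscore / noscore : 0.0; }
    double avgEnd() const      { return noscore ? escore / noscore : 0.0; }
    double avgMax() const      { return noscore ? mscore / noscore : 0.0; }
    double avgDuration() const { return noscore ? dscore / noscore : 0.0; }

    // Sentence score: the four parameter scores summed and divided by four.
    double sentenceScore() const {
        return (avgStart() + avgEnd() + avgMax() + avgDuration()) / 4.0;
    }
};
```

As a worked example, suppose one sentence is recorded twice, scoring (80, 60, 70, 90) and then (100, 80, 90, 70). The parameter averages are 90, 70, 80 and 80, which gives a sentence score of 320 / 4 = 80.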
PlaySound is called to play the applause specified by the file name
“applause.wav”. The applause is played asynchronously, so PlaySound returns
immediately after beginning the sound. PlaySound searches the current directory,
the Windows directory, the Windows system directory and the directories listed in the
PATH environment variable for the sound file. The sound specified by
“applause.wav” must fit into available physical memory and be playable by the
waveform-audio device driver.
A CBitmap object, bitmap, is constructed, and its LoadBitmap member function
loads the bitmap resource with the resource ID IDB_BITMAP1 from the application's
executable file. The loaded bitmap is attached to the bitmap object. BitBlt then
copies the fireworks bitmap from the source device context to the current device
context.
Scoring for Start Location
Scoring for End Location
Scoring for Maximum Magnitude Comparison
Scoring for Duration Comparison