COMPUTER-BASED MALAY STUTTERING ASSESSMENT SYSTEM

OOI CHIA AI

A thesis submitted in fulfilment of the requirements for the award of the degree of Master of Engineering (Electrical)

Faculty of Electrical Engineering
Universiti Teknologi Malaysia

MAY 2007

"I hereby declare that I have read this thesis and in my opinion this thesis is sufficient in terms of scope and quality for the award of the degree of Master of Engineering (Electrical)."

Signature :
Supervisor : Assoc. Prof. Dr. Jasmy bin Yunus
Date : 15 May 2007

DECLARATION

I declare that this thesis entitled "Computer-Based Malay Stuttering Assessment System" is the result of my own research except as cited in the references. The thesis has not been accepted for any degree and is not concurrently submitted in candidature for any other degree.

Signature :
Name : Ooi Chia Ai
Date : 15 May 2007

DEDICATION

To my dearest parents, siblings and friends, for their love and support that made this journey possible.

ACKNOWLEDGEMENT

I would like to take this opportunity to extend my sincere gratitude and appreciation to the many people who made this thesis possible. Special thanks are due to my supervisor, Associate Professor Dr. Jasmy bin Yunus, for his invaluable guidance, suggestions and full support in all aspects of the research. Many thanks go to the Faculty of Electrical Engineering of Universiti Teknologi Malaysia for its full support of this project. Thanks are also due to UTMPTP for the research funding. I would also like to express my sincere thanks to Hospital Sultanah Aminah and the primary schools in the Skudai district for their assistance in conducting the numerous experiments during the research. Finally, special thanks to all my friends for their unwavering support, concern and encouragement.

ABSTRACT

Stuttering has attracted extensive research interest over the past decades. However, far less effort has been devoted to computer-based stuttering assessment systems. Stutterers respond in unique ways to different therapy techniques: a technique that works dramatically for one stutterer does not necessarily work dramatically, or at all, for other stutterers. Stuttering is so variable and so highly individualized that, few would disagree, no single method works for all stutterers.
Normally, two to three months are required to determine suitable techniques for each client. The effectiveness of each approach depends on the receptiveness of the client. This thesis explains the development of a computer-based Malay stuttering assessment system. The system assists the Speech-Language Pathologist (SLP) in determining suitable therapy techniques for each client. The software was developed based on the fluency shaping techniques used in fluency rehabilitation regimens. Digital Signal Processing techniques were implemented to analyze the speech signals. The maximum magnitudes of the clients' and the SLPs' speech signals, corresponding to the Average Magnitude Profiles (AMPs), were determined and compared. The maximum magnitude was determined by summing a total of 15 neighbouring samples at a time and taking the largest resulting value. Start location, end location, maximum magnitude and duration were compared between the clients' and the SLPs' AMPs to generate a score; these computational analyses help the SLP to determine suitable techniques more quickly. The software was developed using Microsoft Visual C++ 6.0 to run under Windows XP. Three therapy techniques were introduced in the proposed computer-based method: Shadowing, Metronome and Delayed Auditory Feedback. Ten test subjects were selected from six primary schools located in Skudai. Measurements of percent syllables stuttered were made on the test subjects by an SLP from Hospital Sultanah Aminah. The experimental results showed that the SLP agreed with the analyses generated by the software.

ABSTRAK

Masalah gagap telah menarik perhatian para penyelidik sejak beberapa dekad yang lalu. Namun demikian, kerja penyelidikan melalui implementasi sistem penilaian masalah gagap berdasarkan komputer adalah amat terhad. Pesakit gagap memberi reaksi yang berlainan terhadap teknik terapi yang berbeza. Satu teknik terapi yang berkesan terhadap seorang pesakit gagap tidak semestinya memberi kesan yang sama kepada pesakit yang lain. Biasanya, dua hingga tiga bulan diperlukan untuk mengenalpasti teknik terapi yang paling sesuai untuk seseorang pesakit gagap. Keberkesanan sesuatu teknik bergantung kepada tahap penerimaan individu. Tesis ini menerangkan proses pembinaan dan implementasi sistem penilaian gagap berdasarkan komputer. Sistem penilaian ini membantu Patologis Pertuturan (SLP) dalam mengenalpasti teknik terapi yang paling sesuai bagi pesakit gagap. Perisian ini direka berdasarkan teknik penajaman kelancaran pertuturan yang digunakan dalam proses pemulihan pertuturan. Teknik pemprosesan isyarat digital diaplikasikan untuk menganalisis isyarat pertuturan. Magnitud maksimum isyarat pertuturan bagi pesakit dan SLP, iaitu "Profil Purata Magnitud" (AMP), dikenalpasti dan dibandingkan. Magnitud maksimum dikira dengan menambah 15 sampel untuk mendapatkan satu nilai maksimum. Titik permulaan, titik akhir, magnitud maksimum dan tempoh dibandingkan di antara AMP pesakit dan SLP untuk menghasilkan pemarkahan yang dapat membantu SLP mengenalpasti teknik terapi yang sesuai dengan lebih cepat. Perisian direka dengan menggunakan Microsoft Visual C++ 6.0 dalam platform Windows XP. Tiga teknik terapi dibangunkan berdasarkan komputer iaitu Shadowing, Metronome dan Delayed Auditory Feedback. Sepuluh subjek ujian dipilih dari enam buah sekolah di Skudai. Pengukuran peratusan sukukata yang gagap dibuat oleh SLP dari Hospital Sultanah Aminah terhadap sampel subjek ujikaji. Hasil ujikaji telah menunjukkan bahawa SLP bersetuju dengan analisis keputusan perisian.
TABLE OF CONTENTS

CHAPTER   TITLE   PAGE

DECLARATION   ii
DEDICATION   iii
ACKNOWLEDGEMENT   iv
ABSTRACT   v
ABSTRAK   vi
TABLE OF CONTENTS   vii
LIST OF TABLES   xii
LIST OF FIGURES   xiii
LIST OF ABBREVIATIONS   xv
LIST OF APPENDICES   xvii

1   INTRODUCTION
1.1   Background and Motivation   1
1.2   Problem Statement   3
1.3   Research Objectives   5
1.4   Scope of Work   5
1.5   Research Approach   6
1.6   Significance of Research Work   7
1.7   Thesis Layout   8

2   STUTTERING ASSESSMENT AND DIAGNOSIS
2.1   Introduction   9
2.1.1   Features of Stuttering Assessment   10
2.1.2   Problems of Classical Manual Assessment System   11
2.1.3   Computer-based Assessment System   12
2.2   The Variability of Fluency   14
2.2.1   Definition of Stuttering   15
2.2.2   Characteristics of Stuttering   18
2.2.3   Types of Stuttering   22
2.3   Basic Considerations When Assessing Young Children   23
2.4   Formal Measures of Assessment System   25
2.4.1   Stuttering Severity Instrument (SSI-3)   25
2.4.2   Modified Erickson Scale of Communication Attitudes (S-24)   26
2.4.3   Perception of Stuttering Inventory (PSI)   26
2.4.4   Locus of Control Behaviour (LCB)   27
2.4.5   Crowe's Protocols   27
2.4.6   Communication Attitude Test-Revised (CAT-R)   28
2.4.7   A-19 Scale for Children Who Stutter   28
2.5   Stuttering Therapy Techniques   29
2.5.1   Shadowing   30
2.5.2   Metronome   30
2.5.3   DAF   31
2.5.4   Rate Control   32
2.5.5   Regulated Breathing (RB)   32
2.5.6   Easy Onset   33
2.5.7   Counselling   33
2.5.8   Prolonged Speech (PS)   34
2.6   Summary   34

3   STUTTERING ASSESSMENT FRAMEWORK DESIGN
3.1   Introduction   35
3.2   Assessment Requirement: Principles and Strategies   35
3.3   Variables in Choosing Therapy Techniques   36
3.3.1   The Therapy History of Client   37
3.3.2   The Age and Motivation Level of Client   37
3.3.3   Economic and Time Constraints   38
3.3.4   SLP's Beliefs   39
3.4   Basic Stuttering Treatment Approaches   39
3.4.1   Fluency Shaping (FS)   41
3.4.2   Stuttering Modification (SM)   42
3.4.3   Why Fluency Shaping (FS)?   43
3.5   Problem Formulation   44
3.5.1   The Uniqueness of Each Individual   44
3.5.2   Difficulty in Identifying Appropriate Therapy Technique   45
3.5.3   Time Consumption in Classical Manual Assessment System   46
3.5.4   Validity   47
3.6   Underlying Design Principles   48
3.6.1   Audio and Visual Feedbacks   49
3.6.2   Monitoring and Assessment   50
3.6.3   Clinical Evaluation   51
3.6.4   Motivation   52
3.7   System Design   54
3.8   Criteria for Selection of Scoring Parameters   55
3.9   Summary   57

4   DEVELOPMENT OF COMPUTER-BASED STUTTERING ASSESSMENT SYSTEM
4.1   Introduction   58
4.2   System Requirements   58
4.2.1   Hardware Requirements   59
4.2.2   Software Requirements   59
4.3   System Descriptions   59
4.4   Coding   62
4.4.1   Audio File Format   63
4.4.2   Sampling   63
4.4.3   Resolution Bit   64
4.4.4   Mono Channel   65
4.4.5   DC Offset Removal   65
4.4.6   Windowing   67
4.4.7   Time Domain Filtering   69
4.4.8   Recording and Playback of Speech Utterances   70
4.4.8.1   Playback   71
4.4.8.2   Recording   71
4.4.9   Background Noise Level Detection   72
4.4.10   History File   73
4.4.11   Client Identification   73
4.4.12   Compression and Decompression Using Speex   74
4.4.12.1   Encoding   77
4.4.12.2   Decoding   78
4.5   Scoring   79
4.5.1   Start Location   80
4.5.2   End Location   80
4.5.3   Maximum Magnitude   82
4.5.4   Duration   83
4.6   Summary   85

5   CLINICAL EVALUATION OF COMPUTER-BASED STUTTERING ASSESSMENT SYSTEM
5.1   Introduction   86
5.2   Implementing Clinical Trials among School-age Children   86
5.2.1   Test Subjects   87
5.2.2   Experimental Set-Up   88
5.2.3   Scenario   89
5.2.3.1   Shadowing Task   90
5.2.3.2   Metronome Task   90
5.2.3.3   DAF Task   91
5.3   Assessment Procedures   92
5.4   Data Collection   101
5.5   Results of Quantitative Analyses   102
5.5.1   Results Generated by Software   104
5.5.2   Results Analysis by SLP   110
5.5.3   Comparison between Software and SLP Analyses   111
5.5.4   Description of Individual Test Subjects   113
5.6   Summary   114

6   CONCLUSION
6.1   Introduction   115
6.2   Future Works   117

REFERENCES   119-127
Appendices A-I   128-172

LIST OF TABLES

TABLE NO.   TITLE   PAGE
5.1   Occurrence Frequency of Stuttering Behaviours in Test Subjects   103
5.2   The Range and Quartile Distribution of the Frequency Indices for Stuttering Characteristics   104
5.3   Software Scoring for Test Subjects   105
5.4   Software Scoring for Control Data   105
5.5   %SS for Each Therapy Technique   111
5.6   Comparison between the Determination of Therapy Technique for Each Test Subject by Software and SLP   112
5.7   Description of Individual Test Subjects   113

LIST OF FIGURES

FIGURE NO.   TITLE   PAGE
1.1   Research Development Process   7
2.1   Speech Waveforms and Sound Spectrograms of a Male Client Saying "PLoS Biology"   17
3.1   Comparisons between Fluency Shaping and Stuttering Modification   40
3.2   Five Steps to Implementing EBP   52
4.1   System Block Diagram   61
4.2   Flowchart: DC Offset Removal   67
4.3   Common Window Functions   68
4.4   Flowchart: Background Noise Level Detection   72
4.5   Dialog Box: The Loading of Wave Files   74
4.6   Dialog Box: Client Identification   74
4.7   Wave File Information   76
4.8   Speex File Information   76
4.9   Encoding Process   77
4.10   Decoding Process   78
4.11   Flowchart: The Scoring of Start Location   81
4.12   Flowchart: The Scoring of End Location   82
4.13   Flowchart: The Scoring of Maximum Magnitude Comparison   84
4.14   Flowchart: The Scoring of Duration Comparison   85
5.1   Shadowing Task   90
5.2   Metronome Task   91
5.3   DAF Task   92
5.4   Detection of Background Noise Level   93
5.5   End Detection of Background Noise Level   93
5.6   Selection of Five Pre-recorded Wave Files   94
5.7   Selection of Text File   95
5.8   Input of User Name   95
5.9   Input of History File and Its Location   96
5.10   The Enabling of Buttons   97
5.11   The AMP of SLP   97
5.12   The AMP of Client Superimposed on SLP's AMP   98
5.13   The File Saving of Recorded Utterances   99
5.14   The File Playing of Both SLP and Subject's Utterances   100
5.15   The Display of Fireworks   100
5.16   The Display of the Information of Attempted Utterances   106
5.17   The Scoring Comparison for Start Location Parameter   107
5.18   The Scoring Comparison for End Location Parameter   107
5.19   The Scoring Comparison for Maximum Magnitude Location Parameter   108
5.20   The Scoring Comparison for Maximum Magnitude Location Parameter   109
5.21   The Average Score of Each Therapy Technique   109

LIST OF ABBREVIATIONS

ADC - Analogue to Digital Converter
AMP - Average Magnitude Profile
API - Application Programming Interface
ASHA - American Speech-Language-Hearing Association
AWS - Adults Who Stutter
CAT-R - Communication Attitude Test-Revised
CBR - Constant Bit-Rate
CODEC - Compression Decompression
CPU - Central Processing Unit
CWS - Children Who Stutter
DAC - Digital to Analogue Converter
DAF - Delayed Auditory Feedback
DC - Direct Current
DMA - Direct Memory Access
DSP - Digital Signal Processing
EBP - Evidence-Based Practice
FS - Fluency Shaping
GDI - Graphics Device Interface
GUI - Graphical User Interface
ISA - Industry Standard Architecture
LCB - Locus of Control Behaviour
LPF - Low Pass Filter
MFC - Microsoft Foundation Class
OS - Operating System
PC - Personal Computer
PCI - Peripheral Component Interconnect
PCM - Pulse Code Modulation
PS - Prolonged Speech
PSI - Perception of Stuttering Inventory
PWS - Person Who Stutters
RB - Regulated Breathing
RIFF - Resource Interchange File Format
S-24 - Modified Erickson Scale of Communication Attitudes
SLP - Speech-Language Pathologist
SM - Stuttering Modification
SS - Stuttered Syllables
SSI-3 - Stuttering Severity Instrument
SSMP - Successful Stuttering Management Program
SW/M - Stuttered Words per Minute
USD - United States Dollar
VBR - Variable Bit-Rate
WPM - Words Per Minute

LIST OF APPENDICES

APPENDIX   TITLE   PAGE
A   Wave File Format   128
B   Stuttering Severity Instrument (SSI-3)   138
C   Modified Erickson Scale of Communication Attitudes (S-24)   139
D   Locus of Control Behaviour (LCB)   140
E   Communication Attitude Test-Revised (CAT-R)   141
F   A-19 Scale for Children Who Stutter   142
G   Stuttering Prediction Instrument for Young Children (SPI)   144
H   Physician's Screening Procedure for Children Who May Stutter   148
I   Coding   149
CHAPTER 1

INTRODUCTION

1.1 Background and Motivation

Stuttering has attracted extensive research interest over the past decades. Stuttering research is exploring ways to improve the diagnosis and treatment of stuttering as well as to identify its causes. Emphasis is being placed on improving the ability to determine which children will outgrow their stuttering and which children will stutter for the rest of their lives. Recent research has focused on therapy programs such as the Lidcombe Program, the Camperdown Program, Prolonged Speech-based Stuttering Treatment and others. Many classical manual inventories, scales and procedures have been developed for assessing the quality of both the surface and the deep structure of stuttering, such as the Stuttering Severity Instrument (SSI-3), the Modified Erickson Scale of Communication Attitudes (S-24), the Perception of Stuttering Inventory (PSI) and so on. However, far less effort has been devoted to computer-based stuttering assessment systems.

Studies [1] indicated that stuttering therapy was eventually helpful in reaching goals of successful management for persons who stutter (PWS). However, PWS had difficulty in identifying specific therapy techniques to which they could attribute their success. Findings [2] suggest that a system that can identify suitable therapy techniques is important, because any rational and empirically informed procedure that enables the PWS to systematically modify speech behaviours and the associated cognitive features may well facilitate fluency successfully.

Numerous therapy techniques exist to treat stuttering, yet there remains a paucity of empirically motivated stuttering treatment outcomes research. Despite repeated calls for increased outcome documentation on stuttering treatment, the stuttering literature remains characterized primarily by "assertion-based" or "opinion-based" treatments, which by definition are based on unverified treatment techniques and/or procedures [3]. Conversely, "evidence-based" treatments, based on well-researched and scientifically validated techniques, remain relatively rare in the field of stuttering and are usually limited to behavioural and fluency shaping (FS) approaches. The objective assessment of specific stuttering treatment approaches is also important to elucidate the therapy techniques that contribute to desired outcomes. However, identifying the appropriate therapy techniques manually can be difficult, given the multidimensional nature of stuttering [4].

A study [5] involved 98 children who stutter (CWS), aged 9 to 14 years, who were divided into four groups:

1. The first group was treated by a speech-language pathologist (SLP) in a speech clinic.
2. In the second group, the parents were trained to administer the stuttering therapy to their children, but the children did not see an SLP.
3. In the third group, the children used speech biofeedback computers designed for treating stuttering. They were not treated by SLPs, and their parents were not involved.
4. The control group received no therapy.

One year after the therapy program ended:

1. 48% of the children treated by SLPs were fluent.
2. 63% of the children treated by their parents were fluent.
3. 71% of the children treated by computers were fluent.
4. The control group's speech did not improve.

The results showed that the computers were the most effective, the parents were the next most effective, and the SLPs were the least effective.
At the 1% disfluency level, the computers and the parents were about four times more effective than the SLPs, and this encourages the implementation of a computer-based stuttering assessment system to improve clients' fluency. A computer-based stuttering assessment system implements the functions of the complicated and expensive acoustic equipment available only at well-equipped speech-language pathology clinics.

1.2 Problem Statement

Why should a computer-based stuttering assessment system be implemented to replace the classical manual assessment approach? What is possible during a clinical session is often determined by treatment variables such as the availability, setting and cost of services. The drawbacks and insufficiencies of classical manual stuttering assessment include:

a) Time - There are at least 115 therapy techniques that decrease stuttering markedly [6]. Each PWS exhibits a unique response to different therapeutic approaches. The technique that works dramatically for one stutterer does not necessarily work dramatically, or at all, for other PWS. Stuttering is so variable and so highly individualized that, few would disagree, no one method works for all PWS. Without a computer-based system, the SLP has to try every single therapy technique depending on the needs and response of the client, which may take months of repeated procedures that are costly and overly generalized. The longer an SLP takes to assess a client, the more tedious and troublesome the process becomes for the client. For PWS, practice with therapy techniques must take place for many months and years before the techniques become functional. The uniqueness of each individual PWS prevents any specific recommendations of therapy techniques from being universally applicable [7]. Being able to move systematically and persistently toward distant goals is essential, since treatment with PWS takes a considerable length of time.

b) Cost - Due to the typical length of treatment and the usual lack of reimbursement by insurance companies, the cost of successful treatment can quickly become prohibitive for many clients [8]. The longer the SLP takes to assess a client, the higher the cost of diagnosis. The cost of clinical sessions is a major consideration for nearly every client. Some individuals simply cannot afford professional help unless the services are covered by insurance (which is not typically the case for fluency disorders) or are available at a reduced rate. A computer-based stuttering assessment is able to reduce the diagnosis duration and thus reduce the cost of clinical sessions for the client.

c) Effectiveness - The success of treatment is closely tied to the ability of an experienced SLP to determine a client's readiness for change and to adjust treatment techniques accordingly. Thus the utility of the techniques depends on the SLP's ability to apply the right technique(s) at the right time. Often, an approach is chosen because it coincides with the personality of the SLP and her view of reality. The SLP's perception of the client should be as accurate as possible. If the SLP applies the wrong therapy technique to a particular client, not only will the client show no improvement, but the client may also need a longer assessment process. One of the features of stuttering is that it tends to change with time. This requires the SLP to give full attention to each client's progress, which is practically impossible in traditional clinical treatment alone, without the assistance of a computer-based diagnostic tool.
Research [5] indicates that computers were the most effective, the parents were the next most effective, and the SLPs were the least effective.

d) Uniformity - The scores from manual measures provide quantitative information that is highly subjective because it depends on individual human perception. Such quantitative measurements may vary from one SLP to another, which leads to scoring inconsistencies. Scoring generated by a computer-based system enables SLPs in different locations to be on "the same page" regarding the general severity, suitable therapy techniques and the overall characteristics of clients.

e) Motivation - PWS often find traditional clinical sessions to be a tedious and undesirable process. Successful intervention also requires continued commitment and motivation by the client. This problem could be alleviated if SLPs use a "user-friendly" stuttering relief approach, which could be enhanced by the use of a computer-based system. An experienced and interesting guide, such as computer-based scoring analyses, can show the way or, at the very least, make the journey more efficient and often more pleasant. It is essential that the child enjoys the assessment process and finds it to be a positive experience.

1.3 Research Objectives

The objectives of the research work are to develop a computer-based Malay stuttering assessment system and to verify its operation in a real clinical session. The computer-based assessment system is able to assist the SLP in determining suitable stuttering therapy techniques for each client. The application reduces the time required to determine a suitable therapy technique for each client. The effectiveness of the stuttering assessment process is improved, as the computer-based approach has been shown to be the most effective in treating PWS. Scoring generated by the computer-based system ensures consistency and uniformity regarding the general stuttering severity of a particular client. The computer-based system also provides an engaging guide that motivates the client to enjoy the assessment process.

1.4 Scope of Work

The scope of this research work includes the development of a computer-based Malay stuttering assessment system working on a Windows-based operating platform. This research work focuses on school-age children because stuttering should be treated as early as possible, primarily because it becomes less tractable as children get older. The application is developed and coded to run with a graphical user interface (GUI) to provide a user-friendly environment. Digital Signal Processing (DSP) techniques are implemented to analyze speech signals. The software is designed based on standard speech FS techniques that can be easily incorporated into the current fluency rehabilitation regimen. Three stuttering therapy techniques are introduced in the computer-based system: Shadowing, Metronome and Delayed Auditory Feedback (DAF). These three techniques are chosen because they are the most commonly used therapy techniques in Malaysia, based on discussions with SLPs in Malaysian hospitals. Once the computer-based stuttering assessment system is developed, it is verified on real stuttering clients through clinical trials carried out among primary school students. Hospital Sultanah Aminah assisted in the clinical trials, as well as giving professional feedback.

1.5 Research Approach

The development process can be broken into six phases as shown in Figure 1.1.
During the first phase, the hardware and software required for the computer-based stuttering assessment system are identified. Hardware includes a microphone, earphones and a desktop equipped with a sound card, while software means the Operating System (OS), the development tools and the necessary drivers. Windows XP is chosen as the OS due to its availability and familiarity. The sound card must be able to work with Windows XP. Prior to the designing and coding stage, the basic principles and strategies of stuttering assessment systems are analyzed. Both the FS and stuttering modification (SM) treatment approaches are reviewed and compared. Digital audio processing theories and principles are studied in detail. Algorithm selection and analysis are made for the recording and playback procedures, such as windowing and filtering. Next, the program flowcharts of the stuttering assessment operation are constructed for each module. Coding is done in Microsoft Visual C++ on the Windows platform. Clinical trials on control data and test subjects are carried out to make sure that the system runs properly before its practicality is verified.

Figure 1.1: Research Development Process (phases: Literature Review and Problem Specification; Programming Language Learning (C, C++, MFC); Algorithm Selection and Analysis; Designing; Coding; Clinical Trial)

1.6 Significance of Research Work

Implementing stuttering assessment in a computer-based system is very important because the use of computer technology in stuttering assessment is still new in Malaysia, and currently no computer-based assessment tool is available to assist SLPs in determining a suitable therapy technique for each client. Coming up with a clean implementation not only helps better understanding of the available stuttering therapy techniques, but also allows extensions through the introduction of more stuttering therapy techniques in computer-based form.

The implementation of the computer-based Malay stuttering assessment system aids the SLP in determining suitable therapy techniques for each client during the clinical assessment process. Each client exhibits a unique response to the treatment strategies. The uniqueness of each individual PWS prevents any specific recommendations of therapy techniques from being universally applicable.

Without the use of a computer-based assessment system, normally two to three months are required to determine a suitable technique for each client. The computer-based assessment system is able to reduce the amount of time needed for the determination of therapy techniques and the possibility of errors occurring in the manual calculation of the percentage of stuttered syllables (SS). The algorithms involved in the design of the assessment procedures and the introduction of three therapy techniques in computer-based form are our original ideas.

1.7 Thesis Layout

The content of the thesis is organized as follows. Chapter two describes the generic characteristics of stuttering assessment systems and surveys current assessment systems used in clinical sessions. Not all systems are mentioned, but rather those systems that are intuitively relevant for our case of a computer-based assessment system. Chapter three depicts the problem formulation and underlying design principles of the computer-based Malay stuttering assessment system, by pointing out some relevant topics such as the basic stuttering treatment approaches, including FS and SM, and the criteria for selection of scoring parameters.
The details of the development of the computer-based Malay stuttering assessment system are presented in Chapter four, while Chapter five portrays the results obtained during the clinical evaluation of the developed assessment system. Clinical trials on control data and test subjects were carried out to make sure that the system runs properly before its practicality was verified. Chapter six presents the conclusions of the thesis and identifies some areas for future work.

CHAPTER 2

STUTTERING ASSESSMENT AND DIAGNOSIS

2.1 Introduction

Many inventories, scales and procedures have been developed for assessing the quality of both the surface and the deep structure of stuttering. Many of these measures are helpful for obtaining both data-based and criterion-referenced information for assessing stuttering. However, the variety of surface behaviours and intrinsic features that come together in stuttering do not always lend themselves to a realistic or valid analysis. Individual PWS rarely fit all the descriptions, situations and categories associated with any single measure [9].

Many of these scales do provide helpful information that will prove useful as SLPs make decisions concerning the initiation of treatment, the selection of a treatment strategy, and the phasing out or termination of formal treatment. However, assessment must go beyond all these procedures. No matter how many formal measures SLPs administer or what they are able to discover during the initial meeting, SLPs must recognize that they are viewing the client and his problem through a small window. As much as with any other human communication disorder, the assessment of fluency disorders, including stuttering, is an ongoing process [8]. The process of assessment, while most intense during the initial stages of treatment, continues through treatment and into the post-treatment period, when the client may experience relapse. In order to continue making good clinical decisions, the SLP must continue to obtain data concerning all aspects of the syndrome. This continual process could be enhanced by a computer-based stuttering assessment system.

2.1.1 Features of Stuttering Assessment

At the initial assessment of a child thought to be stuttering, an SLP ought to be thinking about two related judgments: is the child actually stuttering and, if so, what is the prognosis for recovery? When young children are diagnosed, speech patterns are usually evaluated and compared to the normal fluency of their age group [10]. SLPs also take into consideration developmental and emotional factors that can disrupt a child's speech.

To start with stuttering assessment, valid, numeric measures of the factors believed to be relevant are needed. A validated measurement does not necessarily mean that it will have predictive value. To establish this, statistical models for diagnosis and prognosis need constructing. If diagnosis and treatment prognosis are each determined by several factors, they can be represented by a multivariate equation of the form given in [11]. Using Rustin's assessment instrument [12] as a guide, as well as a comprehensive review of the research and clinical literature, ten factors that might have relevance to diagnosis and prognosis with CWS were identified. These factors are:

(1) Parental attitude.
(2) Social skills.
(3) Family history of stuttering.
(4) Cerebral dominance.
(5) Language development.
(6) Client attitude.
(7) General health.
(8) Motor skills.
(9) Auditory skills.
(10) Speech scores.
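The multivariate equation referred to above is not reproduced in this chapter. Purely as an illustration of what such a model can look like (this generic linear predictor is our own assumption, not the equation from [11] or from Rustin's instrument), the ten factor scores could be combined as

\[ \hat{y} = \beta_0 + \sum_{i=1}^{10} \beta_i x_i \]

where \(x_1, \ldots, x_{10}\) are numeric scores for the ten factors listed above, \(\beta_0, \ldots, \beta_{10}\) are weights estimated from clinical data, and \(\hat{y}\) is the predicted diagnosis or prognosis outcome.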
A survey of SLPs and researchers in the UK, mainland Europe and the USA indicated that, currently, these factors are thought to be comprehensive [11]. Post hoc application of such models, on validated numeric measures, will be able to establish exactly which factors can be included as predictors and what item scores should be used to assess these factors.

Many SLPs make a wide-ranging, multi-faceted assessment of a child [13]. A fundamental decision is what factors are to be included in this assessment. Many authors err on the side of caution and attempt to be as comprehensive as possible so as to ensure nothing of potential value is missed. Another reason for developing such extensive assessment instruments is that they are intended to be used for other purposes too.

Because the stuttering assessment process is complicated, a computer-based stuttering assessment system could contribute by making the ongoing assessment process easier for the SLP, who would only need to look at the generated scoring in order to evaluate the client's progress. As stated earlier, there are many factors to consider in developing a stuttering assessment, including parental attitude and family history of stuttering. Therefore, a computer-based system alone cannot assess a client completely; it also requires the SLP's personal assessment experience and the information provided by the client's family [14].

2.1.2 Problems of Classical Manual Assessment System

The research literature is riddled with discussion of how the factors involved in assessments lack reliability and validity. Research [11] has repeatedly warned that certain manual stuttering assessment tools are unreliable. This is because manual assessment depends entirely on the SLP's perception of each client's performance. To date, relatively little work has been directed at establishing whether there is validity or accuracy in the information collected manually.

It is necessary to measure the incidence of stuttering in samples of speech to aid SLPs in deciding whom to treat (diagnosis), to assess what changes in speech occur after clients have been treated (treatment outcome), and to help establish which individuals are likely to be treated most successfully (prognosis). Unfortunately, it has been found that the manual measuring techniques that are traditionally used in clinics result in variable estimates of stuttering incidence being made by different judges on the same samples of speech [15]. The validity of a questionnaire or performance test is the main reason why an SLP should not depend solely on classical manual stuttering assessment tools.

Besides this, the classical manual stuttering assessment tools used to assess stuttered speech are time consuming. Short assessments are preferred over lengthy ones because the time a client spends in the clinic is always limited. This problem could be ameliorated if reliable computer-based classification schemes were available. Ideally, what is required for each factor is an off-the-shelf, good, easy-to-use instrument that will elicit scores that can be utilised in a multivariate equation to establish whether that factor contributes to diagnosis and/or to the prediction of therapy outcome. An example of how such an instrument can be constructed is a computer-based stuttering assessment system [16].
2.1.3 Computer-based Assessment System

Currently, no computer-based assessment tool exists to help the SLP determine a suitable stuttering therapy technique for each stutterer, although DAF devices that facilitate speech fluency are available and are sometimes recommended for permanent use. In DAF, the person speaks into a microphone connected to a DAF unit. This unit plays the speech back to the individual through headphones. The "fed back" speech is delayed so that the individual experiences an echo while speaking. When most stutterers talk under DAF, they tend to slow their speech down and increase the loudness of their voice. The auditory feedback causes the speakers to alter their speech rate, effort, or rhythm [8].

It is clear that the fluency effects of the SpeechEasy and Fluency Master are not simple or uniform [3]. Many more evidence-based, empirically and scientifically controlled studies are needed. There is no proof of long-term effectiveness. Furthermore, the devices are expensive, with prices ranging from USD 3500 to USD 5000, which discourages most PWS from owning one. However, the concept of DAF remains a viable and important clinical tool in the total management of stuttering, and it continues to be a valuable clinical adjunct to a total treatment program for a significant number of PWS. The DAF concept can be easily implemented in a computer-based stuttering assessment tool.

A computer-based cluttering assessment system has been developed. Cluttering, or tachyphemia, is a disorder of both speech and language processing that frequently results in rapid, disrhythmic, sporadic, unorganized and often unintelligible speech. Cluttering is a more general confusion of speech than stuttering. While the repetition that is characteristic of stuttering may be present, it need not fall on opening words and key words in a sentence. In cluttering, meaning may also be interrupted by irrelevant terms and by using multiple phrases to describe the same thing [16].

This computer-based cluttering program is a dual-event counter/timer designed for the assessment of cluttering severity. It helps to determine how often one clutters, and how much one's speech is affected by cluttering. The main option provided by the cluttering program involves online quantified assessment, such as the percentage of talking time cluttered. The second option collects a user's perceptual ratings (in a visual analogue scale response format) with regard to features thought to be relevant for a qualitative description of the cluttering severity of one's client [16].

The cluttering software does not allow the saving of audio files or results upon the completion of practice. It only allows the printout of numerical results. Stuttering assessment is an ongoing process. The program should be able to generate a history file summarizing the client's attempts, including the total number of times each utterance was practiced. The SLP can use this information to determine suitable therapy techniques for each client. Moreover, it enables the SLP to assess or monitor the client's progress and observe how much time the client spends practicing. Based on the client's progress record, the supervising SLP is well informed about which displayed signals and measured parameters would be useful for improving the speech rehabilitation process. These features would enable successful future therapy sessions.
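As background for the DAF mechanism described earlier in this subsection, the effect amounts to buffering the microphone signal and playing it back to the speaker a fraction of a second later. The following is a minimal sketch of such a delay line; the class name, the 16 kHz sampling rate and the 100 ms delay are illustrative assumptions on our part, not values taken from any particular device or from the software described in later chapters.

#include <cstddef>
#include <vector>

// Illustrative delay line for delayed auditory feedback (DAF).
// Assumed values: 16 kHz sampling rate and a 100 ms delay.
// delaySeconds must be greater than zero.
class DafDelayLine
{
public:
    DafDelayLine(int sampleRate = 16000, double delaySeconds = 0.1)
        : buffer_(static_cast<std::size_t>(sampleRate * delaySeconds), 0), pos_(0) {}

    // Feed one microphone sample in; the return value is the sample to play
    // through the headphones (the input as it sounded delaySeconds ago).
    short process(short input)
    {
        short delayed = buffer_[pos_];   // oldest sample in the circular buffer
        buffer_[pos_] = input;           // overwrite it with the newest sample
        pos_ = (pos_ + 1) % buffer_.size();
        return delayed;
    }

private:
    std::vector<short> buffer_;
    std::size_t pos_;
};

In a real implementation the buffer would be driven by the sound card's capture and playback callbacks rather than being fed sample by sample, but the principle is the same.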
The cluttering program does not provide the function of displaying the speech amplitude on screen. The speech amplitude should be displayed for the stuttering client to copy as closely as possible and to convey to the client those locations where the client's spoken utterance differed from the SLP's signal in amplitude, duration, onset and end location. Real-time audio and visual modes give immediate feedback on the client's performance relative to the goal, and allow the client to anticipate what amplitude or rate change is needed to reach the goal. The client can then, if necessary, alter their speech as required to closely match the SLP's amplitude.

No therapy technique or reward is implemented in the above computer-based cluttering system. Three stuttering therapy techniques are implemented in the present work, and rewards of a fireworks display and applause are given to clients who manage to obtain scores of 80 and above.

2.2 The Variability of Fluency

The level of fluency varies widely across time and location. Most of the time, human beings speak nearly automatically, with words flowing smoothly and effortlessly. Normally, no attention is given to the manner of communicating. On other occasions, particularly during communicative or emotional stress, the smoothness begins to disappear and breaks in speech occur, or at least they become more obvious. Although fluency varies for all clients, its variability is even more pronounced for PWS. In most instances, a stutterer is more likely than the normally fluent client to react sooner, and to a greater degree, to fluency-disrupting stimuli such as time pressure and difficult communication situations. At the other extreme, PWS are sometimes able to "turn on their fluency". By avoiding feared sounds and words or, with heightened energy and emotion, momentarily "rising to the occasion", clients who typically stutter are able to become uncharacteristically fluent.

The variability of stuttering behaviours is one of the facts about stuttering and something that contributes significantly to the mystery of the disorder. It is difficult for listeners to understand how a client can be speaking fluently one moment and, a word or two later, struggle dramatically while attempting to do something as common as saying his own name [17]. The variability of stuttering behaviour also makes it difficult for listeners to become accustomed to a client, for it is not always possible to predict whether or not a person will stutter. Such variability also presents a predicament for the person doing the stuttering, that is, the stutterer cannot always be certain of the amount and degree of difficulty he will have in any given speaking situation. It is difficult for the client to compensate for a problem that is so inconsistent. Of the many communication handicaps that people may suffer, perhaps none is more variable than stuttering [8].

The inconsistent nature of stuttering requires that the assessment of stuttering be an ongoing process that takes place over several assessment or treatment sessions [18]. In other words, stuttering assessment is a complicated process; without the aid of a computer-based tool, the assessment process becomes far more complex and difficult than it may first appear.
The onset of stuttering usually occurs between the ages of 2 and 6 and is more common in males. The male to female gender ratio is 3 to 1 during childhood and increases to 5 to 1 in adulthood. Although a specific cause for stuttering is not known, it is believed to be a heritable disorder [19]. Stuttering occurs when the forward flow of speech is interrupted abnormally by repetitions or prolongations of a sound, syllable, or articulatory posture, or by avoidance and struggle behaviours. The term “stuttering” refers to the 16 developmental condition, is a disorder in the rhythm of speech in which the individual knows precisely what he wishes to say but at the time is unable to say because of an involuntary repetition, prolongation, or cessation of a sound” [20]. Stuttering is the intermittent impairment of fluency and speech rate, pitch, loudness, inflectional patterns, articulation, facial expression and postural adjustments of the client in the absence of word finding problems, speech motor disorders or voice problems. It is also known as stammering in Britain. The term stuttering will be used herewith. The earliest age of onset of stuttering is about 18 months (when speech emerges) and the latest age of onset vary from 7-13 years of age. The incidence of stuttering (which is the approximate percentage of the population who have stuttered at any time in their lives) can be as high as about 10% [21]. Stuttering is a disruption in the fluency of verbal expression characterized by involuntary, audible or silent, repetitions or prolongations of sounds or syllables as shown in Figure 2.1. These are not readily controllable and may be accompanied by other movements and by emotions of negative nature such as fear, embarrassment, or irritation [22]. Strictly speaking, stuttering is a symptom, not a disease, but the term stuttering usually refers to both the disorder and symptom. 17 Figure 2.1: Speech Waveforms and Sound Spectrograms of a Male Client Saying “PLoS Biology” The left column shows speech waveforms (amplitude as a function of time); the right column shows a time–frequency plot using a wavelet decomposition of these data. In the top row, speech is fluent; in the bottom row, stuttering typical repetitions occur at the “B” in “Biology.” Four repetitions can be clearly identified (arrows) in the spectrogram (lower right). Successful stuttering therapy cannot be equated with complete fluency or a "cure" for stuttering. Because stuttering is not a disease caused by a virus, bacteria, or physical injury, there is no cure. Stuttering is a unique and specific developmental and psycho-neurological condition involving a physical predisposition that is triggered and neurologically conditioned by internal reactions to an extended cycle of repeated performance failures, environmental stressors, and psychological injury. The extended duration of the neurological conditioning involved in the development of stuttering (from several months to many decades) places it in the category of disorders (including post traumatic stress disorder) that can be extinguished to varying degrees, but never totally erased. Some people (particularly pre-school children) may be able to totally recover their original fluency. Other 18 people may experience relatively little change in fluency, but achieve an increased ease of communication. Others may attain a remarkable degree of fluency and communicative ease in many or most situations [23]. 
2.2.2 Characteristics of Stuttering

Stuttering can be mild, moderate or severe, and can even vary within the same individual from one day to the next, particularly with children. The fact that stuttering tends to run in families indicates that genetics is somehow involved in the condition. Studies of stuttering in twins have also found that both twins are more likely to stutter if they are identical rather than fraternal. Adults who stutter (AWS) often fail to achieve their full occupational potential and often experience significant anxiety in social situations [24].

Stuttering is graded by its degree of severity. Most researchers manually rate stuttering by the percentage of stuttered syllables (%SS). While the child speaks, the researcher counts all the stuttered and non-stuttered syllables. One classification method is [24]:

• Mild - below 5 per cent SS.
• Mild to moderate - 5 to 10 per cent SS.
• Moderate - 10 to 15 per cent SS.
• Moderate to severe - 15 to 20 per cent SS.
• Severe - above 20 per cent SS.

Research [25] has suggested that if a person stutters on more than 2% of syllables spoken they should be designated as stuttering; Evesham and Fransella stated that 2%SS and a speech rate of less than 130 syllables per minute should be regarded as stuttered speech. In contrast, Boberg indicated that 2-4%SS should only be regarded as marginal stuttering and that speech should contain 4%SS or more before it could be described as stuttering behaviour [25].

Attempts have also been made to come up with a specification of the proportion of dysfluency of different types. Research [25] has suggested that if prolongations exceed 25% of total dysfluencies, this defines stuttering behaviour. It has also been proposed that the presence of errors of articulation fulfils the same role. Furthermore, it is stated that the probability of chronic stuttering increases with the amount of effort that a child puts into speech production.

When the behaviours of a stutterer are infrequent, brief, and not accompanied by substantial avoidance behaviour, the stutterer is usually classified as a mild or non-chronic stutterer. Non-chronic stuttering is often called "situational stuttering" because the afflicted person generally has difficulty speaking only in isolated situations, usually during public speaking or other stressful activities, and outside of these situations the person generally does not stutter. When the behaviours are frequent, long in duration, or when there are visible signs of struggle and avoidance behaviour, the stutterer is classified as a severe or chronic stutterer. Unlike mild or situational stuttering, chronic stuttering presents in most situations, but can be either exacerbated or eased depending on different conditions. Severe PWS often, but not always, are accompanied by strong feelings and emotions in reaction to the problem, such as anxiety, shame, fear, or self-hatred. This is usually less obvious in mild PWS and serves as another criterion by which to classify PWS as mild or severe.

It is worth noting that the severity of a stutterer is not constant and that PWS often go through weeks or months of substantially increased or decreased fluency. PWS universally report having "good days" and "bad days" and report dramatically increased or decreased fluency in specific situations.
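To make the arithmetic behind the %SS measure and the severity bands quoted above concrete, a minimal sketch is given below. It is illustrative only: the function and variable names are ours, the bands follow the classification listed above from [24], and the counting of stuttered syllables itself is still done by the listener.

#include <string>

// Percent syllables stuttered (%SS) and the severity bands listed above.
// Illustrative sketch only; syllable counts are supplied by the listener (SLP).
double percentSyllablesStuttered(int stutteredSyllables, int totalSyllables)
{
    if (totalSyllables <= 0)
        return 0.0;
    return 100.0 * stutteredSyllables / totalSyllables;
}

std::string severityBand(double percentSS)
{
    if (percentSS < 5.0)  return "mild";
    if (percentSS < 10.0) return "mild to moderate";
    if (percentSS < 15.0) return "moderate";
    if (percentSS < 20.0) return "moderate to severe";
    return "severe";
}

// Example: 12 stuttered syllables in 150 spoken syllables gives 8 %SS,
// which falls in the "mild to moderate" band.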
There are some behaviours that occur during moments of stuttering that can be seen and/or heard, as elaborated below. Some of these also occur in the speech of normal speakers, though they may differ qualitatively from those present during moments of stuttering.

Interjections are extra sounds, syllables, or words that add no meaning to the message. Probably the most common interjections are "uh" and "um" ("The uh baby ate the soup" or "The baby um ate the soup"). Words or phrases such as "well", "like" and "you know" are considered to be interjections. These sound units, which occur between words, usually do not perform a linguistic function in messages; that is to say, the denotative meanings of messages usually are not affected by their presence. They may be accompanied by audible and/or visible signs of tensing, may be voluntary or involuntary, and may vary with how aware the client is of their occurrence [26].

Revision happens when children frequently revise what they have just said. They may stop in midstream and start over in a new direction. Revisions may be in pronunciation ("The bady-baby ate the soup"), grammar ("The baby eated-ate the soup"), or word choice ("The daddy-The baby ate the soup"). A child also may go back to add a word ("The baby-The hungry baby ate the soup").

Mistiming occurs when words are mistimed when spoken. Sounds or syllables may be prolonged ("The baby ate the s-s-soup" or "The baaaby ate the soup"). There could also be a break in the word ("The ba-by ate the soup"). Varying amounts of tension in the speech muscles (lips, tongue, vocal cords) may accompany these mistimed words. Sometimes the voice sounds strained or the coordination of breathing and speaking breaks down.

Part-word (syllable) repetitions are sound and syllable repetitions. They occur most often at the beginnings of words and almost never at the ends of words. Though the number of times a particular sound or syllable is repeated can be relatively high, it is usually once or twice [27]. The repetitions may be accompanied by audible and/or visible signs of tensing. The level of awareness of their occurrence can be relatively high or relatively low. The repetitions may be voluntary, though they usually are involuntary.

Word repetitions are repetitions of an entire word. In most cases, it is a single-syllable word. While a word may be repeated a relatively large number of times, it is usually repeated only once or twice [28]. These repetitions, like part-word ones, may be accompanied by audible and/or visible signs of tensing, may be voluntary or involuntary, and may vary with how aware the client is of their occurrence.
Included here are phenomena that some investigators have referred to as “broken” words. Disrhythmic phonations may be accompanied by audible and/or visible signs of tensing, may be voluntary or involuntary, and may vary with how aware the client is of their occurrence. Tense pauses are phenomena that occur between words, part words, and interjections. They consist of pauses in which there are barely audible manifestations of heavy breathing or muscle tightening. The same phenomena within a word would place the word in the category of disrhythmic phonations. Tense pauses vary with how aware the client of their occurrence. Physical behaviours are referred to as secondary behaviours as these are acquired as the client strives to live, adapt and cope with their stuttering. Secondary stuttering behaviours are unrelated to speech production. The following may be observed in PWS. The observed associated behaviours are such things as facial grimaces, eye blinks, lip tremors, or head jerks. Secondary behaviours are elaborated in the following. 22 To hide stuttering, PWS may use another word, or put off saying a certain word, until the stuttering feeling goes away. They may even avoid certain speaking situations where they think they might stutter, such as going to a party with friends or ordering a meal at a restaurant. PWS may have a feeling of tightness in some parts of their face or body, such as jaw, cheeks, lips, forehead and upper chest. PWS may move their head forward or back; move their arm, leg or hand; close or blink eyes or move other parts of body in an effort to help them get the sounds out. When PWS expect to stutter, they may hold their breath, take several breaths or show other types of unusual breathing patterns. There are many stuttering characteristics as elaborated above. Some may exist in a particular individual while the others may not. A computer-based assessment tool enables the repeated playback and display of audio files where SLP could listen or observe the client's speech sample in unlimited times. This could help SLP in determining suitable therapy technique for each client during the assessment process in a faster and more informed way. 2.2.3 Types of Stuttering There are three types of stuttering. The most common form of stuttering is thought to be developmental, that is, it is occurring in children between the ages of 2 and 6 who are in the process of developing speech and language. This relaxed type of stuttering occurs when a child's speech and language abilities are unable to meet his or her verbal demands. Stuttering happens when the child searches for the correct word. Developmental stuttering is usually outgrown [2, 29]. Another common form of stuttering is neurogenic. Neurogenic disorders arise from signal problems between the brain and nerves or muscles. In neurogenic stuttering, the brain is unable to coordinate adequately the different components of the speech mechanism. Neurogenic stuttering may also occur following a stroke or 23 other type of brain injury. Neurogenic stuttering is caused by damage in the brain. The damage in the brain will cause signal problems between nerves in the brain or to muscles and the brain is not able to correctly direct the speech signals. Neurogenic stuttering is most common in stroke patients and trauma to the brain [2]. Other forms of stuttering are classified as psychogenic or originating in the mind or mental activity of the brain such as thought and reasoning. 
Whereas at one time the major cause of stuttering was thought to be psychogenic, this type of stuttering is now known to account for only a minority of the PWS. Although PWS may develop emotional problems such as fear of meeting new people or speaking on the telephone, these problems often result from stuttering rather than causing the stuttering. Stuttering cannot be permanently cured. However, it may go into remission for a time, or clients can learn to shape their speech into fluent speech with FS approach, the most commonly used technique of facilitating speech fluency. Current work incorporates FS approach with DSP algorithms to analyze speech signal. 2.3 Basic Considerations When Assessing Young Children The approach for assessing children who may be in the early stages of stuttering evolved from the Riley’s clinical observations of sub-groupings of children based on risk factors for stuttering onsets and development [30]. There are, of course, many salient distinctions between intervention with children and with older clients [31]. Fluency is often variable with young children, a fact that makes both assessment and therapeutic progress somewhat more difficult to track for this age group. There is always the question of how much behavioural change is due to treatment and how much is due to the natural variability of the behaviour [32]. Other important differences when working with younger clients are the following [33]: 24 • Children are functioning with neurophysiological systems that are far from adult-like and are still in the process of maturation. • Depending on the child’s level of awareness and reaction to the stuttering experience, the SLP may select therapy techniques that are less direct than those used with adults. • Parents and a variety of other professionals, and particularly the child’s classroom teacher, play essential roles in the assessment process. • The SLP will more likely place greater emphasis on the evaluation and possible treatment of the child’s other communication abilities, including language, phonological, and voice. On occasion, some children will also present with a variety of other learning or behavioural problems. • The likelihood of achieving spontaneous or automatic fluency is much greater for young children than for adults. • There tends to be somewhat less effort needed for helping the child to transfer and maintain treatment gains into extra-treatment environments. • Relapse following formal treatment is not usually a serious problem, as it is with adult clients. An overemphasis on fluent speech as the only goal of treatment can easily lead to the child trying hard not to stutter, something he is already doing. During the assessment of CWS, there will be occasions when the particular behaviours the SLP wants to observe and evaluate are not present. For example, some children will never speak if there is someone present who is not a member of their immediate household. Although this also may occur during the assessment of AWS, it is more often in the case with children. On the day and time of evaluation, the child may fail to exhibit the behaviours that concern the parents or the teachers [34]. In some instances, despite the SLPs' best attempts within several speaking situations and environments, SLPs are unable to obtain samples of the fluency breaks that the child is apparently producing at home. 
It is usually possible to reschedule another assessment during a time when the child is experiencing more difficulty, or SLPs may observe the child in a more natural setting at home or in school. An alternative is to ask the parents to make an audio- or videotape of the child at home 25 as he is experiencing the fluency breaks they are concerned about. The concept of audio and visual recordings can be easily implemented in computer-based assessment tool. 2.4 Formal Measures of Assessment System Despite the problems associated with analysis, speech is a dominant factor as shown by the fact it is included in all the assessments mentioned at the outset of this section and offers the possibility of an objective measure. Some attempts have been made by researchers and SLPs seeking a diagnosis of stuttering using questionnaires and scales measuring a variety of factors to enhance a client's assessment. Erickson, for example, submitted that PWS differ from non-stutterers in their attitudes to communication and that as a function of such differences, the responses of PWS to inventory items about inter-personal communication would differ from the responses of non-stutterers [8]. These formal measures could only be used to assess whether a person is having stuttering or not. However, they are not designed for determining suitable therapy technique for each client as in current work. There are a number of assessment devices that the SLP may use to obtain a formal measure of the nature and severity of stuttering. In the following section, some of the assessment instruments that seem to be particularly useful are described. 2.4.1 Stuttering Severity Instrument (SSI-3) It is perhaps the mostly used of all scales for determining stuttering severity. It was designed by Glyndon D. Riley [8] for both children and adults. The newest edition provides scale values for stuttering severity for PWS. PWS who can read are asked to describe their job or school and read a short passage. Non-readers are given a picture task to which they respond. Scoring is accomplished across three areas. 26 The frequency of the fluency breaks is tabulated and the percentage of stuttering is converted to a task score. The duration of the three longest stuttering moments is tabulated and converted to a scale score. Lastly, physical concomitant across four categories is scaled on a 0-to-5 scale and totalled. The total overall score is computed by adding the scores for three sections. The scale is attractive because it can be used with virtually all age ranges and is easy to administer and score [35]. Reading task is slightly different from shadowing task where texts are not shown to client in shadowing task. The drawback of the scale is that there is no guideline to determine the score for each physical concomitant. It is totally based on SLP's perception. SLP has to calculate the task score manually which may take lots of time. 2.4.2 Modified Erickson Scale of Communication Attitudes (S-24) This popular and easy-to-administer scale has been used in many clinical studies and it was designed by Erickson [36]. PWS respond to a series of 24 true/false statements according to whether the statements are characteristics of themselves. The total score is obtained by tabulating one point for each item answered by the PWS [37]. This scale is human perception oriented where it may be inaccurate if a client does not answer the questions correctly or if a client is not sure of his own characteristic. 
This scale is designed to differentiate between a stutterer and nonstutterer and it cannot be used in determining suitable therapy technique. 2.4.3 Perception of Stuttering Inventory (PSI) PSI was designed by Woolf [8] to determine the client’s self-rating of his degree of avoidance, struggle and expectancy. The subject responds to each of 27 60 statements according to whether or not he feels they are “characteristic of me”. Statements that the person feels are not characteristic are left unmarked. This scale is similar to scale S-24 where client needs to do self-rating. The drawbacks are the same as stated in Section 2.4.2. 2.4.4 Locus of Control Behaviour (LCB) This scaling procedure was used by Craig et. al [36] to indicate the ability of a person for taking responsibility for maintaining new or desired behaviours. Subjects are asked to indicate their agreement or disagreement to each of the 17 statements about personal beliefs using a six-point scale. The scale has good internal reliability and scores are not influenced by age, gender or social desirability of responses. The scores of the 17 statements are summed to yield a total LCB score [38]. Since all forms of intervention for stuttering in one way or another ask the client to gradually assume responsibility for changing his speech, the LCB concept is intuitively appealing. However, this scale is more to a way for SLP to understand the client's feeling towards his stuttering rather than a method to assess client. 2.4.5 Crowe’s Protocols This protocol was designed by T. Crowe [39] and it provides a three- or seven-point scaling procedure, sections of the protocol provides forms for obtaining case history and cultural information as well as client self-assessment. Other components include assessment of affective, behavioural and cognitive features; speech status, stimulability and measures of severity. Several sections and forms are designed to provide information for counselling during treatment. Forms 28 are designed to be completed by the client or by the SLP through respondent interview. This scale is a combination of both human perception and technical oriented approaches where it consists of self-rating and also the measurement of stuttering severity. 2.4.6 Communication Attitude Test-Revised (CAT-R) This self-administered measure asks the child to respond to 35 true/false statements. One point is scored for each response that is similar to the manner a child who stutters would respond [40]. 2.4.7 A-19 Scale for Children Who Stutter Once a secure and trusting relationship is established between SLP and client, this 19-item scale helps to distinguish between CWS and those who do not. It is designed by Andre et. al [41]. Once the SLP is assured that the child understands the task, the scale is administered by the SLP, who asks the child a series of questions concerning speech and related general attitude. One point is assigned for each question that is answered as a CWS might respond [42]. Both questionnaires, the A-19 Scale for Children who Stutter and the CAT-R have been reported as useful in determining whether negative feelings about stuttering are affecting a child or not. Both questionnaires are composed of statements that the child reads and their recording of a true or false response. A distinct advantage to the CAT-R over the A-19 is that there are norms that guide SLPs in interpreting a child’s score. 
The CAT-R consists of 35 statements; for each response the child gives that matches the answer key, one point is assigned. The 29 higher the score on the CAT-R, the more negative the child’s emotion is regarding talking and stuttering. 2.5 Stuttering Therapy Techniques There are many therapeutic techniques that have been shown to help PWS [43]. Each technique can provide valuable information for the SLP and the clients and almost any technique helps to reduce stuttering for at least one person. Sometimes simply having the client tell his story and understand the basic dynamics of speaking or stuttering more easily, along with decreasing patterns of avoidance, is enough to bring about progress [44]. SLPs may recommend direct therapy with young children. The target speech behaviours are similar to FS therapy, but various toys and games are used. For example, a turtle hand puppet may be used to train the slow speech with stretched syllables goal. When the child speaks slowly, the turtle slowly walks along. But, when the child talks too fast, the turtle retreats into its shell [45]. There are many stuttering therapy techniques and only the most favoured ones are elaborated in the following sub-sections such as shadowing, metronome, DAF, rate control, regulated breathing (RB), easy onset, counselling and prolonged speech (PS). Shadowing, Metronome and DAF are chosen to be introduced in computer-based because they are among the most commonly used stuttering therapy techniques and they can be implemented in FS approach. They are proven to be effective in reducing stuttering severity in past research as stated in the following sub-sections. Minimal or no published treatment efficacy data are available for other techniques such as counselling. 30 2.5.1 Shadowing Shadowing is a stuttering treatment technique where the client repeats (shadows) everything the SLP reads from a book, where client stays a few words behind the SLP without seeing the text. Shadowing is the spontaneous speech equivalent of reading in chorus. A variation of shadowing is speaking in chorus with one’s echo. Some PWS become more fluent while shadowing, or concurrently repeating another person’s speech. The typical effect is to reduce the frequency of stuttering [20]. Finding [46] shows that shadowing will produce not only stutter-free and natural sounding speech but also reliable reductions in speech effort. However, these reductions do not reach effort levels equivalent to those achieved by normally fluent clients, thereby conditioning its use as a standard of achievable normal fluency by persistent stuttering clients. 2.5.2 Metronome Metronome-paced speech is a speech that is regulated by the beats of a metronome, a form of treatment used for stuttering. Syllables or word initiations may be regulated where it may be used to slow down or accelerate the rate of speech. Immediate effects of reduced or eliminated stuttering have been documented. Many PWS stutter less frequently when they pace their speech with a metronome – one word or syllable per beat [20]. The metronome beat can be delivered auditorily, visually, tactilely, or by some combination of these senses. The client is told to pace his or her speech while reading aloud or doing a spontaneous speech task with the beats of a metronome. This will cause PWS to concentrate more on how they are speaking and thus reduce their speaking rate. This technique has been used clinically for several centuries. 
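The pacing principle can be illustrated with a small code sketch. The example below is a hypothetical, console-only illustration in standard C++ (it is not the metronome module of the software described later in this thesis): a chosen pacing rate in beats per minute is converted into a fixed interval, and one cue is issued per beat, with the client saying one word or syllable on each cue. The rate of 90 beats per minute and the number of beats are arbitrary illustration values.

```cpp
// Illustrative metronome pacer: one beat per syllable or word at a chosen rate.
// This is a minimal console sketch, not the GUI metronome of the thesis software.
#include <chrono>
#include <iostream>
#include <thread>

int main() {
    const double beatsPerMinute = 90.0;              // hypothetical pacing rate set by the SLP
    const int totalBeats = 20;                        // length of the pacing exercise
    const auto interval = std::chrono::milliseconds(
        static_cast<int>(60000.0 / beatsPerMinute));  // time between beats

    for (int beat = 1; beat <= totalBeats; ++beat) {
        std::cout << "Beat " << beat << ": say the next syllable now\n";
        std::cout << '\a';                            // terminal bell as a crude auditory cue
        std::this_thread::sleep_for(interval);        // wait until the next beat
    }
    return 0;
}
```

In the actual software the beat would be delivered through the graphical interface and the sound card rather than a terminal bell, and could equally be presented visually or tactilely as noted above.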
31 2.5.3 Delayed Auditory Feedback (DAF) A technological aid that has been effective in fluency training is the use of DAF, in which the client hears an echo of his own speech sounds. For some reason, this disruption, which would make it harder for most people to speak, tends to produce fluent speech in PWS. The purpose of DAF techniques is to help PWS focus on the proprioceptive feel of fluency and away from the sound of his new speech pattern. Once the client has gradually learned to maintain improved fluency under the distorted feedback, the delay intervals are varied in the direction of instantaneous or normal feedback, and the client learns to speak without DAF as he continues to use the slow speech along with an emphasis on proprioceptive feedback. Most PWS do so less severely while speaking under condition of DAF in which there is a 250-millisecond delay [19]. While doing so, their speaking rate tends to be slower than usual, and they tend to prolong sounds and syllables. The most typical effect for a person who hears his own speech after a delay is to slow down the rate of speech. DAF is a widely used stuttering treatment technique and it is a component in many programmed or comprehensive treatment approaches. For a more severe client, the SLP may use DAF initially to help the child experience some fluency in speaking. When used as the primary therapy technique, DAF helps the child learn "a new way of talking." Under DAF, the child tends to stretch the syllables in words and speak in shorter phrases and sentences. In therapy, the child works to keep speech fluent as the DAF effect is reduced step by step. When the child is able to use the new pattern in various situations, the SLP helps the child change to a more natural-sounding pattern [47]. 32 2.5.4 Rate Control In rate control therapies, PWS are trained to speak at a slow rate by deliberately and consciously prolonging syllables. Rate control procedures may also include other “ingredients” such as continuous vocalization, soft articulatory contacts and gentle voice onset. Typically, in the initial stages of treatment, rate is slowed to less than half the normal rate of speech (and highly exaggerated continuous vocalization, soft articulatory contacts and gentle voice onset are used if they are incorporated into therapy) to eliminate nearly all stuttering. Such changes in articulation and prosody can be achieved without having to continuously and closely monitor one’s speech because the changes are predictable, global and quite large. Thus, the initial stages of rate control therapies utilize a robust form of motorically driven speech construction to reduce stuttering and this is generally successful. Speech rate varies throughout an utterance. An important aspect of the account is that slowing needs to occur in local regions of speech (such as at the points where planning gets out of synchrony with execution). Global speech rate measures that have invariably been used in studies showing an association between speech rate and fluency, are relatively crude. Depending on the way that speech rate is slowed, it may or may not enhance fluency. For instance, speech rate can be reduced by slowing down all the speech proportionately or just by decreasing the rate of the slowest stretches. If there is a proportionate decrease of all stretches, this would reduce rate on the problematic fast stretches and fluency should increase [48]. 
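Although the preceding paragraphs concern rate control, the DAF technique of Section 2.5.3 rests on a signal-processing idea that is simple enough to sketch here: every microphone sample is written into a circular buffer and played back a fixed number of samples later, so the speaker hears an echo of his own voice delayed by, for example, 250 milliseconds. The sketch below is a minimal illustration in standard C++ under an assumed 16 kHz sampling rate; real-time audio capture and playback are deliberately omitted, and the class and variable names are assumptions made for clarity.

```cpp
// Minimal delay-line sketch for delayed auditory feedback (DAF).
// Samples are written into a circular buffer and read back delaySamples later,
// producing the ~250 ms "echo" of the speaker's own voice. Audio I/O is omitted.
#include <cstddef>
#include <vector>

class DelayLine {
public:
    explicit DelayLine(std::size_t delaySamples)
        : buffer_(delaySamples, 0.0f), writeIndex_(0) {}

    // Push one input sample, get back the sample heard delaySamples earlier.
    float process(float input) {
        float delayed = buffer_[writeIndex_];  // oldest sample in the buffer
        buffer_[writeIndex_] = input;          // overwrite it with the new sample
        writeIndex_ = (writeIndex_ + 1) % buffer_.size();
        return delayed;
    }

private:
    std::vector<float> buffer_;
    std::size_t writeIndex_;
};

int main() {
    const int sampleRate = 16000;                      // assumed sampling rate
    const int delaySamples = sampleRate * 250 / 1000;  // 250 ms delay
    DelayLine daf(delaySamples);

    // In a real application every microphone sample would be passed through
    // process() and the returned value sent to the client's earphones.
    float out = daf.process(0.1f);
    (void)out;
    return 0;
}
```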
2.5.5 Regulated Breathing (RB) Regulated use of airflow used in the treatment of stuttering is called regulated breathing. It is effective in inducing stutter-free speech and it is often combined with other treatment targets including gentle phonatory onset and prolonged speech. 33 RB is a behavioural treatment for stuttering designed to address airflow irregularities by teaching breathing patterns that are incompatible with stuttering. RB consists of several different treatment components, including awareness training, relaxation, competing response training, motivation training and generalization training [19]. 2.5.6 Easy Onset Easy onset is a technique that emphasizes proper timing of, and tension in, the vocal cords. The person learns to start air flowing between the vocal cords and to bring the vocal cords together easily before starting a word. Sometimes, the individual is taught to "phonate" shortly before speaking. Also, the person learns to keep air flowing smoothly through the throat and mouth by using "light contacts." For light contacts, the person brings the lips, tongue, teeth, and roof of the mouth together with less effort when saying speech sounds. Pushing the lips together very hard on the "p" in "pill" means that air cannot flow through to say the rest of the word [19]. 2.5.7 Counselling Counselling is a collection of varied approaches to treating stuttering by giving information, advice and strategies to deal with the problem. There is a range of counselling techniques and most of them are psychologically oriented. The recipients are parents of CWS or AWS. It is often combined with direct methods of treating stuttering. The efficacy of counselling when used exclusively with no direct work with stuttering by either the SLP or the parent is not established. When combined with direct work on stuttering, whether counselling had any effect is unclear [49]. 34 2.5.8 Prolonged Speech (PS) PS is speech produced with extended duration of speech sounds, especially vowels, and particularly those in the initial position of words. It is a target behaviour in stuttering treatment and it is not a treatment procedure but it induces stutter-free speech which results in fluency that sounds unnatural and socially unacceptable. PS is often combined with airflow management and gentle phonatory onset. It is a common component in many contemporary stuttering treatment programs supported by clinical evidence [50, 51]. PS pattern is similar to a traditional rate control therapy. The latest form of PS is found in the Speecheasy device [52] which is a high-tech reincarnation of earlier equipment now worn entirely in the ears of the PWS, commonly adults. 2.6 Summary The basic characteristics and formal measures of stuttering assessment system are outlined in this chapter. The implementation designs of stuttering assessment system available publicly at the moment has been described in details. These provide a better understanding and fundamental knowledge of the stuttering assessment procedures. The computer-based Malay stuttering assessment system is an attempt to improve the performance of classical manual assessment approaches. The stuttering characteristics and its therapy techniques have also been elaborated. The following chapter delineates the framework design of our stuttering assessment system. CHAPTER 3 STUTTERING ASSESSMENT FRAMEWORK DESIGN 3.1 Introduction The previous chapter outlines the problems of classical manual assessment system. 
This chapter describes the framework for the organization and development of the computer-based stuttering assessment system. It outlines the problem formulation and design principles of stuttering assessment. Section 3.2 and Section 3.3 describe the basic principles of the assessment requirement and the variables in choosing therapy techniques respectively. Section 3.4 details the basic stuttering treatment approaches. Section 3.5 elaborates the problem formulation. The underlying design principles are detailed in Section 3.6. Section 3.7 presents the system design, and Section 3.8 shows the criteria for the selection of scoring parameters. Finally, Section 3.9 concludes Chapter 3.

3.2 Assessment Requirement: Principles and Strategies

The main objective for undertaking an assessment is to determine whether the amount of repeating (or being disfluent in other ways) that a client is doing is abnormal. The features of stuttering assessment have been described in Section 2.1.1. The foundation of assessment is the interplay between the client and the SLP as they work together through the stages of treatment. Clients are reassessed periodically to determine whether there is any change in the set of behaviours that define their problem. While it is important for both adults and children to have speech disruptions assessed for evidence of stuttering, this is particularly crucial for children in the critical speech development years of 2 to 4. The consensus now is that stuttering should be treated as early as possible, primarily because it becomes less tractable as children get older. This is presumably because neural plasticity decreases with age. Early assessment is therefore essential. Once stuttering becomes chronic, communication can be severely impaired, with devastating social, emotional, educational, and vocational effects [53]. The proposed work focuses on CWS aged between 8 and 12 years old.

An assessment for stuttering will usually involve 1) the collection of background speech, developmental and medical information from the parents or the adult client, 2) evaluation of speech samples collected in a variety of situations, including oral reading, conversation, and a spontaneous extended monologue, and 3) preparation of a written report, often in a diagnostic format, in which the severity of stuttering and the severity of the observed secondary behaviours are documented. The SLP will often use the results of previously given phonological tests and (if there are obvious structural issues or voice irregularities) oral peripheral examinations. Supporting assessments by a psychologist, psychiatrist or voice specialist may be used or requested by the SLP in some cases.

3.3 Variables in Choosing Therapy Techniques

The therapy techniques a SLP selects obviously will be influenced both by what he or she is trying to accomplish and by the specific behaviours that have to be modified. An accomplished SLP needs to be aware of many approaches. Children as well as adults will respond in unique ways to different therapeutic approaches [54]. The influence of any approach will always be somewhat different because of the characteristics of the SLP who is using the approach. The technique that works so dramatically for one child does not necessarily work dramatically, or at all, for other children. Stuttering is so variable and so highly individualized that, few would disagree, no one method works for all children. There are several factors affecting the choice of techniques.
What is possible during treatment is often determined by treatment variables such as the availability, setting and cost of services as elaborated in subsections below. Ideally, treatment will result in spontaneous fluency. 3.3.1 The Therapy History of Client A client’s therapy history can influence the choice of intervention strategy. If a client has tried a particular approach and has concluded rightly or wrongly that it was not effective, it may not be advisable to try that approach again, particularly if there is a viable alternative. The reason is that if the SLPs do so, the client is likely to expect it not to “work”, which could reduce the likelihood of it being effective [19]. The most basic part of any assessment is the SLP’s understanding of the client’s behavioural and cognitive features of the person’s problem, an understanding that can occur as SLPs come to appreciate the client’s story. Information is obtained from many sources during the assessment of clients. Additional speech samples from representative situations outside the clinic setting should also be obtained either prior to or following the assessment in the form of audio- or videotapes [55]. 3.3.2 The Age and Motivation Level of Client A client’s age and level of motivation to invest in therapy can also influence the choice of intervention strategy. Some strategies that are appropriate for use with adults may not be appropriate for use with young children. Also, some strategies that 38 would be appropriate for clients who are highly motivated would not be so with others. Voluntary stuttering is an example of such a strategy [56]. Motivation is the force that impels people to act. As one of the critical ingredients in successful treatment, motivation is something SLPs have to be sensitive to. Properly directed, it provides energy that leads to the change in therapy. Clients are unlikely to benefit from therapy if they are not sufficiently motivated to make the investments required. It is sometimes possible to increase their motivation for making these investments. If, for example, the reason for the lack of motivation was that the client did not expect therapy to be effective because of previous unsuccessful therapy experiences, proving that change is possible could result in an increase in his or her motivational level. An initial intervention goal for clients who believe this could be proving to them that they can change by engineering it so that they do change some aspect (s) of their behaviour. Virtually all clinical authorities agree that motivation of the PWS is a key feature of a successful treatment outcome [57]. Motivation is seldom maintained at a constant level at the outset of clinical setting. The client will probably have a kind of motivational reservoir he or she can draw on, but its depth and clarity will vary with his or her changes in mood, the successes and failures of day-to-day experiences, his or her sense of progress, and the like. 3.3.3 Economic and Time Constraints Therapy can be frustrating. It may take more than a little time and money. A SLP may reject a therapy technique that is likely to be effective because it is not practical for the reason of economic and time constraints. It may require the client to invest more time in therapy sessions and/or daily practice than he or she is willing or able to. Or it may require the client to make a financial investment that he or she cannot afford. 
39 3.3.4 SLP’s Beliefs A SLP’s belief about the best way to modify behaviours is also likely to influence his or her choice of therapy techniques. Authorities do not agree about what techniques are most likely to be successful for modifying some of the behaviours exhibited by PWS, particularly the abnormal disfluency. As a result, a relatively large number of strategies for reducing stuttering severity have been advocated and used by at least a few SLPs [10]. They must be versatile in implementing therapy program to fit the strengths and weaknesses of clients. The SLP’s beliefs or hypotheses about the cause(s) of the behaviours he or she seeks to modify also play a role in choosing therapy techniques for clients. The program developed by SLP should be based upon his or her “best guess” as to why the client is exhibiting it. If the hypothesis is correct, the SLP is more likely to be successful than otherwise. 3.4 Basic Stuttering Treatment Approaches There are several treatment approaches for stuttering that may provide relief to varying degrees. The most commonly utilized techniques of facilitating fluency are fluency shaping (FS) and stuttering modification (SM). Both techniques were considered by PWS to be better than those strategies that were intuitive on the part of the speaker such as forcing out the speech or avoidance [58]. It is interesting to note that FS approaches tend to be favoured by SLPs with no personal history of stuttering. SM approaches, on the other hand, tend to be the treatment of choice by SLPs who themselves have experienced stuttering. Figure 3.1 shows the comparison between FS and SM. Some SLPs prefer to use the combination of FS and SM elements. This approach usually begins by teaching the PWS FS strategies to slow down and smooth out all of their speech. This eliminates most of the overt stuttering behaviour. For the moments of stuttering that remain, the PWS learn to manage SM strategies. In 40 addition, the SM phases of motivation, identification, and desensitization get incorporated into therapy to help the PWS manage the negative emotions that have built up around the stuttering. This dual approach is more forcefully applied to advanced PWS especially AWS than for beginners. It uses a variety of handouts such as understanding stuttering, how to be positive about stuttering and how to use feared words during treatment sessions [20]. FS tries to help a person speak more easily and fluently while SM helps a person to stutter more easily. The exact approach is determined by the age of the client, how long the person has stuttered and the severity of the stuttering, but measuring the success of treatment is difficult and far from an exact science. Figure 3.1: Comparisons between Fluency Shaping and Stuttering Modification 41 3.4.1 Fluency Shaping (FS) Many of fluency-producing activities involve combinations of altered vocalization or enhancement of the speaking rhythm [59]. FS approaches are also referred as fluency modification. The essence of FS is the establishment of fluent speech in a controlled clinical setting which effectively and durably replaces the chronic stuttered speech pattern with a newly learned prolonged and rhythmic fluent speech. Once fluent speech is attained, it is shaped and expanded so that PWS can gradually maintain fluency in conversational speaking situations both within and outside the clinical setting [8]. 
A FS approach is appropriate when stuttering can be easily eliminated with the use of fluency induction techniques and when the client exhibits very few fears and avoidances. FS approach tends to focus on the surface features of the syndrome. That is, the physical attributes of stuttering in terms of the normal or dysfunctional use of the respiratory, phonatory, and articulatory systems are central to the treatment process. This approach might be thought of as physical therapy for the speech production system. It stresses on the use of smooth, slower-than-normal transitions on the first two sounds of a word or utterance and an easy initiation of phonation with smooth articulatory movement during the utterance. The primary goal with FS strategy is to modify the surface features of the syndrome and not to deal directly with such intrinsic features as the client’s cognitions about loss of control or attitudes of fear or anxiety associated with stuttering [19]. The ultimate goal of FS is to have the fluent speech replacing the stuttered speech. One common FS approach begins with establishing fluent speech in short, one word, utterances, and then gradually increases the length and complexity of the utterances while maintaining fluency. A second common FS approach requires the PWS to alter their speaking pattern in a dramatic way and then move that altered fluent speech closer and closer to normal sounding speech. Some SLPs have combined these approaches by having clients alter their speaking pattern in an exaggerated way, for example speaking at 1 syllable per second by stretching out every sound in a word, establish fluency with this method in single syllable words, 42 move up to longer words and sentences, and increase the rate to something more normal, all the while maintaining a high level of fluency [48]. Some FS therapy programs have used DAF to help clients alter their speech [60]. This device makes the person hear their own voice slightly delayed. In order to overcome the delay, the PWS must talk very slowly and smoothly by stretching out the vowels and sliding all of the words together. The client begins at an extremely slow rate, around 50 words per minute (WPM), and then builds up to something slightly slower than normal, maybe 140 WPM, while maintaining the smoothness and sliding words together. 3.4.2 Stuttering Modification (SM) SM therapies focus on changing individual moments of stuttering to make them smoother, shorter, less tense and less penalizing. SM approach can be used when stuttering still persists after fluency induction techniques have been applied and when the client exhibits significant fear, and uses many postponement and avoidance behaviours. It tends to recognize the fear and avoidance that builds up surrounding the stuttering and consequently spend a great deal of time helping PWS to work through those emotions. The objective of SM is an easier and more fluid form of speaking, which may mean an easier form of stuttering [61]. The SM strategy requires the client not only to evaluate and change behavioural characteristics, but to self-monitor and self-manage cognitive and attitudinal features of the syndromes as well. Informal counselling in some form is typically an integral part of this approach. It is also referred to as the traditional, Van Riperian, or non-avoidance approach. It is based on the concept that a large part of the problem is the speaker’s struggle and avoidance of the core moment of stuttering [62]. 
SM therapy has four phases: identification, desensitization, modification and stabilization. The identification phase involves identifying the core behaviours, secondary behaviours, and feelings and attitudes that characterize stuttering. Desensitization encompasses three stages: confrontation or accepting stuttering, freezing of the stuttering moment, and voluntary stuttering. Modification is the phase where easy stuttering is learnt through stages such as cancellations, pull-outs and preparatory sets. The stabilization phase seeks to stabilize or solidify speech gains [63]. The Successful Stuttering Management Program (SSMP) [61] is an example of a SM treatment program with essentially no empirical evidence of its effectiveness. SSMP is an intensive 3-week residential program that is based on an amalgam of desensitization to stuttering, avoidance reduction therapy and the SM techniques.

3.4.3 Why Fluency Shaping (FS)?

The proposed work is developed based on the FS approach to replace the stuttered speech pattern with newly learned fluent speech. Research [43] indicated that respondents who had participated in FS treatments were more likely to report that they had experienced a relapse than those who had participated in SM or combined treatments. Moreover, there is a near absence of empirically motivated treatment outcome studies for the SM approach, and it is only recommended if stuttering still persists after applying the FS approach. In a study of a "smooth speech" FS stuttering therapy program, about 95% of PWS were "very satisfied" or "satisfied" with their speech at the end of the treatment [64].

A rigorous study [4] was carried out among 42 participants through the three-week program at the Institute for Stuttering Therapy and Treatment in Edmonton, Alberta, Canada. The FS program was based on slow, prolonged speech, starting with a 1.5-second-per-syllable stretch and ending with slow-normal speech. The program also works on reducing fears and avoidances, discussing stuttering openly, and changing social habits to increase speaking. The program includes a maintenance stage for practicing at home. The therapy program reduced stuttering from about 15-20% SS to 1-2% SS. Twelve to 24 months after therapy, about 70% of the participants had satisfactory fluency, about 5% were marginally successful, and about 25% had unsatisfactory fluency.

Only one long-term efficacy study of a SM therapy program has been published in a peer-reviewed journal [61]. This study concluded that the program appears to be ineffective in producing durable improvements in stuttering behaviours. SM therapy assumes that PWS will never be able to talk fluently, and so the best a stutterer can hope for is to be a better communicator while still stuttering. The effectiveness of other, more recently developed stuttering therapies for producing fluent speech makes this assumption questionable. A study [61] indicated that naive or non-professional listeners responded less well to stuttering combined with SM techniques than they did to stuttering alone. In other words, listeners may prefer listening to untreated stuttering over listening to a stutterer using the SM approach.

3.5 Problem Formulation

The earliest known references to stuttering date back to about 2000 B.C. From the distant past until recent times, stuttering therapy techniques have included everything from holding pebbles in the mouth to drug therapy.
Stuttering therapy has many variations yet no treatment method or therapy technique has successfully and positively cured stuttering. 3.5.1 The Uniqueness of Each Individual There are many therapeutic approaches that have been shown to help PWS. To be sure, the logic and techniques associated with most intervention strategies provide the SLP with a framework and a sense of direction about the syndrome and 45 its treatment. Each strategy comes with its own doctrine. Each of these approaches can provide something of value for the SLP and her client, depending on such variables as the needs of the client, the stage of treatment, and the talent and experience of the SLP. Almost any therapy has the power to eliminate stuttering in someone, sometime and someplace. The uniqueness of each individual SLP and PWS prevents any specific recommendations of therapy techniques from being universally applicable [7]. Whatever the structure of treatment program, the process of change is far more complicated than the use of the dogma of a treatment method and associated criteria. Depending on the client, the SLP may use a variety of techniques and possibly more than one overall treatment strategy. Even if a single overall strategy is used, the application will never be quite the same with each client, for individuals often respond differently to identical techniques. In some cases a particular approach won, and in another investigation another method finished first. PWS vary a lot and some will improve most with a domineering SLP, others with more easy-going ones. The success of treatment is closely tied to the ability of an experienced SLP to determine a client’s readiness for change and adjust treatment techniques accordingly. Thus the utility of the techniques depends on the SLP’s ability to apply the right technique (s) at the right time. With this in light, the inclusion of a computer-based tool into stuttering assessment system assists SLPs in the process of determining suitable therapy techniques for each client. 3.5.2 Difficulty in Identifying Appropriate Therapy Technique Stuttering is a complex combination of attitude, behavioural, and cognitive features bound together with degrees of anxiety and fear. Because of its complex nature, stuttering is resistant to long-term change. Assessment of stuttering is multidimensional [3]. PWS in different stages of change need to be matched with different treatment processes, or improvement will be less likely to take place. This 46 way of considering the process of treatment coincides with the client-centred approach advocated by a computer-based stuttering assessment system. Fluency treatment that is based upon motor learning theory is advantageous to SLPs, particularly those who determine that, due to a client's lack of progress, alterations in treatment are warranted. However, the difficulty for these SLPs is not a failure to recognize that changes in the course of treatment. Rather, it is suggested that the difficulty for many SLPs is how to choose therapy technique and remain consistent with the goals of therapy [65]. The use of computer technology in speech assessment is still new. There are many clinical approaches to treat PWS. However, normally 2 to 3 months are required to determine suitable technique for each client. The advantage of the proposed system is that it duplicates the function of complicated and expensive acoustic equipment available only at well-equipped speech-language pathology clinics. 
3.5.3 Time Consumption in Classical Manual Assessment System

SLPs have to try each different approach in turn, depending on the needs and response of the client, which may take months of repeated procedures that are costly and overly generalized. Being able to move systematically and persistently toward distant goals is essential since treatment with PWS takes a considerable length of time. For PWS, practice with therapy techniques must take place for many months and years before the techniques become functional. Research [6] has found that there were at least 115 therapy techniques that decreased stuttering markedly.

It is evident that comprehensive treatments for PWS involve much more than simply "fixing the stuttering" and making people fluent. It is much bigger than that and far more tedious. Each treatment strategy requires the client to monitor and self-manage many aspects of his surface and intrinsic behaviours. Each strategy dictates that the client systematically learns and practices techniques, first within the treatment setting and then, gradually, outside the security of the clinic, in real-world speaking situations. Each method places great emphasis on the client taking primary responsibility for his own self-management. In other words, many of these techniques require a conscious effort on the part of the clients. The success of therapy usually depends a great deal on the amount of effort the client puts in. The fact is, the longer a SLP takes to assess a client, the more tedious and troublesome the process will be for the client [19]. PWS often find traditional therapy to be a very tiresome and undesirable process. Successful intervention also requires continued commitment and motivation by the client. Therefore, the element of motivation should be integrated into stuttering therapy. The problem could be eliminated if the SLP uses a "user-friendly" stuttering relief approach, and this could be enhanced by the use of a computer-based system. An experienced guide such as computer-based scoring analyses can show the way or, at the very least, make the journey more efficient and often more pleasant.

A problem in relying on speech measures alone is that stuttering can vary with speaking situation. The majority of children referred to SLPs for assessment are in a phase of development when speech is changing rapidly. Research [66] has suggested that repeated observations should be made, because it is only with the passing of time that a greater degree of certainty can be given regarding the child's dysfluency. A computer-based assessment tool can assist the SLP in these repeated observations, which in turn makes the job easier by saving much of the time spent in the manual assessment process.

3.5.4 Validity

There is agreement that a reduction in stuttering frequency and severity is associated with effective treatment outcome [54]. But there is a major problem in the reliability and validity of clinic-based perceptual measures of stuttering. Stuttering has varied treatment techniques, but only a few have been tested for their efficacy. Some are questionable; some have uncontrolled clinical support; several are purely rational [61]. Numerous approaches exist to treat stuttering, yet there remains a paucity of empirically motivated stuttering treatment outcomes research.
Despite repeated calls for increased outcome documentation on stuttering treatment, the stuttering literature remains characterized by primarily ‘‘assertion-based’’ or ‘‘opinion-based’’ treatments, which by definition are based on unverified treatment techniques and/or procedures [61]. Conversely, ‘‘evidence-based’’ treatments, based on well-researched and scientifically validated techniques, remain relatively rare in the field of stuttering and are usually limited to behavioural and FS. The objective assessment of specific stuttering treatment approaches is also important to elucidate therapy techniques that contribute to desirable outcomes. Different SLPs will interpret progress differently. The change of SLP in the assessment process for a particular client may lead to the disagreement over the result where the new SLP may have different perceptions on the efficacy of a therapy technique for the client. One way to alleviate this problem may be as proposed in current work where the proposed system is less oriented towards human perception, rather, it implements technical-oriented approach where standard scoring parameters could be used by SLPs to assess PWS by referring to the scoring generated automatically by the software. 3.6 Underlying Design Principles Therapeutic activities afford the client a structured opportunity to perform in a specific manner. From the SLP’s perspective, therapeutic activities create an occasion for assessment to measure a client’s performance level while demonstrating a specific behaviour in a controlled environment. A “good” activity meets the needs 49 of the client and the SLP. Regardless of the overall treatment strategy chosen by the SLP, all programs emphasize such factors as enhancing the client’s enjoyment of speaking, empowering the client to understand and use his “speech helpers”, using fluency facilitation techniques to achieve and expand fluency and to improve the client’s self-confidence as a speaker and a person. 3.6.1 Audio and Visual Feedbacks The proposed software tool enhances the assessment process by providing clients with the audio and visual feedbacks necessary to identify speech properties while still allowing the SLP to have control of the treatment process. PWS modify their speech in subtle and variable ways to gain control over stuttering and, in that, they appear to be similar to a well-known experimental technique for suppressing PWS known as response contingent stimulation [67]. Computer-based assessment system has the potential for substantially reducing the typical cost and time requirements especially for chronic or severe stuttering. As these conceptions develop, PWS become more concerned about their ability and more sensitive to evaluation, especially negative evaluation. Moreover, once they have developed a clear and coherent understanding of ability, the particular conception of ability they adopt will determine a great deal about their motivational patterns. It will influence such things as whether they seek and enjoy challenges and how resilient they are in the face of setbacks [68]. Whether the SLP chooses to work indirectly or directly with the PWS, however, the essence of treatment consists of both facilitating the child’s capacities to produce easily fluent speech and reducing the demands placed on the child that result in fluency disruption. 
The fluency-enhancing activities can provide highly dramatic results, and such instantaneous improvements tend to have the effect of making anyone who uses them an "expert" on how to help PWS. SLPs using operant and FS approaches generally obtain considerably more data and specify explicit criteria for moving a child from one step of a program to another [8]. Fluency-enhancing procedures provide the PWS with techniques for both initiating and enhancing his fluency. The SLP cannot always assume that because the child's speech is non-stuttered, it is necessarily fluent. Speech that is to be expanded and reinforced should have high-quality fluency, which is characterized by smooth and effortless production. The FS approach consists of procedures to help the PWS more efficiently manage the breath stream, produce gradual and relaxed use of the vocal folds, use a slower rate of articulatory movement, make gradual and smooth transitions from one sound to another, produce light articulatory contacts, and keep an open vocal tract in order to counteract constrictions resulting from tension [8].

The proposed software is based upon the physical analysis of speech sounds as they are being uttered. It provides real-time measures of sounds, evaluates the sounds against standards for their production, and immediately signals the results of the evaluation in graphs plotted on the computer screen. The implementation of a computer-based assessment system provides a faster alternative to the traditional assessment process. The assessment process must be facilitated in a structured manner. Intervention for all clients is likely to be the most efficient when a careful analysis of the speaker's capacities and responses to demands is factored into the treatment strategy.

3.6.2 Monitoring and Assessment

Good fluency therapy respects the unique personal needs of each PWS. Only if SLPs maneuver in response to the circumstances or facts presented by a PWS, which may not be those that have been taught or that are expected, are they likely, in the long term, to be able to assess and treat the PWS effectively. Short- and long-term outcomes of stuttering assessments by different SLPs with varying therapy programs should be evaluated with the same measurement instruments [69]. Progress with PWS can be judged by such things as the improved use of techniques, decreased reliance on cueing by the SLP, increased control of fluency, taking risks, and decreased avoidance of speaking and speaking situations [2].

Stuttering therapies should be improved by feedback provided to the SLPs about the results of their therapies so that SLPs are well-informed about the clients' receptiveness towards each therapy technique. This could be enhanced by a computer-based tool that generates a history file summarizing the client's past attempts. A log of the client's scores is maintained in a personal file on the computer. In addition, the personal score files maintain a count of the total number of times each utterance is practiced. The SLP can use this information to determine suitable therapy techniques for each client. Moreover, it enables the SLP to assess or monitor client progress and observe how much time the client spends on practice. Based on the client's progress record, the supervising SLP is well-informed about which displayed signals and measured parameters would be useful for improving the speech rehabilitation process. These features help the client to be successful in future therapy sessions.
The combination of feedback, progress records, and customizable practice phrases would be a valuable asset to the current assessment system.

3.6.3 Clinical Evaluation

Any clinical program design must follow, and constitute a major part of, present evidence-based practice (EBP) for treatment with PWS [70]. The concepts of establishment, transfer (generalization, out-of-clinic) and maintenance (over a long-term period) must be employed. Follow-up post-treatment data are collected to determine the positive, long-term effects of the programs. All these procedures have important contingency management features. EBP indicates that basically five steps are required in implementing a clinical treatment system, as shown in Figure 3.2.

Upon development of a new system, the system must be evaluated or tested in a real situation, in what are called "clinical trials". Clinical trials involve running supervised tests to determine the effectiveness and safety of the proposed system, with the aim of answering scientific questions about stuttering assessment. A clinical trial may be separated into phases, or steps, with each step designed to answer a separate research question. Test subjects are assessed carefully to make sure that they stutter and do not have other problems. A control group is essential so that a treatment group would have to "beat the odds" to provide convincing evidence of treatment effects. Control groups need to be appropriately matched on key variables known to be associated with design objectives [6].

Figure 3.2: Five Steps to Implementing EBP
• Convert a clinical need into an answerable question
• Search for and find the best evidence to answer the question
• Critically evaluate the evidence for validity and applicability
• Apply the results to clinical practice
• Evaluate and audit performance

3.6.4 Motivation

Confounding factors in the treatment design must be addressed, monitored and, if possible, reduced or eliminated. This is essential as they can impact significantly on the outcome results. During assessment, features of treatment programs need to be carefully examined, such as the impact of different rewards and punishments, avoidance behaviours, the impact of attrition on outcome, and the effect of individual treatments in combined treatment programs [71]. For example, when a client uses a fluency-facilitating technique and alters his usual tense and fragmented speech into a more open and forward-moving pattern, the SLP can reward the accomplishment. Rewards, if used, should be structured so that the PWS can earn them in many speaking situations, and any changes to treatment should be discussed between the SLP, the PWS and others who may be involved. Basic speech measurement should be implemented to inform treatment progress or detect signs of a lack of progress or a progress plateau, so that these potential barriers to treatment can be identified, discussed and changes implemented within the treatment period [72].

Perhaps the most apt comment in this regard was Sheehan's [73] that producing stutter-free speech is no more realistic than playing error-free baseball. He reasoned that because a person possesses the capacity to function in an error-free manner, it does not follow that this will always be the case. By decreasing demands, desensitizing the PWS to fluency-disrupting stimuli, and giving rewards for open, easy, and forward-moving speech, the child is guided step-by-step toward increased fluency. Many of the clinical therapy techniques require a conscious effort on the part of the PWS.
Generally speaking, activities for PWS need to be more engaging, having more "entertainment value" and being of greater interest to the individual child. Additionally, a child may need more support in making the connection between a therapeutic activity and its usage in the "real world". However, the basic elements of what makes an activity "good" (for the client and SLP) remain the same [74]. Therefore, the element of motivation must be integrated into stuttering therapy. Rewards are important to motivate the clients. Research [6] has shown that both tangible and verbal rewards were effective in reducing stuttering. Both forms of reward appeared to be successful, but their unique contributions could not be measured because treatment involved a number of therapy procedures. It is essential that the child enjoys the treatment and finds it to be a positive experience. The proposed computer-based assessment system displays speech waveforms and amplitude curves in graphical form, which can motivate and encourage PWS to practice their therapy more often. Moreover, applause sounds and firework displays are presented as compliments to clients who obtain good scores.

3.7 System Design

The proposed work is developed based on standard FS techniques used in the fluency rehabilitation regimen. DSP techniques are implemented to analyze speech signals. The software is based upon the physical analysis of speech sounds as they are being uttered. The software provides real-time visual and audio biofeedback, where the client's Average Magnitude Profile (AMP) is displayed as the utterance is spoken and is superimposed on the SLP's AMP. The maximum magnitudes of the clients' and the SLPs' speech signals, corresponding to the AMPs, are determined and compared. The maximum magnitude is determined by summing a total of 15 neighbouring samples and taking the largest resulting value. The display of the AMP is intended to convey to the client those locations where the client's utterance differs from the SLP's in terms of start and end alignment, magnitude and duration. The AMP of the spoken utterance is the primary source used to gauge the fluency and performance of the client. The software provides real-time measures of sounds, evaluates the sounds against standards for their production, and immediately signals the results of the evaluation in graphs plotted on the computer screen.

Start location, end location, maximum magnitude and duration are compared between the clients' and the SLPs' AMPs to generate a score for each therapy technique. A log of the client's scores is maintained in a personal file saved on the computer. In addition, the personal score files maintain a count of the total number of times each utterance was practiced. The SLP can facilitate each client's progress by assessing the client's history file. The computational analyses help the SLP to determine suitable techniques more quickly.

3.8 Criteria for Selection of Scoring Parameters

Assessment refers to the monitoring and evaluation of various aspects of speech according to certain criteria [6]. The selection of scoring parameters is important because the parameters influence what counts as a clinically significant outcome for stuttering treatments within an evidence-based framework. Moreover, it is important to ensure that the framework leads towards outcomes that are meaningful for the SLPs and clients. A low score means the client had episodes of stuttering.
Selection of scoring parameters is significant because the scores are used to: (1) guide implementation of the program from week to week; (2) identify when the PWS has met criterion speech performance; and (3) check that the PWS's speech continues to meet criterion speech performance in the long term. The stuttering measurements enable the SLP and the parent to communicate effectively about the severity of the PWS's stuttering throughout the treatment process. Any departure from the criterion speech performance results in more frequent clinic visits and possibly an increase in therapy durations.

The scoring generated by the proposed work consists of four parameters based on the FS approach. The parameters are start location, end location, maximum magnitude and duration. The purpose of scoring is for the SLP to evaluate progress towards the goal of stutter-free, natural-sounding speech and to guide the client accordingly. Scoring assists SLPs in determining suitable therapy techniques for each client. PWS not only differ in how often they are disfluent, they also differ in how long their moments of disfluency tend to last.

Although there are several ways to measure background noise level, the best way to fully characterize and determine background noise levels is to measure the noise over a period. This can be done by using a microphone and a computer-based data acquisition system that records the noise levels over a few seconds. During the recording, when the measured amplitude is greater than the background noise level, that location is identified as the start location [75]. In stuttering assessment, the duration of tense pauses and prolongations is usually less than five seconds [76]. This plays a significant role in the determination of the end location. During the recording, the end location is identified when the measured amplitude remains at the background noise level for 5 seconds.

Maximum magnitude is another important parameter because a stutterer's speech may be abnormally loud or soft, or his or her voice abnormally high- or low-pitched, for his or her age and sex. While these behaviours may not be related to the person's stuttering, they may be devices that he or she is purposefully using to reduce stuttering severity. Frequent use of such devices indicates a greater severity of stuttering [19]. Moreover, PWS may appear relatively relaxed while being disfluent or may produce audible and/or visible signs of tensing during at least some of their disfluencies. Such tensing can be manifested in a number of ways (singly or in combination), including tense pauses, audible tension (strain) in the voice while speaking, and abnormal loudness of voice. The presence of a significant degree of abnormal loudness while being disfluent does appear to differentiate PWS (or those at risk of developing stuttering) from their normal-speaking peers [8]. Therefore, the parameter of maximum magnitude should be included when assessing PWS.

Duration of total speaking time is a common measure in speech assessment. Any time-consuming disruption in speech flow, such as syllable and whole-word repetition, increases the measured duration of speech. Easily made measures of duration can yield an indication of the tension that is occurring. Measures of duration such as waveform analysis are necessary for clinical evaluation. Duration also may be reflected in the rate of speech in words or syllables per minute, with lower rates indicating greater severity [8].
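The start- and end-location criteria described above can be expressed as a short routine: the first sample whose amplitude exceeds the measured background noise level marks the start location, and the end location is the last sample above the noise level that is followed by a stretch of noise-level samples equivalent to five seconds. The sketch below is an illustrative reading of those criteria, not the exact routine used in the software; the function and variable names are assumptions made for clarity.

```cpp
// Illustrative start/end location detection on an amplitude profile.
// startIndex: first sample whose amplitude exceeds the background noise level.
// endIndex:   last sample above the noise level before the amplitude stays at
//             or below the noise level for a continuous five-second stretch.
#include <cstddef>
#include <vector>

struct UtteranceBounds {
    std::size_t startIndex;
    std::size_t endIndex;
    bool found;
};

UtteranceBounds locateUtterance(const std::vector<double>& amplitude,
                                double backgroundNoiseLevel,
                                std::size_t samplesPerSecond) {
    UtteranceBounds bounds = {0, 0, false};
    const std::size_t silenceSpan = 5 * samplesPerSecond;  // five-second rule

    // Start location: first sample louder than the background noise.
    std::size_t i = 0;
    while (i < amplitude.size() && amplitude[i] <= backgroundNoiseLevel) ++i;
    if (i == amplitude.size()) return bounds;  // nothing above the noise floor
    bounds.startIndex = i;

    // End location: last loud sample followed by five seconds of "silence".
    std::size_t quietRun = 0;
    bounds.endIndex = i;
    for (; i < amplitude.size(); ++i) {
        if (amplitude[i] > backgroundNoiseLevel) {
            quietRun = 0;
            bounds.endIndex = i;   // most recent sample above the noise level
        } else if (++quietRun >= silenceSpan) {
            break;                 // five seconds at the noise level: stop searching
        }
    }
    bounds.found = true;
    return bounds;
}
```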
57 3.9 Summary The problem formulation and basic principle of stuttering assessment system are outlined in this chapter. The underlying design principle of the proposed computer-based stuttering assessment system has been described. The criteria for the selection of four scoring parameters have been elaborated. The following chapter delineates the development of a computer-based Malay stuttering assessment system. CHAPTER 4 DEVELOPMENT OF COMPUTER-BASED STUTTERING ASSESSMENT SYSTEM 4.1 Introduction This chapter outlines the works involved during the development of computer-based assessment system. Section 4.2 elaborates the system requirements of the assessment system. Before developing the software, the implementation design approaches available have been analyzed carefully and the most suitable mean was determined as the design choice. Section 4.3 details the overall system descriptions. The coding steps and the scoring algorithms are described respectively in Section 4.4 and Section 4.5. Finally, Section 4.6 concludes Chapter 4. 4.2 System Requirements The assessment system is compatible with any computer with a recent version of Windows. The computer screen can be set at any resolution with the 800*600 pixels as optimal functioning. The required hard disks space is 2MB. A printer is helpful for printing out the results and history files. The assessment system is made as simple as possible so that it is affordable and easily integrated into the current speech rehabilitation regimen. 59 4.2.1 Hardware Requirements The assessment system utilized a computer equipped with a sound card, a microphone and earphones. Sound card is a device that process audio data and send it to one or more speakers. Most sound cards are also capable of processing audio input from a microphone for various purposes. A standard PC sound card (or sound chipset) includes an analogue to digital converter (ADC) for converting external sound signals to digital bits, a digital to analogue converter (DAC) for converting digital bits back to sound signals, an Industry Standard Architecture (ISA) or Peripheral Component Interconnect (PCI) interface to connect the card to the motherboard, and input and output connections for a microphone and speakers. Either ISA or PCI sound card can be used. Microphone is an acoustic transducer that converts sound into an electrical signal. There are many types of microphones. Dynamic and condenser microphones are the most popular microphones in which either one can be used in our application. If a lapel microphone is used, the loudspeakers are not required. 4.2.2 Software Requirements The software is developed using Microsoft Visual C++ 6.0 running under Window XP. It has the potential for substantially reducing the typical cost and time requirements for chronic or severe stuttering. The software is developed as Microsoft windows application or GUI, which makes therapy user friendly. 4.3 System Descriptions The software is developed based on standard FS techniques used in fluency rehabilitation regimen. DSP techniques are implemented to analyze speech signals. 60 The maximum magnitudes of the clients’ and the SLPs’ speech signals, corresponding to the AMPs [77], are determined and compared. The maximum magnitude is determined where a total of 15 neighbouring samples are summed to obtain a maximum value. 
Start location, end location, maximum magnitude and duration are compared between clients’ and SLPs’ AMPs to generate scoring, the computational analyses help SLP to determine suitable techniques in a faster way. Three therapy techniques are introduced in computer-based method; these techniques are Shadowing, Metronome and DAF. Figure 4.1 describes the system block diagram. Sound recording of 5 goal utterances is implemented where the software incorporates functions record, playback, open, close and save of standard WAVE file. The 5 goal utterances are customized by the SLP for each client depending on the age and language level. Goal utterances are chosen based on their phonetic characteristics. Each goal utterance is stored in a separate WAVE file, which can be individually selected for practice. The duration for each target utterances is six seconds. 61 Background Noise Level Identification Initiate Practice Session Client practices to match goal utterance using real-time visual and audio feedbacks Practice Goal Utterances Wave files recorded by clinician Analyse and Compare Analyse and compare attempt utterance to goal utterance Display Scoring History File Update History File Text file used to assess client's progress Figure 4.1: System Block Diagram Background noise level is identified for each client’s environment. In the speech pathology clinic, after identifying the client’s stuttering problem, the SLP verbally records 5 speech utterances for client to practice during the assessment process. During the process, the client selects playback to listen to the SLP’s prerecorded utterances. The client speaks an utterance into a microphone. The client practices matching the SLP’s speech pattern via both audio and visual means. The client can audibly and repeatedly listen to the target utterance by selecting playback. The visual comparison is achieved via the display of AMPs of both the SLP and client utterances on the same axis. The software is able to calculate and display the client’s AMP as it is spoken. The SLP’s AMP is first drawn on the screen in red colour. The client’s AMP is then drawn in blue line in real time as the utterance is spoken. AMPs are displayed for client to copy as closely as possible and it conveys to the client those locations where the client's spoken utterance differed from the SLP’s 62 signal in the aspect of amplitude, duration, onset, and end location. The speech processing is done in real-time. Real time display of AMP is very important as it allows the client to instantly evaluate and compare their speech to that of the SLP. The client can then, if necessary, alter their speech as required to closely match the SLP’s AMP. This gives immediate feedback to the client’s performance relative to the goal, and allows the client to anticipate what amplitude or rate change that is needed to reach the goal. The scoring algorithms assess the client’s performance where the scoring routines compare the client's utterance to the reference utterance in four categories. They are start location identification, end location identification, maximum magnitude comparison and duration comparison. Upon completion of a practice, scores are assigned to each trial. The scores are displayed to the client, allowing the client to observe the progress being made. Software generates a history file summarizing the client attempts. A separate history file is created for each client. A log of the client's scores is maintained in the personal file in computer. 
In addition, the personal score files maintain a count of the total number of times each utterance was practiced. The SLP can use this information to determine suitable therapy techniques for each client. Moreover, it enables the SLP to assess or monitor the client's progress and to observe how much time the client spends practising. Based on the client's progress record, the supervising SLP is well informed on which displayed signals and measured parameters would be useful for improving the speech rehabilitation process.

4.4 Coding

This section describes how each step of the software development is done in Microsoft Visual C++ 6.0.

4.4.1 Audio File Format

There are many types of audio file format, and the most commonly used digital audio file format on computers is the .WAV file. A WAVE file is a file format for storing digital audio (waveform) data. This format is widely used in professional programs that process digital audio waveforms [78]. Wave files are used in the present work; the initialization of the wave file format is shown in detail in Appendix I.

4.4.2 Sampling

Sampling is the process of converting a continuous signal into a numeric sequence, that is, a function of discrete time or space. The number of samples taken per second is known as the sampling rate. The sampling frequency should be at least twice the highest frequency of interest in the input signal. The telephone system uses a sampling frequency of 8 kHz and can capture only information up to 4 kHz. In speech recording studies, a sampling frequency of 16 kHz is normally used, which gives information up to 8 kHz [78]. The choice of an appropriate sampling setup depends very much on the speech processing task and the amount of computing power available.

The Nyquist rate is defined as twice the bandwidth of the continuous-time signal. It should be noted that the sampling frequency must be strictly greater than the Nyquist rate of the signal to achieve unambiguous representation of the signal. This constraint is equivalent to requiring that the system's Nyquist frequency, which is equal to half the sample rate, be strictly greater than the bandwidth of the signal. If the signal contains a frequency component at precisely the Nyquist frequency, the corresponding component of the sample values cannot carry sufficient information to reconstruct the Nyquist-frequency component in the continuous-time signal because of phase ambiguity [79]. A 16 kHz sampling rate is a reasonable target for high quality speech recording and playback [80]. A sampling rate of 16 kHz is used in the present work.

4.4.3 Resolution Bit

When an acoustic signal is digitized, it is turned into a sequence of binary numbers by the analogue-to-digital hardware. This is an important process because a fixed number of binary digits is used to represent each sample, and hence the size of the smallest change that can be detected in the input is related to the number of bits used. Analogue-to-digital hardware uses a fixed sample size to represent the sampled acoustic signal; typically 12 or 16 bits are used per sample. A little arithmetic shows that 12 bits give a maximum of 2¹² = 4096 different numbers, while 16 bits give 2¹⁶ = 65536 values. These numbers are used to represent the different input voltages taken from the microphone. When the hardware measures the size of the input voltage from the microphone, instead of calculating a voltage value, it merely assigns it a number on a scale of 0 to 65535 (for a 16-bit digitizer).
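As an illustration of why the sample size matters, the short sketch below quantises a normalised sample value with 12-bit and then 16-bit resolution. It is not taken from the thesis software; the function name and the example value are illustrative only.

#include <cstdio>

// Quantise a normalised sample (-1.0 .. +1.0) to a signed n-bit code.
// With 16 bits there are 65536 codes, so the smallest representable step
// is 2 / 65536 of the full amplitude range; with 12 bits it is 2 / 4096.
int quantise(double sample, int bits)
{
    const int levels  = 1 << bits;        // 2^bits codes
    const int maxCode = levels / 2 - 1;   // e.g. +32767 for 16 bits
    const int minCode = -levels / 2;      // e.g. -32768 for 16 bits
    int code = (int)(sample * (levels / 2));
    if (code > maxCode) code = maxCode;
    if (code < minCode) code = minCode;
    return code;
}

int main()
{
    // The same input is resolved much more finely at 16 bits than at 12 bits.
    std::printf("12-bit code: %d\n", quantise(0.300047, 12));  // prints 614
    std::printf("16-bit code: %d\n", quantise(0.300047, 16));  // prints 9831
    return 0;
}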
Each sample of an audio signal must be ascribed a numerical value to be stored in the computer. The numerical value expresses the instantaneous amplitude of the signal at the moment it was sampled. The range of the numbers must be sufficiently large to express adequately the entire amplitude range of the sound being sampled. The number of bits used to represent the number in the computer is important because it determines the resolution with which the amplitude of the signal can be measured. If only one byte is used to represent each sample, then the entire range of possible amplitudes of the signal must be divided into 256 parts, since there are only 256 ways of describing the amplitude. 16-bit samples are used in the present work to provide the resolution necessary to calculate AMPs that portray the difference between an utterance spoken by the SLP and an utterance spoken by the client.

4.4.4 Mono Channel

Sound files can be stereo, with a right channel and a left channel, or they can be mono with just one channel. Mono, or monophonic, describes a system where all the audio signals are mixed together and routed through a single audio channel. The key is that the signal contains no level or arrival time/phase information that would replicate or simulate directional cues. Mono systems can be full-bandwidth and full-fidelity and are able to reinforce both voice and music effectively [81]. A mono channel is used in the present work; its main advantage is that everyone hears the very same signal and, in properly designed systems, all listeners hear the system at essentially the same sound level. This makes well-designed mono systems very well suited for speech reinforcement, as they can provide excellent speech intelligibility. A mono recording also halves the file size compared to stereo [81]. The wave file format is initialized for a mono channel as shown in Appendix I.

4.4.5 DC Offset Removal

For some audio files, the direct current (DC) or zero frequency component is not zero. This is called DC offset. DC offset is the average vertical offset of the recorded waveform from zero. Every sound card has its own unique DC offset [82]. Before applying a window function, the time domain data is corrected for any DC offset from zero. To increase the silence detection performance needed during background noise level identification, the DC offset should be removed from each sound file before further analysis [82]. DC offset is undesirable because it means that the positive peaks of the waveform are more likely to exceed the maximum level that can be represented. It is a common problem with PC sound cards. DC offset can cause problems when concatenating several messages in series. DC offset also exacerbates background noise problems and causes errors when trying to measure the noise floor of a recording. DC offset must be removed in order for a particular AMP to appear the same on all computers: the same speech waveform carrying different DC offsets would produce two different AMPs if the DC offset were included in the calculation of the average magnitude.

The DC offset is calculated by determining the average value for each 25 ms segment of speech [77]. The DC offset is then subtracted from each sample value as shown in equation (1), which shifts the level of the speech signal back to zero:

x'[n] = x[n] - (1/M) * Σ_{m=0}^{M-1} x[m],   n = 0, 1, ..., M - 1 . . . . (1)

where x[n] is the n-th sample of a 25 ms segment and M = 400 is the number of samples in the segment. The program flowchart is shown in Figure 4.2.
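A minimal C++ sketch of this per-segment correction, mirroring the operation drawn in Figure 4.2, is given below; the function name is illustrative, and the segment length of 400 samples corresponds to 25 ms at 16 kHz.

// Remove the DC offset from one 25 ms segment (M = 400 samples at 16 kHz),
// following equation (1): subtract the segment mean from every sample.
void removeDcOffset(short* data, int M /* = 400 */)
{
    double dc = 0.0;
    for (int i = 0; i < M; ++i)   // accumulate the segment sum
        dc += data[i];
    dc /= M;                      // mean value of the segment

    for (int i = 0; i < M; ++i)   // shift the segment back to zero mean
        data[i] = (short)(data[i] - dc);
}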
Figure 4.2: Flowchart: DC Offset Removal

4.4.6 Windowing

The basic audio representation expresses amplitude change with time; this is the time domain representation. It is commonly assumed that an audio signal is stationary over a specified, short interval of time. Most audio signals are far too long to be processed in their entirety; it is necessary to divide the time-domain signal into windowed intervals and process each window individually [83]. A window is a temporal weighting function which is applied to a signal before some other operation such as a DSP algorithm. The act of windowing can have a dramatic influence on the results. Three major aspects of windowing are the window type, size and shift.

Windowing can be seen as multiplying a signal by a window which is zero everywhere except for the region of interest, where it is one. Since signals are assumed to be of infinite extent, all of the resulting zeros can be discarded and attention concentrated on the windowed portion of the signal. A number of window types exist, each with different characteristics that make one window better than another for a given purpose. The most commonly used are the Hanning, Hamming, Blackman and Kaiser windows, among others, as illustrated in Figure 4.3. The best window length depends on the characteristics of the signal to be analyzed.

Figure 4.3: Common Window Functions

Any windowing operation causes distortion, since the signal is being modified by the window. The algorithms for the windowing process are shown in Appendix I. The WAVE files are processed using analysis and synthesis window lengths of 400 samples with Blackman windowing. Blackman is chosen because it gives the best attenuation, which means that an audible difference between filtered and non-filtered samples can clearly be heard [83]. The Blackman window gives much better stop-band attenuation. It offers a weighting function similar to the Hanning but narrower in shape, and it has all the dynamic range any application should ever need. The Blackman window equation is listed in equation (2):

w(n) = 0.42 - 0.5 cos(2πn/N) + 0.08 cos(4πn/N) . . . . (2)

where 0 ≤ n ≤ N.

The window w(n) is chosen to have a duration of 25 ms, or 400 samples. The 25 ms window is shifted in 10 ms steps; therefore, a new window of 400 samples is calculated every 10 ms. A 25 ms window duration is common for time domain processing [84] and is sufficient to capture all the stuttering disfluencies. The four disfluencies of stuttering are syllable repetitions, word repetitions, prolongation of a sound and blocking or hesitation before word completion. Calculation of the AMP requires the calculation of 600 average magnitude values corresponding to the 600 frames in 6 seconds of speech data. The average magnitude is calculated for each 25 ms of speech data, with a new average magnitude calculated every 10 ms. Audio signals (both speech and music) are generally not stationary and cannot always be said to be stationary over each of these windows. The window length chosen must strike a balance between picking up important transient details in the audio and recognizing longer duration, sustained events. The window length should be small enough that the windowed signal block is essentially stationary over the window interval. Research [85] indicated that windows which are too short fail to pick up the important time structures of the audio signal.
Conversely, windows which are too long cause the algorithm to miss important transient details in the music. 4.4.7 Time Domain Filtering Time-domain filtering is favoured because the assessment system is a realtime application in which it is important to process a continuous data stream and to output filtered values at the same rate as raw data is received [86]. Time domain 70 filters are used when the information is encoded in the shape of the signal's waveform. Time domain filtering is used for smoothing, DC removal, waveform shaping and others. Convolution is a mathematical operation which takes two functions and produces a third function. If the first vector is an acoustic signal and the second is the impulse response of a filter, then the result of convolution is a filtered signal. It is an operation equivalent to weighted differencing of the input signal. The filter provides the weighting coefficients. The formula of convolution, y = x*h where y is the output signal, x is the input signal and h is the filter impulse response. The input signal is the output signal from Blackman windowing operation and the impulse response is sin(2.0*PI*fc*(i-M/2))/(i-M/2). The code for the above operation is shown in Appendix I. 4.4.8 Recording and Playback of Speech Utterances Modern operating systems such as Windows provide a quite useful Application Programming Interface (API) for programming soundcards. The normal way of outputting audio is to open a device and writing blocks of data to this device. The audio data is generally written to output buffer. The output buffer is a block of memory which has several constrictions. The data in this buffer is usually transferred to the soundcard using the Direct Memory Access (DMA) controller. The DMA controller is a device which can copy data between memory and hardware devices without needing the CPU. Sound input works generally in the same way as the output except in opposite direction. The playback and recording process are described in Section 4.4.8.1 and Section 4.4.8.2 respectively. 71 4.4.8.1 Playback Playback is done via "blocks of data". Application reads a block of data from the WAVE file on disk. This block is passed to the driver for playback via waveOutWrite(). While the driver is playing this block, another block of data is read into a second buffer. When the driver finish playing the first block, it signals the program that it needs another block, and the driver passes that second buffer via waveOutWrite(). Program will now read in the next block of data into the first buffer while the driver is playing the second buffer. Again, this is all non-stop until the WAVE is fully played [81]. See Appendix I for detail description of the playback function. 4.4.8.2 Recording The device's driver manages the actual recording of data. This process can be started with waveInStart(). While a driver records digital audio, it stores data into a small fixed-size buffer. When that buffer is full, the driver "signals" the program that the buffer is full and needs to be processed by the program. The driver then goes on to store another block of data into a second, similarly-sized buffer. It is assumed that program is simultaneously processing that first buffer of data, while the driver is recording into the second buffer. It is also assumed that program finishes processing the first buffer before the second buffer is full [87]. When the driver fills that second buffer, it again signals the program that now the second buffer needs to be processed. 
While the program is processing the second buffer, the driver is storing more audio data into the now-empty first buffer. This all happens non-stop, so during recording two buffers are constantly being filled by the driver (alternating between the two), while the program constantly processes each buffer immediately upon being signalled that it is full. Therefore, the process ends up dealing with a series of "blocks of data". See Appendix I for a detailed description of the recording function.

4.4.9 Background Noise Level Detection

Noise is any unwanted signal mixed with the signal of interest. Referring to Figure 4.4, the background noise level is measured for 5 seconds at the beginning of the assessment process. The noise level is measured by squaring the input noise samples, data[i], and accumulating the result each time the application finishes the measurement of an interval. After the 5-second period, waveInReset is called to stop input on the input device and reset the current position to zero; all pending buffers are marked as done and returned to the application. waveInClose then closes the input device to end the detection process.

Figure 4.4: Flowchart: Background Noise Level Detection

4.4.10 History File

If this is the first time a client uses the software, a history file is created and saved at any desired location. The history file summarizes the client's attempts. Each client's history file contains the client's name, the utterances attempted, the date and time of each attempt and the attempt scores for start of speech, end of speech, maximum magnitude and duration. The data structure of the history file is described in detail in Appendix I.

4.4.11 Client Identification

After detecting the background noise level, the Setting dialog box prompts the client to load five wave files which are pre-recorded and customized by the SLP, as illustrated in Figure 4.5. The count of technique numbers is incremented each time the client loads a wave file. The client must load a total of five wave files; otherwise, the application shows an error message. Next, the client is required to load the text file which contains the sentences for each wave file. The sentences are displayed for the Metronome and DAF therapy techniques. A dialog box then prompts the client to create an individual history file. As shown in Figure 4.5 and Figure 4.6, the client is required to enter the user name and the location for saving the file. The client may save the history file at any desired location. If this is the first time the client uses the software, a new history file is created; otherwise, the client can choose either to update the previous database or to create a new history file.

Figure 4.5: Dialog Box: The Loading of Wave Files

Figure 4.6: Dialog Box: Client Identification

4.4.12 Compression and Decompression Using Speex

CODEC stands for COmpression DECompression. A CODEC simply knows how to compress and decompress a given format. The aim of speech compression using Speex [88] in the current work is to produce a compact representation of speech sounds such that, when reconstructed, the speech is perceived to be close to the original [89].
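The 5-second calibration of Section 4.4.9 accumulates the energy of the captured samples. The sketch below shows one way the filled recording buffers could be folded into a noise-level estimate, following the operation drawn in Figure 4.4; the Win32 waveIn plumbing that delivers the buffers is omitted, and the class and member names are illustrative rather than taken from the actual software.

// Accumulate the energy of the samples captured during the 5-second
// calibration period (Figure 4.4). Each filled recording buffer is passed
// to addBuffer(); the averaging step at the end is an assumption of this
// sketch, not a detail documented in the thesis.
class NoiseLevelEstimator {
public:
    NoiseLevelEstimator() : energy_(0.0), samples_(0) {}

    void addBuffer(const short* data, int count) {
        for (int i = 0; i < count; ++i)
            energy_ += (double)data[i] * data[i];   // noise = noise + data[i]*data[i]
        samples_ += count;
    }

    // Mean squared amplitude of the background noise over the whole period.
    double noiseLevel() const {
        return samples_ ? energy_ / samples_ : 0.0;
    }

private:
    double energy_;
    long   samples_;
};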
Each wave file is 192 kB in size, which may require a minimum of 3.84 MB of hard disk storage for each practice session of a client. There are five sentences for each therapy technique, and a client may repeat the recording several times for the same sentence. The SLP may want to keep a record of the client's recorded utterances so that the wave files can be assessed at any time to evaluate progress and to decide which therapy technique gives the best result for that particular client. This may require large storage, which in turn increases the cost. Therefore, it is important to reduce the wave file size through Speex compression [88] while ensuring that the quality is not significantly reduced.

Speex is mainly designed for 3 different sampling rates: 8 kHz (the sampling rate used to transmit telephone calls), 16 kHz, and 32 kHz. These are respectively referred to as narrowband, wideband and ultra-wideband. Speex allows both very good quality speech and a low bit rate. Very good quality also means support for wideband (16 kHz sampling rate) in addition to narrowband (telephone quality, 8 kHz sampling rate). Referring to Figure 4.7 and Figure 4.8, each wave file is 192 kB while each Speex file is 22 kB. This is a reduction of 88.54% of the original file size, which saves a substantial amount of hard disk space. The bit rate is reduced from 256 kbit/s to 28 kbit/s.

Figure 4.7: Wave File Information

Figure 4.8: Speex File Information

4.4.12.1 Encoding

The base Speex distribution includes a command-line encoder, speexenc, and decoder, speexdec. Speex encoding is controlled most of the time by a quality parameter that ranges from 0 to 10. In constant bit-rate (CBR) operation, the quality parameter is an integer, while for variable bit-rate (VBR) operation the parameter is a float. In order to encode speech using Speex, the header <speex.h> must be included. The speexenc utility is used to create Speex files from raw PCM or wave files. It can be used by calling:

speexenc [options] input_file output_file

The value '-' for input_file or output_file corresponds respectively to stdin and stdout. The wideband option is used to tell Speex to treat the input as wideband (16 kHz) [89]. The encoding process is invoked, as shown in Figure 4.9, each time the client saves the wave files to the hard disk, whereupon the wave file is compressed to a Speex file.

Figure 4.9: Encoding Process

4.4.12.2 Decoding

The speexdec utility is used to decode Speex files and can be used by calling:

speexdec [options] speex_file [output_file]

The value '-' for speex_file or output_file corresponds respectively to stdin and stdout. When no output_file is specified, the file is played to the soundcard. The --mono option forces the decoding process to mono [89]. The decoding process is invoked, as shown in Figure 4.10, each time the SLP opens a client's recorded files for playback, whereupon the Speex file is decompressed to a wave file.

Figure 4.10: Decoding Process

4.5 Scoring

Scoring provides guidance for the SLP to determine suitable therapy techniques for each client. It also enables the SLP to evaluate progress towards the goal of stutter-free speech and to guide the client accordingly. Many of the clinical therapy techniques require a conscious effort on the part of the clients. Therefore, the element of motivation must be integrated into the stuttering assessment system. It is essential that the child enjoys the system and finds it to be a positive experience.
The computer-based assessment system displays speech waveforms and the amplitude curve in a graphical representation which can motivate and encourage children to practice their speech for longer periods. Rewards are important to motivate the clients. Both tangible rewards and verbal rewards were effective in reducing stuttering; both forms appeared to be successful, but their unique contributions could not be measured because treatment involved a number of therapy procedures. Therefore, rewards of a fireworks display and applause are implemented for clients who manage to obtain scores of 80 and above.

The selection of scoring parameters is important because the parameters will influence the clinically significant outcome for stuttering assessment within an evidence-based framework. Moreover, it is important to make sure that the framework leads towards outcomes that are meaningful for the SLPs and clients. The client's AMP is compared with the SLP's AMP in four categories and scores are assigned to each category: start location identification, end location identification, maximum magnitude comparison and duration comparison, as elaborated in the following sub-sections.

4.5.1 Start Location

Starting points can be found by comparing ambient audio levels or acoustic energy with the sample just recorded [90]. The background noise level is identified for each client's environment. When the measured amplitude is greater than the background noise level, that location is identified as the start location. There are 400 samples every 25 ms, which means that there are 1600 samples for each 100 ms. As described in Figure 4.11, a score of 100% is assigned if the client can align his or her utterance within 100 ms of the SLP's utterance [8]. The start locations of the SLP and the client are identified as start1 and start2 respectively. The score is reduced by 10% for each additional 100 ms that the two locations differ. In other words, whenever the difference between the two locations grows by 100 ms, 10% is deducted from the score. If the difference is 1100 ms or more, a score of 0% is assigned.

4.5.2 End Location

End-point detection algorithms identify sections in an incoming audio signal that contain speech. Accurate end-pointing is a non-trivial task; however, reasonable behaviour can be obtained for inputs which contain only speech surrounded by silence (no other noises). Typical algorithms look at the energy or amplitude of the incoming signal and at the rate of "zero-crossings". A zero-crossing is where the audio signal changes from positive to negative or vice versa. When the energy and zero-crossings are at certain levels, it is reasonable to guess that there is speech [91]. Endpoint detection is harder for the stuttering application because PWS tend to have tense pauses. The end location is identified when the measured amplitude is equal to the background noise level for 5 seconds; a duration of 5 seconds is chosen because the duration of tense pauses of PWS is less than 5 seconds [19]. The end alignment of the client is scored as described in Figure 4.12. A score of 100% is assigned if the client can align his or her utterance within 100 ms of the SLP's utterance. The end locations of the SLP and the client are identified as end1 and end2 respectively. The score is reduced by 10% for each additional 100 ms that the two locations differ [8]. In other words, whenever the difference between the two locations grows by 100 ms, 10% is deducted from the score. If the difference is 1100 ms or more, a score of 0% is assigned.
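The alignment rule above, and the 15-frame maximum-magnitude search described in Section 4.5.3 below, can be summarised in a short sketch. It assumes locations expressed in samples at 16 kHz (1600 samples per 100 ms step) and a magnitude step of 4000, following the flowcharts of Figures 4.11 to 4.14; the function names are illustrative and not taken from the actual source code.

#include <cstdlib>

// Generic alignment score: 100% if the client value lies within one step of
// the SLP value, minus 10% for each further step, bottoming out at 0%
// (Figures 4.11, 4.12 and 4.14). The step is 1600 samples (100 ms at 16 kHz)
// for start, end and duration, and 4000 for the maximum magnitude comparison.
int alignmentScore(long clientValue, long slpValue, long step)
{
    long steps = std::labs(clientValue - slpValue) / step;
    return (steps >= 10) ? 0 : 100 - (int)steps * 10;
}

// Maximum magnitude of an AMP (Figure 4.13): the largest sum of 15
// neighbouring average magnitude values over the 600 frames of a 6-second
// utterance.
long maximumMagnitude(const long* mag, int frames /* = 600 */)
{
    long sum = 0;
    for (int i = 0; i < 15 && i < frames; ++i)
        sum += mag[i];
    long best = sum;
    for (int i = 15; i < frames; ++i) {
        sum += mag[i] - mag[i - 15];    // slide the 15-frame window forward
        if (sum > best) best = sum;
    }
    return best;
}

// Usage, with SLP values as *1 and client values as *2 (as in the flowcharts):
//   int startScore = alignmentScore(start2, start1, 1600);
//   int endScore   = alignmentScore(end2,   end1,   1600);
//   int durScore   = alignmentScore(end2 - start2, end1 - start1, 1600);
//   int maxScore   = alignmentScore(maximumMagnitude(mag2, 600),
//                                   maximumMagnitude(mag1, 600), 4000);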
Figure 4.11: Flowchart: The Scoring of Start Location

Figure 4.12: Flowchart: The Scoring of End Location

4.5.3 Maximum Magnitude

Another important factor in speech is the change in the overall amplitude of a sound over the course of its duration. The shape of this macroscopic overall change in amplitude is termed the amplitude envelope. The amplitude envelope indicates the general evolution of the loudness of the sound over time [87]. A measure of the maximum magnitude is made to determine how the client's maximum compares to that of the SLP. The maximum magnitudes of the client's and the SLP's speech signals, corresponding to their AMPs, are determined and compared. Figure 4.13 shows that the maximum magnitude is determined where a total of 15 neighbouring samples are summed to obtain a maximum value [77]. Calculation of the average magnitude requires the calculation of 600 average magnitude values corresponding to the 600 frames in 6 seconds of speech data. The maximum magnitudes of the SLP and the client are identified as max1 and max2 respectively. A score of 100% is assigned if the difference between the client's and the SLP's maximum magnitude is less than 4000. The score is reduced by 10% for each additional 4000 that the maximum values differ [8]. The algorithm is shown in Figure 4.13.

4.5.4 Duration

Duration is defined as the period between the start and end locations. The duration is compared between the client's AMP and the SLP's AMP. Referring to Figure 4.14, a score of 100% is assigned if the client's duration differs from that of the SLP by less than 100 ms. The SLP's and the client's durations are identified as dur1 and dur2 respectively. The score is reduced by 10% for each additional 100 ms that the two durations differ. In other words, whenever the difference between the two durations grows by 100 ms, 10% is deducted from the score. If the difference is 1100 ms or more, a score of 0% is assigned [8].

Figure 4.13: Flowchart: The Scoring of Maximum Magnitude Comparison

Figure 4.14: Flowchart: The Scoring of Duration Comparison

4.6 Summary

In this chapter, the hardware and software requirements for a computer-based stuttering assessment system are discussed.
The application is developed in Microsoft Visual C++ 6.0 to run under Window XP. The coding steps are elaborated with the aid of C++ algorithms and dialog box displays to provide a better understanding of the assessment system. Based on these design approaches, a computer-based Malay stuttering assessment system has been tested and verified in the clinical trial as elaborated in the next chapter. CHAPTER 5 CLINICAL EVALUATION OF COMPUTER-BASED STUTTERING ASSESSMENT SYSTEM 5.1 Introduction The previous chapter demonstrates the procedures involved in developing the computer-based Malay stuttering assessment system. Clinical trials on control data and test subjects have also been carried out to make sure that the system runs properly before its practicality was verified. This chapter outlines the implementation and verifications of computer-based Malay stuttering assessment system at the primary schools and clinic in Johor Bahru. Section 5.2 describes in details the prerequisites of the experiments before the system was verified on the test subjects. Section 5.3 introduces the assessment procedures. Data collection is elaborated in Section 5.4. The result analyses of software and SLP are described and compared in Section 5.5. 5.2 Implementing Clinical Trials among School-age Children Generally, a clinical trial is a research study designed to answer specific questions about vaccines or new therapies or new ways of using known treatments. Clinical trials (also known as medical research or clinical research) are carried out to determine whether the developed stuttering assessment system is both safe and 87 effective in assisting SLP to determine appropriate therapy technique for each client [53]. The goal of carrying out clinical trial is to collect speech utterances of schoolage children in primary schools. Control data means different things in different designs — Moscicki regards this as coming from non-treated individuals as she considered randomized control designs [92]. Research on stuttering has been carried out for decades, and since the objective has been to be able to tell the difference (if there is any) between stuttered and normal speech, stuttering research has often included non-stuttered speech as control groups [93]. Research [94] suggests that control group is essential in such cases so that the treatment group would have to 'beat the odds' to demonstrate convincing treatment effects. Control data were selected among university students where participants in the control group were regarded by themselves and by the SLP as normally fluent. Their conversational and reading speech samples had to contain fewer than 2 %SS. 5.2.1 Test Subjects Since stuttering mostly appears at an early age, most studies on stuttering have been on children. The clinical trial was located within a qualitative, small group research design, which incorporated speech recording with 11 CWS. Test subjects were selected from 6 primary schools located in the Skudai, Johor. They are assessed at the Speech Therapy Unit, Hospital Sultanah Aminah, Johor Bahru. A total of 11 subjects participated, 10 males and 1 female. 10 of the test subjects had been diagnosed by SLP as having developmental stuttering and had been stuttering for at least six months. One of whom have been omitted from analysis because the subject in question was identified to be not a stuttering client. The age span was between 8 and 12 years old. The subjects were not familiar with speech technology in any way. 
88 Test subjects were selected according to the following criteria: • Diagnosis of stuttering. Subjects were required to have been given a diagnosis of stuttering by a certified SLP in Hospital Sultanah Aminah, following a formal assessment. • Language. Subjects were required to be fluent in Malay language in order to minimize the problems caused by language inabilities, as the speech recording was conducted in Malay language. • Age. As this study focused on the perspectives of children, school-age children are preferable. Subject numbers were not based on power calculations, because, as is often the case with a low incidence disorder such as stuttering, requisite numbers of clinical subjects for adequate power were prohibitive in terms of feasibility for an initial study [95]. 5.2.2 Experimental Set-Up All subjects were recorded audibly by our executable software and both audibly and visually by video camera. The subjects wore lapel microphone positioned approximately 20 cm from the mouth. The sound level was set at a comfortable level for listening over earphones and the level was checked to ensure that it remained constant. The test was presented on a notebook positioned at a comfortable reading level for each subject. The speech samples for this experiment were video-recorded using a Panasonic MiniDV Digital Video Camera NV-DS25. MiniDV was used as medium for video camera. The setting for the subject included a plain background where the camera was positioned to only show a close-up of the subjects’ face and hands. Video samples contained any form of secondary coping behaviours such as facial grimaces, head turns, eye closure and so on. It was posited that an audiovisual component in conjunction with the speech sample would approximate a more 89 realistic face-to-face listening experience. The speech was recorded in DAT format and transferred digitally to computer for further processing. The software, Final Cut Pro HD 4.5 was used to transfer video data from video camera to computer using a fire wire. Setting up premiere for use with a DV camcorder requires computer to have an IEEE-1394 (also known as FireWire or iLink) interface installed [96]. The clinical trial was conducted in normal room environment. The subjects were tested in a quiet setting where one session requires approximately 5–10 minutes. Subjects were told that the recording was for testing of new software instead of telling them it was an evaluation program. This is because it has been recommended that speech measures be collected without clients’ knowledge that their speech is being evaluated, so that they do not react to being assessed and try to create a favourable outcome [6]. In addition, all the subjects were informally screened for the presence of any speech or language problem by a qualified SLP at the time of the fluency assessment. Stuttering was diagnosed by the SLP based on frequency of %SS (>5%) and/or the presence of significant speech-related struggle behaviour. At the beginning of the recording session, subjects were given a short practice for each task using sentences similar to the ones used during the actual experiment. A few sets of speech utterances have been recorded before the experiment because subjects were varied in ages and language level. 5.2.3 Scenario The subjects were given three tasks each. Each task contains 5 sentences. They were to speak into microphone. Three stuttering therapy techniques (Shadowing, Metronome and DAF) were implemented in computer-based method. 
Subjects can repeat listening and/or recording for each sentence as many times as desired.

5.2.3.1 Shadowing Task

In the shadowing task, the subject at first listens to the SLP's pre-recorded wave files and is then required to repeat (shadow) everything the SLP reads, as shown in Figure 5.1. The SLP's pre-recorded AMP is first drawn in red, followed by the client's amplitude in blue. The client's amplitude is displayed as the utterance is spoken and is superimposed on the SLP's amplitude. The display of amplitudes is intended to convey to the client those locations where the client's utterance differed from the SLP's in terms of start and end alignment, amplitude and duration. The dialog box shows the word "Shadowing" to indicate that the client is using this therapy technique.

Figure 5.1: Shadowing Task

5.2.3.2 Metronome Task

In the Metronome task, the subject similarly listens to the SLP's pre-recorded wave files. He or she is then told to pace his or her speech while reading the sentences aloud to the beats of a metronome, at one word per beat. The sentences are shown on screen in a comfortable font size for the subjects to read, as shown in Figure 5.2.

Figure 5.2: Metronome Task

5.2.3.3 DAF Task

In the DAF task, the subject listens to the SLP's pre-recorded wave files. He or she then talks into the microphone and his or her speech is recorded and played back through earphones with a delay of 250 milliseconds [97]. The sentences are shown on screen in a comfortable font size for the subjects to read, as shown in Figure 5.3.

Figure 5.3: DAF Task

5.3 Assessment Procedures

This software is designed for stuttering clients of any age, especially CWS, because stuttering should be treated in the early years, primarily because it becomes less tractable as children get older. It is an easy-to-use software program and most users will easily navigate the software in the first or second session. An assessment session consists of the following steps. First, the software prompts the subject to keep silent for 5 seconds so that the software can detect the background noise level, as shown in Figure 5.4. The detection of the background noise level is important for the scoring algorithms, and the level may vary with each subject's recording venue, time and conditions. After 5 seconds, the software signals the subject that background noise level detection has ended, as shown in Figure 5.5.

Figure 5.4: Detection of Background Noise Level

Figure 5.5: End Detection of Background Noise Level

Next, the subject is required to select five wave files to be loaded into the system, as displayed in Figure 5.6. Normally, the client visits the SLP for an initial evaluation of speech fluency, where the SLP assesses the client's speech pattern and assigns a set of sentences.

Figure 5.6: Selection of Five Pre-recorded Wave Files

After loading the wave files, the subject is required to load the text file used for displaying the sentences during the recording of the Metronome and DAF techniques, as shown in Figure 5.7. Figure 5.8 and Figure 5.9 indicate the client identification process, where the subject is required to key in his or her user name and create a personal history file. The subject can choose the desired location in which to save the file.

Figure 5.7: Selection of Text File

Figure 5.8: Input of User Name

Figure 5.9: Input of History File and Its Location

Finally, the recording and playback session can be started, where the buttons are enabled as shown in Figure 5.10. The subject is guided from the Shadowing technique through to the DAF technique.
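The 250 ms delayed playback used in the DAF task (Section 5.2.3.3) can be produced with a simple circular buffer. The sketch below assumes 16 kHz mono samples, so a 250 ms delay corresponds to 4000 samples; it is independent of the waveIn/waveOut plumbing actually used by the software, and the class name is illustrative.

#include <vector>
#include <cstddef>

// Delay line for the DAF task: each sample captured from the microphone is
// written in, and the sample returned is the one captured 250 ms earlier,
// which is what the subject hears through the earphones.
class DelayedFeedback {
public:
    explicit DelayedFeedback(int delaySamples = 4000)   // 250 ms at 16 kHz
        : buffer_(delaySamples, 0), pos_(0) {}

    short process(short input) {
        short delayed = buffer_[pos_];   // sample recorded 250 ms ago
        buffer_[pos_] = input;           // store the current sample
        pos_ = (pos_ + 1) % buffer_.size();
        return delayed;                  // play this back to the subject
    }

private:
    std::vector<short> buffer_;
    std::size_t pos_;
};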
Button Testing is clicked for subject to listen to pre-recorded SLP's wave file. At the same time, SLP's AMP is shown in red line on screen as displayed in Figure 5.11. Then, subject can start recording by clicking button Record. Referring to Figure 5.12, subject's AMP is shown in blue colour. 97 Figure 5.10: The Enabling of Buttons Figure 5.11: The AMP of SLP 98 Figure 5.12: The AMP of Client superimposed on SLP's AMP After completing all the three therapy techniques, the recorded wave files can be saved by using the Save Wave function. The Save As dialog box prompts subject to choose the directory path for saving the speex file as shown in Figure 5.13. 99 Figure 5.13: The File Saving of Recorded Utterances During the assessment session, SLP may need to refer to the client's past recording in order to evaluate the progress from time to time and to assist SLP in determining a new set of utterances for that particular client. This can be done by the Play Wave function where it enables both the SLP and client's wave files to be displayed on the same axis at anytime as displayed in Figure 5.14. Figure 5.15 indicated the fireworks display whenever a particular subject has achieved a scoring of 80 or higher. Meanwhile, applause is heard. 100 Figure 5.14: The File Playing of Both SLP and Subject's Utterances Figure 5.15: The Display of Fireworks 101 5.4 Data Collection The data collection process began by engaging each subject in a short conversation regarding their favourite sports or interests to obtain data on subject’s language level so that the author knows precisely which set of sentences to be used for that subject’s oral-reading task. At the beginning of the recording session, subjects were given a short practice for each sentence before the actual recording. There are two types of data collection. First, SLP observes the subject’s behaviour under conditions that were purposefully created, or structured. That is, the client is asked to do three oral-reading tasks (Shadowing, Metronome, DAF) and observes certain aspects of his or her behaviour while it is being done. The aspects are body language, physiological functioning, something he or she has written, or some combination of these. Second, the SLP also observes the aspects of a subject’s behaviour under conditions that were not created (structured). In this situation, he or she does not attempt to manipulate the behaviour. With the first, the SLP does attempt to do so by having the subject perform a task. With this, SLP observes the subject while the subject is having a conversation with author. Each set of speech samples contained three different speaking tasks equivalent to the shadowing, metronome and DAF. The duration of each sentence was approximately 6 s. Short conversation was included for each subject before going through three therapy techniques. Of the data collected, a total of 356 subject utterances have been analyzed in the present work including the control data and test subject data. In time, this amounts to 2136 seconds or 35.6 minutes of speech. Control data are totalled to 450 seconds or 7.5 minutes with 75 utterances. On the other hand, there are 1686 seconds of test subject data which are equivalent to 28.1 minutes or 281 utterances. The speech samples were presented to the SLP in both the audiovisual and audioonly mode. SLP judged each sample individually. 
102 5.5 Results or Quantitative Analyses Evaluation is referred to tests used to measure a person's level of development or to identify a possible disease or disorder. Efficacy has been defined as the extent to which a specific intervention, procedure, regimen or service produces a beneficial result under ideally controlled conditions when administered or monitored by experts [98]. Sampling and satisfactory analysis of speech measures is essential for reliability and consistency. Although it is often recommended that a range of samples from different contexts be obtained, this can be impractical at times. It is also important to remember that a smaller amount of data analysed properly is worth more than a lot of data analyzed badly. There are different methods of clinical stuttering assessment [44]: • Amount of stuttering. Percentage of SS & stuttered words per minute (SW/M). In some cases, an increase in these measures may indicate progress in therapy. • Quality and characteristics of stuttering. Accessory behaviours during stuttering; amount and loci of tension. • Self-reported communication ease. Considers tension and emotions related to stuttering from the client’s perspective. • Increased communication participation, spontaneity, and risk-taking in speech. Raising hand in class, using telephone regularly, increased socialization with peers, increased participation in social groups, increased self-esteem and confidence in speaking, dating, no longer avoiding certain words/situations/people/topics. • Natural affect. Increased eye contact, more relaxed body postures, increased speech naturalness, effective nonverbal communication skills and gestures. It was discovered that some subjects tend to utter sentences without stuttering but they stuttered while speaking to author during the test. To what extent do the frequencies of these disfluencies (hesitation) types allow PWS to be distinguished from normal speakers? Representative data on the 103 frequency of occurrence of these behaviours in the spontaneous speech of ten test subjects are presented in Table 5.1. For example, subject A is diagnosed as having stuttering problem with the following characteristics: part-word repetition, prolongations, broken words, tense pauses, incomplete phrases and abnormal speaking rate. What appears to distinguish CWS from those who do not is the increased frequency of repetitions of words, phrases, and syllables and to a lesser extent prolonged sounds and broken words. It was also observed that the stuttering children had a greater number of repetition units per disfluency. The frequency of occurrences of stuttering is categorized in respective stuttering characteristics. These results suggest that stuttering is not a dichotomy but is rather a continuous scale where children who display an increased frequency of certain types of disfluencies are considered to be stuttering. Table 5.1: Occurrence Frequency of Stuttering Behaviours in Test Subjects Subject %SS Part-Word Repetitions A B C D E F G H I J Single & Multisyllabic Word Repetitions Prolongations Broken Words Stuttering Characteristics Tense Incomplete Interjections Revisions Abnormal Abnormal Pauses Phrases Speaking Loudness/ Rate Pitch Level 15.4717 13.2275 7.1856 10.4895 8.4507 13.2275 5.1661 9.6154 16.4384 26.4286 Table 5.2 indicated the range and quartile distribution of the frequency indices of disfluencies for each of the ten stuttering characteristics. 
The lowest index is the frequency of disfluencies per total words spoken for the subject who had the fewest occurrences; the highest index is the frequency found in the subject who had the most occurrences. Q2 (the median) is the frequency exceeded by 50 percent of the subjects; Q1 is the frequency exceeded by 75 percent of the subjects; and Q3 is the frequency exceeded by 25 percent of the subjects.

Table 5.2: The Range and Quartile Distribution of the Frequency Indices for Stuttering Characteristics

Stuttering Characteristic                  Lowest   Q1    Q2    Q3   Highest
Part-word Repetitions                        0.0    1.3   1.4   2.3    2.8
Single & Multisyllabic Word Repetitions      0.0    0.0   0.2   1.4    2.1
Prolongations                                0.7    1.9   2.5   3.2    3.7
Broken Words                                 0.0    0.9   1.4   2.7    3.5
Tense Pauses                                 0.0    0.0   0.6   1.1    2.0
Incomplete Phrases                           0.0    0.0   0.2   1.1    2.0
Interjections                                0.0    0.4   1.5   3.2    7.5
Revisions                                    0.0    0.0   0.7   0.8    1.6
Abnormal Speaking Rate                       0.0    0.0   0.6   1.4    4.7
Abnormal Loudness or Pitch Level             0.0    0.0   0.3   1.2    2.1

5.5.1 Results Generated by Software

As stated in previous chapters, the scoring displayed in the text file or personal history file can be used by the SLP to determine suitable stuttering therapy techniques for each client. Table 5.3 and Table 5.4 display the scoring generated by the software for the test subjects and the control data respectively. Figure 5.16 shows the utterances attempted, the date and time of each attempt and the attempt scores for start of speech, end of speech, maximum magnitude and duration.

Table 5.3: Software Scoring for Test Subjects

Subject   Shadowing (%)   Metronome (%)   DAF (%)
A              72              60            80
B              76              74            73
C              85              84            76
D              68              75            82
E              81              88            75
F              68              82            74
G              86              85            90
H              82              80            76
I              62              68            75
J              60              57            38

Table 5.4: Software Scoring for Control Data

Control Data   Shadowing (%)   Metronome (%)   DAF (%)
Normal              92              94            98
Late Start           0               2             0
Early End            0               2             2
Too Low              0               0             0
Too Loud             0               0             0

Figure 5.16: The Display of the Information of Attempted Utterances

Figure 5.17 shows the scoring comparison between the normal control subjects and the "Late Start" subjects. The blue bars indicate the average scoring of the start location parameter generated by the normal control subjects, while the red bars show the average scoring of the start location parameter achieved by subjects who tend to start their speech later than the supposed start location. As illustrated in Chapter 4, a score of 100% is assigned for the start location parameter if the subject can align his or her utterance within 100 ms of the SLP's utterance. The score is reduced by 10% for each additional 100 ms that the two locations differ.

Figure 5.17: The Scoring Comparison for Start Location Parameter

Figure 5.18 shows the scoring comparison between the normal control subjects and the "Early End" subjects. The blue bars indicate the average scoring of the end location parameter generated by the normal control subjects, while the red bars show the average scoring of the end location parameter achieved by subjects who tend to end their speech earlier than the supposed end location. As illustrated in Chapter 4, a score of 100% is assigned for the end location parameter if the subject can align his or her utterance within 100 ms of the SLP's utterance. The score is reduced by 10% for each additional 100 ms that the two locations differ.

Figure 5.18: The Scoring Comparison for End Location Parameter

Figure 5.19 shows the scoring comparison between the normal control subjects and the "Too Low Amplitude" subjects. The blue bars indicate the average scoring of the maximum magnitude generated by the normal control subjects, while the red bars show the average scoring of the maximum magnitude achieved by subjects who tend to speak at an extremely low sound level. As illustrated in Chapter 4, a score of 100% is assigned if the difference between the client's and the SLP's maximum magnitude is less than 4000. The score is reduced by 10% for each additional 4000 that the maximum values differ.

Figure 5.19: The Scoring Comparison for Maximum Magnitude Parameter

Figure 5.20 shows the scoring comparison between the normal control subjects and the "Too High Amplitude" subjects. The blue bars indicate the average scoring of the maximum magnitude generated by the normal control subjects, while the red bars show the average scoring of the maximum magnitude achieved by subjects who tend to speak at an extremely high sound level. As illustrated in Chapter 4, a score of 100% is assigned if the difference between the client's and the SLP's maximum magnitude is less than 4000. The score is reduced by 10% for each additional 4000 that the maximum values differ.

Figure 5.20: The Scoring Comparison for Maximum Magnitude Parameter

Referring to Figure 5.21, among the three therapy techniques, the test subjects on average achieved the highest score with the Metronome technique, at 75%. The same average score of 74% was achieved for both the Shadowing and DAF techniques.

Figure 5.21: The Average Score of Each Therapy Technique

5.5.2 Results Analysis by SLP

The result analyses were made by the SLP in the speech therapy room located at Level 3 of the Polyclinic, Hospital Sultanah Aminah. The speech samples were presented to the SLP in both the audiovisual and audio-only modes, and the SLP judged each sample individually. Each test subject required about 5-10 minutes of recording. Any part-word repetition, prolongation, or block was considered a stuttering episode. The %SS does not include counts of normal disfluencies. Measures of %SS were made by an SLP with 4 years' experience in treating and measuring stuttering, who was independent of the study and had no knowledge of the participants. The SLP knew the topic of the research but not its details.

%SS is simply the calculation of the total number of syllables containing unambiguous stuttering of any type divided by the total number of syllables assessed, and then multiplied by 100 to obtain a percentage. It is efficient because, with relatively little practice, it is possible to reliably count the frequency of breaks during both reading and conversational speech. Counts can be obtained by shadowing the syllable production of the subject and indicating those syllables on which stuttering occurs. Stuttered syllables can be indicated with a keyboard or by hand by marking dots and dashes for fluent and stuttered syllables, respectively.
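For reference, the %SS measure described above reduces to a single proportion; a minimal sketch follows, with illustrative names (the example figures are arithmetic only, not data from the study).

// Percentage of syllables stuttered (%SS): the number of syllables containing
// unambiguous stuttering of any type divided by the total number of syllables
// assessed, multiplied by 100.
double percentSyllablesStuttered(int stutteredSyllables, int totalSyllables)
{
    if (totalSyllables <= 0)
        return 0.0;                 // nothing was assessed
    return 100.0 * stutteredSyllables / totalSyllables;
}

// Example: 8 stuttered syllables out of 52 assessed gives about 15.4 %SS.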
Although the datum %SS is commonly referred to as a stuttering rate or stuttering frequency measure, in the arithmetical sense it is a proportion. Hence, %SS scores will always lie between zero and 100. Identification of stuttering was based on the stuttering taxonomy of [98]. Words were coded as stuttered if they contained any type of repeated movement (whole syllable repetitions, incomplete syllable repetitions, or multisyllable unit repetitions) or any type of fixed articulatory posture (with or without audible airflow). Each word was coded as stuttered only once, regardless of the number of different types of stuttering present within the word. Interjections such as "ah" or "um" were not counted or analyzed.

Based on [24], measurements of %SS for the three techniques were made by the SLP, as shown in Table 5.5. From Table 5.5, two subjects (Subjects G and H) were identified as having mild stuttering, two subjects (Subjects C and E) as having mild-to-moderate stuttering, three subjects (Subjects B, D, and F) as moderate, two subjects (Subjects A and I) as moderate-to-severe, and one subject (Subject J) as having severe stuttering. Out of the ten subjects, only one had a history of receiving traditional speech therapy through his school system. The measure of %SS verified the scoring generated by the software, as discussed in Section 5.5.3.

Table 5.5: %SS for Each Therapy Technique

Subject   Percentage of Stuttered Syllables (%SS)
          Shadowing   Metronome   DAF
A         15.2318     20.9677     9.6154
B         11.1111     13.4615     14.8649
C         4.7619      5.7692      11.5385
D         16.0000     7.8431      7.1429
E         8.6207      4.7619      11.9048
F         17.4603     7.6923      13.5135
G         4.7619      5.7710      3.8462
H         7.6923      9.6154      11.5385
I         22.4138     16.4179     12.7660
J         20.2381     23.0769     35.8696

None of the subjects reported having had any experience with DAF during previous speech therapy. In contrast, the fluent speech produced using DAF in the experiments was evaluated by the SLP as sounding natural.

5.5.3 Comparison between Software and SLP Analyses

Figure 5.22 shows the comparison between the results generated by the developed software and the results from the SLP's analyses. The data indicated that the SLP agreed with all the therapy techniques determined by the scoring generated by the software. The accuracy of the software is thus identified as 100%. This software has demonstrated great potential to aid the SLP in determining a suitable therapy technique for each of the PWS. For example, for Subject A, the software scoring suggested that the SLP should use the DAF technique in future speech therapy because Subject A showed the least stuttering while using DAF. By choosing the right therapy technique, fewer therapy sessions will be required and the subject may recover in a shorter time. During the assessment, the SLP also agreed on the use of DAF for Subject A, as Subject A exhibited the lowest %SS while using this technique. For Subject J, the software scoring showed that he scored the highest marks with the Shadowing technique during the clinical trials. The SLP listened to his speech recordings and manually made the %SS calculation. As shown in Table 5.5, his %SS is lowest for the Shadowing technique, which agrees with the software scoring. The same method was used to compare the software scoring with the SLP's manual calculation of %SS for the remaining subjects.
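The agreement check described above amounts to comparing, for each subject, the technique with the highest software score against the technique with the lowest %SS. The C++ fragment below is a minimal sketch of that comparison; the enumeration and function names are illustrative assumptions.

// Illustrative technique indices for the three computer-based techniques.
enum Technique { SHADOWING = 0, METRONOME = 1, DAF = 2 };

// Technique with the highest software score (the software's recommendation).
int BestByScore(const int score[3])
{
    int best = 0;
    for (int i = 1; i < 3; ++i)
        if (score[i] > score[best]) best = i;
    return best;
}

// Technique with the lowest %SS (the technique under which the subject stuttered least).
int BestBySS(const double ss[3])
{
    int best = 0;
    for (int i = 1; i < 3; ++i)
        if (ss[i] < ss[best]) best = i;
    return best;
}

// For Subject A, scores {72, 60, 80} and %SS {15.23, 20.97, 9.62} both point
// to DAF, so the software recommendation and the SLP assessment agree.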
Table 5.6: Comparison between the Determination of Therapy Technique for Each Test Subject by Software and SLP

Subject   Technique Determined by Software   Technique Determined by SLP
A         DAF                                DAF
B         Shadowing                          Shadowing
C         Shadowing                          Shadowing
D         DAF                                DAF
E         Metronome                          Metronome
F         Metronome                          Metronome
G         DAF                                DAF
H         Shadowing                          Shadowing
I         DAF                                DAF
J         Shadowing                          Shadowing

System effectiveness is the extent to which a system or software employed in the field does what it is intended to do for a specific population [99]. Based on the software and SLP analyses, the following are the common observations:

• Certain sounds are more likely to be stuttered than other sounds, mainly consonants, but with wide individual variation as to which particular sounds are problematic, although word-initial sounds are often a major determinant.
• Certain parts of speech are more likely to be stuttered than other parts of speech, namely adjectives, nouns, adverbs and verbs, that is, words belonging to open word classes.
• The position of a word in a sentence affects the degree of difficulty it presents to the client, the first three words of a sentence being stuttered more often than words occurring later.
• Longer words seem to be stuttered more often than shorter words.

5.5.4 Description of Individual Test Subjects

The descriptions of the ten test subjects are tabulated in Table 5.7. Each subject is described in terms of gender, age, stuttering severity, software scoring, %SS and the suitable therapy technique. Each subject responds in unique ways to the different therapy techniques. In some cases a particular approach won, and in another investigation another method finished first. The uniqueness of each individual PWS prevents any specific recommendations of therapy techniques from being universally applicable.

Table 5.7: Description of Individual Test Subjects

Subject   Gender   Age   Stuttering Severity    Shadowing (Scoring / %SS)   Metronome (Scoring / %SS)   DAF (Scoring / %SS)   Technique Chosen
A         Male     9     Moderate-to-Severe     72 / 15.25%                 60 / 20.97%                 80 / 9.62%            DAF
B         Male     9     Moderate               76 / 11.11%                 74 / 13.46%                 73 / 14.86%           Shadowing
C         Female   11    Mild-to-Moderate       85 / 4.76%                  84 / 5.77%                  76 / 11.54%           Shadowing
D         Male     8     Moderate               68 / 16.00%                 75 / 7.84%                  82 / 7.14%            DAF
E         Male     8     Mild-to-Moderate       81 / 8.62%                  88 / 4.76%                  75 / 11.90%           Metronome
F         Male     8     Moderate               68 / 17.46%                 82 / 7.69%                  74 / 13.51%           Metronome
G         Male     11    Mild                   86 / 4.76%                  85 / 5.77%                  90 / 3.85%            DAF
H         Male     11    Mild                   82 / 7.69%                  80 / 9.62%                  76 / 11.54%           Shadowing
I         Male     12    Moderate-to-Severe     62 / 22.41%                 68 / 16.42%                 75 / 12.77%           DAF
J         Male     12    Severe                 60 / 20.24%                 57 / 23.08%                 38 / 35.87%           Shadowing

5.6 Summary

This chapter describes the implementation and verification of the clinical trial for our computer-based Malay stuttering assessment system. The assessment system has been tested and verified successfully in the clinical trial. Test subjects were selected from 6 primary schools located in Skudai, Johor. They were assessed at the Speech Therapy Unit, Hospital Sultanah Aminah, Johor Bahru. A total of 11 subjects participated, 10 males and 1 female. The results generated by the software and the SLP's analyses are detailed in Section 5.5.1 and Section 5.5.2 respectively, illustrated with tables and graphs. The comparison between both methods is analysed in Section 5.5.3. The data indicated that the SLP agreed with all the therapy techniques determined by the scoring generated by the software. This software has demonstrated great potential to aid the SLP in determining a suitable therapy technique for each of the PWS.

CHAPTER 6

CONCLUSION

6.1 Introduction

The thesis outlines the work on developing and implementing the computer-based Malay stuttering assessment system.
The assessment system introduces stuttering therapy techniques in a computer-based form. The assessment system generates scoring to assist the SLP in determining suitable therapy techniques for each client in a faster way. The computer-based stuttering therapy techniques developed in this project consist of Shadowing, Metronome and DAF. The hardware includes a microphone, earphones and a desktop computer equipped with a sound card, while the software comprises the operating system, the development tools and the necessary drivers. The Windows XP operating system was chosen due to its availability and familiarity. The software development involved in the implementation of the computer-based assessment system has been described in detail in Chapter 4. In addition, a clinical trial has been carried out successfully to verify the operation of the computer-based stuttering assessment system.

The software was developed based on standard FS techniques used in the fluency rehabilitation regimen. DSP techniques were implemented to analyze the speech signals. The maximum magnitudes of the clients' and SLPs' speech signals, corresponding to the AMPs, were determined and compared. The maximum magnitude was determined where a total of 15 neighbouring samples were summed to obtain a maximum value. Start location, end location, maximum magnitude and duration were compared between the clients' and SLPs' AMPs to generate scoring; the computational analyses help the SLP to determine suitable techniques in a faster way.

Clinical trials on control data and test subjects have also been carried out to make sure that the system ran properly before its practicality was verified. Control data were collected from university students; participants in the control group were regarded by themselves and by the SLP as normally fluent. Test subjects were selected from 6 primary schools located in Skudai, Johor. The age span was between 8 and 12 years old. The subjects were not familiar with speech technology in any way. Of the data collected, a total of 356 subject utterances, including both the control data and the test subject data, were analyzed in the present work. In time, this amounts to 2136 seconds or 35.6 minutes of speech. The control data total 450 seconds (7.5 minutes) across 75 utterances, while the test subject data total 1686 seconds (28.1 minutes) across 281 utterances. The software scoring was compared with the SLP's calculated %SS. The data indicated that the SLP agreed with all the therapy techniques determined by the scoring generated by the software. The accuracy of the software is identified as 100%. This software has demonstrated great potential to aid the SLP in determining a suitable therapy technique for each of the PWS.

Our hope is that researchers from different fields will join forces in order to advance our knowledge of this disorder and its treatment. Without this approach, progress will be as slow as in the last several decades. We believe this software tool will improve the effectiveness and availability of stuttering assessment. We hope that our software tool will provide insights into the implementation of computer-based Malay stuttering assessment systems in Malaysia.

6.2 Future Works

This thesis has concentrated on the development and verification of the computer-based Malay stuttering assessment system. This thesis has been written in the hope of stimulating stuttering assessment ideas that can be incorporated into current best practices. The existing work can be further improved and enhanced.
Several suggestions for future works include:

• To introduce more stuttering therapy techniques in computer-based form so that clients are given exposure to more therapy techniques, thus increasing the accuracy of determining a suitable technique for each client. Undoubtedly, future research similar in design to this study, but drawing on different samples, will find additional guideposts that will also prove useful to the SLP in selecting suitable therapy techniques for a particular client.
• To re-examine the current outcomes with larger and more diverse samples, and to investigate the developed topic through innovative methodologies in efforts to inform treatment directions in stuttering. Due to the well-documented variability of stuttering within subjects, speech samples should ideally be obtained under multiple conditions and on multiple occasions. This can be particularly important for young children, as stuttering has been reported to fluctuate greatly over time and sometimes cease entirely. Long-term assessments are essential.
• To identify the specific procedures used in the stuttering therapy techniques that contribute the most to successful treatment outcomes, as well as the variables that are responsible for treatment failures, based on the scoring generated by the software. PWS deserve nothing less than rigorously tested and empirically supported treatments. Therefore, future research needs to identify new critical variables for study if sounder evaluations of therapy technique efficacy are to become available.
• Besides finding the suitable therapy technique for each client ("Which treatment is the best?"), the system can be improved to identify and match the characteristics of a client with the competencies and therapeutic philosophy of an SLP (or mentor) in order to promote a working alliance that is likely to result in a successful therapeutic outcome.
• Further development of the animation engine so that animations change to follow client improvement.
• Modification of the software for other patient populations, such as people with Multiple Sclerosis or hearing impairment.

REFERENCES

1. Laura Plexico, Walter H. Manning, Anthony DiLollo. A Phenomenological Understanding of Successful Stuttering Management. Journal of Fluency Disorders. 2005. 30: 1-22.
2. Anderson, T. K., Felsenfeld, S. A Thematic Analysis of Late Recovery from Stuttering. American Journal of Speech-Language Pathology. 2003. 12: 243-253.
3. Onslow, M. Evidence-Based Treatment of Stuttering: IV. Empowerment through Evidence-based Treatment Practices. Journal of Fluency Disorders. 2003. 28: 237-245.
4. Boberg, E., & Kully, D. Long-Term Results of an Intensive Treatment Program for Adults and Adolescents Who Stutter. Journal of Speech and Hearing Research. 1994. 37: 1050-1059.
5. Hancock. Two- to Six-Year Controlled-Trial Stuttering Outcomes for Children and Adolescents. Journal of Speech and Hearing Research. 1998. 41: 1242-1252.
6. Carys Thomas, Peter Howell. Assessing Efficacy of Stuttering Treatments. Journal of Fluency Disorders. 2001. 26: 311-333.
7. Harold B. Starbuck. Therapy for Stutterers. Memphis, Tennessee: Stuttering Foundation of America. 45-51; 1992.
8. Walter H. Manning. Clinical Decision Making in Fluency Disorders. 2nd Edition. San Diego, CA: Singular Thomson Learning. 228-296, 359-369; 2001.
9. Finn, P., & Felsenfeld, S. Recovery from Stuttering: The Contributions of the Qualitative Research Approach. Advances in Speech-Language Pathology. 2004. 6: 159-166.
10. Ingham, R. J., Kilgo, M., Ingham, J.
C., Moglia, R., Belknap, H., & Sanchez, T. Evaluation of a Stuttering Treatment Based on Reduction of Short Phonation Intervals. Journal of Speech, Language and Hearing Research. 2001. 44: 1229–1244. 11. Davis, S., Howell, P., & Rustin, L. A Multivariate Approach to Diagnosis and Prediction of Therapy Outcome with Children who Stutter: The Social Status of the Child who Stutters. Proceedings of the Fifth Oxford Dysfluency conference. 2000. pp 32-41. 12. Rustin, L. Assessment Therapy Programme for Dysfluent Children. Windsor, England: NFER Nelson. 1987. 13. Rustin, L. & Cook, F. Parental Involvement in the Treatment of Stuttering. Language, Speech and Hearing Services in Schools. 1995. 26: 127-137. 14. Dworzynski, K. & Howell, P. Predicting Stuttering from Phonetic Complexity in German. Journal of Fluency Disorders. 2004. 29: 149-173. 15. Nail-Chiwetalu, B. Making Evidence-based Practice a Reality. Presentation to the SID-4 Leadership Conference. Boston, MA. August 4 2005. 16. Myers, F. L., St. Louis, K. O., Bakker, K., Raphael, L. J., Wiig, E. K., Katz, J., Daly, D. A., & Kent, R. D. Putting Cluttering on the Map: Looking Back. Seminar presented at the Annual Convention of the American SpeechLanguage-Hearing Association, Atlanta, GA. 2002. 17. Susca, M., & Healey, E. C. Listener Perspectives Along a Fluency-disfluency Continuum: A phenomenological Analysis. Journal of Fluency Disorders. 2002. 27: 135–161. 18. Conture, E. G. Stuttering: The Long and Winding Road from the Womb to the Tomb. Proceedings of the Sixth Oxford Dysfluency Conference. 2004. Leicester, UK: KLB Publications. 19. Franklin H. Silverman. Stuttering and Other Fluency Disorders. 2nd Edition. Needham Heights, M. A.: A Simon & Schuster. 178-181; 1996. 20. M.N. Hedge. PocketGuide to treatment in speech-language pathology. 4th Edition. London: Singular. 251-253; 1998. 21. Melissa Peter. STUT (E): Stuttering. Malaysian Association of Speech Language and Hearing. 2002. 121 22. Mulligan, H. F., Anderson, T. J., Jones, R. D.,Williams, M. J., & Donaldson, I. M. Dysfluency and Involuntary Movements: A New Look at evelopmental Stuttering. International Journal of Neuroscience. 2001. 109(1/2): 23–46. 23. Charles Van Riper. The Treatment of Stuttering. Englewood Cliffs: Prentice Hall. 203 – 368; 1973. 24. Bloodstein, Oliver. A Handbook on Stuttering. 5th Edition. San Diego: Singular Press. 1995. 25. Linda T. Miller, Christopher J. Lee. Gathering and Evaluating Evidence in Clinical Decision-Making. Journal of Speech-Language Pathology and Audiology. 2004. 28(2): 96-99. 26. Ezrati-Vinacour, R., Platzky, R., & Yairi, E. The Young Child’s Awareness of Stuttering-like Disfluency. Journal of Speech, Language, and Hearing Research. 2001. 44: 368–380. 27. Maner, K., Smith, A., & Grayson, L. Influences of Utterance Length and Complexity on Speech Motor Performance in Children and Adults. Journal of Speech, Language and Hearing Research. 2000. 43: 560–574. 28. Bosshardt, H. Effects of Cognitive Processing on the Fluency of Word Repetition: Comparison between Persons who Do and Do Not Stutter. Journal of Fluency Disorders. 2002. 27: 93-114. 29. Watkins, R., & Johnson, B. Language Abilities in Children who Stutter: Toward Improved Research and Clinical Application. Language, Speech and Hearing Services in Schools. 2004. 35: 82–89. 30. Boscolo, B., Bernstein Ratner, N.,&Rescorla, L. Fluency of School-aged Children with a History of Specific Language Impairment: An Exploratory Study. American Journal of Speech-Language Pathology. 2002. 11: 41–49. 
31. Harris, V., Onslow, M., Packman, A., Harrison, & Menzies, R. An Experimental Investigation of the Impact of the Lidcombe Program on Early Stuttering. Journal of Fluency Disorders. 2002. 27: 203-214. 32. Arndt, J., & Healey, E. Concomitant Disorders in School-age Children who Stutter. Language, Speech, and Hearing Services in Schools. 2001. 32: 68–78. 33. Yairi, E., & Ambrose, N. Longitudinal Studies of Childhood Stuttering: Evaluation of Critiques. Journal of Speech, Language, and Hearing Research. 2001. 44: 867–872. 122 34. Yaruss, J. S., & Quesal, R. W. The Many Faces of Stuttering: Identifying Appropriate Treatment Goals. The ASHA Leader. 2001. 6(21): 4–14. 35. Julie D. Anderson, Mark W. Pellowski, Edward G. Conture. Childhood Stuttering and Dissociations across Linguistic Domains. Journal of Fluency Disorders. 2005. 30: 219–253. 36. J. Scott Yaruss, Robert W. Quesal. Overall Assessment of the Speaker’s Experience of Stuttering (OASES): Documenting Multiple Outcomes in Stuttering Treatment. Journal of Fluency Disorders. 2006. 31: 90–115. 37. Rachel Everard. Commentary on Partnerships between Clinicians, Researchers and People who Stutter in the Evaluation of Stuttering Treatment Outcomes. Stammering Research. 2004. 1(1): 24-25. 38. Mario Alberto Landera. The Self Perceptions of Adolescents who Stutter. Degree Thesis. Florida State University; 2004. 39. Crowe, T. A., DiLollo, A., & Crowe, B. T. Crowe’s Protocols: A Comprehensive Guide to Stuttering Assessment. San Antonio, TX: The Psychological Corporation. 2000. 40. DeNil, L., & Brutten, G. Speech-associated Attitudes of Stuttering and Nonstuttering Children. Journal of Speech and Hearing Research. 1991. 34: 60-66. 41. Guitar, B. Stuttering, an Integrated Approach to Its Nature and Treatment. Second Edition. Baltimore, MD: Williams & Wilkins. 1998. 42. Kaitlyn P. Wilson. Projective Drawing: Alternative Assessment of Emotion in Children who Stutter. Degree Thesis. Florida State University; 2004. 43. J Scott Yaruss, Robert W Quesal, Lee Reeves, Lawrence F Molt, Brett Kluetz, Anthony J Caruso, James A McClure, Fred Lewis. Speech Treatment and Support Group Experiences of People who Participate in the National Stuttering Association. Journal of Fluency Disorders. 2002. 27(2):115-134. 44. American Speech-Language-Hearing Association. Evidence-based Practice in Communication Disorders. Technical report. 2004. 45. Smits-Bandstra, S. M., & Yovetich, W. S. Treatment Effectiveness for School Age Children who Stutter. Journal of Speech-Language Pathology and Audiology. 2003. 27: 125–133. 123 46. Roger J. Ingham, Allison Warner, Anne Byrd, John Cotton. Speech Effort Measurement and Stuttering: Investigating the Chorus Reading Effect. Journal of Speech, Language, and Hearing Research. 2006. 49: 660-670. 47. Susca, M., & Healey, E. C. Perceptions of Simulated Stuttering and Fluency. Journal of Speech, Language, and Hearing Research. 2001. 44: 61–72. 48. H. S. Venkatagiri. Recent Advances in the Treatment of Stuttering: A Theoretical Perspective. Ph. D. Thesis. Iowa State University; 2004. 49. Yaruss, J. S. Evaluating the Treatment Outcomes for Adults who Stutter. Journal of Communication Disorders. 2001. 34: 163–182. 50. O’Brian, S., Cream, A., Onslow, M., & Packman, A. Prolonged Speech: An Experimental Attempt to Solve some Nagging Problems. Proceedings of the 2000 Speech Pathology Australia National Conference. 2001. Sydney, Australia. 51. O’Brian, Cream, Onslow & Packman. 
A Replicable, Nonprogrammed, Instrument-free Method for the Control of Stuttering with Prolonged Speech. Asia Pacific Journal of Speech, Language and Hearing. 2001. 6: 91-96. 52. Stuart, A., Kalinowski, J., Saltuklaroglu, T., Guntupalli, V. Investigations of the Impact of Altered Auditory Feedback in-the-ear Devices on the Speech of People who Stutter: One-year Follow-up. Disability and Rehabilitation. 2006. 28: 757-765. 53. Mark Jones, Mark Onslow, Ann Packman, Shelley Williams, Tika Ormond, Ilsa Schwarz, Val Gebski. Randomised Controlled Trial of the Lidcombe Programme of Early Stuttering Intervention. British Medical Journal. 2005. 331: 659. 54. Janis Costello Ingham. Evidence-based Treatment of Stuttering: I. Definition and Application. Journal of Fluency Disorders. 2003. 28: 197-207. 55. Garen Sparks, Dorothy E. Grant, Kathleen Millay, Delaine Walker-Batson and Linda S. Hynan. The Effect of Fast Speech Rate on Stuttering Frequency during Delayed Auditory Feedback. Journal of Fluency Disorders. 2002. 27: 187-201. 56. Ham, R. E. Therapy for Stuttering: Preschool through Adolescence. Englewood Cliffs, N.J.: Prentice Hall. 1990. 124 57. Amit Bajaj, Barbara Hodson and Carol Westby. Communicative Ability Conception among Children Who Stutter and Their Fluent Peers: A Quantitative Exploration. Journal of Fluency Disorders. 2005. 30: 41-64. 58. Crichton-Smith, I. Communicating in the real world: Accounts from People Who Stammer. Journal of Fluency Disorders. 2002. 27: 333-352. 59. Katrin Neumann, Christine Preibisch, Harald A. Euler, Alexander Wolff Von Gudenberg, Heinrich Lanfermann, Volker Gall and Anne-Lise Giraud. Cortical Plasticity Associated with Stuttering Therapy. Journal of Fluency Disorders. 2005. 30: 23-39. 60. Guitar, B. Stuttering: An Integrated Approach to Its Nature And Treatment. 2nd Edition. Baltimore: Williams & Wilkins. 1998. 61. Michael Blomgren, Nelson Roy, Thomas Callister and Ray M. Merrill. Intensive Stuttering Modification Therapy: A Multidimensional Assessment of Treatment Outcomes. Journal of Speech, Language, and Hearing Research. 2005. 48: 509-523. 62. Eichsta¨dt, A., Watt, N., & Girson, J. Evaluation of the Efficacy of a Stutter Modification Program with Particular Reference to Two New Measures of Secondary Behaviors and Control of Stuttering. Journal of Fluency Disorders. 1998. 23: 231–246. 63. Gregory, H. H. Stuttering Therapy: Rationale and Procedures. Boston: Allyn and Bacon. 2003. 64. Craig, A., Calver, P. Following Up on Treated Stutterers: Studies of Perceptions of Fluency and Job Status. Journal of Speech and Hearing Research. 1991. 34: 279-284. 65. Anthony J. Caruso, Edythe A. Strand. Clinical Management of Motor Speech Disorders in Children. 1st Edition. N.Y.: Thieme Medical Publishers. 1999. 66. Conture, E. G. Stuttering: Its Nature, Diagnosis, and Treatment. Boston: Allyn and Bacon. 2001. 67. Phyllis Bonelli, Maria Dixon, Nan Bernstein Ratner, and Mark Onslow. Child and Parent Speech and Language following the Lidcombe Programme of Early Stuttering Intervention. Clinical Linguistics & Phonetics. 2000. 14(6): 427- 446. 68. Dweck, C. The Development of Ability Conceptions: Development of 125 Achievement Motivation. San Diego, CA: Academic Press. 57-88; 2002. 69. Langefeld, S., Bosshardt, H.-G., Natke, U., Oertle, H.M., & Sandrieser, P. Fluency Disorders: Theory, Research, Treatment and Self-help. Proceedings of the Third World Congress of Fluency Disorders. 2001. Nijmegen: Nijmegen University Press. 2001. 359-360. 70. Sackett, D. L. and J. Sonis. 
Evidence-based medicine: How to Practice and Teach EBM. Edinburgh, New York: Churchill Livingstone. 250; 2000. 71. Bruce P. Bryan. Contigency Management and Stuttering in Children. The Behavior Analyst Today. 2004. 5: 144-150. 72. Langevin, M., & Kully, D. Evidence-based Treatment of Stuttering: III. Evidence-based Practice in a Clinical Setting. Journal of Fluency Disorders. 2003. 28: 219-237. 73. Sheenan, J. G. Problems in the Evaluation of Progress and Outcome. In W. H. Perkins (Ed.), Seminars in Speech, Language and Hearing. 1980. 389-401. New York: Thieme-Stratton. 74. Gary J. Rentschler. Developing Clinical Tools in Stuttering Therapy: Building Effective Activities. ASHA Convention. November 20, 2004. PA: Duquesne University. 2004. 75. G. Friedland, L. Knipping, J. Schulte, and E. T. a. E-chalk. A Lecture Recording System Using the Chalkboard Metaphor. International Journal of Interactive Technology and Smart Education. 2004. 1: 1. 76. Daniel Costa, Robert Kroll. Stuttering: An Update for Physicians. Canada Medical Association Journal. June 27, 2000. 162 (13): 1849-1855. 77. Selim S. Awad. The Application of Digital Speech Processing to Stuttering Therapy. Proceeding of 1997 IEEE Instrumentation and Measurement Technology Conference Ottawa, Canada. May 19-21, 1997. Ottawa, Canada: IEEE. 1997. 1361-1367. 78. Modaff, J. V. and D. P. Modaff. Technical Notes on audio Recording. Research on Language and Social Interaction. 2000. 33 (1): 101-118. 79. Andrew Bateman, Ian Paterson-Stephens. The DSP Handbook – Algorithms, Applications and Design Techniques. England: Prentice Hall. 2002. 80. Emmanuel C. Ifeachor, Barrie W. Jervis. Digital Signal Processing – A Practical Approach. British: Addison-Wesley. 1993. 126 81. Ken C. Pohlmann. Principles of Digital Audio. New York : McGraw-Hill. 2005. 82. Guojun Lu and Templar Hankinson. A Technique towards Automatic Audio Classification and Retrieval. Proceedings of 1998 IEEE Fourth International Conference on Signal Processing. October 12-16, 1998. Beijing: IEEE. 1998. 1142 – 1145. 83. Don Morgan. Practical DSP Modeling, Techniques, and Programming in C. Canada: John Wiley & Sons, Inc. 1997. 84. X. Huang, A. Acero, H.-W. Hon. Spoken Language Processing. Prentice Hall. 2001. 85. Michael Hazas, Hughes Hall. Processing of Non-Stationary Audio Signals. Msc. Thesis. University of Cambridge; 1999. 86. William H. Press, Brian P. Flannery, Saul A. Teukolsky, William T. Vetterling. Numerical Recipes in C: The Art of Scientific Computing, 2nd Edition. UK: Cambridge University Press. 1992. 87. Sen M. Kuo and Woon-Seng Gan. Digital Signal Processors: Architectures, Implementations, and Applications. N.J.: Pearson/Prentice Hall, 2005. 88. Jean-Marc Valin. The Speex Codec Manual. GNU Free Documentation License, Version 1.1. 2003. 89. Taubman, D. “Speech and Audio Compression”, in Taubman, D., TELE4343: Source Coding and Compression Session 2, 2002, Lecture Notes, University of New South Wales. 90. Susan Weinschenk, Dean T. Barker. Designing Effective Speech Interfaces. Canada: John Wiley & Sons, Inc. 2000. 91. John G. Proakis, Dimitris G. Manolakis. Digital Signal Processing: Principles, Algorithms, and Applications. 3rd Edition. N. J.: Prentice-Hall, Inc. 1996. 92. Moscicki, E. K. Fundamental Methodological Consideration in Controlled Clinical Trials. Journal of Fluency Disorders. 1993. 18: 183–196. 93. Robert Eklund. Disfluency in Sweedish Human-human and Human-machine Travel Booking Dialogues. Ph. D. Thesis. Linkopings University; 2004. 94. Curlee, R.F. 
& Yairi, E. Early Intervention with Early Childhood Stuttering: A Critical Examination of the Data. American Journal of Speech-Language Pathology. 1997. 6: 8-18. 127 95. Jones, M., Gebski, V., Onslow, M., & Packman, A. Power in Health Research: A Tutorial. Journal of Speech, Language, and Hearing Research. 2003. 45: 243–255. 96. Adobe Systems Incorporated. Adobe Premiere Elements 2.0. 2005. 97. Ooi Chia Ai, J. Yunus. Overview of a Computer-Based Stuttering Therapy. Regional Postgraduate Conference on Engineering and Science (RPCES 2006). July 26-27, 2006. Skudai, Johore: UTM. 2006. 207-211. 98. Teesson, K., Packman, A., & Onslow, M. The Lidcombe Behavioral Data Language of Stuttering. Journal of Speech, Language, and Hearing Research. 2003. 46: 1009–1015. 99. Last, J. M. A Dictionary of Epidemiology. New York: Oxford University Press. 1983. 128 APPENDIX A WAVE FILE FORMAT Wave files have a master RIFF chunk which includes a WAVE identifier followed by sub-chunks. The data is stored in little-endian byte order. Field Length Contents ckID 4 Chunk ID: "RIFF" cksize 4 Chunk size: 4+n 4 WAVE ID: "WAVE" WAVEID WAVE chunks Wave chunks containing n format information and sampled data Format Chunk The Format chunk specifies the format of the data. There are 3 variants of the Format chunk for sampled data. These differ in the extensions to the basic Format chunk. 129 Field Length Contents ckID 4 Chunk ID: "fmt" cksize 4 Chunk size: 16 or 18 or 40 wFormatTag 2 Format code nChannels 2 nSamplesPerSec 4 nAvgBytesPerSec 4 Data rate nBlockAlign 2 Data block size (bytes) wBitsPerSample 2 Bits per sample cbSize 2 wValidBitsPerSample 2 Number of valid bits dwChannelMask 4 Speaker position mask SubFormat 16 Number of interleaved channels Sampling rate (blocks per second) Size of the extension (0 or 22) GUID, including the data format code The standard format codes for waveform data are given below. The references above give many more format codes for compressed data, a good fraction of which are now obsolete. 130 Format Code PreProcessor Symbol Data 0x0001 WAVE_FORMAT_PCM 0x0003 WAVE_FORMAT_IEEE_FLOAT IEEE float 0x0006 WAVE_FORMAT_ALAW 0x0007 WAVE_FORMAT_MULAW 0xFFFE WAVE_FORMAT_EXTENSIBLE PCM 8-bit ITU-T G.711 A-law 8-bit ITU-T G.711 µ-law Determined by SubFormat PCM Format The first part of the Format chunk is used to describe PCM data. • For PCM data, the Format chunk in the header declares the number of bits/sample in each sample (wBitsPerSample). The number of bits per sample is to be rounded up to the next multiple of 8 bits. This rounded-up value is the container size. This information is redundant in that the container size (in bytes) for each sample can also be determined from the block size divided by the number of channels (nBlockAlign / nChannels). o This redundancy has been appropriated to define new formats. For instance, Cool Edit uses a format which declares a sample size of 24 bits together with a container size of 4 bytes (32 bits) determined from the block size and number of channels. With this combination, the data is actually stored as 32-bit IEEE floats. The normalization (full scale 223) is however different from the standard float format. • PCM data is two's-complement except for resolutions of 1-8 bits, which are represented as offset binary. 131 Non-PCM Formats An extended Format chunk is used for non-PCM data. The cbSize field gives the size of the extension. • For all formats other than PCM, the Format chunk must have an extended portion. 
The extension can be of zero length, but the size field (with value 0) must be present. • For float data, full scale is 1. The bits/sample would normally be 32 or 64. • For the log-PCM formats (µ-law and A-law), the bits/sample field (wBitsPerSample) should be set to 8 bits. • The non-PCM formats must have a Fact chunk. Extensible Format The WAVE_FORMAT_EXTENSIBLE format code indicates that there is an extension to the Format chunk. The extension has one field which declares the number of "valid" bits/sample (wValidBitsPerSample). Another field (dwChannelMask) contains a bit which indicate the mapping from channels to loudspeaker positions. The last field (Sub-Format) is a 16-byte globally unique identifier (GUID). • With the WAVE_FORMAT_EXTENSIBLE format, the original bits/sample field (wBitsPerSample) must match the container size (8 * nBlockAlign / nChannels). This means that wBitsPerSample must be a multiple of 8. Reduced precision within the container size is now specified by wValidBitsPerSample. • The number of valid bits (wValidBitsPerSample) is informational only. The data is correctly represented in the precision of the container size. The number of valid bits can be any value from 1 to the container size in bits. • The loudspeaker position mask uses 18 bits, each bit corresponding to a speaker position (Front Left or Top Back Right), to indicate the channel to speaker mapping. This field is informational. An all-zero field indicates that 132 channels are mapped to outputs in order: first channel to first output, second channel to second output, etc. • The first two bytes of the GUID form the sub-code specifying the data format code, for example, WAVE_FORMAT_PCM. The remaining 14 bytes contain a fixed string, "\x00\x00\x00\x00\x10\x00\x80\x00\x00\xAA\x00\x38\x9B\x71". The WAVE_FORMAT_EXTENSIBLE format should be used whenever: • PCM data has more than 16 bits/sample. • The number of channels is more than 2. • The actual number of bits/sample is not equal to the container size. • The mapping from channels to speakers needs to be specified. Fact Chunk All (compressed) non-PCM formats must have a Fact chunk. The chunk contains at least one value, the number of samples in the file. Field Contents ckID 4 Chunk ID: "fact" cksize 4 Chunk size: minimum 4 dwSampleLength • Length 4 Number of samples (per channel) The Fact chunk "is required for all new WAVE formats", but "is not required for the standard WAVE_FORMAT_PCM files". One presumes that files with IEEE float data need a Fact chunk. • The number of samples field is redundant for sampled data, since the Data chunk indicates the length of the data. The number of samples can be determined from the length of the data and the container size as determined from the Format chunk. 133 • This is an ambiguity as to the meaning of "number of samples" for multichannel data. It should be interpreted to be "number of samples per channel". The statement is: "The <nSamplesPerSec> field from the wave format header is used in conjunction with the <dwSampleLength> field to determine the length of the data in seconds." With no mention of the number of channels in this computation, this implies that dwSampleLength is the number of samples per channel. • There is a question as to whether the Fact chunk should be used for (including those with PCM) WAVE_FORMAT_EXTENSIBLE files. One example of a WAVE_FORMAT_EXTENSIBLE with PCM data from Microsoft, does not have a Fact chunk. Data Chunk The Data chunk contains the sampled data. 
Field Length Contents ckID 4 Chunk ID: "data" cksize 4 Chunk size: n n Samples sampled data pad byte 0 or 1 Field Length Padding byte if n is odd PCM Data ckID 4 cksize 4 WAVEID 4 Contents Chunk ID: "RIFF" Chunk size: 4 + 24 + (8 + M * Nc * Ns + (0 or 1)) WAVE ID: "WAVE" 134 ckID 4 Chunk ID: "fmt " cksize 4 Chunk size: 16 wFormatTag 2 WAVE_FORMAT_PCM nChannels 2 Nc nSamplesPerSec 4 F nAvgBytesPerSec 4 F * M * Nc nBlockAlign 2 M * Nc wBitsPerSample 2 rounds up to 8 * M ckID 4 Chunk ID: "data" cksize 4 Chunk size: M * Nc* Ns sampled data pad M * Nc * Ns 0 or 1 Nc * Ns channel-interleaved Mbyte samples Padding byte if M * Nc * Ns is odd Non-PCM Data Field Length ckID 4 cksize 4 WAVEID 4 Contents Chunk ID: "RIFF" Chunk size: 4 + 26 + 12 + (8 + M * Nc * Ns + (0 or 1)) WAVE ID: "WAVE" 135 ckID 4 Chunk ID: "fmt " cksize 4 Chunk size: 18 wFormatTag 2 Format code nChannels 2 Nc nSamplesPerSec 4 F nAvgBytesPerSec 4 F * M * Nc nBlockAlign 2 M * Nc wBitsPerSample 2 cbSize 2 Size of the extension:0 ckID 4 Chunk ID: "fact" cksize 4 Chunk size: 4 4 Nc * Ns ckID 4 Chunk ID: "data" cksize 4 Chunk size: M * Nc * Ns dwSampleLength sampled data pad 8 * M (float data) or 16 (logPCM data) M * Nc * Nc * Ns channel-interleaved Ns 0 or 1 M-byte samples Padding byte if M * Nc * Nsis odd 136 Extensible Format Field Length ckID 4 cksize 4 Contents Chunk ID: "RIFF" Chunk size: 4 + 48 + 12 + (8 + M * Nc * Ns + (0 or 1)) WAVEID 4 WAVE ID, "WAVE" ckID 4 Chunk ID: "fmt " cksize 4 Chunk size: 40 wFormatTag 2 WAVE_FORMAT_EXTENSIBLE nChannels 2 Nc nSamplesPerSec 4 F nAvgBytesPerSec 4 F * M * Nc nBlockAlign 2 M * Nc wBitsPerSample 2 8*M cbSize 2 Size of the extension: 22 wValidBitsPerSample 2 at most 8 * M dwChannelMask 4 Speaker position mask 0 SubFormat 16 GUID (first two bytes are the data format code) ckID 4 Chunk ID: "fact" cksize 4 Chunk size: 4 4 Nc * Ns dwSampleLength 137 ckID 4 Chunk ID: "data" cksize 4 Chunk size: M * Nc * Ns sampled data pad M * Nc Nc * Ns channel-interleaved M-byte * Ns 0 or 1 samples Padding byte if M * Nc * Ns is odd • The Fact chunk can be omitted if the sampled data is in PCM format. • Microsoft Windows Media Player enforces the use of the WAVE_FORMAT_EXTENSIBLE format code. For instance a file with 24bit data declared as a standard WAVE_FORMAT_PCM format code will not play, but a file with 24-bit data declared as a WAVE_FORMAT_EXTENSIBLE file with a WAVE_FORMAT_PCM subcode can be played. 138 APPENDIX B STUTTERING SEVERITY INSTRUMENT (SSI-3) 139 APPENDIX C MODIFIED ERICKSON SCALE OF COMMUNICATION ATTITUDES (S-24) 140 APPENDIX D LOCUS OF CONTROL BEHAVIOUR (LCB) 141 APPENDIX E COMMUNICATION ATTITUDE TEST-REVISED (CAT-R) 142 APPENDIX F A-19 SCALE FOR CHILDREN WHO STUTTER 143 144 APPENDIX G STUTTERING PREDICTION INSTRUMENT FOR YOUNG CHILDREN (SPI) 145 146 147 148 APPENDIX H PHYSICIAN’S SCREENING PROCEDURE FOR CHILDREN WHO MAY STUTTER 149 APPENDIX I CODING This section shows the coding development using Microsoft Visual C++ 6.0. I-1 Audio File Format Wave files use the standard Resource Interchange File Format (RIFF) structure which groups the files contents such as sample format and digital audio samples into separate chunks, each containing its own header and data bytes. RIFF is a multimedia file format introduced by Microsoft and IBM in the early 1990s that is structured in "chunks." The chunk header specifies the type and size of the chunk data bytes. Wave files have a master RIFF chunk which includes a WAVE identifier followed by sub-chunks. The data is stored in little-endian byte order. 
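As an illustration of this layout, the structure below sketches the canonical header of a plain PCM wave file using the field names from Appendix A. It is a simplified sketch only: real files may contain additional chunks, so a robust reader should walk the chunks rather than assume this fixed layout, and the structure name is an illustrative assumption rather than the thesis' own code.

#include <windows.h>   // DWORD and WORD typedefs (Win32 environment assumed)

#pragma pack(push, 1)
// Canonical header of a PCM wave file: "RIFF" chunk, "fmt " chunk, "data" chunk.
// All multi-byte values are stored in little-endian byte order.
struct PcmWaveHeader
{
    char  riffId[4];        // "RIFF"
    DWORD riffSize;         // 4 + (8 + 16) + (8 + dataSize)
    char  waveId[4];        // "WAVE"
    char  fmtId[4];         // "fmt "
    DWORD fmtSize;          // 16 for plain PCM
    WORD  wFormatTag;       // WAVE_FORMAT_PCM (0x0001)
    WORD  nChannels;        // 1 for mono
    DWORD nSamplesPerSec;   // sampling rate, e.g. 16000
    DWORD nAvgBytesPerSec;  // nSamplesPerSec * nBlockAlign
    WORD  nBlockAlign;      // nChannels * wBitsPerSample / 8
    WORD  wBitsPerSample;   // e.g. 16
    char  dataId[4];        // "data"
    DWORD dataSize;         // number of sample bytes that follow
};
#pragma pack(pop)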
150 WAVEFORMATEX format is initialized for 16-bit, 16KHz, mono channel and Pulse Code Modulation (PCM). PCM is an audio format of “raw" audio. This is generally the format that audio hardware directly interacts with. Though some hardware can directly play other formats, generally the software must convert any audio stream to PCM and then attempt to play it. BlockAlign is used for buffer alignment where playback software needs to process a multiple bytes of data at a time. The cbSize is the extra information appended to the WAVEFORMATEX structure tightly. For PCM format, the cbSize is ignored. 151 I-2 Waveform Display There are 13 units for x-axis in milliseconds starting from 500ms and the TextOut defines the starting point of the text for logical x-coordinate and ycoordinate. Three digits are allowed after decimal point. There are 10 units for yaxis starting from 5000. MoveTo moves the current position to the point specified by x and y and LineTo draws a line from the current position up to, but not including the point specified. The dc, device context is a window data structure containing information about the drawing attribute of a device such as a display. MFC CPen class is used to create a solid pen for drawing solid lines with one pixel wide. The class encapsulates a windows graphics device interface (GDI) pen. Line 1 represents the SLP’s speech signal where the amplitude is drawn in red colour 152 RGB(255, 0, 0) while the line 2 is the client’s amplitude drawn in blue colour RGB(0, 0, 255). I-3 Playback How does the driver "signal" the program? The driver send messages to the program's Window, for example, the MM_WOM_DONE message is sent each time the driver finishes playing a given buffer. Parameters with that message include the address of the given buffer (actually the address of the WAVEHDR structure which encompasses the buffer) and the device's handle (the handle supplied when the device is opened). WaveOut sends audio data to a standard Windows audio device in real time. It is compatible with most popular Windows hardware. The data is sent to the hardware in uncompressed PCM format, and should typically be sampled at one of the standard Windows audio device rates: 8000, 11025, 22050, 16000 or 44100 Hz. Since audio devices generate real-time audio output, software must maintain a continuous flow of data to a device throughout simulation. Delays in passing data to the audio hardware can result in hardware errors or distortion of the output. This means that the process block in principle supplies data to the audio hardware as quickly as the hardware reads the data. However, the waveOut often cannot match the throughput rate of the audio hardware, especially when the simulation is running within execution rather than as generated code. Execution speed can vary during the simulation as the host operating system services other processes. WaveOut must therefore rely on a buffering strategy to ensure that signal data is available to the hardware on demand. 153 The codes construct a File Open dialog box for wave file. The software checks if the wave file formats are correct and copies the characters to new buffer, wavedata1 for playback. CFileDialog class encapsulates the windows common file dialog box which provides an easy way to implement File Open and File Save As dialog boxes in the application. Playing a wave file requires several steps. First, the data must be ready to be played. 
Once the wave data is available, it is required to open the wave device, prepare the wave header, and start playback of the wave data. Wave output device is opened for playback by calling the waveOutOpen. Before opening the device, though, it is a good idea to query the device to see whether it supports the format of the wave data. The waveOutOpen checks to see whether the device supports the given format, but it does not actually open the device. If the wave format is supported by the device, waveOutOpen returns 0 and the process move on to opening the device. If the wave format is not supported, waveOutOpen returns an error code, usually WAVERR_BADFORMAT. 154 The address of a variable containing a handle to the wave output device is passed for the first parameter where &hWaveOut is the pointer to buffer that receives a handle identifying the open output device. If waveOutOpen is successful (returns 0), the handle is returned when calling subsequent wave output functions. The second parameter is WAVE_MAPPER constant for use of the sound card's ID. It selects the output device capable of playing the given format, which is “.wav”. This constant tells Windows to select the sound card as the wave output device. The third parameter, &waveform points to the WAVEFORMATEX structure where the format of audio data to be sent to the device is stated. Finally, the CALLBACK_WINDOW constant is passed for the final parameter. This constant tells Windows to send any wave-out messages sent to the form's window procedure. The device is opened as long as the wave format is supported. The next step in playing a wave file is to prepare a wave header. The wave header is an instance of the WAVEHDR structure, and includes information about the wave buffer that contains the wave data. Specifically, it holds a pointer to the buffer and the size of the buffer in bytes. After a wave header structure is created and initialized, the waveOutPrepareHeader is called to assign the wave header to the currently open wave device. The waveOutPrepareHeader prepares a waveform audio data block for playback. WaveOutPrepareHeader() is used to initialize the buffer before reading into it. The wave handle, hWaveOut obtained from waveOutOpen is passed to waveOutPrepareHeader. A pointer, pWaveHdr1 is passed to the WAVEHDR structure and the final parameter is the size of the structure. At this point, the wave output device is opened, the header has been prepared, and it is ready to play the data in the buffer. 155 The waveOutWrite sends a data block to the output device with the size in bytes. The buffer must be prepared with the waveOutPrepareHeader before it is passed to waveOutWrite. Unless the device is paused by calling waveOutPause, playback begins when the first data block is sent to the device. If waveOutWrite is successful (returns 0), the wave file starts playing and control is immediately returned to the application. The next step is to detect when the wave file has finished playing so that the wave header can be cleaned up and the wave device is closed. The waveOutWrite starts the wave playing and immediately returns control to the application. This means that the application is fully operational while the wave file is being played. For this reason, the MM_WOM_DONE message must be used to determine when the file has finished playing. After the wave data has completed playback, the header that is prepared earlier must be unprepared and the wave output device is closed. 
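The open, prepare and write sequence described above can be sketched as follows. This is a minimal illustration written against the Win32 waveform-audio declarations, assuming the 16-bit, 16 kHz, mono PCM format used in this project and a window handle that receives the MM_WOM_DONE message; the function and variable names are illustrative, and error handling is reduced to returning on failure.

#include <windows.h>
#include <mmsystem.h>      // waveform-audio API; link with winmm.lib

BOOL StartPlayback(HWND hWnd, LPSTR waveData, DWORD waveBytes,
                   HWAVEOUT* phWaveOut, WAVEHDR* pHdr)
{
    WAVEFORMATEX fmt;
    ZeroMemory(&fmt, sizeof(fmt));
    fmt.wFormatTag      = WAVE_FORMAT_PCM;   // uncompressed PCM
    fmt.nChannels       = 1;                 // mono
    fmt.nSamplesPerSec  = 16000;             // 16 kHz
    fmt.wBitsPerSample  = 16;
    fmt.nBlockAlign     = fmt.nChannels * fmt.wBitsPerSample / 8;
    fmt.nAvgBytesPerSec = fmt.nSamplesPerSec * fmt.nBlockAlign;
    fmt.cbSize          = 0;                 // ignored for PCM

    // Open the preferred output device; wave-out messages are posted to hWnd.
    if (waveOutOpen(phWaveOut, WAVE_MAPPER, &fmt,
                    (DWORD_PTR)hWnd, 0, CALLBACK_WINDOW) != MMSYSERR_NOERROR)
        return FALSE;

    ZeroMemory(pHdr, sizeof(WAVEHDR));
    pHdr->lpData         = waveData;         // buffer holding the samples
    pHdr->dwBufferLength = waveBytes;

    // Prepare the header, then queue the buffer; playback starts immediately.
    if (waveOutPrepareHeader(*phWaveOut, pHdr, sizeof(WAVEHDR)) != MMSYSERR_NOERROR ||
        waveOutWrite(*phWaveOut, pHdr, sizeof(WAVEHDR)) != MMSYSERR_NOERROR)
    {
        waveOutClose(*phWaveOut);
        return FALSE;
    }
    // When the driver posts MM_WOM_DONE to hWnd, the application calls
    // waveOutUnprepareHeader and waveOutClose, as described in the next paragraphs.
    return TRUE;
}

Because waveOutWrite returns immediately, the dialog remains responsive while the utterance plays; the MM_WOM_DONE handler then performs the clean-up described next.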
The waveOutUnprepareHeader cleans up the preparation made by waveOutPrepareHeader. As always, the variable of wave handle is passed as the first parameter. The second and third parameters are identical to those passed in waveOutPrepareHeader earlier. Finally, the waveOutClose closes the wave device. If the wave device is not closed, it will be unavailable to other applications or to Windows. Wave file is always played asynchronously. The waveOutClose closes the output device. Buffers are stopped sending to the sound driver. API will simply cancel all pending buffers and send them back to the application. If the device is still playing a waveformaudio file, the close operation fails. In other words, waveOutClose will fail if there are pending buffers in the driver. 156 The PlaySound plays a sound specified by given filename, “beat.wav” where a beat is played every second for the second therapy technique, Metronome. The nword is defined as 1 second. The “beat.wav” is played asynchronously and it returns immediately after beginning the sound. PlaySound searches the following directories for sound files. The sound specified by “beat.wav” must fit into available physical memory and be playable by waveform-audio device driver. When a client talks into a microphone, his or her speech is recorded and playback through earphones at 250 milliseconds of delay for the third therapy technique, DAF. I-4 Recording The waveInPrepareHeader prepares buffer for input where hwavein is the handle to the waveform audio oinput device and pWaveHdr is the pointer to the 157 WAVEHDR structure. It is a structure that defines the header used to identify waveform audio buffer. The waveInAddBuffer() is used to supply each buffer to the driver. The first 2 buffers are supplied to the driver using waveInAddBuffer() before recording. Every time the program is signalled that a buffer is filled, waveInAddBuffer() is used to indicate what buffer the driver will use after it finishes filling whatever buffer it is currently filling. The waveInAddBuffer sends input buffer to the input device with the size of WAVEHDR structure in bytes. The waveInStart starts input on the input device with the handle to waveform audio input device. I-5 Compression and Decompression Using Speex 158 LPTSTR is a 32-bit pointer to a character string that is portable for Unicode. The FindLastOf uses its member function to find the last record that matches the condition. STARTUPINFOR is a structure used with the CreateProcess to specify main window properties when a new window of speex execution is created for the new process. PROCESS_INFORMATION structure is filled in by the CreateProcess with information about the newly created process and its primary thread. The GetModuleFileName retrieves the full path and file name for the file containing our specified module where the szPath is the pointer to the buffer that receives the path and file name. The length of the buffer is specified in MAX_PATH. If the length of the path and file name exceeds this limit, the string is truncated. 159 The &si pointer specifies how the main window for the new process should appear and the &pi receives the identification information about the new process created. The WaitForSingleObject returns when the command of decoding process completes where the time-out interval elapses. I-6 Save Wave Function Save wave function is implemented in the application. The first parameter of CFileDialog is set to FALSE to construct a File Save As dialog box. 
The dialog box 160 prompts client to save the speex file at desired location. A temporary path is created for the wave file where the lstrcat function appends the string of wave file.wav to szWavePath. The wave file is saved to temporary location. The try-catch block of exceptions is used to prevent continued operation if the program cannot obtain the required wave file. An ellipsis (...) is used as the parameter of catch where the handler will catch any exception no matter what the type of the exception is. This can be used as a default handler that catches all exceptions not caught by other handlers. When the exception occurs, the control goes to catch block. SpeexEncode is called to compress the wave file. The input file, which is a wave file, is given by szWavePath and the output file is saved at the full path entered in the dialog box. I-7 Play Wave Function 161 The Play Wave function prompts user twice in order to play both the wave files of SLP and client in a single screen. The first prompt is the SLP’s pre-recorded wave file and the second prompt is the client’s recorded speex file. The speex file is decompressed back to wave file in order to be displayed on screen. First prompt is for SLP’s pre-recorded wave file when button Play Wave is clicked. A File Open dialog box prompts user for loading the desired wave file. User browses to the directories where the wave file is saved. The wave file is checked for its mono channel, sampling rate, bit per sample, and PCM format to ensure it is the correct wave file format. Then, the data bytes in the wave file are copied to new buffer, wavedata1 before the application could display both the SLP’s and client’s AMPs. Second prompt is for client’s speex file when Play Wave button is clicked. The openfile2 is used to prompt user for the second audio file to be played which is the “.spx” file recorded by client. The client’s speex file is decoded to temporary wave file format in order to display both wave files in a single screen. The wave file is checked whether it is in defined wave file format. After it is checked, the data 162 bytes in the wave file are copied to new buffer, wavedata2 before the application could display both the SLP’s and client’s AMPs. After having both wave files in the buffers, the application is ready to display both the SLP’s and client’s wave files on screen. Again, the waveform audio output device is opened for playback. I-8 DC Offset Removal 163 I-9 Time Domain Windowing and Filtering I-10 Background Noise Level Detection 164 During the recording of background noise, gcvt converts the floating-point value of background noise level to a character string (which includes a decimal point and a possible sign byte) and stores the string in buffer, buff. The buff should be large enough to accommodate the converted value plus a terminating null character, which is appended automatically. This produces 20 digits in decimal format. Upon the detection of background noise, a message is sent to inform the client about the end of detection. I-11 History File 165 CFile object is constructed from a path where three actions can be taken when opening the file. The modeCreate directs the constructor to create a new file. If the file exists already, it is truncated to 0 length. For modeNoTruncate, if the file being created already exists, it is not truncated to 0 length. Thus the file is guaranteed to open, either as a newly created file or as an existing file. 
This might be useful, for example, when opening a history file that may or may not exist already. The modeWrite opens the file for writing only. A structure is defined for all the members in history file. Three stuttering therapy techniques are introduced in computer-based method. The techniques are 166 Shadowing, Metronome and DAF. History file includes the scoring of three therapy techniques in which the technique names are stated together with the scoring. The fopen opens for both reading and writing with the condition that the file must exist. The fgets reads a string from the input argument of FILE structure and stores it in sentence [i]. The fgets reads characters from the current stream position to and including the first newline character, to the end of the stream, or until the number of characters read is equal to n – 1, whichever comes first. I-12 The Loading of Wave Files 167 I-13 The Loading of Text Sentences I-14 Client Identification 168 Only after the client identification’s procedure, the features of Next Technique, Next Sentence, Testing, Recording, Save Wave and Play Wave are enabled. GetDlgItem retrieves handle to the control where the EnableWindows() enables the mouse and keyboard input to the specified control when it is set to “TRUE”. Otherwise, it disables any input when it is set to “FALSE”. I-15 Buttons Initialization The button Next Technique (IDC_Prev) enables client to proceed to next therapy technique after completing the previous one. Button Next Sentence (IDC_Next) is used to proceed to the next sentence every time client finishes recording for the previous sentence. The button Testing (IDC_Testing) is pressed whenever clients wish to listen to the playback. They can repeat the playback as many times as they wish. The same thing goes to the button Record (IDC_RecordTest) where clients can record as many times as they wish. 169 I-16 Scoring Parameter Definition The noscore is the number of scoring for each scoring category. For each therapy technique, the count is incremented one every time the client repeat recording for the same sentence. For example, the scoring for start location parameter is added to accumulative scoring upon each recording. The average scoring of each parameter (grade, grade1, grade2, grade3) is calculated by dividing the total scores (sscore, escore, mscore, dscore) with the total recording attempts (noscore). The average score for each sentence is calculated by dividing the total scores (startscore, endscore, maxscore, durscore) of all parameters with four. PlaySound is called to play the applause specified by given filename, “applause.wav”. The applause is played asynchronously and it returns immediately after beginning the sound. PlaySound searches the following directories for sound files. The sound specified by “applause.wav” must fit into available physical memory and be playable by waveform-audio device driver. 170 A CBitmap object, bitmap is constructed with a bitmap handle to it with one of the member function, LoadBitmap. This member function loads the bitmap resource ID number of the bitmap resource, IDB_BITMAP1 from the application’s executable file. The loaded bitmap is attached to the bitmap object. The BitBlt copies the bitmap of fireworks display from the source device context to current device context. I-17 Scoring for Start Location 171 I-18 Scoring for End Location I-19 Scoring for Maximum Magnitude Comparison 172 I-20 Scoring for Duration Comparison