Abstract

In this research, an expert system for automated detection of abnormalities in electrocardiogram (ECG) signals is developed. For this purpose, an off-line data acquisition system for paper-based ECG records is developed using image processing techniques, with a view to the future development of an Indian standard ECG database. Binarization converts the TIFF-formatted gray-tone image into a two-tone binary image with the help of histogram analysis, which removes most of the background noise (e.g., the gridlines of ECG paper). The remaining dotted noise is removed by the run-length smearing technique. Thinning is then applied to avoid repetition of coordinate information in the dataset. The pixel-to-pixel coordinate information is extracted, calibrated using prior information and converted into an ASCII data file. The present database contains 100 normal and 100 diseased subjects. Of the 100 diseased patients, 55 have Myocardial Infarction (MI) and the remaining patients have Myocardial Ischemia. This work is reported in Paper I of the List of Original Papers, and Chapter 2 of the thesis is based on it.

The developed ECG database is further processed to remove six different types of noise that can corrupt ECG signals. All the noises are simulated and then removed by digital FIR filters with the help of the software package Cool Edit Pro, offered by Syntrillium Software Corporation. This is done to give the algorithm a realistic test situation and to achieve greater accuracy in the detection of time-plane features. In the next step, the time-plane features of the ECG signal that are important for ECG interpretation and classification are extracted. First, the R-R intervals (QRS complexes) are detected using the square-derivative curve of the ECG signal. The baseline is also detected for accurate detection of the P wave and the ST segment.
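The square-derivative step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of the thesis: it assumes a uniformly sampled signal with upright R waves, uses the squared second-order difference as the detection curve, and marks a beat wherever that curve exceeds a fraction of its maximum. The function names, the threshold fraction (0.3) and the refractory period (0.2 s) are hypothetical choices made only for this example.

```python
def squared_second_derivative(signal):
    """Squared second-order difference of the signal (zero at both ends)."""
    n = len(signal)
    curve = [0.0] * n
    for i in range(1, n - 1):
        d2 = signal[i + 1] - 2.0 * signal[i] + signal[i - 1]
        curve[i] = d2 * d2
    return curve


def detect_r_peaks(signal, fs, threshold_fraction=0.3, refractory_s=0.2):
    """Return sample indices of candidate R peaks.

    threshold_fraction and refractory_s are illustrative assumptions,
    not values taken from the thesis.
    """
    curve = squared_second_derivative(signal)
    peak_value = max(curve) if curve else 0.0
    if peak_value == 0.0:
        return []  # flat or empty signal: nothing to detect
    threshold = threshold_fraction * peak_value
    refractory = max(1, int(refractory_s * fs))
    n = len(signal)
    peaks = []
    i = 0
    while i < n:
        if curve[i] >= threshold:
            # Within a short window, take the largest sample as the R peak
            # (this assumes upright R waves).
            end = min(n, i + refractory)
            r = max(range(i, end), key=lambda k: signal[k])
            peaks.append(r)
            i = r + refractory  # skip the refractory period
        else:
            i += 1
    return peaks


def rr_intervals(peaks, fs):
    """R-R intervals in seconds between consecutive detected peaks."""
    return [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
```

On real recordings the threshold and the refractory period would need tuning, and the noise removal described in the preceding paragraph would be applied first, since differentiation amplifies high-frequency noise.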
Standard assumptions are used for detection of the baseline, the T waves and the ST segment region. The P wave is detected from the first derivative of the samples. Depending on the zero-crossings and the shape of the signal, a syntactic approach is used for detection of the P, QRS and T waves. The accuracy was 99.4% for QRS complexes, 96.7% for T waves and 92.2% for P waves. The QRS (R-R interval) detection is reported in Paper IV, whereas the noise removal and the other feature extraction methods are reported in Papers VI and IX. Chapter 3 of the thesis is based on these papers.

Frequency-plane analysis is also carried out to extract frequency-plane features, which play an important role in heart disease categorization. Both amplitude and phase properties are obtained and built into an ‘if-then’ rule base for disease classification. This part of the work is described in Chapter 4 of the thesis and is reported in Papers II, III, V and VIII.

A rule-based rough-set decision system is also developed from the time-plane features using the popular rough-set software toolbox ROSETTA. For normal and MI patients, an accuracy of 100% is achieved on both trained and untrained samples, whereas for ischemic patients the accuracy is 100% on trained samples and 95.8% on untrained samples. This portion is reported in Papers VII and X, and Chapter 5 of the thesis is based on this work.

Preface

The dissertation is submitted to the University of Calcutta in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Technology. The research was carried out at the Computer Vision and Pattern Recognition Unit of the Indian Statistical Institute, Kolkata, India, and in the Department of Applied Physics, University College of Technology, University of Calcutta, Kolkata, India, during the years 2000-2004.
Acknowledgement

I would like to express my gratitude to Dr. Madhuchhanda Mitra of the Department of Applied Physics, University of Calcutta, for giving me the opportunity to do this work and for her proficient supervision and guidance during the course of study. I am deeply grateful to Professor B. B. Chaudhuri of the Computer Vision and Pattern Recognition (CVPR) Unit of the Indian Statistical Institute for his patient supervision and advice on the research work. I am indebted to him for his help in writing the reports and revising the language of the thesis.

The financial support granted by the Council of Scientific and Industrial Research (CSIR), Government of India, is gratefully acknowledged.

I wish to thank Dr. Arup Dasbiswas, Associate Professor of the Institute of Post Graduate Medical Education and Research (IPGMER), University of Calcutta [S. S. K. M. Hospital, Kolkata], and Dr. Ajoy Sarkar of Peerless Hospital and B. K. Roy Research Center, Kolkata, for their expert opinion and collaboration regarding the medical aspects of this work. I would like to thank Dr. S. B. Bhattachariya of the Department of Cardiology, S. S. K. M. Hospital (IPGMER); Dr. B. Hulder, Asst. Prof., Dept. of Cardiology, Burdwan Medical College Hospital; Dr. T. Goswami, Medical Consultant; Dr. B. Bala, Dr. S. K. Roy and Dr. A. K. Das of Uttarpara General Hospital; Dr. S. Chatterjee of Barakpur General Hospital; Dr. Kanak Kumar Mitra, Asst. Prof., Dept. of Cardiology, R. G. Kar Medical College; Dr. Basujit Gangopadhyaya of Apollo Clinic; Dr. P. C. Mandal of Anandalok Groups of Hospital; Dr. D. Ghosh Roy, Dr. T. K. Saha, Dr. Kaushik Mitra and Dr. D. Bhattachariya for sharing their experience and opinion with us and also for their cooperation and help.

I also wish to thank Mr. U. Garain of the CVPR Unit and Dr. A. Raychowdhuri of Jadavpur University for their help on different occasions. I thank Goutam Dhar, a final-year B. Tech. student, for his help in developing the computer program for P and T wave detection as part of his final-year project. I thank all the other faculty and staff members of the CVPR Unit, Indian Statistical Institute, and the Department of Applied Physics, University of Calcutta, for the help and cooperation extended to me during the course of my research.

Finally, I would like to thank all my family members, especially my husband Mr. Saibal Mitra and my parents Mr. D. L. Sarkar and Mrs. Sipra Sarkar, for their unstinted support and inspiration during these years of my studies.

Kolkata, July 2004
Sucharita Sarkar

List of Original Papers

This thesis is based on the following papers, which are referred to in the abstract by their Roman numerals. The author is responsible for handling the whole problem, developing the computer programs for all the modules, doing the experiments, analyzing the data and writing the manuscripts.

I. Sucharita Mitra, M. Mitra, “An Automated Data Extraction System From 12 Lead ECG Images”, Computer Methods and Programs in Biomedicine (Elsevier Science), vol. 71(1), May 2003, pp. 33-38.

II. S. Mitra, M. Mitra, “Phase Response Properties of Normal and Diseased ECG Signals using an Automated Data Acquisition System”, technical journal of The Institution of Engineers (India), vol. 84, May 2003, pp. 14-17.

III. S. Mitra, M. Mitra & B. B. Chaudhuri, “Generation of Digital Time Database from Paper ECG Records and Fourier Transform Based Analysis for Disease Identification”, accepted for publication in Computers in Biology and Medicine (Elsevier Science).

IV. S. Mitra, M. Mitra, “Detection of QRS Complex of ECG Signals from Square-Derivative Curve”, accepted for publication in the AMSE (France) journal (Advances in Modeling).

V. S. Mitra, M. Mitra, S. B. Bhattachariya & B. B. Chaudhuri, “Generation of Digital Time Database from Paper ECG Records and Application of Frequency Plane Analysis for Disease Identification”, Proceedings-CD of International Congress on Biological and Medical Engineering (ICBME), 4-7 December 2002, Singapore.

VI. S. Mitra, M. Mitra & B. B. Chaudhuri, “Time-plane Feature Extraction from ECG signals for Development of Disease Classifier”, Proceedings of International Conference on Communication, Devices and Intelligent Systems (CODIS 2004), 8-10 January 2004, Kolkata, India, pp. 653-656.

VII. S. Mitra, M. Mitra, S. Chattopadhyay & B. B. Chaudhuri, “An Approach to a Rough-set Decision System for Classification of Different Heart Diseases”, Proceedings of International Conference on Modeling and Simulation, MS’2004, Lyon, France, 5-7 July 2004.

VIII. Sucharita Mitra, M. Mitra & B. B. Chaudhuri, “Frequency-plane Analysis of Normal and Pathological ECG Signals in Application of Disease Identification”, accepted in Journal of Medical Engineering and Technology.

IX. Sucharita Mitra, M. Mitra & B. B. Chaudhuri, “ECG Features Extraction for Analysis with the Help of an Off-line Data Acquisition Package”, communicated to and under review at IETE Technical Review.

X. Sucharita Mitra, M. Mitra & B. B. Chaudhuri, “A Rough Set Based Inference Engine for ECG Classification”, communicated to and under review at IEEE Transactions on Instrumentation and Measurement.

N. B.
After marriage, my surname changed from Sarkar to Mitra, and I have published my papers under my present surname. Also note that I had registered as Sucharita Sarkar at the time of admission to Calcutta University. Hence I have continued to use my maiden name for my Ph.D. registration as well as for all official communications with the University.

LIST OF FIGURES

Figure 1.1 A Schematic Diagram of the Circulatory System
Figure 1.2 The Anatomical Structure of Human Heart
Figure 1.3 Layers of the Heart Muscle
Figure 1.4 Pacemaker potentials and action potentials in the SA node
Figure 1.5 An action potential in a myocardial cell from the ventricles
Figure 1.6 An ECG Strip
Figure 1.7 The placement of the bipolar leads and the exploratory electrode for the unipolar chest leads in an electrocardiogram (ECG); (RA = right arm, LA = left arm, LL = left leg)
Figure 1.8 The conduction of electrical impulses in the heart, as indicated by the electrocardiogram (ECG). The direction of the arrows in (e) indicates that depolarization of the ventricles occurs from the inside (endocardium) out (to the epicardium), whereas the arrows in (g) indicate that repolarization of the ventricles occurs in the opposite direction
Figure 2.1 Block Schematic of the Developed Off-line Data Acquisition Package
Figure 2.2 Gray Level Histogram of An ECG Image
Figure 2.3 Runlength Smearing Algorithm
Figure 2.4 The 8 neighbors of P1
Figure 2.5 Image of original ECG signal from chart record
Figure 2.6 Extracted ECG signal before thinning
Figure 2.7 Reproduced ECG signal from extracted database after thinning
Figure 2.8 Image of original ECG signal from chart record
Figure 2.9 Extracted ECG signal before thinning
Figure 2.10 Reproduced ECG signal from extracted database after thinning
Figure 2.11 Original ECG images from image database and reconstructed signals from digital time database
Figure 2.12 Graphical analysis of standard deviation (for Ordinate)
Figure 3.1 A few noisy recorded ECG data
Figure 3.2(a) ECG signal with 0% noise level
Figure 3.2(b) ECG signal corrupted by 10% white noise
Figure 3.2(c) ECG signal corrupted by 20% white noise
Figure 3.2(d) ECG signal corrupted by 30% white noise
Figure 3.3(a) ECG signal corrupted by 10% power line oscillation
Figure 3.3(b) ECG signal corrupted by 20% power line oscillation
Figure 3.3(c) ECG signal corrupted by 30% power line oscillation
Figure 3.4 A screen shot of 50 Hz notch filter working on ECG signal corrupted by 10% power line oscillation
Figure 3.5(a) ECG signal corrupted by 10% base line shift
Figure 3.5(b) ECG signal corrupted by 20% base line shift
Figure 3.5(c) ECG signal corrupted by 30% base line shift
Figure 3.6(a) ECG signal corrupted by 10% abrupt baseline shift
Figure 3.6(b) ECG signal corrupted by 20% abrupt baseline shift
Figure 3.6(c) ECG signal corrupted by 30% abrupt baseline shift
Figure 3.7 ECG signal corrupted with 20% composite noise
Figure 3.8(a) Noise free ECG signal
Figure 3.8(b) Filtered output of ECG signal
Figure 3.9 A screen shot of 0.6 Hz high-pass filter working on ECG signal corrupted by 20% base line shift
Figure 3.10 QRS complex or R-R interval detection
Figure 3.11 Filtered output of ECG signal after removal of all simulated noises
Figure 3.12 Graphical representation of 2nd order derivative after removal of all simulated noises
Figure 3.13 Graphical representation of square of 2nd order derivatives after removal of all simulated noises
Figure 3.14 Reproduced ECG signals with different shapes
Figure 3.15 Square Derivative Curve of reproduced ECG signals
Figure 3.16 Different time plane features which are extracted
Figure 4.1 Time-domain Representation of Sine Wave
Figure 4.2 Time-domain Representation of Square Wave
Figure 4.3 Spectrum or Frequency-domain Representation of the Square Wave
Figure 4.4 Effect in the frequency domain of sampling in the time domain: (a) Spectrum of original signal, (b) Spectrum of sampling function, (c) Spectrum of sampled signal with fs > 2fc, (d) Spectrum of sampling function with fs < 2fc, (e) Spectrum of sampled signal with fs < 2fc
Figure 4.5 Block Schematic of the Proposed System
Figure 4.6 Amplitude diagram of Infarction and Normal patients for lead V4
Figure 4.7 Amplitude diagram of Ischemia and Normal patients for lead V6
Figure 4.8 Phase diagram of H7 for lead AVL for (a) Infarction Patients, (b) Normal subjects
Figure 5.1 Block Schematic of the Proposed System
Figure 5.2 A Screenshot of Generated Rule-set
Figure 5.3 Confusion Matrix Output for Standard Voting Classifier

LIST OF TABLES

Table 2.1 QRS Portion Of Extracted Database Of The Image Shown In Figure 2.5
Table 2.2 QRS Portion Of Extracted Database Of Another Image
Table 3.1 QRS Detection Accuracy In Different Noise Levels
Table 3.2 T Wave Detection Accuracy In Different Noise Levels
Table 3.3 P Wave Detection Accuracy In Different Noise Levels
Table 3.4 A Portion Of Extracted Time Plane Features
Table 4.1 A Portion Of Numerical Values Of Amplitude And Phase Of ECG After DFS For Lead V6
Table 4.2 A Portion Of Numerical Values Of Amplitude And Phase Of ECG After DFS For Lead V4
Table 4.3 A Portion Of List Of Sum Of First Five Harmonics
Table 4.4 A Portion Of List Of Sum Of First Two Harmonics
Table 4.5 Phase Properties Of Normal Vs Myocardial Infarction
Table 4.6 Phase Properties Of Normal Vs Myocardial Ischemia
Table 4.7 Result Obtained From Rule Based Classifier
Table 5.1 A Portion Of Decision Table
Table 5.2 Result Obtained From Rule Based Rough Set Decision System

CONTENTS

Abstract
Preface
Acknowledgement
List of Original Papers
List of Figures
List of Tables
Contents

1. General Introduction
   The Heart
   Anatomy
   Electrical Activity of the Heart
   Clinical Observation of Heart
   Electrocardiography (ECG)
   Cardiac Diseases Interpreted by ECG
   Computer Analysis of ECG and Historical Background
   Thesis Overview
   Literature Survey
   ECG Data Extraction and Signal Analysis
   ECG Feature Extraction and Time-plane Analysis
   Frequency-plane Analysis of ECG Signals
   ECG Classification and Abnormality Detection

2. Automated Data Extraction from 12 Lead ECG Images
   Introduction
   Materials and Detailed Methods
   Binarization of the Input Images
   Removal of Grid Lines from Graphical Papers
   Thinning of the Input Signal
   Raw Data Extraction
   Data Sorting and Regeneration of the ECG Signal
   Results
   Discussions

3. Noise Removal and Time-plane Features Extraction from ECG Signal
   Introduction
   Noise Removal from Extracted ECG Signals
   Noise Characteristics
   Different Filters and their Usage
   Noise Simulation and Removal from ECG Signals
   A Few Important Time-plane Features
   Time-plane Features Extraction
   QRS Complex Detection
   Base Line Detection
   P, ST Segment and T Wave Detection
   Detailed Algorithm
   Results
   Discussion

4. Frequency Plane Features Extraction from ECG Signals
   Introduction
   Frequency Plane Transformation of Digitized ECG Data
   The Theory of Analysis of Digitized ECG Wave
   Experimental Procedure
   Result
   Discussions

5. A Rough Set Based Inference Engine from Different Time-plane Features for ECG Classification
   Introduction
   Rough Set: A Tool for Representing and Reasoning about Imprecise or Uncertain Information
   Proposed Method
   Knowledge Base Development
   Development of Inference Engine
   Result
   Discussion

6. Conclusion and Future Scopes

References
Appendix A
Appendix B
Annexure 1 (Publication List)
Annexure 2 (Different Modules)
Vita
Original Papers