File: readme.doc        G. Moody        July 1989

        Copyright (C) Massachusetts Institute of Technology 1989.
                          All rights reserved.

Overview of this disk
=====================

This CD-ROM contains over two hundred hours' worth of two-lead ECG
recordings, most of which are annotated beat-by-beat. Approximately
one-fourth of the disk is occupied by the MIT-BIH Arrhythmia Database,
and the rest contains supplementary ECG recordings comprising seven
additional databases. Each of the databases occupies its own directory
on this disk. You will also find three directories on this disk which
do not contain ECG recordings; these are named `\bin', `\lib', and
`\udb'. With the exception of `.doc' and `.h' files, the files on this
disk are binary (i.e., they are not directly printable).

There are several hundred ECG recordings on this CD-ROM, ranging in
length from 20 seconds to 24 hours. The program `view' (in the `\bin'
directory) allows you to examine any of these recordings interactively.
Run it without any command-line arguments for instructions on its use.

The rest of this file briefly describes the contents of this CD-ROM.
More information can be found in the manuals which accompany the disk.
If you need additional copies of these manuals, or if you have further
questions, please write to:

        Beth Israel Hospital
        Biomedical Engineering Division, KB-26
        330 Brookline Avenue
        Boston, MA 02215 USA

or call (617)735-4553.

Programs
========

The `\bin' directory contains a few application programs which can be
run under MS-DOS to examine the ECG recordings. You may copy these
programs to a directory on your magnetic disk which is in your PATH.
If you wish to run them directly from the CD-ROM, simply add the `\bin'
directory of your CD-ROM drive to your PATH. (See your DOS manual for
information on how to do this.) See `\bin\bin.doc' for further
information about the applications.

A larger package of applications is available on floppy disks for
MS-DOS or Macintosh systems.
These applications, along with the DB library functions (see below), are
also available in C source form for use on UNIX systems or for
customization. Write to the address above for details.

Subroutine Library
==================

If you wish to construct your own database applications, the `\lib'
directory of this disk contains the DB library, a set of functions
which provide access to database files. The DB library can be used
with the Microsoft C compiler under MS-DOS. See `\lib\lib.doc' for
details about files in `\lib' and `\udb'.

ECG Recordings
==============

The ECG recordings on this disk are organized into eight databases,
each in its own directory. Each database record contains a continuous
ECG recording from a single subject. Records are identified by unique
record names (for example, MIT-BIH Arrhythmia Database record names are
3-digit numbers). All files belonging to a record have the same
primary file name (which is the record name), and extensions which
indicate the file type.

Header Files
------------

Each record includes a header (.hea) file, which contains general
information about the record such as its length. Header files occupy
one disk block each. If you have a utility for converting UNIX text
files to MS-DOS format, you can use it to produce printable versions of
these files if you are curious about their contents.

Signal Files
------------

With the exception of the Atrial Fibrillation Database (see below),
each record includes a signal (.dat) file, which contains digitized ECG
signals. Most of the signal files contain two signals (with samples
from each signal stored alternately), but several of those in the ST
Change Database contain only one signal, and one record in the
Long-Term Database contains three signals. The sampling frequencies
range from 128 to 360 samples per second per signal (see the
descriptions of the individual databases below). Signal files occupy
most of the disk, and range in size from 20 Kb to about 45 Mb.
Signal files may be examined using `rdsamp' (in the `\bin' directory of
this disk).

Annotation Files
----------------

Most records include reference annotation (.atr) files. The reference
annotation files for the MIT-BIH Arrhythmia Database are the most
complete and accurate of these; they describe each beat by an
annotation (i.e., a label), and include additional annotations which
describe rhythm and signal quality changes. As noted below, reference
annotation files for the other databases are less extensive (some omit
rhythm or signal quality annotations, and others contain only rhythm
annotations) and may contain small numbers of errors since they have
not been subjected to the intense scrutiny directed at the MIT-BIH
Arrhythmia Database during its development and in the nine years
between its initial release and the publication of this disk. The
Atrial Fibrillation Database contains a set of machine-generated
annotation files with `.qrs' extensions (see below), in addition to its
reference annotation (.atr) files. Annotation files vary in length,
according to their contents and the duration of the record, from less
than 1 Kb to several hundred Kb. These files may be examined using
`rdann' (in the `\bin' directory of this disk).

ECG Databases
=============

This section briefly describes each of the eight databases on this
CD-ROM, lists the record names for each, and includes a few references
for further reading.

MIT-BIH Arrhythmia Database (\mitdb)
------------------------------------

Record names:

    100  105  111  116  122        200  207  213  220  230
    101  106  112  117  123        201  208  214  221  231
    102  107  113  118  124        202  209  215  222  232
    103  108  114  119             203  210  217  223  233
    104  109  115  121             205  212  219  228  234
    \--- (the `100 series') --/    \--- (the `200 series') --/

This database consists of 48 annotated records, obtained from 47
subjects studied by the Arrhythmia Laboratory of Beth Israel Hospital
in Boston between 1975 and 1979.
About 60% of the records were obtained from inpatients. The database
contains 23 records (the `100 series') chosen at random from a set of
over 4000 24-hour Holter tapes, and 25 records (the `200 series')
selected from the same set to include a variety of rare but clinically
important phenomena which would not be well-represented by a small
random sample. Several records in the 200 series were chosen
specifically because features of the rhythm, QRS morphology, or signal
quality may be expected to present significant difficulty to arrhythmia
detectors.

Each record is slightly over 30 minutes in length. Each signal file
contains two signals sampled at 360 Hz. The header files include
information about the leads used, the patient's age, sex, and
medications. (This information is reproduced in the MIT-BIH Arrhythmia
Database Directory, which accompanies this disk.) The reference
annotation files include beat, rhythm, and signal quality annotations.
Each of the roughly 109,000 beats was manually annotated by at least
two cardiologists working independently; their annotations were
compared, consensus on disagreements was obtained, and the reference
annotation files were prepared.

Four records (102, 104, 107, and 217) include paced beats. The
original analog tapes do not represent the pacemaker artifacts with
sufficient fidelity to permit them to be recognized by pulse amplitude
(or slew rate) and duration alone, the method commonly used for
real-time processing. The database records reproduce the analog
recordings with sufficient fidelity to permit use of pacemaker artifact
detectors designed for tape analysis, however.

The MIT-BIH Arrhythmia Database has been used for evaluation of
arrhythmia detectors at approximately 100 sites worldwide prior to the
publication of this CD-ROM. Since its initial release in 1980, sixteen
errors in beat annotations have been discovered and corrected. No such
errors have been found since 1987.

Reference:
    Mark, R.G., Schluter, P.S., Moody, G.B., Devlin, P.H., and
    Chernoff, D. An annotated ECG database for evaluating arrhythmia
    detectors. Frontiers of Engineering in Health Care: Proceedings of
    the 4th Annual Conference of the IEEE Engineering in Medicine and
    Biology Society, pp. 205-210. New York: IEEE Press (1982).

Noise Stress Test Database (\nstdb)
-----------------------------------

Record names:

    bw      118_02  119_02
    em      118_04  119_04
    ma      118_06  119_06
            118_08  119_08
            118_10  119_10
            118_12  119_12

This database consists of 15 thirty-minute records. Three of these
(records `bw', `em', and `ma') contain noise of the types typically
observed in ECG recordings. They were obtained using a Holter recorder
on an active subject, with leads placed so that the subject's ECG is
not visible. Two signals were recorded simultaneously. Record `bw'
contains primarily baseline wander, a low-frequency signal usually
caused by motion of the subject or the leads. Record `em' contains
electrode motion artifact (usually the result of intermittent
mechanical forces acting on the electrodes), with significant amounts
of baseline wander and muscle noise as well. Record `ma' contains
primarily muscle noise (EMG), with a spectrum which overlaps that of
the ECG, but which extends to higher frequencies. Electrode motion
artifact is usually the most troublesome type of noise for arrhythmia
detectors since it can closely mimic characteristics of the ECG.

The remaining records reproduce MIT-BIH Arrhythmia Database records 118
and 119 with `em' noise added at various levels. Since the correct
beat labels are known for these records, they may be used to test the
noise tolerance of an arrhythmia detector. (`\mitdb\118.atr' and
`\mitdb\119.atr' are the reference annotation files for these records;
there are no annotation files in the `\nstdb' directory.)
Records 118 and 119 were chosen for this purpose because they are not
intrinsically noisy, and because most arrhythmia detectors can analyze
them without significant errors. Thus any errors which occur in the
analysis of the records to which noise has been added can be attributed
to the noise, and not to any inherent difficulty in analyzing the ECG
itself. The names of these records are of the form `rrr_nn', where
`rrr' is the name of the original ECG record and `nn' indicates the
noise-to-signal ratio during the noisy periods (02 = noise 20% as large
as the QRS complexes, 12 = noise 120% as large as the QRS complexes,
etc.).

Reference:
    Moody, G.B., Muldrow, W.K., and Mark, R.G. A noise stress test for
    arrhythmia detectors. Computers in Cardiology 11:381-384 (1984).

ST Change Database (\stdb)
--------------------------

Record names:

    300  305  310  315  320  325
    301  306  311  316  321  326
    302  307  312  317  322  327
    303  308  313  318  323
    304  309  314  319  324

This database consists of 28 unannotated records ranging in length from
13 to 67 minutes, obtained from 28 subjects. Records 300 to 323 were
obtained during exercise stress tests, using an FM instrumentation tape
recorder; these records exhibit transient ST depression in response to
exercise-induced ischemia. The header files for these records include
information about the gain of the signals which can be used to
calibrate ST measurements in terms of body surface potentials. Records
324 to 327 were obtained from Holter tapes, and show ST elevation.
Records 313 to 317 and 319 to 323 contain only one signal; the rest
contain two signals. All signals were sampled at 360 Hz.

Reference:
    Albrecht, P. S-T segment characterization for long-term automated
    ECG analysis. MIT M.S. thesis (1983).
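
Returning briefly to the `rrr_nn' record names of the Noise Stress Test
Database described above: the following is a minimal sketch, in Python,
of how such a name might be split into the original record name and the
noise-to-signal ratio. The function name is hypothetical and is not
part of the DB library or of any program on this disk; the sketch only
assumes the naming rule stated above (02 = 20%, 12 = 120%).

```python
def parse_noise_record_name(name):
    """Split an `rrr_nn' record name into (original record, noise ratio).

    The ratio is returned as a fraction of the QRS amplitude, so that
    `118_02' gives ("118", 0.2) and `119_12' gives ("119", 1.2).
    Names not of this form (for example the noise records `bw', `em',
    and `ma') are rejected with ValueError.
    """
    record, separator, level = name.partition("_")
    well_formed = (
        separator == "_"
        and record.isdigit()
        and len(level) == 2
        and level.isdigit()
    )
    if not well_formed:
        raise ValueError("not of the form rrr_nn: %r" % (name,))
    # `nn' counts tenths of the QRS amplitude: 02 -> 20%, 12 -> 120%.
    return record, int(level) / 10.0


if __name__ == "__main__":
    for record_name in ("118_02", "118_12", "119_06"):
        print(record_name, parse_noise_record_name(record_name))
```

The original record name returned by such a helper is the one whose
annotation file in `\mitdb' supplies the reference beat labels.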

Malignant Ventricular Arrhythmia Database (\vfdb)
-------------------------------------------------

Record names:

    418  423  428  607  614
    419  424  429  609  615
    420  425  430  610
    421  426  602  611
    422  427  605  612

This database consists of 22 thirty-five-minute records, obtained from
Holter tapes of 16 subjects. It is annotated only with respect to
rhythm changes, which include 89 episodes of ventricular tachycardia,
60 episodes of ventricular flutter, and 42 episodes of ventricular
fibrillation. The signal files contain two signals, each sampled at
250 Hz.

References:
    Greenwald, S.D., Albrecht, P., Moody, G.B., and Mark, R.G.
    Estimating confidence limits for arrhythmia detector performance.
    Computers in Cardiology 12:383-386 (1985).

    Greenwald, S.D. Development and evaluation of a ventricular
    fibrillation detector. MIT M.S. thesis (1986).

Atrial Fibrillation/Flutter Database (\afdb)
--------------------------------------------

Record names:

    00735  04126  05121  07162  08219
    03665  04746  05261  07859  08378
    04015  04908  06426  07879  08405
    04043  04936  06453  07910  08434
    04048  05091  06995  08215  08455

This database may be useful for development and evaluation of atrial
fibrillation/flutter detectors which rely on timing information only.
It consists of 25 ten-hour records (obtained from Holter tapes of 25
subjects) containing about 300 episodes of atrial fibrillation and 40
episodes of atrial flutter. Because of space limitations, it is not
feasible to include all 250 hours of the ECG signals on this disk. The
`\afdb' directory contains two sets of annotation files for all 25
records, and a signal file for record 04936. The signal file contains
two signals, sampled at 250 Hz. The reference annotation (.atr) files
contain only rhythm change annotations.
The beat annotation (.qrs) files were produced by an automated QRS
detector, and all beats are labelled normal; the R-R interval sequences
may be recovered from these files and used as input to the atrial
fibrillation/flutter detector to be tested. Note that the .qrs files
have not been audited, and contain a small number of errors.

Reference:
    Moody, G.B., and Mark, R.G. A new method for detecting atrial
    fibrillation using R-R intervals. Computers in Cardiology
    10:227-230 (1983).

ECG Compression Test Database (\cdb)
------------------------------------

Records:

    08730 11442 11950 12247 12431 12490 12531 12621 12713 12921
     (7)   (4)   (5)   (4)   (5)   (2)   (5)   (4)   (4)   (1)

    12936 12940 12981 13005 13030 13045 13059 13130 13139 13227
     (3)   (5)   (3)   (2)   (5)   (3)   (7)   (2)   (3)   (6)

    13229 13274 13301 13317 13370 13380 13420 13425 13431 13508
     (4)   (5)   (4)   (5)   (2)   (5)  (12)   (5)   (3)   (5)

    13543 13556 13559 13580 13585 13590 13649 13687
     (6)   (7)   (1)   (2)   (8)   (6)   (4)   (4)

This database consists of 168 unannotated records, each 20.48 seconds
in duration, obtained from Holter tapes from 38 subjects. Each subject
is identified by one of the five-digit numbers in the table above; the
number in parentheses next to each of these subject numbers is the
number of records which were obtained from that subject's Holter tape.
The record names are of the form `sssss_nn', where `sssss' is the
subject number and `nn' is a segment number, beginning at 01; thus the
records for subject 12490 are named `12490_01' and `12490_02'. The
records exhibit a wide variety of arrhythmias, conduction disturbances,
and noise. Many were selected because the characteristics of the
signal or noise may be expected to pose a problem for an ECG
compression method which is not exactly invertible.
By comparing diagnoses made on the basis of compressed versions of
these records with independent diagnoses made from the uncompressed
versions, the ability of an ECG compression method to preserve
clinically important waveform details can be assessed. Each record
contains two signals, sampled at 250 Hz.

Reference:
    Moody, G.B., Mark, R.G., and Goldberger, A.L. Evaluation of the
    "TRIM" ECG data compressor. Computers in Cardiology 15 (1988).

Supraventricular Arrhythmia Database (\svdb)
--------------------------------------------

Record names:

    800  805  810
    801  806  811
    802  807  812
    803  808
    804  809

This database contains the first 13 thirty-minute records of a database
we are building to supplement the examples of supraventricular
arrhythmias in the MIT-BIH Arrhythmia Database. The records were
obtained from Holter tapes of 13 subjects. They have been annotated
using a semi-automated method which gives highly accurate results, but
the annotations have not been audited to the extent of those in the
MIT-BIH Arrhythmia Database, and a small number of errors may be
present. The reference annotation files include beat and signal
quality annotations, but no rhythm annotations. Each record contains
two signals, sampled at 128 Hz.

Long-Term Database (\ltdb)
--------------------------

Record names:

    14046  14157  15814
    14134  14172
    14149  14184

This database contains seven annotated long-term records ranging in
length from 14 to 24 hours. These records are complete Holter tapes
from seven subjects. As for the Supraventricular Arrhythmia Database,
the records have been annotated using a semi-automated method, and a
small number of errors may be present. The reference annotation files
include beat and signal quality annotations, but no rhythm annotations.
Six of the records contain two signals; record 15814 contains three.
All signals are sampled at 128 Hz.
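
The sampling frequencies and the `sssss_nn' naming rule given in the
database descriptions above lend themselves to a short worked example.
The Python sketch below is illustrative only: the function and table
names are hypothetical (they are not DB library functions), the beat
sample numbers in the demonstration are invented, and reading the
annotation files themselves is not shown. The Noise Stress Test
Database is left out of the table because its sampling frequency is not
stated explicitly above.

```python
# Sampling frequencies (samples per second per signal), keyed by the
# directory name of each database, as stated in the descriptions above.
SAMPLING_FREQUENCY_HZ = {
    "mitdb": 360,
    "stdb": 360,
    "vfdb": 250,
    "afdb": 250,
    "cdb": 250,
    "svdb": 128,
    "ltdb": 128,
}


def samples_to_seconds(sample_count, database):
    """Convert a number of samples (per signal) to a duration in seconds."""
    return sample_count / float(SAMPLING_FREQUENCY_HZ[database])


def rr_intervals_seconds(beat_samples, database):
    """Return successive R-R intervals, in seconds, from beat sample numbers.

    `beat_samples' is assumed to be an increasing list of the sample
    numbers at which beats were annotated.
    """
    fs = float(SAMPLING_FREQUENCY_HZ[database])
    return [(later - earlier) / fs
            for earlier, later in zip(beat_samples, beat_samples[1:])]


def compression_record_names(subject, record_count):
    """List the `sssss_nn' record names for one subject, starting at 01."""
    return ["%s_%02d" % (subject, segment)
            for segment in range(1, record_count + 1)]


if __name__ == "__main__":
    # 5120 samples at 250 Hz correspond to one 20.48-second record.
    print(samples_to_seconds(5120, "cdb"))
    # Invented beat positions, for illustration only.
    print(rr_intervals_seconds([0, 250, 450], "afdb"))
    # Subject 12490 contributed two records.
    print(compression_record_names("12490", 2))
```

Keeping the frequencies in a single table means that the same interval
calculation can be applied to annotation times from any of the
databases listed, simply by naming the directory the record came from.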

------------------------------------------------------------------------------
Macintosh is a registered trademark of Apple Computer, Inc. Microsoft and
MS-DOS are registered trademarks of Microsoft Corporation. UNIX is a
registered trademark of AT&T.