Classifying Neurosurgery Operation Notes using Text Mining Techniques

Introduction: Inspiration and Goal
• The Taiwan Neurosurgical Society's newly revised surgical case statistics form (ROC year 101, i.e. 2012)
• How can we build a 'reasonable' classification system for neurosurgery from the informatics point of view?

• Brain tumor
  – (1) Glioma: High grade, Low grade
  – (2) Meningioma
  – (3) Pituitary tumor
  – (4) Acoustic neuroma
  – (5) Others
• Vascular
  – Aneurysm (Microsurgery)
  – AVM (Excision)
  – EC-IC bypass
  – Endarterectomy
  – Cavernoma (Excision)
  – Dural AVF (Microsurgery)
• CVA
  – Spontaneous ICH
  – Decompression for Infarction
• Head injury
  – EDH
  – Acute SDH
  – Chronic SDH
  – Traumatic ICH
  – Cranioplasty
• VP Shunt
• Spine
  – Tumor: Malignant, Benign
  – HIVD: Cervical, Lumbar
  – Stenosis: Cervical, Lumbar
  – Other
  – Instrumentation: Cervical, Lumbar
• PNS
  – Carpal tunnel syndrome
  – Peripheral nerves
• Functional
  – Lesioning
  – DBS
  – MVD
  – Hyperhidrosis
  – Epilepsy surgery
  – Others
• Endovascular surgery
  – Aneurysm (Coiling)
  – AVM
  – Dural AVF
  – Carotid angioplasty / stent
• Infection (abscess, empyema)
  – Aspiration
  – Drainage
  – Excision

• Trauma (brain)
  – Acute EDH: Craniotomy + evacuation
  – Acute SDH: Craniotomy + evacuation
  – Chronic SDH: Burr hole drainage
  – Brain swelling: Decompressive craniectomy
  – Skull bone defect: Cranioplasty
• Brain tumor
  – Glioma: Excision (total, subtotal, partial)
  – Meningioma: Excision (Simpson grade )
  – Ependymoma: Excision
  – Lymphoma: Excision / Biopsy
  – Epidermoid tumor: Excision
  – Cavernous sinus tumor: Excision
  – Medulloblastoma: Excision
  – Chordoma: Excision
  – Pituitary: Transcranial/transsphenoidal adenomectomy
  – Brain metastasis: Excision
  – Cavernoma: Excision
  – CPA tumor: Excision (retrosigmoid / presigmoid)
  – Craniopharyngioma: Transcranial/transsphenoidal excision
• Spinal tumor
  – Spinal metastasis: Anterior / Posterior decompression (biopsy)
  – Neurilemmoma: Excision
  – Meningioma: Excision
  – Tethered cord syndrome: Detethering
  – (Para)spinal lesion: Nil (CT-guided Bx)
  – Intramedullary tumor: Laminectomy + Excision/Biopsy
• Trauma (spine)
  – C-spine fracture/dislocation: ACDF w traction
  – TL-spine fracture/dislocation: Laminectomy with TPS
  – C1-2 subluxation: Posterior fusion (Sonntag, Gallie, TAS, Occipitocervical)
• Spine
  – C-stenosis: Laminectomy
  – C-HIVD: ACDF (no instrument)
  – C-OPLL: Multilevel corpectomy + plating
  – L-stenosis: Laminectomy (without/with discectomy)
  – L-HIVD: Discectomy (microsurgical)
  – Spondylolisthesis: Laminectomy + transpedicle screw (name )
  – Epidural abscess: Laminectomy + drainage

MATERIALS (TNSS2012 poster)
• Between April 2009 and March 2012, 4639 operations were performed on 2852 patients admitted to the neurosurgery service of a medical center in northern Taipei.
• We downloaded these operation notes from the hospital database using the patient list obtained from our proprietary software for scheduling admissions and operations.
• A simple parser was applied to separate the operation notes into four segments: header, timeline, billing information and free text note. The free text operation notes, which record the procedures performed by neurosurgeons, were extracted.

Materials and Methods
• 4639 semi-structured operation notes stored in the HIS, downloaded to a PC
• Preprocessing
• Keyword selection and identification
• Agglomerative clustering
• Evaluation for appropriateness

4639 Semi-Structured Texts
• Format 1/2/3: 753/2552/2087 notes
  – Format 2 = Format 3 (only the order of the billing data differs)
• Header: basic data
  – who/what/when/where/how
• Timeline: timing of each operation and anesthesia stage
• Billing information: NHI codes and counts
• Free text note: records the procedures performed by neurosurgeons

Format 3 (I): header and timeline (Chinese field labels, translated)
• NAME (gender; DOB; age)
• Operation date: yyyy/mm/dd
• Attending surgeon: xxx
• Operating area: rr, room xxx, no. yy
• Diagnosis: Brain tumor
• Instrument/procedure: Brain tumor Craniotomy (Others)
• Operation category: Scheduled operation
• Operative site: Head and neck
• Wound classification: Clean
• Anesthesia method: General anesthesia
• Attending anesthesiologist: yyy; ASA 2
• Recording physician: yyy
Time information
• 00:00 NPO for unscheduled operation
• 13:18 Entered operating room
• 13:20 Anesthesia start
• 14:00 Induction end
• 14:10 Antibiotic administered
• 14:10 Operation start
• 18:05 Operation end
• 18:05 Anesthesia end
• 18:15 Patient sent out

Format 3 (II): order information
Columns: category | name | quantity | incision no. | side
• Surgery | Brain tumor excision, operating time within 4 hours | 1 | 1 | L
• Anesthesia | PRE-ANESTHESIA EVALUATION | 1 | 0
• Anesthesia | SEMI-CLOSED INTRATRACHEAL INTUBA | 1 | 0
• Anesthesia | G-anesthesia (2-4 hours,each 30 | 4 | 0
• Anesthesia | SEMI-CLOSED INTRATRACHEAL INTUBA | 2 | 0
• Anesthesia | Peripheral arterial line inserti | 1 | 0
• Anesthesia | C.V.P. catheter intubation | 1 | 0
• Anesthesia | Lactic Acid (lactate) | 1 | 0
• Anesthesia | Arterial blood test, full panel | 1 | 0
• Anesthesia | Hemoglobin (Hb) | 1 | 0
• Anesthesia | Blood glucose measurement | 1 | 0
• Anesthesia | Ca (Calcium) | 1 | 0
• Anesthesia | Na (Sodium) | 1 | 0
• Anesthesia | K (Potassium) | 1 | 0
• Anesthesia | Blood gas analysis | 1 | 0

Format 3 (III): summary
Surgical department: Department of Surgery
Applied template: Craniotomy for ICT
Ordering physician: YYY
Order time: 2011/03/31 18:25
Pre-operative Diagnosis: Left convexity meningioma
Post-operative Diagnosis: Left convexity meningioma
Operative Method: Left frontoparietal craniotomy for tumor excision, Simpson grade I
Specimen Count And Types: Several fragments of one tumor were sent for pathology.
Pathology: Pending
Operative Findings: One extraaxial dura-based, firm to elastic, well-capsulated, about 6-7cm, tumor located at left frontoparietal region.
Operative Procedures: With endotracheal general anaesthesia, the patient was put in supine position with head fixed in Mayfield head clamp. After scalp shaved, scrubbed, disinfected, and then draped, we made one U-shape skin incision at left frontoparietal area. We drilled for burr holes, and then created craniotomy. We made dura incision around the tumor base, and dissected the arachnoid membrane plane around the tumor. The dura was closed in water-tight fashion, and bone graft was fixed back with wires. The wound was closed in layers after CWV insertion.
Operators: VS XXX
Assistants: R4 YYY

Free Text Preprocessing
• Convert to lower case
  – Chinese characters already dropped
• Replace special characters with " "
  – All punctuation removed
  – Spaces between words kept
• Abbreviations: kept in original form

Punctuation/Stop Words
• Punctuation removal in Perl: tr/.,:;!?"(){}//d;
  – Only alphabetic characters retained: 9906 words
• Stop word list used to eliminate meaningless bigrams
  – Salton G. The SMART Retrieval System. Englewood Cliffs, NJ: Prentice Hall; 1971
  – 'Right' retained: right/left IS important
• Section headings removed

Types of Keywords
• 42 manually selected, may be combined or isolated (e.g.
fronto-temporo-parietal)
  – Wildcards and spaces used
• Anatomy (16): location, structure
• Pathology (11)
• Procedure (15): name, steps
  – Including instruments (intra-op and implants)
  – May be identical/similar to procedure name

Anatomy Keywords (16)
• Brain (1+6)
  – front*, pariet*, tempor*, occipit*, cerebell*, ventricl*
• Spin* (1+4)
  – root, thecal, sac, disc
• Others/common to brain and spine (4)
  – pituitary, carotid, trache*
  – nerve

Pathology Keywords (11)
• General (3)
  – tumor, injury, abscess
• Specific to brain/more common in brain (5)
  – parkinso*, aneurysm
  – hematoma, hemorrhag*, swelling
• Specific to spine (1)
  – spondylo*
• Others (2)
  – hyperhidrosis, csf lea* (pituitary)

Procedure/Instrument Keywords (15)
• Brain (7)
  – burr, hole, craniotomy, craniectomy, cranioplasty
  – ventriculoperitoneal, shunt
• Spine (2)
  – fusion, cage
• Common to brain and spine (2)
  – decompressi*, scre*
• Others/nonspecific (4)
  – radiofrequency (PNS?), debrid*, port a, emergen*
  – trache*: already used as anatomy keyword

Issues in Choosing Keywords
• Specificity (anat/path/proc)
  – Brain (19): 7/5/7 words
  – Spine (8): 5/1/2 words
    • C- and L-spine: a single letter, too nonspecific
  – Brain and spine: decompressi*, scre*
• Common bigrams (various specificity)
  – Ventriculoperitoneal shunt
  – Thecal sac
  – Burr hole

" Trache*", "Thecal Sac"
• Initially, "trach*" was used to find "tracheostomy", but it also matched "endotracheal" and "endo-tracheal"
• "trach*" was changed to " trach*" (with a leading space) to eliminate these false positives (FPs)
• Dura, dural sac: nonspecific to spine
• Thecal sac: specific to spine

Horizontal Dendrograms

Cluster Dissimilarity (cases clusters) and Linkage Criterion
• In order to decide which clusters should be combined (for agglomerative), or where a cluster should be split (for divisive), a measure of dissimilarity between sets of observations is required.
• This is achieved by use of an appropriate metric (a measure of distance between pairs of observations), and a linkage criterion which specifies the dissimilarity of sets as a function of the pairwise distances of observations in the sets.

Linkage Criteria
• The linkage criterion determines the distance between sets of observations as a function of the pairwise distances between observations.
• Some commonly used linkage criteria between two sets of observations A and B are
  – Maximum or complete-linkage clustering: the worst (largest) pairwise distance
  – Minimum or single-linkage clustering: the best (smallest) pairwise distance
  – Mean or average linkage clustering, or UPGMA: the average over all pairs
• Manning, C.D. (1999) Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, Mass.

Examples of Different Strategies

Six Variants of Agglomerative Clustering
• For 45 vectors in 3657 dimensions
• Vector-based variants: easy to compute, but each merge creates a new vector: z = x OR y, z = (x + y)/2, or z = x + y
  – Distances are recalculated at each stage of linking
• Set-based variants: apply the maximum, minimum or mean distance between observations in set pairs
  – Lookup table possible

Hierarchical Cluster
• Set-based
  – Single linkage
  – Complete linkage
  – UPGMA
• Vector-based
  – UPGMC
  – WPGMC
  – OR

Results
Similarity matrix
• Similarity matrix derived from the 45 binary TF vectors
[Dendrogram panels: Single linkage; Complete linkage; UPGMA; WPGMC; OR]
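
Sketch: keyword matching with wildcards
The preprocessing and keyword slides (lower-casing, dropping punctuation, wildcard keywords, and the " trach*" fix) can be illustrated with a minimal Python sketch. It is not the authors' code, which appears to have been written in Perl. The short keyword list, the function names, and the choice to delete hyphens before matching are all assumptions made for illustration.

```python
import re

# Hypothetical subset of the keyword list; a trailing "*" acts as a wildcard.
KEYWORDS = ["front*", "craniotomy", "burr hole", " trach*", "tumor"]


def preprocess(text):
    """Lower-case the note and keep only letters separated by single spaces."""
    # Assumption: hyphens are deleted so "endo-tracheal" becomes "endotracheal".
    text = text.lower().replace("-", "")
    text = re.sub(r"[^a-z]+", " ", text)
    # Pad with spaces so a leading-space keyword can match the first word.
    return " " + text.strip() + " "


def keyword_to_regex(keyword):
    """Compile a keyword; a trailing "*" matches any run of letters."""
    body = re.escape(keyword.rstrip("*"))
    if keyword.endswith("*"):
        body += "[a-z]*"
    return re.compile(body)


def binary_vector(note, keywords=KEYWORDS):
    """Return one 0/1 flag per keyword (substring match, as with wildcards)."""
    clean = preprocess(note)
    return [1 if keyword_to_regex(k).search(clean) else 0 for k in keywords]


note = ("With endo-tracheal general anaesthesia, left fronto-parietal "
        "craniotomy for tumor excision. Burr holes were drilled.")
print(binary_vector(note))
print(binary_vector("Tracheostomy was performed."))
```

Because " trach*" requires a preceding space, "endotracheal" no longer counts as a hit while "tracheostomy" still does. If hyphens were replaced by spaces instead of deleted, "endo tracheal" would match again, so that preprocessing choice matters.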
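
Sketch: set-based linkage with a lookup table
The set-based variants (single linkage, complete linkage, UPGMA) only need the pairwise distances between the original observations, which is why a lookup table is possible. The Python sketch below shows one way this could work. The slides do not name the distance metric, so the Jaccard distance on binary vectors is an assumption, as are all function names.

```python
from itertools import combinations


def jaccard_distance(x, y):
    """Assumed metric for binary vectors: 1 - |x AND y| / |x OR y|."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return 1.0 - both / either if either else 0.0


# Each linkage criterion reduces the list of pairwise distances to one number.
LINKAGE = {
    "single": min,                              # best (smallest) pair
    "complete": max,                            # worst (largest) pair
    "average": lambda ds: sum(ds) / len(ds),    # UPGMA: mean over all pairs
}


def agglomerate(vectors, linkage="average"):
    """Merge the two closest clusters until one remains; return merge history."""
    n = len(vectors)
    # Lookup table: pairwise distances are computed once and never updated.
    table = {(i, j): jaccard_distance(vectors[i], vectors[j])
             for i, j in combinations(range(n), 2)}

    def set_distance(a, b):
        ds = [table[(min(i, j), max(i, j))] for i in a for j in b]
        return LINKAGE[linkage](ds)

    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: set_distance(*p))
        merges.append((a, b, set_distance(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
    return merges


vectors = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
for name in LINKAGE:
    print(name, agglomerate(vectors, name))
```

The merge history (which clusters joined, at what distance) is the information a dendrogram draws. On this toy input the three criteria merge in the same order and differ only in the height of the final merge.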
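
Sketch: vector-based merging
The vector-based variants replace two merged clusters with a single new vector, so distances must be recalculated after every merge. The Python sketch below illustrates the three merge rules listed on the slide: z = x OR y, z = (x + y)/2, and z = x + y. Matching these rules to the names OR, WPGMC and UPGMC is a plausible reading, not something the slides state, and the cosine distance is likewise an assumption.

```python
import math
from itertools import combinations

# Merge rules from the slide; the keys are illustrative labels only.
MERGE = {
    "or": lambda x, y: [max(a, b) for a, b in zip(x, y)],
    "mean": lambda x, y: [(a + b) / 2 for a, b in zip(x, y)],
    "sum": lambda x, y: [a + b for a, b in zip(x, y)],
}


def cosine_distance(x, y):
    """Assumed metric: 1 - cosine similarity (1.0 if either vector is zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm if norm else 1.0


def agglomerate_vectors(vectors, variant="mean"):
    """Repeatedly merge the closest pair of cluster vectors; return the merges."""
    merge = MERGE[variant]
    clusters = {(i,): list(v) for i, v in enumerate(vectors)}
    merges = []
    while len(clusters) > 1:
        # Recomputed every round, because merged vectors did not exist before.
        a, b = min(
            combinations(clusters, 2),
            key=lambda p: cosine_distance(clusters[p[0]], clusters[p[1]]),
        )
        z = merge(clusters.pop(a), clusters.pop(b))
        clusters[tuple(sorted(a + b))] = z
        merges.append((a, b))
    return merges


vectors = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
for name in MERGE:
    print(name, agglomerate_vectors(vectors, name))
```

Unlike the set-based variants, nothing here can be precomputed beyond the first round. That is the trade-off the slide points to: merging is cheap, but distances have to be recalculated at each stage of linking.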