Knowledge Discovery in Biomedicine
Limsoon Wong
Institute for Infocomm Research
Copyright © 2004 by Limsoon Wong

Plan
• Knowledge discovery in brief
• Eg 1: Optimizing treatment of childhood ALL
• Eg 2: Predicting survival of patients with DLBC lymphoma
• Concluding remarks

Knowledge Discovery in Brief

What is Knowledge Discovery?
Jonathan’s blocks
Jessica’s blocks
Whose block is this?
Jonathan’s rules: Blue or Circle
Jessica’s rules: All the rest

What is Knowledge Discovery?
Question: Can you explain how?

Steps of Knowledge Discovery
• Training data gathering
• Feature generation
  – k-grams, colour, texture, domain know-how, ...
• Feature selection
  – Entropy, χ², CFS, t-test, domain know-how, ...
• Feature integration (some classifiers/learning methods)
  – SVM, ANN, PCL, CART, C4.5, kNN, ...

Knowledge Discovery for Optimizing Treatment of Childhood ALL
Image credit: Yeoh et al, 2002

Childhood ALL
• Major subtypes: T-ALL, E2A-PBX, TEL-AML, BCR-ABL, MLL genome rearrangements, Hyperdiploid>50
• Diff subtypes respond differently to same Tx
• Over-intensive Tx
  – Development of secondary cancers
  – Reduction of IQ
• Under-intensive Tx
  – Relapse
• The subtypes look similar
• Conventional diagnosis
  – Immunophenotyping
  – Cytogenetics
  – Molecular diagnostics
• Unavailable in most ASEAN countries

Single-Test Platform of Microarray & Knowledge Discovery
(Figure labels: training data collection, feature integration. Image credit: Affymetrix)
Copyright © 2004 by Jinyan Li and Limsoon Wong

Impact
Conventional Tx:
• intermediate intensity to all
• 10% suffer relapse
• 50% suffer side effects
• costs US$150m/yr
Our optimized Tx:
• high intensity to 10%
• intermediate intensity to 40%
• low intensity to 50%
• costs US$100m/yr
Outcome:
• High cure rate of 80%
• Fewer relapses
• Fewer side effects
• Save US$51.6m/yr

Knowledge Discovery for Predicting Survival of Patients with DLBC Lymphoma
Image credit: Rosenwald et al, 2002

Diffuse Large B-Cell Lymphoma
• DLBC lymphoma is the most common type of lymphoma in adults
• Can be cured by anthracycline-based chemotherapy in 35 to 40 percent of patients
• DLBC lymphoma comprises several diseases that differ in responsiveness to chemotherapy
• Intl Prognostic Index (IPI)
  – age, “Eastern Cooperative Oncology Group” performance status, tumor stage, lactate dehydrogenase level, sites of extranodal disease, ...
• Not good for stratifying DLBC lymphoma patients for therapeutic trials
Use gene-expression profiles to predict outcome of chemotherapy?

Knowledge Discovery from Gene Expression of “Extreme” Samples
(Figure labels)
• 240 samples
• “Extreme” sample selection: 47 short-term survivors, 26 long-term survivors
• Knowledge discovery from gene expression: 7399 genes, 84 genes
• 80 samples
• T is long-term if S(T) < 0.3
• T is short-term if S(T) > 0.7

Kaplan-Meier Plot for 80 Test Cases
• p-value of log-rank test: < 0.0001
• Risk score thresholds: 0.7, 0.5, 0.3

Improvement Over IPI
(A) IPI low, p-value = 0.0063
(B) IPI intermediate, p-value = 0.0003

Merit of “Extreme” Samples
(A) W/o sample selection (p = 0.38)
(B) With sample selection (p = 0.009)
Without training sample selection, there is no clear difference in overall survival among the 80 samples in the validation group of the DLBCL study.

Knowledge Discovery for A Few Other Biomedical Applications

Predict Epitopes, Find Vaccine Targets
• Vaccines are often the only solution for viral diseases
• Finding & developing effective vaccine targets (epitopes) is a slow and expensive process
• Develop systems to recognize protein peptides that bind MHC molecules
• Develop systems to recognize hot spots in viral antigens

Recognize Functional Sites, Help Scientists
• Effective recognition of initiation, control, & termination of biological processes is crucial to speeding up & focusing scientific expts
• Data mining of bio seqs to find rules to recognize & understand functional sites
Dragon’s 10x reduction of TSS recognition false positives

Understand Proteins, Fight Diseases
• Understanding function & role of a protein needs organised info on interaction pathways
• Such info is often reported in scientific papers but is seldom found in structured db
• Knowledge extraction system to process free text
  – extract protein names
  – extract interactions

Benefits of Bioinformatics
• To the patient:
  – Better drug, better treatment
• To the pharma:
  – Save time, save cost, make more $
• To the scientist:
  – Better science

References
• A. Yeoh et al, “Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling”, Cancer Cell, 1:133-143, 2002
• A. Rosenwald et al, “The use of molecular profiling to predict survival after chemotherapy for diffuse large B-cell lymphoma”, NEJM, 346:1937-1947, 2002
• H. Liu et al, “Selection of patient samples and genes for outcome prediction”, Proc. CSB2004, pages 382-392

Any Questions?

To be presented 10/10/04, 8.30-10.00am, Raffles Convention Centre, NHG-IBM Symposium
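
Appendix sketch: “extreme” sample selection and risk-score labelling

The DLBC lymphoma slides describe two steps that are easy to illustrate in code: picking only the “extreme” training samples (short-term and long-term survivors), and labelling a test case T from its risk score S(T) using the 0.3 and 0.7 thresholds. The Python sketch below is a minimal illustration, not the authors' implementation. The Patient record, the function names, and especially the survival cutoffs in years are assumptions made for this example, since the slides do not say how “short-term” and “long-term” survivors were defined. The label for scores between the two thresholds is also a placeholder.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Risk-score thresholds quoted on the slides:
# T is long-term if S(T) < 0.3, short-term if S(T) > 0.7.
LONG_TERM_MAX_SCORE = 0.3
SHORT_TERM_MIN_SCORE = 0.7

# ASSUMPTION: illustrative survival cutoffs in years. The slides do not
# state the cutoffs used to pick the "extreme" training samples.
SHORT_TERM_MAX_YEARS = 1.0
LONG_TERM_MIN_YEARS = 8.0


@dataclass
class Patient:
    """Hypothetical record: follow-up time and whether death was observed."""
    patient_id: str
    followup_years: float
    died: bool


def select_extreme_samples(
    patients: List[Patient],
) -> Tuple[List[Patient], List[Patient]]:
    """Keep only clear-cut cases; everything in between is left out of training."""
    # Short-term survivors: death observed early in follow-up.
    short_term = [
        p for p in patients
        if p.died and p.followup_years <= SHORT_TERM_MAX_YEARS
    ]
    # Long-term survivors: followed up for a long time, whatever the outcome.
    long_term = [
        p for p in patients
        if p.followup_years >= LONG_TERM_MIN_YEARS
    ]
    return short_term, long_term


def risk_label(score: float) -> str:
    """Label a test case from its risk score S(T) using the slide thresholds."""
    if score < LONG_TERM_MAX_SCORE:
        return "long-term"
    if score > SHORT_TERM_MIN_SCORE:
        return "short-term"
    # Placeholder label: the slides do not name the middle band.
    return "unassigned"


if __name__ == "__main__":
    cohort = [
        Patient("p1", 0.5, True),   # early death: short-term survivor
        Patient("p2", 9.0, False),  # long follow-up: long-term survivor
        Patient("p3", 4.0, True),   # neither extreme: excluded
        Patient("p4", 0.5, False),  # censored early: excluded
    ]
    short_term, long_term = select_extreme_samples(cohort)
    print([p.patient_id for p in short_term], [p.patient_id for p in long_term])
    print(risk_label(0.2), risk_label(0.5), risk_label(0.9))
```

Patients censored early (alive but with short follow-up) are excluded from both groups in this sketch, because their eventual outcome is unknown; whether the original study treated them the same way is not stated in the slides.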