clinical Microarray Mining Resource (cMiMiR): Building a clinical data warehouse for mining patient-based gene expression data from DNA microarrays
Medical Informatician: Tito Castillo
Software Developer: Stelios Alexandrakis
Software Developer: Bhuwan Tiwari
Head of Data Warehousing: Chris Tomlinson
Head of Microarray Centre: Laurence Game
Academic Advisor: Tim Aitman
9th March 2006, Integrated Health Records (IHR) - Practice and Technology, National e-Science Centre

Presentation Outline
• Gene Expression, Microarrays & the Clinician
• Existing Standards and Resources
• Microarray Centre History
• cMiMiR Project
• Conclusions

What is Gene Expression?
• A gene's coded information (DNA) is transcribed into RNA in the nucleus, and the RNA is translated into proteins that form cell structures.
• There are 1000s of different genes.
• Measurement of RNA allows us to quantify gene expression:
  – Low levels: down-regulation
  – High levels: up-regulation
• Gene expression reflects disease processes.
[Figure: DNA (Gene 1) in the nucleus is transcribed into many copies of Gene 1 RNA, which are translated into protein in the cytoplasm]

Microarray Analysis
• Microarrays contain 1000s of sites to detect transcripts
• Create a "molecular fingerprint" of a sample of cells
• Allow a global approach to the analysis of gene expression
• Powerful tool for molecular classification of disease states
• ~50 MB of data per microarray
[Figure: microarray image highlighting a down-regulated gene and an up-regulated gene]

Disease Classification
Microarray analysis in acute lymphoid vs. acute myeloid leukaemia (Golub et al., 1999, Science 286:531-7)
“…
Class prediction provides an unbiased, general approach to constructing prognostic tests, provided that one has a collection of tumour samples for which eventual outcome is known.”

A Gene-Expression Signature as a Predictor of Survival in Breast Cancer (van de Vijver et al., NEJM 2002)
• 70-gene prognosis profile used
• Of a total of 295 consecutive patients with breast carcinoma:
  – 180 had a poor-prognosis signature
  – 115 had a good-prognosis signature
“… more powerful predictor of outcome of disease in patients with breast cancer than standard systems based on clinical and histological criteria.”

Interoperability Standards
• Standards for microarray experiments
  – Standard vocabulary: MGED Ontology
  – Standard interchange format: MAGE-ML
[Figure: MGED Core Ontology, Audit and Security package, Roles: consultant, investigator, institution, hardware manufacturer, curator, funder, submitter]
Minimum Information About a Microarray Experiment (MIAME)
MAGE-ML example:
...
<Person identifier="someplace.org:Person:j_bloggs" email="j.bloggs@someplace.org" lastName="Bloggs" firstName="Joe">
  <Roles_assnlist>
    <OntologyEntry category="Roles" value="submitter"/>
  </Roles_assnlist>
</Person>
...

Public Repository Services
• European Bioinformatics Institute (ArrayExpress)
• National Center for Biotechnology Information (GEO)
  – Submission of experimental details for peer review
  – Required by most major scientific journals

Repository      Total experiments   Clinical trials (%)
ArrayExpress    1206                35 (3%)
GEO             3009                366 (12%)

Imperial/CSC Microarray Centre
• Established March 2000
• Serves 200+ research groups at CSC and Imperial
• Data warehouse for microarray experiments (MiMiR)

MiMiR: Microarray Mining Resource
• Microarray data warehouse (est. 2000)
  – 68 experiments (2900 microarrays)
  – Includes MGED Ontology browser
  – MIAME compliant
  – MAGE-ML output to ArrayExpress
  – Contains annotation and raw microarray data
  – Used by local annotation team

Clinical MiMiR Data Warehouse
Aim: extend MiMiR with clinical information
Department of Health award, 2004 (NEAT award)
• Software developer, medical informatician, computing, travel, training
Consumer Advisory Group
• Patient representatives, medical ethicists, media
Objectives
  – Ethical approval
  – Extend data management infrastructure
  – Ontologies for sample description
  – Integration with commercial data mining platform

cMiMiR Workflow
[Figure: Clinical area (clinicians, clinical data entry, clinical information system, clinical tissue samples); Microarray Centre (biologist, laboratory microarray database, microarray data); anonymised clinical data pass through the data mapping tool as ClinML into cMiMiR, whose data mining & visualisation tools serve clinicians, data analysts and researchers; personalised healthcare]

cMiMiR: Service-Oriented Architecture
[Figure: cMiMiR browser, web server, services (EJB), application, security gateway, DMZ]

Data Mapping
• Define the relationship between clinical trial data and a generic clinical document exchange format
• Available clinical vocabularies
  – SNOMED-CT
  – UMLS
• Standards for exchange of health-related concepts
  – Health Level 7 (HL7)
  – Clinical Data Interchange Standards Consortium (CDISC)
• Data mapping initiatives
  – caBIG
  – Cancer Grid
• Developed a generic data mapping tool
  – Lightweight interchange format (ClinML)

Clinical Collaborator’s Toolkit
[Figure: vocabulary service over SNOMED-CT (Clue API), UMLS (KSS API) and other vocabularies (other API); cMiMiR concept mapping, mapping specification, data mapping tool, services (EJB), record extraction, clinical trial data query tool, security gateway, DMZ. API: Application Programming Interface]

Source of Clinical Data
Patient Table
ID    Age   Sex
102   34    Male
103   27    Female
104   12    Female

Surgery Table
ID    PatientID   Date         Procedure
78    27          06/06/2002   Hysterectomy
79    103         23/07/2002   Breast lumpectomy
..
148   103         02/01/2006   Mastectomy
[Screenshots: adding a concept which is a numeric quantity; adding a term to a list of possible coded values]

Integration with Commercial Data Mining & Analysis Suite
[Figure: server side: MAGE-ML outputter, cMiMiR resolver, JDBC connections, server framework (algorithm framework, warehouse framework, data access, server-side scripting, data pipeline, data caching, data analysis, data annotation, admin services), services (EJB), security gateway, DMZ. Client side: client framework with core managers, annotation managers, data searches, visualisation tools, wizards, data pipeline monitors]

From Basic Science to Hospital Bed
• Technical
• Organisational
• Legal

Conclusion
• Ongoing initiative
• Supports the sharing of high-quality research data
• Insight into genomic function and the environment
Human Genome Project in support of Clinical Practice
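The MAGE-ML Person fragment on the Interoperability Standards slide can be read with any standard XML parser. Below is a minimal Python sketch, assuming the fragment is available as a string; the names PERSON_XML and person_roles are illustrative and not part of MiMiR. It pulls out the identifier, the name and the MGED Ontology role values.

```python
import xml.etree.ElementTree as ET

# MAGE-ML fragment from the slide, with straight quotes restored.
PERSON_XML = """
<Person identifier="someplace.org:Person:j_bloggs"
        email="j.bloggs@someplace.org" lastName="Bloggs" firstName="Joe">
  <Roles_assnlist>
    <OntologyEntry category="Roles" value="submitter"/>
  </Roles_assnlist>
</Person>
"""


def person_roles(xml_text: str) -> dict:
    """Return the Person's identifier, full name and MGED 'Roles' values."""
    person = ET.fromstring(xml_text.strip())
    # Each role is an OntologyEntry inside the Roles_assnlist association list.
    roles = [
        entry.get("value")
        for entry in person.iterfind("Roles_assnlist/OntologyEntry")
        if entry.get("category") == "Roles"
    ]
    return {
        "identifier": person.get("identifier"),
        "name": f'{person.get("firstName")} {person.get("lastName")}',
        "roles": roles,
    }


if __name__ == "__main__":
    print(person_roles(PERSON_XML))
```

Filtering on category="Roles" keeps the sketch robust if other OntologyEntry elements (for example from other MGED Ontology classes) appear in the same list.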
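The Source of Clinical Data slide shows two linked tables (Patient and Surgery) that the record-extraction step has to join and anonymise before the data reach cMiMiR. The slides do not say how anonymisation is done. The sketch below assumes a keyed-hash pseudonym (HMAC-SHA256) and year-only dates. The key SITE_KEY, the output field names and the record shape are hypothetical and are not the ClinML format.

```python
import hashlib
import hmac
from datetime import datetime

# Rows transcribed from the "Source of Clinical Data" slide (elided rows omitted).
PATIENTS = [
    {"ID": 102, "Age": 34, "Sex": "Male"},
    {"ID": 103, "Age": 27, "Sex": "Female"},
    {"ID": 104, "Age": 12, "Sex": "Female"},
]
SURGERIES = [
    {"ID": 78, "PatientID": 27, "Date": "06/06/2002", "Procedure": "Hysterectomy"},
    {"ID": 79, "PatientID": 103, "Date": "23/07/2002", "Procedure": "Breast lumpectomy"},
    {"ID": 148, "PatientID": 103, "Date": "02/01/2006", "Procedure": "Mastectomy"},
]

# Hypothetical secret held by the clinical site and never shipped with the data.
SITE_KEY = b"replace-with-a-site-secret"


def pseudonym(patient_id: int, key: bytes = SITE_KEY) -> str:
    """Keyed hash of a patient ID: stable across extracts, not reversible without the key."""
    digest = hmac.new(key, str(patient_id).encode(), hashlib.sha256).hexdigest()
    return digest[:12]


def extract_records(patients, surgeries):
    """Inner-join surgeries to patients and drop direct identifiers."""
    by_id = {p["ID"]: p for p in patients}
    records = []
    for s in surgeries:
        patient = by_id.get(s["PatientID"])
        if patient is None:  # patient not among the rows supplied
            continue
        records.append({
            "subject": pseudonym(s["PatientID"]),
            "age": patient["Age"],
            "sex": patient["Sex"],
            "procedure": s["Procedure"],
            # Keep only the year (an assumption) to reduce re-identification risk.
            "year": datetime.strptime(s["Date"], "%d/%m/%Y").year,
        })
    return records


if __name__ == "__main__":
    for record in extract_records(PATIENTS, SURGERIES):
        print(record)
```

Patient 27 appears in the Surgery table but not among the Patient rows shown, so the inner join skips that surgery. A production extract would more likely report such orphans than drop them silently.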