clinical Microarray Mining Resource:
Building a clinical data warehouse for mining
patient-based gene expression data from DNA microarrays
Medical Informatician: Tito Castillo
Software Developer: Stelios Alexandrakis
Software Developer: Bhuwan Tiwari
Head of Data Warehousing: Chris Tomlinson
Head of Microarray Centre: Laurence Game
Academic Advisor: Tim Aitman
9th March 2006, Integrated Health Records (IHR) - Practice and Technology, National e-Science Centre
Presentation Outline
• Gene Expression, Microarrays & the Clinician
• Existing Standards and Resources
• Microarray Centre History
• cMiMiR Project
• Conclusions
What is Gene Expression?
• A gene's coded information (DNA) is translated into cell structures.
• 1000s of different genes.
• Measurement of RNA allows us to quantify gene expression.
– Low levels: down regulation
– High levels: up regulation
• Gene expression reflects disease processes.
[Figure: in the nucleus, DNA (Gene 1) is transcribed into RNA; in the cytoplasm, RNA is translated into protein.]
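The idea of quantifying expression from RNA measurements and labelling a gene as up or down regulated can be sketched as below. This is an illustrative sketch only: the log2 ratio against a reference sample, the two-fold threshold, and the signal values are assumptions chosen for the example, not figures from the presentation.

```python
import math


def log2_fold_change(sample: float, reference: float) -> float:
    """Log2 ratio of the sample RNA signal to the reference RNA signal."""
    return math.log2(sample / reference)


def classify_regulation(sample: float, reference: float, threshold: float = 1.0) -> str:
    """Label a gene using an assumed cut-off of one log2 unit (two-fold change)."""
    lfc = log2_fold_change(sample, reference)
    if lfc >= threshold:
        return "up regulated"
    if lfc <= -threshold:
        return "down regulated"
    return "unchanged"


# Hypothetical signal intensities: (disease sample, reference sample).
signals = {
    "Gene 1": (800.0, 200.0),
    "Gene 2": (150.0, 600.0),
    "Gene 3": (510.0, 500.0),
}

for gene, (sample, reference) in signals.items():
    print(gene, classify_regulation(sample, reference))
```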
Microarray Analysis
• Microarrays contain 1000s of sites to detect transcripts.
• Create a “molecular fingerprint” of a sample of cells
• Allow a global approach to analysis of gene expression
• Powerful tool for molecular classification of disease states
• ~50MB data per microarray
[Figure: microarray image; legend: down regulated gene, up regulated gene.]
Disease Classification
Microarray analysis in Acute Lymphoid vs. Acute Myeloid Leukaemia
Golub et al., 1999, Science 286:531-7
“… Class prediction provides an unbiased, general approach to constructing prognostic tests, provided that one has a collection of tumour samples for which eventual outcome is known.”
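Class prediction of this kind can be illustrated with a small weighted-voting classifier in the spirit of the scheme described by Golub et al.: each gene gets a signal-to-noise weight and a decision boundary midway between the class means, and the signed votes are summed. This is a simplified sketch, not the authors' implementation: gene selection and the prediction-strength threshold are omitted, and the expression values are invented toy data.

```python
from statistics import mean, stdev


def train(samples_a, samples_b):
    """Return one (weight, boundary) pair per gene.

    samples_a / samples_b: lists of expression vectors, one per known sample.
    weight   = (mean_a - mean_b) / (sd_a + sd_b)   signal-to-noise ratio
    boundary = (mean_a + mean_b) / 2               midpoint between class means
    """
    model = []
    for g in range(len(samples_a[0])):
        a = [s[g] for s in samples_a]
        b = [s[g] for s in samples_b]
        mu_a, mu_b = mean(a), mean(b)
        model.append(((mu_a - mu_b) / (stdev(a) + stdev(b)), (mu_a + mu_b) / 2))
    return model


def predict(model, sample, label_a="ALL", label_b="AML"):
    """Sum the signed votes; a positive total favours class A."""
    total = sum(w * (x - b) for (w, b), x in zip(model, sample))
    return label_a if total > 0 else label_b


# Toy training data (two genes, three samples per class); values are invented.
all_samples = [[10.0, 2.0], [12.0, 3.0], [11.0, 1.0]]
aml_samples = [[3.0, 9.0], [2.0, 11.0], [4.0, 10.0]]

model = train(all_samples, aml_samples)
print(predict(model, [9.0, 3.0]))
print(predict(model, [3.0, 8.0]))
```

As the quotation notes, the approach depends on having samples with known outcome; here those are the two labelled training lists.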
A Gene-Expression Signature as a Predictor of Survival in Breast Cancer
van de Vijver et al., NEJM 2002
70-gene prognosis profile applied to 295 consecutive patients with breast carcinoma
• 180 had a poor-prognosis signature
• 115 had a good-prognosis signature
“… more powerful predictor of outcome of disease in patients with breast cancer than standard systems based on clinical and histological criteria.”
Inter-operability standards
• Standards for microarray experiments
– Standard vocabulary
• MGED Ontology
– Standard interchange format
• MAGE-ML
Minimum Information About a MicroArray Experiment (MIAME)
[Figure: MGED Ontology. MGED Core Ontology, Audit And Security Package, Roles: Consultant, Investigator, Institution, Hardware manufacturer, Curator, Funder, Submitter.]
MAGE-ML
.
<Person identifier="someplace.org:Person:j_bloggs"
        email="j.bloggs@someplace.org"
        lastName="Bloggs" firstName="Joe">
  <Roles_assnlist>
    <OntologyEntry category="Roles" value="submitter"/>
  </Roles_assnlist>
</Person>
.
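To show how a fragment like the one above could be consumed, the sketch below reads the person's name and roles with Python's standard `xml.etree.ElementTree`. It embeds a copy of the fragment with plain ASCII quotes; the helper name `person_roles` is hypothetical, and a full MAGE-ML document has far more structure than this excerpt.

```python
import xml.etree.ElementTree as ET

# Copy of the slide's fragment, written with straight quotes so it parses.
FRAGMENT = """
<Person identifier="someplace.org:Person:j_bloggs"
        email="j.bloggs@someplace.org"
        lastName="Bloggs" firstName="Joe">
  <Roles_assnlist>
    <OntologyEntry category="Roles" value="submitter"/>
  </Roles_assnlist>
</Person>
"""


def person_roles(xml_text: str):
    """Return (full name, list of role values) for a single Person element."""
    person = ET.fromstring(xml_text.strip())
    roles = [
        entry.get("value")
        for entry in person.iter("OntologyEntry")
        if entry.get("category") == "Roles"
    ]
    name = f'{person.get("firstName")} {person.get("lastName")}'
    return name, roles


name, roles = person_roles(FRAGMENT)
print(name, roles)
```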
Public Repository Services
European Bioinformatics Institute (ArrayExpress)
National Centre for Biotechnology Information (GEO)
– Submission of experimental details for peer review
– Required by most major scientific journals

Repository      Total experiments   Clinical Trials (%)
ArrayExpress    1206                35 (3%)
GEO             3009                366 (12%)
Imperial/CSC Microarray Centre
• Established March 2000
• Serves 200+ research groups at CSC and Imperial
• Data warehouse for Microarray Experiments (MiMiR)
MiMiR: Microarray Mining Resource
• Microarray Data Warehouse (Est. 2000)
– 68 experiments (2900 microarrays)
– Includes MGED Ontology browser
– MIAME compliant
– MAGE-ML output to ArrayExpress
– Contains Annotation & Raw microarray data
– Used by local annotation team
Clinical MiMiR Data Warehouse
Aim: Extend MiMiR with clinical information
Department of Health award, 2004 (NEAT award)
• Software Developer, Medical Informatician, Computing, Travel, Training
Consumer Advisory Group
• Patient representatives, Medical Ethicists, Media
Objectives
– Ethical Approval
– Extend Data Management Infrastructure
– Ontologies for Sample Description
– Integration with commercial data mining platform
cMiMiR Workflow
[Figure: cMiMiR workflow across three areas: Clinical Area, Microarray Centre, Personalised Healthcare. Labelled elements: Clinical Information System, Clinical Data entry, Anonymised Clinical data, Data Mapping Tool, ClinML, Clinical tissue samples, Laboratory, Microarray Database, Microarray data, cMiMiR, Data mining & Visualisation Tools. Actors: Clinicians, Biologist, Data analysts, Researchers.]
cMiMiR: Service Oriented Architecture
[Figure: architecture diagram. Labelled elements: Browser, Web Server, Application, Services (EJB), cMiMiR, Security Gateway, DMZ.]
Data Mapping
• Define the relationship between Clinical Trial Data and a generic Clinical Document exchange format
• Available clinical vocabularies
– SNOMED-CT
– UMLS
• Standards for exchange of health-related concepts
– Health Level 7 (HL7)
– Clinical Data Interchange Standards Consortium (CDISC)
• Data mapping initiatives
– caBIG
– CancerGrid
• Developed generic data mapping tool
– Light-weight interchange format (ClinML)
Clinical Collaborator’s Toolkit
[Figure: Clinical Collaborator’s Toolkit architecture. Labelled elements: Vocabulary Service (SNOMED-CT via Clue API, UMLS via KSS API, other via other API), Concept Mapping, Mapping Specification, Data Mapping Tool, Record Extraction, Clinical Trial Data, Query Tool, Clinical Collaborator’s Toolkit, Security Gateway, DMZ, Services (EJB), cMiMiR. Legend: API = Application Programming Interface.]
Source of Clinical Data
Patient Table

ID    Age   Sex
102   34    Male
103   27    Female
104   12    Female

Surgery Table

ID    PatientID   Date         Procedure
78    27          06/06/2002   Hysterectomy
79    103         23/07/2002   Breast lumpectomy
..
148   103         02/01/2006   Mastectomy

[Screenshots: adding a concept which is a numeric quantity; adding a term to a list of possible coded values.]
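The two mapping operations named above (a concept that is a numeric quantity, and a concept drawn from a list of coded values) can be sketched as a small mapping specification applied to a source row. Everything here is hypothetical: the class names, the output dictionary layout, and the codes such as `SEX-F` are placeholders for illustration, not ClinML, SNOMED-CT, or the actual Data Mapping Tool.

```python
from dataclasses import dataclass, field


@dataclass
class NumericConcept:
    """Maps a source column to a concept holding a numeric quantity."""
    column: str
    concept: str
    unit: str

    def map(self, raw: str) -> dict:
        return {"concept": self.concept, "value": float(raw), "unit": self.unit}


@dataclass
class CodedConcept:
    """Maps a source column to a concept with a list of allowed coded values."""
    column: str
    concept: str
    codes: dict = field(default_factory=dict)

    def add_term(self, term: str, code: str) -> None:
        # Extend the list of possible coded values.
        self.codes[term.lower()] = code

    def map(self, raw: str) -> dict:
        code = self.codes.get(raw.lower())
        if code is None:
            raise KeyError(f"no code defined for term {raw!r}")
        return {"concept": self.concept, "code": code, "term": raw}


def map_row(row: dict, spec: list) -> list:
    """Apply every mapping in the specification to one source row."""
    return [mapping.map(row[mapping.column]) for mapping in spec]


# Placeholder codes, not taken from any real vocabulary.
sex = CodedConcept(column="Sex", concept="sex")
sex.add_term("Male", "SEX-M")
sex.add_term("Female", "SEX-F")
spec = [NumericConcept(column="Age", concept="age", unit="years"), sex]

patient = {"ID": "103", "Age": "27", "Sex": "Female"}
print(map_row(patient, spec))
```

Note that the source `ID` column is not part of the specification, so it does not appear in the mapped output; a real anonymisation step would need considerably more care than this.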
Integration with Commercial Data Mining & Analysis Suite
[Figure: integration architecture. cMiMiR: MAGE-ML Outputter, Services (EJB), JDBC. Resolver server: Server Framework, Algorithm Framework, Warehouse Framework, JDBC, with Data Access, Server Sided Scripting, Data Pipeline, Data Caching, Data Analysis, Data Annotation, Admin Services. Client: Client Framework, Core Managers, Annotation Managers, Data Searches, Visualisation Tools, Wizards, Data Pipeline Monitors. Also shown: Security Gateway, DMZ.]
Basic Science to Hospital Bed
Technical
Organisational
Legal
Conclusion
• Ongoing initiative
• Support the sharing of high quality research data
• Insight into genomic function and the environment
Human Genome Project in support of Clinical Practice