Teasing out the VOLS from the VOUS
Christopher E. Mason
DNA RNA Protein
1) Variant calling
2) Genome Interpretation / Pharmacogenomics
3) Data Provenance and Management
Oliver Hofmann and Brad Chapman: http://bcbio.wordpress.com/2013/05/06/framework-for-evaluating-variant-detection-methods-comparison-of-aligners-and-callers/
Does a $1,000 genome need a $100,000 interpretation? At least a big phone bill.
Genome, NGS, and Clinical Standards Groups
Personalized Medicine using Variant Annotation and Contextualization
-NGS -SEQC
Cloud-based Approaches to Informatics and Sequencing
Patient and Data Sharing Initiatives for Treatment
R-make is open source (under review): http://physiology.med.cornell.edu/faculty/mason/lab/data/r-make/
Intra-patient allelic dynamics detail cell heterogeneity:
Same patient at diagnosis (D) and relapse (R)
Paired RNA-Seq reveals a seemingly endless ocean of private mutations
Using RNA-Seq to find chemo-resistant clones in ALL
Meyer et al., Nature Genetics, 2013
We see enrichment of a gene for nucleotide metabolism, with mutations near the binding pocket
NT5C2: 5'-nucleotidase (purine), cytosolic type II
Meyer et al., Nature Genetics, 2013
That took four years.
What if we did it again today?
Lanvis, GSK (thioguanine) Mercaptopurine
Only Significantly Associated Clinical Variable Was Early Relapse
7 of 40
NT5C2 Mutants Confer Chemoresistance to
Purine Nucleoside Analogue Treatment
6-MP 6-TG
Reh cells transiently infected by lentivirus with WT, GFP, and mutants
Meyer et al., Nature Genetics, 2013
The NIH Undiagnosed Diseases Program (UDP)
• Launched May 19, 2008
• Supported by the Office of Rare Diseases, NHGRI and the NIH Clinical Center.
• Goals:
– Addresses unmet need for diagnosis of mysterious diseases.
– Helps discover new diseases that provide insight into human physiology and genetics.
UDP441 spiny follicular hyperkeratosis case
• 50-year-old Caucasian woman.
• Spiny follicular hyperkeratosis with alopecia, follicular plugging, abscesses.
• Exacerbated by UV-A light and oral retinoid therapies.
• Ruled out infections, hormone and paraneoplastic causes.
• No known dermatological causes.
• RNA-Seq and WGS
Already we may have been able to predict this potential from the germline DNA
Found mutations in a gene involved in follicle development and cell cycle
50Tb
More genomes are coming…
100 TeraBytes is only the beginning
Current: 1,000 genomes = 100 terabytes
WGS for all new U.S. babies/year:
4,300,000 genomes = 430 petabytes
WGS for everyone >40 in U.S. in 2013:
100 million genomes = 10 exabytes
Every human in China:
1.3 billion genomes = 130 exabytes
Each biopsy/checkup/visit… per patient:
100 billion genomes = 10 zettabytes
And that is only the Genome!
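The scaling above follows from one figure: "1,000 genomes = 100 terabytes" implies roughly 100 GB per genome. A minimal sketch of that arithmetic, assuming decimal (SI) units; the constant `GB_PER_GENOME` and the function `storage_needed` are illustrative names, not something from the talk:

```python
# Assumption: ~100 GB per genome, implied by "1,000 genomes = 100 terabytes".
GB_PER_GENOME = 100

# Decimal (SI) units, expressed in gigabytes, largest first.
UNITS_IN_GB = [("ZB", 10**12), ("EB", 10**9), ("PB", 10**6), ("TB", 10**3), ("GB", 1)]


def storage_needed(genomes: int) -> str:
    """Return the storage for `genomes` genomes in the largest unit that fits."""
    total_gb = genomes * GB_PER_GENOME
    for name, size_gb in UNITS_IN_GB:
        if total_gb >= size_gb:
            return f"{total_gb / size_gb:g} {name}"
    return f"{total_gb:g} GB"


# Genome counts taken from the slide.
scenarios = {
    "Current": 1_000,
    "All new U.S. babies per year": 4_300_000,
    "Everyone >40 in U.S. in 2013": 100_000_000,
    "Every human in China": 1_300_000_000,
    "Each biopsy/checkup/visit per patient": 100_000_000_000,
}

for label, count in scenarios.items():
    print(f"{label}: {count:,} genomes = {storage_needed(count)}")
```

Changing `GB_PER_GENOME` rescales every line, which is the point of the slide: the per-genome footprint matters less than the number of genomes.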
[Figure: diagram of data sources and data types collected alongside the genome. Labels:]
Family Medical History
Robust Sample Prep & Data Acquisition
Annotations: TDBGV, miRbase, TCGA, 1KG, dbSNP
Statistical Models Built from Per-Base, High-Res Data
Bacteria/Fungi; Virus; N other species
1. Mass spectrometry data
1. Protein-Protein Interactions
2. Polysome-Associated mRNAs
mC or hmC modifications; Histone marks (HXXX); Regulatory information; Nucleosome effects; Gene regulation
1. Variants (SNVs, indels)
2. Structural variants (CNVs, TxC)
3. Ancestry prediction from AIMs
RNA edits, eQTLs; TFBS; mRNAs degrade
1. Small Variants (SNVs, indels)
2. DE by exon, gene, intron, & jxn
3. PolyA sites and miRNA changes
4. TARs and ORF potential
5. Gene fusion events
6. Allele-specific expression
1. Metabolic changes: drugs, vitamin, hormone
Environment at T=0, T=1, T=N
http://www.allthingstechnology.net/2011/07/how-much-byte-make-yottabyte.html
$100 for one terabyte
1YB=1 trillion TB
1YB = $100 trillion
…
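The yottabyte price follows directly from the per-terabyte price. A short sketch of the conversion, assuming decimal units (1 YB = 10^24 bytes, 1 TB = 10^12 bytes) and the slide's flat $100 per terabyte; the names below are illustrative:

```python
USD_PER_TB = 100      # from the slide: $100 for one terabyte
TB_PER_YB = 10**12    # 1 YB = 10**24 bytes, 1 TB = 10**12 bytes


def storage_cost_usd(terabytes: int) -> int:
    """Cost of storing `terabytes` TB at a flat price per terabyte."""
    return terabytes * USD_PER_TB


one_yb_cost = storage_cost_usd(TB_PER_YB)
print(f"1 YB = {TB_PER_YB:,} TB = ${one_yb_cost:,}")
```

A flat price is a simplification: it ignores redundancy, power, and price changes over time.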
We need per-base, per-allele, per-generation data
[Table: per-base site calls for chr10:79090-79110 (21 sites). Columns: chr, pos, bcalls, bcalls filt, ref, Q(snp), GT, Q(max_gt), max_gt|poly_site, Q(max_gt|poly_site), As used, Cs used, Gs used, Ts used. Every site is called homozygous reference with Q(snp) 0, except one heterozygous AG site (ref A, Q(snp) 40).]
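The per-base table above carries, for each position, a reference base, a SNP quality Q(snp), and a genotype call. A minimal sketch of how such rows could be held and screened for non-reference sites. The `SiteCall` record, the `is_variant` helper, the Q(snp) threshold of 20, and the three example rows are all assumptions made for illustration; they are not the slide's rows or the caller's actual logic:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCall:
    """One per-base record (hypothetical subset of the table's columns)."""
    chrom: str
    pos: int
    ref: str        # reference base
    q_snp: int      # Phred-style quality that the site is a SNP
    genotype: str   # two-letter diploid call, e.g. "AA" or "AG"


def is_variant(site: SiteCall, min_q_snp: int = 20) -> bool:
    """True when any called allele differs from ref and Q(snp) passes the cutoff."""
    has_non_ref_allele = any(allele != site.ref for allele in site.genotype)
    return has_non_ref_allele and site.q_snp >= min_q_snp


# Illustrative rows only; positions and values are made up for the example.
sites = [
    SiteCall("chr10", 1, "A", 0, "AA"),
    SiteCall("chr10", 2, "A", 40, "AG"),
    SiteCall("chr10", 3, "T", 0, "TT"),
]

variants = [s for s in sites if is_variant(s)]
for s in variants:
    print(f"{s.chrom}:{s.pos} ref={s.ref} genotype={s.genotype} Q(snp)={s.q_snp}")
```

Keeping every position, including the homozygous-reference ones, is what makes comparisons across alleles and samples possible later; a variant-only file cannot distinguish "reference" from "no coverage".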
We can now observe the dynamics and evolution of cancers
Ding L, et al. Clonal evolution in relapsed acute myeloid leukemia revealed by whole-genome sequencing. Nature. 2012 Jan 11;481(7382):506-10.
A Future of Publicly Personalized Medicine
Today I am even more sure I shouldn’t smoke
“All science is either physics or stamp collecting”
Ernest Rutherford
Awarded Nobel Prize in 1908 for Chemistry, not Physics.
“The energy produced by the breaking down of the atom is a very poor kind of thing. Anyone who expects a source of power from the transformation of these atoms is talking moonshine."
– 1933
“An alleged scientific discovery has no merit unless it can be explained to a barmaid.”
All physicists were stamp collectors until around 1935.
All genomicists will have to be stamp collectors for a few years more.
“When we have found how the nucleus of atoms is built up we shall have found the greatest secret of all, except life.” - Ernest Rutherford