Are you ready for the genomic age?
An introduction to human genomics
Jacques Fellay
EPFL School of Life Sciences
Swiss Institute of Bioinformatics
Lausanne, Switzerland
What is the genome?
“It's a shop manual, with an incredibly detailed
blueprint for building every human cell.
It's a history book - a narrative of the journey of
our species through time.
It's a transformative textbook of medicine, with
insights that will give health care providers new
powers to treat, prevent and cure disease.”
Francis Collins
Glossary
• Genome: the complete genetic constitution of an
organism, encoded in nucleic acids
• Gene: discrete DNA sequence encoding a protein
The human genome
3 billion base pairs (ATGC)
20’000 protein-coding genes
99.6% inter-individual identity (yet 4 million differences)
99% identical to chimpanzee genome (yet 6% different genes)
2001: A Species Odyssey
Exploring the human genome
2002
Sanger sequencing,
targeted genotyping
2008
Genome-wide
genotyping (GWAS)
Exome sequencing
Genome sequencing
International HapMap Project
Identification of common genetic variation in
270 individuals from 4 populations
• CEU: CEPH (Utah residents with ancestry from northern and
western Europe) (30 trios, i.e. 90 individuals)
• CHB: Han Chinese in Beijing, China (45 individuals)
• JPT: Japanese in Tokyo, Japan (45 individuals)
• YRI: Yoruba in Ibadan, Nigeria (30 trios, i.e. 90 individuals)
1000 Genomes Project
Whole genome sequencing and complete description
of human genetic diversity in >1000 individuals from
multiple world populations
www.1000genomes.org
Short video – Sequencing the genome
http://ed.ted.com/lessons/how-to-sequence-the-human-genome-mark-j-kiel
We are all different…
4 million DNA variants / individual
Single nucleotide variants
Multi-nucleotide variants:
• Small insertions/deletions (indels)
• Large copy number variants (CNVs)
• Inversions
• Translocations
• Aneuploidy
Glossary
• SNV = single nucleotide variant:
DNA sequence variation in which a single nucleotide
— A, T, C or G — differs between members of the
same species
• SNP = single nucleotide polymorphism:
SNV occurring commonly within a population (> 1%)
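The 1% threshold in the SNP definition can be written as a small check. This is a minimal Python sketch, not a standard tool: the function names and the example allele counts are made up for illustration, and applying the threshold to the minor allele frequency is an assumption about how "commonly" is measured.

```python
def minor_allele_frequency(alt_count: int, total_alleles: int) -> float:
    """Frequency of the rarer allele at a site with two alleles."""
    if total_alleles <= 0 or not 0 <= alt_count <= total_alleles:
        raise ValueError("counts must satisfy 0 <= alt_count <= total_alleles")
    freq = alt_count / total_alleles
    return min(freq, 1.0 - freq)


def is_snp(alt_count: int, total_alleles: int, threshold: float = 0.01) -> bool:
    """True if the SNV is common enough (> threshold) to be called a SNP."""
    return minor_allele_frequency(alt_count, total_alleles) > threshold


# Hypothetical counts: 1000 people carry 2000 copies of each autosomal site.
print(is_snp(30, 2000))  # alternative allele seen 30 times
print(is_snp(3, 2000))   # alternative allele seen 3 times
```

Every SNP is an SNV, but only SNVs above the frequency threshold count as SNPs.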
SNV/SNP
Glossary
• Allele: One of a number of alternative forms of the
same genetic locus (for example a SNP)
About 2% of people have two copies of the APOE4 allele and are
very likely to succumb to Alzheimer’s disease
About 1% of us have two copies of a small deletion in CCR5 and
are largely immune to infection by HIV
And about 7% do not make any functional CYP2D6 enzyme and
therefore get no pain relief from codeine
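The "two copies" percentages above can be related to allele frequencies with a short calculation. The sketch below assumes Hardy-Weinberg proportions (random mating, no selection), which the slides do not state; under that assumption the fraction of people with two copies is the square of the allele frequency. The function names are illustrative.

```python
import math


def implied_allele_frequency(two_copy_fraction: float) -> float:
    """Allele frequency q such that q**2 equals the two-copy fraction."""
    return math.sqrt(two_copy_fraction)


def genotype_frequencies(q: float) -> dict:
    """Expected genotype fractions for an allele of frequency q."""
    p = 1.0 - q
    return {"no copies": p * p, "one copy": 2 * p * q, "two copies": q * q}


# Two-copy fractions taken from the slide text (2% and 1%).
for label, fraction in [("APOE4", 0.02), ("CCR5 deletion", 0.01)]:
    q = implied_allele_frequency(fraction)
    one_copy = genotype_frequencies(q)["one copy"]
    print(f"{label}: allele frequency ~{q:.0%}, one-copy carriers ~{one_copy:.0%}")
```

The point of the sketch is that an allele carried in two copies by only 1 to 2% of people can still be present in one copy in a much larger share of the population.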
Glossary
• Linkage Disequilibrium (LD): Non-random
association of alleles that descend from single,
ancestral chromosomes (i.e. usually close to each
other)
• Haplotype: Combination of alleles at adjacent
locations on a chromosome that are inherited
together
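Linkage disequilibrium between two sites is often summarised by the coefficient D and by r². The sketch below computes both from haplotype counts, using the usual textbook definitions; the allele labels A/a and B/b and the example counts are hypothetical.

```python
def linkage_disequilibrium(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D, r2) for two biallelic sites from the four haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("at least one haplotype is required")
    p_AB = n_AB / total           # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at the first site
    p_B = (n_AB + n_aB) / total   # frequency of allele B at the second site
    d = p_AB - p_A * p_B          # deviation from random association
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Alleles always travel together: complete LD.
print(linkage_disequilibrium(50, 0, 0, 50))
# Alleles combine at random: no LD.
print(linkage_disequilibrium(25, 25, 25, 25))
```

An r² close to 1 means that genotyping one site predicts the other almost perfectly, which is why a genotyping chip can cover many more variants than it measures directly.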
How to read the genome?
Genotyping
Sequencing
Glossary
• Genotyping:
Process of determining genetic differences between
individuals by using a set of markers
• Sequencing:
Process of determining the full nucleotide order of a
DNA sequence
Genotyping
Genome-wide chips:
500K to >1 million single nucleotide polymorphisms (SNPs)
SNP output
[Figure: genotype cluster plots for SNP rs1372493. One panel plots Intensity (B) against Intensity (A); the other plots Norm R against Norm Theta. Samples fall into three clusters: Homozygous 1, Heterozygous and Homozygous 2. The plot is annotated with the values 2317, 834 and 74.]
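The SNP output for rs1372493 separates samples into three genotype groups along the Norm Theta axis. The sketch below shows how such a call could be made from a pair of normalised allele intensities. It assumes the commonly described polar transform (Norm Theta as the scaled angle, Norm R as the summed intensity); the thresholds 0.25, 0.75 and 0.2 are hypothetical, and real genotyping software fits clusters per SNP rather than using fixed cut-offs.

```python
import math


def polar_coordinates(intensity_a: float, intensity_b: float):
    """Return (norm_theta, norm_r) for normalised A and B intensities."""
    norm_theta = (2.0 / math.pi) * math.atan2(intensity_b, intensity_a)
    norm_r = intensity_a + intensity_b
    return norm_theta, norm_r


def call_genotype(intensity_a: float, intensity_b: float,
                  low: float = 0.25, high: float = 0.75,
                  min_r: float = 0.2) -> str:
    """Assign a sample to one of the three clusters, or make no call."""
    norm_theta, norm_r = polar_coordinates(intensity_a, intensity_b)
    if norm_r < min_r:
        return "No call"          # signal too weak to trust
    if norm_theta < low:
        return "Homozygous 1"     # almost only allele A signal
    if norm_theta > high:
        return "Homozygous 2"     # almost only allele B signal
    return "Heterozygous"         # both alleles contribute


for a, b in [(1.0, 0.05), (0.5, 0.5), (0.05, 1.0), (0.01, 0.01)]:
    print((a, b), call_genotype(a, b))
```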
[Figure: clinical impact (+ to +++) plotted against allele frequency of variant (from <<<<<1% to >5%), with regions labelled Sequencing and Genome-wide genotyping.]
High-throughput Sequencing (NGS)
– Huge amount of data (terabytes)
– Analysis computationally intensive
– Dedicated IT infrastructure
• Pipeline
FastQ format – single read
@G:1:1:11:1079#0/1
TGATTGATTCCATTCCATTCCATTCCATTTCATTCCATTGCAATCCCTTCCAATCCATTCCATTCCATTCCATTC
+G:1:1:11:1079#0/1
`Xa^YO\_^a_`__`a__^a^a^_a``^_\`\\]``[XUGXXXXXWUTWWVWUSTXXPUWYYRVWYYYXZYXYWZ
A complete, high-coverage genome
will have over 1 billion reads
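A FastQ record has four lines: an identifier starting with "@", the bases, a separator starting with "+", and one quality character per base. The sketch below parses the single read shown above and decodes its quality string. The quality offset of 64 is an assumption (older Illumina encoding), since the slide does not state which encoding was used; the coverage helper likewise assumes a 3 billion base pair genome and reads of equal length. All function names are illustrative.

```python
from typing import List, NamedTuple


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq_record(lines: List[str]) -> FastqRecord:
    """Parse exactly four lines of FastQ text into one record."""
    header, sequence, separator, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FastQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 64) -> List[int]:
    """Decode quality characters; offset=64 is an assumed encoding."""
    return [ord(char) - offset for char in quality]


def estimated_coverage(n_reads: int, read_length: int,
                       genome_size: int = 3_000_000_000) -> float:
    """Average number of reads covering each base of the genome."""
    return n_reads * read_length / genome_size


record = parse_fastq_record([
    "@G:1:1:11:1079#0/1",
    "TGATTGATTCCATTCCATTCCATTCCATTTCATTCCATTGCAATCCCTTCCAATCCATTCCATTCCATTCCATTC",
    "+G:1:1:11:1079#0/1",
    r"`Xa^YO\_^a_`__`a__^a^a^_a``^_\`\\]``[XUGXXXXXWUTWWVWUSTXXPUWYYRVWYYYXZYXYWZ",
])
scores = phred_scores(record.quality)
print(record.read_id, len(record.sequence), min(scores), max(scores))
print(estimated_coverage(1_000_000_000, len(record.sequence)))
```

With reads of this length, one billion reads correspond to roughly 25-fold average coverage of the genome under the stated assumptions.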
• Pipeline
http://www.ncbi.nlm.nih.gov/core/assets/variation/images/popfreq_example.jpg
• Pipeline
Summary of a single human genome
SNVs: 3.5 million
-Premature stop: 80
-Stop loss: 10
-Non-synonymous: 11,000
-Synonymous: 11,000
-Essential splice site: 25
Indels: 300,000
-Frameshift: 80
-In-frame: 200
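The coding SNV categories in this summary (synonymous, non-synonymous, premature stop, stop loss) follow from comparing the amino acid encoded before and after the change. The sketch below classifies a single-codon change with the standard genetic code. It is a simplified illustration with made-up function names; real annotation tools also consider transcripts, strand and splice sites.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES),
    AMINO_ACIDS,
))


def classify_coding_snv(ref_codon: str, alt_codon: str) -> str:
    """Classify the effect of replacing ref_codon with alt_codon."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"        # same amino acid (or stop stays stop)
    if alt_aa == "*":
        return "premature stop"    # amino acid replaced by a stop codon
    if ref_aa == "*":
        return "stop loss"         # stop codon replaced by an amino acid
    return "non-synonymous"        # one amino acid replaced by another


for ref, alt in [("GAA", "GAG"), ("GAA", "GTA"), ("TAC", "TAA"), ("TAA", "TAC")]:
    print(ref, "->", alt, classify_coding_snv(ref, alt))
```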
Whole genome vs. exome sequencing
Exome
-Coding regions
-Cheaper/Faster
-Uneven capture of both alleles
-Incomplete capture of target region
-Bias towards known biology
Genome
-Complete sequence
-Expensive/Throughput
-IT issues
Clinical sequencing?
“Sequencing of the genome or exome for clinical
applications has now entered medical practice.
Several thousand tests have already been ordered
for patients, with the goal of establishing
diagnoses for rare, clinically unrecognizable, or
puzzling disorders that are suspected to be genetic
in origin.”
Leslie G. Biesecker and Robert C. Green, NEJM, 19 June 2014
Clinical sequencing?
TODAY
• Rare functional variants (Mendelian diseases)
• Pharmacogenetic variants (150 gene-drug
pairs in the FDA “Table of Pharmacogenomic
Biomarkers in Drug Labels”, but only 40 genes
involved)
• Oncogenomics
IL28B genotype and response
to anti-hepatitis C treatment
Ge, Fellay et al. Nature 2009
Clinical sequencing?
TOMORROW
• Neonatal sequencing
• Maternal blood sequencing
• DTC genomics brought to doctors
Clinical sequencing?
LATER
• Complex trait genomics (genome data in
every health record) – will depend on an in-depth understanding of functional genomic
variation
A revolution in the making
Eric Green et al., Charting a course for genomic medicine from base pairs to bedside, Nature 2011
Perspective
• Genomic-based medicine is around the corner
• Considerable room for a new (personal) genomics
market in health, nutrition, well-being…
• Genomic-based medicine is only the beginning
of “big-data-based” personalized healthcare
Perspective
None of this can happen
without trust