16th Annual Meeting of the Organization for Human Brain Mapping
Catalonia Palace of Congresses, Barcelona, Spain, June 6-10, 2010
Structure and Analysis of Genetic Variation
•  What we will look into:
–  Genes, promoters, micro RNA
–  SNP, CNV, microsatellites
–  methylation
–  Cis- and trans-acting effects & epistasis *
–  transcriptome to Genes
–  Genes to Pathways
Let’s see first how we
have modified our idea
of what a genetic
disorder looks like …
The “old” paradigm of genetics …
Cystic Fibrosis
Key concepts in genetic epidemiology of complex traits
… and how we now think genetics works …
Multiple gene variants interacting with each other,
and with multiple environmental factors
Variability in humans
“Variability is the law of life, and
as no two faces are the same,
so no two bodies are alike,
and no two individuals react alike,
and behave alike under the
abnormal conditions which we
know as disease.”
Sir William Osler
(1849-1919)
What is a gene? A real example, the DRD4
[Figure: map of the DRD4 gene from 5’ to 3’, with regulatory regions, coding regions and non-coding regions marked across Exons I to IV. Labeled variants: 120bp duplication (PstI); C-11T (SmaI); 12bp repeat (+61 to +85); Gly11Arg (G+31C); (G)n repeat; 48 bp VNTR; G+492T (high freq); Val194Gly (T+581G) (low freq); C+870A (high freq). One further variant is also marked low freq. The single-base changes are labeled “Single Nucleotide Polymorphism”.]
Classes of human genetic variants
Frazer et al., 2009 Nature Rev Genet. 10:241-51.
Human genome and human variability
Our DNA is made up of ~3 billion nucleotides, which are like
“letters”, e.g. ACGGCATTGC …. Three letters make up a “word” (a codon).
Each “word” codes for an amino acid, and a sequence of these
“words” (a gene) codes for a “sentence”, a particular protein made by a cell.
Each of us differs from another by ~ 4 million of these “letters”
Single nucleotide polymorphism or “SNP”
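Key-words in genetics
[Figure: two copies of chr 1 carrying Single Nucleotide Polymorphisms (SNPs) at different loci, labeled A to H, with the alleles found at each locus (e.g. C A A C A T G C along one copy). Genotype: the pair of alleles a person carries at one locus. Haplotype: the run of alleles along one chromosome copy.]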
general population genomic sequences
... A C T T T G A ...
... A T T T T G A ...
The position where the two sequences differ (C vs T) = SNP (Single Nucleotide Polymorphism)
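A minimal sketch of how the differing position in the two sequences above could be located, assuming the sequences are already aligned and of equal length. The function name find_snps is hypothetical and only serves this illustration.

```python
def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, base in A, base in B) for every mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# The two stretches shown on the slide
print(find_snps("ACTTTGA", "ATTTTGA"))  # [(1, 'C', 'T')]
```

Real genomes also differ by insertions and deletions, which a position-by-position comparison like this one does not handle.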
… today we know almost 15 - 18 million common SNPs
(and many more not so common)
A new technology: DNA MICROARRAYS
Allow us to detect these SNPs ….
← ~1000s individuals; ~1M+ SNPs →
Genotype calls from the array (each row holds the calls for one SNP across individuals):
AA  AG  AA  GG  AG  AA  ..
AC  AC  AC  AA  AC  CC  ..
CT  CC  CC  CT  CC  CC  ..
GG  GG  GG  AG  GG  GG  ..
TT  AT  AA  AT  AA  AT  ..
CT  TT  CT  TT  CT  CC  ..
..  ..  ..  ..  ..  ..  ..
Single Nucleotide Polymorphisms
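As a rough illustration of what can be done with such genotype calls, the sketch below computes allele frequencies for the first run of calls on the slide (AA, AG, AA, GG, AG, AA). It assumes those six calls belong to one SNP typed in six individuals; the function name allele_frequencies is hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Count every allele in a list of two-letter genotype calls and return frequencies."""
    counts = Counter("".join(genotypes))  # "AA" + "AG" + ... -> one string of alleles
    total = sum(counts.values())          # two alleles per individual
    return {allele: n / total for allele, n in counts.items()}


snp_calls = ["AA", "AG", "AA", "GG", "AG", "AA"]
freqs = allele_frequencies(snp_calls)
print(freqs)  # A is seen 8 times out of 12, G 4 times out of 12
```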
Population-based designs: what does it mean in practice?
Population-based: cases and unrelated population controls from the same study base
Affected individuals (CASES) vs. not affected individuals (CONTROLS), allele counts per SNP:

            A     a
CASES      656   879
CONTROLS   525   471

            A     a
CASES      606   929
CONTROLS   555   441

            A     a
CASES      856   679
CONTROLS   325   671

p-values shown on the slide: 0.3, 0.1, 0.01 !!!
SIGNIFICANT !! …and so on!
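A minimal sketch of how one allele-count table could be tested for association, assuming a plain Pearson chi-square test on a 2x2 table (1 degree of freedom, no continuity correction). It uses the counts 856/679 for cases and 325/671 for controls from the slide. The p-values printed on the slide look schematic, so the value computed here is not expected to match them. The function names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]]; returns (statistic, p-value)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def odds_ratio(a, b, c, d):
    """Odds of carrying allele A in cases divided by the odds in controls."""
    return (a * d) / (b * c)


# Rows: cases, controls. Columns: allele A, allele a.
chi2, p = chi_square_2x2(856, 679, 325, 671)
print(f"chi2 = {chi2:.1f}, p = {p:.2e}, OR = {odds_ratio(856, 679, 325, 671):.2f}")
```

In a whole-genome scan the same test is repeated for every SNP on the array.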
WHOLE GENOME SCAN ASSOCIATION
[Figure: association results along the genome, with panels numbered 1 to 6 and peaks labeled “Known Gene” and “New Gene”.]
How many genes?
•  In complex traits, there are genes acting together and we
must understand “how” if we want to understand the biology
of disease:
modeling gene-gene interactions – the epistasis effect
[Figure: effects of Gene A and Gene B, alone and combined, shown as “+” marks.]
PNAS 105: 12387-92, 2008
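One simple way to express what "acting together" means is the interaction contrast on the risk scale: if the two genes only add their effects, the contrast is zero; if the joint effect differs from the sum, it is not. The sketch below is an illustration only. The risk values are made up, and the function name interaction_contrast is hypothetical; it is not the method of the cited PNAS paper.

```python
def interaction_contrast(risk):
    """risk[(a, b)] is the disease risk for carrier status a at Gene A and b at Gene B (0 or 1)."""
    return risk[(1, 1)] - risk[(1, 0)] - risk[(0, 1)] + risk[(0, 0)]


# Hypothetical risks: each gene adds its own effect, nothing more
additive = {(0, 0): 0.01, (1, 0): 0.03, (0, 1): 0.02, (1, 1): 0.04}
# Hypothetical risks: the effect appears only when both variants are carried
epistatic = {(0, 0): 0.01, (1, 0): 0.01, (0, 1): 0.01, (1, 1): 0.10}

print(round(interaction_contrast(additive), 6))   # close to 0: no interaction
print(round(interaction_contrast(epistatic), 6))  # clearly above 0: epistasis
```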
GWAS: here we are
Outcome of a Genome wide association study
What are CNVs?
Stretches of DNA larger than 1 kb that display copy
number differences in comparison to a reference genome
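The definition above can be written as a small rule. This is a sketch under stated assumptions: the reference genome carries two copies of each segment (diploid autosomes), and the names Segment and classify_cnv are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    start: int   # first base of the segment
    end: int     # one past the last base
    copies: int  # observed copy number in the sample


def classify_cnv(seg: Segment, reference_copies: int = 2, min_length: int = 1000) -> Optional[str]:
    """Label a segment as a CNV only if it is larger than 1 kb and differs from the reference."""
    length = seg.end - seg.start
    if length <= min_length or seg.copies == reference_copies:
        return None
    return "deletion" if seg.copies < reference_copies else "duplication"


print(classify_cnv(Segment(10_000, 15_000, 1)))  # deletion
print(classify_cnv(Segment(10_000, 15_000, 3)))  # duplication
print(classify_cnv(Segment(10_000, 10_500, 1)))  # None: shorter than 1 kb
```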
Types of genomic structural changes affecting segments of DNA,
leading to deletions, duplications, inversions, and CNV changes
(biallelic, multiallelic, and complex)
The current map of human structural variation is far from
complete….
http://humanparalogy.gs.washington.edu/structuralvariation/
http://projects.tcag.ca/variation/
Am J Hum Genet. 2009;84:148-61
~16% of the genome
[Figure labels: low frequency, high frequency]
… converging evidence on loci
Chr. 1q21.1
Chr. 15q11.2
Chr. 15q13.3
… converging evidence on genes
… genomic burden of rare variants
Science 320: 539-543, 2008
Nat Rev Genet 40: 8881-885, 2008
Genetics of complex disorders:
what has been achieved so far
“Traditional” genetic methods (eg, association)
in a genomic perspective point to extreme
alternatives:
•  GWAS
•  CNV/CNP
CVCD (common variant, common disease)
RVCD (rare variant, common disease)
Dissecting the genetic basis of disease risk requires
measuring all forms of genetic variation, including
SNPs and copy number variants (CNVs) …
… Most common, diallelic CNPs were in strong
linkage disequilibrium with SNPs, and most low-frequency
CNVs segregated on specific SNP haplotypes
Nature Genetics 40: 1168-74, 2008
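Linkage disequilibrium between two diallelic markers, such as a CNP and a nearby SNP, is commonly summarized by r². The sketch below computes it from haplotype counts. The counts are made up for illustration, the allele labels A/a and B/b are placeholders, and the function name ld_r_squared is hypothetical.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two diallelic loci from counts of the four possible haplotypes."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n   # frequency of allele A at the first locus
    p_B = (n_AB + n_aB) / n   # frequency of allele B at the second locus
    d = n_AB / n - p_A * p_B  # D: departure from independence
    return d ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))


# Made-up counts: allele A always travels with allele B (strong LD)
print(round(ld_r_squared(30, 0, 0, 70), 6))   # 1.0
# Made-up counts: the two loci are independent
print(round(ld_r_squared(25, 25, 25, 25), 6))  # 0.0
```

When r² is close to 1, genotyping the SNP is almost as informative as typing the CNP itself.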
Looking at individual structural variants with sequencing
technologies
PLoS Biol. 2007 4;5:e254.
4.1 million DNA variants (~12.3 Mb)
1,288,319 (~30%) novel!
We have uncovered only the tip of the iceberg…
…each genomic region contributes a
modest effect, and collectively all
associated regions for a given trait
explain only a small fraction (5-10%) of
the observed phenotypic variation
attributed to genetic elements…
Where is the rest of the missing heritability?
•  Incomplete marker coverage
• Allelic heterogeneity at a given locus
• Contribution of rare variants, including
structural (CNVs and smaller) variants
• Epistatic interactions
• GxE interactions
• Epigenetic modifications
• Overestimation of heritability
Once a gene has been mapped:
understanding the function …
Where is the culprit?
Functional variants and affected genes
Ioannidis et al., 2009 Nature Rev Genet. 10:318-29.
…. Our results suggest that there are at least 35%
more functional promoters in the human genome
than previously annotated.
RNA interference
RNA interference (RNAi) is an evolutionarily conserved mechanism
that uses short antisense RNAs that are generated by ‘dicing’ dsRNA
precursors to target corresponding mRNAs for cleavage.
However, recent developments have revealed that there is also
extensive involvement of RNAi-related processes in regulation at the
genome level. dsRNA and proteins of the RNAi machinery can direct
epigenetic alterations to homologous DNA sequences to induce
transcriptional gene silencing or, in extreme cases, DNA elimination.
Furthermore, in some organisms RNAi silences unpaired DNA regions
during meiosis. These mechanisms facilitate the directed silencing of
specific genomic regions.
ON
OFF
ATTCGGTCTTACCGATATTCGG
From S. Beck 2008
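Assuming the ON/OFF labels on this slide refer to DNA methylation at CpG sites (the talk outline lists methylation, but the slide text itself does not say so), the sketch below locates the CpG dinucleotides in the sequence shown. The function name cpg_positions is hypothetical.

```python
def cpg_positions(seq: str) -> list[int]:
    """Return the 0-based index of every C that is directly followed by a G."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


print(cpg_positions("ATTCGGTCTTACCGATATTCGG"))  # [3, 12, 19]
```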
Integrated genomic approach
[Figure labels: Genome; HapMap; tag SNPs; WGAmap; DeepSeq; phen SNPs; BisSeq; DMRmap; phen MVPs; candidate ‘(dys)-functional gene’]
From S. Beck 2008
From Genes to Pathways:
toward a systemic understanding
of disease
Human GWAS legacy data: SNP → Disease Phenotype
In Genome-Wide Association Studies (GWAS) our goal is to find out the
relationship between a Single Nucleotide Polymorphism (SNP, as a proxy for a
gene) and the Disease Phenotype of interest
Human GWAS legacy data: SNP → Gene → Protein → Neuronal function → Neural circuitry → Disease Phenotype
Human GWAS legacy data, Animal models, Bioinformatics: SNP → Gene → Protein → Neuronal function → Neural circuitry → Disease Phenotype
Systems Biology addresses links between SNPs and human phenotype originally
identified by GWAS …
Human GWAS legacy data, Animal models, Bioinformatics: SNP → Genes, Genome, Non-coding RNA → PROTEIN → Neuronal function → Neural circuitry → Disease Phenotype
… including information related to
WHOLE genomic complexity
From single genes to networks
Genes associated with asthma
Leukemia disease network
Inferred Networks based on mouse PFC expression data
Gene interaction network inferred from prefrontal cortex gene expression in 42 different
inbred mouse strains. Schizophrenia candidate genes from GWAS are in yellow. Some
unexpected connections: DACT3 (circled) encodes a regulator of Wnt signaling that has
been linked to schizophrenia [41-43].
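The slide does not say how the network was inferred. One common and simple approach, shown here only as an assumption, is to connect two genes when their expression levels are strongly correlated across strains. The gene names, expression values and threshold below are made up, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, threshold=0.9):
    """Return gene pairs whose absolute correlation across strains reaches the threshold."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(expression), 2)
        if abs(pearson(expression[g1], expression[g2])) >= threshold
    ]


# Made-up expression levels of three genes in four strains
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [5.0, 1.0, 4.0, 2.0],
}
print(coexpression_edges(toy))  # [('geneA', 'geneB')]
```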
AHI1 exon expression in brain, LCL and
immortalized cell-lines (187 subjects):
a TE-derived TSS effect?
Sequencing the genome of schizophrenic patients
BioDataInsight: the mining Space
Paris HVP - Andrea Calabria - andrea.calabria@unimi.it
•  Biological knowledge and annotations, e.g. gene annotations, pathways, proteins, SNPs, etc.
•  Mining Phenotypes, e.g. clinical data, patient records, etc.
•  Experimental Data, e.g. Genotyping, Sequencing, etc.
•  Results of Analyses, e.g. TE analyses
•  Graphical Engine
–  Graphical Data Representation
–  End Users
•  Application Engine
–  Query Builder
•  Database Engine
–  Data Integration & Data Warehouse
–  Annotation and Knowledge Data
–  Experimental Data
–  Analyses’ Results
•  Annotation Data
•  Extended Analysis Data
•  Experimental Data