Lecture Slides - McMaster University

Introduction to the concept of functional genomics
David Meyre, Associate Professor, McMaster University
(meyred@mcmaster.ca)
HRM 728 Graduate Course: Genetic Epidemiology – October 24th, 2014
Population Genomics
Program
What Is Functional Genomics?
The goal of functional genomics is to understand the
relationship between an organism’s genome and its phenotype.
Functional genomics is a field of molecular biology that is attempting
to make use of the vast wealth of data produced by genome
sequencing projects to describe genome function. Functional
genomics uses high-throughput techniques like DNA microarrays,
proteomics, epigenomics, metagenomics, metabolomics and mutation
analysis to describe the function and interactions of genes.
The genomic revolution
[Diagram linking: human genome sequence; high-throughput technologies; biostatistics & bioinformatics; large human biobanks; GENES; FUNCTIONAL GENOMICS]
Gene identification approaches
[Diagram of approaches converging on GENES: genome-wide linkage; candidate gene; homozygosity mapping; genome-wide association; full exome / genome sequencing]
Classification of human genetic diseases
OBESITY
Syndromic disease (< 0.004%)
Monogenic disease (< 2%)
Polygenic disease (~ 20%)
Genes and causality
SYNDROMIC / MONOGENIC DISEASE
Beyond co-segregation studies, additional arguments are needed to
demonstrate the causal role of a mutation in the disease
→ functional genomics
Genes and causality
POLYGENIC DISEASE (e.g. type 2 diabetes)
Beyond association studies, additional arguments are needed to
demonstrate the causal role of a variant / gene in the disease
→ functional genomics
Sladek et al., Nature 2007
I-TRANS-ETHNIC FINE MAPPING APPROACH
Trans-ethnic fine mapping approach
. Linkage disequilibrium is the non-random association of alleles at two or
more loci
. The human genome is composed of blocks of linkage disequilibrium
. The extent of linkage disequilibrium blocks varies according to ethnic
background
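The definition above can be made concrete with the usual pairwise measures D, D' and r². The sketch below is illustrative only: the function name and the example frequencies are assumptions, not taken from the slides, and both loci are assumed to be biallelic and polymorphic.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: frequencies of allele A and allele B (assumed strictly between 0 and 1)
    """
    # D is the excess of the AB haplotype over what independence would predict.
    d = p_ab - p_a * p_b
    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies: complete LD versus independence.
complete = ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5)
independent = ld_stats(p_ab=0.25, p_a=0.5, p_b=0.5)
```

With these made-up inputs, the first call describes two loci whose alleles always travel together, and the second describes loci that associate at random.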
Trans-ethnic fine mapping approach
[Figure: LD blocks spanning SNP1 to SNP5 in Icelandic, French, Asian and African populations, plotted against distance (Kb); the legend marks the causal SNP and the disease-associated LD block]
Trans-ethnic fine mapping approach
. Large-scale resequencing and case-control association studies in
Icelandic, Danish, West African and African American subjects identified
rs7903146 as the likely causal type 2 diabetes-associated SNP
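The logic of trans-ethnic fine mapping can be sketched as a set intersection: a causal SNP should sit inside the disease-associated LD block in every population, while SNPs that merely tag it drop out where LD blocks are shorter. The block memberships below are hypothetical placeholders (SNP1 to SNP5 and the four population labels are borrowed from the figure, but the assignments are invented for illustration).

```python
def shared_candidates(block_by_population):
    """Return the SNPs found in the disease-associated LD block of every population."""
    blocks = [set(snps) for snps in block_by_population.values()]
    return set.intersection(*blocks) if blocks else set()


# Hypothetical LD block memberships, NOT data from the slides.
blocks = {
    "Icelandic": {"SNP1", "SNP2", "SNP3", "SNP4", "SNP5"},
    "French": {"SNP2", "SNP3", "SNP4", "SNP5"},
    "Asian": {"SNP2", "SNP3", "SNP4"},
    "African": {"SNP3"},
}
candidates = shared_candidates(blocks)
```

In this toy setting, the population with the shortest LD block contributes most of the resolution, which is why samples of African ancestry are often informative for fine mapping.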
II-EVOLUTIONARY GENETICS
Evolutionary genetics
Natural selection is the gradual, non-random
process by which biological traits become either
more or less common in a population as a function
of differential reproduction of their bearers. It is a
key mechanism of evolution. The term "natural
selection" was popularized by Charles Darwin.
Evolutionary genetics (Huxley 1942)
- Advantageous mutations have been positively selected in human
populations during recent evolution
- Disadvantageous mutations have been negatively selected in human
populations during recent evolution
Evolutionary genetics
THRIFTY GENOTYPE HYPOTHESIS: the 'thrifty' genotype
would have been advantageous for hunter-gatherer
populations, especially child-bearing women, because it
would allow them to fatten more quickly during times of
abundance. Fatter individuals carrying the thrifty genes
would thus better survive times of food scarcity.
→ Mutations predisposing to obesity and type 2 diabetes may
show a signature of positive selection
Evolutionary genetics
. The LCT rs4988235 T variant confers lactase persistence
. The LCT rs4988235 T variant is associated with higher consumption of
milk / dairy products and increased body mass index
. The LCT rs4988235 T variant confers a selective advantage in dairy farming
populations and has been under positive selection in relation to
cattle domestication
. The LCT rs4988235 T allele is more frequent in Northern Europe
(allele frequency: 0.7) than in Southern Europe (allele frequency: 0.1)
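One way to quantify the North / South frequency contrast is Wright's fixation index, F_ST. The slides do not compute it; the sketch below is an added worked example that assumes two populations of equal size and a biallelic locus, and the function name is hypothetical.

```python
def fst_two_populations(p1, p2):
    """F_ST for one biallelic locus in two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity if both populations were pooled.
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


# T allele frequencies quoted on the slide: 0.7 (Northern) and 0.1 (Southern Europe).
fst_lct = fst_two_populations(0.7, 0.1)
```

With the two frequencies quoted above, the pooled heterozygosity is 0.48 and the mean within-population heterozygosity is 0.30, giving F_ST = 0.375, a strong differentiation for two European groups.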
Evolutionary genetics
[Figure: LCT rs4988235 T allele frequency in the UK]
Davey-Smith et al., EJHG 2009
Evolutionary genetics
. Genome-wide approaches in diverse ethnic backgrounds have
identified several hundred regions showing recent positive
natural selection
. New methods are able to identify causal variants in regions with
a signature of positive natural selection
. The amino-acid change Lys109Arg in the LEPR gene is a causal
variant under positive selection
. The Lys109Arg variant is associated with body mass index
variation
Grossman et al., Science 2010
III-GENE VARIANT AND FUNCTION
Gene variant and function
. Missense, nonsense, frameshift (indels) coding
mutations: altered protein function
. Intron / exon mutations: exon skipping
. Copy Number Variants (CNV): modulation of
gene expression, haplo-insufficiency
. Gene variant in the promoter (Transcription
Factor Binding Site): change in gene expression
. Gene variant in 3’UTR: altered mRNA stability
. Gene variant in microRNAs: change in
expression
. Gene variant in a long-range enhancer: change
in expression of another gene
. Gene variant in a CpG methylation site: change
in DNA methylation pattern
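For the first category (coding substitutions), the distinction between synonymous, missense and nonsense changes follows directly from the standard genetic code. The sketch below is a minimal illustration, not a full variant annotator; the function name is hypothetical, and the example codons are chosen for illustration (the actual codons of the variants cited in these slides are not given).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change as synonymous, nonsense, stop-loss or missense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


# Illustrative codons only: AAG (Lys) to AGG (Arg) is one way to obtain a Lys-to-Arg change.
lys_to_arg = classify_substitution("AAG", "AGG")
trp_to_stop = classify_substitution("TGG", "TGA")
leu_silent = classify_substitution("CTT", "CTC")
```

Frameshifts, splice-site changes and regulatory variants need sequence context beyond a single codon, so they are outside the scope of this sketch.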
How to prove causality between a genetic
variant and a biological effect?
In silico prediction studies
Mutation   PolyPhen-2   PANTHER   SIFT   SNAP   PMUT
K26E       -            -         -      -      -
M125I      +            -         -      -      -
T175M      +            +         +      +      +
N180S      +            +         +      +      -
Y181H      +            +         +      +      -
G226R      +            +         -      +      +
S325N      +            +         +      +      -
T558A      +            NA        -      -      -
G593R      +            NA        +      +      +
+: deleterious
-: neutral
Eight coding non-synonymous mutations in the PCSK1 gene have been identified
in extremely obese patients: the PolyPhen-2 software (conservation of the amino
acid across evolution + protein structure) is 100% concordant with in vitro
studies
Creemers et al., Diabetes 2012
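The prediction table can be summarised programmatically, for example by counting how many tools call each mutation deleterious and how often each tool returns a deleterious call. The sketch below simply transcribes the "+", "-" and "NA" entries from the table; the helper names are hypothetical, and the in vitro results themselves are not encoded because the slides do not list them per mutation.

```python
TOOLS = ["PolyPhen-2", "PANTHER", "SIFT", "SNAP", "PMUT"]

# Calls transcribed from the table, in the order given by TOOLS.
PREDICTIONS = {
    "K26E": ["-", "-", "-", "-", "-"],
    "M125I": ["+", "-", "-", "-", "-"],
    "T175M": ["+", "+", "+", "+", "+"],
    "N180S": ["+", "+", "+", "+", "-"],
    "Y181H": ["+", "+", "+", "+", "-"],
    "G226R": ["+", "+", "-", "+", "+"],
    "S325N": ["+", "+", "+", "+", "-"],
    "T558A": ["+", "NA", "-", "-", "-"],
    "G593R": ["+", "NA", "+", "+", "+"],
}


def deleterious_votes(calls):
    """Number of tools predicting the mutation to be deleterious."""
    return sum(1 for call in calls if call == "+")


def unanimous(calls):
    """True when every tool that returned a call agrees (NA entries are ignored)."""
    scored = {call for call in calls if call != "NA"}
    return len(scored) == 1


votes = {mutation: deleterious_votes(calls) for mutation, calls in PREDICTIONS.items()}
per_tool = {
    tool: sum(1 for calls in PREDICTIONS.values() if calls[i] == "+")
    for i, tool in enumerate(TOOLS)
}
agreed = sorted(m for m, calls in PREDICTIONS.items() if unanimous(calls))
```

A tally like this makes the disagreement between predictors visible, which is the reason in vitro confirmation is still needed.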
In vitro functional studies
68% of non-synonymous mutations found in obese patients are deleterious
(alpha-MSH test)
Stutzmann et al., Diabetes 2008
Intron / exon mutations and exon skipping
. Extreme obesity cosegregates with homozygosity
for a G/A substitution in the splice donor site of
exon 16 of the LEPR gene
. The intron / exon mutation induces skipping of
exon 16 and a truncated inactive leptin receptor
Clement et al., Nature 1998
CNVs are highly causal variants in Mendelian diseases
. A 600 kb heterozygous deletion (~30 genes) on chromosome 16p11.2
explains 0.7% of morbid hyperphagic obesity and is associated with
developmental delay
. Duplications in the same chromosomal region are associated with
underweight and restrictive eating disorders
. SH2B1, a key modulator of the response to the satiety hormone leptin and
a Mendelian hyperphagic obesity gene, is located in the deleted interval
Walters et al., Nature 2010; Jacquemont et al., Nature 2012
Gene variation in the promoter and gene expression
. The -11391 G>A variant in the promoter of the ACDC/adiponectin gene
is associated with higher in vitro promoter activity and with higher
plasma adiponectin levels in lean and obese children
Bouatia-Naji et al., Diabetes 2006
Gene variation in 3’UTR and mRNA stability
. The A>G +1044 TGA SNP is part of the ENPP1 risk haplotype
associated with higher ENPP1 plasma level and risk of obesity / T2D
. A>G +1044 TGA forms a linkage disequilibrium block in the 3’UTR with
A>C +1092 TGA and C>T +1157 TGA
. In HLA cells transfected with either 3’UTR variant or wild-type cDNA,
ENPP1 mRNA half-life was longer in cells transfected
with the 3’UTR variant cDNA (t1/2 = 4.35 vs. 2.55 h; p = 0.001)
Meyre et al., unpublished
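To see what the two half-lives imply, one can assume simple first-order (exponential) mRNA decay, which is an assumption of this sketch rather than something stated on the slide. Under that model the decay rate is k = ln 2 / t1/2 and the fraction of transcript remaining after time t is exp(-k t). Function names are hypothetical.

```python
import math


def decay_rate(half_life_h):
    """First-order decay constant (per hour) for a given half-life in hours."""
    return math.log(2) / half_life_h


def fraction_remaining(t_h, half_life_h):
    """Fraction of the initial mRNA left after t_h hours under exponential decay."""
    return math.exp(-decay_rate(half_life_h) * t_h)


# Half-lives reported on the slide: 4.35 h (3'UTR variant) vs. 2.55 h (wild type).
variant_left = fraction_remaining(4.35, 4.35)    # one variant half-life
wild_type_left = fraction_remaining(4.35, 2.55)  # same time point, wild type
rate_ratio = decay_rate(2.55) / decay_rate(4.35)
```

Under this model the wild-type transcript decays about 1.7 times faster than the variant transcript, so after 4.35 h half of the variant mRNA remains while less than a third of the wild-type mRNA does.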
Gene variation and long-range enhancer
. The obesity-associated FTO intron 1 region directly interacts
with the promoter of IRX3 gene (580 Kb downstream of FTO)
. The intron 1 SNP in FTO modulates IRX3 (but not FTO)
expression
. Irx3-deficient mice display a lean phenotype
Smemo et al., Nature 2014
Gene variation at a CpG methylation site
. Gene variant rs1421085 in intron 1 of FTO is the main contributor to
polygenic obesity (Dina et al., Nat Genet 2007)
. Gene variant rs7202116, in full linkage disequilibrium with
rs1421085, creates a CpG methylation site and is associated with
increased methylation of a 7.7 kb regulatory region within FTO
. The 7.7 kb regulatory region contains a highly conserved non-coding
element that acts as a long-range gene expression enhancer
Bell et al., PLOS One 2010
IV-GENE CANDIDACY
Gene candidacy
Sometimes several genes are included in the same linkage
disequilibrium block → how can we identify the causal gene(s)?
Sladek et al., Nature 2007
Gene candidacy
. Are genes in the disease-associated LD block involved in
syndromic / monogenic forms of the same disease?
- Loci associated with polygenic obesity: MC4R, BDNF, POMC,
PCSK1, SIM1
- GWAS for complex traits: 20% of the GWAS loci include genes
involved in Mendelian disorders for the same trait
. Are genes in the disease-associated LD block involved in
a corresponding phenotype in animal models (KO, Tg,
siRNA)?
- Loci associated with polygenic obesity: MC4R, BDNF, POMC,
PCSK1, SIM1, FTO, GIPR, NPC1, SH2B1, TBC1D1, NEGR1
- More than 170 genes induce a severe obesity phenotype in genetic
mouse models
. Gene function, biology
- Function related to energy metabolism
Gene candidacy
To find the causal gene in a disease-associated linkage
disequilibrium block, mRNA expression studies can be useful
(microarrays, RT-PCR):
1-Is the gene expressed in target tissues for the disease (obesity:
brain, adipocytes; T2D: pancreas)?
2-Is the gene's mRNA expression modulated by disease status in a
relevant tissue?
3-Is the gene's mRNA expression modulated by the disease-associated
SNP in a relevant tissue?
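Question 3 is typically addressed with an expression QTL (eQTL) test: expression is regressed on the number of copies of the disease-associated allele. The sketch below computes only the least-squares slope, with no significance test, and uses invented dosages and expression values; the function name is hypothetical.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on allele dosage (0, 1 or 2 copies)."""
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    covariance = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    variance = sum((g - mean_g) ** 2 for g in dosages)
    return covariance / variance


# Hypothetical data: six individuals, two per genotype class.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2]
slope = eqtl_slope(dosages, expression)
```

In this toy data set each extra copy of the allele raises expression by 0.5 units. A real analysis would add a p-value, covariates and correction for multiple testing.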
Gene candidacy
. ORMDL3 is one of the 19 genes located in the asthma-associated
LD block
. ORMDL3 is expressed in the lung
. ORMDL3 mRNA level is modulated by asthma disease status in
lymphoblastoid cell lines
. ORMDL3 mRNA level is strongly modulated by the asthma-associated
SNP in lymphoblastoid cell lines
→ ORMDL3 is a highly relevant candidate gene at this locus
Moffatt et al., Nature 2007
Cis versus Trans e-QTLs?
. The polymorphism rs9585056 is associated with T1D,
modulates the expression of the cis-gene GPR183 and the
expression of the IRF7 network genes
Heinig et al., Nature 2010
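The cis / trans distinction can be sketched as a distance rule: an eQTL is called cis when the SNP lies on the same chromosome as the regulated gene and close to it, and trans otherwise. The 1 Mb window below is a commonly used convention and an assumption of this sketch, not a value from the slides; the function name and all coordinates are hypothetical.

```python
def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end, window=1_000_000):
    """Label an eQTL 'cis' if the SNP lies within `window` bp of the gene, else 'trans'."""
    if snp_chrom != gene_chrom:
        return "trans"
    if gene_start <= snp_pos <= gene_end:
        distance = 0
    else:
        distance = min(abs(snp_pos - gene_start), abs(snp_pos - gene_end))
    return "cis" if distance <= window else "trans"


# Hypothetical coordinates, for illustration only.
nearby_gene = classify_eqtl("13", 5_000_000, "13", 5_020_000, 5_035_000)
distant_gene = classify_eqtl("13", 5_000_000, "11", 600_000, 620_000)
```

A single SNP can therefore be a cis-eQTL for a neighbouring gene and a trans-eQTL for genes elsewhere in the genome, as in the pattern described above.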
Gene candidacy
. Combination of mRNA expression and GWAS studies
. 27 genes are differentially regulated in the adipose tissue of
monozygotic twins discordant for obesity
. ‘Hypothesis-driven’ GWAS analysis for these 27 genes, followed
by replication in a second independent sample, identified a
novel obesity gene: F13A1
Naukkarinen et al., PLOS Genet 2010
V-STUDY OF ENDOPHENOTYPES
Study of endophenotypes
. rs17782313 near MC4R has been associated with BMI by GWAS
. Deleterious coding mutations in MC4R are the most common cause of
monogenic obesity, with hyperphagia and increased stature
. If the SNP modulates the expression / function of MC4R, we can
predict associations with the same traits in the corresponding direction
. The obesity-predisposing allele of rs17782313 is associated with
more snacking and overeating and with increased stature
→ MC4R is a highly relevant candidate gene at this locus
Stutzmann et al., Int J Obes 2009
Introduction to the concept of functional genomics
I-TRANS-ETHNIC FINE MAPPING APPROACH
II-EVOLUTIONARY GENETICS
III-GENE VARIANT AND FUNCTION
IV-GENE CANDIDACY
V-STUDY OF ENDOPHENOTYPES
FTO, a good illustration of integrative approach
. Novel variants identified in African
populations
. FTO SNP shows evidence of positive natural
selection
. The SNP is associated with different
patterns of methylation (demethylase)
. Complete FTO deficiency leads to a
lethal polymalformative syndrome in humans
. Partial FTO deficiency does not relate to
leanness / obesity in humans
. FTO knock-out mice are lean, FTO
transgenic mice are obese
. FTO is highly expressed in the hypothalamus
and is regulated by fasting and feeding
. The FTO SNP is associated with food intake in
humans
Ichimura et al., Nature 2012
ANY QUESTIONS?
The French fair-play!