SUPPLEMENTAL METHODS
Analysis of expression quantitative trait loci (eQTL) at the GATA4 locus in silico
eQTL mapping identifies genetic variants that affect gene regulation by treating genome-wide
mRNA expression levels as quantitative phenotypes and mapping these expression phenotypes to
genome-wide SNPs. SNPs that influence the mRNA level of a gene usually map to eQTLs in the region
proximal to the gene, i.e., they are putatively cis-acting. Less often, SNPs map to distal,
putatively trans-acting eQTLs (ref. 42). The eQTL browser (http://eqtl.uchicago.edu) summarizes
eQTL loci from four recent large-scale eQTL studies (refs. 14, 43-45).
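The mapping step can be illustrated with a minimal sketch, assuming genotypes coded as minor-allele dosages (0, 1 or 2) and one least-squares regression of expression on dosage per SNP-gene pair. The 1 Mb cis window, the function names and the toy data are illustrative assumptions, not the procedure of the cited studies; a real genome-wide scan would also compute p-values and correct for multiple testing.

```python
from statistics import fmean

# Assumed cis window (hypothetical choice); published eQTL studies use different cut-offs.
CIS_WINDOW_BP = 1_000_000


def regression_slope(dosages, expression):
    """Least-squares slope of expression on genotype dosage.

    dosages: copies of the minor allele per individual (0, 1 or 2)
    expression: mRNA level of one gene in the same individuals
    """
    mean_x = fmean(dosages)
    mean_y = fmean(expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    return sxy / sxx


def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end,
                  window=CIS_WINDOW_BP):
    """Label a SNP-gene association as putatively 'cis' or 'trans'.

    A SNP on the gene's chromosome within `window` bp of the gene body is
    treated as proximal (cis); any other SNP is treated as distal (trans).
    """
    if snp_chrom != gene_chrom:
        return "trans"
    if gene_start <= snp_pos <= gene_end:
        distance = 0
    else:
        distance = min(abs(snp_pos - gene_start), abs(snp_pos - gene_end))
    return "cis" if distance <= window else "trans"


if __name__ == "__main__":
    # Toy data (hypothetical): expression rises by about one unit per minor allele.
    dosages = [0, 0, 1, 1, 2, 2]
    expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
    print(round(regression_slope(dosages, expression), 3))  # 1.0
    # Hypothetical coordinates, not the real GATA4 position.
    print(classify_eqtl("chr8", 150_000, "chr8", 100_000, 200_000))  # cis
    print(classify_eqtl("chr2", 150_000, "chr8", 100_000, 200_000))  # trans
```

The distance rule is deliberately simple; only the cis/trans labels used in the text are reproduced, not any study-specific window.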
GATA4 amino acid position 377: test for functional constraint at a single amino acid site
Amino acid-changing mutations that impair protein function at a single amino acid site are
continually removed from the gene pool by purifying selection. Nonsynonymous (amino
acid-changing, replacement) substitutions at an amino acid position alter the primary sequence of
the protein and are therefore more likely to affect protein function than synonymous (silent)
mutations, which leave the primary sequence unchanged. The degree of constraint on a single amino
acid site can be estimated from an alignment of the human protein sequence with the orthologous
sequences of diverse genetic model organisms: the absence of nonsynonymous substitutions at a site
across such an alignment indicates functional constraint at that site.
In the GATA4 protein, serine (S) at amino acid position 377 (and the corresponding major human
allele A at the nucleotide level) is unchanged across a wide monophyletic group (ref. 46)
comprising human, rhesus, mouse, rat, rabbit, dog and cow, but is variable outside this clade
(Supplementary figure 2): threonine (T) is present at position 377 in armadillo, tenrec and
opossum, proline (P) in chicken and serine (S) in frog. In contrast to this weak conservation of
serine at position 377, nearby positions are invariant across all aligned species and are
presumably under stronger functional constraint, e.g., tyrosine (Y) at position 374
(Supplementary figure 2).
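The invariance criterion can be checked mechanically on the amino acid rows of Supplementary figure 2 (positions 372–382). The sketch below is a simple per-column check for substitutions, not a formal test of selection such as a dN/dS model; the function and variable names are hypothetical.

```python
from collections import Counter

FIRST_POSITION = 372  # residue number of the first alignment column

# Amino acid rows of Supplementary figure 2 (GATA4 positions 372-382).
ALIGNMENT = {
    "human": "SHYGHSSSVSQ",
    "rhesus": "SHYGHSSSVSQ",
    "mouse": "SHYGHSSSMSQ",
    "rat": "SHYGHSSSMSQ",
    "rabbit": "SHYGHSSSMSQ",
    "dog": "SHYGHSSSMSQ",
    "cow": "SHYGPSSSLSQ",
    "armadillo": "SHYGHTSPLSQ",
    "tenrec": "SHYGHTSPMSQ",
    "opossum": "SHYGHTSPMSQ",
    "chicken": "SHYGHPSPISQ",
    "frog": "PPYGHSSSLSQ",
}

# Monophyletic group in which serine 377 is conserved.
CLADE = ["human", "rhesus", "mouse", "rat", "rabbit", "dog", "cow"]


def residues_at(alignment, position, species=None):
    """Return the residue at `position` for each selected species."""
    names = species if species is not None else list(alignment)
    return [alignment[name][position - FIRST_POSITION] for name in names]


def invariant_positions(alignment, species=None):
    """Positions at which all selected species carry the same residue."""
    length = len(next(iter(alignment.values())))
    positions = range(FIRST_POSITION, FIRST_POSITION + length)
    return [p for p in positions
            if len(set(residues_at(alignment, p, species))) == 1]


if __name__ == "__main__":
    print(invariant_positions(ALIGNMENT))         # [374, 375, 378, 381, 382]
    print(invariant_positions(ALIGNMENT, CLADE))  # [372, 373, 374, 375, 377, 378, 379, 381, 382]
    print(Counter(residues_at(ALIGNMENT, 377)))   # Counter({'S': 8, 'T': 3, 'P': 1})
```

Restricting the check to the clade shows position 377 as invariant, whereas across all twelve organisms it drops out and tyrosine 374 remains.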
Analysis of SNP localization in conserved splice sites
The human GATA4 gene belongs to the 92–94% of human genes that undergo alternative splicing
(ref. 47; http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?exdb=AceView&db=36a&term=GATA4).
SNPs can influence splicing and can thereby alter disease susceptibility (ref. 48). To assess
whether rs13273672 and the 16 SNPs in complete LD with it (D′ = 1.0; Supplementary table 1) might
influence splicing, we checked whether they are listed as SNPs in conserved splice sites in the
Genome-Wide Splice-Site Single Nucleotide Polymorphism Database
(http://variome.kobic.re.kr/ssSNPTarget/). None of these 17 SNPs was among the 151 entries for
conserved splice sites listed for human chromosome 8.
Supplementary figure 1. Linkage disequilibrium at the GATA4 locus, based on HapMapIII data.
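The splice-site database check described above reduces to an intersection of rsID sets. In this minimal sketch the 17 rsIDs are those of Supplementary table 1, whereas the database entries are hypothetical placeholders, because the 151 chromosome 8 entries are not reproduced here; the function name is also hypothetical.

```python
# The 17 SNPs tested: rs13273672 plus the 16 SNPs in complete LD (Supplementary table 1).
LD_BLOCK_SNPS = {
    "rs7827193", "rs904006", "rs10503425", "rs11784693", "rs11250164",
    "rs804283", "rs17153747", "rs13262643", "rs13264774", "rs13273672",
    "rs804280", "rs3729851", "rs4841588", "rs3729856", "rs11785481",
    "rs17153785", "rs1065712",
}


def splice_site_hits(query_ids, splice_site_ids):
    """Return, sorted, the query SNPs that are listed as splice-site SNPs."""
    return sorted(set(query_ids) & set(splice_site_ids))


if __name__ == "__main__":
    # Hypothetical placeholder IDs (not real rsIDs); replace with the
    # chromosome 8 entries exported from the splice-site SNP database.
    chr8_splice_site_snps = {"rs0000001", "rs0000002"}
    print(len(LD_BLOCK_SNPS))  # 17
    print(splice_site_hits(LD_BLOCK_SNPS, chr8_splice_site_snps))  # []
```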
Supplementary figure 2. Alignment of partial GATA4 (NM_002052) orthologous sequences from human
and diverse genetic model organisms. Ser377Gly (rs3729856) affects codon 377 of the NM_002052
mRNA. Upper line: amino acid sequence; lower line: corresponding nucleotide sequence; in
parentheses: version of the whole-genome sequence assembly of the respective organism.
Position             372 373 374 375 376 377 378 379 380 381 382
Human (hg18)          S   H   Y   G   H   S   S   S   V   S   Q
                     TCT CAC TAC GGG CAC AGC AGC TCC GTG TCC CAG
Rhesus (rheMac2)      S   H   Y   G   H   S   S   S   V   S   Q
                     TCT CAC TAC GGG CAC AGC AGC TCC GTG TCC CAG
Mouse (mm8)           S   H   Y   G   H   S   S   S   M   S   Q
                     TCT CAC TAT GGG CAC AGC AGC TCC ATG TCC CAG
Rat (rn4)             S   H   Y   G   H   S   S   S   M   S   Q
                     TCT CAC TAT GGG CAC AGC AGC TCC ATG TCC CAG
Rabbit (oryCun1)      S   H   Y   G   H   S   S   S   M   S   Q
                     TCT CAC TAT GGC CAC AGC AGC TCC ATG TCC CAG
Dog (canFam2)         S   H   Y   G   H   S   S   S   M   S   Q
                     TCC CAC TAT GGG CAC AGC AGC TCC ATG TCC CAG
Cow (bosTau2)         S   H   Y   G   P   S   S   S   L   S   Q
                     TCC CAC TAT GGG CCC AGC AGC TCC CTG TCG CAG
Armadillo (dasNov1)   S   H   Y   G   H   T   S   P   L   S   Q
                     TCT CAC TAC GGG CAC ACC AGC CCC TTG TCC CAG
Tenrec (echTel1)      S   H   Y   G   H   T   S   P   M   S   Q
                     TCT CAC TAT GGG CAC ACC AGC CCC ATG TCC CAG
Opossum (monDom4)     S   H   Y   G   H   T   S   P   M   S   Q
                     TCC CAT TAT GGA CAT ACT AGC CCC ATG TCT CAG
Chicken (galGal2)     S   H   Y   G   H   P   S   P   I   S   Q
                     TCT CAT TAT GGG CAC CCC AGC CCA ATT TCT CAG
Frog (xenTro1)        P   P   Y   G   H   S   S   S   L   S   Q
                     CCT CCA TAT GGC CAT TCG AGT TCT CTA TCT CAG

Human codon 377: major allele AGC, serine (S); the rs3729856 minor allele G at the first codon
position gives GGC, glycine (Gly).
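The nucleotide rows of the alignment can be checked against the amino acid rows, and the effect of rs3729856 reproduced, with a short translation sketch. It assumes the standard genetic code and in-frame sequences starting at codon 372; the sequences are copied from Supplementary figure 2, while the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

FIRST_CODON = 372  # codon number of the first alignment column

# (nucleotide row, amino acid row) pairs from Supplementary figure 2.
SEQUENCES = {
    "human": ("TCTCACTACGGGCACAGCAGCTCCGTGTCCCAG", "SHYGHSSSVSQ"),
    "rhesus": ("TCTCACTACGGGCACAGCAGCTCCGTGTCCCAG", "SHYGHSSSVSQ"),
    "mouse": ("TCTCACTATGGGCACAGCAGCTCCATGTCCCAG", "SHYGHSSSMSQ"),
    "rat": ("TCTCACTATGGGCACAGCAGCTCCATGTCCCAG", "SHYGHSSSMSQ"),
    "rabbit": ("TCTCACTATGGCCACAGCAGCTCCATGTCCCAG", "SHYGHSSSMSQ"),
    "dog": ("TCCCACTATGGGCACAGCAGCTCCATGTCCCAG", "SHYGHSSSMSQ"),
    "cow": ("TCCCACTATGGGCCCAGCAGCTCCCTGTCGCAG", "SHYGPSSSLSQ"),
    "armadillo": ("TCTCACTACGGGCACACCAGCCCCTTGTCCCAG", "SHYGHTSPLSQ"),
    "tenrec": ("TCTCACTATGGGCACACCAGCCCCATGTCCCAG", "SHYGHTSPMSQ"),
    "opossum": ("TCCCATTATGGACATACTAGCCCCATGTCTCAG", "SHYGHTSPMSQ"),
    "chicken": ("TCTCATTATGGGCACCCCAGCCCAATTTCTCAG", "SHYGHPSPISQ"),
    "frog": ("CCTCCATATGGCCATTCGAGTTCTCTATCTCAG", "PPYGHSSSLSQ"),
}


def translate(nucleotides):
    """Translate an in-frame coding sequence codon by codon."""
    return "".join(CODON_TABLE[nucleotides[i:i + 3]]
                   for i in range(0, len(nucleotides) - 2, 3))


def substitute(nucleotides, codon_number, codon_offset, base):
    """Return the sequence with `base` at position `codon_offset` (0-2) of a codon."""
    index = (codon_number - FIRST_CODON) * 3 + codon_offset
    return nucleotides[:index] + base + nucleotides[index + 1:]


if __name__ == "__main__":
    for species, (nt, aa) in SEQUENCES.items():
        assert translate(nt) == aa, species  # each row translates to its amino acid row
    human_nt = SEQUENCES["human"][0]
    variant_nt = substitute(human_nt, 377, 0, "G")  # rs3729856: A>G, AGC -> GGC
    print(translate(human_nt))    # SHYGHSSSVSQ
    print(translate(variant_nt))  # SHYGHGSSVSQ
```

Only the first base of codon 377 changes, which is enough to turn serine into glycine (Ser377Gly).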
Supplementary table 1. SNPs in complete LD (D′ = 1) with rs13273672 in HapMapIII data: r² values,
distance to rs13273672 (bp) and localization according to gene structure. Only SNPs with a minor
allele frequency ≥ 0.05 are considered. The last three columns annotate functional features
overlapping the SNP positions (eQTL, expression quantitative trait locus; -, none).
Marker      Distance (bp)  r²     Localisation                            eQTL  Amino acid exchange  Splice-site SNP
rs7827193   15335          0.016  intron (GATA4 NM_002052.3)              -     -                    -
rs904006    8355           0.050  intron (GATA4 NM_002052.3)              -     -                    -
rs10503425  6017           0.069  intron (GATA4 NM_002052.3)              -     -                    -
rs11784693  1707           0.186  intron (GATA4 NM_002052.3)              -     -                    -
rs11250164  1695           0.184  intron (GATA4 NM_002052.3)              -     -                    -
rs804283    1338           0.184  intron (GATA4 NM_002052.3)              -     -                    -
rs17153747  1028           0.043  intron (GATA4 NM_002052.3)              -     -                    -
rs13262643  102            0.235  intron (GATA4 NM_002052.3)              -     -                    -
rs13264774  66             0.295  intron (GATA4 NM_002052.3)              -     -                    -
rs13273672  0              1.000  intron (GATA4 NM_002052.3)              -     -                    -
rs804280    317            0.297  intron (GATA4 NM_002052.3)              -     -                    -
rs3729851   461            0.066  intron (GATA4 NM_002052.3)              -     -                    -
rs4841588   1844           0.216  intron (GATA4 NM_002052.3)              -     -                    -
rs3729856   2194           0.043  coding sequence (GATA4 NM_002052.3)     -     Ser377Gly            -
rs11785481  4761           0.071  3′-UTR (GATA4 NM_002052.3)              -     -                    -
rs17153785  10484          0.034  intergenic                              -     -                    -
rs1065712   89741          0.021  3′-UTR (CTSB NM_001908.3, NM_147780.2)  -     -                    -
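Supplementary table 1 combines D′ = 1 with r² values far below 1. The sketch below uses the standard two-locus definitions (D = p_AB - p_A p_B, D′ = |D| / D_max, r² = D² / [p_A(1 - p_A) p_B(1 - p_B)]) to show how this arises when one of the four possible haplotypes is absent but the two allele frequencies differ. The function name and the haplotype frequencies are illustrative, not HapMapIII estimates.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b
    # D is normalized by its largest possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical frequencies: allele B (0.05) occurs only on haplotypes carrying
    # allele A (0.5), so the a-B haplotype is absent and D' = 1, yet r^2 is low.
    d, d_prime, r2 = ld_measures(p_ab=0.05, p_a=0.5, p_b=0.05)
    print(round(d_prime, 3), round(r2, 3))  # 1.0 0.053
```

D′ only records whether a haplotype is missing, whereas r² also penalizes unequal allele frequencies, which is why SNPs in complete LD with rs13273672 can still show r² values below 0.3.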