SUPPLEMENTAL METHODS

Analysis of expression quantitative trait loci (eQTL) at the GATA4 locus in silico

eQTL mapping is used to identify genetic variants that affect the regulation of genes, treating genome-wide mRNA expression levels as quantitative phenotypes. Expression phenotypes are mapped to genome-wide SNPs. Some SNPs that influence the mRNA level of a gene map to eQTLs in the proximal region of that gene, i.e. they are putatively cis-active. Less often, SNPs map to distal, putatively trans-active eQTLs [42]. The eQTL browser (http://eqtl.uchicago.edu) summarizes eQTL loci from four recent large-scale eQTL studies [14,43-45].

GATA4 amino acid position 377: test for functional constraint at the single amino acid site

Amino acid-changing mutations that have a functional impact at a single amino acid site and affect protein function are constantly being removed from the gene pool. Nonsynonymous (amino acid-changing, replacement) substitutions at an amino acid position change the primary protein sequence and are more likely to influence protein function than synonymous (silent) mutations, which leave the primary protein sequence unchanged. The degree of constraint on a single amino acid site can be estimated from an alignment of the human protein sequence with those of diverse genetic model organisms. The absence of nonsynonymous substitutions across such an alignment indicates functional constraint at that site.

In the GATA4 protein, serine (S) at amino acid position 377 (as well as the corresponding major human allele A at the nucleotide level) has remained unchanged across a wide monophyletic group [46] comprising human, rhesus, mouse, rat, rabbit, dog and cow, but is variable outside this clade (supplementary figure 2). Threonine (T) is present at position 377 in armadillo, tenrec and opossum; proline (P) is observed in the chicken and serine (S) in the frog.
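The constraint test described above amounts to asking whether a single alignment column is invariant within a given set of species. The following Python lines are a minimal sketch of that check, using only the residues at position 377 reported in the text. The names POSITION_377, CLADE, residues_observed and is_invariant are illustrative assumptions and are not part of the original analysis.

```python
# Residues at GATA4 position 377 as reported in the text (one-letter codes).
POSITION_377 = {
    "human": "S", "rhesus": "S", "mouse": "S", "rat": "S",
    "rabbit": "S", "dog": "S", "cow": "S",
    "armadillo": "T", "tenrec": "T", "opossum": "T",
    "chicken": "P", "frog": "S",
}

# Species of the monophyletic group in which serine is unchanged.
CLADE = ("human", "rhesus", "mouse", "rat", "rabbit", "dog", "cow")


def residues_observed(column, species=None):
    """Return the set of distinct residues in one alignment column.

    column:  mapping of species name to one-letter residue
    species: optional subset of species names; all species if omitted
    """
    names = column.keys() if species is None else species
    return {column[name] for name in names}


def is_invariant(column, species=None):
    """True if exactly one residue is observed among the chosen species."""
    return len(residues_observed(column, species)) == 1


if __name__ == "__main__":
    print("invariant within clade:", is_invariant(POSITION_377, CLADE))
    print("invariant across all species:", is_invariant(POSITION_377))
    print("residues observed:", sorted(residues_observed(POSITION_377)))
```

Restricting the check to the clade and then repeating it across all species reproduces the distinction drawn in the text between conservation within the group and variability outside it.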
In contrast to the weak conservation of serine (S) at position 377, some invariant amino acid positions that are presumably under stronger functional constraint are found nearby, e.g., tyrosine (Y) at position 374 (supplementary figure 2).

Analysis of SNP localization in conserved splice sites

The human GATA4 gene belongs to the 92–94% of human genes that undergo alternative splicing [47] (http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?exdb=AceView&db=36a&term=GATA4). SNPs can influence splicing and can lead to changes in disease susceptibility [48]. To analyse rs13273672 and the 16 SNPs in complete LD with it (D′ = 1.0; supplementary table 1) for a possible influence on splicing, we checked whether they are listed as SNPs in conserved splice sites in the Genome-Wide Splice-Site Single Nucleotide Polymorphism Database (http://variome.kobic.re.kr/ssSNPTarget/). None of these 17 SNPs was among the 151 entries in conserved splice sites listed for human chromosome 8.

Supplementary figure 1. Linkage disequilibrium at the GATA4 locus, based on HapMapIII data.

Supplementary figure 2. Alignment of partial GATA4 (NM_002052) orthologous sequences of human and diverse genetic model organisms. Ser377Gly (grey, rs3729856 in dark grey) is codon 377 in the NM_002052 mRNA. Upper line: amino acid sequence; lower line: corresponding nucleotide sequence; in brackets: version of whole-genome sequence assembly of the respective organism.
                     372 373 374 375 376 377 378 379 380 381 382
Human (hg18)          S   H   Y   G   H   S   S   S   V   S   Q
                     TCT CAC TAC GGG CAC AGC AGC TCC GTG TCC CAG
Rhesus (rheMac2)      S   H   Y   G   H   S   S   S   V   S   Q
                     TCT CAC TAC GGG CAC AGC AGC TCC GTG TCC CAG
Mouse (mm8)           S   H   Y   G   H   S   S   S   M   S   Q
                     TCT CAC TAT GGG CAC AGC AGC TCC ATG TCC CAG
Rat (rn4)             S   H   Y   G   H   S   S   S   M   S   Q
                     TCT CAC TAT GGG CAC AGC AGC TCC ATG TCC CAG
Rabbit (oryCun1)      S   H   Y   G   H   S   S   S   M   S   Q
                     TCT CAC TAT GGC CAC AGC AGC TCC ATG TCC CAG
Dog (canFam2)         S   H   Y   G   H   S   S   S   M   S   Q
                     TCC CAC TAT GGG CAC AGC AGC TCC ATG TCC CAG
Cow (bosTau2)         S   H   Y   G   P   S   S   S   L   S   Q
                     TCC CAC TAT GGG CCC AGC AGC TCC CTG TCG CAG
Armadillo (dasNov1)   S   H   Y   G   H   T   S   P   L   S   Q
                     TCT CAC TAC GGG CAC ACC AGC CCC TTG TCC CAG
Tenrec (echTel1)      S   H   Y   G   H   T   S   P   M   S   Q
                     TCT CAC TAT GGG CAC ACC AGC CCC ATG TCC CAG
Opossum (monDom4)     S   H   Y   G   H   T   S   P   M   S   Q
                     TCC CAT TAT GGA CAT ACT AGC CCC ATG TCT CAG
Chicken (galGal2)     S   H   Y   G   H   P   S   P   I   S   Q
                     TCT CAT TAT GGG CAC CCC AGC CCA ATT TCT CAG
Frog (xenTro1)        P   P   Y   G   H   S   S   S   L   S   Q
                     CCT CCA TAT GGC CAT TCG AGT TCT CTA TCT CAG

Position 377: human (hg18) serine (S), codon AGC; Ser377Gly (rs3729856) variant glycine (G), codon GGC.

Supplementary table 1. SNPs in complete LD (D′ = 1) with rs13273672 in HapMapIII data, r² values, their distance to rs13273672 and localization according to gene structure. Only SNPs with minor allele frequency >= 0.05 are regarded here. Annotation of functional features overlapping SNP positions.
Marker      Distance to rs13273672  r²     Localisation                            eQTL  Amino acid exchange  Splice site SNP
rs7827193   15335                   0.016  intron (GATA4 NM_002052.3)              -     -                    -
rs904006    8355                    0.050  intron (GATA4 NM_002052.3)              -     -                    -
rs10503425  6017                    0.069  intron (GATA4 NM_002052.3)              -     -                    -
rs11784693  1707                    0.186  intron (GATA4 NM_002052.3)              -     -                    -
rs11250164  1695                    0.184  intron (GATA4 NM_002052.3)              -     -                    -
rs804283    1338                    0.184  intron (GATA4 NM_002052.3)              -     -                    -
rs17153747  1028                    0.043  intron (GATA4 NM_002052.3)              -     -                    -
rs13262643  102                     0.235  intron (GATA4 NM_002052.3)              -     -                    -
rs13264774  66                      0.295  intron (GATA4 NM_002052.3)              -     -                    -
rs13273672  0                       1.000  intron (GATA4 NM_002052.3)              -     -                    -
rs804280    317                     0.297  intron (GATA4 NM_002052.3)              -     -                    -
rs3729851   461                     0.066  intron (GATA4 NM_002052.3)              -     -                    -
rs4841588   1844                    0.216  intron (GATA4 NM_002052.3)              -     -                    -
rs3729856   2194                    0.043  coding sequence (GATA4 NM_002052.3)     -     Ser377Gly            -
rs11785481  4761                    0.071  3′-UTR (GATA4 NM_002052.3)              -     -                    -
rs17153785  10484                   0.034  intergenic                              -     -                    -
rs1065712   89741                   0.021  3′-UTR (CTSB NM_001908.3, NM_147780.2)  -     -                    -

eQTL: expression quantitative trait locus.
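Supplementary table 1 lists SNPs with D′ = 1 but r² values as low as 0.016. Both measures derive from the same disequilibrium coefficient D but are normalised differently, so complete LD in terms of D′ is compatible with a low r² when allele frequencies differ. The following Python lines are a minimal sketch of the standard two-locus definitions. The function name ld_stats and the example frequencies are hypothetical and are not taken from the HapMapIII data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at the first
          locus and allele B at the second locus
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    Both loci are assumed to be polymorphic (0 < p_a, p_b < 1).
    """
    d = p_ab - p_a * p_b
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r2 scales D squared by the product of all four allele frequencies.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical example: a rare allele A (5%) that occurs only on
    # haplotypes carrying a common allele B (40%).
    d, d_prime, r2 = ld_stats(p_ab=0.05, p_a=0.05, p_b=0.40)
    print(f"D = {d:.3f}, D' = {d_prime:.3f}, r2 = {r2:.3f}")
```

In this hypothetical case only three of the four possible haplotypes occur, so D′ equals 1 while r² is about 0.08, the same pattern as the low r² values in the table.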