Supplementary Material
Positional cloning and next-generation sequencing identified a TGM6 mutation in a
large Chinese pedigree with acute myeloid leukaemia (AML)
Li-li Pan1, Yuan-mao Huang1, Min Wang1, Xiao-e Zhuang1, Dong-feng Luo1,
Shi-cheng Guo2, Zhi-shun Zhang3, Qing Huang1, Sheng-long Lin1, Shao-yuan Wang1
Li-li Pan and Yuan-mao Huang contributed equally to this work and should be
considered co-first authors.
Correspondence: Prof. Shaoyuan Wang, Fujian Institute of Hematology, Fujian
Provincial Key Laboratory on Hematology, Department of Hematology, Fujian
Medical University Union Hospital, 29 Xinquan Road, Fuzhou 350001, PR China.
E-mail: mdsy.wang@gmail.com
Materials and methods
Patients and materials
The criteria for “members at potential preleukaemic phases” were: 1) a history of
easy bruising and/or fatigue; 2) persistent unexplained cytopenia or neutrophil
leucocytosis; 3) hypercellular bone marrow with a mild increase in myeloblasts or
persistent mild dysplastic features; and 4) cytogenetic abnormalities. Such members
were consistent with the above items but could not be diagnosed with AML, CML, MDS
or MPN. They were re-examined at least twice and followed for several years.
Genotyping, linkage and haplotype analysis
Genome-wide linkage scans were performed on 13 members of the family (III-5, III-8,
III-13, III-14, III-15, IV-7, IV-10, IV-13, IV-18, IV-19, V-3 and the spouses of III-8 and
III-15) using the Affymetrix GeneChip Mapping Array 500K set. Non-Mendelian
error checking of genotypes and the generation of linkage format files from raw
Affymetrix array files were performed using ProgenyLab (Progeny, South Bend, IN).
MERLIN1 was then used to remove additional unlikely genotypes that were
consistent with potential genotyping errors. An in-house program was then used to
implement tag SNP selection based on the following filters: 1) NoCall,
Non-Mendelian and Mendelian error SNPs were removed; 2) alleles presenting an AA
or BB in all samples were also excluded; 3) minor allele frequencies below 0.01 were
excluded; and 4) the distance between two successive tags was required to be at least
0.5 cM to avoid linkage disequilibrium (LD). An additional step was performed after
the initial tag selection to further remove high-LD SNPs. We employed MERLIN
(v1.1.2) to perform multipoint linkage analysis using the non-parametric model and
dominant model, with a disease allele frequency of 0.0001 and penetrance of 0.99.
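The four tag SNP filters listed above can be sketched in Python as follows. This is only an illustration and not the in-house program itself: the `Snp` record, the function names, the decision to drop a SNP if any sample has a NoCall, and the estimation of minor allele frequency from the genotyped samples themselves are all assumptions made for the example. The additional high-LD pruning step is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Snp:
    """Hypothetical record for one SNP on one chromosome."""
    name: str
    cm: float                    # genetic map position in cM
    genotypes: Dict[str, str]    # sample id -> "AA", "AB", "BB" or "NoCall"
    mendel_error: bool = False   # flagged by upstream error checking


def minor_allele_freq(snp: Snp) -> float:
    """Minor allele frequency estimated from the called genotypes (assumption)."""
    alleles = "".join(g for g in snp.genotypes.values() if g != "NoCall")
    if not alleles:
        return 0.0
    freq_a = alleles.count("A") / len(alleles)
    return min(freq_a, 1.0 - freq_a)


def select_tag_snps(snps: List[Snp], min_maf: float = 0.01,
                    min_gap_cm: float = 0.5) -> List[Snp]:
    """Apply filters 1-4 in map order and return the retained tag SNPs."""
    tags: List[Snp] = []
    last_cm = None
    for snp in sorted(snps, key=lambda s: s.cm):
        calls = list(snp.genotypes.values())
        # 1) remove NoCall and error-flagged SNPs
        if snp.mendel_error or "NoCall" in calls:
            continue
        # 2) remove SNPs that are AA or BB in every sample
        if len(set(calls)) == 1 and calls[0] in ("AA", "BB"):
            continue
        # 3) remove SNPs with a minor allele frequency below the threshold
        if minor_allele_freq(snp) < min_maf:
            continue
        # 4) require at least min_gap_cm between successive tags
        if last_cm is not None and snp.cm - last_cm < min_gap_cm:
            continue
        tags.append(snp)
        last_cm = snp.cm
    return tags
```

Because the spacing filter is applied greedily in map order, the first SNP that passes filters 1 to 3 in each window is kept; a different tie-breaking rule (for example, preferring the most informative SNP) would be an equally plausible reading of the text.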
Targeted capture and 454 sequencing
We applied array-based sequence capture followed by 454 sequencing to
simultaneously analyse all genes in the region due to the large number of genes in the
linkage regions and the lack of obvious candidate genes. NimbleGen 385K
microarrays were produced to capture the critical region at 20p13 (7.8-13 cM) in two
affected individuals (III-15 and IV-19), including all exons, flanking intronic
sequences, untranslated regions (UTRs), microRNAs, and the highly conserved
regions of all candidate genes in this region. An additional 30 bp were added at each
end of all exons for the detection of splice-site mutations. Targets smaller than 250 bp
were enlarged by extending both ends of the region. Library construction was
completed according to the manufacturer’s instructions (GS FLX Titanium general
library preparation kit) (Roche 454 company, CT, USA).2 Pre- and post-captured
libraries were subjected to quantitative PCR to estimate the magnitude of enrichment.
Post-captured libraries were subjected to em-PCR (GS FLX Titanium LV or SV
emPCR Kits) and sequenced on the Genome Sequencer FLX platform (GS FLX
Titanium Sequencing Kit XLR70).
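The target-padding rules described above (30 bp added at each end of every exon, and short targets enlarged at both ends) can be illustrated with a small Python sketch. The function name, the half-open coordinate convention, and the assumptions that the 250 bp minimum is checked after the flanks are added and that short targets are grown to exactly 250 bp are all hypothetical; the text does not specify these details.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) coordinates, an assumed convention


def pad_targets(targets: List[Interval], flank: int = 30,
                min_size: int = 250) -> List[Interval]:
    """Add splice-site flanks, then symmetrically enlarge short targets."""
    padded: List[Interval] = []
    for start, end in targets:
        # add the flank at each end for splice-site coverage
        start, end = start - flank, end + flank
        size = end - start
        if size < min_size:
            # extend both ends so the target reaches min_size
            short = min_size - size
            start -= short // 2
            end += short - short // 2
        padded.append((max(start, 0), end))
    return padded
```

For example, a 100 bp exon at `(1000, 1100)` becomes 160 bp after flanking and is then extended by 45 bp on each side to `(925, 1175)`.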
Whole exome sequencing (WES)
We also performed WES on the two patients plus a normal family member to fully
explore the candidate exonic mutations. Briefly, 15 µg of genomic DNA from each
sample was enriched for the target region of the consensus coding sequence (CCDS)
exons with NimbleGen 2.1M human exome array and subsequently sequenced on the
Illumina Hiseq2000 platform following the manufacturer’s instructions (Illumina,
CA).3 The raw data were mapped to the human genome reference sequence (hg19)
with Burrows-Wheeler Aligner (BWA).4 Single nucleotide variants (SNVs) and short
insertions/deletions (InDels) were detected with SOAPsnp5 and SAMtools,6
respectively. Low-quality variants were then filtered out, retaining only those that
met the following criteria: (i) quality score ≥20 (Q20); (ii) average copy number at
the allele site ≤2; (iii) distance between two adjacent SNPs ≥5 bp; and (iv) sequencing
depth ≥4 and ≤1000. We then used ANNOVAR7 to annotate the high-confidence variants.
Variants within the linkage region at 20p13 were selected for downstream analysis.
SNPs present in dbSNP135 or the 1000 Genomes Project database (2011) were removed.
The remaining variants that were shared by the two patient samples but absent in
the normal control were selected and classified by genomic context: exonic, intronic,
intergenic, splicing, ncRNA, 5’UTR or 3’UTR.
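The quality criteria (i) to (iv) in this section can be expressed as a short Python sketch. The `Variant` record and function names are hypothetical, and two details are assumptions rather than statements from the methods: adjacency is evaluated among variants that already pass the per-site criteria, and both members of a pair lying closer than 5 bp are discarded.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    """Hypothetical per-site summary of a called variant."""
    chrom: str
    pos: int
    quality: float       # consensus quality score
    copy_number: float   # average copy number at the allele site
    depth: int           # sequencing depth


def passes_site_filters(v: Variant) -> bool:
    """Criteria (i), (ii) and (iv): quality, copy number and depth."""
    return v.quality >= 20 and v.copy_number <= 2 and 4 <= v.depth <= 1000


def filter_variants(variants: List[Variant], min_spacing: int = 5) -> List[Variant]:
    """Apply the per-site criteria, then criterion (iii) on neighbour spacing."""
    kept = sorted((v for v in variants if passes_site_filters(v)),
                  key=lambda v: (v.chrom, v.pos))
    result: List[Variant] = []
    for i, v in enumerate(kept):
        prev_close = (i > 0 and kept[i - 1].chrom == v.chrom
                      and v.pos - kept[i - 1].pos < min_spacing)
        next_close = (i + 1 < len(kept) and kept[i + 1].chrom == v.chrom
                      and kept[i + 1].pos - v.pos < min_spacing)
        # drop both members of any pair closer than min_spacing (assumption)
        if not (prev_close or next_close):
            result.append(v)
    return result
```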
Molecular modelling
The 3D molecular models of TGM6 were built using homology modelling. Templates
for modelling included human transglutaminase 3 (1L9M) and transglutaminase 2
(2Q3Z), which were downloaded from the RCSB database (http://www.rcsb.org/).
Templates and target sequences were aligned using Promals3D8 under the default
settings. Molecular models were generated using the Modeller program (version 9.11;
released September 6, 2012)9 and viewed in PyMOL (http://pymol.sourceforge.net).
Supplementary Table 1. Clinical Presentation of Patients in the Chinese AML
Pedigree.
Supplementary Table 2. Sequencing data within the linkage region on 20p13. A = III-15;
B = IV-19; C = III-15's wife; A+B = the common variants that were shared by patients
III-15 and IV-19; A+B-C = variants present in A+B but absent in C.
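The A+B and A+B-C categories in the legend of Supplementary Table 2 correspond to simple set operations. A minimal Python sketch follows, assuming each variant is keyed by a hypothetical `(chromosome, position, reference allele, alternate allele)` tuple; the function name and the key format are illustrative only.

```python
from typing import Iterable, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt), an assumed key


def shared_and_private(a: Iterable[VariantKey], b: Iterable[VariantKey],
                       c: Iterable[VariantKey]):
    """Return (A+B, A+B-C): variants shared by A and B, and those also absent in C."""
    shared = set(a) & set(b)      # A+B: common to both patients
    private = shared - set(c)     # A+B-C: shared by patients, absent in the control
    return shared, private
```

Keying on the alternate allele as well as the position means that two samples carrying different alleles at the same site are not counted as sharing a variant.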
Supplementary Table 3. 36 HCDiffs shared by the two patients in the family.
Supplementary Table 4. 4 variants shared by the two patients in the family.
Supplementary Table 5. Information about the markers and genes located at the
significant LOD score peak of chr20. LOD_mpt represents the score of multipoint
linkage analysis under the dominant model. Associated_Genes represents the genes
located at the position of the markers.
Supplementary Figure 1. LOD plots for chromosome 18. The x-axis represents genetic
distance (cM), and the y-axis represents the corresponding LOD score. The LOD score
peak was located within a region ranging from 91.46 to 97.06 cM (66127086-69342671)
with an average HLOD score of 1.56 (average p = 0.0074). The LOD score was
significantly higher under the dominant model than under the non-parametric model,
which supported dominant transmission of the disease in the family studied.
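The caption does not state how its p-value was obtained. As a hedged illustration, one common asymptotic conversion treats 2·ln(10)·LOD as a chi-square statistic with one degree of freedom; the Python sketch below implements only that convention. The function name is hypothetical, some authors halve the result for a one-sided test, and the null distribution of an HLOD is more complex than this approximation assumes.

```python
import math


def lod_to_pvalue(lod: float) -> float:
    """Pointwise p-value from a LOD score via a 1-df chi-square approximation."""
    if lod < 0:
        raise ValueError("LOD score must be non-negative for this conversion")
    chi2 = 2.0 * math.log(10.0) * lod
    # survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))
```

Under this convention a LOD score of 1.56 maps to a p-value on the order of 0.007; this is an observation about the approximation, not a reconstruction of the authors' calculation.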
Supplementary Figure 2. The 3D structure and β-barrel 1 domain of TGM6. A. The
3D structure of the compact, inactive form of TGM6 is shown using a ribbon model in
which the TG active site is buried. The NH2 and COOH termini are labelled. The four
domains (β-sandwich, residues 3-136; catalytic core, residues 137-462; β-barrel 1,
residues 494-605; β-barrel 2, residues 606-706)10,11 are depicted in different colours.
The flexible
loop that connects the catalytic core with the β-barrel 1 domain is shown in blue. The
Leu517 residue is depicted in space-filling style, and the GDP-binding pocket is
shown by an arrow. B. The 3D structure of the extended, active form of TGM6 is
shown using a ribbon model in which the TG active site is exposed. C. The β-barrel 1
domain of TGM6. Residue L517 is shown in a sphere model. The GDP-binding
pocket that is inferred from the GDP-bound TG2 structure is shown in a
semitransparent surface model in which the key residues are depicted as stick models.
D. The β-barrel 1 domain of TGM6 in which the L517 residue is substituted by W517.
REFERENCES
1
Abecasis, G. R., Cherny, S. S., Cookson, W. O. & Cardon, L. R. Merlin--rapid
analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30,
97-101, doi:10.1038/ng786 (2002).
2
Rehman, A. U., Morell, R. J., Belyantseva, I. A. et al. Targeted capture and
next-generation sequencing identifies C9orf75, encoding taperin, as the
mutated gene in nonsyndromic deafness DFNB79. Am J Hum Genet 86,
378-388, doi:10.1016/j.ajhg.2010.01.030 (2010).
3
Wang, J. L., Yang, X., Xia, K. et al. TGM6 identified as a novel causative
gene of spinocerebellar ataxias using exome sequencing. Brain 133,
3510-3518, doi:10.1093/brain/awq323 (2010).
4
Sunyaev, S., Ramensky, V., Koch, I., Lathe, W., 3rd, Kondrashov, A. S. &
Bork, P. Prediction of deleterious human alleles. Hum Mol Genet 10, 591-597
(2001).
5
Li, R., Li, Y., Fang, X., Yang, H., Wang, J. & Kristiansen, K. SNP detection for
massively parallel whole-genome resequencing. Genome Res 19, 1124-1132,
doi:10.1101/gr.088013.108 (2009).
6
Li, H., Handsaker, B., Wysoker, A. et al. The Sequence Alignment/Map format
and SAMtools. Bioinformatics 25, 2078-2079,
doi:10.1093/bioinformatics/btp352 (2009).
7
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of
genetic variants from high-throughput sequencing data. Nucleic Acids Res 38,
e164, doi:10.1093/nar/gkq603 (2010).
8
Pei, J., Kim, B. H. & Grishin, N. V. PROMALS3D: a tool for multiple protein
sequence and structure alignments. Nucleic Acids Res 36, 2295-2300,
doi:10.1093/nar/gkn072 (2008).
9
Eswar, N., Webb, B., Marti-Renom, M. A. et al. Comparative protein structure
modeling using MODELLER. Current protocols in protein science / editorial
board, John E. Coligan ... [et al.] Chapter 2, Unit 2.9,
doi:10.1002/0471140864.ps0209s50 (2007).
10
Iismaa, S. E., Mearns, B. M., Lorand, L. & Graham, R. M. Transglutaminases
and disease: lessons from genetically engineered mouse models and inherited
disorders. Physiol Rev 89, 991-1023, doi:10.1152/physrev.00044.2008 (2009).
11
Thomas, H., Beck, K., Adamczyk, M. et al. Transglutaminase 6: a protein
associated with central nervous system development and motor function.
Amino Acids, doi:10.1007/s00726-011-1091-z (2011).