Identification and characterization of copy number variation in Indian population and its association with disease
Pankaj Kumar
CAS-MPG Presentation, 07 May 2012

Introduction
CNVs are:
- variations in the number of copies of genomic regions
- insertions, deletions or duplications
- segments ranging in size from > 1 kb to several Mb

CNV vs. SNP
                         CNV       SNP
Total number             38,406    14,708,752
% of reference genome    29.74%    <1%

Introduction contd..
[Figure: schematic of CNV origin and types (duplication and deletion of segments A to E). Labels: Frequency, Occurrence, Polymorphism, Phenotypic Variability, Mutation, Disease Susceptibility]

Introduction contd..
Consequences of CNVs
- Unmask recessive alleles
- Alter regulation
- Disrupt genes
- Cumulative effects
Scherer et al. Nature Reviews Genetics 2006

Objectives:
1. To identify CNVs in diverse Indian populations
2. To map CNV regions with disease susceptibility
3. To study the consequences of CNVs in disease
4. To explore the role of CNVs in Spinocerebellar Ataxia

CNV & Diseases
Proof-of-concept study

APOBEC3B: insertion/deletion polymorphism
- Cytidine deaminase family of proteins
- 29 kb insertion/deletion polymorphism
Kidd et al. PLoS Genetics, 2007

Spectrum of APOBEC3B deletion frequency in Indian populations studied
[Figure]

APOBEC3B insertion/deletion polymorphism & malaria endemicity
[Figure: white = insertion, dark = deletion]

Significant association of APOBEC3B with falciparum malaria
Comparisons (Fisher's test)
Malaria cohort  Comparison               Genotypes      Odds ratio (95% CI)     P value
Endemic         Non-severe vs. control   AB & AA        7.11 (3.20 to 15.97)    1 × 10^-7
Endemic         Severe vs. control       AB & AA        8.13 (2.62 to 26.59)    1.7 × 10^-5
Endemic         Severe vs. non-severe    AB & AA        1.14 (0.37 to 3.81)     0.8
Non-endemic     Severe vs. control       AB & AA        0.39 (0.16 to 0.93)     0.0211
Non-endemic     Severe vs. control       BB & AB        6.44 (1.76 to 24.99)    0.0012
Non-endemic     Severe vs. control       BB & (AA+AB)   3.17 (1.10 to 10.32)    0.0177
A = insertion allele, B = deletion allele
The insertion allele of APOBEC3B seems to be protective against malaria

Positive Selection for APOBEC3B locus in Malaria
[Figure: markers spanning 500 kb upstream (5') and 500 kb downstream (3') of APOBEC3B]
EHH and Haplotype Analysis
Positive selection?
Haplotype-based analysis for larger linkage disequilibrium
[Figure panels: Endemic case, Non-endemic case, Endemic control, Non-endemic control]
Selection for the APOBEC3B region has not been observed in malaria

Schematic representation of APOBEC gene cluster and segmental duplication region
[Figure: segmental duplication regions]
Because of the large number of segmental duplication regions in this locus, selection for APOBEC3B was not observed

Conclusions
• The insertion allele of APOBEC3B seems to be protective against malaria
• The APOBEC3B locus has not shown a signature of positive selection by conventional methods, possibly because of frequent recombination events
• Since this gene is expressed in liver and spleen, this might provide a new mechanism of host protective response

Identification of CNVs in the Indian population
A basal database

Identification of large CNVs (>100 kb) in the Indian population: Methodology
Sampling of IGV populations
Affy 50k array (~58,000 SNPs with av.
inter-marker distance 50 kb)
Raw intensity files
CNV calling and QC (Genotyping Console + SVS7)
Retrieve segments >100 kb in length with a minimum of 10 probes using GConsole
Validation using Sequenom MassARRAY QGE assay
477 samples, 26 populations
[Figure: map of sampled populations grouped into Clusters 1 to 5. Population labels: IE-N-LP5, TB-N-SP1, TB-N-IP1, IE-N-LP9, IE-N-LP1, IE-N-LP18, IE-N-IP2, IE-W-IP2, IE-N-SP4, IE-W-LP3, IE-N-LP10, IE-NE-IP1, TB-NE-LP1, IE-E-IP1, AA-C-IP5, IE-W-LP4, OG-W-IP, IE-NE-LP1, AA-NE-IP1, DR-C-IP2, IE-W-LP1, IE-E-LP2, IE-E-LP4, AA-E-IP3, IE-W-LP2, DR-S-LP, DR-S-LP, DR-S-LP3]

Results
Instances of genomic segments prone to CNVs
Raw CNV deletions = 70,174 (<1 Mb segment size) and 212 (>1 Mb segment size)
Raw CNV duplications = 73,580 (<1 Mb segment size) and 60 (>1 Mb segment size)
Total CNVR deletions = 1,425
Total CNVR duplications = 1,337

Results contd..
Extent of CNVs in IGV populations
[Figure: chromosomal landscape of common CNV regions in all the populations pooled together]

Results contd..
Concordance of dataset using two independent algorithms
[Figure: pie charts of CNVR counts per software, slices labelled Deletion and Duplication]
GTC 3.0.2: 1006 (11%), 5750 (65%), 2048 (23%)
SVS 7: 1515 (25%), 2986 (50%), 1461 (25%)
~60% of copy number variable regions showed both deletion and duplication
Comparison of the two software packages showed 50% concordance

CNV Validation and Heterogeneity
Results contd..
Validation using Sequenom MassARRAY QGE
Validation rate was lower because of heterogeneity in CNV boundaries
Selection of probes for validation is also a key factor
[Figure panels: Deletion, Amplification]

CNVs and Population Structure
Results contd..
TB populations and isolated Himalayan populations
AA and DR isolated populations
IE large populations
Populations clustered according to genetic and linguistic affinity

CNVs present in IGV map to genes that are associated with diseases
SN  GENE_SYMBOL              Class             Disorder name
1   KDR                      Cancer            Hemangioma, capillary infantile, somatic
2   IRF4                     Cancer            Multiple myeloma
3   BRAF                     Cancer            Adenocarcinoma of lung, somatic
4   KCNE2                    Cardiovascular    Atrial fibrillation, familial; Long QT syndrome-6
5   AGT, AGTR1               Cardiovascular    Hypertension, essential; Renal tubular dysgenesis
6   ADRB1                    Cardiovascular    Congestive heart failure, susceptibility to; Resting heart rate
7   KRT6A                    Dermatological    Pachyonychia congenita, Jadassohn-Lewandowsky type
8   GTF2H5                   Dermatological    Trichothiodystrophy, complementation group A
9   PRSS2                    Gastrointestinal  Pancreatitis, chronic
10  IL23R                    Gastrointestinal  Crohn disease
11  ABCG5                    Metabolic         Sitosterolemia
12  HGD                      Metabolic         Alkaptonuria
13  PPM2C                    Metabolic         Pyruvate dehydrogenase phosphatase deficiency
14  A2M, APP                 Neurological      Alzheimer disease, susceptibility to; Emphysema due to alpha-2-macroglobulin deficiency
15  ATXN8OS                  Neurological      Spinocerebellar ataxia 8
16  ATXN1                    Neurological      Spinocerebellar ataxia-1
17  PRKCH                    Neurological      Cerebral infarction
18  BFSP1                    Ophthalmological  Cataract, cortical, juvenile-onset
19  HTRA1                    Ophthalmological  Macular degeneration, age-related, 7; Macular degeneration, age-related, neovascular type
20  HMCN1                    Ophthalmological  Macular degeneration, age-related, 1; Posterior column ataxia with retinitis pigmentosa
21  PTGDR, IL12B, HNMT, PTGER2  Respiratory    Asthma

Conclusions
• CNVs accounted for 0.05% to 1.46% of the genome per individual
• A set of genes encompassed in CNVRs are novel and not reported in DGV (Database of Genomic Variants).
• Validation of individual CNVs showed substantial heterogeneity in the boundaries of CNVs within a gene.
• CNVs can be shared between genetically related populations
• Basal data for genomic regions prone to CNVs in the Indian population
• CNV regions predispose to many diseases in Indian populations.

Role of CNVs as a genetic modifier in SCA12 phenotype

Investigating the involvement of CNV in sub-phenotypes of SCA12
SCA12
- Neurodegenerative disorder
- CAG repeat expansion in the 5' UTR region of the PPP2R2B gene
- Two distinct sub-phenotypes have been observed: tremor dominant and gait dominant
Could CNV be involved?

Workflow of CNV Identification
Samples: IE large populations; SCA12 (CAG repeat in PPP2R2B), 10 index cases of gait and 14 index cases of tremor
1. Affymetrix 6.0 SNP array
2. Data QC
3. CNV calling (PennCNV)
4. Gene annotation
5. Validation (real-time method)
6. Functional annotation clustering

Copy number state distribution in SCA12 and IE population
CN state  Count in SCA12  Count in IE
0         987             389
1         2697            1226
3         257             465
4         158             257

Case control association analysis between gait and tremor groups
Chr   CNV start  CNV end    Size (kb)  Genes      Gait Del  Gait Dup  HT Del  HT Dup  p value  Odds ratio (OR)
chr1  105820728  105823898  3.17       Non genic  1         4         2       0       0.0172   Inf
chr1  105609464  105641628  32.11      Non genic  6         1         1       1       0.0044   25.1442
chr5  32142841   32208250   51         GOLPH3     0         5         0       0       0.0048   Inf

Amplification of chr5p13.3 region in Gait Ataxia
GOLPH3 amplification: 5/8 of gait samples, 0/14 of HT samples
Real-time validation

GOLPH3 (golgi phosphoprotein 3 (coat-protein))
- A Golgi-localized protein
- Has a regulatory role in Golgi trafficking
- Identified as a potent oncogene that modulates mTOR signaling

Inhibition of mTOR induces autophagy and reduces toxicity of polyglutamine expansions in fly and mouse models of Huntington disease
Brinda Ravikumar et al. Nature Genetics (2004)
Autophagy induction reduces mutant ataxin-3 levels and toxicity in a mouse model of spinocerebellar ataxia type 3
Fiona M. Menzies et al. Brain (2009)

Functional annotation clustering of genes under CNV specific to SCA12
Term                                                   Count  %      P value   Bonferroni  Benjamini  Fold enrichment
GO:0005216~ion channel activity                        18     6.593  3.74E-05  0.0172      0.0172     3.2549
GO:0022838~substrate specific channel activity         18     6.593  5.48E-05  0.0252      0.0084     3.1568
GO:0015267~channel activity                            18     6.593  8.39E-05  0.0383      0.0097     3.0495
GO:0022803~passive transmembrane transporter activity  18     6.593  8.64E-05  0.0394      0.0080     3.0421
Significant enrichment of ion channel activity processes in SCA12

A multigene enrichment analysis for dissection of biological system
[Figure panels: Biological process, Molecular functions, Cellular components]
CNVs in ion channel genes and their involvement in different biological, molecular and cellular functions suggest physiological impairment in SCA12

Future direction

Conclusions
• Although SCA12 is a monogenic disorder, phenotypic variability could be due to other genetic factors.
• GOLPH3, which is amplified in gait-dominant cases, could be a modifier gene that leads to the gait ataxia feature.
• GOLPH3 influences the autophagy pathway through the mTOR pathway, which ultimately affects autophagic clearance (autophagolysis) of inclusion bodies.
• GOLPH3 could be a good intervention target for SCA12 pathogenesis.
• The implication of ion channel genes in different neurological diseases suggests physiochemical abnormalities in SCA12.

Conclusion of my PhD work
"Any two individual genomes taken from nature, in any species, will have dozens to hundreds of differences in their total number of functional genes." [Daniel R. Schrider and Matthew W. Hahn, Proc. R. Soc. B; 2010]
In conclusion, our genome is far from static, and CNVs could play an important role in the dynamics of the genome, facilitating evolution, adaptation and selection in populations and contributing to disease through dosage effects of functional genes/regions.
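The gait versus tremor comparison for GOLPH3 (amplification in 5/8 gait samples and 0/14 HT samples) can be illustrated with a two-sided Fisher's exact test. This is only a sketch: the slides do not say which test or which denominators produced the reported p value of 0.0048, so the value computed here need not match it. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first group,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Assumed layout: rows = gait, HT; columns = amplified, not amplified.
p_value = fisher_exact_two_sided(5, 3, 0, 14)
print(f"GOLPH3 amplification, gait vs HT: p = {p_value:.4g}")
```

Because no HT sample carries the amplification, the sample odds ratio is infinite, which agrees with the "Inf" entry in the association table.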
Publications
Jha P, Sinha S, Kanchan K, Qidwai T, Narang A, Singh PK, Pati SS, Mohanty S, Mishra SK, Sharma SK, Awasthi S, Venkatesh V, Jain S, Basu A, Xu S; Indian Genome Variation Consortium, Mukerji M, Habib S. Deletion of the APOBEC3B gene strongly impacts susceptibility to falciparum malaria. Infect Genet Evol. 2012 Jan;12(1):142-8.
Datta S, Chowdhury A, Ghosh M, Das K, Jha P, Colah R, Mukerji M, Majumder PP. A Genome-Wide Search for Non-UGT1A1 Markers Associated with Unconjugated Bilirubin Level Reveals Significant Association with a Polymorphic Marker Near a Gene of the Nucleoporin Family. Ann Hum Genet. 2012 Jan;76(1):33-41.
Abhimanyu, Indian Genome Variation Consortium, Jha P, Bose M. Footprints of genetic susceptibility to pulmonary tuberculosis: Cytokine gene variants in north Indians. Indian J Med Res. 2011 (accepted).
Lall M, Thakur S, Puri R, Verma I, Mukerji M, Jha P. A 54 Mb 11qter duplication and 0.9 Mb 1q44 deletion in a child with laryngomalacia and agenesis of corpus callosum. Mol Cytogenet. 2011 Sep 21;4:19.
Gautam P*, Jha P*, Kumar D, Tyagi S, Varma B, Dash D, Mukhopadhyay A; Indian Genome Variation Consortium, Mukerji M. Spectrum of large copy number variations in 26 diverse Indian populations: potential involvement in phenotypic diversity. Hum Genet. 2011 Jul 9. * Equal contributing authors.
Narang A*, Jha P*, Rawat V, Mukhopadhyay A, Dash D, Basu A, Mukerji M. Recent admixture in an Indian population of African ancestry. Am J Hum Genet. 2011 Jul 5. * Equal contributing authors.
Jha P, Suri V, Sharma V, Singh G, Sharma MC, Pathak P, Chosdol K, Jha P, Suri A, Mahapatra AK, Kale SS, Sarkar C. IDH1 mutations in gliomas: First series from a tertiary care centre in India with comprehensive review of literature. Exp Mol Pathol. 2011 May 3;91(1):385-393.
Abhimanyu, Jha P, Jain A, Arora K, Bose M. Genetic association study suggests a role for SP110 variants in lymph node tuberculosis but not pulmonary tuberculosis in north Indians. Hum Immunol. 2011 Apr 20.
Abhimanyu, Mangangcha IR, Jha P, Arora K, Mukerji M, Banavaliker JN, Indian Genome Variation Consortium, Brahmachari V, Bose M. Differential serum cytokine levels are associated with cytokine gene polymorphisms in north Indian populations with active pulmonary tuberculosis. Infect Genet Evol. 2011 Apr 1.
Jha P, Suri V, Jain A, Sharma MC, Pathak P, Jha P, Srivastava A, Suri A, Gupta D, Chosdol K, Chattopadhyay P, Sarkar C. O6-methylguanine DNA methyltransferase gene promoter methylation status in gliomas and its correlation with other molecular alterations: first Indian report with review of challenges for use in customized treatment. Neurosurgery. 2010 Dec;67(6):1681-91.
Jha P, Jha P, Pathak P, Chosdol K, Suri V, Sharma MC, Kumar G, Singh M, Mahapatra AK, Sarkar C. TP53 polymorphisms in gliomas from Indian patients: Study of codon 72 genotype, rs1642785, rs1800370, and 16 base pair insertion in intron-3. Exp Mol Pathol. 2011 Apr;90(2):167-72. Epub 2010 Nov 27.
Aggarwal S, Negi S, Jha P, Singh PK, Stobdan T, Pasha MA, Ghosh S, Agrawal A; Indian Genome Variation Consortium, Prasher B, Mukerji M. EGLN1 involvement in high-altitude adaptation revealed through genetic analysis of extreme constitution types defined in Ayurveda. Proc Natl Acad Sci U S A. 2010 Nov 2;107(44):18961-6.
HUGO Pan-Asian SNP Consortium. Mapping human genetic diversity in Asia. Science. 2009 Dec 11;326(5959):1541-5.
Indian Genome Variation Consortium. Genetic landscape of the people of India: a canvas for disease gene exploration. J Genet. 2008 Apr;87(1):3-20.

Acknowledgements
CSIR
TCGA for Genotyping Facility
Indian Genome Variation Consortium

Thank you

Extra slides

Copy Number Variation in Indian Population
1. 547 healthy individuals from 26 reference populations of the Indian Genome Variation Consortium
2. Affymetrix 50k Xba 240 array (raw intensity files)
3. Genotype QC: reference samples (30), test samples (447)
4. CNV calling and QC (Genotyping Console + SVS7): ≥ 10 probes, ≥ 100 kb segment
5. Downstream analysis:
   - Common CNV (> 5% of samples)
   - Validation using Sequenom MassARRAY QGE assay (a subset of 12 genes)
   - Rare CNV (< 5% of samples)
   - Functional enrichment analysis
   - Mapping with disease-associated regions

Test for HWE
Group                Ins Homo  Heterozygote  Del Homo  HWE test p value
Endemic case         29        41            3         0.018
Endemic control      64        18            0         0.586
Non-endemic case     56        11            17        7.95 × 10^-9
Non-endemic control  51        25            5         0.508
Endemic case: too many heterozygotes. Non-endemic case: loss of too many heterozygotes.
After data quality control for genotyping error and population stratification, HWD (Hardy-Weinberg disequilibrium) generally indicates some kind of natural selection

Future direction
SCA12 modifier genes
GOLPH3 -> mTOR pathway -> Autophagy
Amplification -> Induction of mTOR pathway -> Autophagy inhibition -> Aggregate formation -> Neurodegeneration
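
The Hardy-Weinberg check in the "Test for HWE" slide can be sketched with a 1-degree-of-freedom chi-square goodness-of-fit test on the genotype counts. The slide does not say which HWE test was used, so the chi-square form here is an assumption, and the p values it gives need not equal the reported ones (an exact test would differ, especially for small cells). The function name `hwe_chi_square` is hypothetical.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi2, p_value) for a 1-df chi-square test of HWE proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A (insertion)
    q = 1.0 - p                      # frequency of allele B (deletion)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2.0))


# Genotype counts from the slide: (Ins Homo, Heterozygote, Del Homo).
groups = {
    "Endemic case": (29, 41, 3),
    "Endemic control": (64, 18, 0),
    "Non-endemic case": (56, 11, 17),
    "Non-endemic control": (51, 25, 5),
}
for name, counts in groups.items():
    chi2, p_value = hwe_chi_square(*counts)
    print(f"{name}: chi2 = {chi2:.2f}, p = {p_value:.3g}")
```

Under this approximation the two case groups still deviate from HWE while the two control groups do not, which is the same qualitative pattern as in the table.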