International Journal of Advancements in Research & Technology, Volume 3, Issue 3, March-2014, ISSN 2278-7763

Bioinformatics based confirmatory test for identification of Disease Putative Genes*

BIKRAM NAYAK (Author)
Email: vn.jobdb@gmail.com

ABSTRACT
In this paper, several bioinformatics-based approaches are used to obtain a confirmatory classification of genes on mouse chromosome 11, in the region from 69 Mb to 104 Mb, as lethal or viable and as duplicate or singleton, and to examine how their location determines their properties. The disease lethal (DL) genes found within this region are AK144590, AL591436, X63190, DQ832277, AL591436, X51983, X07750, X07751, X07752, BC046795, AL590963, CH466556, AK078233, AL590963, CH466556, AK078233, AL590963, CH466556. The gene with MGI ID 2448712 is not available in GeneTrap, nor are the 7 genes AF465352, AK039558, AK170258, BC052502, BC052734, CH466596 and AL845465, which have GO ID 0005737, available in the Gene Ontology, so these are classed as disease unknown genes. No matching record for MGI:2137026 is available at GeneTrap either. Genes with GO ID 0005887 are located in the plasma membrane, so these are grouped as disease viable (DV) genes, and they have very few edges (no more than 1 or 2) in the PPI network. The Max binding protein gene (ID 109150, start 74644422, end 74659227, Entrez Gene ID 17428) is classed as a disease lethal gene: its GO ID 0005634 indicates that it is located in the nucleus, it has tumorigenic properties, it is listed under adenocarcinoma in the MeSH dictionary, and it has more than 5 edges connected to different hubs. All of the genes AK144590, AL591436, X63190, DQ832277, AL591436, X51983, X07750, X07751, X07752, BC046795, AL590963, CH466556, AK078233, AL590963, CH466556, AK078233, AL590963, CH466556, with their start and end positions, were searched with megablast against human and mouse, and an E-value threshold of 10^-20 was used to highlight duplicates between human and mouse.
Keywords: mutagenesis, duplicacy, lethality, viability, PPI network, MeSH dictionary, E-value, FDR

1 INTRODUCTION
Generally, disease genes are of two types: essential and non-essential. An essential gene is one that is necessary for the organism's survival. A disease gene is essential if the knockout of its mouse ortholog confers lethality, and non-essential if the mouse knockout is viable at birth. If no mouse knockout data are available, the gene is treated as a disease unknown gene. Essential disease genes are therefore termed disease lethal (DL) genes and non-essential disease genes are termed disease viable (DV) genes.

2. Procedure: Tools and methods for accessing wet lab datasets
When we collected the mouse genes from the following URL (http://www.mouse-genome.bcm.tmc.edu/Bioinformatics/MouseGeneSearchList2.asp) by filling in the submission form, we obtained 793 known and unknown loci, from gene ID O08826 to the Mapt gene.

Copyright © 2014 SciResPub.

Materials and Methods:
The human-mouse orthology and protein-coding gene data from Mouse Genome Informatics (http://www.informatics.jax.org) were obtained through BioMart. BioMart is a simple and robust data integration system for large-scale data querying and data warehouse extraction. These data are an appropriate proxy for gene essentiality in humans and are referred to here as viable and lethal.
http://biomart.informatics.jax.org/biomart/martview/28d343acfd5d3bf0896340a4965d54a9
If the same gene ID was returned with 4 different transcript IDs, it was assumed that the gene has 4 predicted transcripts.
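The transcript-counting assumption above can be sketched as follows. This is only an illustration, assuming the BioMart export has been parsed into (gene ID, transcript ID) pairs; the function name and the IDs in the example are hypothetical placeholders, not values from the dataset.

```python
from collections import defaultdict


def transcripts_per_gene(rows):
    """Count distinct transcript IDs for each gene ID.

    rows: iterable of (gene_id, transcript_id) pairs, e.g. parsed from a
    tab-separated BioMart export. Repeated pairs are counted once, because
    an export can repeat a transcript on several rows (one per GO term).
    """
    seen = defaultdict(set)
    for gene_id, transcript_id in rows:
        seen[gene_id].add(transcript_id)
    return {gene_id: len(ids) for gene_id, ids in seen.items()}


# Hypothetical placeholder IDs, for illustration only.
example_rows = [
    ("GENE_A", "TX_A1"),
    ("GENE_A", "TX_A2"),
    ("GENE_A", "TX_A3"),
    ("GENE_A", "TX_A4"),
    ("GENE_A", "TX_A4"),  # repeated row for the same transcript
    ("GENE_B", "TX_B1"),
]
counts = transcripts_per_gene(example_rows)
print(counts)
```

Under this sketch, GENE_A would be treated as having 4 predicted transcripts even though it appears on 5 rows.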
Dataset
Mus musculus genes (NCBIM37)

Filters
Chromosome: 11
Gene Start (bp): 69000000
Gene End (bp): 104000000
with EMBL ID(s): Only
Ensembl Gene ID(s): [ID-list specified]
Gene type: protein_coding
Source: ensembl
Status (gene): KNOWN
Evidence code (GO Cellular component): IC
Orthologous Human Genes: Only

Attributes
Ensembl Gene ID
Ensembl Transcript ID
Chromosome Name
Ensembl Protein ID
Gene Start (bp)
Gene End (bp)
GO Term Accession
EntrezGene ID
EMBL (Genbank) ID
MGI ID

After the dataset was retrieved, all the genes were evaluated and analysed according to the following functional parameters.
a. Cellular Localization/Function
b. Biological Function
c. Physiological Function
d. Protein-protein interaction
e. Mode of inheritance
f. Evolutionary history/gene age
g. Singleton/duplication event

1. Disease genes localise to different cellular compartments
Disease viable and disease lethal genes differ in the cellular compartments to which they are localised. DL genes are enriched in the nucleus, whereas DV genes are enriched for localisation to the plasma membrane and the extracellular region. DL genes show a greater number of PPIs, which is attributed to their higher probability of localisation within the nucleus. E.g., all 18 genes AK144590, AL591436, X63190, DQ832277, AL591436, X51983, X07750, X07751, X07752, BC046795, AL590963, CH466556, AK078233, AL590963, CH466556, AK078233, AL590963, CH466556 with GO ID 0005634 are present in the nucleus and are suggested as DL genes, while genes bearing GO ID 0005887 are present in the plasma membrane (DV), and the rest of the genes are disease unknown.

2. Disease viable and disease lethal genes perform different Biological Functions
The function of a protein depends closely on its cellular localisation.
For example, transcription factors must be present in the nucleus to activate gene expression. GO annotations suggest that essential genes localise to the nucleus, and DL genes are enriched for nucleic acid binding when compared to all genes. For example, in our BioMart output the gene with MGI ID 2150020 has a nucleotide binding property. DV genes are enriched for calcium binding, which suggests a role in signal transduction. DV genes are over-represented in signal transduction functions and in hydrolase activity relative to the other metabolic function categories, which indicates that hydrolase activity is a specific feature of DV genes. DL genes, in contrast, are enriched for involvement in embryonic development, as suggested by biological process annotations, e.g. as found in the GeneTrap column/GO database.

[Disease Lethal and Disease Viable genes involved in Biological Processes]

[Differentiation on the basis of Molecular Function of Disease Lethal and Disease Viable genes]

3. Disease viable and disease lethal genes perform different Physiological Functions
Disease symptoms generally reflect an irregularity in particular organ systems or physiological processes. Disease lethal genes are statistically over-represented among cancer-related genes that directly affect cell growth and death mechanisms. DL genes have a higher representation in skeletal disorders and bone disease. This difference may be due to essential genes being involved in developmental patterning of the body axis and skeletal system, but not in bone metabolism. DV genes are also associated with some diseases but differ from DL genes. They are involved in nutritional, psychiatric and neurological disorders.
They are also enriched in psychiatric and immune system diseases, but under-represented among cardiovascular diseases.

4. Protein-protein interaction network distinguishes Disease Lethal and Disease Viable genes
Disease lethal genes are highly connected in the protein-protein interaction (PPI) network, while disease viable genes have fewer connections. To create the human PPI network, data were derived from BioGRID, BIND, HPRD and GeneRIF from NCBI, then viewed and analysed in the "Cytoscape" and "Navigator" tools, and the number of hubs and hub-hub connections in the network were recorded.

[Disease lethal gene cluster as graphically represented at PPI network from Cytoscape/netscape]

The graphical representation of the PPI network above, taken from various Java-based Cytoscape/netscape plugins, suggests that the DL group has a more complex network than the DV genes, with more interactions, less fragmentation and more edges at the highest-degree hub when drawn from the same datasets. Below is the interaction map of DV genes separated from the DL group. Here the DV genes are less interconnected and more fragmented, and the hub has few or no edges.

[DV genes are viewed in PPI network in Cytoscape/netscape plugin]

5. Functional parameters: Mode of Inheritance
Disease lethal genes show a dominant mode of inheritance more often than DV genes.
It was observed that the DL gene set showed a higher proportion of autosomal dominant mutations, while DV genes are over-represented for autosomal recessive inheritance.

6. Disease genes vary in their evolutionary history/gene age/phylogenetic distance
The DL genes are expected to have the oldest evolutionary history. Using reference genomes representative of each taxonomy category, orthologs are identified for all disease genes, representing the earliest ancestor gene for each human gene. The taxonomy categories are ordered by evolutionary distance, with H. sapiens as the closest and Fungi/Metazoa as the category with the most distant evolutionary origin. DL genes show a higher frequency of orthologs originating in the most distant Fungi/Metazoa or Bilateria classes. Compared to all human genes, the DL genes have a much higher proportion with the oldest ancestor in the Chordata class or earlier. DV genes, however, have a higher proportion of genes with the oldest ortholog originating in one of the evolutionarily more recent categories: Tetrapoda, Amniota, Mammalia, Theria and Eutheria. When compared to all human genes, the genes in our annotated categories do not have the oldest ortholog arising in the most recent evolutionary lineages, such as Euarchontoglires, Primates, Catarrhini, Hominidae, Homininae or Homo sapiens. In summary, DL genes have a more ancient evolutionary origin, and a greater number of orthologs, than the other gene classes analysed.
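The gene-age assignment described in this section can be illustrated with a short sketch. It assumes each gene comes with the set of taxonomy categories in which an ortholog was detected; the category list contains only the names mentioned in the text, and its ordering and completeness, the Chordata cutoff and the function names are assumptions made for illustration.

```python
# Categories named in the text, ordered from oldest to most recent.
# Assumption: this ordering and this (possibly incomplete) list.
CATEGORIES_OLD_TO_RECENT = [
    "Fungi/Metazoa", "Bilateria", "Chordata",
    "Tetrapoda", "Amniota", "Mammalia", "Theria", "Eutheria",
    "Euarchontoglires", "Primates", "Catarrhini", "Hominidae",
    "Homininae", "Homo sapiens",
]
RANK = {name: i for i, name in enumerate(CATEGORIES_OLD_TO_RECENT)}


def gene_age(ortholog_categories):
    """Return the oldest category in which an ortholog was detected.

    Categories not in RANK are ignored. A gene with no detected ortholog
    is assigned the most recent category, "Homo sapiens".
    """
    known = [c for c in ortholog_categories if c in RANK]
    if not known:
        return "Homo sapiens"
    return min(known, key=RANK.__getitem__)


def is_ancient(category, cutoff="Chordata"):
    """True if the category is the cutoff or older (assumed cutoff)."""
    return RANK[category] <= RANK[cutoff]


age = gene_age(["Mammalia", "Bilateria", "Primates"])
print(age, is_ancient(age))
```

A gene whose oldest ortholog falls in Bilateria is flagged as ancient here, which is the pattern the text reports as more frequent among DL genes.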
Gene age: Many behavioural studies and investigations in the context of essentiality are under way to provide insights for candidate gene analysis to identify new disease loci. One such example is gene age, which was measured using the phylogenetic breadth of the distribution of homologous genes among different lineages. For example, old genes are those that are present in more distantly related species, whereas young genes are those that are present only in closely related species such as chimpanzee and macaque.

[The comparison sheet of the gene age of human beings and other lower-order organisms]

When all the genes with MGI accessions were searched again in the GeneTrap database, many phenotypic characters were retrieved, together with Mammalian Phenotype browser accession IDs, orthologous counterparts, ontological evidence on their location, OMIM relations, etc. For example, for gene MGI:99423, expression is associated with tumor cell invasion and metastasis, and the annotation GO:0005634 [nucleus] evidence: IC suggests that it is a disease lethal gene.
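The localisation rule used throughout the paper (GO:0005634, nucleus, suggests DL; GO:0005887, plasma membrane, suggests DV; anything else is disease unknown) can be written as a small lookup. This is a minimal sketch, not the author's implementation: the function name is hypothetical, and giving the nucleus term precedence when both terms are present is an assumption the text does not state.

```python
NUCLEUS = "GO:0005634"          # nucleus
PLASMA_MEMBRANE = "GO:0005887"  # plasma membrane term used in the paper


def classify_by_localisation(go_terms):
    """Return "DL", "DV" or "unknown" from GO cellular component IDs."""
    terms = set(go_terms)
    if NUCLEUS in terms:  # assumption: nucleus takes precedence
        return "DL"
    if PLASMA_MEMBRANE in terms:
        return "DV"
    return "unknown"


annotations = {
    "MGI:99423": ["GO:0005634"],            # example given in the text
    "hypothetical_gene_1": ["GO:0005887"],  # placeholder, not from the dataset
    "hypothetical_gene_2": ["GO:0005737"],  # placeholder, cytoplasm only
}
labels = {gene: classify_by_localisation(terms)
          for gene, terms in annotations.items()}
print(labels)
```

In the paper this GO-based call is one line of evidence among several (GeneTrap records, MeSH terms, PPI edges), so a sketch like this would only give a first-pass label.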
The MGI:2150020 gene also shows a protein-coding biotype. It also has a gene tree in Newick format:
(((((((((((((ENSSTOP00000000315_Stri_:0.0330, ENSDORP00000000165_Dord_:0.1161):0.0048, ((ENSMICP00000005418_Mmur_:0.0116, ENSOGAP00000007601_Ogar_:0.0509):0.0047, ENSTBEP00000013197_Tbel_:0.0487):0.0053):0.0000, ((((((ENSPTRP00000051614_Ptro_:0.0000, ENSP00000372088_Hsap_:0.0013):0.0171, ENSGGOP00000013922_Ggor_:0.0506):0.0000, (ENSMMUP00000033829_Mmul_:0.0094, ENSCJAP00000038748_Cjac_:0.0157):0.0140):0.1059, ENSPPYP00000007198_Ppyg_:0.0000):0.0327, ENSTSYP00000000841_Tsyr_:0.0302):0.0016, ((((ENSDNOP00000013522_Dnov_:0.0076, ENSCHOP00000000528_Chof_:0.0292):0.0051, ((ENSPCAP00000000302_Pcap_:0.0307, ENSLAFP00000007349_Lafr_:0.0346):0.0072, ENSETEP00000002471_Etel_:0.0494):0.0072):0.0076, (((((((ENSSSCP00000005138_Sscr_:0.0000, ENSSSCP00000005140_Sscr_:0.0749):0.0251, ENSBTAP00000003788_Btau_:0.0210):0.0032, ENSTTRP00000010038_Ttru_:0.0127):0.0048, ENSCAFP00000013557_Cfam_:0.0348):0.0000,
IJOART International Journal of Advancements in Research & Technology, Volume 3, Issue 3, March-2014 ISSN 2278-7763 165 (((ENSMLUP00000003893_Mluc_:0.0278, ENSPVAP00000001739_Pvam_:0.0280):0.0000, ENSFCAP00000013649_Fcat_:0.0830):0.0022, ENSVPAP00000003573_Vpac_:0.0463):0.0000):0.0000, ENSSARP00000002591_Sara_:0.0708):0.0018, ENSECAP00000015324_Ecab_:0.0119):0.0147):0.0057, ((ENSOCUP00000014527_Ocun_:0.0243, ENSOPRP00000014620_Opri_:0.0500):0.0179, ENSCPOP00000012705_Cpor_:0.0458):0.0069):0.0024):0.0063):0.0153, ((ENSRNOP00000053270_Rnor_:0.0305, ENSRNOP00000041615_Rnor_:0.1258):0.0000, ENSMUSP00000028795_Mmus_:0.0332):0.0374):0.0592, (ENSMEUP00000013631_Meug_:0.0268, ENSMODP00000000325_Mdom_:0.1465):0.0753):0.0737, ENSOANP00000014708_Oana_:0.2186):0.0000, (((ENSGALP00000032107_Ggal_:0.0011, ENSMGAP00000002659_Mgal_:0.0296):0.0878, ENSTGUP00000007594_Tgut_:0.1159):0.0332, ENSACAP00000013598_Acar_:0.1269):0.0290):0.0631, ENSXETP00000034829_Xtro_:0.1406):0.0470, (((ENSTRUP00000010366_Trub_:0.0517, ENSTNIP00000022389_Tnig_:0.0569):0.0251, (ENSGACP00000016197_Gacu_:0.0667, ENSORLP00000022299_Olat_:0.0997):0.0455):0.0658, (ENSDARP00000060705_Drer_:0.0000, ENSDARP00000102254_Drer_:0.2574):0.1346):0.0524):0.1086, ENSCSAVP00000002327_Csav_:0.3665):0.0401, FBpp0084956_Dmel_:0.3818):0.0988, ((((((ENSDARP00000053846_Drer_:0.2500, ENSGACP00000006597_Gacu_:0.4557):0.0000, ((((((((ENSSTOP00000014167_Stri_:0.0520, ENSCPOP00000009649_Cpor_:0.0895):0.0076, ENSTBEP00000008426_Tbel_:0.0614):0.0024, (ENSPCAP00000012864_Pcap_:0.0390, ENSLAFP00000012636_Lafr_:0.0471):0.0285):0.0035, (((((ENSBTAP00000025252_Btau_:0.0220, ENSSSCP00000002496_Sscr_:0.0599):0.0149, ENSECAP00000011815_Ecab_:0.0389):0.0045, (ENSCAFP00000024255_Cfam_:0.0519, ENSEEUP00000001317_Eeur_:0.0934):0.0066):0.0000, (((ENSFCAP00000007091_Fcat_:0.0588, ENSSARP00000001222_Sara_:0.1050):0.0077, ENSPVAP00000011016_Pvam_:0.0468):0.0030, (ENSTTRP00000014685_Ttru_:0.0213, ENSVPAP00000009741_Vpac_:0.0445):0.0107):0.0014):0.0183, 
ENSMICP00000009918_Mmur_:0.0348):0.0065):0.0000, ((((((((ENSPTRP00000039267_Ptro_:0.0000, ENSP00000419881_Hsap_:0.0000):0.0026, ENSPPYP00000006740_Ppyg_:0.0091):0.0026, ENSMMUP00000020875_Mmul_:0.0092):0.0108, ENSCJAP00000027894_Cjac_:0.0198):0.0241, ENSETEP00000008672_Etel_:0.1043):0.0017, (ENSDORP00000010143_Dord_:0.1035, ENSOPRP00000004397_Opri_:0.1484):0.0086):0.0000, (((ENSMUSP00000078490_Mmus_:0.0478, ENSRNOP00000016459_Rnor_:0.0548):0.1360, ENSOCUP00000000520_Ocun_:0.1224):0.0242, ENSTSYP00000006542_Tsyr_:0.0439):0.0022):0.0023, ENSMLUP00000000626_Mluc_:0.0776):0.0000):0.0078, IJOART Copyright © 2014 SciResPub. IJOART International Journal of Advancements in Research & Technology, Volume 3, Issue 3, March-2014 ISSN 2278-7763 166 ENSCHOP00000012287_Chof_:0.0365):0.0795, ENSMODP00000012669_Mdom_:0.1692):0.0647, (((ENSMGAP00000012489_Mgal_:0.0076, ENSGALP00000015437_Ggal_:0.0793):0.0629, ENSTGUP00000011847_Tgut_:0.0781):0.1183, ENSACAP00000005463_Acar_:0.1986):0.0749):0.3462):1.0094, (((((ENSTNIP00000013172_Tnig_:0.0561, ENSTRUP00000026857_Trub_:0.1996):0.0783, ENSGACP00000027559_Gacu_:0.1800):0.0383, ENSORLP00000019089_Olat_:0.1402):0.1668, ENSDARP00000029740_Drer_:0.2887):0.1059, ((((ENSTGUP00000017411_Tgut_:0.0041, ENSTGUP00000005057_Tgut_:0.0070):0.1025, (ENSGALP00000003458_Ggal_:0.0182, ENSMGAP00000004024_Mgal_:0.1614):0.1141):0.1267, (((((((((ENSMUSP00000018985_Mmus_:0.0262, ENSRNOP00000036257_Rnor_:0.0300):0.1150, ENSCPOP00000017272_Cpor_:0.0890):0.0204, ((((ENSGGOP00000000358_Ggor_:0.0019, ENSP00000378090_Hsap_:0.0551):0.0194, ENSPTRP00000015369_Ptro_:0.0057):0.0109, ENSPPYP00000009193_Ppyg_:0.0097):0.0086, ENSCJAP00000027151_Cjac_:0.0261):0.0390):0.0065, ENSVPAP00000011247_Vpac_:0.1432):0.0005, (((ENSSTOP00000005497_Stri_:0.0783, ENSOCUP00000012289_Ocun_:0.1399):0.0040, ((ENSECAP00000012028_Ecab_:0.0522, ENSMLUP00000011374_Mluc_:0.0698):0.0053, ((ENSFCAP00000000230_Fcat_:0.0271, ENSCAFP00000027051_Cfam_:0.0436):0.0358, 
ENSBTAP00000025404_Btau_:0.0815):0.0060):0.0086):0.0017, ((ENSEEUP00000001339_Eeur_:0.1162, ENSSARP00000000796_Sara_:0.1880):0.0030, (ENSOGAP00000008835_Ogar_:0.0559, ENSOPRP00000010741_Opri_:0.1132):0.0216):0.0042):0.0016):0.0008, ((((ENSTBEP00000011110_Tbel_:0.0687, ENSMICP00000001136_Mmur_:0.0808):0.0253, ENSPVAP00000003694_Pvam_:0.0811):0.0023, ENSTSYP00000002243_Tsyr_:0.1576):0.0037, ENSTTRP00000014683_Ttru_:0.0614):0.0281):0.0276, ENSDNOP00000003927_Dnov_:0.0568):0.0029, ((ENSLAFP00000015357_Lafr_:0.0294, ENSPCAP00000006649_Pcap_:0.0888):0.0099, ENSETEP00000009798_Etel_:0.1368):0.0158):0.1028, (ENSMODP00000023889_Mdom_:0.0358, ENSMEUP00000007724_Meug_:0.0805):0.1221):0.1544):0.0638, ENSXETP00000019650_Xtro_:0.5555):0.1321):1.5927):0.3322, ((((((ENSMODP00000018294_Mdom_:0.0380, ENSMEUP00000006550_Meug_:0.1082):0.0896, (((ENSOGAP00000004411_Ogar_:0.0574, ENSOPRP00000000782_Opri_:0.0755):0.0076, ENSCHOP00000006029_Chof_:0.0557):0.0000, (((((((ENSP00000336701_Hsap_:0.0000, ENSPTRP00000016062_Ptro_:0.0039):0.0027, ENSPPYP00000009244_Ppyg_:0.0039):0.0084, ENSMMUP00000011981_Mmul_:0.0051):0.0192, ENSTSYP00000008137_Tsyr_:0.0329):0.0024, ((((ENSECAP00000003008_Ecab_:0.0270, ENSMLUP00000005665_Mluc_:0.0469):0.0012, IJOART Copyright © 2014 SciResPub. 
IJOART International Journal of Advancements in Research & Technology, Volume 3, Issue 3, March-2014 ISSN 2278-7763 167 (ENSTTRP00000016289_Ttru_:0.0202, ENSSSCP00000018702_Sscr_:0.0425):0.0023):0.0000, (((ENSVPAP00000000334_Vpac_:0.0307, ENSBTAP00000020012_Btau_:0.0327):0.0072, ENSPVAP00000014496_Pvam_:0.0226):0.0016, ENSCAFP00000025883_Cfam_:0.0391):0.0027):0.0048, (ENSLAFP00000002822_Lafr_:0.0493, ENSDNOP00000003626_Dnov_:0.0580):0.0140):0.0069):0.0000, (ENSDORP00000001895_Dord_:0.0672, ENSTBEP00000013404_Tbel_:0.0695):0.0185):0.0000, (((((ENSCJAP00000011941_Cjac_:0.0020, ENSCJAP00000020026_Cjac_:0.0040):0.0058, ENSCJAP00000035050_Cjac_:0.0079):0.0266, ENSMICP00000014682_Mmur_:0.0356):0.0000, ENSEEUP00000001708_Eeur_:0.1161):0.0000, (((ENSOCUP00000001958_Ocun_:0.0000, ENSOCUP00000017133_Ocun_:0.0196):0.0597, ENSPCAP00000002180_Pcap_:0.2204):0.0000, ((((ENSRNOP00000008846_Rnor_:0.0309, ENSMUSP00000007790_Mmus_:0.1641):0.0791, ENSSTOP00000002592_Stri_:0.0494):0.0000, ENSCPOP00000004133_Cpor_:0.0808):0.0120, ENSETEP00000001932_Etel_:0.0886):0.0046):0.0027):0.0000):0.0044):0.1051):0.0621, ENSOANP00000013505_Oana_:0.1106):0.0631, (((ENSGALP00000038538_Ggal_:0.0067, ENSMGAP00000007189_Mgal_:0.0264):0.0699, (ENSTGUP00000007629_Tgut_:0.0050, ENSTGUP00000015360_Tgut_:0.0138):0.0967):0.0858, ENSACAP00000010810_Acar_:0.2435):0.0733):0.0699, ENSXETP00000002461_Xtro_:0.3031):0.1191, ((((ENSTRUP00000004854_Trub_:0.0975, ENSTNIP00000018013_Tnig_:0.1402):0.1259, ENSORLP00000024828_Olat_:0.2694):0.0330, ENSGACP00000000375_Gacu_:0.0683):0.2500, ENSDARP00000090614_Drer_:0.2005):0.1508):0.6278):0.2354, FBpp0084486_Dmel_:1.9909):1.2160, Y43C5A.6a_Cele_:0.3502):0.2385):0.0000, YER095W_Scer_:0.4878):0.0000; IJOART Or after the clustalw program if we prepare the cladogram sheet with distance node the gene age,inheritance would be measured and we could easily isolate the dl genes from dv. 
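As a rough illustration of how a Newick tree like the one above can be inspected, the sketch below pulls out the leaf labels and counts genes per species code (the suffix such as Hsap or Sscr). A species that appears more than once has paralogous copies in the tree. The regular expression assumes every leaf carries a branch length, as in the tree above; the function names are hypothetical, and the example string is stitched together from two fragments of that tree rather than being a faithful subtree.

```python
import re
from collections import Counter

# A leaf label follows "(" or "," and is followed by ":" and a branch length.
# Internal nodes follow ")" and are therefore not matched.
LEAF = re.compile(r"(?<=[(,])\s*([^\s():,;]+)\s*:")


def leaf_labels(newick):
    """Return all leaf labels of a Newick string with branch lengths."""
    return LEAF.findall(newick)


def species_code(label):
    """'ENSP00000372088_Hsap_' -> 'Hsap' (text after the last inner '_')."""
    return label.rstrip("_").rsplit("_", 1)[-1]


def species_counts(newick):
    """Count leaves (genes) per species code."""
    return Counter(species_code(label) for label in leaf_labels(newick))


# Illustrative excerpt assembled from two fragments of the tree above.
excerpt = (
    "(((ENSPTRP00000051614_Ptro_:0.0000, ENSP00000372088_Hsap_:0.0013):0.0171, "
    "ENSGGOP00000013922_Ggor_:0.0506):0.0000, "
    "(ENSSSCP00000005138_Sscr_:0.0000, ENSSSCP00000005140_Sscr_:0.0749):0.0251);"
)
counts = species_counts(excerpt)
duplicated = sorted(code for code, n in counts.items() if n > 1)
print(counts, duplicated)
```

For a full analysis, a dedicated tree library would be a safer choice than a regular expression, since it also gives access to branch lengths and node depths.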
http://www.ensembl.org/Mus_musculus/Gene/Compara_Ortholog?db=core;g=ENSMUSG00000007646
http://www.ensembl.org/Mus_musculus/Gene/Compara_Tree?db=core;g=ENSMUSG00000007646
Below is the graphical format of its gene tree.

Duplication and retention
To examine the gene duplication events in our categories of disease genes, we used similarity methods to identify paralogs of all human disease genes. The proportion of genes with paralogs, or duplicates, was analysed for each gene category. DL genes are much more likely to be duplicates, while DV genes tend to be singletons. The high proportion of singleton genes in the DV class suggests a difference in retention in the human genome following whole genome duplications for these genes, with many duplicates or paralogs being lost after duplication.

Duplicate and singleton identification method
Sequences were retrieved from Ensembl through BioMart.
BLAT v.32 was used for the sequence similarity search. An E-value threshold of 10^-20 was used to identify duplicate and singleton genes. All of the genes AK144590, AL591436, X63190, DQ832277, AL591436, X51983, X07750, X07751, X07752, BC046795, AL590963, CH466556, AK078233, AL590963, CH466556, AK078233, AL590963, CH466556, with their start and end positions, were searched with megablast against human and mouse, and applying the E-value threshold of 10^-20 highlights the duplicate genes between human and mouse.

[Figure represents the frequency of duplicate/singleton nature of DL/DV]

MGI:2150020 Chr.11(-): 87190152-87218268 [NCBI37]
Entrez Gene 114714 Chr.11(-): 87190146-87217940 [NCBI37]
GO:0000166 [nucleotide binding] evidence: IEA
GO:0048476 [Holliday junction resolvase complex] evidence: IC
lethality-prenatal/perinatal
MGI:98742 was not found, which suggests that it is a disease unknown gene according to GeneTrap. However, its nucleotide binding property suggests that it is a disease lethal gene.

7.1 Appendices:
Below are all the mouse mutagenesis mutants with developmental defects from the 69 Mb to 104 Mb region of chromosome 11 of Mus musculus.

Category        Mutants
craniofacial    30
eye             3
fertility       8
growth          80
lethal          89
metabolism      10
neurological    76
skeletal        32
skin and coat   31
undefined       3
urogenital      1

CRANIOFACIAL:
MGI Accession   Lab Name    Phenotype
MGI:2671740     crfm02Jus   craniofacial, affected testers are smaller, have shorter snouts.
MGI:2671741     crfm05Jus   craniofacial, patchy hair loss.
MGI:2671742     crfm06Jus   craniofacial, testers have shorter faces and are smaller than carrier siblings.
MGI:2671743     crfm08Jus   craniofacial, testers have a subtle short snout phenotype.
MGI:2671832     crfm18Jus   smaller head, short snout, not completely penetrant.
MGI:3046702     crfm26Jus   testers are smaller, hydrocephalous, do not live very long past 8 or 12 weeks.

FERTILITY:
MGI Accession   Lab Name    Phenotype
MGI:2671711     infm02Jus   male infertility, low sperm count, normal morphology. ref. Clark et al., Biology of Reproduction 70, 1317-1324, 2004.
MGI:2671710     infm03Jus   female infertility. ref. Clark et al., Biology of Reproduction 70, 1317-1324, 2004.
MGI:2671707     infm04Jus   male infertility, low sperm count, not motile, unusual morphology. ref. Clark et al., Biology of Reproduction 70, 1317-1324, 2004. also in the same complementation group as inf08 and inf09.
MGI:2671706     infm05Jus   male infertility. ref. Clark et al., Biology of Reproduction 70, 1317-1324, 2004.
MGI:2671699     infm07Jus   female infertility. ref. Clark et al., Biology of Reproduction 70, 1317-1324, 2004.
MGI:2671697     infm08Jus   in the same complementation group as inf04 and inf09. ref. Clark et al., Biology of Reproduction 70, 1317-1324, 2004.
MGI:2671691     infm09Jus   in the same complementation group as inf04 and inf08. ref. Clark et al., Biology of Reproduction 70, 1317-1324, 2004.

GROWTH:
MGI Accession   Lab Name    Phenotype
MGI:2671718     grom01Jus   small animals seen in small litters of 5/4 pups. small animals run about 7-9 gm, while other pups are 13-16 gm.
MGI:2671720     grom22Jus   affected testers 3/4 size of normal littermates, low serum cholesterol.
MGI:2671721     grom40Jus   testers are 1/2 size of unaffected siblings.
MGI:2671722     grom41Jus   2 of 5 testers 3/4 size of carrier sibs, appears to be on chromosome 11 but not completely penetrant.
MGI:2671723     grom42Jus   testers small, 3/4 size of carrier siblings.
MGI:3046714     grom79Jus   small, 3/4, 10 gm vs 16 gm at n1f1.

LETHAL:
MGI Accession   Lab Name    Time of Death
MGI:2671871     l11Jus01    5.5 - 8.5 dpc
MGI:2671872     l11Jus02    5.5 - 8.5 dpc
MGI:2671873     l11Jus03    5.5 - 8.5 dpc
MGI:2671874     l11Jus04    5.5 - 8.5 dpc
MGI:2671876     l11Jus05    9.5 - 12.5 dpc
MGI:2671877     l11Jus06    9.5 - 12.5 dpc
MGI:2671878     l11Jus07    5.5 - 8.5 dpc
MGI:2671879     l11Jus08    9.5 - 12.5 dpc
MGI:2671880     l11Jus09    9.5 - 12.5 dpc
MGI:2671881     l11Jus10    peri-natal lethal
MGI:2671882     l11Jus11    5.5 - 8.5 dpc
MGI:2671883     l11Jus12    5.5 - 8.5 dpc
MGI:2671884     l11Jus13    peri-natal lethal
MGI:2671885     l11Jus14    9.5 - 12.5 dpc
MGI:2671886     l11Jus15    13.5 - 18.5 dpc
MGI:2671887     l11Jus16    peri-natal lethal
MGI:2671888     l11Jus17    9.5 - 12.5 dpc
MGI:2671889     l11Jus18    9.5 - 12.5 dpc
MGI:2671890     l11Jus19    9.5 - 12.5 dpc
MGI:2671891     l11Jus20    9.5 - 12.5 dpc
MGI:2671892     l11Jus21    peri-natal lethal
MGI:2671893     l11Jus22    peri-natal lethal
MGI:2671894     l11Jus23    peri-natal lethal
MGI:2671896     l11Jus24    peri-natal lethal
MGI:2671897     l11Jus25    post-natal lethal
MGI:2671898     l11Jus26    post-natal lethal
MGI:2671899     l11Jus27    9.5 - 12.5 dpc
MGI:2671900     l11Jus28    9.5 - 12.5 dpc
MGI:2671901     l11Jus29    13.5 - 18.5 dpc
MGI:2671902     l11Jus30    peri-natal lethal
MGI:2671903     l11Jus31    peri-natal lethal
MGI:2671904     l11Jus32    peri-natal lethal
MGI:2671906     l11Jus33    peri-natal lethal
MGI:2671907     l11Jus34    5.5 - 8.5 dpc
MGI:2671908     l11Jus35    5.5 - 8.5 dpc
MGI:2671909     l11Jus36    9.5 - 12.5 dpc
MGI:2671910     l11Jus37    9.5 - 12.5 dpc
MGI:2671911     l11Jus38    5.5 - 8.5 dpc
MGI:2671912     l11Jus39    9.5 - 12.5 dpc
MGI:2671913     l11Jus40    peri-natal lethal
MGI:2671914     l11Jus41    9.5 - 12.5 dpc
MGI:2671915     l11Jus42    5.5 - 8.5 dpc
MGI:2671916     l11Jus43    peri-natal lethal
MGI:2671917     l11Jus44    peri-natal lethal
MGI:2671918     l11Jus45    9.5 - 12.5 dpc
MGI:2671919     l11Jus46    9.5 - 12.5 dpc
MGI:2671920     l11Jus47    9.5 - 12.5 dpc
MGI:2671921     l11Jus48    5.5 - 8.5 dpc
MGI:3034009     l11Jus49    5.5 - 8.5 dpc
MGI:3034010     l11Jus50    after 12.5 dpc
MGI:2671922     l11Jus51    post-natal lethal
MGI:2671923     l11Jus52    post-natal lethal
MGI:2671924     l11Jus53    post-natal lethal
MGI:2671925     l11Jus54    post-natal lethal
MGI:2671926     l11Jus55    post-natal lethal
MGI:2671927     l11Jus56    post-natal lethal
MGI:2671928     l11Jus57    post-natal lethal
MGI:3034011     l11Jus58    9.5 - 12.5 dpc
MGI:3043663     l11Jus59    still to be determined

METABOLISM:
MGI Accession   Lab Name    Phenotype
MGI:2671716     hem m04     low rbc/hgb/hct 11/17/52.
MGI:2671712     hem1        low wbc, neutrophilic blasts.
MGI:2671713     hem2        low wbc, cf c: 1-2 vs 8.
MGI:2671715     hem3        low rbc/hgb/hct 6/10/32.

NEUROLOGICAL:
MGI Accession   Lab Name    Phenotype
MGI:2671725     nurm01Jus   tester animals are hyperactive, nervousness, tremors. previously known as jittery 1.
MGI:2671727     nurm02Jus   testers are hyperactive. previously known as shaky 3.
MGI:2671729     nurm03Jus   small, hyperactive, some affecteds show craniofacial abnormalities. previously known as small hyper 3.
MGI:2671730     nurm04Jus   affected animals have a quivering phenotype, the phenotype was late onset. previously known as shaky 4.
MGI:2671731     nurm05Jus   affected animals have a quivering phenotype, noticeable when they move. phenotype is late onset, do not see it until at least 3 months old. previously called shaky 5.
MGI:2671732     nurm06Jus   affected animals hyperactive and 1/2 size of siblings, 6 of 9 testers, 2 of 24 carriers affected, may be outside of inversion on 11. previously called small hyper 5.
MGI:2671733     nurm07Jus   smaller, lethargic, testers have reduced open field activity, develop late onset tremors upon movement. previously known as small lethargic.
MGI:2671734     nurm08Jus   hyper, seizing. previously known as flicker.
MGI:2671735     nurm09Jus   hyperactive, jittery weaving gait, hearing loss. previously known as jittery 2.

SKIN AND COAT:
MGI Accession   Lab Name    Phenotype
MGI:2671724     skcm01Jus   greasy looking hair, previously known as greasy coat.
MGI:3038892     Skcm02Jus   scruffy, hair sticks out straight, previously known as pete rose hair.

4 CONCLUSION
For many years, the mapping and identification of disease-causing genes in humans has been carried out in many laboratories using different methodologies. Today, classical map-based gene discovery has been augmented by sequence-based gene discovery, given that the Human Genome Project has produced high-precision tools for disease gene locus mapping and identification. So far, the characterization of genetic defects has been successfully accomplished in more than 1600 human monogenic diseases. Mapping common and genetically complex human disease traits has proved more difficult, but even in these more complex cases a number of mutations associated with human complex diseases have been identified. Like the confirmatory tests for different types of radicals in a chemistry lab, there should be a systematic procedure for the identification of disease genes. In many cases, mouse knockout gene datasets are used to understand the role of disease genes in humans. Because mouse chromosome 11 is similar to human chromosome 17, it was taken in this article as a suitable proxy model for finding disease loci. Here, bioinformatics-based analyses were carried out to classify genes from mouse chromosome 11 in the mutagenesis screen region (69 Mb-104 Mb) and to address related questions: how many genes are duplicates, what are their actual positions, and is the function of a gene affected by its location? This classification of DL and DV genes, together with their under-expressed and over-expressed characters, helps in the study of human disease, ageing, biosenescence, etc. Bioinformatics-based classification combined with a few statistical parameters helps us to predict these properties accurately.
ACKNOWLEDGMENT
I am cordially thankful to all my students and laboratory staff, and especially to the Director, Mr P.K. Boss, the Chemistry Head, and Prof. John Pejjulo, PhD in biostatistics, for their online support during my work.

Glossary:
1. Cladogram: Phylogenetic tree showing the relationship between species.
2. Homologues: Two genes are homologous if they evolved from the same common ancestor.
3. Pattern: Conserved residues that one can use as a functional signature.
4. TrEMBL: Translated EMBL, which contains all the putative protein sequences contained in the nucleotide databases. Its US counterpart is Non-redundant.
5. E-value: Expected value, i.e. the number of matches of equal or better score that would be expected to arise by chance when searching the database. The lower the E-value, the more significant the match; an E-value of 10^-32 is better than 10^-4.
6. P-value: In statistical testing, the p-value indicates whether some effect (such as a difference in the average value of some quantity between two groups, or a correlation between two numeric variables) is statistically significant. Statistically significant generally means that the observed result (for example, the difference in some average between two groups) is very unlikely to have arisen only from random fluctuations in the observed sample if there is truly no difference between the two groups in the whole population. The p-value is the probability of getting results at least as extreme as those actually observed if there is really nothing going on but random fluctuations. If this p-value is less than some small number (often set at 0.05), the results are said to be statistically significant.

References
1. Hentges, K.E., Pollock, D.D., Liu, B. and Justice, M.J. (2007) Regional variation in the density of essential genes in mice. PLoS Genet.
2. McKusick, V. (1998) Mendelian Inheritance in Man: A Catalog of Human Genes and Genetic Disorders.
3. Bult, C.J., Eppig, J.T., Kadin, J.A., Richardson, J.E. and Blake, J.A. (2008) The Mouse Genome Database (MGD): mouse biology and model systems. Nucleic Acids Res, 36, D724-8.
4. Maere, S., Heymans, K. and Kuiper, M. (2005) BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics, 21, 3448-9.
5. Lowe, H.J. and Barnett, G.O. (1994) Understanding and using the medical subject headings (MeSH) vocabulary.
6. Smedley, D., Haider, S., Ballester, B., Holland, R., London, D., Thorisson, G. and Kasprzyk, A. (2009) BioMart - biological queries made easy. BMC Genomics, 10, 22.
7. Kent, W.J. (2002) BLAT - the BLAST-like alignment tool. Genome Res, 12, 656-64.