Running head: Genetics of Autism

Genetics of Autism: A Bioinformatics Analysis of Suspected Genes and Products
Teresa M. LuPone
Glendale Community College

Abstract

Autism spectrum disorders (ASDs) are being diagnosed at alarming rates in our modern world. There is emerging evidence for genes that are involved in the phenotypic expression of ASDs. Scientists worldwide are racing to identify the genetic components of the disorders in an effort to find treatments or even preventative measures. The goal of this project is to use the tools of bioinformatics to analyze suspected genes and their products. Four genes were analyzed, and together they contained a total of 195 open reading frames (ORFs) greater than 300 nucleotides in length. The longest ORF for each gene was chosen, and the proposed protein sequence for each was analyzed via BLASTp and Conserved Domains at NCBI. The four proposed proteins were 494, 263, 359, and 364 amino acids (AA) in length, respectively. These proteins were analyzed for similarities to known proteins via the BLAST tool. Three proteins shared a conserved domain. One protein had no significant commonality with the others. Though there is similarity, there is not yet much evidence for roles of the proposed proteins in autism. Future laboratory research is recommended based upon these findings.

Genetics of Autism: An Analysis of Suspected Genes and Products

Background

Autism spectrum disorders are a group of developmental disorders defined by a wide range of psychosocial behavioral characteristics, skills, and cognitive development. The National Institute of Mental Health (NIMH) defines the symptoms of an ASD as social impairment, communication difficulties, and repetitive and stereotyped behaviors.
Social impairment includes many signs, including difficulty with eye contact, failure to respond or perceived inattention, social reclusiveness during activities, and responding in an unusual manner when others show complex emotions such as anger, distress, or affection. More recent research supports the view that ASD children “do not respond to emotional cues in human social interactions because they may not pay attention to the social cues that others typically notice” (NIMH, 2011). Symptomatically speaking, ASD individuals have minor to extreme difficulty in social settings.

Communication difficulties range from failures in communicative development or responsiveness to echolalia. Conversation is also a major issue, in that the back and forth of conversation with another person can be difficult. Many children with an ASD who have delayed development of purposeful gestures and language may rely upon exaggerated vocalizations such as screaming, or physical actions like grabbing, until taught other means of self-expression (NIMH, 2011). If not treated properly, these difficulties can contribute to other mental illnesses later in life. Young adults and adults who are aware of their difficulty with speech expression may develop social anxiety or depression if the difficulty was not addressed earlier in life.

Repetitive and stereotyped behaviors are not limited to speech expression, as in the case of echolalia. NIMH states that stereotypy is characterized by unusual motions and behaviors such as limb flapping and specific patterns of walking. Furthermore, ASD individuals can have overly focused interests, becoming fascinated with something that catches their attention. Examples can be anything from how the wheels on a car work to how dolphins swim. These interests hold the individual’s attention at the expense of everything else around them.
As of 2008, the Centers for Disease Control and Prevention (CDC) lists on its ASD page that 1 in 88 children has been identified with an ASD, with a prevalence in boys of 1 in 54. This shows a steady increase over previous years: in 2006 it was 1 in 110 children, and in 2000 it was 1 in 150 children. However, it is not known whether the alarming statistics of autism in the population are due to an increase in cases or just a better understanding of the signs and symptoms.

Purpose

In recent years there has been much discussion and research as to the basis for these disorders. Many suspect environmental factors, while others insist on a chromosomal basis. The goal of this project is to analyze the emerging evidence for the genetic basis of autism, using bioinformatics computational tools to look for similarities that support the following claim: genes that may have a role in autism must have some similarity in their gene products, and these products may have some influence over the autism spectrum disorders. This project also includes recommendations for further laboratory experiments that may advance the science.

Literature Review

Autism’s growing prevalence in the world population has been a serious cause for concern not only for pediatricians but also for researchers. There have been numerous studies from which much has been learned about the disorders, but not very much has been discerned as to why or how ASDs occur. Since the success of the Human Genome Project, a multitude of disorders have been brought into sharp focus. Autism is one of those conditions whose breadth and depth leave more of a grey area. Nevertheless, many approaches have been employed to clear the air and establish the facts of autism.

As of 2004, Muhle, Trentacoste, and Rapin had reviewed the autism research then under way.
They found that the research included genome screens searching for common genetic markers, as well as cytogenetic studies looking for inherited or spontaneous genetic anomalies on a case-by-case basis. Further investigations examined linkage disequilibrium, in which an allele is inherited more often in individuals expressing an ASD (Muhle, Trentacoste, & Rapin, 2004). The search for candidate genes and the abnormalities within them is important to isolating the cause. Fragile X syndrome (FXS) was found to have a statistically significant association with autism. FXS is caused by trinucleotide (CGG) repeats in the gene coding for the FXS protein, which causes mental retardation. According to Muhle, Trentacoste, and Rapin, about 7-8% of FXS cases were also on the autism spectrum.

Some studies even associate the disorder spectrum with possible immune abnormalities. In their editorial, Antonio M. Persico, Judy Van de Water, and Carlos A. Pardo analyze the emerging evidence for a correlation between immune deficiency and ASDs. They summarize the findings of N. Momeni et al., which support this claim: plasma levels of factor I were higher in a group of ASD children than in control groups. They also mention the research of M. I. Waly et al. on the possible negative effects of enhanced oxidative stress upon patients, which can contribute to DNA methylation.

Furthermore, it is important to look into genetic mutation and causal environmental factors. Lambertus Klei’s group of researchers from various institutions (UPMC, Yale, UCLA, Brown, and others) used quantitative genetic analysis techniques, contrasting ASD individuals with controls, to estimate narrow-sense heritability. The analysis used common polymorphic variants throughout the genome.
They found that “by analyzing parents, unaffected siblings and alleles not transmitted from parents to their affected children, we conclude that the data [40% narrow sense heritability] for simplex ASD families follow the expectation for additive models closely” (Klei et al., 2012). The evidence is mounting for genetic causation of autism, but the important missing piece is how these genes are involved in the causation of ASDs. This is what the project has attempted to address.

Materials and Methods

Computational Materials

Many bioinformatics tools were employed in the analysis of the genes suspected of association with ASDs. Among them are the Basic Local Alignment Search Tool for proteins (BLASTp), the Open Reading Frame Finder at NCBI, Conserved Domains at NCBI, and the Cn3D protein visualizer. All information, including sequences, background literature, and images, was gathered from the following databases.

The National Center for Biotechnology Information (NCBI) is a central hub of information that includes access to many databases. The following native NCBI databases were employed:

PubMed: Research journals, particularly biomedical research, life sciences, and online books. Citations and abstracts cover biomedicine, health sciences, behavioral sciences, chemical sciences, and bioengineering.
Medical Subject Headings (MeSH): A great resource for understanding medical disorders in terms of definitions, symptoms, and the overall pathophysiological clinical manifestations of any given condition.
Nucleotide: A database that includes a library of identified DNA and RNA sequences for thousands of organisms.
Protein: NCBI’s proteomic database, covering the gene products that are studied and submitted.

The Protein Data Bank is a library of visualized proteins, including RasMol and Jmol structures.
The European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) provides analysis tools. Additionally, the Centers for Disease Control and Prevention (CDC) was used for its statistical data on autism spectrum disorders, and the National Institute of Mental Health (NIMH) was used for its background information on ASDs.

Genes of Interest

The first gene of interest is a multivariant gene called FOXP2 in Homo sapiens, characterized as follows by RefSeq. Mutations in this gene can cause speech-language disorders such as speech-language disorder 1 with orofacial dyspraxia. This gene’s product contains a polyglutamine tract and is a conserved forkhead/winged-helix transcription factor, which may regulate a whole host of other genes. It is required for proper development of the speech and language areas of the brain during embryogenesis, and may be involved in a variety of biological pathways and cascades that influence language development (RefSeq, 2010).

Another suspected gene is MECP-2, a gene located on the X chromosome that is involved in Rett syndrome in females. Persico, Van de Water, and Pardo mention that in Waly’s study girls with Rett syndrome share a “similar abnormality that underlies autistic features.” This is intriguing in that, at least for some cases, there may be some correlation with autism.

UBE3A encodes an E3 ubiquitin-protein ligase, part of the ubiquitin protein degradation system. A mutation in this gene can cause a severe disorder characterized by severe motor and intellectual retardation, ataxia, hypotonia, epilepsy, and absence of speech. It is a maternally expressed gene in the brain. The protein is also important in human papillomavirus types 16 and 18 (RefSeq, 2008).
Reelin, encoded by RELN, is a large extracellular matrix (ECM) protein that is believed to control cell-cell interactions and neuronal migration essential to brain development. Mutations of this gene are associated with autosomal recessive lissencephaly with cerebellar hypoplasia (RefSeq, 2008).

Methodology

The suspected genes were run through the ORF Finder at NCBI. From there, the largest ORFs and the resulting proposed protein sequences were run through BLASTp and analyzed for conserved domains. The longest ORF was chosen because it is the most likely to encode a functional protein. Finally, the conserved domains or similar protein structures, if any, were visualized with the Cn3D protein visualization tool.

Results and Evaluations

Open reading frames were obtained for each of the four genes analyzed in this project. Each gene was run with the same parameter of no fewer than 300 nucleotides per ORF, because each gene was rather large. The FOXP2 gene yielded 23 ORFs, of which the longest, 1485 bp, was chosen, and its gene product was subsequently run through a BLASTp analysis to identify it by similar protein structures. The proposed protein from the longest ORF was 494 AA in length and held many conserved domains, the most interesting being cd09076. Next, the MECP-2 gene was put through the same ORF Finder, yielding 57 ORFs, the longest of which was 792 bp in length, encoding 263 AA. The third gene analyzed, UBE3A, had 99 ORFs greater than 300 nucleotides, the largest of which was 1080 bp, encoding 359 AA. The fourth and final gene analyzed, RELN, had 16 ORFs greater than 300 nucleotides, with the longest at 1095 bp, encoding 364 AA (Appendix B, Table 1).

Interestingly, three of the four genes had a largest ORF coding for a protein within a single conserved domain superfamily.
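The ORF selection step described in the Methodology can be illustrated with a minimal Python sketch. This is not NCBI's ORF Finder, only an approximation under stated assumptions: an ORF runs from an ATG start codon through the first in-frame stop codon, all six reading frames are scanned, the reported ORF length includes the stop codon, and the 300-nucleotide cutoff is treated as inclusive. The names `ORF`, `find_orfs`, and `longest_orf` are hypothetical helpers introduced only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass(frozen=True)
class ORF:
    strand: str   # "+" for the given strand, "-" for its reverse complement
    frame: int    # reading frame 1, 2, or 3 on that strand
    start: int    # 0-based offset of the start codon on the scanned strand
    end: int      # exclusive end offset; the stop codon is included
    protein: str  # translation without the stop codon

    @property
    def length(self) -> int:
        """ORF length in nucleotides, stop codon included (an assumption)."""
        return self.end - self.start


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(coding: str) -> str:
    # Codons containing ambiguous bases are translated as "X".
    return "".join(
        CODON_TABLE.get(coding[i:i + 3], "X")
        for i in range(0, len(coding) - 2, 3)
    )


def find_orfs(dna: str, min_length: int = 300) -> List[ORF]:
    """Scan all six frames for ATG...stop ORFs of at least min_length nt."""
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(seq) - 2, 3):
                codon = seq[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos                      # open a new ORF
                elif start is not None and codon in STOP_CODONS:
                    end = pos + 3                    # close it at the stop
                    if end - start >= min_length:
                        protein = translate(seq[start:pos])
                        orfs.append(ORF(strand, frame + 1, start, end, protein))
                    start = None
    return orfs


def longest_orf(orfs: List[ORF]) -> Optional[ORF]:
    """Return the longest ORF, or None when the list is empty."""
    return max(orfs, key=lambda orf: orf.length, default=None)
```

Under these assumptions, a 1485-nucleotide ORF holds 495 codons and translates to 494 amino acids once the stop codon is excluded, which matches the lengths reported for FOXP2.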
The genes FOXP2, UBE3A, and RELN all have the conserved domain cd09076 (Appendix C, Figures 10 and 11). This was discovered when performing a BLASTp analysis of each chosen proposed protein; full distance tree analyses are located in Appendix D. Selected information from the BLASTp analysis, including interesting similarities with known proteins, is included in Appendix C. It was difficult to locate any known and fully characterized proteins for these three gene products. Thus not much is known about what exactly these proteins are; however, they share a conserved domain. Whether this means there is a correlation between the genes and possible autism expression is not known. Further lab bench work would be optimal for fully characterizing each protein and its specific functions, if any.

It is also interesting that MECP-2 did not share this conserved domain. MECP-2 was unique in that it contained no putative conserved domains. It does share a large similarity with a methyl CpG binding protein, with over 90% identity and an e-value of 8e-178. The e-value is the number of matches of that quality expected by chance, so a low e-value is a good sign that these proteins are very similar, if not identical, sequences.

The fact that there are some structural similarities between the protein products of three of the genes does not mean that there is definitive evidence for correlation or causation. Future bench work is suggested to analyze the functionality and structure of these proteins and their possible roles within the bigger picture of each gene. If these are not normally expressed genes, then what if a mutation turns these proteins on? The opposite could also be a valid idea for future research.

This project’s claim was that there had to be some similarity between gene products.
Gene products from three suspected genes are somewhat related and share at least one conserved domain. However, the second half of the claim, that these products influence the autism spectrum disorders, does not yet have enough evidence to support it, and subsequent wet lab work addressing it is recommended. Additionally, further characterization of these proteins, which can discern more about their phylogenetic relationships based upon functionality, and accurate models of their 3-dimensional structures are highly recommended. Thus, in conclusion, this project supports the first part of the claim, in that some of the genes contain similarities in their gene products; the second part cannot be supported at this time.

References

Ameis, S. H., & Szatmari, P. (2012). Imaging-genetics in autism spectrum disorder: Advances, translational impact, and future directions. Frontiers in Psychiatry, 3(46), 16. Retrieved from www.frontiersin.org
Centers for Disease Control and Prevention. (n.d.). Autism spectrum disorders. Website: http://www.cdc.gov/ncbddd/autism/index.html
Klei, L., Sanders, S. J., Murtha, M. T., et al. (2012). Common genetic variants, acting additively, are a major source of risk for autism. Molecular Autism, 3(9), 28. doi:10.1186/2040-2392-2-9
Muhle, R., Trentacoste, S. V., & Rapin, I. (2004, May). The genetics of autism. Pediatrics. Retrieved October 10, 2012, from http://www.pediatrics.org/cgi/content/full/113/e472
NIMH. (2011, October 26). A parent's guide to autism. Website: http://www.nimh.nih.gov/health/publications/a-parents-guide-to-autism-spectrumdisorder/what-is-autism-spectrum-disorder-asd.shtml
Persico, A. M., Van de Water, J., & Pardo, C. A. (2012). Autism: Where genetics meets the immune system. Autism Research and Treatment, 2012. doi:10.1155/2012/486359

Appendix A

A. Genes of Interest
  a. FOXP2 Map: http://tinyurl.com/cxbxdaf
    i. Sequence is located at the following webpage, as it was too large to fit in this report: http://www.ncbi.nlm.nih.gov/nuccore/21322221
  b. MECP-2 Map: http://tinyurl.com/btr9nec
    i. Sequence is located at the following webpage: http://www.ncbi.nlm.nih.gov/nuccore/22830571
  c. UBE3A Map: http://tinyurl.com/d2clzcf
    i. Sequence: http://www.ncbi.nlm.nih.gov/nuccore/21306876
  d. RELN Map: http://tinyurl.com/blrc8c2
    i. Sequence: http://www.ncbi.nlm.nih.gov/nuccore/1809222

Appendix B

Table 1
Gene ORF Analysis

Gene     #ORFs   ORF Chosen         Parameters     Length of ORF (nucleotides)   AA Length of Proposed Protein
FOXP2    23      (-1) 70942-72426   >300 nucleo    1485                          494
MECP-2   57      (-3) 21839-22630   >300 nucleo    792                           263
UBE3A    99      (+2) 54551-55630   >300 nucleo    1080                          359
RELN     16      (+2) 10205-11299   >300 nucleo    1095                          364

Appendix C

Figure 1: FOXP2 chosen ORF with protein sequence
Figure 2: FOXP2 chosen ORF proposed protein sequence BLASTp similar match
Figure 3: MECP-2 chosen ORF with proposed protein sequence
Figure 4: MECP-2 BLASTp similar sequence match
Figure 5: MECP-2 similar protein structure, methyl CpG binding protein 2 transcript 1
Figure 6: UBE3A chosen ORF with sequence
Figure 7: UBE3A proposed protein BLASTp similar match
Figure 8: RELN chosen ORF with sequence
Figure 9: RELN BLASTp similar match
Figure 10: Conserved domain cd09076 sequence cluster map for the FOXP2, UBE3A, and RELN proposed proteins
Figure 11: Conserved domain structure as part of the FOXP2, UBE3A, and RELN proposed proteins

Appendix D

BLASTp Analysis of FOXP2
BLASTp Analysis of MECP-2
BLASTp Analysis of UBE3A
BLASTp Analysis of RELN
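The figures in Table 1 (Appendix B) can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration, assuming that the ORF coordinates are inclusive and that each reported ORF length includes one stop codon that is not translated; the names `TABLE_1`, `orf_length`, `protein_length`, and `check_table` are hypothetical and introduced only for this example.

```python
# Values transcribed from Table 1:
# gene -> (ORF count, ORF start, ORF end, reported nt length, reported AA length)
TABLE_1 = {
    "FOXP2": (23, 70942, 72426, 1485, 494),
    "MECP-2": (57, 21839, 22630, 792, 263),
    "UBE3A": (99, 54551, 55630, 1080, 359),
    "RELN": (16, 10205, 11299, 1095, 364),
}


def orf_length(start: int, end: int) -> int:
    """Length in nucleotides, assuming inclusive coordinates."""
    return end - start + 1


def protein_length(nucleotides: int) -> int:
    """Codons in the ORF minus the stop codon (an assumption)."""
    return nucleotides // 3 - 1


def check_table(table: dict) -> list:
    """Return a list of human-readable inconsistencies; empty means consistent."""
    problems = []
    for gene, (_, start, end, nt_reported, aa_reported) in table.items():
        nt = orf_length(start, end)
        if nt != nt_reported:
            problems.append(f"{gene}: coordinates give {nt} nt, table says {nt_reported}")
        if nt % 3 != 0:
            problems.append(f"{gene}: ORF length {nt} is not a multiple of 3")
        aa = protein_length(nt_reported)
        if aa != aa_reported:
            problems.append(f"{gene}: {nt_reported} nt gives {aa} AA, table says {aa_reported}")
    return problems


# Total number of ORFs above the length cutoff across the four genes.
total_orfs = sum(row[0] for row in TABLE_1.values())
```

Under these assumptions the table is internally consistent: `check_table(TABLE_1)` returns an empty list, and `total_orfs` equals the 195 ORFs reported in the Abstract.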