File - BIO 283: Bioinformatics Project

Genetics of Autism
Running head: Genetics of Autism
Genetics of Autism: A Bioinformatics Analysis of Suspected Genes and Products
Teresa M. LuPone
Glendale Community College
Abstract
Autism spectrum disorders (ASDs) are being diagnosed at alarming rates in our modern
world. There is emerging evidence that specific genes are involved in the phenotypic expression
of ASDs, and scientists worldwide are racing to identify the genetic components of the disorders
in an effort to find treatments or even preventative measures. The goal of this project is to use
the tools of bioinformatics to analyze suspected genes and their products. Across the four genes
analyzed, a total of 195 open reading frames (ORFs) longer than 300 nucleotides were found.
The longest ORF of each gene was chosen, and the proposed protein sequence for each was
analyzed with BLASTp and the NCBI Conserved Domains tool. The four proposed proteins
were 494, 263, 359, and 364 amino acids (AA) long, respectively. Each was compared with
known proteins using the BLAST tool. Three of the proteins shared a conserved domain; the
fourth had no significant commonality with the others. Although there is similarity, there is
not yet much evidence for a role of the proposed proteins in autism. Future laboratory research
is recommended based upon these findings.
Genetics of Autism: An Analysis of Suspected Genes and Products
Background
Autism spectrum disorders are a group of developmental disorders defined by a wide range
of psychosocial behavioral characteristics, skills, and levels of cognitive development. The
National Institute of Mental Health (NIMH) defines the symptoms of an ASD as social
impairment, communication difficulties, and repetitive and stereotyped behaviors.
Social impairment includes many signs, among them difficulty with eye contact, failure to
respond or perceived inattention, social withdrawal during activities, and responding in an
unusual manner when others show complex emotions such as anger, distress, or affection. More
recent research supports the view that children with ASD “do not respond to emotional cues in
human social interactions because they may not pay attention to the social cues that others
typically notice” (NIMH 2011). Symptomatically, individuals with ASD experience anywhere
from minor to extreme difficulty in social settings.
Communication difficulties range from failures in communicative development or
responsiveness to echolalia. Conversation is also a major issue, in that the back and forth of
talking with another person can be difficult. Many children with an ASD who have delayed
development of purposeful gestures and language may rely upon exaggerated vocalizations
such as screaming, or physical actions like grabbing, until taught other means of
self-expression (NIMH 2011). If not treated properly, these difficulties can contribute to
other mental illnesses later in life. Young adults and adults who are aware of their difficulty
with speech expression may develop social anxiety or depression if the difficulty was not
addressed earlier in life.
Repetitive and stereotyped behaviors are not limited to speech, as in the case of echolalia.
NIMH states that stereotypy is characterized by unusual motions and behaviors such as limb
flapping and specific patterns of walking. Furthermore, individuals with ASD have overly
focused interests: they become fascinated by something that catches their attention. Examples
can be anything from how the wheels on a car work to how dolphins swim. These interests hold
the individual’s attention at the expense of everything else around them.
As of 2008, the Centers for Disease Control and Prevention (CDC) lists on its ASD page that
1 in 88 children have been identified with an ASD, with a prevalence in boys of 1 in 54. This
shows a steady increase over previous years: in 2006 it was 1 in 110 children, and in 2000 it
was 1 in 150 children. However, it is not known whether the alarming statistics of autism in
the population are due to an increase in cases or just a better understanding of the signs and
symptoms.
Purpose
In recent years there has been much discussion and research as to the basis for these
disorders. Many suspect environmental factors, while others insist on a chromosomal basis. The
goal of this project is to analyze the emerging evidence for the genetic basis of autism, using
bioinformatics tools to look for similarities that support the following two-part claim: genes
that may have a role in autism must have some similarity in their gene products, and those
products may have some influence over the autism spectrum disorders. This project also
includes recommendations for further laboratory experiments that may advance the science.
Literature Review
Autism’s growing prevalence in the world population has been a serious cause for
concern not only for pediatricians but also for researchers. There have been numerous studies
from which much has been learned about the disorders, but little has been discerned as to why
or how ASDs occur. Since the success of the Human Genome Project, a multitude of disorders
have been brought into clear focus. Autism is not yet one of them: its breadth and depth leave
more of a grey area. Nevertheless, many approaches have been employed to establish the facts
of autism. In 2004, Muhle, Trentacoste, and Rapin reviewed the autism genetics research to
date. The approaches they surveyed included genome screens that search for common genetic
markers, as well as cytogenetic studies of inherited or spontaneous genetic anomalies on a
case-by-case basis. Further investigations examined linkage disequilibrium, in which an allele
is inherited more often in individuals with ASD (Muhle, Trentacoste, & Rapin, 2004). The
search for candidate genes, and for abnormalities within them, is important to narrowing down
the cause.
Fragile X Syndrome (FXS) was found to be statistically significant in its association with
autism. FXS is caused by an expansion of CGG trinucleotide repeats in the FMR1 gene and
leads to intellectual disability. According to Muhle, Trentacoste, and Rapin, about 7-8% of FXS
cases were also on the autism spectrum.
Some studies even associate the disorder spectrum with possible immune abnormalities. In
their editorial, Antonio M. Persico, Judy Van de Water, and Carlos A. Pardo analyze the
emerging evidence for a correlation between immune deficiency and ASDs. They summarize the
findings of N. Momeni et al., which support this claim: plasma levels of factor I were higher
in a group of children with ASD than in control groups. They also mention the research of
M. I. Waly et al. into the possible negative effects of enhanced oxidative stress on patients,
which can affect DNA methylation.
Furthermore, it is important to look into genetic mutation and causal environmental factors.
Lambertus Klei and a group of researchers from various institutions (UPMC, Yale, UCLA,
Brown, and others) used quantitative genetic analysis, contrasting individuals with ASD
against controls, to estimate narrow-sense heritability from common polymorphic variants
throughout the genome. They reported that “by analyzing parents, unaffected siblings and
alleles not transmitted from parents to their affected children, we conclude that the data [40%
narrow sense heritability] for simplex ASD families follow the expectation for additive models
closely” (Klei et al., 2012). The evidence is mounting for genetic causation of autism, but the
important missing piece is how these genes are involved in the causation of ASDs. This is what
the project has attempted to address.
Materials and Methods
Computational Materials
Several bioinformatics tools were employed in the analysis of the genes suspected of
association with ASDs: the Basic Local Alignment Search Tool for proteins (BLASTp), Open
Reading Frame Finder via NCBI, Conserved Domains via NCBI, and the Cn3D protein
visualizer.
All information, including sequences, background literature, and images, was gathered
from the following databases.
The National Center for Biotechnology Information (NCBI) is a central hub of information that
includes access to many databases. The following native NCBI databases were employed:
PubMed: Research journals, particularly in biomedical research and the life sciences, and
online books. Citations and abstracts cover biomedicine, health sciences, behavioral sciences,
chemical sciences, and bioengineering.
Medical Subject Headings (MeSH): A resource for understanding medical disorders in terms of
definitions, symptoms, and the overall pathophysiological and clinical manifestations of a
given condition.
Nucleotide: A database of identified DNA and RNA sequences for thousands of organisms.
Protein: NCBI’s proteomic database, covering the gene products that are studied and submitted.
Protein Data Bank: A library of visualized proteins, including RasMol and Jmol structures.
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI): A
source of analysis tools.
Additionally, the Centers for Disease Control and Prevention (CDC) was used for its statistical
data on autism spectrum disorders, and the National Institute of Mental Health (NIMH) was
used for its background information on ASDs.
Genes of Interest
The first gene of interest is FOXP2, a gene in Homo sapiens with multiple variants,
characterized by RefSeq as follows. Mutations in this gene can cause speech-language
disorders such as speech-language disorder 1 with orofacial dyspraxia. The gene’s product
contains a polyglutamine tract and is a conserved forkhead/winged-helix transcription
factor, which may regulate a whole host of other genes. It is required for proper development
of speech and language areas of the brain during embryogenesis, and it may be involved in a
variety of biological pathways and cascades that influence language development (RefSeq,
2010).
Another suspected gene is MECP-2, a gene on the X chromosome that is involved in Rett
syndrome in females. Persico, Van de Water, and Pardo note that, in Waly’s study, girls with
Rett syndrome share a “similar abnormality that underlies autistic features.” This is intriguing
because it suggests that, at least in some cases, there may be some correlation with autism.
UBE3A encodes an E3 ubiquitin-protein ligase, part of the ubiquitin protein degradation
system. Mutations in this gene can cause severe disorders characterized by severe motor and
intellectual retardation, ataxia, hypotonia, epilepsy, and absence of speech. The gene is
maternally expressed in the brain. The protein also interacts with the E6 protein of human
papillomavirus types 16 and 18 (RefSeq, 2008).
Reelin (RELN) is a large extracellular matrix (ECM) protein that is believed to control
cell-cell interactions and the neuronal migration that is essential in brain development.
Mutations of this gene are associated with autosomal recessive lissencephaly with cerebellar
hypoplasia (RefSeq, 2008).
Methodology
The suspected genes were run through the ORF Finder at NCBI. The longest ORF of each gene
was selected, and the resulting proposed protein sequences were run through BLASTp and
analyzed for conserved domains. The longest ORF was chosen because it is the most likely to
encode a functional protein. Finally, the conserved domains or similar protein structures, if
any, were visualized with the Cn3D protein visualization tool.
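The ORF selection step can be sketched in a few lines of Python. This is a minimal illustration and not NCBI ORF Finder's actual implementation. It assumes that an ORF runs from the first ATG to the next in-frame stop codon, that the stop codon counts toward the ORF length, that all six reading frames are scanned, and that the standard genetic code applies. The names find_orfs, longest_orf, and Orf are hypothetical.

```python
from itertools import product
from typing import List, NamedTuple, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Standard genetic code, built from the conventional TCAG codon ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


class Orf(NamedTuple):
    strand: str   # "+" for the given strand, "-" for its reverse complement
    frame: int    # reading frame 1, 2 or 3 on that strand
    start: int    # 0-based index of the ATG on the scanned strand
    end: int      # exclusive end index; the stop codon is included
    protein: str  # translated sequence without the stop


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_len: int = 300) -> List[Orf]:
    """Return every ATG-to-stop ORF of at least min_len nucleotides."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon since the last stop
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        protein = "".join(
                            CODON_TABLE.get(s[j:j + 3], "X")  # X = ambiguous codon
                            for j in range(start, i, 3)
                        )
                        orfs.append(Orf(strand, frame + 1, start, i + 3, protein))
                    start = None
    return orfs


def longest_orf(seq: str, min_len: int = 300) -> Optional[Orf]:
    """Pick the longest qualifying ORF, mirroring the selection rule above."""
    orfs = find_orfs(seq, min_len)
    return max(orfs, key=lambda o: o.end - o.start) if orfs else None
```

Calling longest_orf on a nucleotide string returns the longest qualifying ORF, or None when no ORF reaches the 300-nucleotide minimum. Coordinates here are 0-based on the scanned strand, so they will not match the numbering that ORF Finder reports.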
Results and Evaluations
Open reading frames were obtained for each of the four genes analyzed in this project.
The same parameter was applied to every gene: a minimum of 300 nucleotides per ORF. This
threshold was used because each gene was rather large. The FOXP2 gene yielded 23 ORFs, of
which the longest, 1485 bp, was chosen, and the proposed gene product was put through a
BLASTp analysis to identify it by similar protein structures. The proposed protein from the
longest ORF was 494 AA in length and held many conserved domains, the most interesting of
them being cd09076. Next, the MECP-2 gene was put through the same ORF finder, with a
result of 57 ORFs; the longest was 792 bp, giving a 263 AA protein. The third gene, UBE3A,
had 99 ORFs longer than 300 nucleotides, the longest being 1080 bp with a 359 AA protein.
The fourth and final gene, RELN, had 16 ORFs longer than 300 nucleotides, with 1095 bp as
the longest ORF and a 364 AA protein (Appendix B, Table 1).
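The reported lengths can be cross-checked with simple arithmetic. The sketch below assumes that each ORF length includes the terminal stop codon (so the protein has one residue fewer than the number of codons) and that the coordinates in Table 1 are inclusive start-end pairs. Reading each coordinate entry in Table 1 as a start-end pair is itself an assumption, and the names REPORTED, span_length, and protein_length are hypothetical.

```python
# Values as reported in the text and in Table 1:
# gene -> (number of ORFs, ORF start, ORF end, ORF length in bp, protein length in AA)
REPORTED = {
    "FOXP2": (23, 70942, 72426, 1485, 494),
    "MECP-2": (57, 21839, 22630, 792, 263),
    "UBE3A": (99, 54551, 55630, 1080, 359),
    "RELN": (16, 10205, 11299, 1095, 364),
}


def span_length(start: int, end: int) -> int:
    """Length of an inclusive coordinate range."""
    return end - start + 1


def protein_length(orf_bp: int) -> int:
    """Residues encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1


for gene, (n_orfs, start, end, orf_bp, aa) in REPORTED.items():
    coords_ok = span_length(start, end) == orf_bp
    protein_ok = protein_length(orf_bp) == aa
    print(f"{gene}: coordinates consistent={coords_ok}, protein length consistent={protein_ok}")

total_orfs = sum(row[0] for row in REPORTED.values())
print("Total ORFs over 300 nucleotides:", total_orfs)
```

Under these assumptions every row is internally consistent, and the per-gene ORF counts sum to the 195 ORFs given in the abstract.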
Interestingly, for three of the four genes the longest ORF coded for a protein within a
single conserved domain superfamily. The proposed proteins of FOXP2, UBE3A, and RELN all
contain the conserved domain cd09076 (Appendix C, Figures 10 and 11). This was discovered
when performing a BLASTp analysis of each proposed protein; full distance tree analyses are
located in Appendix D. Selected information from the BLASTp analysis, including interesting
similarities with known proteins, is included in Appendix C. It was difficult to locate any
known and fully characterized proteins matching these three gene products. Thus not much is
known about what exactly these proteins are, although they are similar in their shared
conserved domain. Whether this means there is a correlation between the genes and possible
autism expression is not known. Further bench work would be needed to fully characterize
each protein and its specific function, if any.
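The comparison of conserved domains across the proposed proteins amounts to a tally of which domain accessions recur. The sketch below records only what is stated in this report: cd09076 for FOXP2, UBE3A, and RELN, and no putative conserved domain for MECP-2. Other domains found for FOXP2 are not named in the text and are left out. The names DOMAIN_HITS, shared_domains, and genes_without_shared_domain are hypothetical.

```python
from collections import defaultdict

# Conserved-domain accessions per proposed protein, limited to those named in the text.
DOMAIN_HITS = {
    "FOXP2": {"cd09076"},
    "MECP-2": set(),  # no putative conserved domains reported
    "UBE3A": {"cd09076"},
    "RELN": {"cd09076"},
}


def shared_domains(hits, min_genes=2):
    """Map each domain found in at least min_genes proteins to those genes."""
    genes_by_domain = defaultdict(list)
    for gene, domains in hits.items():
        for domain in domains:
            genes_by_domain[domain].append(gene)
    return {d: sorted(g) for d, g in genes_by_domain.items() if len(g) >= min_genes}


def genes_without_shared_domain(hits, min_genes=2):
    """Genes whose proposed protein carries none of the shared domains."""
    shared = shared_domains(hits, min_genes)
    return sorted(g for g, domains in hits.items() if domains.isdisjoint(shared))


print(shared_domains(DOMAIN_HITS))
print(genes_without_shared_domain(DOMAIN_HITS))
```

With a fuller list of domain hits per protein, the same tally would show whether cd09076 is the only domain the three proteins have in common.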
Furthermore, it is also interesting that MECP-2 did not share in this conserved domain.
MECP-2 was unique in that it contained no putative conserved domains. It does, however,
closely resemble a methyl-CpG-binding protein, with over 90% match identity and an e-value
of 8e-178. The e-value estimates how many matches of this quality would be expected by
chance, so a value this low is a good sign that the sequences are very similar if not
completely identical.
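How an e-value relates to alignment score can be illustrated with the textbook Karlin-Altschul approximation E = m * n * 2^(-S'), where m is the query length, n is the database length, and S' is the bit score. This is a simplified sketch: BLAST itself applies effective-length corrections that are ignored here, and the database size used below is a hypothetical placeholder, not the size of the database actually searched. The function names are also hypothetical.

```python
import math


def expected_chance_hits(query_len: int, db_len: int, bit_score: float) -> float:
    """Approximate e-value: E = m * n * 2**(-S') for bit score S'."""
    return query_len * db_len * 2.0 ** (-bit_score)


def bit_score_for_evalue(query_len: int, db_len: int, e_value: float) -> float:
    """Invert the same approximation to get the bit score implied by an e-value."""
    return math.log2(query_len * db_len / e_value)


QUERY_LEN = 263      # length of the proposed MECP-2 protein, from the text
DB_LEN = 1_000_000   # hypothetical database size in residues

# Each additional bit of score halves the number of matches expected by chance.
for score in (20, 40, 60):
    print(score, expected_chance_hits(QUERY_LEN, DB_LEN, score))

# Bit score that would correspond to the reported e-value under these assumptions.
print(bit_score_for_evalue(QUERY_LEN, DB_LEN, 8e-178))
```

The point of the sketch is the exponential relationship: an e-value as small as 8e-178 corresponds to a very high alignment score, which is why chance can be ruled out as the explanation for the match.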
The fact that there are some structural similarities among the protein products of three of
the genes is not definitive evidence for correlation or causation. Future bench work is
suggested to analyze the functionality and structure of these proteins and their possible roles
within the bigger picture of what each gene does. If these proteins are not normally expressed,
what happens if a mutation turns them on? The opposite case, a mutation that turns off a
normally expressed protein, could also be a valid idea for future research.
This project’s claim was that there had to be some similarity between gene products. Gene
products from three of the suspected genes are somewhat related and share at least one
conserved domain. However, there is not enough evidence to support the second half of the
claim, that these products influence the autism spectrum disorders, and subsequent wet lab
work that addresses it is recommended. Further characterization of these proteins, which could
reveal more about their phylogenetic relationships based on functionality, and accurate models
of their 3-dimensional structures are also highly recommended. In conclusion, this project
supports the first part of the claim, in that some of the genes contain similarities in their gene
products; the second part cannot be supported at this time.
References
Ameis, S. H., & Szatmari, P. (2012). Imaging-genetics in autism spectrum disorder: Advances,
translational impact, and future directions. Frontiers in Psychiatry, 3(46), 16. Retrieved
from www.frontiersin.org
Klei, L., Sanders, S. J., Murtha, M. T., et al. (2012). Common genetic variants, acting
additively, are a major source of risk for autism. Molecular Autism, 3(9), 28.
doi:10.1186/2040-2392-2-9
Muhle, R., Trentacoste, S. V., & Rapin, I. (2004, May). The genetics of autism. Pediatrics.
Retrieved October 10, 2012, from
http://www.pediatrics.org/cgi/content/full/113/e472
NIMH. (2011, October 26). A parent's guide to autism. Retrieved from
http://www.nimh.nih.gov/health/publications/a-parents-guide-to-autism-spectrumdisorder/what-is-autism-spectrum-disorder-asd.shtml
Centers for Disease Control and Prevention. (n.d.). Autism spectrum disorders. Retrieved from
http://www.cdc.gov/ncbddd/autism/index.html
Persico, A. M., Van de Water, J., & Pardo, C. A. (2012). Autism: Where genetics meets the
immune system. Autism Research and Treatment, 2012. doi:10.1155/2012/486359
Appendix A
A. Genes of Interest
a. FOXP2 Map: http://tinyurl.com/cxbxdaf
   i. Sequence is located at the following webpage as it was too large to fit in
      this report: http://www.ncbi.nlm.nih.gov/nuccore/21322221
b. MECP-2 Map: http://tinyurl.com/btr9nec
   i. Sequence is located at the following webpage:
      http://www.ncbi.nlm.nih.gov/nuccore/22830571
c. UBE3A Map: http://tinyurl.com/d2clzcf
   i. Sequence: http://www.ncbi.nlm.nih.gov/nuccore/21306876
d. RELN Map: http://tinyurl.com/blrc8c2
   i. Sequence: http://www.ncbi.nlm.nih.gov/nuccore/1809222
Appendix B
Table 1
Gene ORF Analysis
Gene     # ORFs   ORF chosen (frame, start-end)   Parameters         Length of ORF (bp)   AA length of proposed protein
FOXP2    23       (-1) 70942-72426                >300 nucleotides   1485                 494
MECP-2   57       (-3) 21839-22630                >300 nucleotides   792                  263
UBE3A    99       (+2) 54551-55630                >300 nucleotides   1080                 359
RELN     16       (+2) 10205-11299                >300 nucleotides   1095                 364
Appendix C
Figure 1: FOXP2 Chosen ORF with Protein Sequence
Figure 2: FOXP2 chosen ORF proposed protein sequence BLASTp Similar match
Figure 3: MECP-2 Chosen ORF with proposed protein sequence
Figure 4: MECP-2 BLASTp Similar sequence match
Figure 5: MECP-2 Similar Protein Structure methyl CpG binding protein 2 transcript 1
Figure 6: UBE3A chosen ORF with Sequence
Figure 7: UBE3A proposed protein BLASTp Similar Match
Figure 8: RELN Chosen ORF with Sequence
Figure 9: RELN BLASTp Similar Match
Figure 10: For FOXP2, UBE3A, RELN proposed protein Conserved domain cd09076 Sequence
cluster map
Figure 11: Conserved Domain structure as part of FOXP2, UBE3A, RELN proposed
proteins
Appendix D
BLASTp Analysis of FOXP2
BLASTp Analysis of MECP-2 Including
BLASTp Analysis of UBE3A
BLASTp Analysis of RELN