NIHR/BRC Vacation Studentship Summer 2014 Student: Supervisors: Title of Project:

advertisement
NIHR/BRC Vacation Studentship Summer 2014
Student: Noeline Nadarajah
Supervisors: Hywel Williams, Vijeya Ganesan
Title of Project: Elucidating the genetic basis of Moyamoya Disease
Introduction
Moyamoya Disease (MMD) is a cerebrovascular disease which can be characterised by the
spontaneous occlusion of the circle of Willis. This is caused by the vascular occlusion of the
carotid artery which can result in arterial ischemic stroke. Previous research has found that
MMD can be inherited in an autosomal dominant or X-linked manner. However, this project
focused primarily on the autosomal dominant mode of inheritance. In particular, exome
sequences were analysed across 5 affected families (Family 4-8). For some analyses family 4
was analysed separately because it is a large multi-affected pedigree that may be segregating
a unique mutation compared to family 5-8 which are smaller sporadic families.
This project utilised Next Generation Sequencing (NGS) which is a technique for highthroughput genome sequencing through the process of parallel sequencing reactions. NGS
describes a number of related techniques and for this project exome sequencing was chosen
because it is a cost-effective method to search for rare disease variants. Exome sequencing
describes a technique that sequences just the exons of genes which are protein-coding regions
of the genome therefore focusing the analysis process. Since MMD can express such a severe
phenotype, it has been suggested that this is most likely due to a mutation in a protein-coding
region.
Methods
1. Using specific software programs, exome sequence output data was converted into a
VCF file which is a Variant Call Format file. This identifies all the variants which
differ from the reference human genome and therefore could be potentially damaging.
To analyse these VCF files across 5 families, Ingenuity Variant Analysis software was
used by applying various filters to decrease the number of possible variants to detect a
disease-causing candidate. Possible variants were assessed across families.
2. There is an assumption that a causal variant in a disease must be a rare variant
therefore to test this, the software tool PLINK-SEQ was used. Potential candidate
genes were entered into PLINK-SEQ. This allows an in-depth search of the GOSgene
internal database of 385 past patients to distinguish between polymorphisms which
are present in >1% of the population and rare variants which are present in <1% of the
population.
3. In the case that there are no common variants across families, protein pathways in
which variants may hold a role are examined using protein analysis tools such as
DAVID, DAPPLE and Ingenuity Pathway Analysis (IPA). DAVID Functional
Annotation Tool identifies and quantifies the genes involved in a pathway from the
input gene list. DAPPLE stands for Disease Association Protein-Protein Link
Evaluator and builds direct and indirect interaction networks from proteins encoded
by input seed genes. IPA recognises the function of the protein encoded and possible
protein interactions. This program can also determine the predominant pathway that
the majority of genes are in.
Results
1. Fig. 1 shows the initial filters applied to Ingenuity Variant Analysis and the genes
TRPM6, CAMKK2 AND NF1 were found to be consistent across the sporadic
family 5 – 8. By using PLINK-SEQ I identified that these variants were not rare
variants and therefore unlikely to be disease-causing.
2. After altering the filters on Ingenuity Variant Analysis (Fig.2) to shift the focus to
potentially damaging variants, it was found that there were no common variants
across families. Therefore I next used the protein analysis software tools and in
family 4 the CELA1 pathway was highlighted in DAVID for occupying a role in
the conversion of proepithelin to epithelin and wound repair control. This same
gene was also highlighted in IPA as being present in the pathway, displayed in
fig.3.
3. Finally, I proceeded to apply additional filters on Ingenuity Variant Analysis (Fig.
4) to allow the identification of the most damaging list of genes. CELA1 was the
only remaining gene in family 4 and therefore I checked the variant in
PLINKSEQ. There were 2 mutations which affected CELA1 on chromosome 12
in PLINKSEQ. These were at positions 51740413 and 51740415 with the
mutation at position 51740415 having a significantly lower number of variant
alleles than the mutation at 51740413. Therefore this was a potentially interesting
candidate and so to visually check the integrity of the variant in the raw data I
used the software IGV Genome Browser. Due to the high number of variant calls
in this region, we can deduce that this region is most likely a result of
misalignment of the sequence reads during the initial alignment stage of the data
analysis (fig. 5). Whilst assessing this area on the human genome browser UCSC,
a 20bp insertion was identified at this location which is common in the population
therefore the variant I discovered cannot be a rare disease variant.
Discussion
The aim of this project was to identify if there were any damaging variants present in families
4-8 and to analyse how these variants can affect sufferers of MMD. The families were tested
under the autosomal dominant mode of inheritance and it was found that there were not any
variants that were shared across families 4-8. This suggests that there is no single gene that is
causing MMD. Through the use of protein analysis software the CELA1 gene was
highlighted in family 4, however, further in-depth analysis of the raw data identified that this
was a sequencing artefact that was due to misalignment of the sample to the reference
genome. This suggests there is no predominant protein pathway that is being affected. MMD
is a disease which is not well understood at the phenotype level therefore it is difficult to
focus analysis on well-defined patient populations. Due to the rarity of the disease it is also
difficult to determine the mode of inheritance and analyse the data to its full extent. In the
future, with a better understanding of the phenotype and more patients to study it should
become easier to identify patient sub-groups that are more amenable to genetic study which
will aid in the identification of causative genetic factors.
Acknowledgements
This studentship opportunity has enabled me to develop my analytical skills by using
different programs to come to a valid conclusion. I have learnt that organisational and
presentation skills are vital to visualise findings in a clear and concise manner. By attending
lab meetings I am now more aware of current research occurring at Institute of Child Health
and I now appreciate the huge amount of time and the large cohort of people that are involved
in the progress of a research project. Most importantly, I would like to express my thanks to
GOSgene for giving me the chance to work in their team during my summer and it has been
thoroughly enjoyable.
Variants
In Gene
Variant Allele Frequency 1%
Pathogenic/ likely pathogenic phenotype
LOF associated with frameshift, missense
Shared between families
Low variant no. on internal database
TRPM6
CAMKK2
NF1
Figure 1: This represents the primary filters applied to the exome sequence data on Ingenuity
Variant Analysis software. The output of this analysis produced common variants, TRPM6,
CAMKK2, NF1 which were SNPs or sequencing variants.
Variants
In Gene
Variant Allele Frequency 1%
LOF associated with frameshift, missense + damaging polyphen
Input variants into protein analysis software
Pathways shared between families
CELA1
Figure 2: This represents the additional filters applied to the exome sequence data on
Ingenuity Variant Analysis to increase the focus of the search for a rare disease variant. It was
found that none of the pathways were shared between families. The CELA1 pathway was
detected in family 4 by more than one software program.
Acts on
Peptidase
Binding
Enzyme
Transcription
regulator
Kinase
Figure 3: This is the Ingenuity Pathway Analysis output for the variants of family 4. There
are protein interactions occurring between the damaging variants of family 4 and the TCF4
transcription regulator is regulating CELA1.
Variants
In Gene
Variant Allele Frequency 1%
LOF associated with splice, frameshift, nonsense
Low variant no. on internal database
Check on IGV for confirmation
Figure 4: This is the most extreme filter applied to the exome sequence data on Ingenuity
Variant Analysis. I focused particularly on family 4 and investigated the CELA1 gene on IGV
Genome Browser.
Homozygous unaffected
Heterozygous parent
Homozygous affected
Heterozygous parent
Figure 5: IGV Genome Browser shows there are heterozygous parents for this CELA1
variant where both of the parents have one normal chromosome and one chromosome with a
deletion. The affected individual has inherited both of the deletion variants from each parent.
However this is most likely a misalignment due to there being a number of other variants
present in this region. UCSC genome browser has identified this region as a common variant
therefore unlikely to be damaging.
Download