Evaluation of Mapping Algorithms for Next Generation Sequencing

Evaluation of Mapping Algorithms for Next Generation Sequencing Data
Master Thesis Project in Bioinformatics
Genomics Core Facility
Project description
DNA sequencing technologies have developed from the first generation gel-based systems,
to semi-automated capillary systems, and now to next generation systems in which single
DNA molecules are cloned and sequenced in parallel on a solid support. This is performed on a very large
scale and makes it possible to sequence the whole human genome within 10 days. A single
experiment can generate several hundred gigabases of sequencing data. The next generation
sequencing market is largely driven by targeted re-sequencing efforts aimed at finding genetic
variations and rare mutations that contribute to complex diseases.
The aim of this project is to investigate the performance of different alignment software available for
Next Generation Sequencing (NGS) data and to compare their sensitivity in detecting genomic alterations
in the human genome. This will be done by implementing a complete bioinformatics workflow
using data generated from Illumina targeted re-sequencing. The workflow covers quality control,
mapping and SNP calling, with the focus on the mapping step. The performance
of the different alignment algorithms will then be evaluated by introducing insertions and deletions of
different lengths in the human reference genome.
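As a rough illustration of that last step, the following Python sketch applies a list of insertions and deletions to a reference sequence. It is only a toy example under stated assumptions: the function name `apply_indels`, the event format (0-based position, kind, length), the use of random bases for insertions, and the requirement that events do not overlap are all hypothetical choices made here, not part of the project specification.

```python
import random


def apply_indels(reference, events, alphabet="ACGT", seed=0):
    """Return a copy of `reference` with the given indels applied.

    `events` is a list of (position, kind, length) tuples, where `position`
    is 0-based in the ORIGINAL reference and `kind` is "ins" or "del".
    Assumption: events do not overlap each other.
    """
    rng = random.Random(seed)  # fixed seed so inserted bases are reproducible
    mutated = reference
    # Apply events from right to left so that positions of the remaining
    # events still refer to original reference coordinates.
    for position, kind, length in sorted(events, reverse=True):
        if kind == "del":
            mutated = mutated[:position] + mutated[position + length:]
        elif kind == "ins":
            inserted = "".join(rng.choice(alphabet) for _ in range(length))
            mutated = mutated[:position] + inserted + mutated[position:]
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return mutated


# Toy usage: delete 3 bases at position 2 and insert 2 bases at position 8.
toy_reference = "ACGTACGTACGT"
toy_events = [(2, "del", 3), (8, "ins", 2)]
toy_mutated = apply_indels(toy_reference, toy_events)
```

Keeping the list of introduced events gives a known truth set against which the output of each mapping algorithm can later be compared.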
Specific aims
The aims of this master thesis project are:
1. Review different algorithms for mapping data from NGS to a reference genome.
2. Implement a bioinformatics pipeline for mapping of sequence reads against the human reference genome.
3. Use data from targeted re-sequencing and different mapping algorithms to investigate the
performance in
a. detecting insertions and deletions of different lengths
b. mapping reads of varying length
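One possible way to summarise the comparison in aim 3a is to compute, for each indel length, the fraction of introduced events that are recovered after mapping and calling. The Python sketch below assumes a hypothetical event representation, (chromosome, position, kind, length), and counts only exact matches; a real evaluation would probably allow some positional tolerance, and the function name `sensitivity_by_length` is made up for illustration.

```python
from collections import defaultdict


def sensitivity_by_length(truth, called):
    """Return {indel_length: sensitivity} for the events in `truth`.

    `truth` and `called` are iterables of (chromosome, position, kind, length)
    tuples. Assumption: an event counts as detected only on an exact match.
    """
    called_set = set(called)
    total = defaultdict(int)   # introduced events per length
    found = defaultdict(int)   # introduced events that were also called
    for event in truth:
        length = event[3]
        total[length] += 1
        if event in called_set:
            found[length] += 1
    return {length: found[length] / total[length] for length in sorted(total)}


# Toy usage with made-up events.
toy_truth = [("chr1", 100, "del", 1), ("chr1", 200, "del", 1), ("chr1", 300, "ins", 5)]
toy_called = [("chr1", 100, "del", 1), ("chr1", 300, "ins", 5), ("chr2", 5, "ins", 2)]
toy_sensitivity = sensitivity_by_length(toy_truth, toy_called)
```

Running the same calculation on the output of each mapping algorithm, and for reads of each length, would give directly comparable numbers for aims 3a and 3b.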
The project requires experience with the Linux/Unix environment and good programming skills
in Python, Perl or a similar language. Programming is a substantial part of the project, so an interest
in this field is an advantage.
The master thesis project will be performed at the Genomics Core Facility, which is a non-profit
laboratory within Gothenburg University, providing services to researchers mainly from across the
University and Hospital campuses. Genomics performs advanced analyses on all kinds of genetic
materials, e.g. DNA extraction, DNA sequencing, real-time PCR, fragment analysis, and SNP
genotyping. Genomics CF is currently purchasing a next generation sequencing instrument which will
be accessible to all interested researchers and clinicians in the region. In connection with the purchase
of the equipment there is a need to develop the post-experiment bioinformatics data processing
and to establish data analysis pipelines. Performing this master thesis project at Genomics CF will
help bridge laboratory analysis and downstream bioinformatics data
handling. The project will provide valuable experience of working with the downstream bioinformatics
of next generation sequencing data at the resource center.
The project will be supervised by Maria Nethander and Frida Abel at the Genomics Core Facility and
examined by Erik Kristiansson at the Department of Mathematical Statistics, Chalmers. For more
information, contact Maria Nethander (maria.nethander@gu.se) or Frida Abel (frida.abel@gu.se).