Evaluation of Mapping Algorithms for Next Generation Sequencing Data

Master Thesis Project in Bioinformatics
Genomics Core Facility

Project description

The technologies for sequencing DNA have developed from the first generation gel-based systems, through semi-automated capillary systems, to the next generation systems in which single DNA molecules are cloned and sequenced in parallel on a solid support. This is performed on a very large scale and makes it possible to sequence the whole human genome within 10 days. A single experiment can generate several hundred gigabases of sequencing data. The next generation sequencing market is largely driven by targeted re-sequencing efforts aimed at finding genetic variations and rare mutations that contribute to complex diseases.

The aim of this project is to investigate the performance of different alignment software available for Next Generation Sequencing (NGS) data and to compare their sensitivity in detecting genomic alterations in the human genome. This will be done by implementing a complete bioinformatic workflow, using data generated from Illumina targeted re-sequencing. The workflow covers handling of the data through quality control, mapping and SNP calling, with the focus on the mapping step. The performance of the different alignment algorithms will then be evaluated by introducing insertions and deletions of different lengths in the human reference genome.

Specific aims

The aims of this master thesis project are:

1. Review different algorithms for mapping data from NGS to a reference genome.
2. Implement a bioinformatics pipeline for mapping of sequence reads against the human reference genome.
3. Use data from targeted re-sequencing and different mapping algorithms to investigate the performance of
   a. detecting insertions and deletions of different lengths
   b. mapping reads of varying length

The project requires experience in the Linux/Unix computer environment and good programming skills in Python, Perl or a similar language. Programming is a substantial part of the project, so an interest in this field is an advantage.

The master thesis project will be performed at the Genomics Core Facility, a non-profit laboratory at the University of Gothenburg that provides services to researchers, mainly from across the University and Hospital campuses. Genomics performs advanced analyses on all kinds of genetic material, e.g. DNA extraction, DNA sequencing, real-time PCR, fragment analysis, and SNP genotyping. Genomics CF is currently purchasing a next generation sequencing instrument, which will be accessible to all interested researchers and clinicians in the region. In connection with this purchase, there is a need to develop the post-experiment bioinformatics data processing and to establish data analysis pipelines.

Performing this master thesis project at Genomics CF will support the interdisciplinary bridging between laboratory analysis and downstream bioinformatics data handling. The project will provide valuable experience of working with the downstream bioinformatics of next generation sequencing data at the resource center.

The project will be supervised by Maria Nethander and Frida Abel at the Genomics Core Facility and examined by Erik Kristiansson at the Department of Mathematical Statistics, Chalmers.

For more information, contact Maria Nethander (maria.nethander@gu.se) or Frida Abel (frida.abel@gu.se).