Rhesus Alignment and Variant Calling Summary
Jessica M. Maia
September 25, 2014

This document describes the alignment and variant calling process applied to a single genome sample.

Reference Sequence

We downloaded the individual chromosomes of the rhesus macaque genome rheMac2 (v.1.0 Mmul_051212 rhesus genome assembly, Jan. 2006) from the UCSC Genome Browser. The reference sequence we built from them consists of chromosomes 1-20, X, and Ur.

Alignment

We aligned the reads from each sequencing lane to the reference sequence with BWA (version 0.5.10), trimming low-quality bases with the parameter '-q 15'. We fixed mate pair tags with Picard (version 1.59). We then used Picard to merge each sample's lane-level alignment files into a single sample-level, coordinate-sorted alignment file, and to remove duplicate reads from that merged file.

Variant calling & annotation

To improve variant calling, we realigned reads around known rhesus SNPs and indels, downloaded from Ensembl (version 73), with GATK (version 1.6). Variants were identified with SAMtools (version 0.1.17) as follows: 'samtools mpileup -EDS -C50 -d 1000', followed by 'bcftools view -vcgN'. Variants were filtered with VCFtools (version 0.1.11) and tabix (version 0.2.6) using the command 'vcfannotate -f'. We annotated the filtered variants with SnpEff (version 3.3) using Ensembl (MMUL_1.72) annotations.

References

1. BWA Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics, 25:1754-60.
2. SAMtools Li H.*, Handsaker B.*, Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25:2078-9.
3. GATK McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010).
The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297-303.
4. Picard http://picard.sourceforge.net
5. SnpEff Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin).
6. VCFtools Petr Danecek, Adam Auton, Goncalo Abecasis, Cornelis A. Albers, Eric Banks, Mark A. DePristo, Robert Handsaker, Gerton Lunter, Gabor Marth, Stephen T. Sherry, Gilean McVean, Richard Durbin and 1000 Genomes Project Analysis Group (2011) The Variant Call Format and VCFtools. Bioinformatics.
7. Tabix Li H. (2011) Tabix: fast retrieval of sequence features from generic TAB-delimited files. Bioinformatics, 27(5):718-9.
8. Rhesus Base annotation file http://www.rhesusbase.org/download/download.jsp RB2 Gene Models in gpe format.
9. Ensembl SNPs and Indels http://e73.ensembl.org/macaca_mulatta
10. Reference Chromosomes http://hgdownload.cse.ucsc.edu/goldenPath/rheMac2/bigZips/chromFa.tar.gz
11. Reference Sequence http://hgdownload.cse.ucsc.edu/goldenPath/rheMac2/bigZips/chromFa.tar.gz
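
Appendix: command sketch

The BWA trimming parameter and the SAMtools/bcftools flags quoted in this summary can be assembled into command lines roughly as shown below. This is a minimal sketch, not the exact invocation used for the sample: the file names are hypothetical, and the '-u' and '-f' options on mpileup and the '-' stdin argument on bcftools are assumptions added so that the two calling steps can be piped together. The summary itself lists only '-EDS -C50 -d 1000' and '-vcgN'. The code only builds the command strings and does not run any tool.

```python
import shlex

# Flags quoted verbatim in the summary.
MPILEUP_FLAGS = ["-EDS", "-C50", "-d", "1000"]
BCFTOOLS_FLAGS = ["-vcgN"]


def bwa_aln_cmd(reference, fastq):
    """Build a 'bwa aln' command with quality trimming at '-q 15'."""
    return ["bwa", "aln", "-q", "15", reference, fastq]


def mpileup_cmd(reference, bam):
    """Build the 'samtools mpileup' command with the quoted flags.

    Assumption: '-u' (uncompressed BCF output) and '-f' (reference FASTA)
    are added here so the output can be piped into bcftools; the summary
    does not list them.
    """
    return ["samtools", "mpileup", *MPILEUP_FLAGS, "-u", "-f", reference, bam]


def bcftools_cmd():
    """Build the 'bcftools view' command; '-' (read stdin) is an assumption."""
    return ["bcftools", "view", *BCFTOOLS_FLAGS, "-"]


def calling_pipeline(reference, bam, out_vcf):
    """Join both calling steps into a single shell pipeline string."""
    left = " ".join(shlex.quote(arg) for arg in mpileup_cmd(reference, bam))
    right = " ".join(shlex.quote(arg) for arg in bcftools_cmd())
    return f"{left} | {right} > {shlex.quote(out_vcf)}"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(bwa_aln_cmd("rheMac2.fa", "lane1_R1.fastq")))
    print(calling_pipeline("rheMac2.fa", "sample.dedup.bam", "sample.raw.vcf"))
```

Building the commands as argument lists and quoting them only when the pipeline string is formed keeps file names with unusual characters from breaking the shell command.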