Genome Sequencing: Harmonia axyridis
Isabel Risch
University of Memphis, W. Harry Feinstone Center for Genomic Research
May 28, 2013-June 14, 2013

Ladybugs (Ladybirds)
  Tennessee state insect
  Coccinella septempunctata
    Seven-spotted lady beetle
    Native to Europe and Asia; introduced to North America; being outcompeted by Harmonia

Harmonia axyridis
  Asian/harlequin lady beetle
  Large coccinellid beetle
  Dome-shaped; smooth transition between head and thorax/abdomen
  Adults range in color from yellow to bright red
  Anywhere from zero to twenty spots on the back
  Native to Asia; introduced to North America and Europe to control aphid populations; now invasive and crowding out other species
  Carries a fungus that kills other species of ladybugs
  Harmonia produces the chemical ‘harmonine’, which prevents the fungus from infecting it

Genome: What Is It?
  An organism’s hereditary information, coded in DNA/chromosomes; in eukaryotes, includes introns and exons
  Chromosomes: DNA wrapped around histones
  [Image: human chromosome painting]

Genome: What Is It Made Of?
  DNA (deoxyribonucleic acid)
  Called the “molecule of life”
  Each nucleotide is made of deoxyribose, a phosphate group, and a nitrogen base
  Double-stranded molecule; covalent bonds along the deoxyribose/phosphate backbone on the outside; hydrogen bonds between nitrogen bases on the inside
  Codes for all proteins that make cells (life) possible
  Hydrogen bonds can be broken, which allows replication and expression through RNA
  Bases: Adenine, Thymine, Guanine, Cytosine; A pairs with T, G pairs with C
  Order of nitrogen bases codes for specific amino acids → polypeptide chains → protein
  In eukaryotes, contains both introns (non-coding sections) and exons (coding sections)

Genome: What Is It Made Of?
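The pairing rule (A with T, G with C) and the idea that base order codes for amino acids can be tried out in a few lines of Python. This is only a minimal sketch: the toy sequence is made up, and the codon table is a deliberately partial, illustrative subset of the standard genetic code, not the full 64-codon table.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Partial codon table (coding-strand DNA codons), just enough for the toy
# example below. The full standard table has 64 entries.
PARTIAL_CODON_TABLE = {
    "ATG": "Met",
    "GCC": "Ala",
    "AAA": "Lys",
    "TGG": "Trp",
    "TAA": "Stop",
}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own 5'->3' direction."""
    return complement(strand)[::-1]


def translate(coding_strand: str) -> list[str]:
    """Read codons three bases at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(coding_strand) - 2, 3):
        amino_acid = PARTIAL_CODON_TABLE[coding_strand[i:i + 3].upper()]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


toy_gene = "ATGGCCAAATGGTAA"  # hypothetical sequence, not from Harmonia
print(complement(toy_gene))
print(reverse_complement(toy_gene))
print(translate(toy_gene))
```

A codon that is missing from the partial table raises a `KeyError`, which is intentional here: it makes clear that the table is incomplete rather than silently guessing.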
  DNA and Heredity
    Heredity: the passing of traits from one generation to the next; the basis of genetics and evolution
    Determined by genes on chromosomes; variations of a gene are alleles
    Sexually reproducing animals get two alleles (one from each parent)
      Mendel’s Law of Segregation
    Alleles are expressed as phenotypic traits; thus, DNA determines heredity

Genome: What Can We Do With This Information?
  By determining the sequence of genomes, we can:
    Compare them to other genomes
      Study phylogeny and evolution
    Use them to understand diseases and design better potential treatments; also better predict the body’s response to certain treatments
      Genetic diseases
      Somatic diseases
    Use them for forensic science
    Research deeper into genetic engineering of plants and animals (biotechnology)

Genome Mapping
  Can be done once a genome is sequenced
  Determines the physical order of the sequence features across an individual's entire DNA
  Places DNA fragments onto chromosomes by identifying the fragments
    Fragments are identified by certain markers or by their exact base-pair sequence
  Traditional maps resolved millions of base pairs at a time (low resolution), but modern ones can map at the level of SNPs (single-nucleotide polymorphisms, differences of a single base pair) for higher resolution
  Can be used to associate a genetic marker with a certain disease
    Somatic diseases
      Ex: cancer can occur when a tumor-suppressing gene is inactivated or blocked; genome mapping can be used to identify the genes and research ways to reactivate them
    Genetic diseases
      Ex: sickle cell anemia is related to a mutation in the beta hemoglobin gene

DNA Sequencing: Background
  Sanger Method
    Used to determine nucleotide order in DNA
    Rapid DNA sequencing
    Uses modified, labeled nucleotides to stop DNA strand elongation at specific bases
    Scientists treat each DNA sample with one labeled base
    DNA can then be run on a gel and tracked to where it was terminated; fragments are separated by size and by terminating nucleotide type
    Results photographed on an X-ray or gel image
    Dye-terminator sequencing: revised method
      Uses fluorescent dyes to visualize all bases in one lane

DNA Sequencing: Background
  Illumina Technologies
    Next-generation sequencing
    A single-stranded DNA fragment provides a template from which the DNA is re-synthesized
    Signals are emitted and interpreted by the sequencing machine
    Unlike Sanger, next-gen can be applied to millions of base pairs at once via a flow cell
    Fragmented reads are then re-assembled by alignment → whole genome
  MiSeq
    “Personal” tabletop sequencer
    Capable of many of the functions of a large sequencer
    Uses fluorescence and LED light, while previous machines used lasers
    Cheaper: many universities can now afford sequencers

DNA Extraction
  The process of separating pure genomic DNA from the rest of the contents of cells and tissues
  Steps:
    Lysing cells (breaking them to get to DNA)
    Removing contaminants from DNA (proteins, RNA, lipids, etc.)
    Pelleting DNA (precipitating and compacting it to separate it from everything else)
    Washing away solutions used to purify DNA

Genome Sequencing
  The process of determining the nucleotide order of a specific genome
  DNA extraction
  DNA prep
    Tagmentation, amplification, etc.
  Run on a sequencer
  Alignment and re-assembly

Genome Sequencing: Harmonia
  Why we sequenced it:
    To better understand the insect and other beetles close to it
  What we used to sequence it:
    G Biosciences DNA extraction/prep kits
    Illumina sequencer (MiSeq)
    Blue Pippin to run gels and size selections
    QuBit to measure DNA concentrations in samples
  Genome had very low diversity; difficult to sequence
    May be due to transposon activity/repetitive elements in the genome

Steps of Sequencing: DNA Extraction
  Harmonia pupa homogenized
    Lyses cells to reach the DNA inside
  Proteinase K added
    Breaks down proteins surrounding the DNA (purifies)
  Chloroform added
    Precipitates waste
  DNA Stripping Solution added
    Strips DNA of any more waste
  Precipitation Solution added
    Precipitates waste from DNA
  Isopropanol added
    Precipitates DNA so it can be separated from other parts of the mixture
  Ethanol wash
    Washes DNA to further purify it (removes excess salt)

Steps of Sequencing: Paired End Prep
  Followed Nextera XT DNA Prep Kit (Illumina, San Diego, CA)
  Tagmentation
    DNA is fragmented and “tagged” (adapters added to DNA ends), which allows the DNA to be PCR amplified
  PCR Amplification
    DNA is “amplified” in a polymerase chain reaction
    Amplification: DNA is replicated many times over so the sequencer can read it
  PCR Clean-up
    DNA is purified using AMPure Beads (unusable bits of DNA are washed out)
  Library Normalization
    Makes sure that the DNA quantities from each sample are equal in the final pooled library

Steps of Sequencing: Mate Pair Prep
  Followed the Nextera Mate Pair DNA Prep Kit (Illumina, San Diego, CA).
  Two versions of the mate pair prep were run
    Gel-plus/size selection
      Used a Blue Pippin Prep machine (Sage Science, Beverly, MA)
      Yielded 10kb-17kb fragments
    Gel-free
      Yielded 3kb-15kb fragments

Steps of Sequencing: Mate Pair Prep (Gel-Free)
  Tagmentation
  Strand Displacement Reaction
    Polymerase is used to fill gaps in DNA caused by tagmentation
  AMPure Purification
    Usable DNA binds to AMPure Beads; anything unwanted in the solution, including small DNA fragments, is washed away
  Circularization
    Fragments are circularized with blunt-ended ligation
  Exonuclease Digestion
    Any remaining linear DNA is broken down and removed, leaving the circularized fragments
  Fragmentation of Circularized Fragments
    Circularized DNA is sheared to smaller fragments by sonication
  Purification of Mate Pair Fragments
    Usable DNA fragments bind to streptavidin beads; everything else is washed away
    Usable DNA = fragments containing biotinylated adapters
  End Repair/A-Tailing
    Overhangs from DNA shearing are blunted
      3’ overhangs are removed; 5’ overhangs are filled in with polymerase
    An ‘A’ nucleotide is added to the 3’ ends
  Adapter Ligation
    Indexing adapters are added to the ends of the fragments
      Contain a ‘T’ nucleotide that ligates to the ‘A’ tail
    Prepares the fragments for amplification and flow cell hybridization
  PCR Amplification
  PCR Clean-up

Steps of Sequencing: Mate Pair Prep (Gel-Plus)
  Tagmentation
  Strand Displacement Reaction
  AMPure Purification
  Size Selection
    Used a Blue Pippin Prep machine (Sage Science, Beverly, MA)
    A specific range of DNA fragment sizes is chosen and separated from the rest of the DNA
      10-17kb
  Circularization
  Exonuclease Digestion
  Fragmentation of Circularized Fragments
  Purification of Mate Pair Fragments
  End Repair/A-Tailing
  Adapter Ligation
  PCR Amplification
  PCR Clean-up

Steps of Sequencing: Sequencing Paired Ends
  Sample was diluted with hybridization buffer and paired-end sequenced on the MiSeq in a 2x250 run
    Sequencer reads 250bp at a time
  Run yielded poor-quality data (low diversity)
    Spiked with PhiX, re-run

Steps of Sequencing: Sequencing Mate Pairs
  Gel and non-gel libraries diluted to 2 nM with 10 mM Tris-Cl, pH 8.5, with 0.1% Tween 20
  2 nM of DNA from each library was pooled
  Pooled library was diluted with 0.2N NaOH and hybridization buffer
  Mixture was diluted again with hybridization buffer
  Placed on the MiSeq for mate pair sequencing
  Run yielded poor-quality data (low diversity)
    Sample was spiked with PhiX, re-run

Assembly
  First, read quality is charted and basic stats are reviewed (FastQC)
    Use charts to find which bases to trim
    Trim first and last bases (bad quality; unusable)
  Aligned reads to a reference genome (or to a similar genome in de novo assembly) in BWA (Burrows-Wheeler Aligner)
  BWA output files are imported into the Integrative Genomics Viewer (IGV)
    Overlaps in read sequences allow the whole genome to be re-assembled
    IGV: viewing depth of coverage and fragment lengths
  Paired ends give 100x coverage
  Mate pairs provide scaffold

Results
  H. axyridis genome is about 300 million bp long
  After trimming, we ended with:
    628,908 paired end reads
    4,038,064 singletons
    1,454,689 mate pair reads (non-gel)
    199,700 mate pair reads (gel)
  Low diversity suggests transposon activity in genome
    Genome full of long ‘A’ sequences

Acknowledgements
  Thanks to the W. Harry Feinstone Center for Genomic Research and especially the Sutter Lab for allowing me to intern with them.
  Special thanks to Dr. Shirlean Goodwin, Dr. Thomas Sutter, and Dr. Michael Dickens for their help during my time in the lab.

Disclaimer
  This is an informal presentation; information taken from various print and Internet sources
  Images are not mine
    Google, Illumina Technologies
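
Note on the trimming step under Assembly: trimming the low-quality first and last bases of each read can be sketched in Python as below. This is an illustration only. The presentation does not say which trimming tool was used, the `Read`, `parse_fastq`, and `trim_ends` names are hypothetical, and the trim lengths are placeholders that would in practice be chosen from the FastQC per-base quality charts.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Read(NamedTuple):
    name: str
    seq: str
    qual: str


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield reads from a simple four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip()
        yield Read(header[1:], seq, qual)


def trim_ends(read: Read, head: int, tail: int) -> Read:
    """Drop `head` bases from the start and `tail` bases from the end.

    The quality string is trimmed identically so it stays aligned with
    the sequence. Reads shorter than head + tail come back empty.
    """
    end = max(head, len(read.seq) - tail)
    return Read(read.name, read.seq[head:end], read.qual[head:end])


# Toy record (made up): the first and last bases are low-quality 'N' calls.
toy_fastq = io.StringIO("@read1\nNACGTACGTN\n+\n#IIIIIIII#\n")

# head=1 and tail=1 are placeholder values, not the ones used in the project.
trimmed = [trim_ends(r, head=1, tail=1) for r in parse_fastq(toy_fastq)]
for r in trimmed:
    print(r.name, r.seq, r.qual)
```

Reads that end up empty or very short after trimming would normally be discarded before alignment. When one read of a pair is discarded, its surviving partner is left as an unpaired singleton.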