Comparative Genomics
Ben, Dan, Deepak, Esha, Kelly, Pramod, Raghav, Smruthy, Vartika, Will

Background Check: Vibrio navarrensis
• An aquatic bacterium
• First isolated from sewage in Navarra, Spain in 1982
• Gram negative
• Non-spore-forming rods
• Motile by means of a single polar flagellum

Questions to be Addressed
1. Sixteen strains clustered with V. navarrensis type strain LMG 15976
• Markers: 16S rRNA, pyrH, recA and rpoA
• Four formed a distinct cluster
• V. vulnificus: closest relative to both lineages of V. navarrensis
“Is it a different species or biotype?”
2. V. navarrensis strains isolated from various sources
• nav_2423 (VN1): Blood
• nav_2462 (VN2): Surface wound
• nav_2541 (VN3): Sewage
• nav_2756 (VN4): Water
“Is Vibrio navarrensis pathogenic?”

New Species??
[Figure: neighbor-joining tree of the V. navarrensis strains, with two lineages labeled L1 and L2 and V. vulnificus (LMG 13545T, CMCP6, YJ016) as the neighboring group; scale bar 0.01. Tip labels: 2421-86, 08-2466, 1397-6T, LMG 15976T, 2544-86, 2232 or 2541-90, 2422-86, 0053-83, 2578-87, 08-2461, 08-2462, AM 37820, 08-2467, 2462-79, AM 36848, 2543-80, 2481-86, 1048-83, 2756-81, 2538-88, 2423-01.]
• Red and blue indicate an available genome sequence. Red indicates isolation from blood; blue indicates isolation from an environmental setting (water or sewage).
• Concatenated pyrH, recA, rpoA; 16S was not used
• Neighbor-joining method, Kimura 2-parameter model, pairwise deletion and 1000 interior branch tests
• 1443 nt total: pyrH (321 nt), recA (606 nt), and rpoA (516 nt)
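
The tree caption above names the Kimura 2-parameter model with pairwise deletion. As a minimal sketch of what that distance means (not the code of whichever program built the tree), the function below computes d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)) for two aligned sequences, where P is the proportion of transitions and Q the proportion of transversions. The function name and the choice to skip every non-ACGT site as "pairwise deletion" are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are
    skipped for this pair only (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pairwise deletion of gaps / ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))
```

For the concatenated pyrH, recA and rpoA alignment, a function like this would be called once per pair of strains to fill the distance matrix that neighbor-joining takes as input.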
Strategy for Defining/Distinguishing Species
• ANI (average nucleotide identity)
• Robustly assessing phylogenetic relationships between strains
– Supertree approach
– Supermatrix approach
• If there is interest:
– Genes under positive selection (dN/dS)
– Rates of divergence

Old School Method for Defining Species
• DNA/DNA hybridization (DDH)
– Tedious, hard to reproduce
– A coherent group of strains sharing > 70% DDH is considered a species
• Still need to have a phenotype associated with the group

Genomics Approach to DDH
• Developed by Dr. Konstantinidis
– (Konstantinidis and Tiedje et al. IJSEM, 2005)
• We’re employing a modified version of his script for whole-genome ANI comparisons
• Original script:
– Takes two genomes as input
– Parses genomes into 1 kb fragments and uses blastn to find reciprocal orthologs
– Takes the average nucleotide identity (ANI) over all reciprocal orthologs for each pair of draft genomes
• Coherent groups sharing
– >95%: same species
– <95% to sister group/subgroup: candidate new species

Whole Genome Tree
A. First required identification of all orthologous proteins common to all strains (should we exclude VN2?)
• Perl script: uses reciprocal blastp, keeps top hit, >70% length of reference genes, >40% ID
– Outputs a file that can be used for interrogating presence/absence of metabolic/virulence genes later on
• OrthoMCL
– Genome-scale algorithm for grouping orthologous protein sequences
B. Align all orthologous genes
• Clustal
• MUSCLE
C. Supertree approach
• Generally considered more robust and allows further investigation of HGT
• Make a separate tree for each gene, find the consensus tree
D. Supermatrix approach
• Concatenate all alignments
• Generate tree

Tree Building Approaches
• Neighbor-joining
– Fast, decently robust when bootstrapping
• Utilizing complex substitution models
– Maximum likelihood
– Bayesian analysis
– Computationally demanding, thought to do better with missing data, generally work better for divergent organisms
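
Going back to the ANI step on the "Genomics Approach to DDH" slide, the sketch below shows one way the fragment, reciprocal best hit, and averaging steps could fit together. It is not the Konstantinidis script or our modified version of it. It does not run blastn; hits are passed in as plain tuples (for example, parsed from tabular BLAST output). The function names, the tuple layout, choosing the best hit by bitscore, dropping a trailing partial fragment, and averaging the two directions of each reciprocal pair are all assumptions made for illustration.

```python
from statistics import mean


def fragment(sequence, size=1000):
    """Cut a sequence into consecutive `size`-bp pieces (trailing partial piece dropped)."""
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, size)]


def best_hits(hits):
    """Keep the highest-bitscore hit per query.

    hits: iterable of (query_id, subject_id, percent_identity, bitscore)
    returns: {query_id: (subject_id, percent_identity)}
    """
    best = {}
    for query, subject, identity, bitscore in hits:
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, identity, bitscore)
    return {q: (s, ident) for q, (s, ident, _) in best.items()}


def ani_from_reciprocal_hits(hits_a_vs_b, hits_b_vs_a):
    """Average percent identity over reciprocal best-hit fragment pairs."""
    best_ab = best_hits(hits_a_vs_b)
    best_ba = best_hits(hits_b_vs_a)
    identities = []
    for frag_a, (frag_b, ident_ab) in best_ab.items():
        back = best_ba.get(frag_b)
        if back is not None and back[0] == frag_a:  # reciprocal pair
            identities.append((ident_ab + back[1]) / 2)
    if not identities:
        raise ValueError("no reciprocal best hits found")
    return mean(identities)


# Toy example with made-up fragment IDs and scores
hits_ab = [("a1", "b1", 98.0, 1800), ("a1", "b2", 90.0, 900),
           ("a2", "b2", 96.0, 1700), ("a3", "b3", 80.0, 500)]
hits_ba = [("b1", "a1", 98.0, 1800), ("b2", "a2", 96.0, 1700),
           ("b3", "a9", 80.0, 500)]
ani = ani_from_reciprocal_hits(hits_ab, hits_ba)
same_species = ani > 95  # cutoff from the slide
```

In the toy data, a3/b3 is not a reciprocal pair, so only a1/b1 and a2/b2 contribute to the average.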
To publish, we’ll probably need to generate one of these trees to confirm the NJ topology.

Tree Building Software
• MEGA
– Easy-to-use GUI
– Not very customizable, but very quick
• PHYLIP
– Command-line based
– Very customizable
• PAUP
– Command-line
– Customizable
• MrBayes
– Uses MCMC to generate Bayesian trees
– Has >11,000 citations…

Strategy for Defining Species
[Flowchart: Draft genome → gene predictions → translated genes. ANI branch: custom script → ANI → ANI dendrogram. Tree branch: identifying core genome (OrthoMCL) → multiple alignment (ClustalΩ, MUSCLE) → supermatrix and supertree (MEGA, PHYLIP, PAUP, MrBayes) → dendrogram / consensus tree. End points: "New Species??" and "PATHOGEN??"]

Pathogenicity
Challenges:
1. Well-known databases and tools lack a complete list of virulence factors.
2. Vibrios that are not human pathogens are sometimes pathogenic in their marine hosts. As a result, some of them share virulence factors with the human-pathogenic Vibrios.
3. The plasticity of the Vibrio genome: many virulence factors are carried on mobile elements and can be shared through horizontal gene transfer (HGT).
Hence, it is difficult to draw a line between Vibrios that are pathogenic to humans and those that are not.

Types of Infection
1. Gastroenteritis
2. Septicemia
3. Wound infection

Association of Vibrio species with different clinical symptoms
Columns: Vibrio sp., Wound Infection, Gastroenteritis, Septicemia (marks are listed in the order they appear on the slide)
Vibrio cholerae O1: **
Vibrio cholerae non-O1: * ** *
Vibrio parahaemolyticus: * ** (*)
Vibrio vulnificus: ** * **
Vibrio mimicus: (*) * (*)
Vibrio alginolyticus: ** *
Vibrio fluvialis: (*) ** (*)
Photobacterium damselae: **
Grimontia hollisae: (*) ** (*)
Vibrio furnissii: **
Aliivibrio fischeri:
Vibrio splendidus:
Vibrio harveyi:
Vibrio anguillarum:
Legend: ** common presentation, * less common presentation, (*) rare presentation
Row groups: Pathogenic, Potentially Pathogenic, Non-Pathogenic
(Daniels et al., 2000)

Genomic Islands (GEIs)
• Discrete DNA segments differing between closely related bacterial strains. Usually some past or present mobility is attributed to them.
• Why are they of interest to us?
Virulence factors are often associated with GEIs!!
• Features of GEIs:
– GEIs are relatively large segments of DNA, usually between 10 and 200 kb, detected by comparisons among closely related strains.
– GEIs may be recognized by nucleotide statistics that usually differ from the rest of the chromosome, such as
1. GC content
2. Cumulative GC skew
3. Codon usage
– GEIs are often inserted at tRNA genes.
– GEIs are often flanked by 16–20 bp perfect or almost perfect direct repeats (DR).
– GEIs often harbor functional or cryptic genes encoding integrases or factors related to plasmid conjugation systems or phages involved in GEI transfer.
– GEIs often carry insertion elements or transposons.
– GEIs often carry genes offering a selective advantage for host bacteria. According to their gene content, GEIs are often described as pathogenicity, symbiosis, metabolic, fitness or resistance islands.
[Figure: General features of GEIs (Juhas et al., 2009)]
[Figure: Integration, development and excision of GEIs (Juhas et al., 2009)]

Virulence Factors in Vibrio
• Colonization
• Immunosuppression
• Immunoevasion
• Obtaining nutrition from host
• Entry into and exit out of the cell

Original strategy (proposed by Lee Katz)
• Possible ways
– Discover genes homologous to other Vibrio virulence factors (esp. V. vulnificus)
– Uncover genes that appear in closely related pathogenic species but do not appear in closely related non-pathogenic species
[Figure: diagram relating V. navarrensis, pathogenic Vibrio and nonpathogenic Vibrio, with a region labeled "Sweet spot"]
[Figure: Distribution of virulence-associated orthologous groups across eleven Vibrionaceae genomes (Lilburn et al., 2010)]

Virulence-Related Protein Collection
• NMPDR + VFDB
• Literature survey
• PAI DB
• MvirDB

Strategy to Determine Pathogenicity
• Checking for presence/absence of
– Toxins
– Adherence factors
• Type IV pilus system
– Secretion systems
– Siderophores

Strategy for Pathogenicity
[Flowchart: Annotated dataset → existence of toxins (presence / absence) → machinery for incorporation (pili/adherence factors) (yes / no) → correlation with pathway (KEGG), "connecting the dots". Outcomes: pathogenic or putatively pathogenic; potentially pathogenic; unlikely pathogenic.]

Road ahead
• Environmental vs. clinical strains comparison
• All vs. all within our nine strains
• Core genes vs. best genes trees (Morrison et al., 2012)
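
The environmental vs. clinical comparison, like the second bullet of the original strategy, comes down to set operations on a presence/absence table. The sketch below is one hedged way to express that. The gene names are placeholders, not real annotations. Treating VN1 (blood) and VN2 (surface wound) as clinical and VN3 (sewage) and VN4 (water) as environmental is a grouping assumed from the isolation sources listed earlier. The function names are made up for illustration.

```python
def core_genes(presence, strains):
    """Genes present in every listed strain."""
    if not strains:
        raise ValueError("need at least one strain")
    return set.intersection(*(set(presence[s]) for s in strains))


def group_specific_genes(presence, group_a, group_b):
    """Genes present in all strains of group_a and absent from all strains of group_b."""
    in_any_b = set().union(*(presence[s] for s in group_b))
    return core_genes(presence, group_a) - in_any_b


# Placeholder presence/absence data: strain -> set of gene (or ortholog group) IDs
presence = {
    "VN1": {"geneA", "geneB", "geneC"},
    "VN2": {"geneA", "geneB"},
    "VN3": {"geneA", "geneD"},
    "VN4": {"geneA"},
}
clinical = ["VN1", "VN2"]        # assumed grouping: blood, surface wound
environmental = ["VN3", "VN4"]   # assumed grouping: sewage, water

core = core_genes(presence, clinical + environmental)
clinical_only = group_specific_genes(presence, clinical, environmental)
```

The same two functions would apply unchanged to the ortholog presence/absence file from the whole-genome tree step, once it is loaded into a dictionary of sets.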