Supporting Information

Table S1. Sequencing Assay Attributes

Columns: Sequencing Assay Field ID; Field Name; Data Category; Description; OBO Foundry URL; MIxS Equivalent.

SA1. Sample ID Sequencing Facility
    Data category: Sample Shipment
    Description: Unique identifier used by the relevant sequencing center to identify the sample submitted by the sample provider.
    OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001901

SA2. Nucleic Acid Extraction Method
    Data category: Sequencing Sample Preparation
    Description: Experimental procedure used to derive the nucleic acid fraction from the submitted sample used for the sequencing reaction.
    OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0666667
    MIxS equivalent: sample material processing

SA3. Nucleic Acid Preparation Method
    Data category: Sequencing Sample Preparation
    Description: Details about the preparation of DNA samples for sequencing, including whether amplification was used (e.g., in the case of sequencing a single mosquito), and any other relevant molecular biology protocols done prior to sequencing.
    OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001902
    MIxS equivalent: sample material processing

SA4. Sequencing Technology
    Data category: Sequencing Assay
    Description: Experimental procedure used to derive sequence data from the input sample, including both method and device. Type of sequencing used based on approach (pyrosequencing) and technology (454).
    OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0600047
    MIxS equivalent: sequencing method

SA5. Assembly Name
    Data category: Data Transformation
    OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001948

SA6. Assembly Method
    Data category: Data Transformation
    Description: Computational algorithm used to assemble individual sequence reads into larger contigs. Assembly details including but not limited to assembler type (overlap-layout-consensus, de Bruijn), assembler version, and any relevant QC information such as % known genes/ESTs captured.
    OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001522
    MIxS equivalent: assembly

SA7. Genome Coverage
    Data category: Data Transformation
    Description: Depth of sequence coverage based on both external (e.g., Cot-based size estimates) and internal (average coverage in the assembly) measures of genome size.
    OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001618
    MIxS equivalent: finishing strategy

SA8. Annotation Provider
    Data category: Data Transformation
    Description: The name of the responsible person, group or institution providing the set of annotated features for a genome sequence that is submitted to a resource such as GenBank.
    OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001947

SA9. Annotation Method
    Data category: Data Transformation
    Description: The names and versions of the software and databases used in creating the set of annotated features that is submitted to a resource such as GenBank.
    OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001944

SA10. GenBank Record ID
    Data category: Data Transformation
    Description: Unique identifier of the submitted GenBank sequence record(s).
    OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001614

Table S2. Bacterial Pathogen-Specific Attributes

Columns: Pathogen Specific Field ID; Field Name; Data Category; Description; OBO Foundry URL; BioSample Synonym; MIxS Synonym.

BAC1. Bacteria Antibiotic Sensitivity
    Data category: Pathogen Characteristic
    Description: Results of tests for antibiotic resistance, usually measured in minimum inhibitory concentration (MIC). Format: name of an antibiotic followed by 'MIC' or name of other metric and a measure of the quantity of antibiotic in ug/ml.
    OBO Foundry URL: http://purl.obolibrary.org/obo/IDO_0000470
    MIxS synonym: encoded traits

BAC2. Bacteria Biovar
    Data category: Pathogen Characteristic
    Description: Commonly used descriptor of distinguishing physical or biochemical characteristics of a bacterial population.
    MIxS synonym: subspecific genetic lineage

BAC3. Bacteria Chromosome Content
    Data category: Pathogen Characteristic
    Description: Number of chromosomes in bacteria.
    OBO Foundry URL: http://purl.obolibrary.org/obo/GO_0005694
    MIxS synonym: number of replicons

BAC4. Bacteria Extra Chromosomal Elements
    Data category: Pathogen Characteristic
    Description: Number of extrachromosomal elements in the organism.
    OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0000430
    MIxS synonym: extrachromosomal elements

BAC5. Bacteria Pathovar
    Data category: Pathogen Characteristic
    Description: Commonly used descriptor of distinguishing physical or biochemical characteristics of a bacterium.
    BioSample synonym: pathovar
    MIxS synonym: subspecific genetic lineage

BAC6. Bacteria Serotype
    Data category: Pathogen Characteristic
    Description: Serotype of the bacteria identified in the isolate sample. This is an identity determined by the data generated by the GSCID.
    BioSample synonym: serovar
    MIxS synonym: subspecific genetic lineage

BAC7. Bacteria Serotyping Method
    Data category: Pathogen Characteristic
    Description: Experimental technique used to determine the serotype of the pathogen species in the isolate sample.

Table S3. Eukaryotic Pathogen- and Vector-Specific Attributes

Columns: Pathogen Specific Field ID; Field Name; Data Category; OBO Foundry URL; BioSample Synonym; MIxS Synonym.

CE1. Umbrella project ID(s)
    Data category: Investigation
    OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001628

CE2. Intended Sequence Repository(s)
    Data category: Investigation
    MIxS synonym: submitted_to_insdc

CE3. Submitter Name
    Data category: Investigation

CE4. subspecies/Subtype
    Data category: Pathogen Characteristic

CE5. Common name
    Data category: Pathogen Characteristic

CE6. Individuals (Number of males and females)

CE7. Isolation, sampling or growth conditions (xenic/axenic culture, abscess aspirates, cysts)

CE8. Co-isolated organisms (in case this is a mixed culture)

CE9. Developmental Growth Stage (i.e., sporozoite, male/female, or mixture of stages?)

CE10. Date of sample collection for shipment to genomic sequencing center

CE11. Host Additional Classification genotype
    Data category: Host Classification

CE12. Host Additional Classification Strain
    Data category: Host Classification

CE13. Host Additional Classification subtype
    Data category: Host Classification

CE14. Development stage

CE15. Ploidy (i.e., haploid, diploid, allopolyploid, polyploid or 1N, 2N, 3N, etc.)
    Data category: Pathogen Characteristic
    OBO Foundry URL: http://purl.obolibrary.org/obo/PATO_0001374
    MIxS synonym: ploidy

CE16. Number of replicons (chromosomes or segments)
    Data category: Pathogen Characteristic
    MIxS synonym: num_replicons

CE17. Genome size estimate
    Data category: Pathogen Characteristic
    MIxS synonym: estimated_size

CE18. Nucleic Acid Extraction Date
    Data category: Sequencing Sample Preparation

CE19. Extrachromosomal elements
    Data category: Pathogen Characteristic
    MIxS synonym: extrachrom_elements

CE20. Quantification (host/parasite; concentration and vol provided)

CE21. Relevant Standard Operating Procedures (SOPs)
    MIxS synonym: sop

CE22. Assembled genome size

CE23. Relevant electronic resources
    MIxS synonym: url

CE24. Number of assembled contigs/scaffolds

Values whose row assignment is not recoverable from the source layout: data category "Specimen Isolation" (three rows) and "Data Transformation" (two rows); OBO Foundry URLs http://purl.obolibrary.org/obo/OBI_0000068 and http://purl.obolibrary.org/obo/OBI_0001305.

Table S4. Project Specific Attributes

Columns: Project-Specific Field ID; Field Name; NCBI Component Name; NCBI Component Synonym; NCBI Component Definition.

PS1. Isolate
    NCBI component name: BioSample
    NCBI component synonym: isolate

PS2. Host Disease Stage
    NCBI component name: BioSample
    NCBI component synonym: host_disease_stage
    NCBI component definition: Stage of disease at the time of sampling.

PS3. Host Disease Outcome
    NCBI component name: BioSample
    NCBI component synonym: host_disease_outcome
    NCBI component definition: Final outcome of disease, e.g., death, chronic disease, recovery.

PS4. Host Description
    NCBI component name: BioSample
    NCBI component synonym: host_description
    NCBI component definition: Additional information not included in other defined vocabulary fields.

PS5. Specimen Voucher
    NCBI component name: BioSample
    NCBI component synonym: specimen_voucher
    NCBI component definition: Formal identifier of the Type Specimen of the source organism, usually stored in an institute collection. Name of the Institution and their internal Collection and/or sample codes.

PS6. Genotype
    NCBI component name: BioSample
    NCBI component synonym: genotype
    NCBI component definition: Observed genotype.

PS7. Serotype
    NCBI component name: BioSample
    NCBI component synonym: serotype
    NCBI component definition: Taxonomy below subspecies; a variety (in bacteria, fungi or virus) usually based on its antigenic properties. Same as serovar and serogroup. E.g., serotype="H1N1" in Influenza A virus CY098518.

PS8. Serovar
    NCBI component name: BioSample
    NCBI component synonym: serovar
    NCBI component definition: Taxonomy below subspecies; a variety (in bacteria, fungi or virus) usually based on its antigenic properties. Same as serovar and serotype. Sometimes used as species identifier in bacteria with shaky taxonomy, e.g., Leptospira, serovar saopaolo S76607 (65357 in Entrez).

PS9. Pathotype
    NCBI component name: BioSample
    NCBI component synonym: pathotype
    NCBI component definition: Some bacterial specific pathotypes (example: Escherichia coli - STEC, UPEC).

PS10. Passage History
    NCBI component name: BioSample
    NCBI component synonym: passage_history
    NCBI component definition: Number of passages and passage method.

PS11. Lab Host
    NCBI component name: BioSample
    NCBI component synonym: lab_host
    NCBI component definition: Host on which a parasite is maintained in the lab, which may not be the same as the natural host (example: hamster cells used to support a parasite normally found in mouse in the wild). Also used to list the bacterial strain in which a plasmid construct library is maintained in the lab (e.g., ID:167152 labhost DH10B T1-resistant).

PS12. Subgroup
    NCBI component name: BioSample
    NCBI component synonym: subgroup
    NCBI component definition: Taxonomy below subspecies; sometimes used in viruses to denote subgroups taken from a single isolate.

PS13. Subtype
    NCBI component name: BioSample
    NCBI component synonym: subtype
    NCBI component definition: Used as classifier in viruses (e.g., HIV type 1, Group M, Subtype A).
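
The value format given for BAC1 (Bacteria Antibiotic Sensitivity) in Table S2, an antibiotic name followed by 'MIC' or another metric and a quantity in ug/ml, can be checked mechanically. The following is a minimal sketch only: the tables do not specify delimiters, so the whitespace-separated layout "<antibiotic> <metric> <number> ug/ml", the function name parse_sensitivity, and the Sensitivity record are all illustrative assumptions, not part of the standard.

```python
import re
from typing import NamedTuple


class Sensitivity(NamedTuple):
    """One parsed BAC1 value (hypothetical record type for illustration)."""
    antibiotic: str
    metric: str
    value_ug_per_ml: float


# Assumed layout: "<antibiotic> <metric> <number> ug/ml", whitespace-separated.
# The antibiotic name may contain spaces; the metric is a single token.
_PATTERN = re.compile(
    r"^\s*(?P<antibiotic>.+?)\s+(?P<metric>[A-Za-z][\w-]*)\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s*ug/ml\s*$",
    re.IGNORECASE,
)


def parse_sensitivity(text: str) -> Sensitivity:
    """Parse a BAC1-style string, raising ValueError if it does not fit."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not in '<antibiotic> <metric> <value> ug/ml' form: {text!r}")
    return Sensitivity(
        antibiotic=match["antibiotic"],
        metric=match["metric"],
        value_ug_per_ml=float(match["value"]),
    )


if __name__ == "__main__":
    print(parse_sensitivity("ampicillin MIC 8 ug/ml"))
    print(parse_sensitivity("nalidixic acid MIC 0.5 ug/ml"))
```

A submission pipeline that adopts a different delimiter or unit spelling would need to adjust the pattern accordingly; the sketch rejects anything that lacks an explicit metric token or the ug/ml unit.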