Supporting Information

Table S1. Sequencing Assay Attributes

Columns: Sequencing Assay Field ID | Field Name | Data Category | Description | OBO Foundry URL | MIxS Equivalent

SA1. Sample ID Sequencing Facility
Data Category: Sample Shipment
Description: Unique identifier used by the relevant sequencing center to identify the sample submitted by the sample provider.
OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001901

SA2. Nucleic Acid Extraction Method
Data Category: Sequencing Sample Preparation
Description: Experimental procedure used to derive the nucleic acid fraction from the submitted sample used for the sequencing reaction.
OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0666667
MIxS Equivalent: sample material processing

SA3. Nucleic Acid Preparation Method
Data Category: Sequencing Sample Preparation
Description: Details about the preparation of DNA samples for sequencing, including whether amplification was used (e.g., when sequencing a single mosquito) and any other relevant molecular biology protocols performed prior to sequencing.
OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001902
MIxS Equivalent: sample material processing

SA4. Sequencing Technology
Data Category: Sequencing Assay
Description: Experimental procedure used to derive sequence data from the input assay sample, including both method and device. The type of sequencing is described by approach (e.g., pyrosequencing) and technology (e.g., 454).
OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0600047
MIxS Equivalent: sequencing method

SA5. Assembly Name
Data Category: Data Transformation
OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001948

SA6. Assembly Method
Data Category: Data Transformation
Description: Computational algorithm used to assemble individual sequence reads into larger contigs. Assembly details include, but are not limited to, assembler type (overlap-layout-consensus, de Bruijn), assembler version and any relevant QC information, such as the percentage of known genes/ESTs captured.
OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001522
MIxS Equivalent: assembly

SA7. Genome Coverage
Data Category: Data Transformation
Description: Depth of sequence coverage based on both external (e.g., Cot-based size estimates) and internal (average coverage in the assembly) measures of genome size.
OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001618
MIxS Equivalent: finishing strategy

SA8. Annotation Provider
Data Category: Data Transformation
Description: The name of the responsible person, group or institution providing the set of annotated features for a genome sequence that is submitted to a resource such as GenBank.
OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001947

SA9. Annotation Method
Data Category: Data Transformation
Description: The names and versions of the software and databases used to create the set of annotated features that is submitted to a resource such as GenBank.
OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001944

SA10. GenBank Record ID
Data Category: Data Transformation
Description: Unique identifier of the submitted GenBank sequence record(s).
OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001614
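SA7 relates sequencing depth to a measure of genome size. A minimal sketch of the usual average-coverage arithmetic, assuming coverage is defined as total sequenced bases divided by genome size (C = N x L / G); the function and parameter names are hypothetical and not part of the attribute specification. Passing an external size estimate or the assembled length as `genome_size` gives the external or internal variant described in SA7.

```python
def average_coverage(read_count: int, mean_read_length: float, genome_size: float) -> float:
    """Return the expected average depth C = N * L / G.

    read_count: number of reads (N)
    mean_read_length: average read length in bases (L)
    genome_size: genome size in bases (G), either an external estimate
        (e.g. Cot-based) or the total length of the assembly
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return read_count * mean_read_length / genome_size


# Hypothetical example: 2,000,000 reads of 400 bases over a 20 Mb genome
print(average_coverage(2_000_000, 400, 20_000_000))  # 40.0
```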
Table S2. Bacterial Pathogen-Specific Attributes

Columns: Pathogen-Specific Field ID | Field Name | Data Category | Description | OBO Foundry URL | BioSample Synonym | MIxS Synonym

BAC1. Bacteria Antibiotic Sensitivity
Data Category: Pathogen Characteristic
Description: Results of tests for antibiotic resistance, usually measured as the minimum inhibitory concentration (MIC). Format: name of an antibiotic followed by 'MIC' (or the name of another metric) and a measure of the quantity of antibiotic in µg/ml.
OBO Foundry URL: http://purl.obolibrary.org/obo/IDO_0000470
MIxS Synonym: encoded traits

BAC2. Bacteria Biovar
Data Category: Pathogen Characteristic
Description: Commonly used descriptor of distinguishing physical or biochemical characteristics of a bacterial population.
MIxS Synonym: subspecific genetic lineage

BAC3. Bacteria Chromosome Content
Data Category: Pathogen Characteristic
Description: Number of chromosomes in bacteria.
OBO Foundry URL: http://purl.obolibrary.org/obo/GO_0005694
MIxS Synonym: number of replicons

BAC4. Bacteria Extra Chromosomal Elements
Data Category: Pathogen Characteristic
Description: Number of extrachromosomal elements in the organism.
OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0000430
MIxS Synonym: extrachromosomal elements

BAC5. Bacteria Pathovar
Data Category: Pathogen Characteristic
Description: Commonly used descriptor of distinguishing physical or biochemical characteristics of a bacterium.
BioSample Synonym: pathovar
MIxS Synonym: subspecific genetic lineage

BAC6. Bacteria Serotype
Data Category: Pathogen Characteristic
Description: Serotype of the bacterium identified in the isolate sample. This identity is determined from the data generated by the GSCID.
BioSample Synonym: serovar
MIxS Synonym: subspecific genetic lineage

BAC7. Bacteria Serotyping Method
Data Category: Pathogen Characteristic
Description: Experimental technique used to determine the serotype of the pathogen species in the isolate sample.
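BAC1 describes its value as an antibiotic name, then 'MIC' or another metric name, then a quantity in µg/ml. A minimal parsing sketch, assuming whitespace-separated values such as "ampicillin MIC 8 ug/ml"; the exact separators, casing and unit spelling are assumptions, and the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed layout: "<antibiotic> <metric> <value> ug/ml", e.g. "ampicillin MIC 8 ug/ml".
# The antibiotic name may contain spaces, digits, '-' or '/'; the unit may be ug/ml or µg/ml.
_SENSITIVITY = re.compile(
    r"^\s*(?P<antibiotic>[A-Za-z][A-Za-z0-9\-/ ]*?)\s+"
    r"(?P<metric>[A-Za-z0-9]+)\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?:ug|µg)/ml\s*$",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class SensitivityResult:
    antibiotic: str
    metric: str
    value_ug_per_ml: float


def parse_sensitivity(text: str) -> SensitivityResult:
    """Split one antibiotic sensitivity value into antibiotic, metric and quantity."""
    match = _SENSITIVITY.match(text)
    if match is None:
        raise ValueError(f"unrecognized antibiotic sensitivity value: {text!r}")
    return SensitivityResult(
        antibiotic=match.group("antibiotic").strip().lower(),
        metric=match.group("metric").upper(),
        value_ug_per_ml=float(match.group("value")),
    )


print(parse_sensitivity("Ampicillin MIC 8 ug/ml"))
# SensitivityResult(antibiotic='ampicillin', metric='MIC', value_ug_per_ml=8.0)
```

Values that do not fit the assumed layout raise `ValueError` rather than being stored partially parsed.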
Table S3. Eukaryotic Pathogen- and Vector-Specific Attributes

Columns: Pathogen-Specific Field ID | Field Name | Data Category | OBO Foundry URL | BioSample Synonym | MIxS Synonym

CE1. Umbrella project ID(s)
Data Category: Investigation
OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001628

CE2. Intended Sequence Repository(s)
Data Category: Investigation
OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0000068
MIxS Synonym: submitted_to_insdc

CE3. Submitter Name
Data Category: Investigation

CE4. Subspecies/Subtype
Data Category: Pathogen Characteristic

CE5. Common name
Data Category: Pathogen Characteristic

CE6. Individuals (number of males and females)
Data Category: Specimen Isolation

CE7. Isolation, sampling or growth conditions (xenic/axenic culture, abscess aspirates, cysts)
Data Category: Specimen Isolation

CE8. Co-isolated organisms (in case this is a mixed culture)
Data Category: Specimen Isolation

CE9. Developmental Growth Stage (e.g., sporozoite, male/female, or a mixture of stages)

CE10. Date of sample collection for shipment to genomic sequencing center

CE11. Host Additional Classification genotype
Data Category: Host Classification

CE12. Host Additional Classification Strain
Data Category: Host Classification

CE13. Host Additional Classification subtype
Data Category: Host Classification
OBO Foundry URL: http://purl.obolibrary.org/obo/OBI_0001305

CE14. Development stage

CE15. Ploidy (e.g., haploid, diploid, allopolyploid, polyploid, or 1N, 2N, 3N, etc.)
Data Category: Pathogen Characteristic
OBO Foundry URL: http://purl.obolibrary.org/obo/PATO_0001374
MIxS Synonym: ploidy

CE16. Number of replicons (chromosomes or segments)
Data Category: Pathogen Characteristic
MIxS Synonym: num_replicons

CE17. Genome size estimate
Data Category: Pathogen Characteristic
MIxS Synonym: estimated_size

CE18. Nucleic Acid Extraction Date
Data Category: Sequencing Sample Preparation

CE19. Extrachromosomal elements
Data Category: Pathogen Characteristic
MIxS Synonym: extrachrom_elements

CE20. Quantification (host/parasite; concentration and volume provided)

CE21. Relevant Standard Operating Procedures (SOPs)
MIxS Synonym: sop

CE22. Assembled genome size
Data Category: Data Transformation

CE23. Relevant electronic resources
MIxS Synonym: url

CE24. Number of assembled contigs/scaffolds
Data Category: Data Transformation
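Every OBO Foundry URL in these tables follows the same PURL pattern: http://purl.obolibrary.org/obo/ followed by the ontology prefix, an underscore and the local identifier (for example, PATO_0001374 for ploidy in CE15). A minimal sketch for converting between that form and the compact PREFIX:ID notation, assuming purely numeric local identifiers as in these tables; the function names are illustrative only.

```python
import re

OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"

# PREFIX:LOCALID or PREFIX_LOCALID with a numeric local part (assumption for these tables).
_COMPACT_ID = re.compile(r"^(?P<prefix>[A-Za-z][A-Za-z0-9]*)[:_](?P<local>[0-9]+)$")


def curie_to_purl(identifier: str) -> str:
    """Expand 'OBI:0001901' (or 'OBI_0001901') to its OBO Foundry PURL."""
    match = _COMPACT_ID.match(identifier.strip())
    if match is None:
        raise ValueError(f"not an OBO-style identifier: {identifier!r}")
    return f"{OBO_PURL_BASE}{match.group('prefix')}_{match.group('local')}"


def purl_to_curie(purl: str) -> str:
    """Compress an OBO Foundry PURL back to PREFIX:LOCALID form."""
    if not purl.startswith(OBO_PURL_BASE):
        raise ValueError(f"not an OBO Foundry PURL: {purl!r}")
    prefix, sep, local = purl[len(OBO_PURL_BASE):].partition("_")
    if not prefix or not sep or not local.isdigit():
        raise ValueError(f"unexpected OBO PURL layout: {purl!r}")
    return f"{prefix}:{local}"


print(curie_to_purl("PATO:0001374"))  # http://purl.obolibrary.org/obo/PATO_0001374
print(purl_to_curie("http://purl.obolibrary.org/obo/OBI_0001901"))  # OBI:0001901
```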
Table S4. Project-Specific Attributes

Columns: Project-Specific Field ID | Field Name | NCBI Component Name | NCBI Component Synonym | NCBI Component Definition

PS1. Isolate
NCBI Component Name: BioSample
NCBI Component Synonym: isolate

PS2. Host Disease Stage
NCBI Component Name: BioSample
NCBI Component Synonym: host_disease_stage
NCBI Component Definition: Stage of disease at the time of sampling.

PS3. Host Disease Outcome
NCBI Component Name: BioSample
NCBI Component Synonym: host_disease_outcome
NCBI Component Definition: Final outcome of disease, e.g., death, chronic disease, recovery.

PS4. Host Description
NCBI Component Name: BioSample
NCBI Component Synonym: host_description
NCBI Component Definition: Additional information not included in other defined vocabulary fields.

PS5. Specimen Voucher
NCBI Component Name: BioSample
NCBI Component Synonym: specimen_voucher
NCBI Component Definition: Formal identifier of the type specimen of the source organism, usually stored in an institute collection: the name of the institution and its internal collection and/or sample codes.

PS6. Genotype
NCBI Component Name: BioSample
NCBI Component Synonym: genotype
NCBI Component Definition: Observed genotype.

PS7. Serotype
NCBI Component Name: BioSample
NCBI Component Synonym: serotype
NCBI Component Definition: Taxonomy below subspecies; a variety (in bacteria, fungi or virus) usually based on its antigenic properties. Same as serovar and serogroup, e.g., serotype="H1N1" in Influenza A virus CY098518.

PS8. Serovar
NCBI Component Name: BioSample
NCBI Component Synonym: serovar
NCBI Component Definition: Taxonomy below subspecies; a variety (in bacteria, fungi or virus) usually based on its antigenic properties. Same as serotype and serogroup. Sometimes used as a species identifier in bacteria with shaky taxonomy, e.g., Leptospira, serovar saopaolo S76607 (65357 in Entrez).

PS9. Pathotype
NCBI Component Name: BioSample
NCBI Component Synonym: pathotype
NCBI Component Definition: Some bacteria-specific pathotypes (e.g., Escherichia coli: STEC, UPEC).

PS10. Passage History
NCBI Component Name: BioSample
NCBI Component Synonym: passage_history
NCBI Component Definition: Number of passages and passage method.

PS11. Lab Host
NCBI Component Name: BioSample
NCBI Component Synonym: lab_host
NCBI Component Definition: Host on which a parasite is maintained in the lab, which may differ from the natural host (e.g., hamster cells used to support a parasite normally found in mice in the wild). Also used to list the bacterial strain in which a plasmid construct library is maintained in the lab (e.g., ID:167152 labhost DH10B T1-resistant).

PS12. Subgroup
NCBI Component Name: BioSample
NCBI Component Synonym: subgroup
NCBI Component Definition: Taxonomy below subspecies; sometimes used in viruses to denote subgroups taken from a single isolate.

PS13. Subtype
NCBI Component Name: BioSample
NCBI Component Synonym: subtype
NCBI Component Definition: Used as a classifier in viruses (e.g., HIV type 1, Group M, Subtype A).
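Each project-specific field in Table S4 is carried by the BioSample attribute named in its synonym column. A minimal sketch of that renaming step, assuming project metadata arrive as a dictionary keyed by the Table S4 field names; the function name is hypothetical, and this is only a local transformation, not an NCBI submission API.

```python
from __future__ import annotations

# Field Name -> BioSample attribute name, transcribed from Table S4 (PS1-PS13).
PROJECT_TO_BIOSAMPLE = {
    "Isolate": "isolate",
    "Host Disease Stage": "host_disease_stage",
    "Host Disease Outcome": "host_disease_outcome",
    "Host Description": "host_description",
    "Specimen Voucher": "specimen_voucher",
    "Genotype": "genotype",
    "Serotype": "serotype",
    "Serovar": "serovar",
    "Pathotype": "pathotype",
    "Passage History": "passage_history",
    "Lab Host": "lab_host",
    "Subgroup": "subgroup",
    "Subtype": "subtype",
}


def to_biosample_attributes(record: dict[str, str]) -> dict[str, str]:
    """Rename project-specific fields to BioSample attribute names.

    Blank values are dropped; unknown field names raise KeyError so that
    typos are caught instead of being silently ignored.
    """
    attributes: dict[str, str] = {}
    for field_name, value in record.items():
        if field_name not in PROJECT_TO_BIOSAMPLE:
            raise KeyError(f"no BioSample synonym for field {field_name!r}")
        if value.strip():
            attributes[PROJECT_TO_BIOSAMPLE[field_name]] = value.strip()
    return attributes


print(to_biosample_attributes(
    {"Lab Host": "DH10B T1-resistant", "Passage History": "", "Pathotype": "STEC"}
))
# {'lab_host': 'DH10B T1-resistant', 'pathotype': 'STEC'}
```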