version June 2008 - ISA tools

advertisement
ISA-TAB specification v0.3 http://isatab.sourceforge.net
Specification documentation for the ISA-TAB v0.3
Status
ISA-TAB v0.3 specification document working draft: version 12 June 2008.
Authors
Initially drafted by one group (1), this document and the work described herein now incorporate input from many
contributors, listed at the ISA-TAB project page (2) under ‘Contributors’.
Documentation and contacts
The current document and example files are available from the ISA-TAB project page (2). A mailing list has been
made available to facilitate discussion. Comments and suggestions on this document should be addressed to
Philippe Rocca-Serra (rocca@ebi.ac.uk), Susanna-Assunta Sansone (sansone@ebi.ac.uk) and the mailing list.
Abstract
This document describes ISA-TAB, a proposal addressing the pressing need of a group of collaborating
repositories for a common structured representation with which communicate the metadata required to interpret
an experiment and the associated data files. Sections from 1 to 4 introduce the ISA-TAB proposal, the rationale
behind its development, an overview of the structure and the relation to other formats; while section 5 describe
the specification in details. ISA-TAB builds on the existing paradigm that is MAGE-TAB - a tab-delimited
format to exchange microarray data. ISA-TAB v0.3 is largely the result of aligning MAGE-TAB v1.1 and the
previous ISA-TAB (v0.2) format. ISA-TAB v0.3 necessarily maintains backward compatibility with existing
MAGE-TAB files to facilitate adoption; it preserves the simplicity of the MAGE-TAB for simple experimental
designs, while incorporating new features to capture the full complexity of experiments employing a combination
of technologies, where this is necessary. Like MAGE-TAB before it, ISA-TAB is simply a format; the decision
on how to regulate its use (i.e. enforcing completion of mandatory fields or use of a controlled terminology) is left
to individual implementers.
N.B. Before reading this document, an in-depth knowledge of the MAGE-TAB specification is required (3).
1 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
Table of Contents
1. ISA-TAB introduction .........................................................................................................................................3
1.1 Rationale ........................................................................................................................................................3
1.3 Definitions .......................................................................................................................................................3
2. ISA-TAB v0.3 structure overview ......................................................................................................................4
2.1 Investigation file ..............................................................................................................................................4
2.2 Study file .........................................................................................................................................................4
2.3 Assay file ........................................................................................................................................................4
3. Relating ISA-TAB to other formats and requirements ....................................................................................5
3.1 Commonalities and difference with MAGE-TAB v1.1 ....................................................................................5
3.2 Relation to FuGE: rendering FuGE in ISA-TAB via XSL transformations ......................................................5
3.3 Relation to biomedical tabular formats ...........................................................................................................5
3.4 Minimal content and terminology ...................................................................................................................5
4. ISA-TAB development process .........................................................................................................................6
4.1 Communities and workshops .........................................................................................................................6
5. ISA-TABv0.3 detailed structure .........................................................................................................................6
5.1 Investigation file ..............................................................................................................................................6
5.1.1 Ontology source section ..............................................................................................................................9
5.1.2 Investigation section ....................................................................................................................................9
5.1.3 Study section ...............................................................................................................................................9
5.1.4 Study Factor section................................................................................................................................. 10
5.1.5 Study Assay section ................................................................................................................................. 10
5.1.6 Protocol section ........................................................................................................................................ 10
5.1.7 Study Contact section .............................................................................................................................. 11
5.2 Study file ...................................................................................................................................................... 12
5.3 Assay file ..................................................................................................................................................... 13
5.3.1 Technology Type: DNA microarray hybridization ..................................................................................... 13
5.3.2 Technology Type: Gel electrophoresis ..................................................................................................... 14
5.3.3 Technology Type: Mass Spectrometry (MS) ............................................................................................ 16
5.3.4 Technology Type: Nuclear Magnetic Resonance spectroscopy (NMR) .................................................. 17
5.3.5 Technology Type: High throughput sequencing ...................................................................................... 17
6. References ....................................................................................................................................................... 19
Figure 1 - ISAarchive ........................................................................................................................................... 20
Table 2 – Type of value in the Investigation File .............................................................................................. 21
Table 3 – Number of value in the Investigation File ......................................................................................... 23
Table 4- Node headers in the Study and Assay files ....................................................................................... 25
Table 5 – Attribute headers in the Study and Assay files ............................................................................... 26
2 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
1. ISA-TAB introduction
1.1 Rationale
The Investigation / Study / Assay (ISA) tab-delimited (TAB) format is a general purpose framework with which to
communicate both metadata (i.e., contact details, sample characteristics, technologies used, etc.) and data from
(meta)genomics-, transcriptomics-, proteomics- and metabol/nomics-based experiments (hereafter referred as
‘omics-based’ experiments). The general motivation for this work stems from the needs of:
 the BioInvestigation Index project at EBI (4) to create a common structured representation of the
metadata that is required to interpret an experiment with which to facilitate combined submission to
ArrayExpress (5), Pride (6), and in the near future, a metabolomics repository;
 a group of collaborative systems (7,8) either committed to pipelining omics-based experimental data into
EBI public repositories or willing to exchange data among themselves, or to enable their users to import
data from public repositories into their local systems.
ISA-TAB could be viewed as an extended version of the MicroArray Gene Expression (MAGE) TAB, a format
that supports the management, exchange and submission of microarray-based experiment data and metadata
(9). MAGE-TAB was designed for use by laboratories with little or no bioinformatics support, who cannot
therefore deal with the complexity of the MAGE Markup Language (ML) format (10). ISA-TAB builds on the
MAGE-TAB paradigm, and shares its motivation for the use of tab-delimited text files; i.e., that they can easily be
created, viewed and edited by researchers, using spreadsheet software such as Microsoft Excel. ISA-TAB also
employs MAGE-TAB syntax as far as possible, to ensure backward compatibility with existing MAGE-TAB files,
a described further in section 3.1. However, ISA-TAB has a number of additional features that make it a more
general framework that can comfortably accommodate multi-domain experimental designs. It was also important
to align the concepts in ISA-TAB with some of the objects in the Functional Genomics Experiment (FuGE) model
(11, 12), partly as the FuGE model is integral to the development of MAGE-ML v2. These additional features are
described further in section 3.2.
In particular, the rationale behind ISA-TAB’s extended structure is built on the need to capture the complexity of
an experiment employing a combination of technologies. For example, experiments looking at changes both in
(i) the metabolite profile of urine, and (ii) gene expression in the liver, in subjects treated with a compound
inducing liver damage (by mass spectrometry and microarray technologies, respectively).
1.3 Definitions
Investigation, Study and Assay are the three key entities (9) around which the ISA-TAB framework is built. They
assist in structuring and classifying information relevant to the subject under study and the different technologies
employed. Note that ‘subject’ as used above could to refer inter alia to an organism, or tissue, or an
environmental sample. Study is the central unit, containing information on the subject under study, its
characteristics and any treatments applied.
A Study has associated Assays. These are tests performed either on material taken from the subject, (hereafter
referred to as a ‘specimen’) or on the whole initial subject, which produce qualitative or quantitative
measurements (data). Assays can be characterized as the smallest complete unit of experimentation; i.e. one
hybridization equals one assay; each technical replicate represents an additional assay; one LC-MS run equals
one assay; a single clinical chemistry assay is (of course) one assay; a multiplexed (^n) microarray equals n
assays; and a MALDI MS chip with n spots could perform up to n assays (i.e. all spots analyzed). Investigation is
a higher-order object that helps to group related Studies.
It should be noted that the word ‘experiment’ has been deliberately avoided. A comparison of ArrayExpress and
Pride revealed that ‘experiment’ is used to refer to objects at different levels of granularity in each; i.e. to refer to
a set of multiple related hybridizations in ArrayExpress, but only a single gel-based separation run in Pride.
Following the abstractions proposed here, an experiment in ArrayExpress would be equivalent to a Study.
The choice of Study as the central unit of the ISA-TAB proposal is supported by its use in existing biomedical
formats, such as the Study Data Tabulation Model (SDTM), which encompasses both the Standard for
Exchange of Nonclinical Data (SEND, 13) and the Clinical Data Interchange Standards Consortium (CDISC, 14).
SDTM has been endorsed by the US Food and Drug Administration (FDA) as the preferred way to organize,
structure and format both clinical and nonclinical (toxicological) data submissions (15, 16). See also section 3.3.
3 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
2. ISA-TAB v0.3 structure overview
The ISA-TAB uses three types of files to capture the experimental metadata:
 Investigation file
 Study file
 Assay file (with associated data files)
The Investigation file contains all the information needed to interpret the experiment, while the experimental
steps (or sequence of events) are described in the Study and in the Assay file(s). For each Investigation file
there will be one or more Study files; for each Study file there will be one or more Assay files. For submission or
transfer, files can be packaged into an ISArchive as shown in Figure 1. Each file has a predefined structure,
with fields being organized on a per-column or per-row basis; each file is described briefly in the subsections
below and more fully in the section 5.
2.1 Investigation file
In this file the fields are organized on a per-column basis and divided in sections. The Investigation file is
intended to meet three needs: (1) to record all declarative information referenced in the other files; (2) to relate
Assay files to Study files; and optionally, (3) to relate Study file to an Investigation (this only becomes necessary
when two or more Study files need to be grouped). The declarative sections cover general information such as
contacts, protocols and equipment, and also –where applicable- the description of terminologies (controlled
vocabularies or ontologies) and other annotation resources that were used. The optional Investigation section is
a flexible solution to group two or more Study files, as required by several use cases. In the toxicogenomics
domain, for example, acute toxicity studies are followed by long term toxicity studies and in vitro toxicity studies.
For clarity, the users would link these Study files by filling the Investigation section. Another example comes from
the environmental genomics domain, where several studies carried out in the same area can be usefully related
under the same Investigation. See also section 5.1.
2.2 Study file
In this file the fields are organized on a per-row basis. The Study file contains contextualizing information for one
or more assays, for example; the subjects studied; their source(s); the sampling methodology; their
characteristics; and any treatments or manipulations performed to prepare the specimens. See also section 5.2.
2.3 Assay file
In this file the fields are again organized on a per-row basis. The Assay file represents a branch of the
experimental graph, containing information about a certain set (or type) of assays, defined by the endpoint
measured (i.e. gene expression) and the technology employed (i.e. DNA microarray). Assay-related information
includes protocols, additional information in part relating to the execution of those protocols and references to
data files (whether raw, processed or normalized). Note that in the case of microarray assays the Array
Description Format (ADF) and Final Gene Expression Data Matrix (FGDM) are also referenced. See also section
5.3.
For example, in a study looking at the effect of a compound inducing liver damage in subjects by characterizing
the metabolic profile of urine (by NMR spectroscopy) and measuring protein and gene expression in the liver (by
mass spectrometry and DNA microarrays, respectively):
 The Study file will contain information on the subjects studied, their source(s) and characteristics, the
description of the treatment with the hepatotoxicant and the steps undertaken to take the specimens
(urine and liver) from the treated subjects for the assays.
 The Assay file for the urine metabolic profile (endpoint) by NMR spectroscopy (technology) will contain
the (stepwise) description of the methods by which the urine was processed for the assay, subsequent
steps and protocols, and the link to the resultant data files.
 The Assay file for the gene expression measurements by DNA microarray, and similarly the Assay file
for protein expression by mass spectrometry, will contain the (stepwise) description of how the RNA or
protein extract were prepared from the liver (or a section) and prepared for assaying, subsequent steps
and protocols, and the link to the resultant data files.
4 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
3. Relating ISA-TAB to other formats and requirements
3.1 Commonalities and difference with MAGE-TAB v1.1
The ISA-TAB v0.3 proposal results from the alignment between MAGE-TAB v1.1 and ISA-TAB v0.1. This
alignment process aimed to fulfill the need for a more general framework, that nonetheless maintained backward
compatibility with existing MAGE-TAB files, ultimately to facilitate the adoption of one common format. ISA-TAB
v0.2 contains all the fields in MAGE-TAB, plus others required by the Pride repository, which supports the
Proteomics Standards Initiative (PSI, 17).
ISA-TAB introduces additional structure to accommodate complex, multi-domain experiments, and to place the
concepts in line with the FuGE objects. While no changes have been made to the ADF and the FGDM files, the
changes affect the MAGE-TAB Sample and Data Relationship Format (SDRF) and Investigation Description
Format (IDF) are as follows:
 The content of MAGE-TAB SDRF has been divided between the ISA-TAB Study and Assay files and
their relationship is declared in the ISA-TAB Investigation file. The content of the Study file corresponds
to the SDRF fields (on the left) describing the subjects studied, source(s), characteristics etc, while the
content of the Assay file can be mapped to the SDRF fields (on the right) containing assay-related
information and the link to the resultant data files. Although, the SDRF can be split into two or more files,
MAGE-TAB does not require a particular division and does not identify their content or regulate their
order during the parsing process.
 The MAGE-TAB IDF corresponds largely to the ISA-TAB Investigation file. The latter reuses many fields
from the IDF but in a slightly different order and divided into sections that can be repeated accordingly;
the intent being to enhance readability and presentation, and possibly to help with parsing.
 In the MAGE-TAB IDF, Investigation is used as a synonym for ‘experiment’ (sensu MIAME, 18) to link
related hybridizations. In the ISA-TAB Investigation file, the homonymous section is used sensu FuGE,
to relate (list and order) related Study files and will also serve to facilitate the alignment with FuGE
investigation object.
For more details refer to the section 5.
3.2 Relation to FuGE: rendering FuGE in ISA-TAB via XSL transformations
The ISA-TAB format could be seen as competing with XML-based formats, whether existing or under
development, such as the FuGE-ML. However, ISA-TAB addresses the immediate need for a framework to
communicate for multi-omics experiments, whereas all existing FuGE-based modules are still under
development. When these become available, ISA-TAB could continue serving those with little or no
bioinformatics support, as well as finding utility as a user-friendly presentation layer for XML-based formats (via
an XSL transformation); i.e. in the manner of the HTML rendering of MAGE-ML documents (19).
Initial work has been carried out to evaluate the feasibility of rendering FuGE-ML files (and FuGE-based
extensions, such as GelML and Flow-ML) in ISA-TAB layout. Examples are available at the ISA-TAB website
under the document section (2), along with a report, detailing the issues faced during these transformations.
When finalized, the xsl templates will also be released, along a Xpath expressions and a table ‘mapping’ FuGE
objects and ISA-TAB labels.
3.3 Relation to biomedical tabular formats
Where omics-based technologies are used in clinical or nonclinical studies, ISA-TAB will complement existing
biomedical formats such as the SDTM. It is inevitable that some information will be duplicated between the two
frameworks, but this is not generally seen as a major problem. A reference system, probably based on the
subject source, will be needed to link the two files.
3.4 Minimal content and terminology
As with MAGE-TAB, ISA-TAB is no more than the format with which to communicate information. Both ‘minimum
information’ requirements and the use of controlled vocabularies/ontologies are beyond the scope of this
proposal, and are the focus of related efforts. ‘Minimal information’ checklists are under development both by
individual communities for their particular domains of interest and collaboratively through the Minimal Information
for Biological and Biomedical Investigations (MIBBI, 20) project. Efforts proceeding under the Open Biomedical
Ontology (OBO, 21) Foundry umbrella will provide both 'universal' terms to describe studies and assays, via the
5 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
Ontology for Biomedical Investigations (OBI, 22), and ‘domain-specific’ terms relevant to a particular domain (i.e.
chemical, biological, clinical).
Therefore, the decision on how to regulate the use of the ISA-TAB (marking certain fields mandatory, or
enforcing the use of controlled terminology) is a matter for those who will implement the format in their system.
Although certain fields would benefit from the use of controlled terminology, ISA-TAB files with all fields left
empty are syntactically valid, as are those where all fields are filled with free text values rather than controlled
vocabulary or ontology terms.
4. ISA-TAB development process
4.1 Communities and workshops
Created to fulfill the need of the Bio Investigation Index project, the first straw-man proposal, ISA-TAB v0.1, was
presented and discussed in a workshop held at the European Bioinformatics Institute (EBI), Cambridge, UK, on
6-8 December, 2007, funded by the UK Biotechnology and Biological Sciences Research Council (BBSRC
BB/E025080/1). The goal of this first workshop was the creation of an ‘exchange network test bed’, including
heterogeneous nodes, such as public and proprietary repositories, public and commercial software, by
academic, industrial and governmental groups. Participants included a group of collaborative systems interested
in leveraging on common standards to report omics-based experiments, also with experience of building
standards and actively involved or leading relevant efforts. The workshop report (23) is also available from the
ISA-TAB website (2). Overall, this workshop produced a general consensus on the importance of the ISA-TAB to
address the immediate need for a communication framework for multi-omics experiments. A series of technical
challenges were also identified - along with the need to align the ISA-TAB v0.1 with MAGE-TAB- the majority of
which have been already resolved in the current ISA-TAB v0.3.
Next workshop is planned for 16-17-18 June, 2008 at EBI; in this occasion the current version will be revised – if
necessary- in the light of a series of real case examples that are being produced by the participating
communities. The agenda will include also a discussion on and examples of implementations and tools that
need to be developed to assist the users in the creation of ISA-TAB files.
5. ISA-TABv0.3 detailed structure
As stated in the introduction, a number of design decisions are carried over from MAGE-TAB work. In particular:
 Field separator
Tabulation is the official separator accepter in ISA-TAB and MAGE-TAB files.
 Node Names used as identifier
All strings found in Node fields (identified by a header with a ‘Name’ tag) will be regarded as object
identifiers. Therefore, they should locally unique within an ISA-TAB submission. As long as this
requirement is met, there are specific requirements about formatting. However, we encourage following
the guidelines detailed in the MAGE-TAB specifications in section 3.1.3.
 Referencing
As described in MAGE-TAB specifications in section 3.1.4, Objects identified in the Investigation File can
be referenced in either the Study or Assay files. A REF tag is used to indicated that fact, such Protocol
REF and Term Source REF. Parameters and Factors declared in the IDF are referenced in a different
way as their values will be specified during data entry relying to [] mechanism instead.
Each file is described in details in the sections below. As guidance, Table 2 shows the type of values allowed
per each field or header and comments some implementers may find useful.
5.1 Investigation file
In this file the fields are organized on a per-column basis and divided in sections. Table 3 shows the number of
value allowed per each field, although, as stated in section 3.4, ISA-TAB files with all fields left empty are
syntactically valid. The example shows how to record one Study file and two Assay files:
Ontology Source Reference
Term Source Name
Term Source File
CTO
MO
http://obo.sourceforge.net/cgibin/detail.cgi?cell
http://mged.sourceforge.net/ontol
ogies/MGEDontology.php
6 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
Term Source Version
Term Source Description
1.3.0.1
The Cell Type Ontology
The Microarray Ontology
Investigation
Investigation Title
Investigation description
Date of Investigation submission
Investigation Public Release Date
(#start of repeatable block)
Studies
Study Identifier
Study Title
STD1
The Influence of
Pharmacogenetics on Fatty Liver
Disease in the Wistar and Kyoto
Rats: A Combined Transcriptomic
and Metabonomic Study
Date of Study Submission
DD/MM/YYYY
Study Public Release Date
PubMed ID
Publication DOI
13/12/2006
17203948
10.1021/pr0601640
Publication Author list
Publication Title
Publication Status
Griffin JL, Scott J, Nicholson JK.
The influence of pharmacogenetics on fatty liver disease in the wistar
and kyoto rats: a combined transcriptomic and metabonomic study.
Study Description
indexed for MEDLINE
Analysis of liver tissue from rats
exposed to orotic acid for 1, 3,
and 14 days was performed by
DNA microarrays and high
resolution 1H NMR spectroscopy
based metabonomics of both
tissue extracts and intact tissue
(n ) 3).
Study Design
time course design
Study Design Term Accession
OBI_11215
Study Design Term Source REF
Study File Name
OBI
study-1.txt
Study Factors
Factor Name
Time
Treatment
Factor Type
Time
Stressor
Factor Type Term Accession
OBI_XXXx
OBI_XXXx
Factor Type Term Source REF
OBI
OBI
Measurements/Endpoints Name
Gene Expression
Metabolite Characterization
Measurements/Endpoints Term Accession
OBI_XXXx
OBI_XXXx
Measurements/Endpoints Term Source REF
OBI
OBI
DNA microarray
1D 1H NMR spectroscopy
OBI_XXXx
OBI_XXXx
OBI
OBI
Assay-Tx.txt
Assay-Mx.txt
standard procedure 1
griffin procedure 2
Study Assays
Technology Type
Technology Type Term Accession
Technology Type Term Source REF
Assay File Name
Study Protocols
Protocol Name
7 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
Protocol Type
animal procedure
nucleic acid extraction
Protocol Type Term Source REF
OBI
OBI
Protocol Type Term Accession
OBI_XXXx
All animal procedures conformed
to Home Office, UK, guidelines
for animal welfare. Male Wistar
rats (n ) 3 for each time point;
control animals fed control diet
for the same time period; Charles
River UK Ltd.) were fed either
standard laboratory chow, or
chow supplemented with 1%
orotic acid (Sigma Aldrich, UK)
ad libitum.5-6 Rats were killed by
cervical dislocation at days 0, 1, 3
and 14, and the left lateral lobe of
the liver excised. Tissues were
snap frozen and stored at -80 °C.
OBI_XXXx
diet;population density
extracted product; amplification
Person Last Name
Griffin
Fleckenstein
Person First Name
Julian
Scott
Protocol Description
Total RNA was extracted by RNA
Isolation Kit (Stratagene) from
the livers of Wistar rats at day 0
(n ) 3), day 1 and 3 (n ) 2), and
day 14 (n ) 3).
Protocol Contact
Protocol Parameter
Protocol Parameter Type Term Accession
Protocol Parameter Type Term Source REF
Protocol Instrument Name
Protocol Instrument Components
Protocol Instrument Components Term Accession
Protocol Instrument Components Term Source REF
Protocol Instrument Parameter
Protocol Instrument Parameter Term Accession
Protocol Instrument Parameter Term Source REF
Protocol Software
Protocol Software Name
Protocol Software Version
Processing Method Parameter
Processing Method Parameter Term Accession
Processing Method Parameter Term Source REF
Study Contacts
Person Mid Initial
Person Email
jlg40@mole.bio.cam.ac.uk
Person Phone
44*(0)*1223766002
Person Fax
Person Address
University of Cambridge
Imperial College London
Person Affiliation
Department of Molecular Biology
Genetics and Genomics
Research Institute
Person Role
submitter;investigator
investigator
Person Role Term Accession
OBI_XXXx
OBI
OBI_XXXx
OBI
Person Role Term Source REF
(#end of repeatable block)
The following paragraph details the organization of the individual sections.
8 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
5.1.1 Ontology source section
This annotation section is common to the MAGE-TAB format. Table 2 shows the type of values allowed per each
field or header including those that would benefit from the use of a controlled terminology –. However, this
decision is left to individual implementers and, as stated in section 3.4, ISA-TAB files with all fields filled with free
text values rather than controlled vocabulary or ontology terms are valid.
Term Source Name
The names of the Term Sources (ontologies or databases) used within the ISA-TAB document. This
name will be used in all corresponding "Term Source REF" fields. Examples: OBI, GO, DO. Used as an
identifier within the ISA-TAB document.
Term Source Namespace Location
A file name or valid pointer to an official resource to allow cross validation and version tracking of terms
used in a submission.
Term Source Version
The version of the Term Source used throughout the ISA-TAB document.
5.1.2 Investigation section
In this section the fields are organized on a per-column basis. The optional Investigation section is a flexible
solution to group two or more Study files, where required. The name of the Study file, however, should be used
even when one study is performed.
Investigation Title
A concise name given to the investigation
Investigation Description
A textual description of the investigation
Date of Investigation Submission
To provide the date on which the investigation was reported to the repository.
Investigation Public Release Date
To provide the date on which the investigation should be released publicly.
5.1.3 Study section
[#start of repeatable block]
This section can be repeated according to the number of Study files. This repeatable blocks are arranged
vertically; the intent being to enhance readability and presentation, and possibly to help with parsing.
Study Identifier
A unique identifier: temporary identifier supplied by users or generated by repository / database.
It could be (but no necessarily) an identifier complying with LSID specification.
Study Title
A concise phrase used to encapsulate the purpose and goal of the study.
Study Public Release Date
To provide the date on which the study should be released publicly.
PubMed ID
The PubMed IDs of the publication(s) associated with this investigation (where available).
Publication DOI
A Digital Object Identifier (DOI) for each publication (where available).
Publication Author List
The list of authors associated with each publication.
Publication Title
The title of publication associated to the investigation.
Publication Status
A term describing the status of each publication (i.e. "submitted", "in preparation", "published"). Terms
for this field should come from OBI.
Publication Status Term Accession
The accession number from the Source associated to the selected term.
Publication Status Term Source REF
Source REF have to match one the Term Source Names declared in the in the annotation section.
Study Description
A textual description of the study, including section such as objective or goals.
Study Design
9 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
A term allowing the classification of the study.
Study Design Term Accession
The accession number from the Source associated to the selected term.
Study Design Term Source REF
Study Design Term Source REF has to match one the Term Source Names declared in the annotation
section.
Study File Name
A field to indicate the Study file, to ensure validation of the ISAarchive. For each study, provide a Study
file formatted according to the specifications detailed in section 5.2. Multiple files may be reported using
a semi-colon (;) within cell between file names. In a valid ISArchive, this field should always have at least
one value. In that case, it simply indicates that there is only one study and the other fields for
investigation would be left empty
5.1.4 Study Factor section
Factor Name
The name of the Factors used in the Study and/or Assays files. Factors should correspond to the
independent variable manipulated by experimentalists and intended to affect biological systems in such
a way that an assay is devised to measure the responses of the biological system to the perturbation by
following a response variable (also known as dependent variable).
Factor Type
A term allowing the classification of Factor in categories.
Factor Type Term Accession
The accession number from the Source associated to the selected term.
Factor Type Term Source REF
Source REF have to match one the Term Source Names declared in the annotation section.
5.1.5 Study Assay section
It has been devised as a way to declare the various types of assays and to determine also the number of Assay
file reported.
Measurements/Endpoint Name
Terms to qualify the endpoint, what is being measured, i.e. are gene expression profiling, protein
identification.
Measurements/Endpoint Term Accession
The accession number from the Source associated to the selected term.
Measurements/Endpoint Term Source REF
Source REF have to match one the Term Source Names declared in the annotation section.
Technology Type Term Accession
The accession number from the Source associated to the selected term.
Technology Type
Term to identify the technology used to perform the measurement, i.e. DNA microarray, mass
spectrometry.
Technology Type Term Source REF
Source REF have to match one the Term Source Names declared in the annotation section.
Assay File Name
A field to list the Assay file(s), to ensure validation of the ISA archive. For each assay, provide an Assay
file formatted according to the specifications detailed in section 5.3. Multiple files may be reported using
a semi-colon (;) within cell between file names.
5.1.6 Protocol section
Columns are used to report information on protocol: one protocol requires 1 column to be reported
Protocol Name
The names of the protocols used within the ISA-TAB document. These will be referenced in the Study
and Assay files in the "Protocol REF" columns. Used as an identifier within the ISA-TAB document,
these can also be accession values. In such case decisions about how to deal with the protocol
information are left to data curators and individual implementers. For instance, an importer tool could be
designed so that, in case an existing protocol is mentioned by means of a public accession, only those
fields which are non-empty in the ISA-TAB are updated in the target repository.
10 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
Protocol Type
Term to classify the protocol.
Protocol Type Term Accession
The accession number from the Source associated to the selected term.
Protocol Type Term Source REF
Source REF have to match one the Term Source Names declared in the annotation section.
Protocol Description
A free-text description of the protocol. This text is included in a single tab-delimited field. If you wish to
include tab or newline characters as part of this text, you must enclose the whole text within double
quotes (").
Protocol Contact
If used, the contact should be declared in the Contact section and referenced here.
Protocol Parameters
A semicolon-delimited list of parameter names; these names are used in the Study and Assay files (as
"Parameter Value [<parameter name>]" headings) to list the values used for each protocol parameter. If
more than one parameter was used for a given protocol, they should be separated with semicolons (";").
Used as an identifier within the ISA-TAB document.
Protocol Parameter Term Accession
The accession number from the Source associated to the selected term.
Protocol Parameter Term Source REF
Source REF have to match one the Term Source Names declared in the annotation section.
Protocol Instrument Name
The instrument used by the protocol.
Protocol Instrument Components
Terms to indicate key part of an instrument set up.
Protocol Instrument Components Term Accession
The accession number from the Source associated to the selected term.
Protocol Instrument Components Term Source REF
Source REF have to match one the Term Source Names declared in the annotation section.
Note: Protocol Instrument Components is a special case as it is thought as capable of holding semicolon
(";") separated values to allow reporter of several instrument components. The field should be used in
combination with keywords such as detector type, analyzer type, source type which would allow
sufficient information to be reported when describing a mass spectrometer (and meet so minimal
reporting standards as defined for instances by MIBBI). Refer to the example for illustration of this
mechanism
Protocol Instrument Parameter
To indicate an important parameter attached to the Instrument.
Protocol Instrument Parameter Term Accession
The accession number from the Source associated to the selected term.
Protocol Instrument Parameter Term Source REF
Source REF have to match one the Term Source Names declared in annotation section.
Protocol Software Name
The name of the software used in a protocol.
Protocol Software Version
The version of the software used in a protocol.
5.1.7 Study Contact section
Columns are used to report information on protocol: one protocol requires 1 column to be reported
Person Last Name
The last name of each person associated with the investigation or study.
Person First Name
The first name of each person associated with the investigation or study.
Person Mid Initials
The middle initials of each person associated with the investigation or study.
Person Email
The email address of each person associated with the investigation or study. Email could be used as
internal identifier to reference persons within the TABs. However, if this is effective and practical, there
might be privacy issues related to relying on email as person tracker.
11 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
Person Phone
The telephone number of each person associated with the investigation or study.
Person Fax
The Fax number of each person associated with the investigation or study.
Person Address
The street address of each person associated with the investigation or study.
Person Affiliation
The organization affiliation for each person associated with the investigation or study.
Person Roles
Term to classify the role(s) performed by each person. Multiple annotations or values attached to one
person may be provided by using a semicolon (";") a separator, for example:
"submitter;funder;sponsor”.
Person Roles Term Accession
The accession number from the Source associated to the selected term.
Person Roles Term Source REF
Source REF have to match one the Term Source Names declared in the annotation section.
[#end of repeatable block]
5.2 Study file
In this file the fields are organized on a per-row basis. The Study file contains contextualizing information for one
or more assays. Example of few field is shown below:
Source Name
Characteristics[Material
Type]
Term Source
REF
MO
Term
Accession
xxxx
Study1.animal1
whole_organism
Study1.animal2
whole_organism
{Term Name} Rattus norvegicus
MO
xxxx
Study1.animal3
whole_organism
{Term Name} Rattus norvegicus
MO
xxxx
{Term Name} Rattus norvegicus
{Term Name} Rattus norvegicus
Characteristics[Organism]
Study1.animal4
whole_organism
MO
xxxx
Study1.animal5
whole_organism
MO
xxxx
{Term Name} Rattus norvegicus
{Term Name} Rattus norvegicus
Study1.animal6
whole_organism
MO
xxxx
Study1.animal7
whole_organism
MO
xxxx
{Term Name} Rattus norvegicus
Study1.animal8
whole_organism
MO
xxxx
{Term Name} Rattus norvegicus
{Term Name} Rattus norvegicus
Study1.animal9
whole_organism
MO
xxxx
Study1.animal10
whole_organism
MO
xxxx
{Term Name} Rattus norvegicus
Study1.animal11
whole_organism
MO
xxxx
{Term Name} Rattus norvegicus
MO
xxxx
{Term Name} Rattus norvegicus
Study1.animal12
whole_organism
The following fields describe the column headers:
Source Name
Sources are considered as the starting biological material used in a study. Source items can be qualified
using the following headers: Characteristics [], Term Source REF, Unit, Provider, Description, and
Comment [].
Sample Name
Samples represent major outputs resulting from a protocol application but which cannot be treated as an
Extract or a Labeled Extract. Sample items can be qualified using the following headers: Characteristics
[], Term Accession, Term Source REF, Unit and Comment [].
Characteristics [<category term>]
Used as a qualifying field following Source Name, Sample Name. This column contains terms describing
each material according to the characteristics category indicated in the column header. For example, a
column header "Characteristics [OrganismPart]" would contain individual OrganismPart terms. If these
terms come from a controlled terminology/ontology, the accession number could be supplied using a
Term Accession header. The source would then be indicated using a Term Source REF column. If the
characteristic being reported is a measurement, a Unit [] column may be used to qualify further.
Protocol REF
12 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
This column contains references to Protocol Names defined in the Investigation File, or accession
numbers of protocols already present in public repositories, such ArrayExpress or Pride. Valid qualifying
headers for Protocol REF item are Parameter [Value], Performer, Date, and Comment [] (which is
optional)
Factor Value [<factor name>]
Factor Value column is key to provide the actual values taken by independent variables manipulated by
the operator. Those study variables also known as Factors should have been declared in the
Investigation File and the Factor Name should be recalled between the square brackets of the Factor
column header. In order to fully qualify the Factor Values, the following headers can be used to refine
annotation and description Unit, Unit Term Source REF in case of quantitative values and Term
Accession, Term Source REF in case of qualitative values.
5.3 Assay file
In this file, the fields are again organized on a per-row basis and define records. The Assay file represents a set
of assays, defined by the endpoint measured or overall purpose of the assay (e.g. metabolite profiling) and the
technology employed (i.e. Mass spectrometry).
The following subsections provide a list of fields focusing on microarray hybridization, gel electrophoresis, mass
spectrometry and high throughput sequencing technologies. However, additional work is required to complete
this section and to define the format of the resulting data matrices for the different technologies, like it has been
done for the microarray (FGDM in MAGE-TAB), peptide and protein identification (and used in PRIDE Harvest).
5.3.1 Technology Type: DNA microarray hybridization
When dealing with DNA microarray technology, the allowed fields essentially correspond to those defined by the
MAGE-TAB format.
Extract Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each
Extract material. Valid qualifying headers for Extract item are Characteristics[], Material Type,
Description, Comment[]
Labeled Extract Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each
Labeled Extract material
Valid qualifying header for LabeledExtract item is Label (which is mandatory) and Characteristics [],
Material Type, Description, Comment [] (which are optional)
Material Type
Used as an attribute column following Source Name, Sample Name, Extract Name, or Labeled Extract
Name. This column contains terms describing the type of each material. Examples: whole_organism,
organism_part, cell, total_RNA. Valid qualifying header for Material Type item are Term Accession and
Term REF.
Label
Used as an attribute column following Labeled Extract Name to indicate the compound linked to an
Extract to create the Labeled Extract. Examples: Cy3, Cy5, biotin, alexa_546.
Valid qualifying headers for Label item are Term REF.
Hybridization Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each
Hybridization. Valid qualifying headers for Hybridization Name item are ArrayDesign REF (which is
mandatory) and Comment [] (which is optional)
Array Design REF
This column contains references to the array design used for individual hybridizations. For ArrayExpress
submissions this should be a valid accession number, e.g. "A-AFFY-33" but for the purpose of data
exchange, it should be an unambiguous name such as a commercial name HG-U133A-2 in the case of
an Affymetrix array.
Valid qualifying header for Array Design REF item is Comment [] (which is optional).
The values in this field are used as identifiers. They must match the references provided in the array
description file (ADF). Submitters, curators and software tools should consider the use of public
accessions for this value.
Scan Name
13 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
Used as an identifier within the ISA-TAB document. This optional column contains user-defined names
for each Scan Event
Valid qualifying headers for Scan Name item are Description and Comment [] (which are optional)
Image File
This optional column contains a list of image files, one for each row of the Assay file, linking these image
files to their corresponding hybridizations. Note that ArrayExpress does not store image data due to
storage constraints on the database. However, in the context of an infrastructure intending to use the
format for data exchange purposes, this column is valuable to include links to image files stored on local
web server. Valid qualifying headers for Image File item is Comment [] (which is optional)
Normalization Name
Used as an identifier within the ISA-TAB document. This optional column contains user-defined names
for each Normalization event. Valid qualifying headers for Normalization Name item are Description and
Comment [] (which are optional)
Array Data File (*)
This column contains a list of raw data files, one for each row of the Assay file, linking these data files to
their respective hybridizations. Valid qualifying header for Array Data File item is Comment [] (which is
optional)
Derived Array Data File (*)
This column contains a list of processed data files, one for each row of the SDRF file, linking these data
files to their respective hybridizations. Valid qualifying header for Derived Array Data File item is
Comment [] (which is optional)
Array Data Matrix File (*)
This column contains a list of raw data matrix files, where data from multiple hybridizations is stored in a
single file, and the data mapped to individual hybridization via the Data Matrix format itself. Valid
qualifying header for Array Data Matrix File item is Comment [] (which is optional)
Derived Array Data Matrix File (*)
This column contains a list of processed data matrix files, where data from multiple hybridizations is
stored in a single file, and the data mapped to each hybridization (or scan, or normalization) via the Data
Matrix format itself.
Valid qualifying header for Derived Array Data Matrix File item is Comment [] (which is optional)
Factor Value [<factor name>]
Factor Value column is key to provide the actual values of the independent variables when manipulated
by the experimentalists. Those study variables aka as Factors should have been declared in the
Reference File and the Factor Name should be recalled between the square brackets of the column
header. In order to fully qualify the Factor Values, the following headers can be used to refine annotation
and description: Term Source REF, Unit, Unit Term Source REF. The latter two fields enable full
description of numerical values. For usability purposes, it could be possible to cascade FactorValues
declared in the Study file at the level of the Assay file for facilitate association between datafiles and
factor values.
Examples:
In ISA_TAB-V2_Griffin-PMID17203948 .xls available from http://isatab.sf.net, the Assay-Tx worksheet
illustrates a typical usage of the fields to compose a report detailing a series of hybridization event
carried out on Affymetrix platform on a series of samples whose origin and treatement are described on
the Study worksheet.
Discussion:
Note that On the Assay-Tx,Factor Values have not been cascaded from the Study worksheet but could
have been, in order to allow for sorting to easily identify datafiles corresponding to a particular treatment
group
5.3.2 Technology Type: Gel electrophoresis
Extract Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each
Extract material. Valid qualifying headers for Extract item are Characteristics[], Material Type,
Description, Comment[]
Labeled Extract Name (optional, where relevant)
Labeled Extract should be reported when Differential Gel Electrophoresis (DIGE) is employed. The
field is used as an identifier within the ISA-TAB document. This column contains user-defined names for
14 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
each Labeled Extract. Valid qualifying headers for LabeledExtract item are Label (which is mandatory)
and Characteristics [], Material Type, Description, Comment [].
Material Type
Used as an attribute column following Sample Name, Extract Name, or Labeled Extract Name. This
column contains terms describing the type of each material. Examples: whole_organism, organism_part,
cell, fraction.
Label (optional, where relevant)
Used as an attribute column following Labeled Extract Name to indicate the compound linked to an
Extract to create the Labeled Extract. Examples: Cy2, Cy3, Cy5. A valid qualifying header for Label item
is Comment []
Gel Electrophoresis Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each
electrophoresis gel. Valid qualifying headers for Electrophoresis Gel Name item are Protocol REF,
Parameter [] Comment [] (which is optional). For specific 2D applications, the following headers can be
used instead:
First Dimension
This column hold a control term e.g. Isoelectrofocusing from an identified ontological resource
If the case, Term accession and Term Source REF should be used
Second Dimension
This column hold a control term e.g. SDS-PAGE from an identified ontological resource
If the case, Term accession and Term Source REF should be used
Electrophoresis Type
(To be discussed…..)
Scan Name
Used as an identifier within the ISA-TAB document. This optional column contains user-defined names
for each Scan event. Valid qualifying headers for Scan Name item are Protocol REF, Parameter [],
Performer, Date and Comment [] (which is optional)
Normalization Name
Used as an identifier within the ISA-TAB document. This optional column contains user-defined names
for each Normalization event. Valid qualifying headers for Normalization Name item are Protocol REF,
Parameter [], Performer, Date and Comment [] (which is optional)
Image File
This optional column contains a list of image files, one for each row of the Assay file, linking these image
files to their respective electrophoresis events.
Raw Data File (*)
This column contains a list of raw data files, one for each row of the Assay file, linking these data files to
their respective gels. A valid qualifying header for Raw Data File item is Comment [] (which is optional)
Processed Data File (*)
This column contains a list of processed data files, one for each row of the Assay file, linking these data
files to their respective gel runs. Valid qualifying headers for Processed Data File item is Comment []
(which is optional)
Spot Picking File
This column contains a file name pointing to files hosting protein spot coordinates and metadata for use
by spot picking instruments, typically for downstream analysis by Mass spectrometry.
Factor Value [<factor name>]
Factor Value column is key to provide the actual values of the independent variables when manipulated
by the experimentalists. Those study variables aka as Factors should have been declared in the
Investigation File and the Factor Name should be recalled between the square brackets of the column
header. In order to fully qualify the Factor Values, the following headers can be used to refine annotation
and description: Term Source REF, Unit, Unit Term Source REF. The latter two fields enable full
description of numerical values. For usability purposes, it is be possible to cascade FactorValues
declared in the Study file at the level of the Assay file for facilitate association between datafiles and
Factor Values.
Examples:
An example available from http://isatab.sourceforge.net/docs/DIGEexample_from_amersham_v4.xml
shows a rendering of GelML encoded data in a ISA-TAB Gel electrophoresis Assay tab.
15 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
5.3.3 Technology Type: Mass Spectrometry (MS)
Extract Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each
Extract material. Valid qualifying headers for Extract Name node are Characteristics [], Material Type,
Description, Comment []
Labeled Extract Name (where relevant)
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each
Labeled Extract material. Valid qualifying headers for LabeledExtract node are Label (which is
mandatory) and Characteristics [], Material Type, Description, Comment [] (which are optional)
Material Type
Used as an attribute column following Source Name, Sample Name, Extract Name, or Labeled Extract
Name. This column contains terms describing the type of each material. Examples: column fraction, gel
excised spot.
Label
Used as an attribute column following Labeled Extract Name to indicate the compound linked to an
Extract to create the Labeled Extract. Examples: P33, C14. Valid qualifying headers for Label item are
and Term Source REF
Mass Spectrometry Run Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each
Assay. Valid qualifying headers are Protocol, Parameter [], Performer, Date and Comment [] (which is
optional)
Under Discussion: Could be declared in the Protocol Instrument Section
The following columns can be used to annotate Assay Name columns:
Source
A term to classify the type of source. Valid qualifying header are Term Accession, Term Source REF
Analyzer
A term to classify the type of analyzer. valid qualifying header are Term Accession, Term Source REF.
Detector
A term to classify the type of detector. Valid qualifying header are Term Accession, Term Source REF
Raw Spectral Data File (MzData File for Proteomics applications)
This column contains a list of raw data files, one for each row of the Assay file.
Processed Spectral Data File
This column contains a list of raw data files, one for each row of the Assay file.
Normalized Spectral Data File
This column contains a list of raw data files, one for each row of the Assay file.
Factor Value [<factor name>]
Factor Value column is key to provide the actual values of the independent variables when manipulated
by the experimentalists. Those study variables aka as Factors should have been declared in the
Investigation File and the Factor Name should be recalled between the square brackets of the column
header. In order to fully qualify the Factor Values, the following headers can be used to refine annotation
and description: Term accession, Term Source REF, Unit, Unit Term Source REF. The latter two fields
enable full description of numerical values. For usability purpose, it could be possible to cascade
FactorValues declared in the Study file at the level of the Assay file for facilitate association between
datafiles and FactorValues.
When Mass Spectrometry is used in proteomics the following information will be required.
Peptide Identification File (*)
This data file should be formatted following the PSI-MS specifications and according to Pride submission
requirements (8).
Protein Identification File (*)
This data file should be formatted following the PSI-MS specifications and according to Pride submission
requirements (8).
Post Translational Modification File (*)
This data file should be formatted following the PSI-MS specifications and according to Pride submission
requirements (8).
Metabolite Identification File (*)
16 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
This data file should be formatted following the MSI-MS specifications and according to MSI submission
requirements (24).
When Mass Spectrometry is used in metabol/nomics a list of requirements is under development in
collaboration with the Metabolomics Standards Initiative (MSI, 24) representatives.
5.3.4 Technology Type: Nuclear Magnetic Resonance spectroscopy (NMR)
Extract Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each
Extract material. Valid qualifying headers for Extract item are Characteristics [], Material Type,
Description, Comment []
Material Type
Used as an attribute column following Source Name, Sample Name, Extract Name, or Labeled Extract
Name. This column contains terms describing the type of each material. Examples: whole_organism,
organism_part, cell, protein_extract. Valid qualifying header for Material is Term REF
NMR Run Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each
NMR run. The following columns can be used to annotate NMR Run Name columns:
Instrument
This column contains a reference to instruments declared in the protocol section.
Free Induction Decay Data File
This column contains a list of raw data files, one for each row of the Assay file. Valid qualifying header
for Free Induction Decay Data File (Assay Data File) item is Comment [] (which is optional). Refer to the
Annex for a list of data format generated by most commonly used NMR instruments.
Acquisition Parameter Data File [NMR pulse sequence]
This column contains a list of files detailing acquisition parameters in particular, a file containing the
NMR pulse sequence must be provided. Refer to the Annex for a list of data format generated by most
commonly used NMR instruments.
Processed Spectral Data File
This column contains a list of processed spectral data files, one for each row of the Assay file. Valid
qualifying header for Processed Spectral Data File item is Comment [] (which is optional)
Normalized Data File
This column contains a list of raw data files, one for each row of the Assay file. Valid qualifying header
for Raw Spectral Data File(Assay Data File) Name item is Comment [] (which is optional)
Factor Value [<factor name>]
Factor Value column is key to provide the actual values of the independent variables when manipulated
by the experimentalists. Those study variables aka as Factors should have been declared in the
Reference File and the Factor Name should be recalled between the square brackets of the column
header. In order to fully qualify the Factor Values, the following headers can be used to refine annotation
and description: Term Source REF, Unit, Unit Term Source REF. The latter two fields enable full
description of numerical values. For usability purpose, it could be possible to cascade FactorValues
declared in the Study file at the level of the Assay file for facilitate association between datafiles and
FactorValues.
When NMR Spectroscopy is used in metabol/nomics the requirements for the final list of metabolites
(identification and quantification) is under development in collaboration with the MSI representatives.
5.3.5 Technology Type: High throughput sequencing
When dealing with DNA microarray technology, the allowed fields essentially correspond to those
defined by the MAGE-TAB format.
Extract Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each
Extract material. Valid qualifying headers for Extract item are Characteristics[], Material Type,
Description, Comment[]
Extract material
Valid qualifying header for LabeledExtract item is Label (which is mandatory) and Characteristics [],
Material Type, Description, Comment [] (which are optional)
Material Type
17 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
Used as an attribute column following Source Name, Sample Name, Extract Name, or Labeled Extract
Name. This column contains terms describing the type of each material. Examples: whole_organism,
organism_part, cell, total_RNA. Valid qualifying header for Label item is Term REF.
Sequencing Run Name (Assay Name)
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each
Hybridization. Valid qualifying headers for Hybridization Name item are ArrayDesign REF (which is
mandatory) and Comment [] (which is optional)
Raw Data File
An archive containing short read data generated by the instruments. It could also be an archive of fasta
formatted sequences. Refer to the data file section for a more detailed review of accepted format
generated from high throughput sequencing machines.
Derived Data File
A file containing the interpretation of sequence counts and assembly depending on application (i.e.
genomics or transcriptomics).
Examples:
In
http://isatab.sf.net/docs/ISA-TAB-v0.2-MIMS-Gilbert-et-al-2008v3.xlsavailable
from
http://isatab.sourceforge.net/examples.html the PML-Gilbert-MetaTx-Assay.txt worksheet illustrates a
typical usage of the fields to compose a report detailing a series of sequencing event carried out on 454
Sequencer on a series of samples whose origin and treatement are described on the Study worksheet.
Discussion:
Note that On the Assay-Tx,Factor Values have not been cascaded from the Study worksheet but could
have been, in order to allow for sorting to easily identify datafiles corresponding to a particular treatment
group
When this technology is used in metagenomics applications, the requirements are under development in
collaboration with the Genomics Standards Consortium (GSC, 25) representatives.
18 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
6. References
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
NET project members: http://www.ebi.ac.uk/net-project
ISA-TAB project page and documents: http://isatab.sourceforge.net
MAGE-TAB specification: http://www.mged.org/mage-tab
BioInvestigation Index project: http://www.ebi.ac.uk/net-project/projects.html#BioInvestigation Index
ArrayExpress: www.ebi.ac.uk/arrayexpress
Pride: www.ebi.ac.uk/pride/
Reporting
Structure
for
Biological
Investigation
(RSBI)
working
group:
http://www.mged.org/Workgroups/rsbi/index.html
Sansone SA, Rocca-Serra P, Tong W, Fostel J, Morrison N, Jones AR; RSBI Members. A strategy
capitalizing on synergies: the Reporting Structure for Biological Investigation (RSBI) working group. OMICS.
2006 Summer;10(2):164-71.
Rayner TF, Rocca-Serra P, Spellman PT, Causton HC, Farne A, Holloway E, Irizarry RA, Liu J, Maier DS,
Miller M, Petersen K, Quackenbush J, Sherlock G, Stoeckert CJ Jr, White J, Whetzel PL, Wymore F,
Parkinson H, Sarkans U, Ball CA, Brazma A. A simple spreadsheet-based, MIAME-supportive format for
microarray data: MAGE-TAB. BMC Bioinformatics. 2006 Nov 6;7:489.
Spellman PT, Miller M, Stewart J, Troup C, Sarkans U, Chervitz S, Bernhart D, Sherlock G, Ball C, Lepage
M, Swiatek M, Marks WL, Goncalves J, Markel S, Iordan D, Shojatalab M, Pizarro A, White J, Hubley R,
Deutsch E, Senger M, Aronow BJ, Robinson A, Bassett D, Stoeckert CJ, Brazma A. Design and
implementation of microarray gene expression markup language (MAGE-ML). Genome Biol.
2002;3:RESEARCH0046. doi: 10.1186/gb-2002-3-9-research0046.
Jones AR, Miller M, Aebersold R, Apweiler R, Ball CA, Brazma A, Degreef J, Hardy N, Hermjakob H,
Hubbard SJ, Hussey P, Igra M, Jenkins H, Julian RK Jr, Laursen K, Oliver SG, Paton NW, Sansone SA,
Sarkans U, Stoeckert CJ Jr, Taylor CF, Whetzel PL, White JA, Spellman P, Pizarro A. The Functional
Genomics Experiment model (FuGE): an extensible framework for standards in functional genomics. Nat
Biotechnol. 2007 Oct;25(10):1127-1133.
FuGE working group: http://fuge.sourceforge.net
SEND: http://www.cdisc.org/models/send/v2.3
CDISC: http://www.cdisc.org/standards
FDA Data Standard Council: http://www.fda.gov/oc/datacouncil
Pharmacogenomic Data Submissions - Companion Guidance http://www.fda.gov/cder/guidance/7735dft.pdf
PSI: http://www.psidev.info
Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA,
Causton HC, Gaasterland T, Glenisson P, Holstege FC, Kim IF, Markowitz V, Matese JC, Parkinson H,
Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, Vingron M. Minimum information
about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet. 2001
Dec;29(4):365-71.
HTML
rendering
of
MAGE-ML
documents:
http://www.ebi.ac.uk/~rocca/MAGEXSLT/HTML%20rendering%20of%20MAGE.htm
MIBBI: http://mibbi.sourceforge.net
Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall
CJ; OBI Consortium, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Shah N,
Whetzel PL, Lewis S. The OBO Foundry: coordinated evolution of ontologies to support biomedical data
integration. Nat Biotechnol. 2007 Nov;25(11):1251-5.
OBI: http://obi.sourceforge.net
Sansone SA, Rocca-Serra P, Brandizi M, Brazma A, Field D, Fostel J, Garrow AG, Gilbert J, Goodsaid F,
Hardy N, Jones P, Lister A, Miller M, Morrison N, Rayner T, Sklyar N, Taylor C, Tong W, Warner G,
Wiemann S; Members of the Rsbi Working Group. The First RSBI (ISA-TAB) Workshop: "Can a Simple
Format Work for Complex Studies?". OMICS. 2008, ahead of print.
Sansone SA, Fan T, Goodacre R, Griffin JL, Hardy NW, Kaddurah-Daouk R, Kristal BS, Lindon J, Mendes
P, Morrison N, Nikolau B, Robertson D, Sumner LW, Taylor C, van der Werf M, van Ommen B, Fiehn O. The
metabolomics standards initiative. Nat Biotechnol. 2007 Aug;25(8):846-8.
GSC: http://gensc.org
19 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
Figure 1 - ISAarchive
For submission or transfer, files can be packaged into an ISArchive, as shown in this figure.
ISArchive
Assay(s)
Study 1
Data
Investigation
Note that in the simple case there is
only 1 Study file or 1 Assay file only
Study 2..n
Assay(s)
Data
The Investigation
file
- records all
declarative
information
referenced in the
other files such as
contacts, protocols
and equipment;
- relates Assay files
to Study files;
- relates Study files
to an Investigation
(only when two or
more Study files
exist, as in this
example).
The Assay file
contains information
about a certain set (or
The Study file
type) of assays,
contains
defined by the endpoint
contextualizing
information for one or measured (i.e., gene
more assays; i.e., the expression) and the
technology employed
subjects studied,
(i.e., DNA microarray),
their source(s), the
including information
sampling
on protocols, additional
methodology, their
information in part
characteristics, and
relating to the
any treatments or
execution of those
manipulations
performed to prepare protocols and
references to data files
the specimens.
20 of 27
All Data files resulting
from assays, including
raw data files, processed
and normalized data files.
For DNA gene expression
assays employing
microarray technology,
the set of data files,
should include an Array
Description File (ADF)
and a Final Gene
Expression Data Matrix
(FGDM), as required by
the MAGE-TAB
specification.
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
Table 2 – Type of value in the Investigation File
Type of values allowed per each field in the Investigation File only and comments some implementers may find
useful. However, as stated in section 3.4 ISA-TAB files with all fields left empty are syntactically valid, as are
those where all fields are filled with free text values rather than controlled vocabulary or ontology terms.
Investigation file fields
Ontology Source Reference
Term Source Name
Type of value
Term Source File
Term Source Version
URI
text
Term Source Description
text
Comments for implementations
text
Investigation
Investigation Title
text
Investigation description
text
Date of Investigation submission
Investigation Public Release Date
date (DD/MM/YYYY)
date (DD/MM/YYYY)
Studies
Globally unique ID, i.e. LSID could be
useful
Study Identifier
text
Study Title
text
Date of Study Submission
Study Public Release Date
date (DD/MM/YYYY)
date (DD/MM/YYYY)
PubMed ID
accession number
Publication DOI
accession number
Publication Author list
text
Publication Title
text
Publication Status
Study Description
text
text
controlled terminology could be useful
Study Design
text
controlled terminology could be useful
Study Design Term Accession
text
Study Design Term Source REF
Term Source Name
Study File name
URI/text
Study Factors
Factor Name
text
Factor Type
text
Factor Type Term Accession
text
Factor Type Term Source REF
term source Name
Study Assays
text
Measurements/Endpoints Name
Measurements/Endpoints Term Accession
text
text
Measurements/Endpoints Term Source REF
Term source name
Technology Type
text
Technology Type Term Accession
text
Technology Type Term Source REF
Term Source Name
Assay File Name
Study Protocols
URI/text
Protocol Name
text
Protocol Type
text
Protocol Type Term Accession
text
Protocol Type Term Source REF
Term Source Name
Protocol Description
Protocol Contact
text
text
Protocol Parameter
text
Protocol Parameter Type Term Accession
text
Protocol Parameter Type Term Source REF
Term Source Name
controlled terminology could be useful
controlled terminology could be useful
controlled terminology could be useful
controlled terminology could be useful
21 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
Protocol Instruments
text
Protocol Instrument Name
Protocol Instrument Component
text
text
Protocol Instrument Components Term Accession
Protocol Instrument Components Term Source
REF
Protocol Instrument Parameter
text
Protocol Software
text
Protocol Software Name
text
Protocol Software Version
Processing Method Parameter
text
text
controlled terminology could be useful
Term Source Name
text
Study Contacts
Person Last Name
text
Person First Name
text
Person Mid Initial
text
Person Email
Person Phone
text
text or numeric
Person Fax
text
Person Address
text
Person Affiliation
text
Person Role
text
Person Role Term Accession
Person Role Term Source REF
text
Term Source Name
controlled terminology could be useful
22 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
Table 3 – Number of value in the Investigation File
Number of value allowed per each field in the Investigation File only, although, as stated in section 3.4, ISA-TAB
files with all fields left empty are syntactically valid.
Number of value(s) per column
To report multiple values
1
1
1
1
As many column as necessary
As many column as necessary
As many column as necessary
As many column as necessary
1
1
1
1
n/a
n/a
n/a
n/a
[# start of repeatable block]
Only 1 set within a block
Only 1 set within a block
Only 1 set within a block
Only 1 set within a block
Only 1 set within a block
Only 1 set within a block
Only 1 set within a block
Only 1 set within a block
Only 1 set within a block
Only 1 set within a block
Only 1 set within a block
Ontology Source Reference
Term Source Name
Term Source File
Term Source Version
Term Source Description
Investigation
Investigation Title
Investigation description
Date of Investigation submission
Investigation Public Release Date
Studies
Study Identifier
Study Title
Date of Study Submission
Study Public Release Date
PubMed ID
Publication DOI
Publication Author list
Publication Title
Publication Status
Study Description
Study Design
Study Desgin Term Accession
Study Design Term Source REF
Study File name
Study Factors
Factor Name
Factor Type
Factor Type Term Accession
Factor Type Term Source REF
Study Assays
Measurements/Endpoints
Measurements/Endpoints Term
Accession
Measurements/Endpoints Term Source
REF
Technology Type
Technology Type Term Accession
Technology Type Term Source REF
Assay File Name
Study Protocols
Protocol Name
Protocol Type
Protocol Type Term Accession
Protocol Type Term Source REF
Protocol Description
Protocol Contact
Protocol Parameter
Protocol Parameter Type Term
Accession
1
1
1
1
1
1
1
1
1
1
1..n semi-colon (;) separated
1
1
1
Only 1 set within a block
1
1
1
1
As many column as necessary
As many column as necessary
1
As many column as necessary
As many column as necessary
As many column as necessary
1
1
As many column as necessary
1
1
As many column as necessary
As many column as necessary
1
1
As many column as necessary
As many column as necessary
1
1
1
1..n semi-colon (;) separated
As many column as necessary
As many column as necessary
As many column as necessary
1
23 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
Protocol Parameter Type Term Source
REF
Protocol Instrument Name
Protocol Instrument Components
Protocol Instrument Components Term
Accession
Protocol Instrument Components Term
Source REF
Protocol Instrument Parameter
Protocol Software
Protocol Software Name
Protocol Software Version
Processing Method Parameter
Study Contacts
Person Last Name
Person First Name
Person Mid Initial
Person Email
Person Phone
Person Fax
Person Address
Person Affiliation
Person Role
Person Role Term Accession
Person Role Source REF
As many column as necessary
1
1
1..n semi-colon (;) separated with keywords
As many column as necessary
1..n semi-colon (;) separated with keywords
1..n semi-colon (;) separated with keywords
1..n semi-colon (;) separated
1
1
1
1..n semi-colon (;) separated
1
1
1
1
1
1
1
1
1..n semi-colon (;) separated
1
1
24 of 27
As many column as necessary
As many column as necessary
As many column as necessary
As many column as necessary
As many column as necessary
As many column as necessary
As many column as necessary
As many column as necessary
As many column as necessary
As many column as necessary
As many column as necessary
As many column as necessary
[# end of repeatable block]
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
Table 4- Node headers in the Study and Assay files
List of all valid fields for describing nameable/ identifiable objects = nodes which can be used to produce an ISATAB archive. The table shows the nodes, their possible attributes, the tab file in which the node should be used,
number of allowed values, data type and if they require declaration of parent node (which could be inferred if
absent).
Node Name
Source Name
Sample
Name
Extract Name
Labeled
Extract Name
Assay Name
TAB
location
Node possible attributes
Characteristics, Provider, Material Type,
Description, Comment
Characteristics, Material Type, Protocol REF,
Description, Comment
Characteristics, Material Type, Protocol
REF,Description, Comment
Characteristics, Material Type, Protocol REF,
Description, Label, Comment
Protocol REF, Raw Data File, Processed Data
File
Accept
multiple
values
per field
Data
type
Parent node
dependency
Study
NO
string
Study
NO
string
Source Name
Assay
NO
string
Sample Name
Assay
NO
string
Extract Name
Assay
NO
string
Hybridization
Name
Protocol REF, Array Data File, Derived Array
Data File, Array Design, Comment
Assay
NO
string
Labeled Extract Name
NMR Run
Name
Protocol REF, Free Induction Decay Data File,
Acquisition Parameter Data File [NMR pulse
sequence], Processed Spectral Data File
Assay
NO
string
Sample or Extract or Labeled
Extract
Gel
Electrophores
is Name
Protocol REF, Raw Data File, Processed Data
File, Spot Picking File
Assay
NO
string
Sample or Extract or Labeled
Extract
Mass
Spectrometry
Run Name
Protocol REF, Analyzer Type, DetectorRaw
Spectral Data File, Processed Spectral Data
File, Protein identification File, Peptide
Identification File, Metabolite Identification File
Assay
NO
string
Sample or Extract or Labeled
Extract
Sequencing
Run Name
Protocol REF, Raw Data File, Processed Data
File
Assay
NO
string
Labeled Extract
Scan Name
Protocol REF, Image File, Array DataFile,
Derived Array Data File, Comment
Assay
NO
string
Hybridization Name or Gel
Name
Normalization
Name
Protocol REF, Derived Array Data File,
Comment
Assay
NO
string
Scan Name or NMR Run
Name or Mass Spectrometry
Run Name
25 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
Table 5 – Attribute headers in the Study and Assay files
Attribute fields which can be used to qualify nodes and provide annotation to entities. The table shows also the
tab file in which the node should be used, number of allowed values, data type and if they require declaration of
parent node.
Attributes
Attribute
qualifiers
TAB location
Accept
multiple
values
Data type
Dependency
Label
Term Source REF,
Term Accession
Assay
NO
string
LabeledExtract
Material Type
Term Source REF,
Term Accession
Study, Assay
NO
string
Source,Sample,Extract,Labele
dExtract
Characteristics [ ]
Unit, Term Source
REF, Term
Accession
Study, Assay
NO
string or
number
Source,Sample,Extract,Labele
dExtract
Factor Value [ ] ( )
Unit, Term Source
REF, Term
Accession
Study, Assay
NO
string or
number
Parameter Value [
]
Unit, Comment
Study, Assay
NO
string or
number
Protocol REF
Unit [ ]
Term Source REF,
Term Accession
Study, Assay
NO
string
Characteristics[ ], FactorValue[
], ParameterValue[ ]
Term Source REF
Study, Assay
NO
string
Characteristics[ ], FactorValue[
], ParameterValue[ ], Unit[ ]
Term Accession
Study, Assay
NO
string
Characteristics[ ], FactorValue[
], ParameterValue[ ], Unit[ ]
Study, Assay
NO
string
Source,Sample,Extract,Labele
dExtract, Assay and Subtypes
Assay
NO
string
Mass Spectrometry Run
Assay
NO
string
Mass Spectrometry Run
Protocol REF
Analyzer Type
Detector
Parameter Value [ ],
Performer, Date,
Comment
Term Source REF,
Term Accession
Term Source REF,
Term Accession
Array Design File /
REF
Comment
Assay
NO
string
Hybridization
Image File
Comment
Assay
NO
string
Hybridization, Gel
Electrophoresis
Raw Data File
Comment
Assay
NO
string
Assay
Raw Spectral Data
File
Comment
Assay
NO
string
NMR Run, Mass Spectrometry
Run
Free Induction
Decay Data File
Comment
Assay
NO
string
NMR Run
Comment
Assay
NO
string
NMR Run, Mass Spectrometry
Run
Comment
Assay
NO
string
Hybridization
Comment
Assay
NO
string
Hybridization
Comment
Assay
NO
string
Hybridization
Comment
Assay
NO
string
Hybridization
Comment
Assay
NO
string
Sequencing Run
Acquisition
Parameter Data
File
Array Data File
Derived Array Data
File
Array Data Matrix
File
Derived Array Data
Matrix File
Processed Data
26 of 27
Specification documentation for the ISA-TAB v0.3 - 17 February 2016
File
Processed
Spectral Data File
Comment
Assay
NO
string
NMR Run, Mass Spectrometry
Run
Comment
Assay
NO
string
Gel Electrophoresis
Comment
Assay
NO
string
Mass Spectrometry Run
Comment
Assay
NO
string
Mass Spectrometry Run
Metabolite
Identification File
Comment
Assay
NO
string
NMR Run, Mass Spectrometry
Run
Performer
Comment
Study, Assay
NO
string
Protocol REF
Provider
Comment
Study, Assay
NO
string
Source
Date
Study, Assay
NO
date
Protocol REF
Description
Study, Assay
NO
string
Material
Comment [ ]
Study, Assay
NO
string
ALL
Spot Picking File
Protein
Identification File
Peptide
Identification File
27 of 27
Download