ISA-TAB specification v0.3 http://isatab.sourceforge.net Specification documentation for the ISA-TAB v0.3 Status ISA-TAB v0.3 specification document working draft: version 12 June 2008. Authors Initially drafted by one group (1), this document and the work described herein now incorporate input from many contributors, listed at the ISA-TAB project page (2) under ‘Contributors’. Documentation and contacts The current document and example files are available from the ISA-TAB project page (2). A mailing list has been made available to facilitate discussion. Comments and suggestions on this document should be addressed to Philippe Rocca-Serra (rocca@ebi.ac.uk), Susanna-Assunta Sansone (sansone@ebi.ac.uk) and the mailing list. Abstract This document describes ISA-TAB, a proposal addressing the pressing need of a group of collaborating repositories for a common structured representation with which communicate the metadata required to interpret an experiment and the associated data files. Sections from 1 to 4 introduce the ISA-TAB proposal, the rationale behind its development, an overview of the structure and the relation to other formats; while section 5 describe the specification in details. ISA-TAB builds on the existing paradigm that is MAGE-TAB - a tab-delimited format to exchange microarray data. ISA-TAB v0.3 is largely the result of aligning MAGE-TAB v1.1 and the previous ISA-TAB (v0.2) format. ISA-TAB v0.3 necessarily maintains backward compatibility with existing MAGE-TAB files to facilitate adoption; it preserves the simplicity of the MAGE-TAB for simple experimental designs, while incorporating new features to capture the full complexity of experiments employing a combination of technologies, where this is necessary. Like MAGE-TAB before it, ISA-TAB is simply a format; the decision on how to regulate its use (i.e. enforcing completion of mandatory fields or use of a controlled terminology) is left to individual implementers. N.B. Before reading this document, an in-depth knowledge of the MAGE-TAB specification is required (3). 1 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 Table of Contents 1. ISA-TAB introduction .........................................................................................................................................3 1.1 Rationale ........................................................................................................................................................3 1.3 Definitions .......................................................................................................................................................3 2. ISA-TAB v0.3 structure overview ......................................................................................................................4 2.1 Investigation file ..............................................................................................................................................4 2.2 Study file .........................................................................................................................................................4 2.3 Assay file ........................................................................................................................................................4 3. Relating ISA-TAB to other formats and requirements ....................................................................................5 3.1 Commonalities and difference with MAGE-TAB v1.1 ....................................................................................5 3.2 Relation to FuGE: rendering FuGE in ISA-TAB via XSL transformations ......................................................5 3.3 Relation to biomedical tabular formats ...........................................................................................................5 3.4 Minimal content and terminology ...................................................................................................................5 4. ISA-TAB development process .........................................................................................................................6 4.1 Communities and workshops .........................................................................................................................6 5. ISA-TABv0.3 detailed structure .........................................................................................................................6 5.1 Investigation file ..............................................................................................................................................6 5.1.1 Ontology source section ..............................................................................................................................9 5.1.2 Investigation section ....................................................................................................................................9 5.1.3 Study section ...............................................................................................................................................9 5.1.4 Study Factor section................................................................................................................................. 10 5.1.5 Study Assay section ................................................................................................................................. 10 5.1.6 Protocol section ........................................................................................................................................ 10 5.1.7 Study Contact section .............................................................................................................................. 11 5.2 Study file ...................................................................................................................................................... 12 5.3 Assay file ..................................................................................................................................................... 13 5.3.1 Technology Type: DNA microarray hybridization ..................................................................................... 13 5.3.2 Technology Type: Gel electrophoresis ..................................................................................................... 14 5.3.3 Technology Type: Mass Spectrometry (MS) ............................................................................................ 16 5.3.4 Technology Type: Nuclear Magnetic Resonance spectroscopy (NMR) .................................................. 17 5.3.5 Technology Type: High throughput sequencing ...................................................................................... 17 6. References ....................................................................................................................................................... 19 Figure 1 - ISAarchive ........................................................................................................................................... 20 Table 2 – Type of value in the Investigation File .............................................................................................. 21 Table 3 – Number of value in the Investigation File ......................................................................................... 23 Table 4- Node headers in the Study and Assay files ....................................................................................... 25 Table 5 – Attribute headers in the Study and Assay files ............................................................................... 26 2 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 1. ISA-TAB introduction 1.1 Rationale The Investigation / Study / Assay (ISA) tab-delimited (TAB) format is a general purpose framework with which to communicate both metadata (i.e., contact details, sample characteristics, technologies used, etc.) and data from (meta)genomics-, transcriptomics-, proteomics- and metabol/nomics-based experiments (hereafter referred as ‘omics-based’ experiments). The general motivation for this work stems from the needs of: the BioInvestigation Index project at EBI (4) to create a common structured representation of the metadata that is required to interpret an experiment with which to facilitate combined submission to ArrayExpress (5), Pride (6), and in the near future, a metabolomics repository; a group of collaborative systems (7,8) either committed to pipelining omics-based experimental data into EBI public repositories or willing to exchange data among themselves, or to enable their users to import data from public repositories into their local systems. ISA-TAB could be viewed as an extended version of the MicroArray Gene Expression (MAGE) TAB, a format that supports the management, exchange and submission of microarray-based experiment data and metadata (9). MAGE-TAB was designed for use by laboratories with little or no bioinformatics support, who cannot therefore deal with the complexity of the MAGE Markup Language (ML) format (10). ISA-TAB builds on the MAGE-TAB paradigm, and shares its motivation for the use of tab-delimited text files; i.e., that they can easily be created, viewed and edited by researchers, using spreadsheet software such as Microsoft Excel. ISA-TAB also employs MAGE-TAB syntax as far as possible, to ensure backward compatibility with existing MAGE-TAB files, a described further in section 3.1. However, ISA-TAB has a number of additional features that make it a more general framework that can comfortably accommodate multi-domain experimental designs. It was also important to align the concepts in ISA-TAB with some of the objects in the Functional Genomics Experiment (FuGE) model (11, 12), partly as the FuGE model is integral to the development of MAGE-ML v2. These additional features are described further in section 3.2. In particular, the rationale behind ISA-TAB’s extended structure is built on the need to capture the complexity of an experiment employing a combination of technologies. For example, experiments looking at changes both in (i) the metabolite profile of urine, and (ii) gene expression in the liver, in subjects treated with a compound inducing liver damage (by mass spectrometry and microarray technologies, respectively). 1.3 Definitions Investigation, Study and Assay are the three key entities (9) around which the ISA-TAB framework is built. They assist in structuring and classifying information relevant to the subject under study and the different technologies employed. Note that ‘subject’ as used above could to refer inter alia to an organism, or tissue, or an environmental sample. Study is the central unit, containing information on the subject under study, its characteristics and any treatments applied. A Study has associated Assays. These are tests performed either on material taken from the subject, (hereafter referred to as a ‘specimen’) or on the whole initial subject, which produce qualitative or quantitative measurements (data). Assays can be characterized as the smallest complete unit of experimentation; i.e. one hybridization equals one assay; each technical replicate represents an additional assay; one LC-MS run equals one assay; a single clinical chemistry assay is (of course) one assay; a multiplexed (^n) microarray equals n assays; and a MALDI MS chip with n spots could perform up to n assays (i.e. all spots analyzed). Investigation is a higher-order object that helps to group related Studies. It should be noted that the word ‘experiment’ has been deliberately avoided. A comparison of ArrayExpress and Pride revealed that ‘experiment’ is used to refer to objects at different levels of granularity in each; i.e. to refer to a set of multiple related hybridizations in ArrayExpress, but only a single gel-based separation run in Pride. Following the abstractions proposed here, an experiment in ArrayExpress would be equivalent to a Study. The choice of Study as the central unit of the ISA-TAB proposal is supported by its use in existing biomedical formats, such as the Study Data Tabulation Model (SDTM), which encompasses both the Standard for Exchange of Nonclinical Data (SEND, 13) and the Clinical Data Interchange Standards Consortium (CDISC, 14). SDTM has been endorsed by the US Food and Drug Administration (FDA) as the preferred way to organize, structure and format both clinical and nonclinical (toxicological) data submissions (15, 16). See also section 3.3. 3 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 2. ISA-TAB v0.3 structure overview The ISA-TAB uses three types of files to capture the experimental metadata: Investigation file Study file Assay file (with associated data files) The Investigation file contains all the information needed to interpret the experiment, while the experimental steps (or sequence of events) are described in the Study and in the Assay file(s). For each Investigation file there will be one or more Study files; for each Study file there will be one or more Assay files. For submission or transfer, files can be packaged into an ISArchive as shown in Figure 1. Each file has a predefined structure, with fields being organized on a per-column or per-row basis; each file is described briefly in the subsections below and more fully in the section 5. 2.1 Investigation file In this file the fields are organized on a per-column basis and divided in sections. The Investigation file is intended to meet three needs: (1) to record all declarative information referenced in the other files; (2) to relate Assay files to Study files; and optionally, (3) to relate Study file to an Investigation (this only becomes necessary when two or more Study files need to be grouped). The declarative sections cover general information such as contacts, protocols and equipment, and also –where applicable- the description of terminologies (controlled vocabularies or ontologies) and other annotation resources that were used. The optional Investigation section is a flexible solution to group two or more Study files, as required by several use cases. In the toxicogenomics domain, for example, acute toxicity studies are followed by long term toxicity studies and in vitro toxicity studies. For clarity, the users would link these Study files by filling the Investigation section. Another example comes from the environmental genomics domain, where several studies carried out in the same area can be usefully related under the same Investigation. See also section 5.1. 2.2 Study file In this file the fields are organized on a per-row basis. The Study file contains contextualizing information for one or more assays, for example; the subjects studied; their source(s); the sampling methodology; their characteristics; and any treatments or manipulations performed to prepare the specimens. See also section 5.2. 2.3 Assay file In this file the fields are again organized on a per-row basis. The Assay file represents a branch of the experimental graph, containing information about a certain set (or type) of assays, defined by the endpoint measured (i.e. gene expression) and the technology employed (i.e. DNA microarray). Assay-related information includes protocols, additional information in part relating to the execution of those protocols and references to data files (whether raw, processed or normalized). Note that in the case of microarray assays the Array Description Format (ADF) and Final Gene Expression Data Matrix (FGDM) are also referenced. See also section 5.3. For example, in a study looking at the effect of a compound inducing liver damage in subjects by characterizing the metabolic profile of urine (by NMR spectroscopy) and measuring protein and gene expression in the liver (by mass spectrometry and DNA microarrays, respectively): The Study file will contain information on the subjects studied, their source(s) and characteristics, the description of the treatment with the hepatotoxicant and the steps undertaken to take the specimens (urine and liver) from the treated subjects for the assays. The Assay file for the urine metabolic profile (endpoint) by NMR spectroscopy (technology) will contain the (stepwise) description of the methods by which the urine was processed for the assay, subsequent steps and protocols, and the link to the resultant data files. The Assay file for the gene expression measurements by DNA microarray, and similarly the Assay file for protein expression by mass spectrometry, will contain the (stepwise) description of how the RNA or protein extract were prepared from the liver (or a section) and prepared for assaying, subsequent steps and protocols, and the link to the resultant data files. 4 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 3. Relating ISA-TAB to other formats and requirements 3.1 Commonalities and difference with MAGE-TAB v1.1 The ISA-TAB v0.3 proposal results from the alignment between MAGE-TAB v1.1 and ISA-TAB v0.1. This alignment process aimed to fulfill the need for a more general framework, that nonetheless maintained backward compatibility with existing MAGE-TAB files, ultimately to facilitate the adoption of one common format. ISA-TAB v0.2 contains all the fields in MAGE-TAB, plus others required by the Pride repository, which supports the Proteomics Standards Initiative (PSI, 17). ISA-TAB introduces additional structure to accommodate complex, multi-domain experiments, and to place the concepts in line with the FuGE objects. While no changes have been made to the ADF and the FGDM files, the changes affect the MAGE-TAB Sample and Data Relationship Format (SDRF) and Investigation Description Format (IDF) are as follows: The content of MAGE-TAB SDRF has been divided between the ISA-TAB Study and Assay files and their relationship is declared in the ISA-TAB Investigation file. The content of the Study file corresponds to the SDRF fields (on the left) describing the subjects studied, source(s), characteristics etc, while the content of the Assay file can be mapped to the SDRF fields (on the right) containing assay-related information and the link to the resultant data files. Although, the SDRF can be split into two or more files, MAGE-TAB does not require a particular division and does not identify their content or regulate their order during the parsing process. The MAGE-TAB IDF corresponds largely to the ISA-TAB Investigation file. The latter reuses many fields from the IDF but in a slightly different order and divided into sections that can be repeated accordingly; the intent being to enhance readability and presentation, and possibly to help with parsing. In the MAGE-TAB IDF, Investigation is used as a synonym for ‘experiment’ (sensu MIAME, 18) to link related hybridizations. In the ISA-TAB Investigation file, the homonymous section is used sensu FuGE, to relate (list and order) related Study files and will also serve to facilitate the alignment with FuGE investigation object. For more details refer to the section 5. 3.2 Relation to FuGE: rendering FuGE in ISA-TAB via XSL transformations The ISA-TAB format could be seen as competing with XML-based formats, whether existing or under development, such as the FuGE-ML. However, ISA-TAB addresses the immediate need for a framework to communicate for multi-omics experiments, whereas all existing FuGE-based modules are still under development. When these become available, ISA-TAB could continue serving those with little or no bioinformatics support, as well as finding utility as a user-friendly presentation layer for XML-based formats (via an XSL transformation); i.e. in the manner of the HTML rendering of MAGE-ML documents (19). Initial work has been carried out to evaluate the feasibility of rendering FuGE-ML files (and FuGE-based extensions, such as GelML and Flow-ML) in ISA-TAB layout. Examples are available at the ISA-TAB website under the document section (2), along with a report, detailing the issues faced during these transformations. When finalized, the xsl templates will also be released, along a Xpath expressions and a table ‘mapping’ FuGE objects and ISA-TAB labels. 3.3 Relation to biomedical tabular formats Where omics-based technologies are used in clinical or nonclinical studies, ISA-TAB will complement existing biomedical formats such as the SDTM. It is inevitable that some information will be duplicated between the two frameworks, but this is not generally seen as a major problem. A reference system, probably based on the subject source, will be needed to link the two files. 3.4 Minimal content and terminology As with MAGE-TAB, ISA-TAB is no more than the format with which to communicate information. Both ‘minimum information’ requirements and the use of controlled vocabularies/ontologies are beyond the scope of this proposal, and are the focus of related efforts. ‘Minimal information’ checklists are under development both by individual communities for their particular domains of interest and collaboratively through the Minimal Information for Biological and Biomedical Investigations (MIBBI, 20) project. Efforts proceeding under the Open Biomedical Ontology (OBO, 21) Foundry umbrella will provide both 'universal' terms to describe studies and assays, via the 5 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 Ontology for Biomedical Investigations (OBI, 22), and ‘domain-specific’ terms relevant to a particular domain (i.e. chemical, biological, clinical). Therefore, the decision on how to regulate the use of the ISA-TAB (marking certain fields mandatory, or enforcing the use of controlled terminology) is a matter for those who will implement the format in their system. Although certain fields would benefit from the use of controlled terminology, ISA-TAB files with all fields left empty are syntactically valid, as are those where all fields are filled with free text values rather than controlled vocabulary or ontology terms. 4. ISA-TAB development process 4.1 Communities and workshops Created to fulfill the need of the Bio Investigation Index project, the first straw-man proposal, ISA-TAB v0.1, was presented and discussed in a workshop held at the European Bioinformatics Institute (EBI), Cambridge, UK, on 6-8 December, 2007, funded by the UK Biotechnology and Biological Sciences Research Council (BBSRC BB/E025080/1). The goal of this first workshop was the creation of an ‘exchange network test bed’, including heterogeneous nodes, such as public and proprietary repositories, public and commercial software, by academic, industrial and governmental groups. Participants included a group of collaborative systems interested in leveraging on common standards to report omics-based experiments, also with experience of building standards and actively involved or leading relevant efforts. The workshop report (23) is also available from the ISA-TAB website (2). Overall, this workshop produced a general consensus on the importance of the ISA-TAB to address the immediate need for a communication framework for multi-omics experiments. A series of technical challenges were also identified - along with the need to align the ISA-TAB v0.1 with MAGE-TAB- the majority of which have been already resolved in the current ISA-TAB v0.3. Next workshop is planned for 16-17-18 June, 2008 at EBI; in this occasion the current version will be revised – if necessary- in the light of a series of real case examples that are being produced by the participating communities. The agenda will include also a discussion on and examples of implementations and tools that need to be developed to assist the users in the creation of ISA-TAB files. 5. ISA-TABv0.3 detailed structure As stated in the introduction, a number of design decisions are carried over from MAGE-TAB work. In particular: Field separator Tabulation is the official separator accepter in ISA-TAB and MAGE-TAB files. Node Names used as identifier All strings found in Node fields (identified by a header with a ‘Name’ tag) will be regarded as object identifiers. Therefore, they should locally unique within an ISA-TAB submission. As long as this requirement is met, there are specific requirements about formatting. However, we encourage following the guidelines detailed in the MAGE-TAB specifications in section 3.1.3. Referencing As described in MAGE-TAB specifications in section 3.1.4, Objects identified in the Investigation File can be referenced in either the Study or Assay files. A REF tag is used to indicated that fact, such Protocol REF and Term Source REF. Parameters and Factors declared in the IDF are referenced in a different way as their values will be specified during data entry relying to [] mechanism instead. Each file is described in details in the sections below. As guidance, Table 2 shows the type of values allowed per each field or header and comments some implementers may find useful. 5.1 Investigation file In this file the fields are organized on a per-column basis and divided in sections. Table 3 shows the number of value allowed per each field, although, as stated in section 3.4, ISA-TAB files with all fields left empty are syntactically valid. The example shows how to record one Study file and two Assay files: Ontology Source Reference Term Source Name Term Source File CTO MO http://obo.sourceforge.net/cgibin/detail.cgi?cell http://mged.sourceforge.net/ontol ogies/MGEDontology.php 6 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 Term Source Version Term Source Description 1.3.0.1 The Cell Type Ontology The Microarray Ontology Investigation Investigation Title Investigation description Date of Investigation submission Investigation Public Release Date (#start of repeatable block) Studies Study Identifier Study Title STD1 The Influence of Pharmacogenetics on Fatty Liver Disease in the Wistar and Kyoto Rats: A Combined Transcriptomic and Metabonomic Study Date of Study Submission DD/MM/YYYY Study Public Release Date PubMed ID Publication DOI 13/12/2006 17203948 10.1021/pr0601640 Publication Author list Publication Title Publication Status Griffin JL, Scott J, Nicholson JK. The influence of pharmacogenetics on fatty liver disease in the wistar and kyoto rats: a combined transcriptomic and metabonomic study. Study Description indexed for MEDLINE Analysis of liver tissue from rats exposed to orotic acid for 1, 3, and 14 days was performed by DNA microarrays and high resolution 1H NMR spectroscopy based metabonomics of both tissue extracts and intact tissue (n ) 3). Study Design time course design Study Design Term Accession OBI_11215 Study Design Term Source REF Study File Name OBI study-1.txt Study Factors Factor Name Time Treatment Factor Type Time Stressor Factor Type Term Accession OBI_XXXx OBI_XXXx Factor Type Term Source REF OBI OBI Measurements/Endpoints Name Gene Expression Metabolite Characterization Measurements/Endpoints Term Accession OBI_XXXx OBI_XXXx Measurements/Endpoints Term Source REF OBI OBI DNA microarray 1D 1H NMR spectroscopy OBI_XXXx OBI_XXXx OBI OBI Assay-Tx.txt Assay-Mx.txt standard procedure 1 griffin procedure 2 Study Assays Technology Type Technology Type Term Accession Technology Type Term Source REF Assay File Name Study Protocols Protocol Name 7 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 Protocol Type animal procedure nucleic acid extraction Protocol Type Term Source REF OBI OBI Protocol Type Term Accession OBI_XXXx All animal procedures conformed to Home Office, UK, guidelines for animal welfare. Male Wistar rats (n ) 3 for each time point; control animals fed control diet for the same time period; Charles River UK Ltd.) were fed either standard laboratory chow, or chow supplemented with 1% orotic acid (Sigma Aldrich, UK) ad libitum.5-6 Rats were killed by cervical dislocation at days 0, 1, 3 and 14, and the left lateral lobe of the liver excised. Tissues were snap frozen and stored at -80 °C. OBI_XXXx diet;population density extracted product; amplification Person Last Name Griffin Fleckenstein Person First Name Julian Scott Protocol Description Total RNA was extracted by RNA Isolation Kit (Stratagene) from the livers of Wistar rats at day 0 (n ) 3), day 1 and 3 (n ) 2), and day 14 (n ) 3). Protocol Contact Protocol Parameter Protocol Parameter Type Term Accession Protocol Parameter Type Term Source REF Protocol Instrument Name Protocol Instrument Components Protocol Instrument Components Term Accession Protocol Instrument Components Term Source REF Protocol Instrument Parameter Protocol Instrument Parameter Term Accession Protocol Instrument Parameter Term Source REF Protocol Software Protocol Software Name Protocol Software Version Processing Method Parameter Processing Method Parameter Term Accession Processing Method Parameter Term Source REF Study Contacts Person Mid Initial Person Email jlg40@mole.bio.cam.ac.uk Person Phone 44*(0)*1223766002 Person Fax Person Address University of Cambridge Imperial College London Person Affiliation Department of Molecular Biology Genetics and Genomics Research Institute Person Role submitter;investigator investigator Person Role Term Accession OBI_XXXx OBI OBI_XXXx OBI Person Role Term Source REF (#end of repeatable block) The following paragraph details the organization of the individual sections. 8 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 5.1.1 Ontology source section This annotation section is common to the MAGE-TAB format. Table 2 shows the type of values allowed per each field or header including those that would benefit from the use of a controlled terminology –. However, this decision is left to individual implementers and, as stated in section 3.4, ISA-TAB files with all fields filled with free text values rather than controlled vocabulary or ontology terms are valid. Term Source Name The names of the Term Sources (ontologies or databases) used within the ISA-TAB document. This name will be used in all corresponding "Term Source REF" fields. Examples: OBI, GO, DO. Used as an identifier within the ISA-TAB document. Term Source Namespace Location A file name or valid pointer to an official resource to allow cross validation and version tracking of terms used in a submission. Term Source Version The version of the Term Source used throughout the ISA-TAB document. 5.1.2 Investigation section In this section the fields are organized on a per-column basis. The optional Investigation section is a flexible solution to group two or more Study files, where required. The name of the Study file, however, should be used even when one study is performed. Investigation Title A concise name given to the investigation Investigation Description A textual description of the investigation Date of Investigation Submission To provide the date on which the investigation was reported to the repository. Investigation Public Release Date To provide the date on which the investigation should be released publicly. 5.1.3 Study section [#start of repeatable block] This section can be repeated according to the number of Study files. This repeatable blocks are arranged vertically; the intent being to enhance readability and presentation, and possibly to help with parsing. Study Identifier A unique identifier: temporary identifier supplied by users or generated by repository / database. It could be (but no necessarily) an identifier complying with LSID specification. Study Title A concise phrase used to encapsulate the purpose and goal of the study. Study Public Release Date To provide the date on which the study should be released publicly. PubMed ID The PubMed IDs of the publication(s) associated with this investigation (where available). Publication DOI A Digital Object Identifier (DOI) for each publication (where available). Publication Author List The list of authors associated with each publication. Publication Title The title of publication associated to the investigation. Publication Status A term describing the status of each publication (i.e. "submitted", "in preparation", "published"). Terms for this field should come from OBI. Publication Status Term Accession The accession number from the Source associated to the selected term. Publication Status Term Source REF Source REF have to match one the Term Source Names declared in the in the annotation section. Study Description A textual description of the study, including section such as objective or goals. Study Design 9 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 A term allowing the classification of the study. Study Design Term Accession The accession number from the Source associated to the selected term. Study Design Term Source REF Study Design Term Source REF has to match one the Term Source Names declared in the annotation section. Study File Name A field to indicate the Study file, to ensure validation of the ISAarchive. For each study, provide a Study file formatted according to the specifications detailed in section 5.2. Multiple files may be reported using a semi-colon (;) within cell between file names. In a valid ISArchive, this field should always have at least one value. In that case, it simply indicates that there is only one study and the other fields for investigation would be left empty 5.1.4 Study Factor section Factor Name The name of the Factors used in the Study and/or Assays files. Factors should correspond to the independent variable manipulated by experimentalists and intended to affect biological systems in such a way that an assay is devised to measure the responses of the biological system to the perturbation by following a response variable (also known as dependent variable). Factor Type A term allowing the classification of Factor in categories. Factor Type Term Accession The accession number from the Source associated to the selected term. Factor Type Term Source REF Source REF have to match one the Term Source Names declared in the annotation section. 5.1.5 Study Assay section It has been devised as a way to declare the various types of assays and to determine also the number of Assay file reported. Measurements/Endpoint Name Terms to qualify the endpoint, what is being measured, i.e. are gene expression profiling, protein identification. Measurements/Endpoint Term Accession The accession number from the Source associated to the selected term. Measurements/Endpoint Term Source REF Source REF have to match one the Term Source Names declared in the annotation section. Technology Type Term Accession The accession number from the Source associated to the selected term. Technology Type Term to identify the technology used to perform the measurement, i.e. DNA microarray, mass spectrometry. Technology Type Term Source REF Source REF have to match one the Term Source Names declared in the annotation section. Assay File Name A field to list the Assay file(s), to ensure validation of the ISA archive. For each assay, provide an Assay file formatted according to the specifications detailed in section 5.3. Multiple files may be reported using a semi-colon (;) within cell between file names. 5.1.6 Protocol section Columns are used to report information on protocol: one protocol requires 1 column to be reported Protocol Name The names of the protocols used within the ISA-TAB document. These will be referenced in the Study and Assay files in the "Protocol REF" columns. Used as an identifier within the ISA-TAB document, these can also be accession values. In such case decisions about how to deal with the protocol information are left to data curators and individual implementers. For instance, an importer tool could be designed so that, in case an existing protocol is mentioned by means of a public accession, only those fields which are non-empty in the ISA-TAB are updated in the target repository. 10 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 Protocol Type Term to classify the protocol. Protocol Type Term Accession The accession number from the Source associated to the selected term. Protocol Type Term Source REF Source REF have to match one the Term Source Names declared in the annotation section. Protocol Description A free-text description of the protocol. This text is included in a single tab-delimited field. If you wish to include tab or newline characters as part of this text, you must enclose the whole text within double quotes ("). Protocol Contact If used, the contact should be declared in the Contact section and referenced here. Protocol Parameters A semicolon-delimited list of parameter names; these names are used in the Study and Assay files (as "Parameter Value [<parameter name>]" headings) to list the values used for each protocol parameter. If more than one parameter was used for a given protocol, they should be separated with semicolons (";"). Used as an identifier within the ISA-TAB document. Protocol Parameter Term Accession The accession number from the Source associated to the selected term. Protocol Parameter Term Source REF Source REF have to match one the Term Source Names declared in the annotation section. Protocol Instrument Name The instrument used by the protocol. Protocol Instrument Components Terms to indicate key part of an instrument set up. Protocol Instrument Components Term Accession The accession number from the Source associated to the selected term. Protocol Instrument Components Term Source REF Source REF have to match one the Term Source Names declared in the annotation section. Note: Protocol Instrument Components is a special case as it is thought as capable of holding semicolon (";") separated values to allow reporter of several instrument components. The field should be used in combination with keywords such as detector type, analyzer type, source type which would allow sufficient information to be reported when describing a mass spectrometer (and meet so minimal reporting standards as defined for instances by MIBBI). Refer to the example for illustration of this mechanism Protocol Instrument Parameter To indicate an important parameter attached to the Instrument. Protocol Instrument Parameter Term Accession The accession number from the Source associated to the selected term. Protocol Instrument Parameter Term Source REF Source REF have to match one the Term Source Names declared in annotation section. Protocol Software Name The name of the software used in a protocol. Protocol Software Version The version of the software used in a protocol. 5.1.7 Study Contact section Columns are used to report information on protocol: one protocol requires 1 column to be reported Person Last Name The last name of each person associated with the investigation or study. Person First Name The first name of each person associated with the investigation or study. Person Mid Initials The middle initials of each person associated with the investigation or study. Person Email The email address of each person associated with the investigation or study. Email could be used as internal identifier to reference persons within the TABs. However, if this is effective and practical, there might be privacy issues related to relying on email as person tracker. 11 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 Person Phone The telephone number of each person associated with the investigation or study. Person Fax The Fax number of each person associated with the investigation or study. Person Address The street address of each person associated with the investigation or study. Person Affiliation The organization affiliation for each person associated with the investigation or study. Person Roles Term to classify the role(s) performed by each person. Multiple annotations or values attached to one person may be provided by using a semicolon (";") a separator, for example: "submitter;funder;sponsor”. Person Roles Term Accession The accession number from the Source associated to the selected term. Person Roles Term Source REF Source REF have to match one the Term Source Names declared in the annotation section. [#end of repeatable block] 5.2 Study file In this file the fields are organized on a per-row basis. The Study file contains contextualizing information for one or more assays. Example of few field is shown below: Source Name Characteristics[Material Type] Term Source REF MO Term Accession xxxx Study1.animal1 whole_organism Study1.animal2 whole_organism {Term Name} Rattus norvegicus MO xxxx Study1.animal3 whole_organism {Term Name} Rattus norvegicus MO xxxx {Term Name} Rattus norvegicus {Term Name} Rattus norvegicus Characteristics[Organism] Study1.animal4 whole_organism MO xxxx Study1.animal5 whole_organism MO xxxx {Term Name} Rattus norvegicus {Term Name} Rattus norvegicus Study1.animal6 whole_organism MO xxxx Study1.animal7 whole_organism MO xxxx {Term Name} Rattus norvegicus Study1.animal8 whole_organism MO xxxx {Term Name} Rattus norvegicus {Term Name} Rattus norvegicus Study1.animal9 whole_organism MO xxxx Study1.animal10 whole_organism MO xxxx {Term Name} Rattus norvegicus Study1.animal11 whole_organism MO xxxx {Term Name} Rattus norvegicus MO xxxx {Term Name} Rattus norvegicus Study1.animal12 whole_organism The following fields describe the column headers: Source Name Sources are considered as the starting biological material used in a study. Source items can be qualified using the following headers: Characteristics [], Term Source REF, Unit, Provider, Description, and Comment []. Sample Name Samples represent major outputs resulting from a protocol application but which cannot be treated as an Extract or a Labeled Extract. Sample items can be qualified using the following headers: Characteristics [], Term Accession, Term Source REF, Unit and Comment []. Characteristics [<category term>] Used as a qualifying field following Source Name, Sample Name. This column contains terms describing each material according to the characteristics category indicated in the column header. For example, a column header "Characteristics [OrganismPart]" would contain individual OrganismPart terms. If these terms come from a controlled terminology/ontology, the accession number could be supplied using a Term Accession header. The source would then be indicated using a Term Source REF column. If the characteristic being reported is a measurement, a Unit [] column may be used to qualify further. Protocol REF 12 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 This column contains references to Protocol Names defined in the Investigation File, or accession numbers of protocols already present in public repositories, such ArrayExpress or Pride. Valid qualifying headers for Protocol REF item are Parameter [Value], Performer, Date, and Comment [] (which is optional) Factor Value [<factor name>] Factor Value column is key to provide the actual values taken by independent variables manipulated by the operator. Those study variables also known as Factors should have been declared in the Investigation File and the Factor Name should be recalled between the square brackets of the Factor column header. In order to fully qualify the Factor Values, the following headers can be used to refine annotation and description Unit, Unit Term Source REF in case of quantitative values and Term Accession, Term Source REF in case of qualitative values. 5.3 Assay file In this file, the fields are again organized on a per-row basis and define records. The Assay file represents a set of assays, defined by the endpoint measured or overall purpose of the assay (e.g. metabolite profiling) and the technology employed (i.e. Mass spectrometry). The following subsections provide a list of fields focusing on microarray hybridization, gel electrophoresis, mass spectrometry and high throughput sequencing technologies. However, additional work is required to complete this section and to define the format of the resulting data matrices for the different technologies, like it has been done for the microarray (FGDM in MAGE-TAB), peptide and protein identification (and used in PRIDE Harvest). 5.3.1 Technology Type: DNA microarray hybridization When dealing with DNA microarray technology, the allowed fields essentially correspond to those defined by the MAGE-TAB format. Extract Name Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Extract material. Valid qualifying headers for Extract item are Characteristics[], Material Type, Description, Comment[] Labeled Extract Name Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Labeled Extract material Valid qualifying header for LabeledExtract item is Label (which is mandatory) and Characteristics [], Material Type, Description, Comment [] (which are optional) Material Type Used as an attribute column following Source Name, Sample Name, Extract Name, or Labeled Extract Name. This column contains terms describing the type of each material. Examples: whole_organism, organism_part, cell, total_RNA. Valid qualifying header for Material Type item are Term Accession and Term REF. Label Used as an attribute column following Labeled Extract Name to indicate the compound linked to an Extract to create the Labeled Extract. Examples: Cy3, Cy5, biotin, alexa_546. Valid qualifying headers for Label item are Term REF. Hybridization Name Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Hybridization. Valid qualifying headers for Hybridization Name item are ArrayDesign REF (which is mandatory) and Comment [] (which is optional) Array Design REF This column contains references to the array design used for individual hybridizations. For ArrayExpress submissions this should be a valid accession number, e.g. "A-AFFY-33" but for the purpose of data exchange, it should be an unambiguous name such as a commercial name HG-U133A-2 in the case of an Affymetrix array. Valid qualifying header for Array Design REF item is Comment [] (which is optional). The values in this field are used as identifiers. They must match the references provided in the array description file (ADF). Submitters, curators and software tools should consider the use of public accessions for this value. Scan Name 13 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 Used as an identifier within the ISA-TAB document. This optional column contains user-defined names for each Scan Event Valid qualifying headers for Scan Name item are Description and Comment [] (which are optional) Image File This optional column contains a list of image files, one for each row of the Assay file, linking these image files to their corresponding hybridizations. Note that ArrayExpress does not store image data due to storage constraints on the database. However, in the context of an infrastructure intending to use the format for data exchange purposes, this column is valuable to include links to image files stored on local web server. Valid qualifying headers for Image File item is Comment [] (which is optional) Normalization Name Used as an identifier within the ISA-TAB document. This optional column contains user-defined names for each Normalization event. Valid qualifying headers for Normalization Name item are Description and Comment [] (which are optional) Array Data File (*) This column contains a list of raw data files, one for each row of the Assay file, linking these data files to their respective hybridizations. Valid qualifying header for Array Data File item is Comment [] (which is optional) Derived Array Data File (*) This column contains a list of processed data files, one for each row of the SDRF file, linking these data files to their respective hybridizations. Valid qualifying header for Derived Array Data File item is Comment [] (which is optional) Array Data Matrix File (*) This column contains a list of raw data matrix files, where data from multiple hybridizations is stored in a single file, and the data mapped to individual hybridization via the Data Matrix format itself. Valid qualifying header for Array Data Matrix File item is Comment [] (which is optional) Derived Array Data Matrix File (*) This column contains a list of processed data matrix files, where data from multiple hybridizations is stored in a single file, and the data mapped to each hybridization (or scan, or normalization) via the Data Matrix format itself. Valid qualifying header for Derived Array Data Matrix File item is Comment [] (which is optional) Factor Value [<factor name>] Factor Value column is key to provide the actual values of the independent variables when manipulated by the experimentalists. Those study variables aka as Factors should have been declared in the Reference File and the Factor Name should be recalled between the square brackets of the column header. In order to fully qualify the Factor Values, the following headers can be used to refine annotation and description: Term Source REF, Unit, Unit Term Source REF. The latter two fields enable full description of numerical values. For usability purposes, it could be possible to cascade FactorValues declared in the Study file at the level of the Assay file for facilitate association between datafiles and factor values. Examples: In ISA_TAB-V2_Griffin-PMID17203948 .xls available from http://isatab.sf.net, the Assay-Tx worksheet illustrates a typical usage of the fields to compose a report detailing a series of hybridization event carried out on Affymetrix platform on a series of samples whose origin and treatement are described on the Study worksheet. Discussion: Note that On the Assay-Tx,Factor Values have not been cascaded from the Study worksheet but could have been, in order to allow for sorting to easily identify datafiles corresponding to a particular treatment group 5.3.2 Technology Type: Gel electrophoresis Extract Name Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Extract material. Valid qualifying headers for Extract item are Characteristics[], Material Type, Description, Comment[] Labeled Extract Name (optional, where relevant) Labeled Extract should be reported when Differential Gel Electrophoresis (DIGE) is employed. The field is used as an identifier within the ISA-TAB document. This column contains user-defined names for 14 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 each Labeled Extract. Valid qualifying headers for LabeledExtract item are Label (which is mandatory) and Characteristics [], Material Type, Description, Comment []. Material Type Used as an attribute column following Sample Name, Extract Name, or Labeled Extract Name. This column contains terms describing the type of each material. Examples: whole_organism, organism_part, cell, fraction. Label (optional, where relevant) Used as an attribute column following Labeled Extract Name to indicate the compound linked to an Extract to create the Labeled Extract. Examples: Cy2, Cy3, Cy5. A valid qualifying header for Label item is Comment [] Gel Electrophoresis Name Used as an identifier within the ISA-TAB document. This column contains user-defined names for each electrophoresis gel. Valid qualifying headers for Electrophoresis Gel Name item are Protocol REF, Parameter [] Comment [] (which is optional). For specific 2D applications, the following headers can be used instead: First Dimension This column hold a control term e.g. Isoelectrofocusing from an identified ontological resource If the case, Term accession and Term Source REF should be used Second Dimension This column hold a control term e.g. SDS-PAGE from an identified ontological resource If the case, Term accession and Term Source REF should be used Electrophoresis Type (To be discussed…..) Scan Name Used as an identifier within the ISA-TAB document. This optional column contains user-defined names for each Scan event. Valid qualifying headers for Scan Name item are Protocol REF, Parameter [], Performer, Date and Comment [] (which is optional) Normalization Name Used as an identifier within the ISA-TAB document. This optional column contains user-defined names for each Normalization event. Valid qualifying headers for Normalization Name item are Protocol REF, Parameter [], Performer, Date and Comment [] (which is optional) Image File This optional column contains a list of image files, one for each row of the Assay file, linking these image files to their respective electrophoresis events. Raw Data File (*) This column contains a list of raw data files, one for each row of the Assay file, linking these data files to their respective gels. A valid qualifying header for Raw Data File item is Comment [] (which is optional) Processed Data File (*) This column contains a list of processed data files, one for each row of the Assay file, linking these data files to their respective gel runs. Valid qualifying headers for Processed Data File item is Comment [] (which is optional) Spot Picking File This column contains a file name pointing to files hosting protein spot coordinates and metadata for use by spot picking instruments, typically for downstream analysis by Mass spectrometry. Factor Value [<factor name>] Factor Value column is key to provide the actual values of the independent variables when manipulated by the experimentalists. Those study variables aka as Factors should have been declared in the Investigation File and the Factor Name should be recalled between the square brackets of the column header. In order to fully qualify the Factor Values, the following headers can be used to refine annotation and description: Term Source REF, Unit, Unit Term Source REF. The latter two fields enable full description of numerical values. For usability purposes, it is be possible to cascade FactorValues declared in the Study file at the level of the Assay file for facilitate association between datafiles and Factor Values. Examples: An example available from http://isatab.sourceforge.net/docs/DIGEexample_from_amersham_v4.xml shows a rendering of GelML encoded data in a ISA-TAB Gel electrophoresis Assay tab. 15 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 5.3.3 Technology Type: Mass Spectrometry (MS) Extract Name Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Extract material. Valid qualifying headers for Extract Name node are Characteristics [], Material Type, Description, Comment [] Labeled Extract Name (where relevant) Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Labeled Extract material. Valid qualifying headers for LabeledExtract node are Label (which is mandatory) and Characteristics [], Material Type, Description, Comment [] (which are optional) Material Type Used as an attribute column following Source Name, Sample Name, Extract Name, or Labeled Extract Name. This column contains terms describing the type of each material. Examples: column fraction, gel excised spot. Label Used as an attribute column following Labeled Extract Name to indicate the compound linked to an Extract to create the Labeled Extract. Examples: P33, C14. Valid qualifying headers for Label item are and Term Source REF Mass Spectrometry Run Name Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Assay. Valid qualifying headers are Protocol, Parameter [], Performer, Date and Comment [] (which is optional) Under Discussion: Could be declared in the Protocol Instrument Section The following columns can be used to annotate Assay Name columns: Source A term to classify the type of source. Valid qualifying header are Term Accession, Term Source REF Analyzer A term to classify the type of analyzer. valid qualifying header are Term Accession, Term Source REF. Detector A term to classify the type of detector. Valid qualifying header are Term Accession, Term Source REF Raw Spectral Data File (MzData File for Proteomics applications) This column contains a list of raw data files, one for each row of the Assay file. Processed Spectral Data File This column contains a list of raw data files, one for each row of the Assay file. Normalized Spectral Data File This column contains a list of raw data files, one for each row of the Assay file. Factor Value [<factor name>] Factor Value column is key to provide the actual values of the independent variables when manipulated by the experimentalists. Those study variables aka as Factors should have been declared in the Investigation File and the Factor Name should be recalled between the square brackets of the column header. In order to fully qualify the Factor Values, the following headers can be used to refine annotation and description: Term accession, Term Source REF, Unit, Unit Term Source REF. The latter two fields enable full description of numerical values. For usability purpose, it could be possible to cascade FactorValues declared in the Study file at the level of the Assay file for facilitate association between datafiles and FactorValues. When Mass Spectrometry is used in proteomics the following information will be required. Peptide Identification File (*) This data file should be formatted following the PSI-MS specifications and according to Pride submission requirements (8). Protein Identification File (*) This data file should be formatted following the PSI-MS specifications and according to Pride submission requirements (8). Post Translational Modification File (*) This data file should be formatted following the PSI-MS specifications and according to Pride submission requirements (8). Metabolite Identification File (*) 16 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 This data file should be formatted following the MSI-MS specifications and according to MSI submission requirements (24). When Mass Spectrometry is used in metabol/nomics a list of requirements is under development in collaboration with the Metabolomics Standards Initiative (MSI, 24) representatives. 5.3.4 Technology Type: Nuclear Magnetic Resonance spectroscopy (NMR) Extract Name Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Extract material. Valid qualifying headers for Extract item are Characteristics [], Material Type, Description, Comment [] Material Type Used as an attribute column following Source Name, Sample Name, Extract Name, or Labeled Extract Name. This column contains terms describing the type of each material. Examples: whole_organism, organism_part, cell, protein_extract. Valid qualifying header for Material is Term REF NMR Run Name Used as an identifier within the ISA-TAB document. This column contains user-defined names for each NMR run. The following columns can be used to annotate NMR Run Name columns: Instrument This column contains a reference to instruments declared in the protocol section. Free Induction Decay Data File This column contains a list of raw data files, one for each row of the Assay file. Valid qualifying header for Free Induction Decay Data File (Assay Data File) item is Comment [] (which is optional). Refer to the Annex for a list of data format generated by most commonly used NMR instruments. Acquisition Parameter Data File [NMR pulse sequence] This column contains a list of files detailing acquisition parameters in particular, a file containing the NMR pulse sequence must be provided. Refer to the Annex for a list of data format generated by most commonly used NMR instruments. Processed Spectral Data File This column contains a list of processed spectral data files, one for each row of the Assay file. Valid qualifying header for Processed Spectral Data File item is Comment [] (which is optional) Normalized Data File This column contains a list of raw data files, one for each row of the Assay file. Valid qualifying header for Raw Spectral Data File(Assay Data File) Name item is Comment [] (which is optional) Factor Value [<factor name>] Factor Value column is key to provide the actual values of the independent variables when manipulated by the experimentalists. Those study variables aka as Factors should have been declared in the Reference File and the Factor Name should be recalled between the square brackets of the column header. In order to fully qualify the Factor Values, the following headers can be used to refine annotation and description: Term Source REF, Unit, Unit Term Source REF. The latter two fields enable full description of numerical values. For usability purpose, it could be possible to cascade FactorValues declared in the Study file at the level of the Assay file for facilitate association between datafiles and FactorValues. When NMR Spectroscopy is used in metabol/nomics the requirements for the final list of metabolites (identification and quantification) is under development in collaboration with the MSI representatives. 5.3.5 Technology Type: High throughput sequencing When dealing with DNA microarray technology, the allowed fields essentially correspond to those defined by the MAGE-TAB format. Extract Name Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Extract material. Valid qualifying headers for Extract item are Characteristics[], Material Type, Description, Comment[] Extract material Valid qualifying header for LabeledExtract item is Label (which is mandatory) and Characteristics [], Material Type, Description, Comment [] (which are optional) Material Type 17 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 Used as an attribute column following Source Name, Sample Name, Extract Name, or Labeled Extract Name. This column contains terms describing the type of each material. Examples: whole_organism, organism_part, cell, total_RNA. Valid qualifying header for Label item is Term REF. Sequencing Run Name (Assay Name) Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Hybridization. Valid qualifying headers for Hybridization Name item are ArrayDesign REF (which is mandatory) and Comment [] (which is optional) Raw Data File An archive containing short read data generated by the instruments. It could also be an archive of fasta formatted sequences. Refer to the data file section for a more detailed review of accepted format generated from high throughput sequencing machines. Derived Data File A file containing the interpretation of sequence counts and assembly depending on application (i.e. genomics or transcriptomics). Examples: In http://isatab.sf.net/docs/ISA-TAB-v0.2-MIMS-Gilbert-et-al-2008v3.xlsavailable from http://isatab.sourceforge.net/examples.html the PML-Gilbert-MetaTx-Assay.txt worksheet illustrates a typical usage of the fields to compose a report detailing a series of sequencing event carried out on 454 Sequencer on a series of samples whose origin and treatement are described on the Study worksheet. Discussion: Note that On the Assay-Tx,Factor Values have not been cascaded from the Study worksheet but could have been, in order to allow for sorting to easily identify datafiles corresponding to a particular treatment group When this technology is used in metagenomics applications, the requirements are under development in collaboration with the Genomics Standards Consortium (GSC, 25) representatives. 18 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 6. References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. NET project members: http://www.ebi.ac.uk/net-project ISA-TAB project page and documents: http://isatab.sourceforge.net MAGE-TAB specification: http://www.mged.org/mage-tab BioInvestigation Index project: http://www.ebi.ac.uk/net-project/projects.html#BioInvestigation Index ArrayExpress: www.ebi.ac.uk/arrayexpress Pride: www.ebi.ac.uk/pride/ Reporting Structure for Biological Investigation (RSBI) working group: http://www.mged.org/Workgroups/rsbi/index.html Sansone SA, Rocca-Serra P, Tong W, Fostel J, Morrison N, Jones AR; RSBI Members. A strategy capitalizing on synergies: the Reporting Structure for Biological Investigation (RSBI) working group. OMICS. 2006 Summer;10(2):164-71. Rayner TF, Rocca-Serra P, Spellman PT, Causton HC, Farne A, Holloway E, Irizarry RA, Liu J, Maier DS, Miller M, Petersen K, Quackenbush J, Sherlock G, Stoeckert CJ Jr, White J, Whetzel PL, Wymore F, Parkinson H, Sarkans U, Ball CA, Brazma A. A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB. BMC Bioinformatics. 2006 Nov 6;7:489. Spellman PT, Miller M, Stewart J, Troup C, Sarkans U, Chervitz S, Bernhart D, Sherlock G, Ball C, Lepage M, Swiatek M, Marks WL, Goncalves J, Markel S, Iordan D, Shojatalab M, Pizarro A, White J, Hubley R, Deutsch E, Senger M, Aronow BJ, Robinson A, Bassett D, Stoeckert CJ, Brazma A. Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol. 2002;3:RESEARCH0046. doi: 10.1186/gb-2002-3-9-research0046. Jones AR, Miller M, Aebersold R, Apweiler R, Ball CA, Brazma A, Degreef J, Hardy N, Hermjakob H, Hubbard SJ, Hussey P, Igra M, Jenkins H, Julian RK Jr, Laursen K, Oliver SG, Paton NW, Sansone SA, Sarkans U, Stoeckert CJ Jr, Taylor CF, Whetzel PL, White JA, Spellman P, Pizarro A. The Functional Genomics Experiment model (FuGE): an extensible framework for standards in functional genomics. Nat Biotechnol. 2007 Oct;25(10):1127-1133. FuGE working group: http://fuge.sourceforge.net SEND: http://www.cdisc.org/models/send/v2.3 CDISC: http://www.cdisc.org/standards FDA Data Standard Council: http://www.fda.gov/oc/datacouncil Pharmacogenomic Data Submissions - Companion Guidance http://www.fda.gov/cder/guidance/7735dft.pdf PSI: http://www.psidev.info Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FC, Kim IF, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, Vingron M. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet. 2001 Dec;29(4):365-71. HTML rendering of MAGE-ML documents: http://www.ebi.ac.uk/~rocca/MAGEXSLT/HTML%20rendering%20of%20MAGE.htm MIBBI: http://mibbi.sourceforge.net Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ; OBI Consortium, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Shah N, Whetzel PL, Lewis S. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol. 2007 Nov;25(11):1251-5. OBI: http://obi.sourceforge.net Sansone SA, Rocca-Serra P, Brandizi M, Brazma A, Field D, Fostel J, Garrow AG, Gilbert J, Goodsaid F, Hardy N, Jones P, Lister A, Miller M, Morrison N, Rayner T, Sklyar N, Taylor C, Tong W, Warner G, Wiemann S; Members of the Rsbi Working Group. The First RSBI (ISA-TAB) Workshop: "Can a Simple Format Work for Complex Studies?". OMICS. 2008, ahead of print. Sansone SA, Fan T, Goodacre R, Griffin JL, Hardy NW, Kaddurah-Daouk R, Kristal BS, Lindon J, Mendes P, Morrison N, Nikolau B, Robertson D, Sumner LW, Taylor C, van der Werf M, van Ommen B, Fiehn O. The metabolomics standards initiative. Nat Biotechnol. 2007 Aug;25(8):846-8. GSC: http://gensc.org 19 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 Figure 1 - ISAarchive For submission or transfer, files can be packaged into an ISArchive, as shown in this figure. ISArchive Assay(s) Study 1 Data Investigation Note that in the simple case there is only 1 Study file or 1 Assay file only Study 2..n Assay(s) Data The Investigation file - records all declarative information referenced in the other files such as contacts, protocols and equipment; - relates Assay files to Study files; - relates Study files to an Investigation (only when two or more Study files exist, as in this example). The Assay file contains information about a certain set (or The Study file type) of assays, contains defined by the endpoint contextualizing information for one or measured (i.e., gene more assays; i.e., the expression) and the technology employed subjects studied, (i.e., DNA microarray), their source(s), the including information sampling on protocols, additional methodology, their information in part characteristics, and relating to the any treatments or execution of those manipulations performed to prepare protocols and references to data files the specimens. 20 of 27 All Data files resulting from assays, including raw data files, processed and normalized data files. For DNA gene expression assays employing microarray technology, the set of data files, should include an Array Description File (ADF) and a Final Gene Expression Data Matrix (FGDM), as required by the MAGE-TAB specification. Specification documentation for the ISA-TAB v0.3 - 17 February 2016 Table 2 – Type of value in the Investigation File Type of values allowed per each field in the Investigation File only and comments some implementers may find useful. However, as stated in section 3.4 ISA-TAB files with all fields left empty are syntactically valid, as are those where all fields are filled with free text values rather than controlled vocabulary or ontology terms. Investigation file fields Ontology Source Reference Term Source Name Type of value Term Source File Term Source Version URI text Term Source Description text Comments for implementations text Investigation Investigation Title text Investigation description text Date of Investigation submission Investigation Public Release Date date (DD/MM/YYYY) date (DD/MM/YYYY) Studies Globally unique ID, i.e. LSID could be useful Study Identifier text Study Title text Date of Study Submission Study Public Release Date date (DD/MM/YYYY) date (DD/MM/YYYY) PubMed ID accession number Publication DOI accession number Publication Author list text Publication Title text Publication Status Study Description text text controlled terminology could be useful Study Design text controlled terminology could be useful Study Design Term Accession text Study Design Term Source REF Term Source Name Study File name URI/text Study Factors Factor Name text Factor Type text Factor Type Term Accession text Factor Type Term Source REF term source Name Study Assays text Measurements/Endpoints Name Measurements/Endpoints Term Accession text text Measurements/Endpoints Term Source REF Term source name Technology Type text Technology Type Term Accession text Technology Type Term Source REF Term Source Name Assay File Name Study Protocols URI/text Protocol Name text Protocol Type text Protocol Type Term Accession text Protocol Type Term Source REF Term Source Name Protocol Description Protocol Contact text text Protocol Parameter text Protocol Parameter Type Term Accession text Protocol Parameter Type Term Source REF Term Source Name controlled terminology could be useful controlled terminology could be useful controlled terminology could be useful controlled terminology could be useful 21 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 Protocol Instruments text Protocol Instrument Name Protocol Instrument Component text text Protocol Instrument Components Term Accession Protocol Instrument Components Term Source REF Protocol Instrument Parameter text Protocol Software text Protocol Software Name text Protocol Software Version Processing Method Parameter text text controlled terminology could be useful Term Source Name text Study Contacts Person Last Name text Person First Name text Person Mid Initial text Person Email Person Phone text text or numeric Person Fax text Person Address text Person Affiliation text Person Role text Person Role Term Accession Person Role Term Source REF text Term Source Name controlled terminology could be useful 22 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 Table 3 – Number of value in the Investigation File Number of value allowed per each field in the Investigation File only, although, as stated in section 3.4, ISA-TAB files with all fields left empty are syntactically valid. Number of value(s) per column To report multiple values 1 1 1 1 As many column as necessary As many column as necessary As many column as necessary As many column as necessary 1 1 1 1 n/a n/a n/a n/a [# start of repeatable block] Only 1 set within a block Only 1 set within a block Only 1 set within a block Only 1 set within a block Only 1 set within a block Only 1 set within a block Only 1 set within a block Only 1 set within a block Only 1 set within a block Only 1 set within a block Only 1 set within a block Ontology Source Reference Term Source Name Term Source File Term Source Version Term Source Description Investigation Investigation Title Investigation description Date of Investigation submission Investigation Public Release Date Studies Study Identifier Study Title Date of Study Submission Study Public Release Date PubMed ID Publication DOI Publication Author list Publication Title Publication Status Study Description Study Design Study Desgin Term Accession Study Design Term Source REF Study File name Study Factors Factor Name Factor Type Factor Type Term Accession Factor Type Term Source REF Study Assays Measurements/Endpoints Measurements/Endpoints Term Accession Measurements/Endpoints Term Source REF Technology Type Technology Type Term Accession Technology Type Term Source REF Assay File Name Study Protocols Protocol Name Protocol Type Protocol Type Term Accession Protocol Type Term Source REF Protocol Description Protocol Contact Protocol Parameter Protocol Parameter Type Term Accession 1 1 1 1 1 1 1 1 1 1 1..n semi-colon (;) separated 1 1 1 Only 1 set within a block 1 1 1 1 As many column as necessary As many column as necessary 1 As many column as necessary As many column as necessary As many column as necessary 1 1 As many column as necessary 1 1 As many column as necessary As many column as necessary 1 1 As many column as necessary As many column as necessary 1 1 1 1..n semi-colon (;) separated As many column as necessary As many column as necessary As many column as necessary 1 23 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 Protocol Parameter Type Term Source REF Protocol Instrument Name Protocol Instrument Components Protocol Instrument Components Term Accession Protocol Instrument Components Term Source REF Protocol Instrument Parameter Protocol Software Protocol Software Name Protocol Software Version Processing Method Parameter Study Contacts Person Last Name Person First Name Person Mid Initial Person Email Person Phone Person Fax Person Address Person Affiliation Person Role Person Role Term Accession Person Role Source REF As many column as necessary 1 1 1..n semi-colon (;) separated with keywords As many column as necessary 1..n semi-colon (;) separated with keywords 1..n semi-colon (;) separated with keywords 1..n semi-colon (;) separated 1 1 1 1..n semi-colon (;) separated 1 1 1 1 1 1 1 1 1..n semi-colon (;) separated 1 1 24 of 27 As many column as necessary As many column as necessary As many column as necessary As many column as necessary As many column as necessary As many column as necessary As many column as necessary As many column as necessary As many column as necessary As many column as necessary As many column as necessary As many column as necessary [# end of repeatable block] Specification documentation for the ISA-TAB v0.3 - 17 February 2016 Table 4- Node headers in the Study and Assay files List of all valid fields for describing nameable/ identifiable objects = nodes which can be used to produce an ISATAB archive. The table shows the nodes, their possible attributes, the tab file in which the node should be used, number of allowed values, data type and if they require declaration of parent node (which could be inferred if absent). Node Name Source Name Sample Name Extract Name Labeled Extract Name Assay Name TAB location Node possible attributes Characteristics, Provider, Material Type, Description, Comment Characteristics, Material Type, Protocol REF, Description, Comment Characteristics, Material Type, Protocol REF,Description, Comment Characteristics, Material Type, Protocol REF, Description, Label, Comment Protocol REF, Raw Data File, Processed Data File Accept multiple values per field Data type Parent node dependency Study NO string Study NO string Source Name Assay NO string Sample Name Assay NO string Extract Name Assay NO string Hybridization Name Protocol REF, Array Data File, Derived Array Data File, Array Design, Comment Assay NO string Labeled Extract Name NMR Run Name Protocol REF, Free Induction Decay Data File, Acquisition Parameter Data File [NMR pulse sequence], Processed Spectral Data File Assay NO string Sample or Extract or Labeled Extract Gel Electrophores is Name Protocol REF, Raw Data File, Processed Data File, Spot Picking File Assay NO string Sample or Extract or Labeled Extract Mass Spectrometry Run Name Protocol REF, Analyzer Type, DetectorRaw Spectral Data File, Processed Spectral Data File, Protein identification File, Peptide Identification File, Metabolite Identification File Assay NO string Sample or Extract or Labeled Extract Sequencing Run Name Protocol REF, Raw Data File, Processed Data File Assay NO string Labeled Extract Scan Name Protocol REF, Image File, Array DataFile, Derived Array Data File, Comment Assay NO string Hybridization Name or Gel Name Normalization Name Protocol REF, Derived Array Data File, Comment Assay NO string Scan Name or NMR Run Name or Mass Spectrometry Run Name 25 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 Table 5 – Attribute headers in the Study and Assay files Attribute fields which can be used to qualify nodes and provide annotation to entities. The table shows also the tab file in which the node should be used, number of allowed values, data type and if they require declaration of parent node. Attributes Attribute qualifiers TAB location Accept multiple values Data type Dependency Label Term Source REF, Term Accession Assay NO string LabeledExtract Material Type Term Source REF, Term Accession Study, Assay NO string Source,Sample,Extract,Labele dExtract Characteristics [ ] Unit, Term Source REF, Term Accession Study, Assay NO string or number Source,Sample,Extract,Labele dExtract Factor Value [ ] ( ) Unit, Term Source REF, Term Accession Study, Assay NO string or number Parameter Value [ ] Unit, Comment Study, Assay NO string or number Protocol REF Unit [ ] Term Source REF, Term Accession Study, Assay NO string Characteristics[ ], FactorValue[ ], ParameterValue[ ] Term Source REF Study, Assay NO string Characteristics[ ], FactorValue[ ], ParameterValue[ ], Unit[ ] Term Accession Study, Assay NO string Characteristics[ ], FactorValue[ ], ParameterValue[ ], Unit[ ] Study, Assay NO string Source,Sample,Extract,Labele dExtract, Assay and Subtypes Assay NO string Mass Spectrometry Run Assay NO string Mass Spectrometry Run Protocol REF Analyzer Type Detector Parameter Value [ ], Performer, Date, Comment Term Source REF, Term Accession Term Source REF, Term Accession Array Design File / REF Comment Assay NO string Hybridization Image File Comment Assay NO string Hybridization, Gel Electrophoresis Raw Data File Comment Assay NO string Assay Raw Spectral Data File Comment Assay NO string NMR Run, Mass Spectrometry Run Free Induction Decay Data File Comment Assay NO string NMR Run Comment Assay NO string NMR Run, Mass Spectrometry Run Comment Assay NO string Hybridization Comment Assay NO string Hybridization Comment Assay NO string Hybridization Comment Assay NO string Hybridization Comment Assay NO string Sequencing Run Acquisition Parameter Data File Array Data File Derived Array Data File Array Data Matrix File Derived Array Data Matrix File Processed Data 26 of 27 Specification documentation for the ISA-TAB v0.3 - 17 February 2016 File Processed Spectral Data File Comment Assay NO string NMR Run, Mass Spectrometry Run Comment Assay NO string Gel Electrophoresis Comment Assay NO string Mass Spectrometry Run Comment Assay NO string Mass Spectrometry Run Metabolite Identification File Comment Assay NO string NMR Run, Mass Spectrometry Run Performer Comment Study, Assay NO string Protocol REF Provider Comment Study, Assay NO string Source Date Study, Assay NO date Protocol REF Description Study, Assay NO string Material Comment [ ] Study, Assay NO string ALL Spot Picking File Protein Identification File Peptide Identification File 27 of 27