Data submission to AE
ArrayExpress & Atlas
www.ebi.ac.uk/microarray/submissions.html

MAGE-TAB Example

MAGE-TAB Submission
• Ontology Link
• Submitter is directed to either submit an experiment or download a template to the desktop

Submission of HTS gene expression data
• Submit via the MAGE-TAB submission route
• Submit:
  • MAGE-TAB spreadsheet containing details of the samples and protocols used
  • Trace data files for each sample (in SRF, FASTQ or SFF format)
  • Processed data files
• For non-human species we will supply your SRF or FASTQ files to the European Nucleotide Archive (ENA).
• If you have human identifiable sequencing data you need to submit to the European Genome-phenome Archive and not ArrayExpress. They will supply you with a suitable template for submission and store human identifiable data securely.

Types of data that can be submitted

MAGE-TAB Example: IDF
• Value indicating which sequencing instrument was used (e.g. 454 GS, Illumina Genome Analyzer, AB SOLiD System).

MAGE-TAB Example: SDRF

Source Name  Material Type   Term Source REF  Characteristics[Organism]  Characteristics[Sex]  Characteristics[StrainOrLine]  Protocol REF
finch 1      whole_organism  MGED Ontology    Geospiza fortis            male                  Pinta                          EXTRACTION
finch 2      whole_organism  MGED Ontology    Geospiza fortis            male                  Pinta                          EXTRACTION
finch 3      whole_organism  MGED Ontology    Geospiza fortis            male                  Marchesa                       EXTRACTION
finch 4      whole_organism  MGED Ontology    Geospiza fortis            male                  Marchesa                       EXTRACTION
finch 5      whole_organism  MGED Ontology    Geospiza fortis            male                  Santiago                       EXTRACTION
finch 6      whole_organism  MGED Ontology    Geospiza fortis            male                  Santiago                       EXTRACTION
finch 7      whole_organism  MGED Ontology    Geospiza fortis            male                  Floreana                       EXTRACTION
finch 8      whole_organism  MGED Ontology    Geospiza fortis            male                  Floreana                       EXTRACTION
finch 9      whole_organism  MGED Ontology    Geospiza fortis            male                  Pinzon                         EXTRACTION
finch 10     whole_organism  MGED Ontology    Geospiza fortis            male                  Pinzon                         EXTRACTION

MAGE-TAB Example: SDRF (continued)

Protocol REF     Performer  Assay Name  Technology Type             COMMENT[FLOW_SEQUENCE]  COMMENT[FLOW_COUNT]  Array Data File
MY SEQ PROTOCOL  MWG        pinta 1     high_throughput_sequencing  TACG                    800                  run1.fastq
MY SEQ PROTOCOL  MWG        pinta 2     high_throughput_sequencing  TACG                    800                  run2.fastq
MY SEQ PROTOCOL  MWG        marchesa 1  high_throughput_sequencing  TACG                    800                  run3.fastq
MY SEQ PROTOCOL  MWG        marchesa 2  high_throughput_sequencing  TACG                    800                  run4.fastq
MY SEQ PROTOCOL  MWG        santiago 1  high_throughput_sequencing  TACG                    800                  run5.fastq
MY SEQ PROTOCOL  MWG        santiago 2  high_throughput_sequencing  TACG                    800                  run6.fastq
MY SEQ PROTOCOL  MWG        floreana 1  high_throughput_sequencing  TACG                    800                  run7.fastq
MY SEQ PROTOCOL  MWG        floreana 2  high_throughput_sequencing  TACG                    800                  run8.fastq
MY SEQ PROTOCOL  MWG        pinzon 1    high_throughput_sequencing  TACG                    800                  run9.fastq
MY SEQ PROTOCOL  MWG        pinzon 2    high_throughput_sequencing  TACG                    800                  run10.fastq

What needs to be included in the Spreadsheet?
• Include Assay Name and Technology Type columns.
• Raw files must go in the Array Data File column.
• A sequencing protocol must be provided.
  • The sequencing protocol should have a performer; this is used as the run center name.
  • This protocol must have a Protocol Hardware value saying which sequencing instrument was used (e.g. 454 GS, Illumina Genome Analyzer, AB SOLiD System).
  • Reference this in the Protocol REF column before the Assay Name column.
• These 4 extra Comment[] columns should be added after Extract Name to provide information about how the library was prepared:
  • Comment[LIBRARY_LAYOUT]: either SINGLE or PAIRED
  • Comment[LIBRARY_SOURCE]: one of GENOMIC, NON GENOMIC, SYNTHETIC, VIRAL RNA, OTHER
  • Comment[LIBRARY_STRATEGY]: one of WGS, WCS, CLONE, POOLCLONE, AMPLICON, BARCODE, CLONEEND, FINISHING, ChIP-Seq, MNase-Seq, EST, FL-cDNA, CTS, OTHER
  • Comment[LIBRARY_SELECTION]: one of RANDOM, PCR, RANDOM PCR, RT-PCR, HMPR, MF, CF-S, CF-M, CF-H, CF-T, MSLL, cDNA, ChIP, MNase, other, unspecified

Platform Specific Attributes
Include the following attributes as Comment[] columns after Assay Name:
• For LS454:
  • KEY_SEQUENCE (string: the first bases that are expected to be produced by the challenge bases)
  • FLOW_SEQUENCE (string, e.g. TACG)
  • FLOW_COUNT (integer)
• For Illumina:
  • SEQUENCE_LENGTH (integer: the fixed number of bases expected in each raw sequence, including both mate pairs and any technical reads)
• For Helicos:
  • FLOW_SEQUENCE
  • FLOW_COUNT
• For ABI SOLID:
  • SEQUENCE_LENGTH (integer: the fixed number of bases expected in each raw sequence, including both mate pairs and any technical reads)

MAGE-TAB Submission
• Indicate submission type
• Ontology Link
• Submitter is directed to either submit an experiment or download a template to the desktop

What happens after submission?
[Flow diagram. Labels: MAGE-TAB spreadsheet, raw and processed data files; Curation; MTAB to SRA conversion script; Submit SRA XML, raw data; ENA; MAGE-TAB, processed data; ArrayExpress Archive; Linked by accessions]

• Email confirmation
• Curation
  • The curation team will review your submission and will email you with any questions.
  • Possible reopening for editing
• We will send you an accession number when all the required information has been provided.
• We will load your experiment into ArrayExpress and provide you with a reviewer login for viewing the data before it is made public.
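
Checking a spreadsheet before submission (illustrative sketch)
The spreadsheet requirements listed under "What needs to be included in the Spreadsheet?" and "Platform Specific Attributes" can be checked locally before submitting. The Python sketch below is not an ArrayExpress tool: the function name `check_sdrf_row`, the platform keys, and the values in `example_row` (including the KEY_SEQUENCE and library values) are assumptions made for illustration. The controlled vocabularies are copied from the slides, and each SDRF row is assumed to be a plain dictionary keyed by column header, spelled `Comment[...]`.

```python
# Controlled vocabularies for the four library columns, copied from the slides.
LIBRARY_VOCAB = {
    "Comment[LIBRARY_LAYOUT]": {"SINGLE", "PAIRED"},
    "Comment[LIBRARY_SOURCE]": {
        "GENOMIC", "NON GENOMIC", "SYNTHETIC", "VIRAL RNA", "OTHER",
    },
    "Comment[LIBRARY_STRATEGY]": {
        "WGS", "WCS", "CLONE", "POOLCLONE", "AMPLICON", "BARCODE", "CLONEEND",
        "FINISHING", "ChIP-Seq", "MNase-Seq", "EST", "FL-cDNA", "CTS", "OTHER",
    },
    "Comment[LIBRARY_SELECTION]": {
        "RANDOM", "PCR", "RANDOM PCR", "RT-PCR", "HMPR", "MF", "CF-S", "CF-M",
        "CF-H", "CF-T", "MSLL", "cDNA", "ChIP", "MNase", "other", "unspecified",
    },
}

# Platform-specific attributes from the slides (the dictionary keys are
# labels chosen for this sketch, not official identifiers).
PLATFORM_ATTRIBUTES = {
    "LS454": ["KEY_SEQUENCE", "FLOW_SEQUENCE", "FLOW_COUNT"],
    "Illumina": ["SEQUENCE_LENGTH"],
    "Helicos": ["FLOW_SEQUENCE", "FLOW_COUNT"],
    "ABI SOLID": ["SEQUENCE_LENGTH"],
}

# Attributes the slides describe as integers.
INTEGER_ATTRIBUTES = {"FLOW_COUNT", "SEQUENCE_LENGTH"}

REQUIRED_COLUMNS = ("Assay Name", "Technology Type", "Array Data File")


def check_sdrf_row(row, platform):
    """Return a list of problems found in one SDRF row (empty if none)."""
    problems = []
    # Columns every HTS row needs according to the slides.
    for column in REQUIRED_COLUMNS:
        if not row.get(column):
            problems.append(f"missing {column}")
    # The four library columns must be present and use a listed value.
    for column, allowed in LIBRARY_VOCAB.items():
        value = row.get(column)
        if not value:
            problems.append(f"missing {column}")
        elif value not in allowed:
            problems.append(f"{column}: unexpected value {value!r}")
    # Platform-specific Comment[] columns, with an integer check where stated.
    for attribute in PLATFORM_ATTRIBUTES[platform]:
        column = f"Comment[{attribute}]"
        value = row.get(column)
        if not value:
            problems.append(f"missing {column}")
        elif attribute in INTEGER_ATTRIBUTES and not value.isdigit():
            problems.append(f"{column}: expected an integer, got {value!r}")
    return problems


# Hypothetical row based on the finch example; the library values and
# KEY_SEQUENCE are invented here because the slides do not show them.
example_row = {
    "Assay Name": "pinta 1",
    "Technology Type": "high_throughput_sequencing",
    "Array Data File": "run1.fastq",
    "Comment[LIBRARY_LAYOUT]": "SINGLE",
    "Comment[LIBRARY_SOURCE]": "NON GENOMIC",
    "Comment[LIBRARY_STRATEGY]": "EST",
    "Comment[LIBRARY_SELECTION]": "cDNA",
    "Comment[KEY_SEQUENCE]": "TCAG",
    "Comment[FLOW_SEQUENCE]": "TACG",
    "Comment[FLOW_COUNT]": "800",
}

print(check_sdrf_row(example_row, "LS454"))
```

Rows in this shape could come from reading the tab-delimited SDRF with `csv.DictReader(handle, delimiter="\t")`. The check only covers the rules stated on the slides; curation may raise further questions.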