includes MAGE-TAB tutorial - Beta Cell Biology Consortium

Timeline and procedures for
datasets
BCBC Pre-retreat Workshop
Tyson’s Corner, VA
May 11, 2011
Topics to cover
• Timeline for a dataset from contact to web
site
• Policies to follow and documents to use
• Ten questions about your dataset
• Creating a MAGE-TAB document with us
• Seeing your dataset on the Beta Cell web
site
• A tool you can use for MAGE-TAB:
Annotare
Datasets to contact us about
• Your deliverables
– Microarray experiments
– High Throughput sequencing experiments
(RNA-seq, ChIP-seq, FAIRE-seq, etc.)
– RT-PCR screens
– Other deliverables – we can discuss how to
integrate
• Other key datasets
– From your lab but from different funding
– From the literature
Steps to get a study into Beta Cell
• Contact us. Let us know what is coming and when so we can
schedule working with you.
• Fill out the Ten Questions. When we get this from you, we can
generate an initial spreadsheet (MAGE-TAB) for you to
complete.
• Fill out highlighted areas of the MAGE-TAB. We will go back
and forth with you on details to get it right.
• Send us your data. We will set up an FTP account for you.
Send us the raw data (e.g., Affymetrix CEL files, FASTQ
sequence reads) and the processed data that the conclusions
are based upon.
• Set a release schedule. We will load the dataset and
incorporate it into queries and web pages as appropriate. We
need to agree on when to release it to the BCBC and to the
general public.
– We can also submit your data to ArrayExpress or, if desired,
GEO.
• View/Query your dataset. Beta Cell has releases every 3 to 4
months.
Timeline
• Completion of MAGE-TAB:
– Requires back and forth between the CC and the contact person in the
investigator’s lab
– Time to completion depends on how quickly that contact person
responds
– Until the MAGE-TAB is completed, data loading cannot occur
• Data loading:
– Once the MAGE-TAB is completed and all necessary files have been
delivered, time to load the data depends on the size of your study
– For a typical study, data loading takes a few weeks
– Missing files will delay the process
• Keep in mind that when you contact us to submit a study, you will be
put in a queue and the process of getting your study into Beta Cell
Genomics will start once you reach the top of the queue
• Studies that are meant to be viewable on the BCBC website (either
by the general public or by BCBC investigators only) have priority
over private studies, i.e. a study which is to be kept private will be
placed lower in the queue
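Since missing files delay loading, it can help to check a delivery against the SDRF part of the MAGE-TAB (the spreadsheet that links samples to data files) before uploading. The sketch below is only an illustration, not the procedure used by the Beta Cell team: the function name `missing_data_files`, the example file names, and the choice of the `Array Data File` and `Derived Array Data File` columns are assumptions.

```python
import csv
import io

def missing_data_files(sdrf_text, delivered,
                       file_columns=("Array Data File", "Derived Array Data File")):
    """Return the files named in the SDRF that are not in `delivered`.

    `file_columns` lists the SDRF columns assumed to hold file names;
    adjust it to whatever columns your own SDRF actually uses.
    """
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(reader)
    # Use column positions, because an SDRF may repeat column names.
    positions = [i for i, name in enumerate(header) if name.strip() in file_columns]
    referenced = set()
    for row in reader:
        for i in positions:
            if i < len(row) and row[i].strip():
                referenced.add(row[i].strip())
    return sorted(referenced - set(delivered))

# Hypothetical two-sample SDRF fragment (file names are made up).
EXAMPLE_SDRF = (
    "Source Name\tArray Data File\tDerived Array Data File\n"
    "islet_1\tislet_1.CEL\tnormalized.txt\n"
    "islet_2\tislet_2.CEL\tnormalized.txt\n"
)
delivered_files = {"islet_1.CEL", "normalized.txt"}
missing = missing_data_files(EXAMPLE_SDRF, delivered_files)
print(missing)
```

Here the second raw file has not been delivered, so it is the only name reported.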
Policies to follow and documents to use
• Ten Questions about your dataset
– Available as a BCBC miscellaneous resource
– http://www.betacell.org/resources/data/miscellaneous/
• Bioinformatics/Epigenomics Working
group
– RNA-seq and ChIP-seq recommendations
• Includes checklists for data and information to
provide
– Mike Snyder will give an overview and
discuss them
Meeting Deliverables
• For a study to be considered fully
“delivered”, the following is required on the
investigator’s part:
– Provide answers to the initial 10 questions
and all necessary data files
– Respond to all inquiries needed to generate
an accurate MAGE-TAB
– Allow your study to be visible (at least to
other BCBC investigators) on the Beta Cell
website
MGED Standards
• What information is needed for a microarray
experiment?
– MIAME: Minimum Information About a Microarray
Experiment. Brazma et al., Nature Genetics 2001
• How do you “code up” microarray data?
– MAGE-OM: MicroArray Gene Expression Object Model.
Spellman et al., Genome Biology 2002
– MAGE-TAB: MicroArray Gene Expression Tabular format.
Rayner et al., BMC Bioinformatics 2006
• What words do you use to describe a microarray
experiment?
– MO: MGED Ontology. Whetzel et al., Bioinformatics 2006
MIAME in a nutshell (à la Alvis Brazma)
[Figure: MIAME components. Sample, RNA extract, labelled nucleic acid, hybridisation, array (each with a Protocol); Experiment; Array design; genes; Microarray; Gene expression data matrix; normalization; integration.]
Stoeckert et al., Drug Discovery Today: TARGETS 2004
Sequencing is replacing array technology
[Figure: the MIAME diagram again (Sample, RNA extract, labelled nucleic acid, Protocol, genes, Experiment, Array design), with raw sequencing reads in FASTQ format overlaid on the array steps:]
@HWI-EAS266_0011:8:1:6:969#0/1
GTTTGCCNGTGTGTACGCTACCCCCTTCTTGTGTGTGTGTGTCT
+HWI-EAS266_0011:8:1:6:969#0/1
_abb`a[DZ`aabaa_a`b]___^^aa_`aa_a^a[\\aZTZVY
@HWI-EAS266_0011:8:1:7:1688#0/1
AAGATGANGGCAGGGTGCAAGATGGCAGGATGCAAGATGGCAGG
+HWI-EAS266_0011:8:1:7:1688#0/1
a`^ab`^D\a]a`b``b_bbbaabb^abaa``^a_^_aa\]_VR
@HWI-EAS266_0011:8:1:7:593#0/1
CAGTTCANTTCTCAGCACCACACTGGGATGCTCACACATGCCTG
+HWI-EAS266_0011:8:1:7:593#0/1
abbbb_VD[bbbba_`bbbbbbbbbbbaa_`bbaabaabb_aa_
@HWI-EAS266_0011:8:1:7:139#0/1
CATGGGGNATAATTGCAATCCCCGATCCCCATCACGAATGGGGT
+HWI-EAS266_0011:8:1:7:139#0/1
aab`[^YDY]Z\baa`aabaaaa`aa`a]aa```\aY]^\]ZVX
@HWI-EAS266_0011:8:1:7:1390#0/1
GAATAATNGAATAGGACCGCGGTTCTATTTTGTTGGTTTTCGGA
+HWI-EAS266_0011:8:1:7:1390#0/1
_U^b_`]D\__a_a`S```Y[a__]a\aa_`]`aTVZ__\HYVX
@HWI-EAS266_0011:8:1:7:1663#0/1
TGATGTTNGTGGCAATAATGGGGGTAGCGGCAATGGTGGCGGGG
+HWI-EAS266_0011:8:1:7:1663#0/1
a`[_X]\DQTZ[^YYa[[aXV[PZUUYSYBBBBBBBBBBBBBBB
[Figure, continued: hybridisation, array, Microarray, Gene expression data matrix, normalization, integration.]
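The reads on this slide are FASTQ records: four lines each, giving an identifier, the sequence, a separator, and one quality character per base. A minimal sketch of reading such records follows. The function names are illustrative, and the quality offset of 64 is an assumption based on the look of these older Illumina reads; many current files use an offset of 33 instead.

```python
def parse_fastq(text):
    """Split FASTQ text into records of identifier, sequence and quality."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain four lines per record")
    records = []
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record at line %d" % (i + 1))
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        records.append({"id": header[1:], "sequence": sequence, "quality": quality})
    return records

def phred_scores(quality, offset=64):
    """Convert quality characters to integer scores (offset is an assumption)."""
    return [ord(ch) - offset for ch in quality]

# One of the records shown on the slide.
EXAMPLE_FASTQ = """@HWI-EAS266_0011:8:1:7:593#0/1
CAGTTCANTTCTCAGCACCACACTGGGATGCTCACACATGCCTG
+HWI-EAS266_0011:8:1:7:593#0/1
abbbb_VD[bbbba_`bbbbbbbbbbbaa_`bbaabaabb_aa_
"""

records = parse_fastq(EXAMPLE_FASTQ)
scores = phred_scores(records[0]["quality"])
print(records[0]["id"], len(records[0]["sequence"]), min(scores))
```

Under that offset, the uncalled base `N` in the read lines up with the lowest quality score in the record.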
Sequencing is replacing array technology (continued)
[Figure: same diagram and FASTQ reads as the previous slide; the extract box now reads "Chromatin, RNA extract, DNA extract", and the outputs listed after Microarray are ChIP-Seq, MeDIP-Seq, etc., followed by normalization and integration.]
From MGED to FGED
• What information is needed for an HTS
experiment?
– MINSEQE: Minimum Information about a high-throughput SeQuencing Experiment
• How do you “code up” functional genomics data?
– MAGE-TAB can still be utilized
• What words do you use to describe a functional
genomics experiment?
– OBI: Ontology for Biomedical Investigations, incorporates
MO
MAGE-TAB Format
What is MAGE-TAB?
• A simple spreadsheet view consisting of 2 files:
– IDF: describing the experiment design, contact details, variables, and
protocols
– SDRF: a spreadsheet with columns that describe samples, annotations,
protocol references, assays, and data
– Linked data files (e.g. CEL files) are referenced by the SDRF
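To make the two-file layout concrete, here is a minimal sketch that reads a tiny IDF and SDRF, both tab-delimited. The field and column names used (`Investigation Title`, `Protocol Name`, `SDRF File`, `Source Name`, `Array Data File`, and so on) are standard MAGE-TAB headings, but the values, file names, and helper functions are made up for illustration and are not taken from any real submission.

```python
import csv
import io

# Hypothetical, heavily trimmed IDF: one tag per row, values in later columns.
IDF_TEXT = (
    "Investigation Title\tExample islet expression study\n"
    "Person Last Name\tDoe\n"
    "Protocol Name\tP-EXTRACT\tP-HYB\n"
    "SDRF File\texample.sdrf.txt\n"
)

# Hypothetical SDRF: one row per sample-to-data path.
SDRF_TEXT = (
    "Source Name\tCharacteristics[OrganismPart]\tProtocol REF\t"
    "Hybridization Name\tArray Data File\n"
    "mouse1\tpancreas\tP-HYB\thyb1\tmouse1.CEL\n"
    "mouse2\tpancreas\tP-HYB\thyb2\tmouse2.CEL\n"
)

def parse_idf(text):
    """Map each IDF tag to its list of non-empty values."""
    idf = {}
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if fields and fields[0].strip():
            idf[fields[0].strip()] = [value for value in fields[1:] if value]
    return idf

def parse_sdrf(text):
    """Read SDRF rows as dictionaries keyed by column heading.

    Note: a real SDRF may repeat headings such as "Protocol REF"; a
    dictionary keeps only the last one, so this is a simplification.
    """
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

idf = parse_idf(IDF_TEXT)
sdrf_rows = parse_sdrf(SDRF_TEXT)
data_files = [row["Array Data File"] for row in sdrf_rows]
print(idf["SDRF File"], data_files)
```

The IDF points to the SDRF by name, and each SDRF row in turn names the data file produced for that sample.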
Where can I get MAGE-TAB from?
• ~10,000 MAGE-TAB files are available from ArrayExpress (includes
GEO derived and ArrayExpress data)
• caArray also provides MAGE-TAB files for download
Who is using MAGE-TAB?
• BioConductor
• GenePattern
• MeV
• and Beta Cell Genomics!
IDF file for E-TABM-34
IDF = Investigation Description Format
SDRF file for E-TABM-34
SDRF = Sample and Data Relationship Format
A microarray expression study
IDF
Experimental Design
Following 1 sample: bench component
[Figure legend: OrganismPart; black border = biomaterials, red border = treatments]
In-silico component
• image acquisition
• feature extraction
• summarization (feature extraction II) and quantile normalization
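Quantile normalization, the last in-silico step named above, forces every sample to share the same distribution of values. A minimal sketch of the basic idea follows. It is not the software used for this study, the intensities are invented, and ties are not averaged as fuller implementations usually do.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a matrix given as rows (probes) x columns (samples)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_columns = [sorted(col) for col in columns]
    rank_means = [sum(sc[k] for sc in sorted_columns) / n_cols for k in range(n_rows)]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        # Row indices of this sample ordered from smallest to largest value.
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = rank_means[rank]
    return normalized

# Invented intensities: four probes (rows) measured in three samples (columns).
intensities = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 6.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(intensities)
for row in normalized:
    print(row)
```

After normalization each column contains the same set of values, and only their assignment to probes differs between samples.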
SDRF
Let’s focus on the highlighted row
From design to MAGE-TAB
Viewing the Annotation
Querying the Annotation
Loading and Analyzing the Data
• Image and .CEL files are archived and
their location stored in the database
• Raw and processed data loaded into the
database
• Downstream analyses (e.g. differential
expression) are performed, generating
gene lists
• Analysis results loaded into the database
Querying the Data
A ChIP-Seq study
IDF
Experimental Design
Bench Component
In-silico Component
[Figure: data flow from each sample to its sequence file and alignment file, ending in two peak sets]
• Ptf1a_s5: Ptf1a_s5_seq.txt, s5_eland.txt
• Ptf1a_s4: Ptf1a_s4_seq.txt, s4_eland.txt
• Input_s8: Input_s8_seq.txt, s8_eland.txt
• Rbpjl_s6: Rbpjl_s6_seq.txt, s6_eland.txt
• Input_s2: Input_s2_seq.txt, s2_eland.txt
• Rbpjl_s4: Rbpjl_s4_seq.txt, s4_eland.txt
• Peak sets: Ptf1a_peaks, Rbpjl_peaks
• Steps: cluster generation, image acquisition, sequencing, alignment, peak calling
SDRF
Viewing the Annotation
Querying the Annotation
Viewing the Data
Querying the Data
Annotare - An open source
standalone MAGE-TAB editor
Shankar R, Parkinson H, Burdett T, Hastings E, Liu J, Miller M,
Srinivasa R, White J, Brazma A, Sherlock G, Stoeckert CJ Jr,
Ball CA.
Annotare - a tool for annotating high-throughput biomedical
investigations and resulting data.
Bioinformatics. 2010 Aug 23.
Annotare - an open source MAGE-TAB Editor
Annotare is an annotation tool for high throughput gene
expression experiments in MAGE-TAB format.
Researchers can describe their investigations with the
investigators’ contact details, experimental design,
protocols that were employed, references to
publications, details of biological samples, arrays, and
experimental data produced in the investigation.
Annotare Features
• Intuitive graphical user interface forms
for editing
• Ontology support: a built-in ontology
and web service connectivity to
BioPortal
• Searchable standard templates
• Design wizard
• Validation module
• Mac and Windows Support
http://code.google.com/p/annotare/
Annotare Demo
• File Gallery: Three different ways to get
started
• Looking at an existing MAGE-TAB
– Form versus sheet view
• Using a template
• Using the wizard