bchm628_lect2_14

Working with gene lists: Finding data using GEO & BioMart
June 5, 2014
Analyzing a gene list
- With hundreds of genes but a limited budget and few lab personnel, you need to narrow the gene list down to candidate genes for follow-up
- Pick ones that are “interesting”:
  - Known to be involved in other related processes but not (yet) in your process of interest
  - Has protein features that suggest a function in your process, but has not been characterized
  - No known function or domain, but shows up in other, related high-throughput experiments, suggesting a key role in your process of interest
Our approach
Analyzing gene lists by:
1. Finding overlap with other high-throughput experiments
2. Finding additional information using BioMart
   1. Mouse/human homologs
   2. Protein domain content
   3. GO classification
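
The overlap step in point 1 can be sketched in a few lines of Python. This is only a minimal illustration, assuming both lists already use the same identifier type; the function name `overlap` and the gene names other than C04F5.7 are hypothetical placeholders, not data from the lecture.

```python
def overlap(my_genes, other_genes):
    """Return the identifiers found in both lists, ignoring stray whitespace."""
    mine = {g.strip() for g in my_genes if g.strip()}
    theirs = {g.strip() for g in other_genes if g.strip()}
    return sorted(mine & theirs)


# Hypothetical lists: one from your own experiment, one from a published dataset.
my_list = ["C04F5.7", "geneA", "geneB ", "geneC"]
published = ["geneB", "geneD", "C04F5.7"]

shared = overlap(my_list, published)
print(shared)
```

If the two lists use different identifier types (for example Affy probe IDs versus WormBase IDs), they would need to be converted to a common type first, which is one of the things BioMart is used for later in the lecture.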
GEO (Gene Expression Omnibus)
- GEO Datasets
  - Curated gene expression datasets, i.e. there is a backlog of experiments that haven’t made it into the database
  - Can search for experiments and conduct differential gene expression queries on some datasets
  - Can download datasets & do offline analyses
- GEO Profiles
  - Profiles of expression data for genes
Why search GEO?
- What other experiments have been done that are similar to yours?
  - GEO Datasets
- How do my genes of interest behave in other large-scale experiments?
  - GEO Profiles
GEO Profile search
Search on a gene name (C04F5.7):
GEO Dataset search
“C. elegans”: 4434
GEO Dataset searches
Query                              Total datasets   C. elegans datasets
C. elegans                         4434             4072
C. elegans AND response            131              121
C. elegans AND host response       5                5
C. elegans AND immune              24               20
C. elegans AND antimicrobial       109              94
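
The same searches can also be scripted rather than typed into the web form. The sketch below only builds the request URLs, assuming NCBI's E-utilities `esearch` endpoint with `db=gds` (the Entrez database behind GEO DataSets); the helper name `geo_search_url` and the `retmax` default are illustrative choices. Counts returned today will differ from the 2014 numbers in the table.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumed here; check the NCBI docs before use).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_search_url(term, retmax=20):
    """Build an esearch URL for a GEO DataSets query (nothing is sent)."""
    params = {"db": "gds", "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


# The query strings from the table above.
queries = [
    "C. elegans",
    "C. elegans AND response",
    "C. elegans AND host response",
    "C. elegans AND immune",
    "C. elegans AND antimicrobial",
]

urls = [geo_search_url(q) for q in queries]
for url in urls:
    print(url)
```

Opening one of these URLs returns an XML document whose `Count` element gives the number of matching records.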
Once dataset identified
- Download data
  - SOFT format: tab-delimited data
- Issues:
  - Not necessarily processed such that they have the ratios of experiment/control
  - If starting with raw data, you may not be able to replicate exactly what the authors did, or may lack the expertise/software to generate a list of differentially expressed (DE) genes
- Look for supplementary data from the publication
  - Usually they provide a list of all DE genes
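
To make the "ratios of experiment/control" issue concrete, here is a rough sketch of pulling the tab-delimited table out of a SOFT file and computing log2 ratios yourself. It assumes the table sits between `..._table_begin` and `..._table_end` marker lines and that the values are linear-scale intensities (many GEO datasets are already log-transformed, in which case you would subtract rather than divide). The sample text, column names, and function names are made up for illustration, and a single ratio per gene is not a substitute for a proper DE analysis with replicates.

```python
import math


def read_soft_table(lines):
    """Collect the tab-delimited rows between the table begin/end markers."""
    rows, inside = [], False
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("_table_begin"):
            inside = True
        elif line.endswith("_table_end"):
            inside = False
        elif inside and line:
            rows.append(line.split("\t"))
    if not rows:
        return [], []
    return rows[0], rows[1:]  # header row, data rows


def log2_ratios(header, rows, exp_col, ctrl_col):
    """Return {row ID: log2(experiment / control)} for rows with usable values."""
    e, c = header.index(exp_col), header.index(ctrl_col)
    ratios = {}
    for row in rows:
        try:
            exp, ctrl = float(row[e]), float(row[c])
        except (ValueError, IndexError):
            continue  # skip missing or non-numeric entries such as "null"
        if exp > 0 and ctrl > 0:
            ratios[row[0]] = math.log2(exp / ctrl)
    return ratios


# Hypothetical SOFT-style content; real files have many more metadata lines.
sample = """^DATASET = GDS_example
!dataset_table_begin
ID_REF\tIDENTIFIER\tGSM_ctrl\tGSM_exp
probe1\tgeneA\t100\t400
probe2\tgeneB\t200\t50
probe3\tgeneC\tnull\t75
!dataset_table_end
""".splitlines()

header, rows = read_soft_table(sample)
ratios = log2_ratios(header, rows, exp_col="GSM_exp", ctrl_col="GSM_ctrl")
print(ratios)
```

For a real file, pass an open file handle (for example `open("dataset.soft")`, a hypothetical filename) instead of `sample`.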
Choice of dataset for comparison
In class demo
BioMart – EBI Ensembl
- Use series of menus
  - Data source – organism (genes, variation, etc.)
  - Filters – reduce the number of results
  - Attributes – what data to return
- Can set up very precise and multilayered queries
- Can query across multiple organisms
- Simple query:
  - Given a list of gene IDs, you can obtain attributes or sequences for the entire list
- Tools
  - ID converter – very useful, easy to use
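
The menu choices above (data source, filters, attributes) map onto the XML query that BioMart can also accept programmatically. The sketch below builds such a query for the "simple query" case of a gene ID list. The dataset name `celegans_gene_ensembl`, the filter and attribute names, and the gene IDs are assumptions for illustration; check the names offered by the BioMart interface you are using. Nothing is sent over the network here.

```python
import xml.etree.ElementTree as ET


def biomart_query(dataset, filter_name, ids, attributes):
    """Build a BioMart XML query: one dataset, one ID-list filter, N attributes."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated output
        "header": "1",        # include column names
        "uniqueRows": "1",    # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filter: restrict results to the supplied IDs (comma-separated).
    ET.SubElement(ds, "Filter", {"name": filter_name, "value": ",".join(ids)})
    # Attributes: the columns to return.
    for attr in attributes:
        ET.SubElement(ds, "Attribute", {"name": attr})
    return ET.tostring(query, encoding="unicode")


# Placeholder IDs and assumed names, for illustration only.
xml_query = biomart_query(
    dataset="celegans_gene_ensembl",
    filter_name="ensembl_gene_id",
    ids=["WBGene00000001", "WBGene00000002"],
    attributes=["ensembl_gene_id", "external_gene_name"],
)
print(xml_query)
```

The resulting string would typically be submitted as the `query` parameter to a BioMart `martservice` endpoint; the web interface can also display the XML for a query you have built through the menus, which is a good way to confirm the exact names.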
Two sites for BioMart access
www.biomart.org
Database journal issue on BioMart
Filtering in BioMart
Attributes in BioMart
BioMart
- Filters
  - C. elegans genes with a human homolog
  - Only genes with at least a specified number of isoforms
  - Protein-coding genes with a transmembrane domain
- Attributes
  - Entrez Gene IDs, WormBase IDs, Affy IDs
  - Sequence data: transcript, protein, UTRs, flanking regions, etc.
BioMart
- In class demo
Today’s exercise
- Compare current dataset from PLoS Pathogens paper to data from a different dataset
- Identify & retrieve additional information about C. elegans genes using BioMart