Transcriptomes

BMMB 551 Assignment on Transcriptomes (Lesson 8)
This assignment is intended to be an opportunity to explore some of the
topics covered in class. The report should emphasize insights or information that
were interesting to you.
Note that the following are options; you only have to do one (and option 3
is to develop your own exercise).
Option 1. Use Gene Expression Omnibus (GEO) to explore microarray
expression data
This will give you some hands-on experience with microarray-based data on
transcription. I encourage you to do the analysis on cellular systems and genes
that are of interest to you. You are free to follow a path different from the one
outlined here if that is where your curiosity and interest take you. Please make it
clear in your report what you did.
Go to GEO, a server at NCBI:
http://www.ncbi.nlm.nih.gov/geo/
(1) GEO currently holds data for over 36,000 “series”, each of which comprises a
study with multiple samples. Over 3000 of these are curated as GEO “DataSets”,
which provide not only the data but also tools for finding differentially expressed
genes and for cluster analysis. You should choose one of the latter. The query
engine supports text queries like “breast cancer” so try it to find something you
are interested in. I’m working with the data in the GDS568 record, “Erythroid
differentiation: G1E model”, so if you have no other preference, choose it.
All subsequent instructions refer to this dataset, but the same features should be
available for any comparable dataset. If you use a different dataset, then
substitute equivalent biological questions in the following.
(2) Under “Data Analysis Tools”, look at “Experiment design and value
distribution” to see what the samples are and the distribution of expression levels
in each.
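As a rough illustration of what the value distribution view summarizes, the following sketch computes a five-number summary per sample and flags any sample whose median sits far from the others. The sample names, the values, and the 1.0 log2-unit cutoff are all hypothetical; none of them come from GDS568.

```python
import statistics

# Hypothetical log2 expression values per sample (not taken from GDS568).
samples = {
    "sample_A": [7.1, 8.4, 6.9, 10.2, 9.0, 7.7],
    "sample_B": [7.3, 8.1, 7.0, 10.5, 9.2, 7.5],
    "sample_C": [9.6, 10.9, 9.4, 12.8, 11.5, 10.1],  # deliberately shifted upward
}

def summarize(values):
    """Return (min, Q1, median, Q3, max) for one sample."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))

summaries = {name: summarize(vals) for name, vals in samples.items()}
for name, summary in summaries.items():
    print(name, " ".join(f"{x:.2f}" for x in summary))

# Flag samples whose median differs from the median of all sample medians
# by more than an arbitrary, illustrative cutoff of 1.0 log2 unit.
medians = {name: summary[2] for name, summary in summaries.items()}
overall = statistics.median(medians.values())
outliers = [name for name, m in medians.items() if abs(m - overall) > 1.0]
print("possible outlier samples:", outliers)
```

If the distributions in your dataset look very different from one sample to the next, that is worth a comment in the report, since it affects how far the sample-to-sample comparisons can be trusted.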
(3) Under “Data Analysis Tools”, use “Cluster heatmaps” to generate at least two
views of the clusters in the data. Include a hierarchical clustering and a k-means
clustering. You can control the colors in the heat map; if you change them please
make it clear in the report what the colors mean.
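To make the idea behind k-means clustering concrete, here is a minimal sketch of the standard iterative procedure (Lloyd's algorithm) applied to a handful of made-up expression profiles. The gene names, values, and choice of seed genes are hypothetical, and the GEO tool may use a different distance measure or initialization than the Euclidean distance and fixed seeds assumed here.

```python
import math

# Hypothetical profiles: log2 ratio relative to the first time point,
# for six genes across four time points.
profiles = {
    "geneA": [0.0, 1.0, 2.1, 3.0],
    "geneB": [0.0, 0.9, 1.8, 2.9],
    "geneC": [0.0, 1.2, 2.0, 3.2],
    "geneD": [0.0, -1.1, -2.0, -2.8],
    "geneE": [0.0, -0.8, -1.9, -3.1],
    "geneF": [0.0, 0.1, -0.1, 0.0],
}

def distance(a, b):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_profile(rows):
    """Column-wise mean of a list of profiles."""
    return [sum(col) / len(col) for col in zip(*rows)]

def kmeans(data, seeds, max_iter=100):
    """Lloyd's algorithm; `seeds` names the genes used as initial centroids."""
    centroids = [list(data[g]) for g in seeds]
    assignment = {}
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_assignment = {
            gene: min(range(len(centroids)),
                      key=lambda k: distance(vec, centroids[k]))
            for gene, vec in data.items()
        }
        if new_assignment == assignment:
            break  # converged: no gene changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [data[g] for g, c in assignment.items() if c == k]
            if members:
                centroids[k] = mean_profile(members)
    return assignment, centroids

# Fixed seeds keep the example reproducible; real tools usually start randomly.
assignment, centroids = kmeans(profiles, seeds=["geneA", "geneD", "geneF"])
for gene, cluster in sorted(assignment.items()):
    print(gene, "-> cluster", cluster)
```

With k = 3 the rising, falling, and flat profiles end up in separate clusters. Unlike hierarchical clustering, k-means requires you to pick the number of clusters in advance, which is one reason the two views of the same data can differ.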
(4) Now find out something about genes that show interesting patterns of
expression in the clusters (e.g. an up-regulated or down-regulated set). There
are multiple ways to focus on an interesting subset; use any or all of the
following.
(4.1) In order to get a subset of genes with an interesting expression
pattern, you can use the cropping tool on a cluster to zoom in. There is a
cropping tool (reddish-yellow dashed line box) to specify a cluster of interest.
Detailed instructions on using this interface can be obtained by clicking on “How
To” at the upper right, above the diagram.
(4.2) Alternatively, you can click on “Find genes” to choose genes that are
up- or down-regulated for a condition (time, disease, etc.).
(4.3) You can also “Compare 2 sets of samples” to find an interesting set
of genes.
After choosing a small subset of genes (4-5) with interesting expression patterns,
follow the hyperlinks to get quantitative information on their expression levels
and to find out more about the genes (e.g. links to Entrez Gene). In your report,
tell me what you learned.
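The comparison of two sets of samples in (4.3) can be sketched as follows: for each gene, take the difference in mean log2 expression between the two sets and a Welch t statistic, then keep genes that pass both cutoffs. The gene names, values, and cutoffs are hypothetical, and the test that GEO applies may differ from the one assumed here.

```python
import math
import statistics

# Hypothetical log2 expression values; names and numbers are illustrative only.
expression = {
    "gene1": {"set_A": [6.0, 6.2, 5.9], "set_B": [9.1, 8.8, 9.0]},
    "gene2": {"set_A": [7.5, 7.4, 7.6], "set_B": [7.5, 7.7, 7.4]},
    "gene3": {"set_A": [10.2, 10.0, 10.4], "set_B": [7.9, 8.1, 8.3]},
}

def compare(a, b):
    """Return (mean log2 difference B - A, Welch t statistic)."""
    diff = statistics.mean(b) - statistics.mean(a)
    # Welch's standard error does not assume equal variances in the two sets.
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return diff, diff / se

results = {gene: compare(v["set_A"], v["set_B"])
           for gene, v in expression.items()}

# Illustrative cutoffs: at least a two-fold change (1 log2 unit) and |t| >= 4.
up = [g for g, (lfc, t) in results.items() if lfc >= 1.0 and abs(t) >= 4.0]
down = [g for g, (lfc, t) in results.items() if lfc <= -1.0 and abs(t) >= 4.0]

for gene, (lfc, t) in results.items():
    print(f"{gene}: log2 difference = {lfc:+.2f}, t = {t:+.1f}")
print("up in set_B:", up)
print("down in set_B:", down)
```

Because the values are on a log2 scale, a difference of 1 corresponds to a two-fold change. With only a few samples per set, the t statistic is unstable, so the quantitative expression values for each gene are still worth checking by eye.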
Option 2. Map and analyze RNA-seq data
Galaxy contains tools for mapping and analyzing RNA-seq data, and it looks like
some datasets of tractable size are available in the Data Libraries (look under the
“Shared Data” pull-down menu). This is an opportunity to get some experience in
handling these data. A serious effort in running the mapping and some analyses,
and a description of your experiences and what you learned, would constitute a
good report. I expect each student’s experience will vary depending on
background, and you may be inspired to dig more deeply. If you are feeling
adventurous, go for it!
At the Galaxy web site, go to:
“Shared Data” (top black row)
Select “Data Libraries”
Click on “Demonstration Datasets”
Click on the blue triangle for “Human RNA-seq: CHB ENCODE Exercise”
Import the data (do at least one cell type) into your history. It looks like the data
may be filtered so it is only chr19.
Click on “Analyze Data” (top black row), and return to your history.
Under NGS: RNA-seq, run TopHat to map the reads, and then explore Cufflinks
and Cuffcompare for further analysis. I expect it will take a while to finish the initial
mapping and maybe other functions as well.
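When you look at the Cufflinks output, abundances are reported as FPKM (fragments per kilobase of transcript per million mapped fragments). The sketch below shows only the basic arithmetic behind that unit, using hypothetical counts and lengths. Cufflinks itself estimates abundance with a statistical model that handles multiple isoforms and ambiguous reads, so treat this as a simplification and not as its actual method.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # (fragments / (length_bp / 1e3)) / (total_mapped_fragments / 1e6)
    return fragments * 1e9 / (length_bp * total_mapped_fragments)

# Hypothetical input: (fragments mapped to the transcript, length in bp).
transcripts = {
    "tx_short": (500, 1000),
    "tx_long": (500, 4000),
}
total_mapped = 10_000_000  # hypothetical number of mapped fragments

values = {name: fpkm(count, length, total_mapped)
          for name, (count, length) in transcripts.items()}
for name, value in values.items():
    print(f"{name}: {value:.1f} FPKM")
```

The two transcripts have the same number of mapped fragments, but the longer one gets a lower FPKM because the count is spread over four times as many bases. This length normalization is what allows expression levels to be compared between transcripts within a sample.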
Option 3. Develop your own assignment
Develop your own exercise using the data from the technologies we are
discussing. Note that substantial data from modENCODE projects on flies and
worms are available.