First of all:
“Darnit Jim, I’m a doctor not a bioinformatician!”
Researcher interested in gene expression
• I have obtained raw RNAseq files (FASTQ) for a set of cell lines.
How can I process this data and examine my gene(s) of interest?
– Ask a bioinformatician
– Do it yourself using TraIT tools: run an available NGS workflow in Galaxy
First time experience of Galaxy
Looks like RNA expression analysis…
But I have something called a FASTQ file.
I don’t know about this format. Where do I get such a reference?
Looks like RNA expression analysis…
And many more options…
How do I know that the settings here are correct for my type of data?
Instead of a BAM file I have a FASTQ file. How do I process this?
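For orientation, a FASTQ file stores each read as four lines: an identifier starting with "@", the base sequence, a separator starting with "+", and one quality character per base. The following is a minimal sketch of reading that layout, assuming an uncompressed file with unwrapped records; the function name `read_fastq` and the two example reads are made up for illustration and are not part of the TraIT workflow.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from an uncompressed FASTQ stream.

    Assumes the common four-line-per-record layout (no wrapped sequences).
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip blank lines between or after records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("not a valid four-line FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:].strip(), sequence, quality


# Two made-up reads, only to show the record structure.
example = io.StringIO("@read1\nACGTN\n+\nIIII#\n@read2\nGGCC\n+\nFFFF\n")
records = list(read_fastq(example))
for read_id, sequence, quality in records:
    print(read_id, sequence, quality)
```

In practice the workflow handles this step itself; the sketch only shows what is inside the file that gets uploaded.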
Solution: readily available workflow
And other pipelines in progress
Gene expression: input parameters
Ideally, metadata on these parameters was provided by the original data owners and/or
can be traced back (own data → known; from another person → trace back)
Trial run
• For 4 colorectal cancer cell lines the FASTQ files were provided.
Data owner could provide:
• platform
• adapter sequences
• library type
• Wanted to compare these to the processed RNAseq data of prostate
cell lines (same experimental platform was used).
• Ran the workflow and obtained read counts (a measure of expression) for
the new cell lines.
Comparison: colon and prostate
Possible for a non/little-informed user to run the Galaxy workflow and obtain results in a format that can be used in downstream analysis.
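One simple way to make read counts from different cell lines comparable downstream is to scale them by library size, for example to counts per million (CPM). The sketch below is an assumption about a possible downstream step, not the normalisation used by the workflow itself; the function name, the gene names other than AURKA, and all count values are invented toy data.

```python
def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Toy read counts for one colon and one prostate cell line (invented values).
colon = {"AURKA": 300, "GENE_B": 700, "GENE_C": 9000}
prostate = {"AURKA": 100, "GENE_B": 900, "GENE_C": 19000}

colon_cpm = counts_per_million(colon)
prostate_cpm = counts_per_million(prostate)

print("AURKA colon CPM:", colon_cpm["AURKA"])
print("AURKA prostate CPM:", prostate_cpm["AURKA"])
```

CPM corrects only for sequencing depth; it does not replace a statistical comparison between groups.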
Further analysis…
• Usually, comparison is tumour sample vs normal sample.
– edgeR is available to perform this comparison.
• Comparison of expression between groups is possible (e.g.
colorectal cell lines vs prostate cell lines), however, when I have
only cell lines:
– how to solve the question:
“does my gene of interest show altered expression in a particular
sample compared to a reference sample?”
Issues
• When not in possession of normal/reference in the dataset (T only,
cell lines), how to determine altered expression of a gene of
interest?
– Use a general normal reference that needs to be provided for
comparison? (standard cut-off for increased or decreased
expression)
< xx reads = decreased exp, > xxx reads = increased exp?
– Calculate a median expression for all genes of the platform and
then compare expression of one gene to median expression of
all genes (significant outliers?)
– Distinguish expression of a gene in diploid vs aneuploid cells → trouble:
in most cases no ploidy status is known
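The median-based option above could be sketched roughly as follows: log-transform the expression values of one sample, take the median over all genes, and flag genes that lie far from it. This is only an illustration under stated assumptions. The use of the median absolute deviation (MAD) as the spread estimate, the pseudocount of 1, and the cut-off of 3 are choices made for this sketch, and the gene names (other than AURKA) and values are invented.

```python
import math
import statistics


def robust_z_scores(expression, pseudocount=1.0):
    """Score each gene by its distance from the sample-wide median.

    Values are log2-transformed first; the spread is estimated with the
    median absolute deviation (MAD), scaled by 1.4826 so that it is
    comparable to a standard deviation for normally distributed data.
    """
    logs = {gene: math.log2(value + pseudocount)
            for gene, value in expression.items()}
    median = statistics.median(logs.values())
    mad = statistics.median(abs(x - median) for x in logs.values())
    if mad == 0:
        raise ValueError("no spread in the data; cannot score outliers")
    scale = 1.4826 * mad
    return {gene: (x - median) / scale for gene, x in logs.items()}


# Invented expression values for a single sample.
sample = {"GENE_A": 7, "GENE_B": 15, "GENE_C": 31, "GENE_D": 63, "AURKA": 1023}

z = robust_z_scores(sample)
CUTOFF = 3.0  # assumed threshold, not an established standard
outliers = [gene for gene, score in z.items() if abs(score) > CUTOFF]
print(outliers)
```

Note that this compares a gene with all other genes in the same sample, so it says whether a gene is unusually high or low within that sample, not whether it differs from a normal reference.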
Issues
• When investigating data in the data-integration platform, query for
the gene AURKA will give certain results.
• If one study had T/N and the other only T – and different manners
for determining altered expression were applied – can this data be
compared?
– Pro: it’s processed and called data you’re comparing in this platform,
trust the called data
– Con: I don’t think it’s fair to compare differently called data – if
comparing such datasets, start from the beginning and treat them in the
same manner → convert the T/N-analysed data to a T-only or
cell-line-only analysis