Metagene Projection

advertisement
Metagene Projection
• There are a lot of array data available
• Species, platform, labeling method, researcher
and other issues make using these data difficult.
• Metagene Projection claims to “reduce noise
while still capturing the invariant biological
features of the data.”
• This should “enable cross-platform and crossspecies analysis, improve clustering and class
prediction and provide a computational means to
detect and remove sample contamination.”
NMF (Brunet et al., 2004 PNAS 101:4164) – Nonnegative Matrix Factorization
W=genes X small # metagenes
H= small # metagenes X samples
M and T
n (genes) x N
(samples)
KEY point:
n (genes)
identifiers in M
and T must
match
Unknown: Can
M and T be
totally different
types of data?
Moore-Penrose
generalized pseudoinverse
Model –
30 samples, 3 metagenes
Test –
38 samples, 3 metagenes
After
Metagene
Projection
Before
Metagene
Projection
Before Metagene Projection : Rank
normalized and including only the top 500
markers of each class. – Underperforms
metagene projection
•KO of the same gene impacts
different cell lines in similar way.
• Both mouse stem cell lines, one on
Exon array, one on 430_2
•For 3’ UTR – max average per gene selected
•For Exon – max probe count per transcript cluster id selected
•gene symbol <–> gene symbol join
•All 17354 genes used
Expressed Clustering (10989 genes)
RankNorm = 15((Rank-1)/(#genes-1))
Expressed and rank normalized clustering
Metagene Projection
Preprocessing
2 required inputs for the genepattern metagene projection module are model
and test preprocessing parameter files.
gct.file="Arv.gct"
cls.file="Arv.cls"
column.subset="ALL"
column.sel.type="samples"
thres=3
NO
ceil=14
FILTER at
fold=1
this value
delta=1.5
norm=6
4525 pass
Model input,
preprocessing and
refinement
H matrix from NMF
Projected model
dataset
Model combined
with test
Original data – Platforms
separated
Projected data – possibly better
separation move 1 het to improve
clades
Download