Possible Extensions of RadViz

Ideas not planned in Unity.
From before, more suited for R&D.
1. Continue Gene Ontology evaluation and visualization started
by Hong Li.
This will link functional genomics to expression analysis in a visual and analytical
manner (cluster comparison, etc.), but it should be done in a way that also works for general
“Meta Data” mining, such as adding descriptions or hierarchical data. Additionally, we
want to integrate gene pathway information if it is available.
There are at least 3 levels of knowledge we want integrated
into our expression analysis:
· MeSH (literature hits)
· Gene Ontology (function descriptions, chemical etc.)
· Pathway information
2. N-fold validation of a RadViz classifier
We need a mechanism such as re-laying out RadViz N times (N-fold),
defining a classification region automatically (or guided by the operator) each time, and
then placing the held-out points (1/N of the data in each fold) to see where they are classified.
Since RadViz is a visual classifier, we need a way to estimate its accuracy,
as for any standard classifier.
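A minimal sketch of how such an N-fold estimate could be computed, in Python. It rests on several assumptions that are not in the notes above: feature values are already scaled to [0, 1], the dimension order around the circle is fixed rather than re-laid out for each fold, and the "classification region" is approximated by the nearest class centroid in the RadViz plane. All function names and the toy data are illustrative.

```python
import math
import random

def radviz_project(row):
    """Place one record in the RadViz plane: anchors are evenly spaced on the
    unit circle and the point is the value-weighted mean of the anchors."""
    d = len(row)
    total = sum(row)
    if total == 0:
        return (0.0, 0.0)  # no pull from any anchor: put the point at the center
    x = sum(v * math.cos(2 * math.pi * j / d) for j, v in enumerate(row)) / total
    y = sum(v * math.sin(2 * math.pi * j / d) for j, v in enumerate(row)) / total
    return (x, y)

def n_fold_accuracy(rows, labels, n_folds=5, seed=0):
    """Hold out each fold in turn, build class centroids from the remaining
    points, and count how many held-out points fall nearest their own class."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for held_out in folds:
        held = set(held_out)
        sums = {}  # label -> (sum_x, sum_y, count) over the training points
        for i in idx:
            if i in held:
                continue
            px, py = radviz_project(rows[i])
            sx, sy, n = sums.get(labels[i], (0.0, 0.0, 0))
            sums[labels[i]] = (sx + px, sy + py, n + 1)
        centroids = {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}
        for i in held_out:
            p = radviz_project(rows[i])
            predicted = min(centroids, key=lambda c: math.dist(p, centroids[c]))
            correct += predicted == labels[i]
    return correct / len(rows)

# Toy data (made up): class "A" is high in dimension 0, class "B" in dimension 2.
demo_rows = [
    [0.9, 0.1, 0.0, 0.1], [1.0, 0.0, 0.1, 0.0], [0.8, 0.2, 0.1, 0.1],
    [0.9, 0.0, 0.0, 0.2], [1.0, 0.1, 0.0, 0.1],
    [0.0, 0.1, 0.9, 0.1], [0.1, 0.0, 1.0, 0.0], [0.1, 0.2, 0.8, 0.1],
    [0.0, 0.0, 0.9, 0.2], [0.0, 0.1, 1.0, 0.1],
]
demo_labels = ["A"] * 5 + ["B"] * 5
demo_accuracy = n_fold_accuracy(demo_rows, demo_labels, n_folds=5)
```

A fuller version would redo the class layout inside the loop, using only the training points of each fold, so that the held-out points never influence the layout being tested.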
3. Map RadViz layout to all Classifier outputs
As discussed several times, most classifier outputs can be made
"gradual": one can make a classifying decision, but it is also
possible to find out how "close" a point is to the other classes. This
information can be displayed nicely in the RadViz "pie wedge"
classification paradigm.
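One way this could look, as a hedged sketch in Python: each class gets an anchor on the circle, and a point is placed at the average of the anchors weighted by its per-class scores. The assumption here is that the classifier yields non-negative scores per class (for example, probabilities); the function names are illustrative and not part of any existing RadViz code.

```python
import math

def class_anchor(k, n_classes):
    """Anchor for class k: evenly spaced positions on the unit circle."""
    angle = 2 * math.pi * k / n_classes
    return (math.cos(angle), math.sin(angle))

def place_by_membership(scores):
    """Place a point from its per-class scores (assumed non-negative).
    A confident point lands near its class anchor; an ambiguous point is
    pulled toward the center or toward the boundary between classes."""
    total = sum(scores)
    if total == 0:
        return (0.0, 0.0)
    n = len(scores)
    x = sum(s * class_anchor(k, n)[0] for k, s in enumerate(scores)) / total
    y = sum(s * class_anchor(k, n)[1] for k, s in enumerate(scores)) / total
    return (x, y)

confident = place_by_membership([1.0, 0.0, 0.0])   # sits on the class-0 anchor
undecided = place_by_membership([1.0, 1.0, 1.0])   # sits at the center
```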
4. CortViz: a method for showing “Correlation signatures” of
genes within experiments. This should be further investigated and studied.
CortViz is essentially the Pearson correlation matrix of all genes with all genes,
arranged so that genes that correlate highly with each other are sorted together.
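A small Python sketch of the idea: compute the gene-by-gene Pearson matrix, then reorder rows and columns so that highly correlated genes sit next to each other. The greedy nearest-neighbor ordering used here is only a stand-in, since the notes do not say which sorting CortViz uses; the function names and toy profiles are made up for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def correlation_matrix(profiles):
    n = len(profiles)
    return [[pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]

def greedy_order(corr):
    """Start at gene 0 and repeatedly append the unplaced gene that is most
    correlated with the gene placed last."""
    order = [0]
    remaining = set(range(1, len(corr)))
    while remaining:
        last = order[-1]
        nxt = max(sorted(remaining), key=lambda j: corr[last][j])
        order.append(nxt)
        remaining.remove(nxt)
    return order

def reorder(corr, order):
    """Apply the same ordering to rows and columns."""
    return [[corr[i][j] for j in order] for i in order]

# Toy profiles (made up): genes 0 and 2 rise together, genes 1 and 3 fall together.
demo_profiles = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 2, 3, 5], [4, 3, 2, 0]]
demo_corr = correlation_matrix(demo_profiles)
demo_order = greedy_order(demo_corr)
demo_sorted = reorder(demo_corr, demo_order)
```

For thousands of genes a hierarchical clustering order would likely scale and group better than this greedy pass.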
5. Enhance RadViz class layout by FuzzViz:
lines perpendicular to the perimeter representing the t-statistic, significance, etc. This
gives a visual representation of the layout statistic. When a class layout is done, each
class pie wedge will have perpendicular radial lines sorted from high to low.
6. Enhance RadViz class layout by t-statistic. That is, the weights of
RadViz should be modified by the calculated layout statistic, thus possibly
enhancing the class separation.
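A possible sketch in Python of weighting by the layout statistic. It assumes a two-group comparison (one class against the rest), uses Welch's t-statistic as the layout statistic, and scales the absolute t-values so the strongest dimension has weight 1. None of these choices is specified in the notes, and the function names are illustrative.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic between two samples (each needs at least 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def t_weights(rows, labels, target):
    """One weight per dimension: |t| of the target class vs. the rest,
    scaled so the most discriminating dimension has weight 1."""
    ts = []
    for j in range(len(rows[0])):
        inside = [r[j] for r, lab in zip(rows, labels) if lab == target]
        outside = [r[j] for r, lab in zip(rows, labels) if lab != target]
        ts.append(abs(welch_t(inside, outside)))
    top = max(ts) or 1.0
    return [t / top for t in ts]

def weighted_radviz(row, weights):
    """RadViz projection in which each dimension's pull is value * weight."""
    d = len(row)
    pulls = [v * w for v, w in zip(row, weights)]
    total = sum(pulls)
    if total == 0:
        return (0.0, 0.0)
    x = sum(p * math.cos(2 * math.pi * j / d) for j, p in enumerate(pulls)) / total
    y = sum(p * math.sin(2 * math.pi * j / d) for j, p in enumerate(pulls)) / total
    return (x, y)

# Toy data (made up): only dimension 0 separates class "A" from class "B".
demo_rows = [[0.9, 0.5, 0.3], [0.8, 0.4, 0.2], [1.0, 0.6, 0.4],
             [0.1, 0.5, 0.2], [0.2, 0.6, 0.4], [0.0, 0.4, 0.3]]
demo_labels = ["A", "A", "A", "B", "B", "B"]
demo_weights = t_weights(demo_rows, demo_labels, target="A")
demo_point = weighted_radviz(demo_rows[0], demo_weights)
```

The same per-dimension statistic could also drive the lengths of the FuzzViz radial lines described in item 5.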
7. Principal Components - RadViz clustering (a new idea:
mapping PC coefficients to RadViz weights; see other document
for details)
One of the problems with PCA is that the coefficients of the “real” dimensions making up
the PCs are not usually shown, making it somewhat of a “black box” clusterer.
One could do a PCA and then use the coefficients of each PC for each “real” dimension as
weights in a RadViz layout. The coefficient weights would be sorted high to low, with
negative coefficients on the opposite side, or with RadViz enhanced to use negative weights.
Thus if one wanted to “see” 3 PCs, one would have 3 groups of all the real
dimensions around the RadViz layout, with spring forces equivalent to the PC
coefficients. This should give clustering similar to a 3D PCA analysis, but with the
dimensions laid out and shown in “importance” order. This should help in
understanding PC clustering better.
[Figure: PC layout example, with groups labeled PC1 and PC2]
Dimensions would be duplicated for each PC and ordered according
to the size of their coefficients. Weights would be equal to the coefficients.
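The mapping from PC coefficients to a RadViz dimension order can be sketched in Python as follows. This is only an illustration under stated assumptions: it finds the first PC by power iteration on the covariance matrix (a simple substitute for a full PCA routine), and it orders dimensions by signed coefficient so that negative coefficients end up at the far end of the group. The function names and toy data are hypothetical.

```python
import math

def covariance_matrix(rows):
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_pc_coefficients(rows, iterations=200):
    """Approximate the first principal component by power iteration.
    Returns one coefficient per original dimension."""
    cov = covariance_matrix(rows)
    d = len(cov)
    v = [1.0 + 0.1 * j for j in range(d)]  # slightly uneven start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # constant data: no direction of variance to find
        v = [x / norm for x in w]
    # The sign of a PC is arbitrary; make the largest-magnitude coefficient positive.
    if max(v, key=abs) < 0:
        v = [-x for x in v]
    return v

def pc_layout(coefficients):
    """(dimension index, weight) pairs sorted from highest to lowest coefficient."""
    return sorted(enumerate(coefficients), key=lambda pair: pair[1], reverse=True)

# Toy data (made up): dimensions 0 and 1 vary together, dimension 2 barely varies.
demo_rows = [[1, 1.1, 0.5], [2, 1.9, 0.4], [3, 3.2, 0.6], [4, 3.8, 0.5], [5, 5.1, 0.5]]
demo_coefficients = first_pc_coefficients(demo_rows)
demo_layout = pc_layout(demo_coefficients)
```

Further PCs could be obtained by removing the variance explained by the first one (deflation) and repeating; each PC would then contribute its own group of duplicated dimensions around the circle.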
8. Dataset chunking or multi-way formatting. Pivoting and flattening are
subsets of this general idea. All the grouping operations we did for Hypnion were subsets
of this data chunking. Transforming “heterogeneous” data in this manner can greatly
enhance the power of PatchGrid and RadViz. Essentially this is reorganizing a bunch of
tables into a flat file. This is probably best done by designing a database with the “clean”
tables and then re-querying the data into different views for analysis. EZ can help us with
this idea.
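As a small illustration of one such operation, the Python sketch below pivots "long" records (one value per row) into a flat table with one row per gene and one column per sample. The field names gene, sample and value are hypothetical and chosen only for the example.

```python
def pivot(records, row_key, col_key, value_key, fill=None):
    """Turn long records into a wide table: one entry per row_key value,
    one column per distinct col_key value, missing cells set to `fill`."""
    columns = sorted({r[col_key] for r in records})
    table = {}
    for r in records:
        row = table.setdefault(r[row_key], dict.fromkeys(columns, fill))
        row[r[col_key]] = r[value_key]
    return columns, table

# Toy records (made up).
demo_records = [
    {"gene": "g1", "sample": "s1", "value": 1.0},
    {"gene": "g1", "sample": "s2", "value": 2.0},
    {"gene": "g2", "sample": "s1", "value": 3.0},
]
demo_columns, demo_table = pivot(demo_records, "gene", "sample", "value")
```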
9. More and better Statistics (see last Micro Array Stats
proposal)
10. Gene Correlation Searching/Analysis: based on our three correlation
methods (Pearson, Jackknife and Cosine) we could build a database of gene correlations
over different experiments. For example, in one experiment we looked at the two
genes U11863 (1713) and U39400 (2011) and found a very high negative
correlation (-0.976). A database query system could be built for customers, with different
correlation values for different experiments.
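For reference, the three measures could be sketched in Python as below. Pearson and cosine follow their standard definitions. The jackknife correlation is taken here to mean the minimum of the leave-one-out Pearson correlations, which is an assumption, since the notes do not define it. The toy profiles are made up and are not the genes mentioned above.

```python
import math

def pearson(a, b):
    """Pearson correlation (0.0 if either profile is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def cosine(a, b):
    """Cosine similarity: like Pearson, but without subtracting the means."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def jackknife(a, b):
    """Assumed definition: the smallest Pearson correlation obtained when
    each experiment is left out in turn. This damps single-outlier effects."""
    return min(pearson(a[:i] + a[i + 1:], b[:i] + b[i + 1:])
               for i in range(len(a)))

# Toy profiles (made up): the last experiment is an outlier shared by both genes.
demo_a = [1, 2, 3, 4, 20]
demo_b = [4, 3, 2, 1, 20]
demo_pearson = pearson(demo_a, demo_b)      # high, driven by the outlier
demo_jackknife = jackknife(demo_a, demo_b)  # exposes the opposite trend underneath
```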
11. Relevance Network
· Show gene-to-gene correlations across an experiment. The correlation of all genes to all genes,
thresholded in PatchGrid, would show this as a Gene Correlation Signature (see CortViz). But an
alternative method would be a Relevance network overlaid in RadViz. A nice option when
we are displaying, say, all 7000 genes in RadViz would be to connect all genes to each other that
have, say, greater than 0.9 correlation. This would essentially be a Relevance network. Doing all 7k
correlations is time consuming (but we have done it), but we could limit it to the top significant
genes (or the Purs genes). This would be a pretty easy enhancement to our RadViz layout. One
could also show Functional Similarity in this same relevance RadViz network. The functional
similarity could be based on Gene Ontology. For this we would need a link to that database to
show the connections. It seems that Xpogen is using distance metrics for relevance such as
correlation or Euclidean distance. Duplicating that will be fairly easy.
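The edge list for such a relevance network could be built roughly as in the Python sketch below: every gene pair whose Pearson correlation exceeds the threshold becomes an edge to draw between the two genes' RadViz positions. The function name, the optional absolute flag (to also link strongly anti-correlated genes) and the toy profiles are assumptions for illustration.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation (0.0 if either profile is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def relevance_edges(profiles, threshold=0.9, absolute=False):
    """Return (gene_a, gene_b, r) for every pair whose correlation exceeds
    the threshold. `profiles` maps gene name -> expression profile."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        strength = abs(r) if absolute else r
        if strength > threshold:
            edges.append((a, b, r))
    return edges

# Toy profiles (made up gene names and values).
demo_profiles = {
    "g0": [1, 2, 3, 4],
    "g1": [4, 3, 2, 1],
    "g2": [1, 2, 3, 5],
    "g3": [2, 1, 2, 1],
}
demo_edges = relevance_edges(demo_profiles)
demo_edges_abs = relevance_edges(demo_profiles, absolute=True)
```

The pairwise loop grows with the square of the number of genes, which is why restricting it to the top significant genes, as suggested above, keeps it cheap.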
12. Bi-clustering Relevance network (Phil's idea)
Cluster genes and patients separately, and connect highly over-expressed and under-expressed
genes to the appropriate patient clusters. This may help in pathway mapping. This is related to
network models for genetic pathways and was touched on by Shamir and Thorsson.
..peh 3-01-02