On Network Randomization Methods: A Negative Control - Bio-Grid

On Network Randomization Methods: A Negative Control Study
2012 NSF Bio-Grid
REU Research Fellow
University of Connecticut
Max Espinoza (Fairfield 2013)
Understanding negative controls is a necessity in any scientific discipline when trying
to show correlations. A negative control in an experiment is a sample that is expected to yield a
negative result. In network theory, as in many other sciences, graph randomization is vital to
generating a negative control group. A typical example can be found in motif detection
algorithms. We base this study on motif detection because of its need
for randomized graphs as a negative control. Given an input network, a motif detection
algorithm finds that network’s subgraph frequencies and then compares those frequencies to
the negative control’s subgraph frequencies to determine which subgraphs are statistically
overrepresented. These statistically overrepresented subgraphs are defined as motifs. The
generation of the randomized networks that constitute the control group is done by a graph
randomization algorithm. Many graph randomization algorithms are currently in use
in motif detection. The problem is that if randomized graphs are used as the negative
control, then different randomization algorithms may yield different results. Randomization
algorithms differ in the topological properties they preserve. The more topological properties
the randomization process preserves, the lower the probability of encountering a false positive. In
motif detection, some networks may have higher frequencies of specific subgraphs because of their
topological properties, and these could be misinterpreted as motifs if the randomized graphs did
not preserve those topological properties.
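To make the comparison with the negative control concrete, the following is a minimal Python sketch of one common way to score overrepresentation: a z-score of the observed subgraph count against the counts obtained from the randomized networks. The function name, the choice of the population standard deviation, and the example counts are assumptions made for illustration; they are not taken from mFinder.

```python
import statistics


def motif_z_score(observed_count, randomized_counts):
    """Z-score of a subgraph count against the negative-control ensemble.

    observed_count: occurrences of the subgraph in the input network.
    randomized_counts: occurrences in each randomized (control) network.
    """
    mean = statistics.mean(randomized_counts)
    # Population standard deviation; this choice is an assumption of the sketch.
    std = statistics.pstdev(randomized_counts)
    if std == 0:
        # Every control network gave the same count, so the z-score is undefined.
        return None
    return (observed_count - mean) / std


# Hypothetical counts of one subgraph in eight randomized networks.
control_counts = [2, 4, 4, 4, 5, 5, 7, 9]
z = motif_z_score(10, control_counts)
```

A large positive score suggests the subgraph is overrepresented relative to the control. Because the control counts come from the randomization algorithm, a different algorithm can shift the mean and spread, and with them the score.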
We intend to compare the sets of motifs reported by motif detection software
in order to investigate how changing the randomization algorithm affects the motifs
found for a static input network. We will use mFinder, an open-source, network-centric
motif detection tool, because of its documentation and modularity. An E. coli protein-protein
interaction (PPI) network will be used as the static input network. We will compare the motifs found
by mFinder using the following graph randomization algorithms: the switching method, the
stubs method, and the go-with-the-winners algorithm. Future research could include modifying
mFinder to use other graph randomization algorithms, such as the Erdős-Rényi algorithm and the
Barabási-Albert preferential attachment algorithm. Searching for network motifs of three nodes
or fewer would not yield many significant results to compare, and searching for network motifs
of more than six nodes is computationally expensive. Therefore, we will use mFinder to search for
four- to six-node motifs in the E. coli PPI network, using these graph randomization algorithms
to generate the negative control group. We will then compare the motifs found and examine
the effect that altering the graph randomization algorithm has on motif detection.
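As an illustration of two of the randomization methods named above, here is a minimal Python sketch of the switching method (repeated degree-preserving edge swaps) and the stubs method (random pairing of edge endpoints) for a simple undirected graph. It is a sketch under stated assumptions, not mFinder's implementation: the function names, the toy network, the number of swaps, and the restart-on-failure policy in the stubs method are all hypothetical choices.

```python
import random


def degrees(edges):
    """Return {node: degree} for an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def switching_randomize(edges, n_swaps, rng):
    """Switching method: rewire (a, b), (c, d) into (a, d), (c, b).

    Swaps that would create a self-loop or a duplicate edge are rejected,
    so every node keeps its original degree and the graph stays simple.
    """
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue  # would create a duplicate edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


def stubs_randomize(degree_by_node, rng, max_tries=1000):
    """Stubs method: give each node one stub per unit of degree, then pair stubs.

    A pairing that contains a self-loop or a duplicate edge is discarded and
    the attempt restarts (an assumption of this sketch).
    """
    stubs = [node for node, k in degree_by_node.items() for _ in range(k)]
    for _ in range(max_tries):
        rng.shuffle(stubs)
        pairs = list(zip(stubs[::2], stubs[1::2]))
        distinct = {frozenset(p) for p in pairs}
        if all(u != v for u, v in pairs) and len(distinct) == len(pairs):
            return pairs
    raise RuntimeError("no simple graph found within max_tries")


# Hypothetical toy network standing in for the real input network.
rng = random.Random(0)
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
switched = switching_randomize(toy_edges, n_swaps=100, rng=rng)
rewired = stubs_randomize(degrees(toy_edges), rng)
```

Both functions preserve the degree of every node, which is the kind of topological property a negative control is meant to hold fixed. They differ in how they sample from the set of graphs with that degree sequence, which is one reason the resulting control groups, and therefore the motifs reported, may differ.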
Demonstrating that changing the graph randomization algorithm in mFinder can yield
different results is an incremental step toward future research. The importance of this study is to
help researchers understand and see the effects that altering the graph randomization algorithm
has on the negative control. Network motif detection is only one of the many
applications that require randomization algorithms to generate negative control
groups.