SaiBadey_wi14

Network Generation & Motif Detection in Biological Networks
Application of the MASS Library
Sai Badey
Winter 2014
This document outlines my research over the last quarter under Professor Fukuda. There are
multiple ways to generate biological networks for comparison. This project identifies four key
methods of network generation and discusses the potential implications of choosing among them.
Biological networks are applied at both the micro and macro levels. They can map the
connections between organic compounds within an organism or represent the connections between
different organisms or communities. The current work focuses on detecting essential proteins in
microorganisms. Existing software searches through networks, represented as vertices and edges,
and locates recurring patterns. If a pattern occurs significantly more frequently in a network
than in comparable randomized networks, the pattern is labelled a network motif. Motifs are
characteristic of the network in which they occur and are invaluable in determining essential
proteins in microorganisms.
Some important terminology is defined in this section. Biological networks are simply a
list of connections between different components. Motifs are patterns that emerge from these
connections and are significant because they are uniquely prevalent in a particular network.
Isomorphs are different representations of a single pattern. They are difficult to detect because
two patterns can appear distinct until the vertices of one are rearranged to match the other.
Vertices are the nodes, or agents, that form the core of the network. Edges represent the
connections between vertices. Although real biological interactions are very complex, they are
represented by edges that indicate only whether or not a connection exists.
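To make these terms concrete, the sketch below stores a small network as an adjacency matrix (vertices are indices, an edge is a `true` entry, with no weights) and tests whether two patterns are isomorphs by trying every relabeling of the vertices. The class and method names are hypothetical and are not part of MASS or JUNG. Brute force is only practical for motif-sized patterns of a few vertices; practical tools use faster techniques such as canonical labeling.

```java
import java.util.*;

// Hypothetical illustration; not part of MASS or JUNG.
final class IsomorphSketch {
    // Undirected, unweighted graph: adj[i][j] is true when a connection exists.
    static boolean[][] fromEdges(int n, int[][] edges) {
        boolean[][] adj = new boolean[n][n];
        for (int[] e : edges) {
            adj[e[0]][e[1]] = true;
            adj[e[1]][e[0]] = true;
        }
        return adj;
    }

    // True if some relabeling of b's vertices makes b identical to a.
    static boolean isIsomorphic(boolean[][] a, boolean[][] b) {
        if (a.length != b.length) return false;
        int[] perm = new int[a.length];
        for (int i = 0; i < perm.length; i++) perm[i] = i;
        return tryPermutations(a, b, perm, 0);
    }

    // Generates permutations by swapping and checks each complete one.
    private static boolean tryPermutations(boolean[][] a, boolean[][] b, int[] perm, int pos) {
        if (pos == perm.length) {
            for (int i = 0; i < perm.length; i++)
                for (int j = 0; j < perm.length; j++)
                    if (a[i][j] != b[perm[i]][perm[j]]) return false;
            return true;
        }
        for (int i = pos; i < perm.length; i++) {
            swap(perm, pos, i);
            if (tryPermutations(a, b, perm, pos + 1)) return true;
            swap(perm, pos, i); // undo before trying the next choice
        }
        return false;
    }

    private static void swap(int[] p, int i, int j) {
        int t = p[i];
        p[i] = p[j];
        p[j] = t;
    }
}
```

For example, a three-vertex path centered on vertex 1 and one centered on vertex 2 are isomorphs of the same pattern, while a path and a triangle are not.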
My work is based on a paper by Amala Ghandi, a previous research student. Her work
provides an efficient method for searching for patterns within a network. Her search method
tracks patterns in which each vertex connects only to a higher-level vertex. This restriction
ensures that no pattern is counted more than once, which solves the problem of counting the same
pattern repeatedly as different isomorphs.
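Her code is not reproduced here. As a rough illustration of the "connect only to higher-numbered vertices" rule, the sketch below follows Wernicke's ESU enumeration, a swapped-in published technique built on the same idea: every subgraph is grown from its lowest-numbered vertex (the root), and a vertex may join only if its number is greater than the root's, so each connected k-vertex subgraph is visited exactly once. Reading "higher-level" as "higher-numbered" is an assumption, and the class and method names are hypothetical.

```java
import java.util.*;

// Hypothetical sketch modeled on Wernicke's ESU algorithm, not Ghandi's implementation.
final class EsuSketch {
    // Builds undirected adjacency sets for vertices 0..n-1.
    static List<Set<Integer>> fromEdges(int n, int[][] edges) {
        List<Set<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new HashSet<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }

    // Counts connected k-vertex subgraphs, visiting each vertex set exactly once.
    static int countConnectedSubgraphs(List<Set<Integer>> adj, int k) {
        int[] count = {0};
        for (int root = 0; root < adj.size(); root++) {
            // Only neighbors numbered higher than the root may join its subgraphs.
            Set<Integer> extension = new TreeSet<>();
            for (int u : adj.get(root)) if (u > root) extension.add(u);
            extend(adj, Set.of(root), extension, root, k, count);
        }
        return count[0];
    }

    private static void extend(List<Set<Integer>> adj, Set<Integer> sub, Set<Integer> extension,
                               int root, int k, int[] count) {
        if (sub.size() == k) {
            count[0]++;
            return;
        }
        // Closed neighborhood of the current subgraph: its vertices plus all their neighbors.
        Set<Integer> closed = new HashSet<>(sub);
        for (int s : sub) closed.addAll(adj.get(s));
        Deque<Integer> remaining = new ArrayDeque<>(extension);
        while (!remaining.isEmpty()) {
            int w = remaining.poll();
            // Candidates: the rest of the old extension plus w's exclusive neighbors
            // (not in or adjacent to the subgraph) that are numbered higher than the root.
            Set<Integer> nextExtension = new TreeSet<>(remaining);
            for (int u : adj.get(w)) if (u > root && !closed.contains(u)) nextExtension.add(u);
            Set<Integer> nextSub = new HashSet<>(sub);
            nextSub.add(w);
            extend(adj, nextSub, nextExtension, root, k, count);
        }
    }
}
```

For example, a four-vertex path 0-1-2-3 yields 2 connected three-vertex subgraphs ({0,1,2} and {1,2,3}), and a star with one center and three leaves yields 3. Grouping these occurrences into isomorphism classes is a separate step.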
My work explores the area of network generation. More specifically, it attempts to
identify a method of network generation that creates networks that are random yet still retain
the key characteristics of the original. Few previous studies of biological networks focus on the
method of network generation. This may be because the manner of network generation might have
no effect once it is scaled up to hundreds or thousands of graphs. Nonetheless, this is an
important area to explore: if there is a difference, it will have major implications for many
study results published thus far.
The research implements several methods of network generation, some of which are
currently in use. The four main methods of random network generation are: (1) reordering the
vertices while maintaining the same number of edges; (2) randomly swapping the node degrees;
(3) combining 1,000 graphs into a single graph from which only samples are drawn; and
(4) “direct generation”.
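The Blitzstein and Diaconis reference listed under Resources generates random graphs with a prescribed degree sequence directly: it adds edges step by step and only makes choices that keep the remaining degree sequence graphical (realizable as a simple graph). Assuming "direct generation" refers to that family of algorithms, the sketch below shows only the graphicality check it relies on, the Erdős-Gallai test, not the full importance-sampling procedure. The class name is hypothetical.

```java
import java.util.*;

// Hypothetical helper: Erdős-Gallai test for whether a degree sequence is graphical.
final class GraphicalSketch {
    static boolean isGraphical(int[] degrees) {
        int n = degrees.length;
        int[] d = degrees.clone();
        Arrays.sort(d);
        // Reverse into nonincreasing order, as the theorem requires.
        for (int i = 0; i < n / 2; i++) {
            int t = d[i];
            d[i] = d[n - 1 - i];
            d[n - 1 - i] = t;
        }
        long total = 0;
        for (int x : d) {
            if (x < 0 || x > n - 1) return false; // impossible degree in a simple graph
            total += x;
        }
        if (total % 2 != 0) return false; // every edge contributes 2 to the degree sum
        long left = 0;
        for (int k = 1; k <= n; k++) {
            // Sum of the k largest degrees must not exceed k(k-1) + sum of min(d_i, k) over the rest.
            left += d[k - 1];
            long right = (long) k * (k - 1);
            for (int i = k; i < n; i++) right += Math.min(d[i], k);
            if (left > right) return false;
        }
        return true;
    }
}
```

For example, {2, 2, 1, 1} (a path on four vertices) is graphical, while {3, 3, 1, 1} is not: two vertices adjacent to all others would force every other vertex to have degree at least 2.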
All four methods of network generation ensure that the key aspects of the networks are
preserved. All of the vertices remain intact, and the number of interactions between the vertices
is unchanged as well. The only change is where the interactions occur. Some nodes have more
interactions (higher degrees) than other nodes. Such a node cannot suddenly become isolated, but
which nodes it interacts with, and how many, can change. The first two methods capitalize on
this by simply swapping the positions of the nodes or the node degrees. The latter two methods
are already used in several publications (referenced at the end of this paper), which supports
their validity.
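The text does not spell out how the degree-swapping method works. One standard technique that fits the constraints above (all vertices intact, edge count unchanged, and in fact every vertex's degree unchanged) is the double-edge swap, also called edge switching: pick two edges (a,b) and (c,d) and rewire them to (a,d) and (c,b). The sketch below assumes that technique; the class name, the number of attempts, and the seed are hypothetical choices.

```java
import java.util.*;

// Hypothetical sketch of degree-preserving randomization by double-edge swaps.
final class EdgeSwapSketch {
    // Attempts `attempts` random swaps on an undirected simple graph given as an edge list.
    static List<int[]> shuffle(List<int[]> edges, int attempts, long seed) {
        List<int[]> result = new ArrayList<>();
        Set<Long> present = new HashSet<>();
        for (int[] e : edges) {
            result.add(e.clone());
            present.add(key(e[0], e[1]));
        }
        Random rng = new Random(seed);
        for (int t = 0; t < attempts && result.size() >= 2; t++) {
            int i = rng.nextInt(result.size());
            int j = rng.nextInt(result.size());
            if (i == j) continue;
            int a = result.get(i)[0], b = result.get(i)[1];
            int c = result.get(j)[0], d = result.get(j)[1];
            // Reject swaps that would create a self-loop or a duplicate edge.
            if (a == d || c == b || present.contains(key(a, d)) || present.contains(key(c, b))) continue;
            present.remove(key(a, b));
            present.remove(key(c, d));
            present.add(key(a, d));
            present.add(key(c, b));
            // (a,b),(c,d) -> (a,d),(c,b): each of a, b, c, d loses one partner and gains one.
            result.set(i, new int[]{a, d});
            result.set(j, new int[]{c, b});
        }
        return result;
    }

    // Order-independent key for an undirected edge between nonnegative vertex ids.
    static long key(int u, int v) {
        int lo = Math.min(u, v), hi = Math.max(u, v);
        return ((long) lo << 32) | hi;
    }

    static int[] degrees(List<int[]> edges, int n) {
        int[] deg = new int[n];
        for (int[] e : edges) {
            deg[e[0]]++;
            deg[e[1]]++;
        }
        return deg;
    }
}
```

Because rejected swaps leave the graph untouched, the output always has the same degree sequence as the input; only who interacts with whom changes, which matches the property described above.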
There are two possible outcomes: either there IS NO significant difference between the
results of the network generation methods, or there IS a significant difference. This is
determined through z-score analysis and a comparison of the results of the different generation
methods. Because the methods are so different from one another, a single seed cannot establish
the validity of the results. Instead, several sets of graphs will be generated for each of several
networks, and the data will be drawn from these sets. This allows for a sound comparison of both
running time and results.
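The z-score used in motif analysis compares a pattern's count in the original network with its counts in the randomly generated networks: z = (observed count - mean random count) / standard deviation of the random counts. The sketch below computes it; using the population standard deviation and returning NaN when all random counts are equal are assumptions, and the class name is hypothetical.

```java
// Hypothetical helper: z-score of an observed motif count against counts from random networks.
final class ZScoreSketch {
    static double zScore(double observed, double[] randomCounts) {
        int n = randomCounts.length;
        double mean = 0;
        for (double c : randomCounts) mean += c;
        mean /= n;
        double variance = 0;
        for (double c : randomCounts) variance += (c - mean) * (c - mean);
        double sd = Math.sqrt(variance / n); // population standard deviation
        if (sd == 0) return Double.NaN;      // undefined when every random count is identical
        return (observed - mean) / sd;
    }
}
```

For random counts {2, 4, 4, 4, 5, 5, 7, 9} (mean 5, standard deviation 2), an observed count of 11 gives z = 3. Computing one z-score per generation method for the same observed network yields numbers that can be compared directly across methods.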
If there is no significant difference between the generation methods, it is still possible that
one of the methods runs faster than the others. If that difference in running time is significant,
the fastest method could still serve as a standard for future network generation. However, if
there is a significant difference in results, a larger issue arises: how can one determine which
network generation method is more accurate?
The applications of this project are wide-ranging because it applies to many fields of
biology (macro and micro). In medicine, it could be essential in detecting certain types of
diseases or cancers; in microbiology, it could help detect essential proteins; it could even be
used to find key niches in ecosystems in zoology. Since all of these fields have massive amounts
of information to process, an approach to finding network motifs that is integrated with MASS
could have an enormous impact.
Currently, the scope of this project is to analyze a network using the MASS library.
Then, several graphs will be generated through the four network generation methods outlined
above using the MASS and JUNG libraries. These graphs will undergo different types of analyses
(using the entire graph for smaller graphs and using samples for larger ones).
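The split between exhaustive analysis for small graphs and sampling for large ones can be illustrated on the simplest pattern, connected three-vertex subgraphs. The sketch below is hypothetical: the cutoff `FULL_ENUMERATION_LIMIT`, the class name, and the uniform random-triple estimator are assumptions chosen for illustration. They are not calls into MASS or JUNG, and the project's actual sampling scheme is not specified here.

```java
import java.util.*;

// Hypothetical sketch: exact counting for small graphs, sampling-based estimate for large ones.
final class TriadCountSketch {
    // Assumed cutoff: graphs with at most this many vertices are enumerated exhaustively.
    static final int FULL_ENUMERATION_LIMIT = 50;

    // A 3-vertex induced subgraph is connected iff it contains at least 2 of its 3 possible edges.
    static boolean connectedTriple(boolean[][] adj, int i, int j, int k) {
        int edges = (adj[i][j] ? 1 : 0) + (adj[j][k] ? 1 : 0) + (adj[i][k] ? 1 : 0);
        return edges >= 2;
    }

    static double countConnectedTriples(boolean[][] adj, int samples, long seed) {
        int n = adj.length;
        if (n <= FULL_ENUMERATION_LIMIT) {
            // Small graph: check every unordered triple.
            long count = 0;
            for (int i = 0; i < n; i++)
                for (int j = i + 1; j < n; j++)
                    for (int k = j + 1; k < n; k++)
                        if (connectedTriple(adj, i, j, k)) count++;
            return count;
        }
        // Large graph: estimate = (fraction of uniformly random triples that are connected) * C(n, 3).
        Random rng = new Random(seed);
        int hits = 0;
        for (int s = 0; s < samples; s++) {
            int i = rng.nextInt(n), j, k;
            do { j = rng.nextInt(n); } while (j == i);
            do { k = rng.nextInt(n); } while (k == i || k == j);
            if (connectedTriple(adj, i, j, k)) hits++;
        }
        double totalTriples = (double) n * (n - 1) * (n - 2) / 6.0;
        return totalTriples * hits / samples;
    }
}
```

Because each sampled triple is drawn uniformly, the scaled fraction is an unbiased estimate of the exact count, and its accuracy improves as the number of samples grows.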
In the future, this project will attempt to detect motifs across multiple graphs using the
network motif processes, and to provide a fully fledged compare-networks class that lets users
choose the types of comparisons and analyses they would like to run.
Resources:
Amala Ghandi’s research paper
Sahand Khakabimamaghani, Iman Sharafuddin, Norbert Dichter, Ina Koch, and Ali Masoudi-Nejad. QuateXelero: An Accelerated Exact Network Motif Detection Algorithm (Article)
Joseph Blitzstein and Persi Diaconis. A Sequential Importance Sampling Algorithm for Generating Random Graphs with Prescribed Degrees (Article)
Bjorn H. Junker and Falk Schreiber. Analysis of Biological Networks (Book)