Efficient Protocols in Complex Networks

Danny Dolev (Hebrew University)
Shlomo Havlin (Bar Ilan University)
(Supported by ISOC)
The Internet is by far the largest distributed system mankind has ever built. The number of nodes
and links is constantly increasing at an amazing speed. The future Internet raises new challenges due
to the scale and seemingly unstructured nature of the network. A fundamental question in Internet
research is which algorithms and protocols are suitable for such a system, and what theoretical ideas
can be used to help improve it.
The project has led to many new observations, algorithms, and theoretical studies that have made a
lasting impact on the research community. Many research papers were written, and the thesis work
of several PhD and MSc students focused on topics covered by this research.
Among other things, we investigated new protocols for search in complex networks. The ability to
perform an efficient search is of great importance in many real systems (e.g., social networks), and in
particular in communication networks. Naive methods that store the global network connectivity at
each node are not scalable, and therefore new schemes have to be considered in which the memory
requirement at each node (the routing table) is bounded, at the cost of a slight decrease in routing
quality. We suggested a new method for searching when such a constraint exists. We assign new
names to nodes ('labelling') based on the path from each node to its closest hub (a node with high
connectivity). The new names are expected to be short, since in real networks each node is within a
very short distance from one of the hubs. We were able to reduce significantly the amount of
information stored at the nodes, such that the required memory scales only logarithmically
with the network size. Due to the special properties of the hub-containing network we analyzed, the
actual paths used in the scheme are very close in length to the shortest ones. We gave theoretical
arguments for why the method is expected to be efficient, and bounded the average length of the
paths taken. We also considered a distributed version of the algorithm. This should make our method
extremely useful for realistic systems such as the Internet.
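To illustrate the idea, here is a minimal sketch of hub-based labelling in Python (using networkx; the
hub fraction and the u -> hub(u) -> hub(v) -> v routing rule are illustrative assumptions, not the
project's actual implementation):

import networkx as nx

def hub_labels(G, hub_fraction=0.01):
    # Take the highest-degree nodes as hubs; the fraction is an
    # illustrative parameter, not a value from the study.
    k = max(1, int(hub_fraction * G.number_of_nodes()))
    hubs = set(sorted(G.nodes, key=G.degree, reverse=True)[:k])
    # One multi-source search gives every node the path from its nearest
    # hub; since nodes sit close to hubs, these labels stay short.
    _, paths = nx.multi_source_dijkstra(G, hubs)
    return paths  # paths[v] = [nearest hub, ..., v]

def route(labels, u, v):
    # Route u -> hub(u), then hub(v) -> v; forwarding between the two
    # hubs over the dense hub sub-network is assumed and omitted here.
    return list(reversed(labels[u])) + labels[v]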
To deepen our understanding of the characteristics of real communication networks, we studied a
new map of the Internet (at the autonomous systems level) that was obtained with exclusive data
from the DIMES project, in which the Internet is measured through the collaboration of many
volunteers distributed world-wide. We introduced and used the method of "k-shell decomposition",
together with methods from percolation theory and fractal geometry, to construct a model for the
structure of the Internet. In the k-shell method, the network nodes are peeled off layer by layer, in
ascending order of their number of connections, exposing deeper and deeper shells. We used results
on the statistics
of the network shells to separate, in a unique and fast way, the Internet into three subcomponents: (i)
a nucleus that is a small (~100 nodes), very well connected globally distributed sub-network; (ii) a
fractal subcomponent that is able to connect the bulk of the Internet without congesting the nucleus,
with self-similar properties and critical exponents predicted from percolation theory; and (iii)
dendrite-like structures, usually isolated nodes that are connected to the rest of the network through
the nucleus only. The resulting structure is illustrated schematically as a jellyfish with the nucleus at
the center. We showed that our method of decomposition is robust and provides insight into the
underlying structure of the Internet and its functional consequences. Our approach of decomposing
the network is general and is also useful when studying other complex networks.
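To make the decomposition concrete, here is a minimal Python sketch (a synthetic scale-free graph
stands in for the AS-level map, which is an assumption; networkx computes the shell indices):

import networkx as nx

# Synthetic scale-free graph as a stand-in for the AS-level map.
G = nx.barabasi_albert_graph(10000, 3, seed=1)

core = nx.core_number(G)   # k-shell index of every node
k_max = max(core.values())
nucleus = {v for v, k in core.items() if k == k_max}

# Remove the nucleus: the giant remaining component plays the role of
# the fractal bulk, while the smaller pieces are the dendrite-like
# structures that reach the network only through the nucleus.
rest = G.subgraph(set(G) - nucleus)
parts = sorted(nx.connected_components(rest), key=len, reverse=True)
bulk, dendrites = parts[0], parts[1:]
print(len(nucleus), len(bulk), sum(len(d) for d in dendrites))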
In an effort to model the transport properties of real and random networks, we studied the
distribution of flow and electrical conductance in these networks. Our model networks are scale-free
networks (random networks with broad distribution of degrees) and Erdos-Renyi networks (random
networks where each link exists independently with small probability, leading to narrow, Poisson
distribution of degrees). Our theoretical analysis for scale-free networks predicts a power-law tail in
the distribution of conductance and flow in the network, and we also found an expression for the
decay exponent. This power-law tail leads to large values of the conductance G, thereby significantly
improving the
transport in scale-free networks, compared to Erdos-Renyi networks where the tail of the
conductivity distribution decays exponentially. We then studied the case of many sources and sinks,
where the transport is defined between two groups of nodes. This corresponds to exchange of files
between many users in parallel. We found a fundamental difference between the conductance and
flow when considering the quality of the transport with respect to the number of sources, due to the
different dependence on the distance travelled. We also found an optimal number of sources, or
users, for the flow case, which is supported by theoretical arguments.
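The contrast between the two ensembles can be explored numerically with a sketch along these lines
(the parameters and the unit link capacities are illustrative assumptions):

import random
import networkx as nx

def flow_samples(G, pairs=200, seed=0):
    # Max-flow between random source-sink pairs with unit capacity per
    # link (an illustrative choice); the flow is then bounded by the
    # endpoint degrees, so broad degrees give a broad flow distribution.
    rng = random.Random(seed)
    nx.set_edge_attributes(G, 1, "capacity")
    nodes = list(G)
    return [nx.maximum_flow_value(G, *rng.sample(nodes, 2))
            for _ in range(pairs)]

n = 2000
sf = nx.barabasi_albert_graph(n, 2, seed=1)   # broad (power-law) degrees
er = nx.gnp_random_graph(n, 4 / n, seed=1)    # narrow (Poisson) degrees
er = er.subgraph(max(nx.connected_components(er), key=len)).copy()

# The scale-free samples show a heavy right tail (hubs sustain large
# flows), while the Erdos-Renyi tail falls off much faster.
print(max(flow_samples(sf)), max(flow_samples(er)))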
A complementary topic we studied is the use of Belief Propagation in a large distributed network. Belief
propagation (BP, a.k.a. sum-product algorithm) message-passing is a powerful and efficient tool in
solving, exactly or approximately, inference problems in probabilistic graphical models. The
underlying essence of estimation theory is to detect a hidden input to a channel from its observed
output. The channel can be represented as a certain graphical model, while the detection of the
channel input is equivalent to performing inference in the corresponding graph. In one part of our
research, the analogy between message-passing inference and estimation is further strengthened,
unveiling a surprising link among disciplines. In spite of its significant role in estimation theory, linear
detection has never been explicitly linked to BP, in contrast to optimal MAP detection and several
sub-optimal nonlinear detection techniques. In this work, we reformulated the general linear detector
as a Gaussian belief propagation (GaBP) algorithm. This message-passing framework is not limited to
the large-system limit and is suitable for channels with arbitrary prior input distribution. Revealing
this missing link allows for a distributed implementation of the linear detector, circumventing the
necessity of potentially cumbersome direct matrix inversion (via, e.g., Gaussian elimination).
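A minimal synchronous GaBP sketch in Python/NumPy conveys the idea (a generic textbook-style
implementation, not the project's code): the symmetric system A x = b, which is the core computation
behind linear detection, is solved by local message passing instead of direct inversion.

import numpy as np

def gabp_solve(A, b, iters=100):
    # Gaussian belief propagation on the graph whose edges are the
    # nonzero off-diagonal entries of A; converges, e.g., for
    # diagonally dominant A.
    n = len(b)
    P = np.zeros((n, n))    # P[i, j]: precision message i -> j
    mu = np.zeros((n, n))   # mu[i, j]: mean message i -> j
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if i == j or A[i, j] == 0:
                    continue
                # Aggregate incoming messages, excluding the one from j.
                Pi = A[i, i] + P[:, i].sum() - P[j, i]
                mi = (b[i] + P[:, i] @ mu[:, i] - P[j, i] * mu[j, i]) / Pi
                P[i, j] = -A[i, j] ** 2 / Pi
                mu[i, j] = Pi * mi / A[i, j]
    # The node marginals give the solution x.
    return np.array([(b[i] + P[:, i] @ mu[:, i]) / (A[i, i] + P[:, i].sum())
                     for i in range(n)])

# Illustrative check on a small diagonally dominant system.
A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
print(gabp_solve(A, b), np.linalg.solve(A, b))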
As a final note, we also studied the problem of distributing streaming media to a large and dynamic
network of nodes. We developed an unstructured solution where stream segments are given priority
based on their real-time scheduling constraints. We model the problem using a probabilistic graphical
model and run a distributed inference protocol for solving it. The solution is used to determine the
best schedule for the transfer of data satisfying the real time constraints. Our protocol is completely
distributed and scalable to very large networks. It doesn’t assume any predefined topology, and is
therefore fault tolerant. Unlike many proposed systems, which are optimized for the efficient
transfer of either bulk data or real-time streaming, we proposed a novel method for the efficient
transfer of both streaming media and bulk data. We demonstrated the applicability of our approach
using simulations over both synthetic topologies and real-life topologies.
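The priority rule can be sketched as follows (a minimal, illustrative fragment with hypothetical names;
in the actual protocol the schedule comes out of the distributed inference, not a local queue): live
segments are served earliest-deadline-first, expired segments are dropped, and bulk data fills the
remaining capacity.

import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Segment:
    deadline: float                     # playback time the segment must meet
    seg_id: int = field(compare=False)
    is_bulk: bool = field(default=False, compare=False)

def next_to_send(queue, now):
    # Earliest-deadline-first: pop the most urgent segment; drop live
    # segments that already missed their deadline; bulk data (deadline
    # set to infinity) is sent only when no live segment is waiting.
    while queue:
        seg = heapq.heappop(queue)
        if seg.is_bulk or seg.deadline >= now:
            return seg
    return None

q = []
heapq.heappush(q, Segment(12.0, 7))
heapq.heappush(q, Segment(9.5, 6))
heapq.heappush(q, Segment(float("inf"), 99, is_bulk=True))
print(next_to_send(q, now=10.0).seg_id)  # -> 7 (segment 6 expired)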