Algorithmic Performance in Complex Networks
Milena Mihail, Georgia Tech

[Figure: e.g. the Internet at the level of Autonomous Systems, which supports the critical BGP routing protocol.]
The Internet is a remarkable phenomenon that involves graph theory in a natural way and gives rise to new questions and models.

Search and routing networks, such as the WWW, the Internet, P2P networks, and ad hoc (mobile, wireless, sensor) networks, are pervasive and scale at an unprecedented rate.
Performance analysis and evaluation in networking: measure parameters that are, hopefully, predictive of performance. This is important in network simulation and design.

We want metrics that are predictive or explanatory of network function.
The graphs of interest are sparse small-world graphs with large degree variance: the average degree is small, but there is no sharp concentration around it, unlike in Erdos-Renyi random graphs.
[Figure: frequency vs. degree (2, 4, 10, 100), compared with Erdos-Renyi.]

Networking questions
Routing: How does delay scale in routing? Does packet drop (blocking) scale? Are network resources used efficiently? Is there load balancing? Does the network evolve towards monopolies?
Searching: How fast can you crawl the WWW? Can you search a P2P network with low overhead? Are there strategies to improve crawling and searching?
Design: How can you maintain a well-connected topology? How about distributed and dynamic networks?
Congestion: take a graph on n nodes and route 1 unit of flow between each pair of nodes, so the total flow is on the order of n^2. Congestion = flow on the most loaded link under optimal routing.

Relevant metric: conductance, which captures "bottlenecks" [Alon 85; Jerrum & Sinclair 88; Leighton & Rao 95].

Second eigenvalue (computationally soft: Matlab handles sparse graphs with 1-2M nodes)
The second eigenvalue of the lazy random walk associated with the adjacency matrix closely approximates conductance. This also says that congestion under link capacities, search time, and sampling time scale smoothly. It is also another point of view of the small-world phenomenon.
[Figure: Internet, signs (+/-) of eigenvector entries; plots at 700, 3000, and 15000 nodes.]
Eigenvectors associated with large eigenvalues are "shadows" of sets with bad conductance.
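To make the two quantities on these slides concrete, here is a small Python sketch. It assumes the usual definition of conductance for a graph with degrees d(v) and vol(S) = sum of d(v) over S: Phi(G) = min over sets S with 0 < vol(S) <= vol(V)/2 of |E(S, V - S)| / vol(S). It computes Phi by brute force (exponential time, so toy graphs only) and estimates the second eigenvalue lambda_2 of the lazy walk P = (I + D^-1 A)/2 by power iteration. With this definition, the standard Cheeger-type inequality for reversible chains gives Phi^2/8 <= 1 - lambda_2 <= Phi. The function names and the test graph are illustrative; for graphs with millions of nodes one would use a sparse eigensolver instead.

```python
from itertools import combinations
import random


def conductance(adj):
    """Brute-force conductance of a small graph.

    adj maps each vertex to a set of neighbours; no isolated vertices.
    Phi(G) = min over S with 0 < vol(S) <= vol(V)/2 of |E(S, V - S)| / vol(S),
    where vol(S) is the sum of degrees in S. Exponential in the number of
    vertices, so only usable on toy examples.
    """
    nodes = list(adj)
    total_vol = sum(len(adj[v]) for v in nodes)
    best = float("inf")
    for k in range(1, len(nodes)):
        for subset in combinations(nodes, k):
            s = set(subset)
            vol = sum(len(adj[v]) for v in s)
            if 2 * vol > total_vol:
                continue  # only the "smaller side" of each cut counts
            cut = sum(1 for v in s for w in adj[v] if w not in s)
            best = min(best, cut / vol)
    return best


def lazy_walk_lambda2(adj, iters=2000, seed=0):
    """Second eigenvalue of the lazy walk P = (I + D^-1 A) / 2 by power iteration.

    P is reversible with stationary distribution pi(v) = deg(v) / vol(V), so we
    work in the pi-weighted inner product and repeatedly project out the
    all-ones eigenvector (eigenvalue 1). Lazy-walk eigenvalues lie in [0, 1],
    so the iteration converges to lambda_2.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    vol = sum(len(adj[v]) for v in nodes)
    pi = {v: len(adj[v]) / vol for v in nodes}
    x = {v: rng.uniform(-1.0, 1.0) for v in nodes}
    lam = 0.0
    for _ in range(iters):
        mean = sum(pi[v] * x[v] for v in nodes)
        x = {v: x[v] - mean for v in nodes}  # remove the eigenvalue-1 component
        norm = sum(pi[v] * x[v] ** 2 for v in nodes) ** 0.5
        x = {v: x[v] / norm for v in nodes}
        px = {v: 0.5 * x[v] + 0.5 * sum(x[w] for w in adj[v]) / len(adj[v])
              for v in nodes}
        lam = sum(pi[v] * x[v] * px[v] for v in nodes)  # Rayleigh quotient
        x = px
    return lam


# Two triangles joined by one edge: an obvious bottleneck.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
phi = conductance(g)
gap = 1 - lazy_walk_lambda2(g)
print(f"conductance={phi:.4f} gap={gap:.4f} cheeger_ok={phi**2 / 8 <= gap <= phi}")
```

The brute-force conductance is there only as a check; the eigenvalue computation is the "computationally soft" route the slide refers to.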
[Figure: the 100 largest eigenvalues, compared with a random graph.]

Network models
Beyond today, we need network models to predict future behavior. What are suitable network models? The Internet grows anarchically, so random graphs are good candidates. Current network models are random graphs that produce power-law degree sequences (thus also matching this important observed data).

Evolutionary: Growth & Preferential Attachment
Vertices arrive one at a time; each new vertex attaches to existing vertices with probability proportional to their current degree.
[Simon 55; Barabasi & Albert 99; Kumar et al. 00; Bollobas & Riordan 01; Bollobas, Riordan, Spencer & Tusnady 01]

Configurational (aka structural) Model
A "random" graph with a given "power-law" degree sequence: given the degrees d_1, ..., d_n, replace each vertex i by d_i minivertices and choose a random perfect matching over the minivertices.
[Bollobas 80s; Molloy & Reed 90s; Aiello, Chung & Lu 00s; Sigcomm/Infocom 00s]
For the degree sequences considered, the resulting graph is connected a.s., and edge multiplicities are O(log n) a.s.

Bounds on Conductance
Technique: probabilistic counting arguments and combinatorics. Difficulty: non-homogeneity in the state space, and dependencies.
Theorem [M, Papadimitriou, Saberi 03]: For a random graph grown with preferential attachment, with each new vertex adding a suitable constant number of edges, the conductance is bounded below by a constant, a.s. Previously: Cooper & Frieze 02.
Theorem [Gkantsidis, M, Saberi 03]: For a random graph in the configurational model arising from a power-law degree sequence with minimum degree at least 3, the conductance is bounded below by a constant, a.s. Independently: Chung, Lu & Vu 03, for a different structural random graph model.

Structural Model, Proof Idea
Difficulty: non-homogeneity in the state space, since not all vertices have the same degree. But the worst case is when all vertices have degree 3.

Growth with Preferential Connectivity Model, Proof Idea
[Figure: vertices in arrival order, 1st through 7th.]
Difficulty: whether there is an edge depends upon the arrival order of all vertices!
Key observation: to bound the conductance of a set S, it suffices to study the combinatorics of how two sequences interleave: the arrival order of the vertices in S and that of the vertices outside S.
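Both random graph models can be sketched in a few lines of Python. The configurational model below shuffles the minivertices and pairs them consecutively, which yields a uniformly random perfect matching; self-loops and multi-edges are kept, as in the model. The preferential attachment generator picks each endpoint with probability proportional to current degree by sampling uniformly from a list of edge endpoints. Its starting graph (a complete graph on d + 1 vertices), the choice of exactly d edges per new vertex, and sampling with replacement are assumptions made for this sketch, not details taken from the cited papers.

```python
import random


def configuration_model(degrees, seed=None):
    """Configurational model: vertex v is split into degrees[v] minivertices,
    and a uniformly random perfect matching of the minivertices gives the
    edges. Returns a list of (u, v) pairs; loops and multi-edges are kept."""
    if sum(degrees) % 2:
        raise ValueError("the degrees must sum to an even number")
    rng = random.Random(seed)
    minivertices = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(minivertices)  # uniform shuffle, then pair consecutive slots
    return [(minivertices[i], minivertices[i + 1])
            for i in range(0, len(minivertices), 2)]


def preferential_attachment(n, d, seed=None):
    """Grow a graph one vertex at a time; each new vertex adds d edges whose
    endpoints are chosen with probability proportional to current degree.
    Assumption: the process starts from a complete graph on d + 1 vertices."""
    if n < d + 1:
        raise ValueError("need n >= d + 1")
    rng = random.Random(seed)
    edges = [(u, v) for u in range(d + 1) for v in range(u + 1, d + 1)]
    endpoints = [x for edge in edges for x in edge]  # v appears deg(v) times
    for t in range(d + 1, n):
        # Uniform choice from `endpoints` is degree-proportional choice.
        targets = [rng.choice(endpoints) for _ in range(d)]
        for v in targets:
            edges.append((t, v))
            endpoints.extend((t, v))
    return edges


def degree_sequence(edges, n):
    """Degrees of vertices 0..n-1; a self-loop contributes 2."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


edges = configuration_model([3, 3, 2, 2, 1, 1], seed=1)
print(edges, degree_sequence(edges, 6))
pa = preferential_attachment(1000, 2, seed=42)
print(max(degree_sequence(pa, 1000)))  # maximum degree; the minimum is at least d = 2
```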
Growth with Preferential Connectivity Model, Proof Idea (continued)
[Figure: vertices in arrival order, 1st through 7th.]
Shifting argument.

Theorem [MM, Papadimitriou, Saberi 03]: For a random graph grown with preferential attachment, there is a poly-time computable flow that routes demand d_u d_v between all pairs of vertices u, v with max link congestion O(n log^2 n), a.s.
Theorem [Gkantsidis, MM, Saberi 03]: For a random graph in the structural model arising from a power-law degree sequence, there is a poly-time computable flow that routes demand d_u d_v between all pairs of vertices u, v with max link congestion O(n log^2 n), a.s.
Note: why is the demand d_u d_v? Each vertex of degree d in the network core serves on the order of d customers from the network periphery.

Networking questions, revisited
Routing: congestion is answered by the theorems above.
Searching (crawling the WWW, low-overhead search in P2P networks, strategies to improve crawling and searching) and design (maintaining a well-connected topology, distributed and dynamic networks): how do they scale?

Searching, Cover Time and Mixing Time
Graph on n nodes. Search the graph by random walk.
Cover time = expected time to visit all nodes.
Mixing time = time to get arbitrarily close to the stationary distribution.

Conductance, Mixing and Cover Time
When the conductance is bounded below by a constant, the random walk mixes rapidly (in O(log n) steps on sparse graphs), which in turn bounds the cover time [Alon 85; Jerrum & Sinclair 88].

Extensions of Cover Time
In practice, when crawling the WWW or searching a P2P network, visiting a node also visits all nodes adjacent to it. This can be implemented by one-step local replication of information.

Cover Time with Look-Ahead One
Theorem [MM, Saberi, Tetali 05]: In the configurational model with a power-law degree sequence, a random walk with look-ahead one discovers the vertices in substantially fewer steps than a plain random walk needs to cover them.
[Figure: proof.]
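The effect of look-ahead can be explored with a small simulation. The sketch below counts random-walk steps until every vertex has been discovered: with lookahead=0 this is the cover time of a single run, and with lookahead=1 (or 2) visiting a vertex also discovers everything within one (or two) hops, modeling one-step local replication. The function name and parameters are hypothetical. On a power-law graph the high-degree vertices reveal many neighbors at once, which is the intuition behind the improvement, but this simulation is not the analysis of [MM, Saberi, Tetali 05].

```python
import random


def steps_to_discover(adj, start, lookahead=0, seed=None, max_steps=10**7):
    """Random-walk steps until every vertex of `adj` is discovered.

    adj maps each vertex to a list of neighbours. lookahead=0: a vertex is
    discovered only when the walk visits it (one sample of the cover time).
    lookahead=k: visiting v also discovers every vertex within k hops of v.
    """
    rng = random.Random(seed)
    discovered = set()

    def discover(v):
        # Mark v and everything within `lookahead` hops of v as discovered.
        discovered.add(v)
        frontier = {v}
        for _ in range(lookahead):
            frontier = {w for u in frontier for w in adj[u]}
            discovered.update(frontier)

    v = start
    discover(v)
    steps = 0
    while len(discovered) < len(adj):
        if steps >= max_steps:
            raise RuntimeError("not all vertices discovered; is the graph connected?")
        v = rng.choice(adj[v])  # one step of the simple random walk
        steps += 1
        discover(v)
    return steps


ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
for k in (0, 1, 2):
    print(k, steps_to_discover(ring, start=0, lookahead=k, seed=7))
```

Because the runs share a seed, they follow the same walk, so a larger look-ahead never needs more steps; on a ring the saving is only additive, whereas heavy-tailed degrees are where look-ahead pays off.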
Related work: Adamic et al. 02; Chawathe et al. 03; Gkantsidis, MM, Saberi 05.

Cover Time with Look-Ahead Two
Theorem [MM, Saberi, Tetali 05]: In the configurational model, an analogous bound holds on the number of steps a random walk with look-ahead two needs to discover the vertices.
[Figure: proof.]

Networking questions, revisited
Routing: congestion, answered above.
Searching: cover time; local replication (look-ahead) offers substantial improvement.
Design: how can you maintain a well-connected topology in distributed and dynamic networks?

The Case of Peer-to-Peer Networks
Must maintain a well-connected topology, e.g. a graph with good conductance, such as a random graph.
Distributed and decentralized: n nodes, d-regular graph. Each node has O(polylog n) resources and knows only a very small neighborhood around itself.
Search for content, e.g. by flooding or random walk.

P2P Networks Are Constantly Randomizing Their Links
Gnutella constantly drops existing connections and replaces them with new ones. There are between 5 and 30 requests for new connections per second per client. About 1% of these requests are satisfied, and existing links are dropped. The network works "in panic", trying to randomize, thereby avoiding network configurations with bottlenecks and maintaining high conductance.

P2P Network Topology Maintenance by Constant Randomization
Theorem [Cooper, Frieze & Greenhill 04]: The Markov chain corresponding to a general 2-link switch on d-regular graphs is rapidly mixing.
Locality: in reality, the network can only switch links that are within constant distance of each other.
Theorem [Feder, Guetz, M, Saberi 06]: Mixing remains rapid even under local 2-link switches, or flips.
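The two randomizing moves can be sketched on an adjacency-set representation. `switch_step` performs one step of a general 2-link switch: two uniformly random edges {a, b}, {c, d} become {a, d}, {c, b}. `flip_step` is a local variant, assumed here in one common form: the two switched edges must be joined by a third edge {b, c}, and {a, b}, {c, d} become {a, c}, {b, d}. Because {b, c} survives, a flip never disconnects the graph. Moves that would create a self-loop or a multi-edge are rejected and leave the graph unchanged (a lazy step). The helper names and the exact move distributions are illustrative assumptions, not the precise chains analyzed by Cooper, Frieze & Greenhill or by Feder, Guetz, M, Saberi.

```python
import random


def _edges(adj):
    """Each undirected edge once, as a sorted list of (u, v) with u < v."""
    return sorted((u, v) for u in adj for v in adj[u] if u < v)


def _rewire(adj, old, new):
    for u, v in old:
        adj[u].remove(v)
        adj[v].remove(u)
    for u, v in new:
        adj[u].add(v)
        adj[v].add(u)


def switch_step(adj, rng):
    """General 2-link switch; returns True if the move was accepted."""
    (a, b), (c, d) = rng.sample(_edges(adj), 2)
    if rng.random() < 0.5:
        c, d = d, c  # pick one of the two possible rewirings
    if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
        return False  # would create a self-loop or a multi-edge
    _rewire(adj, [(a, b), (c, d)], [(a, d), (c, b)])
    return True


def flip_step(adj, rng):
    """Local flip along a path a-b-c-d; the middle edge {b, c} is kept."""
    b, c = rng.choice(_edges(adj))
    if rng.random() < 0.5:
        b, c = c, b
    a_choices = sorted(adj[b] - {c})
    d_choices = sorted(adj[c] - {b})
    if not a_choices or not d_choices:
        return False
    a, d = rng.choice(a_choices), rng.choice(d_choices)
    if len({a, b, c, d}) < 4 or c in adj[a] or d in adj[b]:
        return False
    _rewire(adj, [(a, b), (c, d)], [(a, c), (b, d)])
    return True


def is_connected(adj):
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)


def circulant(n, offsets):
    """Circulant graph: vertex i is joined to i +/- s (mod n) for s in offsets."""
    return {i: {(i + s) % n for s in offsets} | {(i - s) % n for s in offsets}
            for i in range(n)}


rng = random.Random(0)
g = circulant(12, (1, 2))  # a connected 4-regular starting graph
accepted = sum(flip_step(g, rng) for _ in range(300))
print(accepted, is_connected(g), all(len(nb) == 4 for nb in g.values()))
```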
The Proof Is a Markov Chain Comparison Argument
S: the general 2-link switch Markov chain on the space of d-regular graphs. SC: the local flip Markov chain on the space of connected d-regular graphs.
Map the transitions of S to the transitions of SC with small load, where load = max number of transitions of S mapped to a single transition of SC.
Natural mapping from S to SC: map a switch between u and v to a path of local flip switches. Problem: the length of the path is unbounded.
Key construction: a mapping such that each edge in S maps to a constant number of edges in SC.

P2P Dynamic Network Construction
Problem: as nodes arrive, how should they connect so that the topology remains well connected?

P2P Dynamic Network Topology Construction by Random Walk
Theorem [Law & Siu 03]: Construct a constant-degree expander on n vertices with overhead O(log n) per node addition.
[Figures: the construction by random walk, step by step.]
Gkantsidis, MM, Saberi 04: a heuristic reminiscent of saving random bits in the simulation of BPP [AKS87, ZI89, G95], with overhead O(1) per new node addition.

Networking questions, summary
Routing (delay, packet drop, efficient use of resources, load balancing, monopolies): conductance and congestion.
Searching (crawling the WWW, low-overhead search in P2P networks, strategies to improve crawling and searching): cover time.
Design (maintaining a well-connected topology, distributed and dynamic networks): mixing time.

Networking is growing at an unprecedented rate, and it is rich with algorithmic questions. In particular, it raises new questions related to expander graphs.
How can we maintain an expander in a distributed way under dynamic settings of arriving and departing nodes? Can we develop efficient distributed algorithms that discover critical links in the network?
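One way to grow a regular topology under node arrivals using random walks, in the spirit of the dynamic construction by random walk discussed above, is sketched below as a hypothetical procedure. A new node starts walks of length `walk_len` at a bootstrap node; each time a walk ends on an edge {x, y} not already touching the new node, the node splices itself in, replacing {x, y} by {x, new} and {new, y}. Every existing node keeps its degree, the new node ends with degree d (d even), and connectivity is preserved. The names, the walk length, and the splicing rule are assumptions; this is neither the Law & Siu construction nor the O(1)-overhead heuristic of Gkantsidis, MM, Saberi 04. Each join costs about (d/2)(walk_len + 1) walk steps plus retries, so taking walk_len on the order of log n gives a per-join overhead of order log n.

```python
import random


def join_by_random_walks(adj, new, d, start, walk_len, rng, max_tries=10_000):
    """Hypothetical join: add vertex `new` to the d-regular graph `adj`
    (dict: vertex -> set of neighbours) by splicing it into d/2 edges found
    by random walks from `start`. Returns the number of walks used."""
    if d % 2:
        raise ValueError("d must be even: each splice gives the new node two edges")
    adj[new] = set()
    tries = 0
    while len(adj[new]) < d:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not find enough suitable edges")
        x = start
        for _ in range(walk_len):  # random walk from the bootstrap node
            x = rng.choice(sorted(adj[x]))
        y = rng.choice(sorted(adj[x]))  # an edge {x, y} at the walk's end
        if new in (x, y) or x in adj[new] or y in adj[new]:
            continue  # splicing here would create a self-loop or multi-edge
        adj[x].remove(y)
        adj[y].remove(x)
        adj[x].add(new)
        adj[y].add(new)
        adj[new].update((x, y))
    return tries


def is_connected(adj):
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)


rng = random.Random(1)
# Start from a 4-regular circulant graph on 10 nodes (neighbours i +/- 1, i +/- 2).
g = {i: {(i + s) % 10 for s in (1, 2, -1, -2)} for i in range(10)}
walks = [join_by_random_walks(g, new, d=4, start=0, walk_len=6, rng=rng)
         for new in range(10, 40)]
print(len(g), all(len(nb) == 4 for nb in g.values()), is_connected(g), sum(walks))
```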