Master’s Seminar
November 2012

Modelling, Mining, and Searching Networks
Anthony Bonato
Ryerson University
Networks - Bonato

21st Century Graph Theory: Complex Networks
• web graph, social networks, biological networks, internet networks, …

• a graph G = (V(G),E(G)) consists of a nonempty set of vertices or nodes V, and a set of edges E
[Figure: a graph with its nodes and edges labelled]
• directed graphs (digraphs)

Degrees
• the degree of a node x, written deg(x), is the number of edges incident with x
First Theorem of Graph Theory: $\sum_{x \in V(G)} \deg(x) = 2\,|E(G)|$

The web graph
• nodes: web pages
• edges: links
• over 1 trillion nodes, with billions of nodes added each day
[Figure: sample of the web graph with pages labelled Ryerson, Nuit Blanche, City of Toronto, Four Seasons Hotel, Frommer’s, Greenland Tourism]

Small World Property
• small world networks introduced by social scientists Watts & Strogatz in 1998
– low distances between nodes

Power laws in the web graph
• power law degree distribution: $N_{i,n} \approx i^{-b}\, n$ for some $b > 2$ (Broder et al, 01)
– here $N_{i,n}$ is the number of nodes of degree $i$ in a graph with $n$ nodes

Geometric models
• we introduced a stochastic network model which simulates power law degree distributions and other properties
– Spatially Preferred Attachment (SPA) Model
• nodes have a region of influence whose volume is a function of their degree

SPA model (Aiello, Bonato, Cooper, Janssen, Prałat, 09)
• as nodes are born, they are more likely to enter a region of influence with larger volume (degree)
• over time, a power law degree distribution results

Biological networks: proteomics
nodes: proteins
edges: biochemical interactions
Yeast: 2401 nodes, 11000 edges

Protein networks
• proteins are essential macromolecules of life
• understanding their function and role in disease is of importance
• protein-protein interaction networks (PPI)
– nodes: proteins
– edges: biochemical interaction
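The First Theorem of Graph Theory from the Degrees slide can be checked on a small example. The Python sketch below assumes a simple undirected graph (no loops or repeated edges) given as an edge list. The node names A to D and the edges are hypothetical, chosen only for illustration, and do not come from any real protein interaction data.

```python
from collections import defaultdict

# Hypothetical toy network: node names and edges are illustrative only,
# not taken from any real data set. Assumes a simple undirected graph.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def degrees(edge_list):
    """Return a dict mapping each node to the number of edges incident with it."""
    deg = defaultdict(int)
    for u, v in edge_list:
        deg[u] += 1  # the edge uv is incident with u
        deg[v] += 1  # ... and with v
    return dict(deg)


deg = degrees(edges)
degree_sum = sum(deg.values())

# First Theorem of Graph Theory: the degrees sum to twice the number of edges,
# because each edge is counted once at each of its two endpoints.
handshake_holds = degree_sum == 2 * len(edges)
print(deg, degree_sum, handshake_holds)
```

Each edge contributes exactly 1 to the degree of each of its two endpoints, which is why the total is always twice the edge count.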
Dominating sets in PPI (Milenkovic, Memisevic, Bonato, Przulj, 2011)
• dominating sets in graphs
• we found that dominating sets in PPI networks are vital for normal cellular functioning and signalling
– dominating sets capture biologically vital proteins and drug targets
– might eventually lead to new drug therapies

Social Networks
nodes: people
edges: social interaction (e.g. friendship)

On-line Social Networks (OSNs)
Facebook, Twitter, LinkedIn, Google+…

Lady Gaga is the centre of Twitterverse
[Figure: Twitter network with nodes labelled Dalai Lama, Arnold Schwarzenegger, Queen Rania of Jordan, Anderson Cooper, Lady Gaga]

6 degrees of separation
• Stanley Milgram: famous chain letter experiment in 1967

6 Degrees in Facebook?
• 1 billion users, > 70 billion friendship links
• (Backstrom et al., 2012)
– 4 degrees of separation in Facebook
– when considering another person in the world, a friend of your friend knows a friend of their friend, on average
• similar results for Twitter and other OSNs

Dimension of an OSN
• dimension of OSN: minimum number of attributes needed to classify nodes
• like game of “20 Questions”: each question narrows range of possibilities
• what is a credible mathematical formula for the dimension of an OSN?

GEO-P model (Bonato, Janssen, Prałat, 2012)
• reverse engineering approach
– given network data, the GEO-P model predicts the dimension of an OSN; i.e.
the smallest number of attributes needed to identify users
• that is, given the graph structure, we can (theoretically) recover the social space

6 Dimensions of Separation
OSN        Dimension
YouTube    6
Twitter    4
Flickr     4
Cyworld    7

Cops and Robbers
[Figure: three cops (C) and one robber (R) on a graph, shown over successive moves]
cop number c(G) ≤ 3

Cops and Robbers
• played on reflexive undirected graphs G
• two players, cops C and robber R, play at alternate time-steps (cops first) with perfect information
• players move to vertices along edges; they may move to a neighbour or pass
• cops try to capture (i.e. land on) the robber, while the robber tries to evade capture
• the minimum number of cops needed to capture the robber is the cop number c(G)
– well-defined as c(G) ≤ |V(G)|

Applications of Cops and Robbers
• moving target search
– missile-defense
– gaming
• counter-terrorism
– intercepting messages or agents

How big can the cop number be?
• if the graph G with order n is disconnected, then the cop number can be as large as n
• if G is connected, then no one knows how big the cop number can be!
• Meyniel’s Conjecture: $c(G) = O(n^{1/2})$.

Example of a variant
The robber fights back!
• robber can attack neighbouring cop
[Figure: graph with three cops (C) and one robber (R)]
• one more cop needed in this graph (check)
• Conjecture: For any graph with this modified game, one more cop is needed than for the usual cop number.

Thesis topics
• what precisely is a community in a complex network?
• biological network models
– more exploration of dominating sets in PPI
• fit GEO-P model to OSN data
– machine learning techniques
• new models for complex networks
• Cops and Robbers games
– Meyniel’s conjecture, random graphs, variations: good vs bad guy games in graphs

Good guys vs bad guys games in graphs
[Table: games classified by the speed of the good player against the speed of the bad player (slow, medium, fast, helicopter). Games listed: traps, tandem-win; eternal security; robot vacuum; Cops and Robbers; edge searching; cleaning; distance k Cops and Robbers; Cops and Robbers on disjoint edge sets; The Angel and Devil; seepage; Helicopter Cops and Robbers, Marshals, The Angel and Devil, Firefighter; Hex]

Brief biography
• over 80 papers, two books, two edited proceedings, with 40 collaborators (many of whom are my students)
• over 250K in research funding in past 6 years
– grants from NSERC, Mprime, and Ryerson
• supervised 8 masters students, 2 doctoral, and 7 postdocs
• over 30 invited addresses world-wide (India, China, Europe, North America)
• won 2011 and 2009 Ryerson Research awards
• Editor-in-Chief of journal Internet Mathematics; editor of Contributions to Discrete Mathematics

AM8204 – Topics in Discrete Mathematics
• Winter 2012
• 6 weeks each: complex networks, graph searching
• project based
• Prerequisite: AM8002 (or permission from me)

Graphs at Ryerson (G@R)