MINFS544: Network-based Business Intelligence (BI)
Feb 19th, 2013
Daning Hu, Ph.D., Department of Informatics, University of Zurich

Stop Contagious Failures in Banking Systems
• During the 2008 financial tsunami, which bank(s) should we inject capital into first to stop contagious failures in bank networks?
• (F. Schweitzer et al., Science 2009)

Utilize Peer Influence in Online Social Networks
• Intelligent Advertising, Product Recommendation
• Who are the most influential people?
• What are the patterns of information diffusion?

Develop Strategies to Attack Terrorist Networks
• A Global Salafi Jihad Terrorist Network (Hu et al., JHSEM 2009)
• How to effectively break down a terrorist network?

Network-based Business Intelligence
• Network-based (Modeling and Analysis)
– Modeling and analyzing various real-world social and organizational networks to understand:
   – the cognitive and economic behaviors of the network actors; and
   – the dynamic processes behind the network evolution
• Based on the above…
• Business Intelligence (BI)
– Design network-based BI algorithms and information systems to provide decision support in various application domains: Financial Risk Management, Security Informatics, Knowledge Management, etc.
– Network Analysis, Simulation of Network Evolution, Data Mining, etc.

MINFS544: Network-based Business Intelligence
• Lecturer: Dr. Daning Hu; Teaching Assistant: Dr. Jiaqi Yan
• Email: hdaning@ifi.uzh.ch, jackiejqyan@gmail.com
• Credits: 3 ECTS credits
• Class Meetings: Tue 14:00–15:45 or Thu 10:15–12:00 (please see the schedule)
• Language: English
• Audience: Master and doctoral students
• Office Hours: Tue 13:00–14:00, Room 2.A.12
• Grading: Course report (term paper) 70%, presentation 20%, participation 10%

Grading
• 1. A full research paper (70%). The format of this paper can be found at: http://icis2012.aisnet.org/index.php/submissions
– If possible, get it published in ICIS 2013 and get it cited.
• This paper should include answers to the following questions:
– What is the problem?
– Why is it interesting and important?
– Why is it hard? Why have previous approaches failed?
– What are the key components of your approach?
– What 1) models, 2) data sets and 3) metrics will be used to validate the approach?

Grading
• 2. Oral presentation of the paper (using slides) + Q&A (20%)
• For presentations, please see the slides on "How to give a good research talk" at:
• http://research.microsoft.com/en-us/um/people/simonpj/papers/giving-a-talk/giving-a-talk.htm
• 3. Active participation and interaction (10%)

Course Schedule
Date                      Event
19.02.13                  Course introduction, kick-off meeting
20.02.13 – 04.03.13       One-to-one meeting
05.03.13                  Research method tutorial
06.03.13 – 18.03.13       One-to-one meeting on research progress
19.03.13                  Lecture, feedback and discussion
20.03.13 – 10.04.13       Writing Research-in-Progress Paper (RIP)
11.04.13                  Lecture, feedback and discussion
12.04.13 – 02.05.13       Writing full paper
25.04.13, 16.05.13        Presentation days, feedback and discussion
29.05.13                  Final paper due

Deliverables:
• 1-page summary due
• 3-page literature review due
• 5–8 page RIP due
• 8–12 page full paper due (first deadline)

A Brief History of Network Science
• 1736: Mathematical foundation – Graph Theory
• 1930: Social Network Analysis and Theories
– Sociogram: Network visualization
– Six degrees of separation
– Structural hole: Source of innovation
• 1990: (Physicists) Complex Network Topologies
– Small-world model (e.g., WWW)
– Scale-free model ("Rich get richer")
• 2000: Network Science
– Economic networks (Agent modeling & simulation)
– Dynamic network analysis
– BI applications: product diffusion in social media, recommendation systems
• 2012: ?

Outline
• Introduction
• Dynamic Analysis of Dark Networks
– A Global Salafi Jihad (GSJ) Terrorist Network
– A Narcotic Criminal Network
• A Network Approach to Managing Bank Systemic Risk
• Ongoing Work
• Conclusion

Dynamic Network Analysis (DNA)
Studying dynamic link formation processes behind network evolution.
• Nodes forming links → Network Evolution
• What: Model the changes in network evolution
– Temporal changes in network topological measures
– Dynamic network recovery on longitudinal data
• Why: Statistical analysis of determinants behind link formation
– Homophily
– Preferential attachment
– Shared affiliations
• How: Simulate the evolution of networks
– Agent-based Modeling and Simulation
– Examine network robustness

Research Testbed: A Global Terrorist Network
• The Global Salafi Jihad (GSJ) network data was compiled by Dr. Marc Sageman, a former CIA operations officer
– 366 terrorists
– friendship, kinship, same religious leader, operational interactions, etc.
– geographical origins, socio-economic status, education, etc.
– when they joined and left GSJ
• The goals of the dynamic analysis
– gain insights into the evolution of the GSJ network
– develop effective attack strategies to break down the GSJ network
• Sample data of GSJ terrorists

Dynamic Network Analysis
Studying dynamic processes (i.e., link formation) behind network evolution.
• Nodes’ behaviors → Network Evolution
• What: Model the changes in network evolution
– Temporal changes in network topological measures
– Dynamic network recovery on longitudinal data
• Why: Statistical analysis of determinants behind link formation
– Homophily
– Preferential attachment
– Shared affiliations
• How: Simulate the evolution of networks
– Agent-based Modeling and Simulation
– Examine network robustness

Temporal Changes in Network-level Measures
[Fig. 1 plots: (a) average degree <k> by year, 1989–2003; (b) degree distribution (probability of degree vs. degree) for 1990, 1991 and 1993, with a Poisson curve; (c) degree distribution for 1995, 1997 and 1999]
Fig. 1. The temporal changes in (a) the average degree and (b), (c) the degree distribution
• Degree = number of links a node has

Findings
• There are three stages in the evolution of the GSJ network:
– 1989–1993, the emerging stage: the network grows in size
   – Accelerated growth: the number of edges increases faster than the number of nodes
   – Random network topology (Poisson degree distribution)
– 1994–2000, the mature stage: the size of the network reached its peak in 2000
   – Scale-free topology (power-law degree distribution)
– 2001–2003, the disintegration stage: the network falls apart into small disconnected components after 9/11

Temporal Changes in Node Centrality Measures
[Fig. 2 plots: Degree and Betweenness by year, 1989–2002]
Figure 2. Temporal changes in Degree and Betweenness centrality of Osama Bin Laden
• Degree: number of links a node has
• Betweenness of a node i: number of shortest paths from all nodes to all others that pass through node i
– Measures i’s influence on the traffic (information, resources) flowing through it

Findings and Possible Explanations
• 1994–1996: A sharp decrease in Bin Laden’s Betweenness
– 1994: Saudi Arabia revoked his citizenship and expelled him
– 1995: Went to Sudan and was expelled again under U.S. pressure
– 1996: Went to Afghanistan and established camps there
• 1998–1999: Another sharp decrease in his Betweenness
– After the 1998 bombings of U.S. embassies, Bill Clinton ordered a freeze on assets linked to bin Laden (top 10 most wanted)
– August 1998: A failed U.S. assassination attempt on him
– 1999: The UN imposed sanctions against Afghanistan to force the Taliban to extradite him

Research Testbed: A Narcotic Criminal Network
• The COPLINK dataset contains 3 million police incident reports from the Tucson Police Department (1990 to 2006).
– 3 million incident reports and 1.44 million individuals
– Their personal and sociological information (age, ethnicity, etc.)
– Time information: when two individuals co-offend
• AZ inmate affiliation data: when and where an inmate was housed
• A Narcotic Criminal Network
– 19,608 individuals involved in organized narcotic crimes
– 29,704 co-offending pairs (links)

Table 1. Summary of the COPLINK dataset and the Arizona inmate dataset
Dataset                                                      Number of People    Time Span
COPLINK Narcotic Data                                        36,548              1990–2006
Arizona Inmate Data                                          165,540             1985–2006
Overlapped (identified by first name, last name and DOB)     19,608              17 years

Statistical Analysis of Determinants for Link Formation
• Proportional hazards model (Cox Regression Analysis)
   $h(t, x_1, x_2, x_3, \ldots) = h_0(t)\exp(b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots)$
• Homophily in age (group) and race
• Shared affiliations:
– Mutual acquaintances (through crimes)
– Vehicle affiliation (same vehicle used by two people in different crimes)
[Fig. 3 covariate labels: vehicle, mutualacq]
Fig. 3. Results of multivariate survival (Cox regression) analysis of triadic closure (link formation).
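
The proportional hazards model above can be illustrated with a small numeric sketch. Because the baseline hazard h0(t) is shared by all pairs, it cancels when two pairs are compared, so candidate pairs can be ranked by exp(b1x1 + b2x2 + ...) alone. The coefficient values, covariate names and candidate pairs below are hypothetical and chosen only for illustration; they are not the estimates behind Fig. 3.

```python
import math

# Hypothetical coefficients b_k (illustration only, NOT the fitted values).
COEFS = {
    "same_age_group": 0.3,
    "same_race": 0.5,
    "mutual_acquaintances": 1.0,
    "shared_vehicle": 2.0,
}

def relative_hazard(x, coefs=COEFS):
    """Return exp(b1*x1 + b2*x2 + ...), i.e. h(t, x) / h0(t)."""
    return math.exp(sum(b * x.get(name, 0) for name, b in coefs.items()))

def rank_pairs(pairs, coefs=COEFS):
    """Sort candidate pairs by relative hazard, highest first."""
    return sorted(pairs, key=lambda p: relative_hazard(pairs[p], coefs),
                  reverse=True)

# Hypothetical candidate pairs (suspect, other person) and their covariates.
pairs = {
    ("suspect", "A"): {"same_age_group": 1, "same_race": 1},
    ("suspect", "B"): {"mutual_acquaintances": 2},
    ("suspect", "C"): {"same_race": 1, "shared_vehicle": 1},
}

for pair in rank_pairs(pairs):
    print(pair, round(relative_hazard(pairs[pair]), 2))
```

For a single binary covariate, exp(b) is the hazard ratio: with the made-up value b = 2.0 for a shared vehicle, a pair that shares a vehicle forms a link at about exp(2.0) ≈ 7.4 times the rate of an otherwise identical pair.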
[Fig. 3, continued: covariate labels age, race, gender; axis: Hazard Ratio, 0 to 40]

BI Application: Co-offending Prediction in COPLINK
• IBM’s COPLINK is an intelligent police information system that aims to help speed up the crime detection process.
• COPLINK calculates the co-offending likelihood score based on the proportional hazards model.
• It returns a ranked list of individuals based on their predicted likelihood of co-offending with the suspect under investigation.
Fig. 4. Screenshots of the COPLINK system

Simulate Attacks on Dark Networks
• Three attack (i.e., node removal) strategies:
– Attack on hubs (highest degrees)
– Attack on bridges (highest betweenness)
– Real-world attack (attack order based on real-world data)
• Simulate two types of attacks to examine the robustness of the dark networks:
– Simultaneous attacks (static): the degree/betweenness of nodes is NOT updated after each removal
– Progressive attacks (dynamic): the degree/betweenness of nodes is updated after each removal

Hub vs. Bridge Attacks
• Both hub and bridge attacks are far more effective than real-world arrests. Policy implications?
• Both dark networks are more vulnerable to bridge attacks than to hub attacks.
– Bridge (highest betweenness): field lieutenants, operational leaders, etc.
– Hub (highest degree): e.g., Bin Laden
[Figure: GSJ network. S and <s> versus fraction of nodes removed (0 to 1), with curves for S (Hub attacks) and S (Bridge attacks)]

Summary and Contributions
• We developed a set of Dynamic Network Analysis (DNA) methods that are effective in:
– Linking network topological changes to analytical insights
– Systematically capturing the link formation processes
– Examining the determinants of link formation
• Dark networks are robust against real-world attacks but vulnerable to targeted bridge attacks
• COPLINK provides real-time decision support for fighting crime.

Research Readings and Resources
• 1. Networks Overview:
– Statistical mechanics of complex networks, Section III, VI: http://rmp.aps.org/abstract/RMP/v74/i1/p47_1
– Networks, Crowds, and Markets: http://www.cs.cornell.edu/home/kleinber/networks-book/
• 2. Networks in Finance:
– Financial Networks blog and research databases:
   – WRDS database
   – http://www.financialnetworkanalysis.com/research-database/
   – http://www.stern.nyu.edu/networks/electron.html
– Company Board Social Networks
• 3. Networks in Marketing:
– Sinan Aral’s research in networks and marketing (peer influence): http://web.mit.edu/sinana/www/
– Social Media based Marketing: http://searchengineland.com/guide/what-is-social-media-marketing
• 4. Recommender Systems:
– http://www-cs-students.stanford.edu/~adityagp/recom.html
• 5. Word-of-Mouth Effects in Social Networks:
– http://papers.ssrn.com/sol3/papers.cfm?abstract_id=393042&
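
The hub and bridge attack strategies from the "Simulate Attacks on Dark Networks" slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code behind the reported results: the nine-node toy graph is hypothetical, betweenness is computed with Brandes' algorithm (the slides do not say how it was computed), and S is assumed to be the fraction of the original nodes that remain in the largest connected component.

```python
from collections import deque

def degree(adj):
    """Degree: number of links each node has."""
    return {u: len(nbrs) for u, nbrs in adj.items()}

def betweenness(adj):
    """Shortest-path betweenness (Brandes' algorithm, unweighted, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice

def largest_component_fraction(adj, n_original):
    """S (assumed definition): share of original nodes in the largest component."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / n_original

def attack(adj, score, n_remove, progressive):
    """Remove the n_remove top-scoring nodes; return S after each removal.

    progressive=False: simultaneous (static) attack, scores computed once.
    progressive=True:  scores recomputed after every removal (dynamic).
    """
    adj = {u: set(nbrs) for u, nbrs in adj.items()}  # work on a copy
    n = len(adj)
    ranking = score(adj)
    static_order = sorted(ranking, key=lambda u: (-ranking[u], u))
    curve = [largest_component_fraction(adj, n)]
    for step in range(n_remove):
        if progressive:
            ranking = score(adj)
            target = min(ranking, key=lambda u: (-ranking[u], u))
        else:
            target = static_order[step]
        for nbr in adj.pop(target):
            adj[nbr].discard(target)
        curve.append(largest_component_fraction(adj, n))
    return curve

# Hypothetical toy network: two 4-node cliques joined through node 4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4),
         (4, 5), (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8)]
toy = {u: set() for u in range(9)}
for a, b in edges:
    toy[a].add(b)
    toy[b].add(a)

print("hub attack:   ", attack(toy, degree, 2, progressive=True))
print("bridge attack:", attack(toy, betweenness, 2, progressive=True))
```

On this toy graph the top bridge (node 4, degree 2 but the highest betweenness) is not the top hub (node 3). Removing node 4 leaves a largest component of 4/9 of the nodes, while removing node 3 leaves 5/9, which mirrors on a small scale the slide's point that bridge attacks can fragment a network faster than hub attacks.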