ppt - Computer Science - University of Birmingham

advertisement
Mining Social Networks
www.the-data-mine.co.uk
Dr Andy Pryke
Andy@the-data-mine.co.uk
Commercial Programming Lecture
October 2011
Contents
www.the-data-mine.co.uk
 What are Social Networks
 Why Analyse Them?
 Analysis Techniques
 Example Applications
Social Network Analysis
www.the-data-mine.co.uk
 Also called Organizational Network Analysis
 Pre-dates data mining.
 Developed by sociologists and
anthropologists
 Formalise their understanding of family and
community relationships.
What is a Network
www.the-data-mine.co.uk
 Referred to technically as a "graph".
 Each person (or organisation etc.) is represented as a
node.
 Visually this is normally a dot or square.
 Connections are called “links” or “edges”
 Represented as a line.
 Indicates communications (e.g. emails), purchases, visits, or less
tangible things such as emotional relationships.
 Can be “directed” or “undirected”
e.g. On Twitter, you follow Stephen Fry, but he doesn’t follow you!
www.the-data-mine.co.uk
Source: Erickson Data Blog
Email communication Graph
www.the-data-mine.co.uk
Nodes = People
Links = Emails
Source: orgnet.com
Example - Mapping Links between Blogs
www.the-data-mine.co.uk
1 - Daily Kos
2 - BoingBoing
3 - LiveJournal Users
4 - Highly Interlinked Blogs
5 - Porn Blogs - not linked in
6 - Sports Blogs - Separate but connected
Sources:
http://discovermagazine.com/2007/may/map-welcome-to-the-blogosphere
http://datamining.typepad.com/gallery/blog-map-gallery.html
Example - Twitter Social Network
www.the-data-mine.co.uk
Source: Bruno Peeters
http://bvlg.blogspot.com/
2007/04/twitter-vrienden.html
www.the-data-mine.co.uk
Video
Nicholas Christakis
The hidden influence of social networks
TED Talk, Feb 2010
Applications of Social Network DM
www.the-data-mine.co.uk
Typical applications of social network analysis and data mining:
 Detection of criminal activity, Counter terrorism, "homeland
security" and intelligence
 Analysis of relationships within companies
 Sociological and anthropological studies
 Reciprocal trust schemes such as e-bay ratings
 Recommended friends on Facebook
 Filter or recommend social media content
 Etc….
Complex Network Example
www.the-data-mine.co.uk
Complex Network Example
www.the-data-mine.co.uk
www.the-data-mine.co.uk
How do we
Analyse Networks?
Graph Statistics - Individual Nodes
www.the-data-mine.co.uk
 Degree Centrality
 Number of connections to other nodes.
 High values mean many connections.
 Can measure links in and out separately
 Applications….
Graph Statistics - Individual Nodes
www.the-data-mine.co.uk
 Degree Centrality
 Number of connections to other nodes.
 High values mean many connections.
 Can measure links in and out separately
 Applications
 Who is most listened to on Twitter?
 Who has most contacts within a company?
 Which user’s reviews influence others the most?
Graph Statistics - Individual Nodes
www.the-data-mine.co.uk
 Closeness Centrality
 The average number of steps required to reach any
other node. Communications are easier if you don't
have to go through too many people.
 Applications...
Graph Statistics - Individual Nodes
www.the-data-mine.co.uk
 Closeness Centrality
 The average number of steps required to reach any
other node. Communications are easier if you don't
have to go through too many people.
 Applications
 Is this person central to the group?
 Is your message likely to reach the audience?
Graph Statistics - Individual Nodes
www.the-data-mine.co.uk
 Betweenness Centrality
 How much of a link between other nodes is this
node?
 Applications…
Graph Statistics - Individual Nodes
www.the-data-mine.co.uk
 Betweenness Centrality
 How much of a link between other nodes is this
node?
 Applications
 Someone who has a high betweenness centrality is
often a broker between others.
 What happens if this person leaves the network?
Graph Statistics - Networks as a Whole
www.the-data-mine.co.uk
 Structural holes
 Gaps in linkage between groups.
 Applications…
Graph Statistics - Networks as a Whole
www.the-data-mine.co.uk
 Structural holes
 Gaps in linkage between groups.
 Applications
 Bridges across this access information from both, suggesting
influence and understanding of an organisation.
 Can we create a bridge?
 Is there an opportunity to control or influence
communications between groups?
Graph Statistics - Networks as a Whole
www.the-data-mine.co.uk
 Degree of centralisation
 is the network held together by just a few nodes?
 Or is it more cohesive?
 Measures include average and variance of degree centrality
 Applications…
Graph Statistics - Networks as a Whole
www.the-data-mine.co.uk
 Degree of centralisation
 is the network held together by just a few nodes?
 Or is it more cohesive?
 Measures include average and variance of degree centrality
 Applications
 Is a crime network vulnerable to disruption?
 What happens to a company if a few key people leave?
Graph Statistics – More…
www.the-data-mine.co.uk
 There are many other measures, for examples see:
 http://faculty.ucr.edu/~hanneman/networkshop/index.html
 http://en.wikipedia.org/wiki/Social_network
Data Mining Approaches to Networks
www.the-data-mine.co.uk
 Structural Equivalence
 Find nodes with similar roles in the network
 Cluster Analysis
 Identify groups of nodes which are closely connected - and
characterise them
 Identifying the Most Influential People
 Predicting Node Types (e.g. Fraudster)
 Profiling Sub-networks (e.g. terrorist cell)
Twitter - Clustered Network
www.the-data-mine.co.uk
To reduce clutter, we
can cluster people who
reference each other,
and only show links
within clusters.
http://www.neoformix.com/2009/To
rontoTwitterCommunity.html
Data Mining Social Networks - Challenges
www.the-data-mine.co.uk
 Standard problems
 Incompleteness – We don’t know everything
 Incorrectness – What we think we know is wrong
 Inconsistency – We have contradictions in our data
 Data transformation - Getting data into a form acceptable
by your tools
 Fuzzy Boundaries - Networks do not normally have
distinct boundaries
 Network Dynamics - Relationships change over time
Example Application - Viral Marketing
www.the-data-mine.co.uk
"In our experiments with the Epinions knowledge-sharing
Web site, the most valuable customer had a network
value of over 20,000, meaning that marketing to that
customer was as effective as marketing to over 20,000
others in the absence of network effects, but the
customer's number of direct links to others in the network
(i.e., people who read his reviews) was much smaller."
Pedro Domingos, Mining Social Networks for Viral Marketing
http://www.cs.washington.edu/homes/pedrod/papers/iis04.pdf
Example - Identifying Academic Groups
www.the-data-mine.co.uk
Community Detection in Large-Scale Social Networks
Nan Du, Bin Wu, Xin Pei , Bai Wang and Liutong Xu,
SIGKDD Workshop on Web Mining and Social Network Analysis,
August 12-15, 2007, San Jose , California
Software for Social Network Analysis / DM
www.the-data-mine.co.uk
StatNet – R Packages - http://statnet.org/
StatNetTutorial - http://www.jstatsoft.org/v24/i09/paper
JUNG – Open Source Java toolkit for SNA - http://jung.sourceforge.net/
NetMiner - Commercial, Comprehensive SNA - http://www.netminer.com/
Pajek - Comprehensive Social Network Analysis, free for academic
use - http://pajek.imfm.si/doku.php
Subdue - Graph based data mining tool. Copyright but freely
downloadable - http://ailab.wsu.edu/subdue/
More - http://en.wikipedia.org/wiki/Social_network_analysis_software
Looking Forward
www.the-data-mine.co.uk
 Lots and lots of network data out there
 What about:




Applications for individuals
Social Applications (e.g. like TheyWorkForYou.com )
Applications within a University
Applications which make money
 Potential final year / M.Sc Projects ?
Mining Social Networks
www.the-data-mine.co.uk
Dr Andy Pryke
Andy@the-data-mine.co.uk
Commercial Programming Lecture
October 2011
www.the-data-mine.co.uk
Bibliography
Very out of date - do look for newer papers and references!
Bibliography - Overview
www.the-data-mine.co.uk
 Paper credited with launching the field Barnes, J. (1954). Class and Committees in a Norwegian Island
Parish. Human Relations, 7, 39-58.
 List of systems for Mining Graph data http://hms.liacs.nl/graphs.html
 Introduction to Social Network Analysis http://www.orgnet.com/sna.html
 Network Theory and Analysis in Organizations, a brief overview http://www.tcw.utwente.nl/theorieenoverzicht/Theory%20clusters/
Organizational%20Communication/Network%20Theory%20and%
20analysis_also_within_organizations.doc/
Bibliography - Journals and Workshops
www.the-data-mine.co.uk
 Social Networks Journal http://www.elsevier.com/wps/find/journaldescription.cws_home/50
5596/description
 Workshop on Link Analysis and Group Detection
http://kt.ijs.si/Dunja/LinkKDD2006/
 SIGKDD Workshop on Web Mining and Social Network
Analysis
http://workshops.socialnetworkanalysis.info/websnakdd2007/
Bibliography Data Mining Papers
www.the-data-mine.co.uk
 Maitrayee Mukherjee, and Lawrence B. Holderm, Graph-based Data Mining
on Social Networks - http://www2.cs.cmu.edu/~dunja/LinkKDD2004/Maitrayee-Mukherjee-LinkKDD-2004.pdf
 Ingrid Fischer and Thorsten Meinl, Graph Based Molecular Data Mining - An
Overview - http://www2.informatik.unierlangen.de/Forschung/Publikationen/download/graphBasedDM_SMC2004.
pdf
 Jennifer Xu and Hsinchun Chen, Criminal Network Analysis and
Visualization: A Data Mining Perspective, Communications of the ACM http://ai.eller.arizona.edu/COPLINK/publications/crimenet/Xu_CACM.doc
Bibliography - Data Mining Papers (2)
www.the-data-mine.co.uk
 Pedro Domingos, Mining Social Networks for Viral Marketing http://www.cs.washington.edu/homes/pedrod/papers/iis04.pdf
 David Jensen and Jennifer Neville, Data Mining in Social
Networks - Looks specifically at predicting film receipts from
IMDB data http://kdl.cs.umass.edu/papers/jensen-neville-nas2002.pdf
 Bootstrapping the FOAF-Web: An Experiment in Social Network
Mining - http://www.w3.org/2001/sw/Europe/events/foafgalway/papers/fp/bootstrapping_the_foaf_web/
www.the-data-mine.co.uk
www.the-data-mine.co.uk
www.the-data-mine.co.uk
Impact of Computers on SNA
www.the-data-mine.co.uk
The rise in the power and use of computers
has had two main impacts.
1. New data is available from logs of email
conversations, phone calls, chat and website
usage, facebook friends, tweets etc...
2. Computers can be employed for analysis and
data mining.
Role of computer analysis
www.the-data-mine.co.uk
 Data collected about social networks can be complex
and large.
 Imagine a network documenting each purchase you've
made using a credit/debit card, every phone call and
SMS, each email etc.
 When these kinds of data are collected over large
populations, the resulting graphs are much too large to
be understood by eye.
Download