SI508-Pajek

advertisement
Introductory Social Network Analysis
with Pajek
(Lecture for SI508 Networks: Theory and Application)
Sept 16, 2009
Instructor: Qiaozhu Mei
School of Information
University of Michigan
2009 © University of Michigan
1
Overview of Network Analysis Tools
Pajek
network analysis and visualization,
menu driven, suitable for large networks
Netlogo
agent based modeling
recently added network modeling capabilities
GUESS
network analysis and visualization,
extensible, script-driven (jython)
platforms: Windows (on linux
via Wine)
download
platforms: any (Java)
download
platforms: any (Java)
download
Other software tools that we will not be using but that you may find useful:
visualization and analysis:
UCInet - user friendly social network visualization and analysis software (suitable smaller networks)
iGraph - if you are familiar with R, you can use iGraph as a module to analyze or create large networks, or you can directly use the C functions
Jung - comprehensive Java library of network analysis, creation and visualization routines
Graph package for Matlab (untested?) - if Matlab is the environment you are most comfortable in, here are some basic routines
SIENA - for p* models and longitudinal analysis
SNA package for R - all sorts of analysis + heavy duty stats to boot
NetworkX - python based free package for analysis of large graphs
InfoVis Cyberinfrastructure - large agglomeration of network analysis tools/routines, partly menu driven
visualization only:
GraphViz - open source network visualization software (can handle large/specialized networks)
TouchGraph - need to quickly create an interactive visualization for the web?
yEd - free, graph visualization and editing software
specialized:
fast community finding algorithm
finally: INSNA long list of SNA packages
motif profiles
CLAIR library - NLP and IR library (Perl Based) includes network analysis routines
2009 © University of Michigan
Tools We’ll Use in this Course
• Pajek: extensive menu-driven functionality, including
many, many network metrics and manipulations
– but… not extensible
• Guess: extensible, scriptable tool of exploratory data
analysis, but more limited selection of built-in methods
compared to Pajek
• NetLogo: general agent based simulation platform with
excellent network modeling support
– many of the demos in this course were built with NetLogo
• NetDraw: network visualization tool associated with
UCInet. UCInet is not free, but NetDraw is.
2009 © University of Michigan
Other Tools: gephi
• http://gephi.org
• primarily for visualization, has some nice
touches
2009 © University of Michigan
Other Visualization Tools:
Walrus
•
developed at CAIDA available under the GNU GPL.
•
“…best suited to visualizing moderately sized graphs that are
nearly trees. A graph with a few hundred thousand nodes
and only a slightly greater number of links is likely to be
comfortable to work with.”
Java-based
Implemented Features
•
•
–
–
–
–
–
–
–
rendering at a guaranteed frame rate regardless of graph size
coloring nodes and links with a fixed color, or by RGB values
stored in attributes
labeling nodes
picking nodes to examine attribute values
generating subgraph: displaying a subset of nodes or links
based on a user-supplied boolean attribute
interactive pruning of the graph to temporarily reduce clutter and
occlusion
zooming in and out
Source: CAIDA, http://www.caida.org/tools/visualization/walrus/
2009 © University of Michigan
Visualization Tool: GraphViz
•
•
•
•
Takes descriptions of graphs in simple text languages
Outputs images in useful formats
Options for shapes and colors
Standalone or use as a library
• dot: hierarchical or layered drawings of directed graphs,
by avoiding edge crossings and reducing edge length
• neato (Kamada-Kawai) and fdp (Fruchterman-Reinhold
with heuristics to handle larger graphs)
• twopi – radial layout
http://www.graphviz.org/
• circo – circular layout
2009 © University of Michigan
Dot (GraphViz)
2009 © University of Michigan
Visualization Tools: YEd - JavaTM Graph Editor
http://www.yworks.com/en/products_yed_about.htm
(good primarily for layouts, scales better, maybe free)
2009 © University of Michigan
yEd and 26,000 Nodes (Takes a Few Seconds)
2009 © University of Michigan
Visualization Tools: Prefuse
• (free) user interface toolkit for interactive information visualization
–
–
–
–
–
built in Java using Java2D graphics library
data structures and algorithms
pipeline architecture featuring reusable, composable modules
animation and rendering support
architectural techniques for scalability
• requires knowledge of Java programming
• website: http://prefuse.sourceforge.net/
– CHI paper http://guir.berkeley.edu/pubs/chi2005/prefuse.pdf
2009 © University of Michigan
Simple Prefuse Visualizations
Source: Prefuse, http://prefuse.sourceforge.net/
2009 © University of Michigan
Prefuse Application: Flow Maps
A flow map of migration from California
from 1995-2000, generated automatically
by Prefuse system using edge routing but
no layout adjustment.
 http://graphics.stanford.edu/papers/flow_map_layout/
2009 © University of Michigan
Prefuse Application: Vizster
 http://jheer.org/vizster/
2009 © University of Michigan
Visualization Tool: Manyeyes
• http://manyeyes.alpha
works.ibm.com/manye
yes/
• Only for Visualization
• Not just for networks,
but many other data
type
• Web based, very easy
to use
2009 © University of Michigan
14
Outline
• In Pajek
–
–
–
–
–
–
visualization and layouts
degree
connected components
snowball sampling
one mode projections of bipartite graphs
thresholding weighted graphs
• Fetching you Facebook network
– What is an egonetwork?
2009 © University of Michigan
Using Pajek for Exploratory Social
Network Analysis
• Pajek – (pronounced in Slovenian as Pah-yek) means ‘spider’
• website: http://vlado.fmf.uni-lj.si/pub/networks/pajek/
• wiki: http://pajek.imfm.si/doku.php
–
–
–
–
download application (free)
tutorials
lectures
data sets
• Windows only (works on Linux via Wine, Mac via Darwine)
• helpful book: ‘Exploratory Social Network Analysis with Pajek’ by
Wouter de Nooy, Andrej Mrvar and Vladimir Batagelj
2009 © University of Michigan
Pajek: Interface
we’ll use today
Drop down list of networks opened or created with pajek. Active is displayed
Drop down list of network partitions by discrete variables, e.g. degree, mode, label
Drop down list of continuous node attributes, e.g. centrality, clustering coefficients
can be used for clustering
Source: Pajek, Free for noncommercial use - http://pajek.imfm.si/doku.php?id=download
2009 © University of Michigan
Pajek: Opening a Network File
click on folder icon
to open a file
Save changes to your network, network partitions, etc., if you’d like to keep them
Source: Pajek, Free for noncommercial use - http://pajek.imfm.si/doku.php?id=download
2009 © University of Michigan
Pajek: Working with Network Files
• The active network, partition, etc is shown on top of the
drop down list
Draw the network
Source: Pajek, Free for noncommercial use - http://pajek.imfm.si/doku.php?id=download
2009 © University of Michigan
Pajek data format
Louise
Ada
Cora
directed edges
from Ada(1) to Louise(3) w/
eight “2” and color Black
undirected edges
between Ada(1) to Cora(2) w/
weight “1” and color Black
number of vertices
*Vertices 26
1 "Ada"
2 "Cora"
3 "Louise"
..
*Arcs
1 3 2 c Black
1 2 1 c Black
2 1 1 c Black
..
*Edges
2 3 1 c Black
..
0.1646
0.0481
0.3472
2009 © University of Michigan
vertex x,y,z coordinates (optional)
0.2144
0.3869
0.1913
0.5000
0.5000
0.5000
Pajek: Let’s Get Started
• Opening a network
– File  Network  Read
• Visualization
– Draw  Draw
• Essential measurements
–…
2009 © University of Michigan
Pajek: Opening a File
• A planar graph and layouts in Pajek
• Download the file 'planarnet.net' from the Ctools website.
• Open it in Pajek by either clicking on the yellow folder
icon under the word "Network" or by selecting
FileNetworkRead from the main menu panel
• A report window should pop up confirming that the
graph has been read and the filename and location will
be displayed in the 'active' position of the network
dropdown list
2009 © University of Michigan
Pajek: Visualization & Manual Positioning
•
Visualize the network using Pajek's DrawDraw
command from the main menu panel.
• This will bring up the 'draw' window with its own menu
bar at the top
• Reposition the vertices by clicking on them and holding
down the mouse button while dragging them to a new
location. Continue doing this until you have shown that
the graph is planar (no edges cross have to cross )
• (If you think this is really fun to do in your spare time, go
to http://www.planarity.net)
2009 © University of Michigan
Pajek: Visualization & Layout Algorithms
• Now let Pajek do the work for you by selecting from the
draw toolbar several layout algorithms under
'LayoutEnergy'.
• Why did you select the layout algorithm you did?
• Did the layout leave any lines crossed? If you were to
do this assignment over, what order would you do it in?
2009 © University of Michigan
A Directed Network
•
•
girls’ school dormitory dining-table partners (Moreno, The sociometry reader, 1960)
first and second choices shown
Louise
Ada
Lena
Adele
Marion
Jane
Frances
Cora
Eva
Maxine
Mary
Anna
Ruth
Edna
Robin
Betty
Martha
Jean
Laura
Alice
Hazel
Helen
Ellen
Ella
Irene
2009 © University of Michigan
Hilda
Node Centrality: Degree
• Node network properties
– from immediate connections
indegree=3
• indegree
how many directed edges (arcs) are incident on a node
outdegree=2
• outdegree
how many directed edges (arcs) originate at a node
• degree (in or out)
number of edges incident on a node
– labels
2009 © University of Michigan
degree=5
Centrality: Degree
• More on degree and other centrality measures in
the next lecture…
• Degree: calculate it
– Net  Partitions  Degree
• Visualize degree centrality
– DrawDraw-Vector
– If nodes are not the right size, use resize option
• Options  Sizeof Vertices
• Adjust the default size
2009 © University of Michigan
Connected Components
• Strongly connected components
– Any two nodes in the component
can be reached from each other by
following directed edges




BCDE
A
GH
F
B
F
C
A
E
D
G
H
• Weakly connected components: every node can either reach
or be reached from every other node by following directed edges
 ABCDE
 GHF
• In undirected networks one talks simply about
“connected components”
2009 © University of Michigan
The bowtie model of the Web
Broder et al. (1999)
•
SCC (strongly connected
component):
– can reach all nodes from any
other by following directed edges
•
IN
– can reach SCC from any node in
‘IN’ component by following
directed edges
•
OUT
– can reach any node in ‘OUT’
component from SCC
•
Tendrils and tubes
– connect to IN and/or OUT
components but not SCC
•
Disconnected
– isolated components
2009 © University of Michigan
Let’s try this on the dining table partners
• Suppose each girl will share her dish (and any dish that
is shared with her), with only her 1st and 2nd choice
– Find groups of girls who can all sample each other's dishes
• Net Components Strong
• Draw  Draw-partition
– Which girls will get to sample no other dishes but their own?
• Net  Partitions  Degree  Input
– Display the network of strongly connected components
• Operations  Shrink Network  Partition
• Draw  Draw-partition
– Identify the bowtie structure
• Net  Partitions  Bow-Tie
– What part of the bowtie is missing? What does this imply for the
circulation of the different dishes?
2009 © University of Michigan
Snowball Sampling
• Dining partners:
You are a prince who just met an enchanting young lady at a ball,
but she left at the stroke of midnight and left a shoe behind. Now
you'd like to find the shoe's owner. All you know about her is that
she lives in this particular girls' dorm. The headmistress won't let you
talk to the girls, so the only way you can find your princess is to
covertly ask the one girl you know, Ella, to introduce you to her two
favorite friends. Once you know her friends, you can ask them to
introduce you to their two favorite friends, etc. This is the snowball
sampling technique we covered in class.
Highlight the vertices that you will reach using snowball sampling
– Net  K-Neighbors  ...
• Which girls will you not find using snowball sampling starting with Ella (23)?
2009 © University of Michigan
Bipartite networks
2009 © University of Michigan
Going from a Bipartite to a One-mode
group 1
Graph
 Two-mode network
• One mode projection
group 2
– two nodes from the first
group are connected if
they link to the same
node in the second
group
– some loss of information
– naturally high
occurrence of cliques
2009 © University of Michigan
Let’s Try it on the Actors Network
• Open actorsandmovies.net
• Create a 2-mode partition (newer versions of Pajek do
this automatically)
– NetPartition2-Mode
• Draw the network
– DrawDraw-Partition (the two classes of nodes should be
colored differently)
– If labels are not shown, add them:
Options Mark Vertices UsingLabels
• Experiment with different layout algorithms
– “Layout” menu in the draw window
2009 © University of Michigan
Actors Network (continued)
• Transform the network into a one-mode network
– NetTransform2-Mode to 1-ModeRows
•
Draw the network
– DrawDraw
•
Qualitatively compare the structure of the 2-Mode to the 1-Mode network. Is
there a loss of information?
• Show the weights on each edge
– OptionsLinesMark Lineswith Values
•
What do the values represent?
• Compute the unweighted degree of each node
– NetPartitionsDegreeAll
• Draw the network using
– DrawDraw-vector
• How is the degree represented?
• Add the vector value to each vertex (it will be the degree/(max
possible degree))
– OptionsMark Vertices UsingVector Values
• Who are the most important actors using this measure?
2009 © University of Michigan
Actors Network (continued)
• How does the boundary of the network (i.e. who is
included) affect who is found to be most central?
– Load the file actorsandmoviesWithGere.net. It contains one extra
actor, Richard Gere. Repeat the above procedure.
– In the 1-mode network of actors, is there a change in who is
most central?
– What does this tell you about biases and boundaries in sample
selection?
• Remove all edges between actors who have costarred in
fewer than 3 movies
– NetTransformRemoveLines with valueLower than
• Which actors comprise the central core of this network
2009 © University of Michigan
Grabbing Your Data from Facebook via
Nexus
2009 © University of Michigan
Pajek: Wrap Up
• Used frequently by sociologists
– UCInet is comparable and arguably more user friendly (but not
free)
• Extensive functionality
– But not extendable
• What we covered
–
–
–
–
–
–
visualization
node properties: degree
connected components
k-neighbors
converting two-mode networks to one-mode
thresholding the network
2009 © University of Michigan
Download