Large Scale Social Network Analysis Using Semantic Web Technologies

Semantics for Big Data
AAAI Technical Report FS-13-04
Dominic DiFranzo, Qingpeng Zhang, Kristine Gloria, and James Hendler
Tetherless World Constellation, Rensselaer Polytechnic Institute
difrad@rpi.edu
Abstract
In fall 2012, YarcData issued the Graph Analytics Challenge, which called for the best submission of unpartitionable, big data graph problems. The team at the Tetherless World Constellation (TWC) at Rensselaer Polytechnic Institute (RPI) took up this challenge and submitted an aggressive proposal to unpack the complexities of human online social behavior. In particular, TWC’s contribution explored notions of trust networks in EverQuest II (EQ2), a Massively Multiplayer Online Role Playing Game. The following presents what social concepts were explored, how we pursued their discovery, and why the YarcData system best suited our needs. We conclude with a brief discussion of the lessons learned and future projects.
Introduction
The marriage between computer science and social science continues to yield fruitful, unique results that significantly influence traditional social discourse. In particular, the Web has provided a concentrated space for social scientists to examine human interaction at various levels and in various scenarios. These studies point to human behaviors that are present both online and offline, making a strong case for these findings as indicative of general human behavior (Turkle 1995, 2005, 2011; boyd 2008, 2009; Rheingold 2002). However, we argue that despite these achievements there remain significant limitations to these studies. Specifically, most existing works focus on the topological features of social networks formed by interactions. Due to the lack of data and the multidimensional nature of these interactions, the more detailed social behaviors have largely been ignored. We suggest that this lack of data leads to a gap in meaningful research from the perspectives of the social and organizational sciences.
We focused on characteristics and behavioral patterns associated with “trust networks”. Traditional analyses of these networks concentrate on the trust formed between people who connect under a certain context or to achieve a certain goal (Golbeck 2008). In studying trust-based relationships, we can better understand the multi-level complexities of trust and how it relates to social phenomena like mentoring, expertise, clandestine behaviors, etc. (Ahmad 2012, 2010). We, like many other researchers, also recognize that online trust uniquely exploits the qualitative effects of our social contexts and environments. Thus, exploring trust in online communities also presents an ideal litmus test for the need for mixed methodologies and tools.
The TWC’s response dissects trust further by leveraging YarcData’s uRiKA technology to perform analysis on large-scale non-partitionable networks like our EQ2 data. We situate our own assumptions of trust networks by citing the results of two separate works: “The Formation of Task-Oriented Groups” (Huang et al. 2009) and “Trust Amongst Rogues? A Hypergraph Approach for Comparing Clandestine Trust Networks in MMOGs” (Ahmad et al. 2011).
The first study examines team formation and collaboration among users in EQ2 as they accomplish combat activities, while the second looks at housing permissions granted between players. In EQ2, “housing permissions” refer to the ability players have to grant other players permission to enter their in-game houses, move objects around in them, or even remove objects from the house (ibid).
The owner of the house can assign an access level to other users in the game. These levels are: Trustee (all rights that an owner has), Friend (can access the house and move objects in the house), Visitor (can only enter the house and cannot move things), and None (cannot enter the house).
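As an illustration only, the four housing access levels described above (Trustee, Friend, Visitor, None) can be modeled as an ordered enumeration, so that "higher trust" becomes a simple comparison. The numeric values and the helper names `can_enter` and `can_move_objects` are assumptions made for this sketch and are not part of the EQ2 data schema.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Housing access levels, ordered from least to most trust.

    The integer values are arbitrary; only their ordering matters here.
    """
    NONE = 0      # cannot enter the house
    VISITOR = 1   # can enter, cannot move things
    FRIEND = 2    # can enter and move objects
    TRUSTEE = 3   # all rights that an owner has


def can_enter(level: AccessLevel) -> bool:
    """Visitors and above may enter the house."""
    return level >= AccessLevel.VISITOR


def can_move_objects(level: AccessLevel) -> bool:
    """Friends and above may move objects in the house."""
    return level >= AccessLevel.FRIEND


highest = max(AccessLevel)  # AccessLevel.TRUSTEE
```

Treating the levels as ordered makes statements such as "the second highest trust (Friend)" directly expressible as comparisons between levels.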
Copyright © 2013, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.
Ahmad highlights that the housing permission network
“is a ready proxy for the level of trust amongst characters”
(ibid, p. 2). Both works are significant in their claims that
behaviors exhibited offline materialize online as well. For
example, in accomplishing team-oriented tasks either
online or offline, individuals who lack the ability or skill
set to accomplish a task are more likely to seek group help.
While these findings are important, we believe that much
more can be determined.
EverQuest II allows players to create multiple characters, and each character can only play and interact on one server (a server in this case acts as a parallel version of the game, each with its own specialties). The four servers in our records are Nagafen (a Player vs Player server, meaning players are allowed to fight and kill each other), Guk (a Player vs Environment server, meaning players cannot fight each other and only fight computer-controlled characters), Antonia Bayle (also a Player vs Environment server, but one that encourages users to role-play as their characters and act like them in game), and The Bazaar (a Player vs Environment server that allows users to train and sell in-game commodities). The differences between these servers let us see how the different rules affect interaction and trust in the game.
We emphasize that the two works mentioned above serve primarily as a frame of reference for evaluating the value of the data we collected. Our results build upon these findings. Moreover, as described later in the submission, we leverage linked data technologies to strip away limitations such as size and schema. Unlike previous studies, this study examined cross-server, multi-world interactions over several months at a time. To our knowledge, our dataset is the largest of its kind, featuring over 35 billion triples (4.5 terabytes of data). The detailed results include (but are not limited to) the following: activity type, housing permissions held and granted, frequency of play, etc. Instead of just reviewing the types of tasks completed by a sampling of EQ2 players, we asked: Has this player developed a relationship with anyone else in his task group, and if so, for how long? Does trust flow between networks, and how does this materialize?
Method
As we outlined, our project highlights the power that large-scale social network analysis can have, and how it can be done at scale using the new tools and technologies created by YarcData. We used data collected from EverQuest II (EQ2), a Massively Multiplayer Online Role Playing Game. Our data set includes over 35 billion triples representing over 2 million players and over a billion recorded interactions within ten months of play. Due to limitations of the YarcData servers, we had to scale down the amount of data that we used and queried. For this study, we looked only at the housing network, the mentor network, and one month (September 2006) of the experience network. In total, this was 1,270,497,287 triples (about 220 GB of data).
In our main analysis, we looked at the trust levels in the housing network and explored the connections between the housing network and the mentorship network. We wanted to explore the sequence and causality between trust in the housing network and the mentorship relationship. To answer this question, we queried the data for all mentors who also owned a house and had set a house access level for their mentee. This returned a list of all mentors and mentees that also have a housing relationship, along with attributes for each player.
We found that the majority of the mentors and mentees have set the highest house access level (Trustee), which indicates strong trust between them. Next, we wanted to see whether the trust level differs with the order in which the two relationships were established. In other words, does the trust level tend to be higher if it was built after the establishment of the mentorship relationship rather than before it?
Figure 1: Proportion of trust under three conditions.
We constructed the proportion of trust levels under three conditions, as seen in Figure 1: (a) housing trust first, (b) mentorship first, and (c) both at the same time. We found that more than half of the mentor-mentee pairs set the highest trust (Trustee) in their housing relationship under conditions (a) and (b), while when the two relationships were built at the same time, nearly 40% of the mentor-mentee pairs set the second highest trust (Friend). We hypothesize that under condition (a), when housing trust was built first, the pair probably already had other connections (other than the mentorship relationship); under condition (b), when mentor-mentee pairs established the housing trust, they had already been working together as mentor and mentee, a strong relationship in the virtual world, and thus they were more likely to build strong trust in the housing relationship as well. Under condition (c), they tended to have no previous relationship, which decreased the average level of trust they established. More research needs to be done to validate our hypothesis, and this will be part of our future work.
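The analysis above (find mentor-mentee pairs that also share a housing permission, assign each pair to one of the three timing conditions, and compute the proportion of each access level per condition) can be sketched as follows. This is a minimal in-memory illustration and not the SPARQL queries actually run on the YarcData system; the record layout, the toy data, and the use of calendar days to decide "the same time" are all assumptions.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical toy records; the field layout is an assumption, not the EQ2 schema.
# (mentor, mentee, day the mentorship began)
mentorships = [
    ("m1", "p1", date(2006, 9, 10)),
    ("m2", "p2", date(2006, 9, 5)),
    ("m3", "p3", date(2006, 9, 7)),
    ("m4", "p4", date(2006, 9, 1)),
    ("m5", "p5", date(2006, 9, 2)),   # no housing grant, so excluded below
]
# (house owner, grantee, access level, day the level was set)
housing = [
    ("m1", "p1", "Trustee", date(2006, 9, 1)),
    ("m2", "p2", "Trustee", date(2006, 9, 20)),
    ("m3", "p3", "Friend", date(2006, 9, 7)),
    ("m4", "p4", "Friend", date(2006, 9, 15)),
    ("x9", "p1", "Visitor", date(2006, 9, 3)),  # owner is not p1's mentor
]


def condition(housing_day, mentorship_day):
    """Assign a pair to condition (a), (b), or (c) by comparing the two dates."""
    if housing_day < mentorship_day:
        return "housing_first"
    if housing_day > mentorship_day:
        return "mentorship_first"
    return "same_time"


def trust_proportions(mentorships, housing):
    """Proportion of each access level, per timing condition,
    over mentors who set a house access level for their mentee."""
    granted = {(owner, grantee): (level, day)
               for owner, grantee, level, day in housing}
    counts = defaultdict(Counter)
    for mentor, mentee, started in mentorships:
        if (mentor, mentee) not in granted:
            continue
        level, day = granted[(mentor, mentee)]
        counts[condition(day, started)][level] += 1
    return {
        cond: {level: n / sum(tally.values()) for level, n in tally.items()}
        for cond, tally in counts.items()
    }


proportions = trust_proportions(mentorships, housing)
```

On the real data, the join between the mentor network and the housing network would be expressed as a graph query; the dictionary lookup here only stands in for that join.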
in time complexity with respect to the number of nodes and edges in a graph. In addition to this, graphs are difficult to cut and send to different processes running in parallel without a heavy amount of communication between these processes, which increases the complexity of a given task. Moreover, what parallel computing gains in power and execution, it lacks in preserving context. Our decision to use linked data and RDF technology is meant to demonstrate the advantage of retaining such information. By standardizing and connecting the data, we can unlock potential relational qualities at an impressively large scale while keeping the context of the data intact. This is particularly important when studying human activity and task completion, as neither is accomplished context-free, relationship-free, or independent of the social situation (Feld 1981).
We further studied the distribution of the time intervals between the establishment of the housing trust and mentorship relationships, as shown in Figure 2. We found that the distributions on both sides followed similar patterns. Most of the time intervals were very small (less than ten days). The distribution roughly followed a power law when the time interval was less than 100 days. The probability then dropped very quickly after 100 days, following a power law with a much steeper slope (Figure 3). However, there were still a small number of housing trust or mentorship relationships established half a year after the other one. In all four servers, the distributions under both conditions are very similar to each other, with no distinct difference observed. Therefore, we could not identify a clear causality between the establishment of housing trust and the mentorship relationship. Instead, the two relationships appear to influence each other at a macroscopic level. From a microscopic view, in some pairs housing trust was built first, and in other pairs mentorship came first.
Figure 2: Distribution of the time interval between building housing trust and mentorship. A. Guk; B. Antonia Bayle; C. The Bazaar; D. Nagafen. (Axes: time interval in days versus count.)
Figure 3: Distribution of the absolute time interval between building housing trust and mentorship under two conditions (housing trust established first and mentorship established first) in four servers: A. Guk; B. Antonia Bayle; C. The Bazaar; D. Nagafen. (Axes: time interval in days versus count.)
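The time-interval analysis above can be sketched as follows, under stated assumptions. The sign convention for the interval (negative when housing trust came first) is our assumption, and the slope is estimated with a plain least-squares fit on log-log counts, which is only a rough estimator and not necessarily the method used for Figures 2 and 3. The counts are synthetic, constructed to change slope at 100 days so that the two-regime behavior described in the text is visible.

```python
import math
from datetime import date


def interval_days(housing_day, mentorship_day):
    """Signed gap in days between the two events.

    Negative means housing trust was set first (this sign convention
    is an assumption made for the sketch).
    """
    return (housing_day - mentorship_day).days


def loglog_slope(counts, lo, hi):
    """Least-squares slope of log(count) against log(interval)
    for intervals d with lo <= d < hi."""
    points = [(math.log(d), math.log(n))
              for d, n in counts.items() if lo <= d < hi and n > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Toy (housing date, mentorship date) pairs.
pairs = [
    (date(2006, 9, 1), date(2006, 9, 4)),
    (date(2006, 9, 10), date(2006, 9, 8)),
    (date(2006, 9, 5), date(2006, 9, 5)),
]
gaps = [interval_days(h, m) for h, m in pairs]

# Synthetic counts: exponent -1.5 below 100 days, -3.0 from 100 days on.
synthetic = {d: 1000.0 * d ** -1.5 for d in range(1, 100)}
synthetic.update({d: 1e6 * d ** -3.0 for d in range(100, 301)})

head_slope = loglog_slope(synthetic, 1, 100)    # regime under 100 days
tail_slope = loglog_slope(synthetic, 100, 301)  # steeper regime after 100 days
```

Comparing the two fitted slopes is one simple way to check for the kind of change in decay rate reported around the 100-day mark.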
It is with enthusiasm that we accepted the opportunity to use the YarcData uRIKA technology to accomplish our study. The system’s powerful graph analytics hardware platform, large shared memory, and massive multithreading allow us to query large amounts of our data without writing special-purpose tools. By just knowing SPARQL, we can instantly start to explore and analyze our data. Without this, we would waste large amounts of time writing, testing, and using special-purpose programs, which would have to be recreated for each and every query we had. Many of the queries we wished to ask would not be possible on a distributed-memory architecture, as there is no real way to partition the data.
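To make the partitioning problem concrete, the following sketch counts how many edges of a small graph cross partition boundaries when nodes are assigned to machines by a simple rule. The toy graph and the modulo assignment (a stand-in for hash partitioning) are assumptions chosen for illustration; every crossing edge represents communication that a distributed-memory system would need during a query.

```python
def cut_fraction(edges, num_parts, assign):
    """Fraction of edges whose endpoints land in different partitions."""
    cut = sum(1 for u, v in edges
              if assign(u, num_parts) != assign(v, num_parts))
    return cut / len(edges)


def by_id(node, num_parts):
    """Assign a node to a partition by its integer id (hash-style placement)."""
    return node % num_parts


# Toy graph: a ring of 12 players, plus one well-connected player (node 0)
# linked to nine others, loosely imitating a hub in a social network.
edges = [(i, (i + 1) % 12) for i in range(12)] + [(0, j) for j in range(2, 11)]

single_machine = cut_fraction(edges, 1, by_id)   # nothing is cut
four_machines = cut_fraction(edges, 4, by_id)    # most edges cross machines
```

With four partitions, 19 of the 21 edges in this toy graph cross a boundary, which is the situation a large shared-memory system avoids.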
Why Semantics and YarcData?
From a computational standpoint, it is very difficult to
create algorithms and software that can scale for extremely
large graphs. Many basic graph algorithms are exponential
Conclusion and Further Discussion
While we had great success using the YarcData uRIKA system, we were still able to use only a small fraction of the data we had access to. It was impossible to run any real queries on the experience network (the largest network in the data we have), as was our original plan. Even on the subset of data we chose, many queries we tried or wanted to use were impossible to run, as the machine ran out of memory or simply crashed. We wished to understand more about the actual users behind the characters (such as where they lived, what sex they were, etc.), but any queries that included this demographic network failed to run. We also had more queries that would find the average money, experience, etc. of the characters we studied, but these queries failed to run as well. Even with the subset of our data we used, we ran up against the limits of the system.
Likewise, some of the social network analysis we wanted to run was impossible to express in the SPARQL query language, such as finding the shortest path between two nodes or computing the centrality of nodes in a network. We feel that extensions to the YarcData uRIKA system that go beyond or extend the SPARQL query language would be greatly needed and widely used.
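For reference, the two kinds of analysis mentioned above are straightforward in a general-purpose language once the graph fits in memory. The sketch below uses breadth-first search for an unweighted shortest path and degree centrality as the simplest centrality measure; the toy trust graph and player names are hypothetical, and nothing here would scale to the full dataset as written.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list of nodes,
    or None if the goal cannot be reached."""
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the predecessor links back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


def degree_centrality(graph):
    """Number of neighbors divided by the maximum possible (n - 1)."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


# Hypothetical undirected trust graph as adjacency lists.
trust = {
    "ana": ["bo", "cy"],
    "bo": ["ana", "dee"],
    "cy": ["ana", "dee"],
    "dee": ["bo", "cy", "eli"],
    "eli": ["dee"],
}

path = shortest_path(trust, "ana", "eli")
centrality = degree_centrality(trust)
```

Both routines need to walk the graph iteratively, which is the kind of traversal that is awkward to phrase as a single declarative pattern-matching query.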
In the future we would like to continue this work over more data and further test the hypotheses we created in this study. As noted earlier, this dataset has only been explored in small pieces, and we believe there is much more to be found in it.
Acknowledgements
We would like to thank the researchers at Northwestern University’s Science of Networks in Communities (SONIC) research group and the other members of the Virtual Worlds Observatory. The Virtual Worlds Observatory is funded in part by the National Science Foundation (Grant No. CNS-1010904, OCI-0904356, & IIS-0841583) and the Army Research Lab (W911NF-0902-0053). In particular we would like to thank Dora Cai at the University of Illinois at Urbana-Champaign for her help with the EQ2 datasets.
The work in this paper was also supported in part by the DARPA SMISC program and in part by the Army Research Laboratory's Network Science Collaborative Technology Alliance. The opinions in this paper do not necessarily reflect the views of these sponsors.
References
Ahmad, M. A., Poole, M. S., & Srivastava, J. (2010, August). Network Exchange in Trust Networks. In Social Computing (SocialCom), 2010 IEEE Second International Conference on (pp. 341-346). IEEE.
Ahmad, M. A., Keegan, B., Williams, D., Srivastava, J., & Contractor, N. (2011, May). Trust amongst rogues? A hypergraph approach for comparing clandestine trust networks in MMOGs. In Proceedings of Fifth International AAAI Conference on Weblogs and Social Media (ICWSM 2011) (pp. 17-21).
Ahmad, M. A. (2012). Computational Trust in Multiplayer Online Games (Doctoral dissertation, University of Minnesota, 2012. Major: Computer science.).
Boyd, D. (2008). How can qualitative internet researchers define the boundaries of their projects: A response to Christine Hine. Internet inquiry: Conversations about method, 26-32.
Boyd, D., Marwick, A., Aftab, P., & Koeltl, M. (2009). The conundrum of visibility: Youth safety and the Internet.
Emirbayer, M., & Goodwin, J. (1994). Network analysis, culture, and the problem of agency. American Journal of Sociology, 1411-1454.
Feld, S. L. (1982). Social structural determinants of similarity among associates. American Sociological Review, 797-801.
Golbeck, J. (2008). Weaving a web of trust. Science, 321(5896), 1640-1641.
Huang, Y., Zhu, M., Wang, J., Pathak, N., Shen, C., Keegan, B., ... & Contractor, N. (2009, August). The formation of task-oriented groups: Exploring combat activities in online games. In Computational Science and Engineering, 2009. CSE'09. International Conference on (Vol. 4, pp. 122-127). IEEE.
Ratan, R. A., Chung, J. E., Shen, C., Williams, D., & Poole, M. S. (2010). Schmoozing and smiting: Trust, social institutions, and communication patterns in an MMOG. Journal of Computer-Mediated Communication, 16(1), 93-114.
Rheingold, H. (2002). Smart mobs: The new social revolution. Perseus Publishing.
Turkle, S. (2011). Life on the Screen. Simon & Schuster.