social networks - The University of Texas at Dallas

Knowledge Management, Semantic Web and Social Networking

Social Networks

Dr. Bhavani Thuraisingham

June 2010

OUTLINE OF PART I

What are Social Networks

Social Network Views: Science, Technology, Culture

Social Network Concepts

Social Networks and Knowledge Management

Social Networks and Semantic Web

Applications

Directions

References:

ce.sharif.edu/~m_jamali/resources/WI06_SNA.ppt (WI 2006)

ic.ucsc.edu/~wsack/fdm20c/fall2008/Lectures/social-networks.ppt

SOCIAL NETWORKS

HTTP://WWW.FLAIRANDSQUARE.COM/ARCHIVES/167

A social network site allows people who share interests to build a ‘trusted’ network or online community. A social network site will usually provide various ways for users to interact, such as IM (chat/instant messaging), email, video sharing, file sharing, blogging, and discussion groups.

The main types of social networking sites have a ‘theme’: they allow users to connect through online image or video collections (like Flickr or YouTube) or music (like MySpace or Last.fm). Most contain libraries or directories of some categories of people, such as former classmates or old work colleagues (like Facebook, Friends Reunited, and LinkedIn). They provide a means to connect with friends (by allowing users to create a detailed profile page), and recommender systems linked to trust.

POPULAR SOCIAL NETWORKS

Facebook - A social networking website. Initially, membership was restricted to students of Harvard University. It was originally based on the “face book” given to first-year students, which was a way to get to know other students on campus. As of July 2007, there were over 34 million active members worldwide. From September 2006 to September 2007 it rose from the 60th to the 6th most visited web site, and it was the number one site for photos in the United States.

Twitter - A free social networking and micro-blogging service that allows users to send “updates” (text-based posts, up to 140 characters long) via SMS, instant messaging, email, the Twitter website, or an application/widget within a space of your choice, such as MySpace, Facebook, a blog, or an RSS aggregator/reader.

MySpace - A popular social networking website offering an interactive, user-submitted network of friends, personal profiles, blogs, groups, photos, music and videos internationally. According to Alexa Internet, MySpace is currently the world’s sixth most popular English-language website, the sixth most popular website in any language, and the third most popular website in the United States, though it has topped the chart in various weeks. As of September 7, 2007, there were over 200 million accounts.

SOCIAL NETWORKS: INTERDISCIPLINARY FIELD

Social network analysis is an interdisciplinary social science.

Sociologists, computer scientists, physicists and mathematicians have made large contributions to understanding networks in general (as graphs) and have thus contributed to an understanding of social networks.

[Social network analysis] is grounded in the observation that social actors [i.e., people] are interdependent and that the links [i.e., relationships] among them have important consequences for every individual [and for all of the individuals together]. ... [Relationships] provide individuals with opportunities and, at the same time, potential constraints on their behavior.

... Social network analysis involves theorizing, model building and empirical research focused on uncovering the patterning of links among actors. It is concerned also with uncovering the antecedents and consequences of recurrent patterns. (from Linton C. Freeman)

SOCIAL NETWORKS: HISTORY

“Sociograms” were invented in 1933 by Moreno.

In a sociogram, the actors are represented as points in a two-dimensional space. The location of each actor is significant. E.g. a “central actor” is plotted in the center, and others are placed in concentric rings according to “distance” from this actor.

Actors are joined with lines representing ties, as in a social network. In other words a social network is a graph, and a sociogram is a particular 2D embedding of it.

These days, sociograms are rarely used (most examples on the web are not sociograms at all, but networks). But methods like MDS (Multi-Dimensional Scaling) can be used to lay out actors, given a vector of attributes about them.

Social networks were studied early by researchers in graph theory (Harary et al., 1950s). Some social network properties can be computed directly from the graph.

Others depend on an adjacency matrix representation (actors index the rows and columns of a matrix, and matrix elements represent the tie strength between them).
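The adjacency matrix representation can be illustrated with a minimal sketch. The actor names and tie strengths below are hypothetical, and the network is assumed to be undirected, so the matrix is symmetric.

```python
# Hypothetical ties: (actor, actor, strength), e.g. number of emails exchanged.
ties = [("Ann", "Bob", 3), ("Bob", "Cat", 1), ("Ann", "Cat", 2)]

# Actors index both the rows and the columns of the matrix.
actors = sorted({name for tie in ties for name in tie[:2]})
index = {name: i for i, name in enumerate(actors)}

n = len(actors)
adjacency = [[0] * n for _ in range(n)]
for a, b, strength in ties:
    # Assumption: ties are undirected, so fill both cells.
    adjacency[index[a]][index[b]] = strength
    adjacency[index[b]][index[a]] = strength

# One property read straight off the matrix: total tie strength per actor.
strength_by_actor = {name: sum(adjacency[index[name]]) for name in actors}
```

A directed network would fill only one cell per tie, which makes the matrix asymmetric.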

SOCIAL NETWORKS AS TECHNOLOGY

Email, newsgroups, and weblogs

Search engines, e.g., Google (http://google.com)

Google’s PageRank algorithm gives more weight to popular webpages.

A webpage is considered popular if many other webpages link to it.

Collaborative filtering and/or recommender systems, e.g., amazon.com’s feature: “People who bought this book also bought...”

TECHNOLOGY : LINKEDIN

What is Your Network?

When your connections invite their connections, your Network starts to grow.

Your Network is your connections, their connections, and so on out from you at the center.

How do you classify users?

Your Network contains professionals out to “three degrees”, that is, friends-of-friends-of-friends. If each person had 10 connections (and some have many more), then your network would contain on the order of 1,000 professionals (at most 10 + 100 + 1,000 = 1,110).

How do you see who is in your Network?

LinkedIn lets you see your network as one large group of searchable professional profiles.

SOCIAL NETWORKS AS POPULAR CULTURE

E.g., Six Degrees of Kevin Bacon

Bacon number: definition http://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon

Kevin Bacon has a Bacon number of 0

An actor, A, has a Bacon number of 1 if s/he appeared in a movie with Kevin Bacon

An actor, B, has a Bacon number of 2 if s/he appeared in a movie with A but not in a movie with Kevin Bacon

Social software, e.g., Facebook, Friendster, Orkut
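The Bacon number definition above amounts to a shortest-path distance in the co-appearance graph, which a breadth-first search computes. The sketch below assumes a hypothetical `casts` mapping from movie title to cast list; the movie and actor names are placeholders, not real data.

```python
from collections import deque

def bacon_numbers(casts, source="Kevin Bacon"):
    """Return actor -> Bacon number for every actor reachable from `source`.

    `casts` maps a movie title to the list of actors who appeared in it.
    """
    # Index movies by actor so co-stars can be looked up quickly.
    movies_of = {}
    for movie, actors in casts.items():
        for actor in actors:
            movies_of.setdefault(actor, []).append(movie)

    numbers = {source: 0}
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        for movie in movies_of.get(actor, []):
            for costar in casts[movie]:
                if costar not in numbers:
                    # First visit in a BFS is along a shortest path.
                    numbers[costar] = numbers[actor] + 1
                    queue.append(costar)
    return numbers

# Hypothetical casts, for illustration only.
casts = {
    "Movie 1": ["Kevin Bacon", "A"],
    "Movie 2": ["A", "B"],
    "Movie 3": ["B", "C"],
    "Movie 4": ["D"],
}
numbers = bacon_numbers(casts)
```

Actors with no chain of co-appearances back to the source (like `D` here) are simply absent from the result.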

SOCIAL NETWORKS: MORE FORMAL DEFINITION

A structural approach to understanding social interaction.

Networks consist of Actors and the Ties between them.

We represent social networks as graphs whose vertices are the actors and whose edges are the ties.

Edges are usually weighted to show the strength of the tie.

In the simplest networks, an Actor is an individual person.

A tie might be “is acquainted with”. Or it might represent the amount of email exchanged between persons A and B.

SOCIAL NETWORK EXAMPLES

Effects of urbanization on individual wellbeing

World political and economic system

Community elite decision-making

Social support, Group problem solving

Diffusion and adoption of innovations

Belief systems, Social influence

Markets, Sociology of science

Exchange and power

Email, Instant messaging, Newsgroups

Co-authorship, Citation, Co-citation

SocNet software, Friendster

Blogs and diaries, Blog quotes and links

SOCIAL NETWORKS BASIC QUESTIONS

Balance: important in exchange networks

In a two-person network (dyad), exchange of goods, services and cash should be balanced.

More generally, exchanges of “favors” or “support” are likely to be quite balanced.

Role: what role does the actor perform in the network?

Role is defined in terms of Actors’ neighborhoods.

The neighborhood is the set of ties and actors connected directly to the current actor.

Actors with similar or identical neighborhoods are assigned the same role.

What is the related idea from semiotics?

Paradigm: interchangeability. Actors with the same role are interchangeable in the network.
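As a rough sketch of role assignment, the function below groups actors whose neighborhoods are identical. It covers only the "identical" case, not the "similar" one, and the edge list is hypothetical.

```python
from collections import defaultdict

def roles_by_neighborhood(edges):
    """Group actors that are tied to exactly the same set of other actors."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Actors sharing the same neighbor set land in the same group (role).
    groups = defaultdict(list)
    for actor in sorted(neighbors):
        groups[frozenset(neighbors[actor])].append(actor)
    return sorted(groups.values())

# Hypothetical network: one manager tied to three workers.
edges = [("boss", "w1"), ("boss", "w2"), ("boss", "w3")]
roles = roles_by_neighborhood(edges)
```

Here the three workers share one role because each is tied only to `boss`, so swapping any two of them leaves the network unchanged.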

SOCIAL NETWORKS BASIC QUESTIONS

Prestige: How important is the actor in the network?

Related notions are status and centrality.

Centrality reifies the notion of “peripheral vs. central participation” from communities of practice.

Key notions of centrality were developed in the 1970s, e.g. “eigenvector centrality” by Bonacich.

Most of these measures were rediscovered as quality measures for web pages:

Indegree

PageRank ≈ eigenvector centrality

HITS ?= two-mode eigenvector centrality
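To make the link between PageRank and eigenvector-style centrality concrete, here is a minimal power-iteration sketch. The damping value of 0.85, the iteration count, and the three-page link graph are illustrative assumptions, and this is a textbook formulation rather than Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping page -> list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no out-links): spread its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: a and b both link to c, and c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
rank = pagerank(links)
```

Page `c` ends up ranked highest because two pages link to it, and `a` outranks `b` because its single in-link comes from the well-ranked `c`.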

SOCIAL NETWORK CONCEPTS

Actor

An “actor” is a basic component for SNs. Actors can be:

Individual people, Corporations, Nation-States, Social groups

Modes

If all the actors are of the same type, the network is called a one-mode network. If there are two groups of actors, it is a two-mode network.

E.g. an affiliation network is a two-mode network. One mode is individuals, the other is groups to which they belong. Ties represent the relation: person A is a member of group B.

Ties

A tie is the relation between two actors. Common types of ties include:

Friendship, Amount of communication, Goods exchanged, Familial relation (kinship), Institutional relations
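The affiliation network described under Modes can be turned into a one-mode network of people by tying two people whenever they share a group. The sketch below assumes hypothetical membership data and uses the number of shared groups as the tie strength.

```python
from itertools import combinations

# Hypothetical two-mode data: person -> set of groups they belong to.
memberships = {
    "Ann": {"chess club", "choir"},
    "Bob": {"chess club"},
    "Cat": {"choir", "chess club"},
    "Dan": {"band"},
}

def project_onto_people(memberships):
    """Return {(person, person): number of shared groups} for pairs sharing a group."""
    ties = {}
    for a, b in combinations(sorted(memberships), 2):
        shared = len(memberships[a] & memberships[b])
        if shared:
            ties[(a, b)] = shared
    return ties

ties = project_onto_people(memberships)
```

The same projection can be run in the other direction to get a one-mode network of groups tied by shared members.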

PRACTICAL ISSUES: BOUNDARIES AND SAMPLES

Because human relations are rich and unbounded, drawing meaningful boundaries for network analysis is a challenge.

There are two main approaches:

Realist: boundaries perceived by actors themselves, e.g. gang members or ACM members.

Nominalist: boundaries created by the researcher, e.g. people who publish in ACM CHI.

To deal with large networks, sampling is necessary. Unfortunately, randomly sampled graphs will typically have a completely different structure from the full network. Why?

One approach to this is “snowballing”. You start with a random sample. Then extend it with all actors connected to the sample by a tie. Then extend it with all actors connected to the previous set by a tie…
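A minimal sketch of snowball sampling follows. The `neighbors` adjacency mapping, the number of waves, and the fixed random seed are assumptions for illustration.

```python
import random

def snowball_sample(neighbors, seeds, waves):
    """Grow a sample from `seeds` by adding all tied actors, wave by wave."""
    sample = set(seeds)
    frontier = set(seeds)
    for _ in range(waves):
        next_frontier = set()
        for actor in frontier:
            next_frontier.update(neighbors[actor])
        next_frontier -= sample  # keep only newly reached actors
        if not next_frontier:
            break
        sample |= next_frontier
        frontier = next_frontier
    return sample

# Hypothetical chain network a - b - c - d - e.
neighbors = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

# Start from a random sample of actors (seeded here only for repeatability).
seeds = random.Random(0).sample(sorted(neighbors), 1)
sample = snowball_sample(neighbors, seeds, waves=2)
```

Unlike a uniform random sample of actors, the result keeps the ties between sampled actors intact, at the cost of being biased toward well-connected regions.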

THE WEB AS A SOCIAL NETWORK

Social networks are formed between Web pages by hyperlinking to other Web pages.

A hyperlink is usually an explicit indicator that one Web page author believes that another page is related or relevant.

The ability to publish and gather personal information is a major factor in the success of the Web.

Two Major Tasks

Social Network Extraction from the Web

Social Network Analysis

Social Networking Services (SNS).

Friendster; Orkut

INFERRING COMMUNITIES IN WEB

Bibliographic Metrics

Bibliographic coupling (two documents cite the same documents)

Co-citation coupling (two documents are cited together by other documents)
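Both metrics can be computed directly from a citation (or hyperlink) table. The sketch below uses the standard definitions, where bibliographic coupling counts shared references and co-citation counts documents citing both items. The `cites` data is hypothetical.

```python
# Hypothetical data: document -> set of documents it cites (or links to).
cites = {
    "p1": {"x", "y"},
    "p2": {"x", "y", "z"},
    "p3": {"z"},
}

def bibliographic_coupling(cites, a, b):
    """Number of references that documents a and b have in common."""
    return len(cites.get(a, set()) & cites.get(b, set()))

def co_citation(cites, a, b):
    """Number of documents that cite both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)

coupling_p1_p2 = bibliographic_coupling(cites, "p1", "p2")
cocited_x_y = co_citation(cites, "x", "y")
```

High values of either count suggest that two pages belong to the same community, which is the basis for inferring communities from link structure.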

BLOGOSPHERE AS A SOCIAL NETWORK

Weblogs have become prominent social media on the Internet that enable users to quickly and easily publish content including highly personal thoughts.

Bloggers might list one another’s blogs in a blogroll and might read, link to, or comment on other blogs’ posts. (A post is the smallest part of a blog that has content and that readers can comment on. A post also has a publication date.)

SEMANTIC WEB AND SOCIAL NETWORK

Semantic Web: having data on the Web defined and linked in a way that it can be used by people and processed by machines in a “wide variety of new and exciting applications”.

SW and SN models support each other:

The Semantic Web enables online and explicitly represented social information

Social networks, especially trust networks, provide a new paradigm for knowledge management in which users “outsource” knowledge and beliefs via their social networks

SEMANTIC WEB AND SOCIAL NETWORK

Drawbacks to Centralized Social Networks

The information is under the control of the database owner

Centralized systems do not allow users to control the information they provide on their own terms

The friend-of-a-friend (FOAF) project is a first attempt at a formal, machine-processable representation of user profiles and friendship networks.

The Swoogle Ontology Dictionary shows that the class foaf:Person currently has nearly one million instances spread over about 45,000 Web documents.

The FOAF ontology is not the only one used to publish social information on the Web.

For example, Swoogle identifies more than 360 RDFS or OWL classes defined with the local name “person”.
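To give a feel for what a machine-processable FOAF description looks like, the sketch below stores a few RDF statements as plain (subject, predicate, object) tuples and extracts the friendship network from them. The `foaf:Person`, `foaf:name` and `foaf:knows` terms and the FOAF namespace URI are real; the `example.org` people are hypothetical. Using bare tuples instead of an RDF library is a simplification that does not distinguish URIs from literals.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical profiles expressed as (subject, predicate, object) triples.
triples = [
    ("http://example.org/alice", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/alice", FOAF + "name", "Alice"),
    ("http://example.org/alice", FOAF + "knows", "http://example.org/bob"),
    ("http://example.org/bob", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/bob", FOAF + "name", "Bob"),
]

def friendship_edges(triples):
    """Extract the social network: one directed edge per foaf:knows statement."""
    return [(s, o) for s, p, o in triples if p == FOAF + "knows"]

def count_instances(triples, class_uri):
    """Count resources declared to be instances of the given class."""
    return sum(1 for s, p, o in triples if p == RDF_TYPE and o == class_uri)

edges = friendship_edges(triples)
person_count = count_instances(triples, FOAF + "Person")
```

Because each person can publish such statements in a document they control, the network is assembled by crawling and merging documents rather than by querying one central database.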

SW AND SNA (ISSUES)

Knowledge representation.

Small number of common ontologies

Knowledge management.

Efficient and effective mechanisms for accessing knowledge, especially social networks, on the Semantic Web

Social network extraction, integration and analysis

Extracting social networks correctly from the noisy and incomplete knowledge on the (Semantic) Web

Provenance and trust aware distributed inference.

Manage and reduce the complexity of distributed inference by utilizing provenance of knowledge

SOCIAL NETWORKS AND KMS

Why Social Networks in KMS?

[Diagram labels: People, Technology, KM, Organization, Processes]

Knowledge Management involves people, technology, and processes in overlapping parts.

SOCIAL NETWORKS AND KMS

Why are we studying Social Networks?

[Diagram labels: Social Networks, Information Architecture, Knowledge Management Systems]

What ties Information Architecture, Knowledge Management and Social Network Analysis more closely together is the reciprocal relationship between people and content.

SOCIAL NETWORK ANALYSIS

Social network analysis [SNA] is the mapping and measuring of relationships and flows between people, groups, organizations, computers or other information/knowledge processing entities.

The nodes in the network are the people and groups while the links show relationships or flows between the nodes.

SOCIAL NETWORK ANALYSIS (SNA)

We measure a social network in terms of:

1. Degree Centrality:

The number of direct connections a node has. What really matters is where those connections lead to and how they connect the otherwise unconnected.

2. Betweenness Centrality:

A node with high betweenness has great influence over what flows in the network. High betweenness indicates important links and potential single points of failure.

3. Closeness Centrality:

A measure of how close a node is to everyone else in the network.

The pattern of direct and indirect ties allows such nodes to reach any other node in the network more quickly than anyone else. They have the shortest paths to all others.
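The three measures can be sketched for a small unweighted, undirected network as follows. The graph is hypothetical; closeness is taken as (number of reachable nodes) / (sum of distances), and betweenness is computed with Brandes' algorithm without normalization. Other definitions and normalizations are in common use.

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest-path distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def degree_centrality(graph):
    return {v: len(graph[v]) for v in graph}

def closeness_centrality(graph):
    result = {}
    for v in graph:
        dist = bfs_distances(graph, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness_centrality(graph):
    """Brandes' algorithm: count shortest paths passing through each node."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):  # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Undirected graph: every pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical network: a - b - c, with d and e both attached to c.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d", "e"}, "d": {"c"}, "e": {"c"}}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)
```

In this example node `c` scores highest on all three measures, while `b` has low degree but nonzero betweenness because it is the only route to `a`.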

Application of SNA: Building the 9/11 Al-Qaeda Network.

DIRECTIONS

Reduce Complexity

Geo-social networks

Integrating concepts from semantic web, social network, and knowledge management

Geo-social semantic web

Visualizing social networks

Security and Privacy

Mining and analysis of social networks

Predicting what the members would do next

OUTLINE OF PART II

Social Networks

Social Networks and 9/11 Terrorists

Social Networks and Baseball Drug Use

Social Networks and Expert Finder


SOCIAL NETWORK ANALYSIS OF 9/11 TERRORISTS (WWW.ORGNET.COM)

Early in 2000, the CIA was informed of two terrorist suspects linked to al-Qaeda.

Nawaf Alhazmi and Khalid Almihdhar were photographed attending a meeting of known terrorists in Malaysia. After the meeting they returned to Los Angeles, where they had already set up residence in late 1999.

SOCIAL NETWORK ANALYSIS OF 9/11 TERRORISTS

What do you do with these suspects? Arrest or deport them immediately?

No, we need to use them to discover more of the al-Qaeda network.

Once suspects have been discovered, we can use their daily activities to uncloak their network. Just like they used our technology against us, we can use their planning process against them. Watch them, and listen to their conversations to see...

• who they call / email

• who visits with them locally and in other cities

• where their money comes from

The structure of their extended network begins to emerge as data is discovered via surveillance.

SOCIAL NETWORK ANALYSIS OF 9/11 TERRORISTS

A suspect being monitored may have many contacts -- both accidental and intentional. We must always be wary of 'guilt by association'. Accidental contacts, like the mail delivery person, the grocery store clerk, and the neighbor, may not be viewed with investigative interest.

Intentional contacts are like the late afternoon visitor, whose car license plate is traced back to a rental company at the airport, where we discover he arrived from Toronto (got to notify the Canadians) and his name matches a cell phone number (with a Buffalo, NY area code) that our suspect calls regularly. This intentional contact is added to our map and we start tracking his interactions -- where do they lead? As data comes in, a picture of the terrorist organization slowly comes into focus.

How do investigators know whether they are on to something big? Often they don't. Yet in this case there was another strong clue that Alhazmi and Almihdhar were up to no good -- the attack on the USS Cole in October of 2000. One of the chief suspects in the Cole bombing [Khallad] was also present [along with Alhazmi and Almihdhar] at the terrorist meeting in Malaysia in January 2000.

Once we have their direct links, the next step is to find their indirect ties -- the 'connections of their connections'. Discovering the nodes and links within two steps of the suspects usually starts to reveal much about their network. Key individuals in the local network begin to stand out. In viewing the network map in Figure 2, most of us will focus on Mohammed Atta because we now know his history.

The investigator uncloaking this network would not be aware of Atta's eventual importance. At this point he is just another node to be investigated.

[Figure 2: network map of the two suspects and the nodes within two steps of them]

SOCIAL NETWORK ANALYSIS OF 9/11 TERRORISTS

We now have enough data for two key conclusions:

All 19 hijackers were within 2 steps of the two original suspects uncovered in 2000!

Social network metrics reveal Mohammed Atta emerging as the local leader

With hindsight, we have now mapped enough of the 9-11 conspiracy to stop it. Again, the investigators are never sure they have uncovered enough information while they are in the process of uncloaking the covert organization. They also have to contend with superfluous data.

This data was gathered after the event, so the investigators knew exactly what to look for. Before an event it is not so easy.

As the network structure emerges, a key dynamic that needs to be closely monitored is the activity within the network. Network activity spikes when a planned event approaches. Is there an increase of flow across known links? Are new links rapidly emerging between known nodes? Are money flows suddenly going in the opposite direction? When activity reaches a certain pattern and threshold, it is time to stop monitoring the network, and time to start removing nodes.

The author argues that this bottom-up approach of uncloaking a network is more effective than a top down search for the terrorist needle in the public haystack -- and it is less invasive of the general population, resulting in far fewer "false positives".

SOCIAL NETWORK ANALYSIS OF STEROID USAGE IN BASEBALL (WWW.ORGNET.COM)

When the Mitchell Report on steroid use in Major League Baseball [MLB] was published, people were surprised at who and how many players were mentioned. The diagram below shows a human network created from data found in the Mitchell Report. Baseball players are shown as green nodes. Those who were found to be providers of steroids and other illegal performance-enhancing substances appear as red nodes. The links reveal the flow of chemicals -- from provider to player.

SOCIAL NETWORKING FOR KNOWLEDGE MANAGEMENT: EXAMPLES (WWW.ORGNET.COM)

Managing the 21st Century Organization

Networks of Adaptive/Agile Organizations

Best Practice: Organizational Network Mapping

Discovering Communities of Practice

Data-Mining E-mail

Finding Leaders on your Team

Post-Merger Integration

Knowledge Sharing in Organizations

Innovation happens at the Intersections

Partnerships and Alliances in Industry

Decision-Making in Organizations

New Organizational Structures

KNOWLEDGE SHARING NETWORK: FINDING EXPERTS (WWW.ORGNET.COM)

Organizational leaders are preparing for the potential loss of expertise and knowledge flow due to turnover, downsizing, outsourcing, and the coming retirements of the baby boom generation. The model network (previous chart) is used to illustrate the knowledge continuity analysis process.

Each node in this sample network (previous chart) represents a person who works in a knowledge domain. Some people have more / different knowledge than others. Employees who will retire in 2 years or less have their nodes colored red. Those who will retire in 3-4 years are colored yellow. Those retiring in 5 years or later are colored green.

A gray, directed line is drawn from the seeker of knowledge to the source of expertise. A-->B indicates that A seeks expertise / advice from B. Those with many arrows pointing to them are sought often for assistance.

The top subject matter experts -- SMEs -- in this group are nodes 29, 46, 100, 41, 36 and 55.

The SMEs were discovered using a network metric in InFlow that is similar to how the Google search engine ranks web pages -- using both direct and indirect links.

Of the top six SMEs in this group, half are colored red [100] or yellow [46, 55]. The loss of person 46 has the greatest potential for knowledge loss. 90% of the network is within 3 steps of accessing this key knowledge source.
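Two pieces of this analysis are easy to sketch: the retirement color coding, and the share of people who can reach a given expert within a few steps along the seeker-to-source arrows. The code below is an illustration only and is not InFlow's metric. The five-person advice network is hypothetical, and years to retirement are assumed to be whole numbers.

```python
from collections import deque

def retirement_color(years_to_retirement):
    """Color rule from the text: <=2 years red, 3-4 yellow, 5+ green."""
    if years_to_retirement <= 2:
        return "red"
    if years_to_retirement <= 4:
        return "yellow"
    return "green"

def share_within_steps(seeks_from, expert, max_steps=3):
    """Fraction of the other people who reach `expert` in at most max_steps hops."""
    # Reverse the seeker -> source arrows so a BFS from the expert finds seekers.
    sought_by = {}
    people = set(seeks_from)
    for seeker, sources in seeks_from.items():
        for source in sources:
            people.add(source)
            sought_by.setdefault(source, set()).add(seeker)

    dist = {expert: 0}
    queue = deque([expert])
    while queue:
        person = queue.popleft()
        if dist[person] == max_steps:
            continue  # do not expand beyond the step limit
        for seeker in sought_by.get(person, ()):
            if seeker not in dist:
                dist[seeker] = dist[person] + 1
                queue.append(seeker)

    others = len(people) - 1
    return (len(dist) - 1) / others if others else 0.0

# Hypothetical advice chain: A seeks from B, B from C, C from D, D from E.
seeks_from = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"], "E": []}
share = share_within_steps(seeks_from, "E")
```

An expert who is both colored red and reachable by a large share of the network is the kind of node whose departure would matter most.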

KNOWLEDGE SHARING IN ORGANIZATIONS: FINDING EXPERTS

OTHER APPLICATIONS

Detecting coalitions and subgroups

Conducting a political campaign

Marketing a drug by a pharmaceutical company

Forming a travel network

Many more

OUTLINE OF PART IV

Introduction to Social Networks

Properties of Social Networks

Social Network Analysis Basics

Examples

Data Privacy Basics

Privacy and Social Networks

SOCIAL NETWORKS

Social networks have important implications for our daily lives.

Spread of Information

Spread of Disease

Economics

Marketing

Social network analysis could be used for many activities related to information and security informatics.

Terrorist network analysis

ENRON SOCIAL GRAPH*

* http://jheer.org/enron/

SOCIAL NETWORKS

ROMANTIC RELATIONS AT “JEFFERSON HIGH SCHOOL”

“SMALL-WORLD” EXAMPLE: SIX DEGREES OF

KEVIN BACON

SOCIAL NETWORK MINING

Social network data is represented as a graph

Individuals are represented as nodes

Nodes may have attributes to represent personal traits

Relationships are represented as edges

Edges may have attributes to represent relationship types

Edges may be directed

Common Social Network Mining tasks

Node classification

Link Prediction

GRAPH MODEL

Lindamood et al. 09 &

Heatherly et al. 09

Graph represented by a set of homogeneous vertices and a set of homogeneous edges

Each node also has a set of Details, one of which is considered private.

COLLECTIVE INFERENCE

Lindamood et al. 09 &

Heatherly et al. 09

Collection of techniques that use node attributes and the link structure to refine classifications.

Uses local classifiers to establish a set of priors for each node

Uses traditional relational classifiers as the iterative step in classification

RELATIONAL CLASSIFIERS

Lindamood et al. 09 &

Heatherly et al. 09

Class Distribution Relational Neighbor

Weighted-Vote Relational Neighbor

Network-only Bayes Classifier

Network-only Link-based Classification
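As an illustration of how a relational classifier plugs into collective inference, here is a simplified sketch of the weighted-vote relational neighbor idea with a relaxation-labeling-style update: every unlabeled node is re-estimated at once from the previous round's estimates of its neighbors. It is not the implementation used in the cited papers. The function name, the binary class, the fixed iteration count, and the toy graphs are all assumptions.

```python
def wvrn_relaxation(neighbors, priors, known, iterations=20):
    """Estimate P(class = 1) for each node.

    neighbors: node -> {neighbor: link weight}
    priors:    unlabeled node -> P(class = 1) from a local classifier
    known:     labeled node -> 0 or 1
    """
    estimate = {
        v: float(known[v]) if v in known else priors[v] for v in neighbors
    }
    for _ in range(iterations):
        updated = {}
        for v in neighbors:
            total = sum(neighbors[v].values())
            if v in known or total == 0:
                # Labeled nodes stay fixed; isolated nodes keep their prior.
                updated[v] = estimate[v]
                continue
            # Weighted vote of the neighbors' current estimates.
            votes = sum(w * estimate[u] for u, w in neighbors[v].items())
            updated[v] = votes / total
        estimate = updated  # simultaneous update of all nodes
    return estimate

# Hypothetical chain a - b - c - d with a labeled 1 and d labeled 0.
chain = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
estimate = wvrn_relaxation(chain, priors={"b": 0.5, "c": 0.5}, known={"a": 1, "d": 0})
```

In the chain, the estimate for `b` settles near 2/3 and for `c` near 1/3, reflecting how close each is to the labeled endpoints.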

EXPERIMENTAL DATA

Lindamood et al. 09 &

Heatherly et al. 09

167,000 profiles from the Facebook online social network

Restricted to public profiles in the Dallas/Fort Worth network

Over 3 million links

GENERAL DATA PROPERTIES

Lindamood et al. 09 &

Heatherly et al. 09

Property | Value
Diameter of the largest component | 16
Number of nodes | 167,390
Number of friendship links | 3,342,009
Total number of listed traits | 4,493,436
Total number of unique traits | 110,407
Number of components | 18
Probability Liberal | .45
Probability Conservative | .55

INFERENCE METHODS

Lindamood et al. 09 &

Heatherly et al. 09

Details only: Uses Naïve Bayes classifier to predict attribute

Links Only: Uses only the link structure to predict attribute

Average: Classifies based on an average of the probabilities computed by Details and Links
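The Details-only and Average methods can be sketched as follows. This is a generic Naive Bayes over sets of traits with add-one smoothing, using only the traits present in a profile, and it is not necessarily the exact model in the cited papers. The trait names and training profiles are hypothetical, and the Links-only probabilities are taken as given.

```python
import math

def train_naive_bayes(profiles):
    """profiles: list of (set_of_traits, label). Returns a predict function."""
    labels = sorted({label for _, label in profiles})
    vocabulary = set().union(*(traits for traits, _ in profiles))
    label_count = {c: 0 for c in labels}
    trait_count = {c: {} for c in labels}
    for traits, label in profiles:
        label_count[label] += 1
        for t in traits:
            trait_count[label][t] = trait_count[label].get(t, 0) + 1

    def predict(traits):
        """Return {label: P(label | traits)}."""
        log_score = {}
        for c in labels:
            score = math.log(label_count[c] / len(profiles))  # class prior
            for t in traits & vocabulary:
                # Add-one smoothing avoids zero probabilities.
                score += math.log(
                    (trait_count[c].get(t, 0) + 1) / (label_count[c] + 2)
                )
            log_score[c] = score
        top = max(log_score.values())
        unnormalized = {c: math.exp(s - top) for c, s in log_score.items()}
        z = sum(unnormalized.values())
        return {c: p / z for c, p in unnormalized.items()}

    return predict

def average(p_details, p_links):
    """Average method: mean of the Details and Links probabilities."""
    return {c: (p_details[c] + p_links[c]) / 2 for c in p_details}

# Hypothetical training profiles with placeholder trait names.
profiles = [
    ({"group:x"}, "liberal"),
    ({"group:x", "music:m"}, "liberal"),
    ({"group:y"}, "conservative"),
    ({"group:y", "music:m"}, "conservative"),
]
predict = train_naive_bayes(profiles)
p_details = predict({"group:x"})
p_combined = average(p_details, {"liberal": 0.25, "conservative": 0.75})
```

Averaging lets a profile with few listed details still be classified from its links, and the other way around.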

PREDICTING PRIVATE DETAILS

Lindamood et al. 09 &

Heatherly et al. 09

Attempt to predict the value of the political affiliation attribute

Three Inference Methods used as the local classifier

Relaxation labeling used as the Collective Inference method

REMOVING DETAILS

Lindamood et al. 09 &

Heatherly et al. 09

Ensures that no ‘false’ information is added to the network: all details in the released graph were entered by the user

Details that have the highest global probability of indicating political affiliation are removed from the network

REMOVING LINKS

Lindamood et al. 09 &

Heatherly et al. 09

Ensures that the link structure of the released graph is a subset of the original graph

Removes, from each node, the links to the nodes that are most like the current node

MOST LIBERAL TRAITS

Lindamood et al. 09 &

Heatherly et al. 09

Trait Name | Trait Value | Weight Liberal
Group | legalize same sex marriage | 46.16066789
Group | every time i find out a cute boy is conservative a little part of me dies | 39.68599463
Group | equal rights for gays | 33.83786875
Group | the democratic party | 32.12011605
Group | not a bush fan | 31.95260895
Group | people who cannot understand people who voted for bush | 30.80812425
Group | government religion disaster | 29.98977927

MOST CONSERVATIVE TRAITS

Lindamood et al. 09 &

Heatherly et al. 09

Trait Name | Trait Value | Weight Conservative
Group | george w bush is my homeboy | 45.88831329
Group | college republicans | 40.51122488
Group | texas conservatives | 32.23171423
Group | bears for bush | 30.86484689
Group | kerry is a fairy | 28.50250433
Group | aggie republicans | 27.64720818
Group | keep facebook clean | 23.653477
Group | i voted for bush | 23.43173116
Group | protect marriage one man one woman | 21.60830487

MOST LIBERAL TRAITS PER TRAIT NAME

Lindamood et al. 09 &

Heatherly et al. 09

Trait Name | Trait Value | Weight Liberal
activities | amnesty international | 4.659100601
Employer | hot topic | 2.753844959
favorite tv shows | queer as folk | 9.762900035
grad school | computer science | 1.698146579
hometown | mumbai | 3.566007713
Relationship Status | in an open relationship | 1.617950632
religious views | agnostic | 3.15756412
looking for | whatever i can get | 1.703651985

EXPERIMENTS

Lindamood et al. 09 &

Heatherly et al. 09

Conducted on 35,000 nodes which recorded political affiliation

Tests removing 0 details and 0 links, 10 details and 0 links, 0 details and 10 links, and 10 details and 10 links

Varied Training Set size from 10% of available nodes to 90%

Results are documented in papers
