Creating Models of Real-World Communities with ReferralWeb Henry Kautz Bart Selman

Creating Models of Real-World
Communities with ReferralWeb
Henry Kautz
University of Washington
Bart Selman
Cornell University
Recommender Systems
New category of software: programs that make
personalized recommendations of goods,
services, and people
• Amazon.com - books
• Jango.com - stores
• Whowhere.com - friends
Current methods
Content-based: find things similar to ones you like
Collaborative filtering: find things liked by people
who are similar to you
Explosive growth
• Viewed as crucial for e-commerce sites
• Excite: 100,000,000+ recommendations per day!
Anonymous Opinions
Most recommender systems hide the identity of
the sources of the recommendations
• E-communities: fictitious identities
• Matchmaker systems: deliberately hide true
identities
• Collaborative filtering: aggregation - no one to trust
(or blame!)
Result: anonymous opinions
• Okay for choosing a movie or CD
• But would you bet your job on that
“recommendation”?!
– Gee, boss, the project failed, but somebody on the
net, I don't know who, said it was a good approach...
Trusted Recommendations
For serious life / business decisions, you want
the opinion of a trusted expert
• If you do not know an expert personally, you want to find
a referral to one via a chain of friends and
colleagues
Referral-chain provides:
• Way to judge quality of expert's advice
• Reason for the expert to respond in a trustworthy
manner
Finding good referral-chains is slow and time-consuming, but vital
• hence the emphasis of business gurus on “networking”
Example Tasks
• You are an associate editor for JAIR. Find a
reviewer for a paper that claims new results on
“expander graphs”.
• You are considering transferring to a different
division of your company. Is that division
head a good guy to work for?
• You are putting together a project team to
launch a new internet service. Who in your
company should you tap for expertise on
image compression?
ReferralWeb
Set of all possible referral-chains = a social
network
System for modeling, visualizing, and searching
social networks
• in a company
• in an e-community
• in the WWW as a whole
Integrates IR search with a model of personal
connections
Social Networks
Social network model specifies:
• Who knows whom
• Who knows what
How to create?
• Ask users to register with system and provide lists
of contacts and interests
– sixdegrees, 6DOS, Firefly, Whowhere?
• High startup cost
• Incomplete, out of date, untrustworthy information
• The best experts will actively avoid registering
– a network of the lonely and disenfranchised?
Mining Social Networks
Alternative: automatically generate network
models from pre-existing data
• Email logs (not)
• Bibliographic databases
• Corporate records of organizational structure,
project teams, in-house documents
• Arbitrary web pages
– personal web pages more accurate / up to date than
official corporate records!
Can extract evidence for both relationships and
expertise
Discovering Names
Proper name extraction
• Can accurately identify names in arbitrary
documents
• Frequency of co-occurrence of names can be
quickly determined using IR search engines
Canonicalizing names
• John Zack, J. C. Zack, Jim Zack
• Match names / initials / nicknames as long as
unambiguous
– closed world assumption
• Improvement: use context
– “Henry A. Kautz” matches “Harry A. Kautz”
if both strongly linked to “Bart Selman”
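The basic matching rule (without the context-based improvement) can be sketched as below. This is a minimal illustration, not ReferralWeb's actual code: it assumes names are whitespace-separated tokens ending in the surname, uses a small hypothetical `NICKNAMES` table, and treats the list of known full names as complete (the closed world assumption). A variant is merged only when exactly one known name is compatible with it.

```python
from typing import List, Optional

# Hypothetical nickname table (illustrative only; a real one would be much larger).
NICKNAMES = {"bill": "william", "bob": "robert", "jim": "james"}

def normalize(token: str) -> str:
    """Lowercase, drop trailing periods, and map nicknames to formal given names."""
    t = token.lower().rstrip(".")
    return NICKNAMES.get(t, t)

def compatible(variant: str, full: str) -> bool:
    """Surnames must agree; each given-name token of `variant` must equal the
    token in the same position of `full`, or be its initial."""
    v = [normalize(t) for t in variant.split()]
    f = [normalize(t) for t in full.split()]
    if not v or not f or v[-1] != f[-1]:
        return False
    v_given, f_given = v[:-1], f[:-1]
    if len(v_given) > len(f_given):
        return False
    for a, b in zip(v_given, f_given):
        if len(a) == 1:
            if not b.startswith(a):  # initial must match the first letter
                return False
        elif a != b:
            return False
    return True

def canonicalize(variant: str, known_full_names: List[str]) -> Optional[str]:
    """Closed world assumption: known_full_names lists everyone. Return the single
    compatible full name, or None when the variant is unmatched or ambiguous."""
    matches = [f for f in known_full_names if compatible(variant, f)]
    return matches[0] if len(matches) == 1 else None
```

With known names "John C. Zack", "Jane Zack" and "William Smith", the variant "J. C. Zack" resolves to "John C. Zack", while "J. Zack" is left unresolved because it fits both Zacks.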
Disambiguating Names
Problem: different individuals with the same
name
Observation: even within large organizations,
the vast majority (90%+) of full names are
unique
• 3,000 employees in R&D at AT&T
• 10,000 research scientists in AI, NL, and theory
For medium-size networks, duplicate names can be treated as noise
• Key interface issue: the ability to explain each link in
a path to users
Further scaling: name + additional context
User Profiles
Manually entered profiles are incomplete and
impossible to maintain
• impossible in principle to create a complete a priori
list of kinds of expertise
Many services today create highly specialized
profiles
• e.g., your book-buying habits
Simple, robust profile: “bag of words” of all
documents in which your name appears
• standard IR vector space model to match queries
to people
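A minimal sketch of the bag-of-words profile and vector space matching, assuming each document has already been reduced to the set of canonical names it mentions plus its text. Each person's profile pools the words of every document that mentions them; people are ranked against a query by TF-IDF cosine similarity, which is one standard vector space weighting (the exact weighting ReferralWeb used is not given on the slide). The names in the example are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

Doc = Tuple[Set[str], str]  # (names mentioned in the document, document text)

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())

def build_profiles(docs: List[Doc]) -> Dict[str, Counter]:
    """Profile = word counts over all documents in which the person's name appears."""
    profiles: Dict[str, Counter] = {}
    for names, text in docs:
        words = tokenize(text)
        for name in names:
            profiles.setdefault(name, Counter()).update(words)
    return profiles

def tfidf(profiles: Dict[str, Counter]):
    """Weight each word count by a smoothed inverse 'profile frequency'."""
    n = len(profiles)
    df = Counter(w for bag in profiles.values() for w in bag)  # one count per profile
    idf = {w: math.log(n / d) + 1.0 for w, d in df.items()}
    vecs = {p: {w: c * idf[w] for w, c in bag.items()} for p, bag in profiles.items()}
    return vecs, idf

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_people(query: str, docs: List[Doc], top_k: int = 3) -> List[str]:
    """Return up to top_k people whose profiles best match the query."""
    vecs, idf = tfidf(build_profiles(docs))
    q = {w: c * idf.get(w, 0.0) for w, c in Counter(tokenize(query)).items()}
    scored = sorted(((cosine(q, v), p) for p, v in vecs.items()), reverse=True)
    return [p for s, p in scored[:top_k] if s > 0]
```

Because profiles are rebuilt from documents rather than typed in, they stay current as new documents mentioning a person appear, which is the point of the slide.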
Test Networks
1. Proof of concept: 1,000-node network
• Created by a combination of web crawling and AltaVista
queries, centered on a professor at M.I.T.
• A test group of users could usually find experts on
given topics
– but the small size of the network led to distant referrals
2. 10,000 Researchers in AI, Theory, and NL
• Based on 30,000 bibliography entries from high-quality
conferences
– AAAI, STOC, FOCS, ACL...
• links between co-authors (not citations)
• http://www.research.att.com/kautz/referralweb
• “paper-reviewer finder”
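Building the second test network from bibliography entries can be sketched as follows, assuming each entry has already been reduced to a list of canonical author names. Every pair of co-authors on a paper gets an undirected link (citations are ignored, as on the slide); counting joint papers as a link weight is an illustrative choice, not something the slide specifies.

```python
from itertools import combinations
from typing import Dict, Iterable, List

def coauthor_graph(entries: Iterable[List[str]]) -> Dict[str, Dict[str, int]]:
    """Undirected co-author network: graph[a][b] counts the papers a and b share."""
    graph: Dict[str, Dict[str, int]] = {}
    for authors in entries:
        names = sorted(set(authors))  # drop repeated listings of the same author
        for name in names:
            graph.setdefault(name, {})  # single-author papers still add a node
        for a, b in combinations(names, 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph
```

Each paper with m authors contributes a small clique of m(m-1)/2 links, which is one source of the high clustering reported for the co-author network.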
Exploring the Network
Who can I ask to review a paper on “expander graphs”?
Experts on Expander Graphs
Paths to Experts
Request Details on Frieze
Frieze Home Page
Observations
Quickly found short chains to experts
• Could not be found using IR search alone
User can select chain that is most likely to
succeed
• Do not want to bother busiest, most famous
experts with every request
Chains cross disciplines
• Kautz - AI
• Kearns - AI, Machine Learning
• Blum - Machine Learning, Theory
• Frieze - Theory, Mathematics
Useful tool for strengthening ties both within
and between communities
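The chain search can be sketched as a single breadth-first search from the user, assuming an unweighted adjacency map and a set of experts already selected (for example, by matching profiles against the query). One pass yields a shortest chain to every reachable expert, so the user can then choose which chain to follow instead of always contacting the closest or best-known expert. The `max_len` cutoff is an assumed parameter, not a ReferralWeb setting.

```python
from collections import deque
from typing import Dict, List, Optional, Set

def referral_chains(graph: Dict[str, Set[str]], user: str, experts: Set[str],
                    max_len: Optional[int] = None) -> Dict[str, List[str]]:
    """Map each reachable expert to one shortest chain user -> ... -> expert."""
    parent: Dict[str, Optional[str]] = {user: None}
    depth = {user: 0}
    queue = deque([user])
    while queue:
        node = queue.popleft()
        if max_len is not None and depth[node] >= max_len:
            continue  # do not extend chains beyond the length limit
        for nbr in graph.get(node, ()):
            if nbr not in parent:
                parent[nbr] = node
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    chains: Dict[str, List[str]] = {}
    for expert in experts:
        if expert in parent and expert != user:
            path: List[str] = []
            cur: Optional[str] = expert
            while cur is not None:  # walk parent pointers back to the user
                path.append(cur)
                cur = parent[cur]
            chains[expert] = path[::-1]
    return chains
```

Sorting the returned chains by length (or by some estimate of how busy each intermediary is) gives the user the menu of options the slide describes.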
Why Does it Work?
The Small-World Phenomenon
Milgram (1967) - any two individuals in the U.S.A. are
linked by a chain of 6 or fewer first-name
acquaintances
– “6 degrees of separation”
– Erdős numbers
– “6 degrees of Kevin Bacon”
But
• No formal model to explain short paths!
• Due to high average degree?
• True for acquaintances or co-stars, but false for our
computer science co-author database!
– average degree: 100s (acquaintances) versus 61 (co-stars) versus 4.28 (co-authors)!
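A rough estimate shows why degree alone need not be large, under the idealized assumption that contacts never overlap (no clustering). With average degree k, about k^L people lie within L steps, so reaching all N people takes about

```latex
k^{L} \approx N \quad\Longrightarrow\quad L \approx \frac{\ln N}{\ln k}
```

For the CS co-author network (N = 8,070, k = 4.28) this gives L ≈ 8.996 / 1.454 ≈ 6.2, close to the 6.4 reported below for the matching random graph; for the film co-stars (N = 225,000, k = 61) it gives about 3.0, versus 2.99. The real co-author network's longer average path (7.9) reflects the overlap among neighbors that this estimate ignores.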
Small-world Networks
Due to randomness?
• Random graphs have short average path lengths
• But social networks are not random
– nodes are highly clustered (many cliques)
– random graph model predicts that high clustering
corresponds to long average paths!
Better model: Small-world networks
• Idea: a highly structured (clustered) network with
just a few random links (Watts & Strogatz, 1998)
• Result: high clustering + short paths!
• Random edges correspond to shortcuts
– direct relationships between people who primarily
participate in different sub-communities
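The Watts & Strogatz construction can be sketched with the standard library alone: start from a ring lattice in which each node links to its k nearest neighbors (k even), then rewire each lattice edge with probability p to a uniformly chosen new endpoint, avoiding self-loops and duplicate edges. This is a minimal illustration of the model, not code from ReferralWeb; p = 0 keeps the clustered lattice, and a small p adds the few random "shortcut" links described above.

```python
import random
from typing import Dict, Optional, Set

def watts_strogatz(n: int, k: int, p: float,
                   seed: Optional[int] = None) -> Dict[int, Set[int]]:
    """Ring lattice of n nodes, each joined to its k nearest neighbors
    (k even), with every lattice edge rewired with probability p."""
    if k % 2 or k >= n:
        raise ValueError("k must be even and smaller than n")
    rng = random.Random(seed)
    adj: Dict[int, Set[int]] = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            adj[v].add(u)
            adj[u].add(v)
    # Visit each original lattice edge (v, v + j) once, keeping endpoint v.
    for j in range(1, k // 2 + 1):
        for v in range(n):
            u = (v + j) % n
            if u in adj[v] and rng.random() < p:
                candidates = [w for w in range(n) if w != v and w not in adj[v]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[v].discard(u)
                    adj[u].discard(v)
                    adj[v].add(w)
                    adj[w].add(v)
    return adj
```

Rewiring moves edges rather than adding them, so the average degree stays exactly k while the path length drops.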
Small-world vs. Random Networks
Network          Size      Avg Degree   Avg Path Length   Clustering Coeff.
CS Co-authors    8,070     4.28         7.9               0.72
Random 1         8,070     4.28         6.4               0.072
Film Co-stars    225,000   61           3.65              0.79
Random 2         225,000   61           2.99              0.00027
Neurons          282       14           2.66              0.28
Random 3         282       14           2.25              0.05
Clustering Coefficient = average value of C(n) over all nodes, where
$$C(n) = \frac{\text{number of edges between neighbors of } n}{\binom{\text{number of neighbors of } n}{2}}$$
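Both statistics in the table above can be computed directly from an adjacency map. This is a minimal sketch assuming an undirected, connected graph stored as a dict of neighbor sets; nodes with fewer than two neighbors are given C(n) = 0, a common convention the slide does not state. Average path length is the mean breadth-first-search distance over all ordered pairs of distinct nodes.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Set

Graph = Dict[int, Set[int]]

def local_clustering(g: Graph, n: int) -> float:
    """C(n): edges among n's neighbors divided by (number of neighbors choose 2)."""
    nbrs = g[n]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention for degree 0 or 1
    links = sum(1 for a, b in combinations(nbrs, 2) if b in g[a])
    return links / (k * (k - 1) / 2)

def clustering_coefficient(g: Graph) -> float:
    return sum(local_clustering(g, n) for n in g) / len(g)

def average_path_length(g: Graph) -> float:
    """Mean shortest-path length over reachable ordered pairs (all pairs if connected)."""
    total = pairs = 0
    for src in g:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for w in g[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs
```

For a triangle both values are 1.0; for a three-node path the clustering coefficient is 0 and the average path length is 4/3.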
Corporate Communities
Finding good internal experts is a strategic
business problem
• “intellectual assets” worthless if not consulted!
AT&T: 170,000 employees, 3,000 in the R&D
community
• How to build a project team?
• What R&D people to consult for a new business
venture?
• What business people to contact about a new
technological breakthrough?
In practice: successful projects based on
grassroots cross-organizational networking
Modeling the AT&T Corporate
Network
Model integrates information from
• Official organizational charts (online)
• Personal web pages (+ crawling)
• External publication databases
• Internal technical document databases
Informal structure will prove vital for
• finding shorter paths to experts
• finding people who can reliably evaluate experts
• synergy between official and unofficial channels
Who can tell me about the Director of Speech
Processing research at AT&T?
Paths With All Link Types
Filtering link types
Paths With Only Organizational Links
Paths With Only Web/Article Links
Observations
The official company hierarchy is only a sparse subset
of the corporate social network
Shortest (and often best) paths involve a
combination of official and unofficial links
• Conditions for trust and evaluation may greatly
differ
• Global social network is the union of many different
kinds of sub-networks
Search is greatly aided when the user can choose different
views of the network by
+ type of edge
+ strength of edge
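Such filtered views can be sketched as an ordinary breadth-first search that ignores edges outside the chosen view. The sketch assumes each edge carries a set of link types (hypothetical labels such as "org", "web" and "article") and a numeric strength (assumed here to be something like a count of supporting documents); neither representation is taken from ReferralWeb itself.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

@dataclass(frozen=True)
class Edge:
    types: FrozenSet[str]  # hypothetical labels, e.g. "org", "web", "article"
    strength: float        # assumed: e.g. the number of supporting documents

Network = Dict[str, Dict[str, Edge]]

def add_link(net: Network, a: str, b: str, types: FrozenSet[str], strength: float) -> None:
    """Store an undirected link by recording the same Edge in both directions."""
    edge = Edge(types, strength)
    net.setdefault(a, {})[b] = edge
    net.setdefault(b, {})[a] = edge

def shortest_chain(net: Network, src: str, dst: str, allowed: FrozenSet[str],
                   min_strength: float = 0.0) -> Optional[List[str]]:
    """Breadth-first search that only follows edges visible in the chosen view:
    at least one allowed link type and strength >= min_strength."""
    parent: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            path: List[str] = []
            cur: Optional[str] = v
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for w, edge in net.get(v, {}).items():
            if w not in parent and edge.types & allowed and edge.strength >= min_strength:
                parent[w] = v
                queue.append(w)
    return None
```

In a toy network where the organizational route to a director takes three hops but a web/article route takes two, the "all link types" view returns the two-hop chain while the "organizational links only" view returns the three-hop one, mirroring the contrast between the screenshots above.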
Who can help out my project with some great
image compression software?
A Note on Believability
Observation: the recommendations made by
(any) recommender system tend to be either
astonishingly accurate or absolutely
ridiculous
• true for any AI-complete problem
How can a recommender system be trusted
enough for “serious” use?
• Make system transparent: able to explain its
reasoning
• indicate to user where the data is ambiguous
• Any link or node can be explained by viewing the
data on which it is based
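One way to make every link explainable is never to store a bare edge: each link keeps the evidence it was derived from, so any path can be expanded into readable justifications. This is a minimal sketch assuming evidence is a list of (source kind, document title or URL) pairs; the class name and sample data are hypothetical, not part of ReferralWeb.

```python
from typing import Dict, FrozenSet, List, Tuple

Evidence = Tuple[str, str]  # (source kind, document title or URL)

class ExplainableNetwork:
    """Undirected network whose links are defined entirely by their evidence."""

    def __init__(self) -> None:
        self._evidence: Dict[FrozenSet[str], List[Evidence]] = {}

    def add_evidence(self, a: str, b: str, kind: str, source: str) -> None:
        """Record one piece of evidence for the undirected link a -- b."""
        self._evidence.setdefault(frozenset((a, b)), []).append((kind, source))

    def explain_path(self, path: List[str]) -> List[str]:
        """One line per hop, listing everything that supports that link."""
        lines = []
        for a, b in zip(path, path[1:]):
            items = self._evidence.get(frozenset((a, b)), [])
            if not items:
                lines.append(f"{a} -- {b}: no recorded evidence")
            else:
                reasons = "; ".join(f"{kind}: {src}" for kind, src in items)
                lines.append(f"{a} -- {b}: {reasons}")
        return lines
```

Keeping the evidence also makes weak or ambiguous links visible: a hop supported by a single document can be flagged to the user instead of being presented with the same confidence as a well-supported one.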
Checking the Expert’s Expertise
Checking the Reason for an Edge
Verifying the Edge Context
Summary
Many uses of recommender systems require
connecting people to people, not just
providing “oracular” advice
• Find people, not just documents - access to
information that may not even be online!
• Help users evaluate quality of information
Need to automatically model existing, real-world
communities
• Cannot require everyone to sign up in advance!
• Can improve and strengthen the “weak ties” that
are crucial for effective organizations
ReferralWeb: a tool for generating and
searching social networks
Status and Future Work
ReferralWeb
• Version 2.0 for the Computer Science research
community
http://www.research.att.com/~kautz/referralweb
• Corporate version undergoing trials in AT&T Labs
Current research topics
• Automatic clustering - discovery of subcommunities
• Combining uncertain information
• Scale-up to WWW-size communities
• Analysis of more accurate formal models of small-world networks
– accurately predict search performance
Bibliography
• Kautz, H., Selman, B. & Shah, M. 1997. The
Hidden Web. AI Magazine 18(2): 27-36.
• Milgram, S. 1967. The Small-World Problem.
Psychology Today 1(1): 60-76.
• Resnick, P. & Varian, H. R., eds. 1997. Special Section on
Recommender Systems. Communications of the
ACM 40(3).
• Watts, D. & Strogatz, S. 1998. Collective dynamics
of ‘small-world’ networks. Nature 393: 440-442.