SYSTEMS SUPPORT FOR GRAPHICAL LEARNING 9/18/2014 Ken Birman

CS6410 Fall 2014
Graphical models and applications

Artificial intelligence and machine learning are core
technologies in many modern cloud settings
 • Support for social networking mechanisms
 • Creating product placement recommendations
 • Understanding the flow of “influence” within communities

Graphical processing can also matter in systems
 • Understanding what to cache and what not to cache
 • Learning common patterns to optimize
CS5412 Spring 2014 (Cloud Computing: Birman)
What makes this hard?

Prior generation of solutions was too general
 • Programming languages can do anything, but they aren’t at all specialized for graph-structured data
 • Database systems are awesome for tabular data but much less optimized for graphical data

There is also an issue of scale
 • We’re good at what can be done on one computer
 • But a company like Facebook has billions of users and its infrastructure runs in massive data centers
Today’s papers

TAO paper (I’ll start with this) gives a sense of the
challenge Facebook confronts
 • Like an entire distributed operating system
 • But the whole role of the solution is to manage graphical data and support queries against it
 • Massive loads and surreal scale

Things to notice?
 • How does the architecture of the solution reflect the special environment in which it runs?
 • How did they identify and optimize the critical paths?
Dryad/LINQ

Here we see two concepts combined
 • At Microsoft, LINQ has become very popular
 • It embeds a kind of query processing into C# code

Dryad takes this one step further
 • Given a LINQ expression, Dryad can run it on a distributed “computing engine” of their own design
 • Idea is to obtain massive parallelism
Basic LINQ concepts

LINQ (“Language Integrated Query”) starts by
allowing you to code lambda expressions
 • In-line functions
 • Evaluated when the value is needed, not when defined

For example:
myPets.Select(a => a.name);
myFriends.Select(f => (f.name, f.loc, f.phone.mobile))
         .Where(f => distance(myloc, f.loc) < 1.0);  // distance in miles
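
The "evaluated when needed, not when defined" point can be seen in a small sketch. This is a Python analogue rather than C#, and the Friend record, the flat (x, y) map and the sample data are assumptions made up for illustration:

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Friend:
    name: str
    loc: tuple    # (x, y) position in miles on a hypothetical flat map
    mobile: str


def distance(a, b):
    """Straight-line distance between two (x, y) points, in miles."""
    return hypot(a[0] - b[0], a[1] - b[1])


my_loc = (0.0, 0.0)
my_friends = [
    Friend("Ann", (0.3, 0.4), "555-0101"),
    Friend("Bob", (3.0, 4.0), "555-0102"),
]

# Building the query does no filtering yet: a generator expression is lazy,
# much like an unevaluated LINQ expression.
nearby = (
    (f.name, f.loc, f.mobile)
    for f in my_friends
    if distance(my_loc, f.loc) < 1.0
)

# A friend added after the query was defined is still seen, because the
# filter only runs when the results are consumed.
my_friends.append(Friend("Cy", (0.0, 0.5), "555-0103"))

results = list(nearby)  # evaluation happens here
print([name for name, _, _ in results])  # ['Ann', 'Cy']
```

Because nothing runs at definition time, a system is free to inspect the whole query first and decide where and how to execute it.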
How Dryad works



Takes a LINQ expression, unevaluated
Maps it to a collection of processor nodes that all
have access to the same (read-only, unchanging)
data files
This spreads out the work and gains parallelism!
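
A minimal single-machine sketch of that idea, assuming the data is already split into read-only partitions. Threads stand in for processor nodes; the partition contents and the run_distributed helper are hypothetical, and this is not Dryad's actual scheduler:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical read-only partitions; in a real deployment each would be a
# data file visible to a worker node. Rows are (name, distance in miles).
PARTITIONS = (
    (("Ann", 0.5), ("Bob", 5.0)),
    (("Cy", 0.2), ("Dee", 0.9)),
    (("Eve", 2.5),),
)


def query(partition):
    """The per-node piece of the query: names of friends within a mile."""
    return [name for name, miles in partition if miles < 1.0]


def run_distributed(query_fn, partitions, workers=3):
    """Apply the same query to every partition in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = pool.map(query_fn, partitions)  # keeps input order
        return [row for part in partial_results for row in part]


nearby = run_distributed(query, PARTITIONS)
print(nearby)  # ['Ann', 'Cy', 'Dee']
```

Since the partitions never change, the workers need no locking or coordination beyond the final merge.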
Basic architecture of Dryad
Execution of a LINQ expression
A join, done in two ways
A join, done in two ways
MapReduce in Dryad/LINQ
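
As a rough illustration of how MapReduce can be phrased as a chain of query-style stages (map, group by key, reduce), here is a small Python sketch. The map_reduce helper and the word-count example are assumptions for illustration, not code from the Dryad/LINQ paper:

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run map, group-by-key, and reduce as three query-like stages."""
    # Map: each record may emit any number of (key, value) pairs.
    pairs = (pair for record in records for pair in mapper(record))

    # Group: collect the values for each key (the "shuffle" step).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # Reduce: collapse each group to a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    return [(word.lower(), 1) for word in line.split()]


def count_reducer(_word, ones):
    return sum(ones)


counts = map_reduce(["graph data", "Graph queries"], word_mapper, count_reducer)
print(counts)  # {'graph': 2, 'data': 1, 'queries': 1}
```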
Beyond Dryad

In follow-on work these guys did something called
Naiad…
 • In that paper, they assert that social networking often comes down to finding fixed points of functions on graphs
 • For example, “look for poker players who are physically within a mile of me and are friends of mine or of one of my friends”
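
The poker-player query can be written out on a toy graph. This is only a sketch: the friendship edges, positions and poker set below are made-up data, and within_two_hops is a hypothetical helper name:

```python
from math import hypot

# Hypothetical toy data: friendship edges, positions in miles, hobbies.
FRIENDS = {
    "me": {"ann", "bob"},
    "ann": {"me", "cy"},
    "bob": {"me", "dee"},
    "cy": {"ann"},
    "dee": {"bob"},
    "eve": set(),
}
LOCATION = {
    "me": (0.0, 0.0), "ann": (0.1, 0.2), "bob": (4.0, 0.0),
    "cy": (0.5, 0.5), "dee": (0.0, 0.3), "eve": (0.1, 0.1),
}
PLAYS_POKER = {"ann", "bob", "dee", "eve"}


def within_two_hops(person):
    """Friends of `person` plus friends of those friends, excluding `person`."""
    direct = FRIENDS[person]
    second = set().union(*(FRIENDS[f] for f in direct))
    return (direct | second) - {person}


def nearby_poker_players(person, max_miles=1.0):
    """Poker players within two friendship hops and within `max_miles`."""
    px, py = LOCATION[person]
    return {
        other
        for other in within_two_hops(person)
        if other in PLAYS_POKER
        and hypot(LOCATION[other][0] - px, LOCATION[other][1] - py) < max_miles
    }


print(sorted(nearby_poker_players("me")))  # ['ann', 'dee']
```

Here eve plays poker and is close by but is not connected, bob is a friend but too far away, and cy is close but does not play.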
Social network computations



They believe that most parallel social networking
computations can be re-expressed as fixed points
In essence, define a function f(S) for a set S, then
iterate S ← f(S) until f(S) = S. This is the fixed point.
They want to compute all the fixed points
concurrently for some very large community
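
A minimal sketch of fixed-point iteration on sets, assuming the function only ever adds members of a finite universe (so the loop is guaranteed to stop). The follower graph and the spread function are hypothetical examples:

```python
def fixed_point(f, seed):
    """Iterate S <- f(S) from `seed` until f(S) == S, then return S."""
    current = frozenset(seed)
    while True:
        nxt = frozenset(f(current))
        if nxt == current:
            return current
        current = nxt


# Hypothetical follower graph: an edge u -> v means v sees what u posts.
FOLLOWERS = {"a": {"b"}, "b": {"c"}, "c": {"a"}, "d": {"e"}, "e": set()}


def spread(reached):
    """One step of influence: everyone reached so far plus their followers."""
    return reached | {v for u in reached for v in FOLLOWERS[u]}


audience = fixed_point(spread, {"a"})
print(sorted(audience))  # ['a', 'b', 'c']
```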
Can we really find use cases?



All the vehicles on Highway 101 need to
continuously “watch for the vehicles that could cut
me off if they change path”
Define this indirectly too: if truck T changes its
trajectory this way, car C might move that way, and
then C would cut me off, so include T into the set…
The idea is to do all such computations at once!
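
One way to picture "all such computations at once" is to keep one watch set per vehicle and grow them all in the same loop. The THREATENS relation and the vehicle names below are invented for illustration:

```python
# Hypothetical direct relation: THREATENS[v] is the set of vehicles whose
# change of path could directly force v to react.
THREATENS = {
    "me": {"car_c"},
    "car_c": {"truck_t"},
    "truck_t": set(),
    "van_v": {"me"},
}


def step(watch, threatens):
    """Extend every vehicle's watch set by one level of indirection."""
    return {
        v: seen | {w for u in seen for w in threatens[u]}
        for v, seen in watch.items()
    }


def watch_sets(threatens):
    """Compute all vehicles' watch sets together, iterating to a fixed point."""
    watch = {v: set(direct) for v, direct in threatens.items()}
    while True:
        updated = step(watch, threatens)
        if updated == watch:
            return watch
        watch = updated


all_watch = watch_sets(THREATENS)
print(sorted(all_watch["me"]))  # ['car_c', 'truck_t']
```

Truck T ends up in my watch set only indirectly, through car C, which mirrors the reasoning on the slide.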
Naiad and Dryad

Then they map Naiad onto Dryad
 • First write functions that compute these sets
 • Next express the fixed-point property over functions
 • Last, seed the data set and then run Dryad to iterate until all the fixed points are found (or until a time-limit is reached, to cover non-convergent functions)
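
The last step, iterating until a fixed point or a time limit, can be sketched as a plain loop. This is a single-process stand-in, not how Dryad schedules the work; iterate_with_deadline and the two toy functions are hypothetical:

```python
import time


def iterate_with_deadline(f, seed, time_limit_s=1.0):
    """Apply f repeatedly until a fixed point or the time limit is reached.

    Returns (state, converged). The time limit covers functions that
    never settle.
    """
    deadline = time.monotonic() + time_limit_s
    state = seed
    while time.monotonic() < deadline:
        nxt = f(state)
        if nxt == state:
            return state, True
        state = nxt
    return state, False


def halve(n):
    return n // 2   # converges: reaches 0 and stays there


def flip(n):
    return 1 - n    # never converges: oscillates between 0 and 1


print(iterate_with_deadline(halve, 40))                     # (0, True)
print(iterate_with_deadline(flip, 0, time_limit_s=0.05)[1])  # False
```

Returning the converged flag alongside the state lets the caller tell a true fixed point from a result that was simply cut off.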
Issue?



By the time Naiad is finished, the style of code is
very hard to read, although those who write it find
it pretty natural to work this way
In fact many social networking companies do use
this style of functional programming (much as
Jane Street is famous for using OCaml for financial
analytics)
But is it systems research?
Other major systems in this space



Check out
http://en.wikipedia.org/wiki/Graph_database
They list 50 or so graphical databases and
processing systems
Some popular ones in research settings are Pregel
(from Google), GraphLab (CMU) and Vowpal
Wabbit (“Fast Learning”) (Yahoo)
Take away?

Computer systems need to be responsive to
 • Styles of use (what our “customers” are doing)
 • Common patterns of load (optimize for this case)


In today’s major cloud computing settings, graphical
data and graphical learning solutions are becoming
a highly dominant form of load and focus
Computer systems need to evolve to track this need