INDRA - A Distributed In-memory
Cache for Online Social Networks
Long Kai
Anjali Sridhar
Sreeram Kannan
Siva Theja Maguluri
Motivation – “big multi-get”
• Memcached
– In-memory distributed hash table service used at Facebook
• 400k connections to any Memcached server
– An estimated 5 GB of memory is required to maintain the TCP connections
– Replace TCP with UDP
• High communication overhead
Related Work - SPAR
• In the consistent storage (SPAR):
– Scalability: on average 7 copies are stored
– System flexibility: confined to multi-get applications of small data items
– Algorithmic flexibility: inefficient dynamic adaptation to new usage patterns
– Reliability: complicated failure recovery mechanism
– Load balancing
J. M. Pujol, V. Erramilli, G. Siganos, X. Yang, N. Laoutaris, P. Chhabra, and P. Rodriguez. The little engine(s) that could: Scaling online social networks. In ACM SIGCOMM Computer Communication Review, volume 40, pages 375–386. ACM, 2010.
Indra
[Figure: architecture with three components: CLIENT, INDRA SERVER (MEMORY-CACHE), CONSISTENT STORAGE]
Guarantees:
1. Most recent copy is present in the primary.
2. Secondary copies are eventually consistent.
Design principles:
1. Reliance on the consistent storage for failure recovery.
2. Idempotent operations only.
Advantages
• Modularity: Data reliability is decoupled from the partition and replication algorithm module.
• Flexibility: Use of caching and eviction at individual servers to dynamically adapt to new usage patterns.
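The two design principles can be illustrated with a minimal sketch. It assumes a plain dictionary stands in for the consistent storage, and the class and method names (`ConsistentStorage`, `CacheServer`, `crash`) are hypothetical, not taken from the Indra implementation. Writes store the full value, so repeating one is harmless, and a cache that loses its memory is refilled from the consistent storage on the next read.

```python
class ConsistentStorage:
    """Stand-in for the consistent storage (assumption: a plain dict)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class CacheServer:
    """Hypothetical in-memory cache that relies on storage for recovery."""

    def __init__(self, storage):
        self.storage = storage
        self.cache = {}

    def put(self, key, value):
        # Idempotent: the full value is written, so a retry changes nothing.
        self.storage.put(key, value)
        self.cache[key] = value

    def get(self, key):
        if key not in self.cache:
            value = self.storage.get(key)
            if value is None:
                return None
            # Cache miss: refill from the consistent storage.
            self.cache[key] = value
        return self.cache[key]

    def crash(self):
        # Simulate a failure that wipes the in-memory state.
        self.cache.clear()


storage = ConsistentStorage()
server = CacheServer(storage)
server.put("user:a", "profile-a")
server.put("user:a", "profile-a")  # repeated write, same outcome
server.crash()
recovered = server.get("user:a")   # served again after refill from storage
```

Because recovery only needs the consistent storage, the cache server itself keeps no log or checkpoint in this sketch.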
Algorithm
• Indra objectives
– To place friends’ data together
– To replicate popular data items
– Based on access log
• Weighted graph; weights denote joint access frequency
[Figure: example access graph on users a to f, with panels labeled Placement and Replication]
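One way to derive such a weighted graph from an access log is sketched below. The log format is an assumption: each entry is the list of users whose data one request fetched together. The function name `build_access_graph` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_access_graph(access_log):
    """Return edge weights: how often two users' data were fetched together.

    access_log: iterable of requests, each a list of user ids (assumed format).
    """
    weights = Counter()
    for request in access_log:
        # Sort so each undirected edge has one canonical key (u, v) with u < v.
        for u, v in combinations(sorted(set(request)), 2):
            weights[(u, v)] += 1
    return weights


log = [["a", "b", "c"], ["a", "b"], ["c", "d"]]
graph = build_access_graph(log)
# graph[("a", "b")] is 2 because a and b appear together in two requests.
```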
Mathematical Model
• Goals
– Collocate user with friends (Collocation Gain)
– Balance the load among servers (Server Load Cost)
– Minimize replicas (Replication Cost)
• Maximize (Collocation Gain) − (Server Load Cost) − (Replication Cost)
– Among all placement plans
– Among all replication plans
• Problem incorporates several NP-hard problems!
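The slide does not give the exact form of each term, so the following is only a sketch under stated assumptions: collocation gain is the total weight of edges whose endpoints share at least one server, server load cost is a penalty on the sum of squared server loads, and replication cost is a penalty per extra copy. The function name and both penalty parameters are hypothetical.

```python
def objective(weights, copies, load_penalty=0.1, replica_penalty=1.0):
    """Score a plan as gain minus load cost minus replication cost.

    weights: dict (u, v) -> joint access frequency
    copies:  dict user -> set of servers holding a copy (primary + replicas)
    The three term definitions below are assumptions for illustration.
    """
    # Gain: an edge counts if some server holds both endpoints.
    gain = sum(w for (u, v), w in weights.items() if copies[u] & copies[v])

    # Load: number of copies stored on each server.
    loads = {}
    for servers in copies.values():
        for s in servers:
            loads[s] = loads.get(s, 0) + 1
    load_cost = load_penalty * sum(n * n for n in loads.values())

    # Replication: every copy beyond the first costs replica_penalty.
    replica_cost = replica_penalty * sum(len(s) - 1 for s in copies.values())
    return gain - load_cost - replica_cost


weights = {("a", "b"): 2, ("a", "c"): 1, ("b", "c"): 1, ("c", "d"): 1}
no_replica = {"a": {1}, "b": {1}, "c": {2}, "d": {2}}
with_replica = {"a": {1}, "b": {1}, "c": {1, 2}, "d": {2}}
score_plain = objective(weights, no_replica)
score_replicated = objective(weights, with_replica)
```

In this toy instance, replicating c on server 1 raises the gain by more than the added load and replication costs, so the replicated plan scores higher.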
Problem Decomposition
Key Idea: Separate the problem into two simpler problems!
• Partitioning Problem
– Maximize (Collocation Gain) − (Server Load Cost)
– Among all placement plans
• Replication Problem
– Maximize (Collocation Gain) − (Replication Cost)
– Among all replication plans
Online Algorithm
• Current state of system
[Figure: users a to f placed on two servers; users g and h arrive]
• User 𝑔 arrives
– Compare the two possible placements
– Assigned to server 1
– Replicated at server 2
• User ℎ arrives
– Placement trades off collocation gain and load balancing cost
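The arrival steps above can be sketched as a greedy rule. This is an illustration, not the authors' algorithm: the linear load penalty, the replication threshold, the friend weights, and the function names are all assumptions.

```python
def collocation_gain(friend_weights, placement, server):
    """Total weight of the arriving user's friends stored on `server`."""
    return sum(w for f, w in friend_weights.items() if placement.get(f) == server)


def place_user(user, friend_weights, placement, loads, load_penalty=0.5):
    """Pick the server maximizing gain minus a load penalty (assumed linear)."""
    best = max(
        sorted(loads),
        key=lambda s: collocation_gain(friend_weights, placement, s)
        - load_penalty * loads[s],
    )
    placement[user] = best
    loads[best] += 1
    return best


def replica_servers(friend_weights, placement, primary, loads, replica_cost=2.0):
    """Replicate wherever the collocation gain exceeds the (assumed) cost."""
    return [
        s for s in sorted(loads)
        if s != primary
        and collocation_gain(friend_weights, placement, s) > replica_cost
    ]


# Hypothetical current state: a, b, c on server 1; d, e, f on server 2.
placement = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
loads = {1: 3, 2: 3}

g_friends = {"a": 3, "b": 2, "d": 3}
g_primary = place_user("g", g_friends, placement, loads)
g_replicas = replica_servers(g_friends, placement, g_primary, loads)

h_friends = {"c": 1, "e": 1}
h_primary = place_user("h", h_friends, placement, loads)
```

With these made-up weights, g lands on server 1 with a replica on server 2. For h the collocation gain is equal on both servers, so the load term decides and h goes to the less loaded server 2.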
Evaluation
• Data Set
– Random walk on the Facebook data set from the Max Planck Institute for Software Systems
– 6,373 vertices and 183,734 edges; average 28.83 neighbors
• Metrics
– Number of connections
– Number of TCP packets
• Experimental Setup
– 10 / 15 servers
– Consistent storage interface
– Offline algorithm
– 1000 read requests
– Tcpdump and Wireshark for bandwidth analysis
Bandwidth: Indra vs. Random
[Figure: bar chart of number of TCP packets (y-axis, 0 to 50000) against number of servers (x-axis: 10, 15) for Indra and Random; bar labels 46757, 38893, 15538, 12799]
Replication Factor:
– 10 servers: 1.2
– 15 servers: 1.8
Trade-off between replication and connections
Contributions
• Proposed In-Memory Distributed Cache
– Takes advantage of data access relationships
– Retrieves small data items in Online Social Networks
– Uses Dynamic Partition and Replication Algorithm
• Results show
– Factor of 4 decrease in the number of TCP packets
– Can trade off replication for number of connections
Thanks!