SDIMS: A Scalable Distributed Information Management System

Praveen Yalagandula
Mike Dahlin
Laboratory for Advanced Systems Research
(LASR)
University of Texas at Austin
Goal
 A Distributed Operating System Backbone
 Information collection and management
 Core functionality of large distributed systems
 Monitor, query and react to changes in the system
 Examples:
System administration and management
Service placement and location
Sensor monitoring and control
Distributed Denial-of-Service attack detection
File location service
Multicast tree construction
Naming and request routing
…
 Benefits
Ease development of new services
Facilitate deployment
Avoid repetition of same task by different services
Optimize system performance
Contributions – SDIMS
 Provides an important basic building block
 Information collection and management
 Satisfies key requirements
 Scalability
• With both nodes and attributes
• Leverage Distributed Hash Tables (DHT)
 Flexibility
• Enable applications to control the aggregation
• Provide flexible API: install, update and probe
 Autonomy
• Enable administrators to control flow of information
• Build Autonomous DHTs
 Robustness
• Handle failures gracefully
• Perform re-aggregation upon failures
– Lazy (by default) and On-demand (optional)
Outline
 SDIMS: a distributed operating system backbone
 Aggregation abstraction
 Our approach
 Design
 Prototype
 Simulation and experimental results
 Conclusions
Aggregation Abstraction
Astrolabe [VanRenesse et al TOCS’03]
[Figure: example aggregation tree for summation; leaves a=4, b=1, c=2, d=2; level-1 aggregates f(a,b)=5 and f(c,d)=4; root aggregate f(f(a,b), f(c,d))=9; levels labeled A0, A1, A2]
 Attributes
 Information at machines
 Aggregation tree
 Physical nodes are leaves
 Each virtual node represents a logical group of nodes
• Administrative domains, groups within domains, etc.
 Aggregation function, f, for attribute A
 Computes the aggregated value A_i for the level-i subtree
• A_0 = locally stored value at the physical node, or NULL
• A_i = f(A_{i-1}^0, A_{i-1}^1, …, A_{i-1}^k) for a virtual node with k children
 Each virtual node is simulated by one or more machines
 Example: Total users logged in the system
 Attribute: numUsers
 Aggregation Function: Summation
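As a concrete illustration, a minimal sketch in Java of the numUsers summation example; the AggregationFunction interface and the class name are hypothetical, not SDIMS's actual API.

```java
import java.util.List;

/** Hypothetical interface for an aggregation function f (illustrative only). */
interface AggregationFunction<V> {
    V aggregate(List<V> childValues); // combine level-(i-1) values into the level-i value
}

/** Summation aggregator for numUsers: A_i = sum of the children's A_{i-1} values. */
class NumUsersSum implements AggregationFunction<Integer> {
    public Integer aggregate(List<Integer> childValues) {
        int total = 0;
        for (Integer v : childValues) {
            if (v != null) total += v; // a NULL leaf contributes nothing
        }
        return total;
    }
}
```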
Scalability with nodes and attributes
 To be a basic building block, SDIMS should support
 Large number of machines
• Enterprise and global-scale services
• Trend: Large number of small devices
 Applications with a large number of attributes
• Example: File location system
– Each file is an attribute
– Large number of attributes
 Challenges: build aggregation trees in a scalable way
 Build multiple trees
• Single tree for all attributes → load imbalance
 Ensure small number of children per node in the tree
• Reduces maximum node stress
Building aggregation trees: Exploit DHTs
 A DHT can be viewed as multiple aggregation trees
 Distributed Hash Tables (DHTs)
 Each node is assigned an ID from a large space (160-bit)
 For each prefix length k, each node has a link to another node
• Matching in the first k bits
• Differing at bit k+1
 Each key from the ID space is mapped to the node with the closest ID
 Routing for a key: bit correction
• Example: from node 0101 toward key 1001
• 0101 → 1110 → 1000 → 1001
 Lots of different algorithms with different properties
• Pastry, Tapestry, CAN, Chord, SkipNet, etc.
• Load-balancing, robustness, etc.
 A DHT can be viewed as a mesh of trees
 Routes from all nodes to a particular key form a tree
 Different trees for different keys
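To make the bit-correction step concrete, a minimal sketch (hypothetical helper names, fixed-length binary IDs as strings; not any real DHT's API):

```java
import java.util.List;

class BitCorrection {
    /** Number of leading bits on which id and key agree. */
    static int matchingPrefixLength(String id, String key) {
        int i = 0;
        while (i < id.length() && id.charAt(i) == key.charAt(i)) i++;
        return i;
    }

    /** One routing hop: pick any neighbor that extends the matched prefix of
     *  the key by at least one bit; if none exists, this node is the root of
     *  the aggregation tree for that key. */
    static String nextHop(String self, String key, List<String> neighbors) {
        int matched = matchingPrefixLength(self, key);
        for (String n : neighbors) {
            if (matchingPrefixLength(n, key) > matched) return n;
        }
        return self;
    }
}
```

From node 0101 with key 1001, repeated hops correct one bit at a time, reproducing the route 0101 → 1110 → 1000 → 1001 when those neighbors exist.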
DHT trees as aggregation trees
 Key 111
[Figure: the routes from all eight nodes (000, 001, 010, 011, 100, 101, 110, 111) to key 111 form a single tree, climbing through virtual nodes 1xx and 11x to the root 111]
DHT trees as aggregation trees
 Keys 111 and 000
[Figure: the same eight nodes with two overlaid trees: routes to key 000 climb through 0xx and 00x to root 000, while routes to key 111 climb through 1xx and 11x to root 111]
API – Design Goals
 Expose scalable aggregation trees from DHT
 Flexibility: Expose several aggregation mechanisms
 Attributes with different read-to-write ratios
• “CPU load” changes often: a write-dominated attribute
– Aggregate on every write → too much communication cost
• “NumCPUs” changes rarely: a read-dominated attribute
– Aggregate on reads → unnecessary latency
 Spatial and temporal heterogeneity
• Non-uniform and changing read-to-write rates across tree
• Example: a multicast session with changing membership
 Support sparse attributes of the same functionality efficiently
 Examples: file location, multicast, etc.
 Not all nodes are interested in all attributes
Design of Flexible API
 New abstraction: separate attribute type from attribute name
 Attribute = (attribute type, attribute name)
 Example: type=“fileLocation”, name=“fileFoo”
 Install: an aggregation function for a type
 Amortize installation cost across attributes of same type
 Arguments up and down control aggregation on update
 Update: the value of a particular attribute
 Aggregation performed according to up and down
 Aggregation along tree with key=hash(Attribute)
 Probe: for an aggregated value at some level
 If required, aggregation done to produce this result
 Two modes: one-shot and continuous
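A sketch of how these three calls might look as a Java interface; all names, signatures, and the ALL constant are illustrative assumptions, not the real SDIMS API:

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical rendering of the install/update/probe API (illustrative only). */
interface Sdims<V> {
    int ALL = Integer.MAX_VALUE; // stand-in for "all levels of the tree"

    // Install an aggregation function for an attribute *type*; the up/down
    // arguments bound how far each update's re-aggregation propagates.
    // Installation cost is amortized across all attributes of the type.
    void install(String type, Function<List<V>, V> aggregate, int up, int down);

    // Update this node's local value for (type, name); re-aggregation runs
    // along the tree for key = hash(type, name), per the installed up/down policy.
    void update(String type, String name, V localValue);

    // Probe the aggregate at a given tree level; one-shot delivers once,
    // continuous re-delivers whenever the aggregate changes.
    void probe(String type, String name, int level, boolean continuous, Consumer<V> onResult);
}
```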
Scalability and Flexibility
 Install-time aggregation and propagation strategy
 Applications can specify up and down
 Examples
• Update-Local: up=0, down=0
• Update-All: up=ALL, down=ALL
• Update-Up: up=ALL, down=0
 Spatial and temporal heterogeneity
 Applications can exploit continuous mode in probe API
• To propagate aggregate values to only interested nodes
• With expiry time, to propagate for a fixed time
 Also implies scalability with sparse attributes
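Continuing the hypothetical Sdims sketch above, installing the three example strategies might look like this; the attribute-to-strategy pairings are illustrative:

```java
// sdims and the Sdims interface come from the sketch above; sum is a
// Function<List<Integer>, Integer> that adds up child values.
void installExamples(Sdims<Integer> sdims,
                     java.util.function.Function<java.util.List<Integer>, Integer> sum) {
    sdims.install("cpuLoad",  sum, /*up=*/0,         /*down=*/0);         // Update-Local: write-dominated
    sdims.install("numCPUs",  sum, /*up=*/Sdims.ALL, /*down=*/Sdims.ALL); // Update-All: read-dominated
    sdims.install("numUsers", sum, /*up=*/Sdims.ALL, /*down=*/0);         // Update-Up: aggregate up, no push down
}
```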
Autonomy
 Systems spanning multiple administrative domains
 Allow a domain administrator to control information flow
• Prevent an external observer from seeing information inside the domain
• Prevent external failures from affecting operations inside the domain
 Support efficient domain-wise aggregation
 DHT trees might not conform
 Autonomous DHT
 Two properties
• Path Locality
• Path Convergence
 Reach the domain root first
[Figure: eight nodes (000, 001, 010, 011, 100, 101, 110, 111) partitioned into Domain 1, Domain 2, and Domain 3; within each domain, routes converge at a domain root before crossing a domain boundary]
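As intuition for path locality, a minimal sketch of a next-hop rule that prefers in-domain neighbors; this illustrates the idea only (hypothetical helpers; SDIMS's Autonomous DHT achieves these properties by modifying Pastry's routing, not with this exact rule):

```java
import java.util.List;
import java.util.function.Function;

class AdhtRouting {
    static int matchingPrefixLength(String id, String key) {
        int i = 0;
        while (i < id.length() && id.charAt(i) == key.charAt(i)) i++;
        return i;
    }

    /** Among neighbors that make routing progress toward the key, prefer one
     *  in the local domain, so a message reaches the domain root before it
     *  ever leaves the domain; take an external hop only when forced. */
    static String nextHop(String self, String key, String domain,
                          List<String> neighbors, Function<String, String> domainOf) {
        int matched = matchingPrefixLength(self, key);
        String external = null;
        for (String n : neighbors) {
            if (matchingPrefixLength(n, key) > matched) {
                if (domain.equals(domainOf.apply(n))) return n; // stay inside the domain
                if (external == null) external = n;             // remember a fallback
            }
        }
        return external != null ? external : self;
    }
}
```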
Outline
 SDIMS: a distributed operating system backbone
 Aggregation abstraction
 Our approach
Leverage Distributed Hash Tables
Separate attribute type from name
Flexible API
 Prototype and evaluation
 Conclusions
Prototype and Evaluation
 SDIMS prototype
 Built on top of FreePastry [Druschel et al, Rice U.]
 Two layers
• Bottom: Autonomous DHT
• Top: Aggregation Management Layer
 Methodology
 Simulation
• Scalability
• Flexibility
 Prototype
• Micro-benchmarks on real networks
– PlanetLab
– CS Department
Simulation Results - Scalability
 Methodology
 Sparse attributes: multicast sessions with small membership
 Node Stress = amount of incoming and outgoing information
 Two points
 Max node stress is an order of magnitude less than Astrolabe's
 Max node stress decreases as the number of nodes increases
[Figure: log-log plot of Node Stress vs. Number of sessions (1 to 100,000); curves from highest to lowest stress: Astrolabe 65536, Astrolabe 4096, Astrolabe 256, SDIMS 256, SDIMS 4096, SDIMS 65536]
Simulation Results - Flexibility
 Simulation with 4096 nodes
 Attributes with different up and down strategies
[Figure: log-log plot of the number of messages per operation vs. read-to-write ratio (0.0001 to 10,000), comparing Update-All, Update-Up, Update-Local, (up=5, down=0), and (up=ALL, down=5)]
Prototype Results
 CS department: 180 machines (283 SDIMS nodes)
 PlanetLab: 70 machines
[Figure: bar charts of latency in milliseconds for Update-All, Update-Up, and Update-Local operations; the PlanetLab axis spans 0 to 3500 ms and the department-network axis spans 0 to 800 ms]
Conclusions
 SDIMS – a basic building block for large-scale distributed services
 Provides information collection and management
 Our Approach
 Scalability and Flexibility through
• Leveraging DHTs
• Separating attribute type from attribute name
• Providing flexible API: install, update and probe
 Autonomy through
• Building Autonomous DHTs
 Robustness through
• Default lazy re-aggregation
• Optional on-demand re-aggregation
For more information:
http://www.cs.utexas.edu/users/ypraveen/sdims