Consistency in the Cloud

唐宇
2013-11-28
Contents
- Perspective on the CAP theorem
- Making Geo-Replicated Systems Fast as Possible, Consistent when Necessary
- Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS
- Summary
Perspective on the CAP theorem
CAP Theorem
- Consistency: there exists a total order on all operations such that each operation looks as if it were completed at a single instant
- Availability: each request eventually receives a response
- Partition tolerance: the system continues to operate despite arbitrary message loss or failure of part of the system
Proof of CAP
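The proof figure on this slide did not survive extraction. The usual argument (Gilbert and Lynch) can be illustrated with a minimal simulation, assuming a single register replicated on two nodes; the class name `TwoNodeRegister` and the node names `G1`/`G2` are hypothetical.

```python
class TwoNodeRegister:
    """A single register replicated on two nodes, G1 and G2 (hypothetical names)."""

    def __init__(self) -> None:
        self.values = {"G1": 0, "G2": 0}
        self.partitioned = False  # True: all messages between G1 and G2 are lost

    def write(self, node: str, value: int) -> str:
        self.values[node] = value
        if not self.partitioned:
            # Replication succeeds only while messages can be delivered.
            for other in self.values:
                self.values[other] = value
        # Availability: the write must be acknowledged even during a partition.
        return "ack"

    def read(self, node: str) -> int:
        # Availability: a non-failed node must answer using only its local state.
        return self.values[node]


reg = TwoNodeRegister()
reg.partitioned = True
ack = reg.write("G1", 1)   # completes and is acknowledged at G1
stale = reg.read("G2")     # starts after the write finished, yet returns 0
```

The read at G2 starts after the write at G1 was acknowledged, so a linearizable register would have to return 1. Returning 0 breaks C; refusing to answer until the partition heals would break A.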
Theoretical Context
- The C vs. A trade-off is an instance of the broader trade-off between safety and liveness in unreliable systems
  - Safety: something bad will not happen
    - Consistency requirements are typically safety properties
  - Liveness: something good eventually happens
    - Availability is an example of a liveness property
- The relationship between safety and liveness properties has been a long-standing challenge in distributed computing.
- The problem of consensus
The Consensus Problem
- Setting
  - Several processes each start with an initial value
  - All processes must agree on one of these initial values
- Three requirements
  - Agreement: every process must output the same value;
  - Validity: every value output must have been provided as the input for some process;
  - Termination: every process must eventually output a value.
  - Agreement and validity are safety properties, while termination is a liveness property.
- Known result:
  - In the case of consensus, safety and liveness cannot both be guaranteed if the system is even potentially slightly faulty (e.g., the FLP result: in an asynchronous system, a single crash failure is enough)
Practical Implications
- Since the C vs. A trade-off is unavoidable, how should distributed systems be designed?
- Best-effort availability
  - Chubby
- Best-effort consistency
  - Web caches
- Balancing consistency and availability
  - The TACT (Tunable Availability and Consistency Trade-offs) toolkit
- Segmenting consistency and availability
  - Payments vs. shopping cart
Making Geo-Replicated Systems Fast as Possible, Consistent when Necessary
Motivation
- Geo-replication
  - Internet users are globally distributed
  - Applications replicate data across datacenters
  - This reduces network latency for users
- The tension:
  - Keeping sites consistent requires cross-site coordination, which adds latency
  - The problem is magnified by WAN latency
- Observation:
  - Strong consistency is not always required: it depends on the application
- Goal:
  - RedBlue consistency: mixing strong consistency (for application semantics) and eventual consistency (for fast responses) in the same system
Red and Blue
- Blue operations: the order of execution can vary from site to site
- Red operations: must be executed in the same order at all sites
RedBlue Consistency
- RedBlue order O
  - Red operations must be totally ordered
  - The order of Blue operations can vary from site to site
- Causal serialization
  - A site i has a causal serialization O_i of the RedBlue order O if
    a) the ordering O_i is a linear extension of the RedBlue order O, and
    b) for any two operations u, v, if site(v) = i and u < v in O_i, then u < v in O
- RedBlue consistency
  - Each site applies operations according to a causal serialization of the RedBlue order
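Conditions a) and b) can be checked mechanically. The sketch below assumes operations are plain strings, that O is given as a transitively closed set of ordered pairs, and that every operation appears in the site's sequence; the function names are hypothetical.

```python
from itertools import combinations


def is_redblue_order(order: set[tuple[str, str]], red: set[str]) -> bool:
    """Every pair of Red operations must be ordered one way or the other in O."""
    return all((u, v) in order or (v, u) in order
               for u, v in combinations(sorted(red), 2))


def is_causal_serialization(seq: list[str], order: set[tuple[str, str]],
                            site_of: dict[str, int], i: int) -> bool:
    """Check conditions a) and b) for the sequence O_i applied at site i.

    `order` is the RedBlue order O as a transitively closed set of (u, v)
    pairs meaning "u precedes v".
    """
    pos = {op: k for k, op in enumerate(seq)}
    # a) O_i is a linear extension of O: it never reverses a pair of O.
    if any(pos[u] > pos[v] for u, v in order):
        return False
    # b) Whatever site i executed before one of its own operations v
    #    must precede v in O as well.
    for u, v in combinations(seq, 2):  # u comes before v in seq
        if site_of[v] == i and (u, v) not in order:
            return False
    return True


site_of = {"r1": 0, "r2": 1, "b1": 0, "b2": 1}
order = {("r1", "r2"), ("r1", "b1")}
red = {"r1", "r2"}

ok_order = is_redblue_order(order, red)
good = is_causal_serialization(["r1", "b1", "b2", "r2"], order, site_of, 0)
bad = is_causal_serialization(["b2", "r1", "b1", "r2"], order, site_of, 0)
```

The second sequence fails b): site 0 would have executed `b2` before issuing its own `r1`, so `r1` would have to be ordered after `b2` in O, which it is not.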
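State convergence
- To guarantee availability/performance, the operations run as Blue
  - Deposit
  - AccrueInterest
- Execution order affects the result
  - The sites fail to reach state convergence
- Cause
  - The two operations do not commute
- Solution
  - Propagate the result of executing an operation rather than the operation itself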
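Concretely, with hypothetical numbers (a balance of 10,000 cents, a 2,000-cent deposit, and 5% interest computed in integer cents), the two execution orders give different balances:

```python
def deposit(balance: int, amount: int) -> int:
    return balance + amount


def accrue_interest(balance: int, percent: int) -> int:
    # Interest depends on the current balance, so it does not commute with deposit.
    return balance + balance * percent // 100


start = 10_000  # balance in cents (hypothetical starting state)
site_a = accrue_interest(deposit(start, 2_000), 5)  # deposit first
site_b = deposit(accrue_interest(start, 5), 2_000)  # interest first
```

Site A ends at 12,600 and site B at 12,500, so executing the original operations as Blue does not converge.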
Generator & Shadow operations
- Many operations do not commute directly, e.g., Deposit and AccrueInterest
- Decompose each operation into a generator operation and a shadow operation
- Generator operation
  - Executed only at the primary site, against a system state
  - Produces no side effects
  - Determines the state transitions that would occur
  - Produces shadow operations
- Shadow operation
  - Applies the state transitions at all sites, including the primary site
  - Must produce the same effects as the original operation, given the original state against which the generator operation ran
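Applied to the Deposit/AccrueInterest example, a minimal sketch (assuming the deposit is issued at site A and the interest at site B, both starting from 10,000; all names are hypothetical) turns both operations into commuting additions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepositShadow:
    """Shadow operation: add a fixed amount. Additions commute, so it can be Blue."""
    amount: int

    def apply(self, balance: int) -> int:
        return balance + self.amount


def gen_deposit(balance: int, amount: int) -> DepositShadow:
    return DepositShadow(amount)  # the current state is not needed


def gen_accrue_interest(balance: int, percent: int) -> DepositShadow:
    # Reads the primary site's state, has no side effects, and fixes the
    # interest as a concrete amount to be added at every site.
    return DepositShadow(balance * percent // 100)


shadow_1 = gen_deposit(10_000, 2_000)      # generated at site A
shadow_2 = gen_accrue_interest(10_000, 5)  # generated at site B: +500

final_a = shadow_2.apply(shadow_1.apply(10_000))  # A applies its own shadow first
final_b = shadow_1.apply(shadow_2.apply(10_000))  # B applies its own shadow first
```

Both sites converge to 12,500: the interest shadow adds the 500 computed from the state its generator saw, instead of recomputing 5% of whatever balance a site happens to hold.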
An example
Converged but Invalid
- Preventing this requires real-time communication between sites
  - The operation must be Red
Red or Blue?
Correct Result
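A hypothetical withdrawal shows how a commutative shadow operation can converge yet break an invariant (here, balance ≥ 0). The sketch assumes two concurrent withdrawals of 70 from a balance of 100, with the generator checking the invariant only against the state it reads; `gen_withdraw` is a hypothetical name.

```python
def gen_withdraw(balance: int, amount: int) -> int:
    """Generator: checks the invariant balance >= 0 against the state it sees
    and returns the shadow operation as a delta (0 means a no-op shadow)."""
    return -amount if balance >= amount else 0


start = 100

# Blue: both sites run the generator concurrently against the same state,
# then every site applies both deltas (in any order, since + commutes).
d1 = gen_withdraw(start, 70)  # site A
d2 = gen_withdraw(start, 70)  # site B
blue_final = start + d1 + d2  # identical at every site, but negative

# Red: withdrawals are serialized, so the second generator sees the
# effect of the first shadow before checking the invariant.
after_first = start + gen_withdraw(start, 70)
red_final = after_first + gen_withdraw(after_first, 70)
```

With Blue labeling every site ends at -40, which is converged but invalid. Serializing the withdrawals as Red makes the second generator see a balance of 30 and emit a no-op, which is why shadow operations that may violate an invariant are labeled Red.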
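Design & Implementation
(Architecture figure: a local site and a remote site, each running a proxy, a coordinator, a data writer, and a single MySQL node as the storage engine.)
- A client sends its request to the proxy at the local site
- Proxy: produces the shadow operation(s)
- Coordinator: admits or rejects the operation according to RedBlue consistency, and propagates the admitted operation(s) to the coordinator at each remote site
- Data writer: writes the admitted shadow operations to the storage engine (a single MySQL node)
- Note: when a shadow operation is rejected, the proxy server re-executes the generator operation and restarts the process.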
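The reject-and-retry loop in the note behaves like optimistic concurrency control. The sketch below replaces Gemini's actual admission test, which follows RedBlue consistency, with a simple version check; `Coordinator`, `execute`, and the 5% interest generator are all hypothetical.

```python
from typing import Callable, Optional

Shadow = Callable[[int], int]


class Coordinator:
    """Hypothetical stand-in for admission: a shadow op is admitted only if
    the state its generator read is still the current state."""

    def __init__(self, state: int) -> None:
        self.state = state
        self.version = 0  # bumped on every admitted shadow operation

    def snapshot(self) -> tuple[int, int]:
        return self.state, self.version

    def try_admit(self, shadow: Shadow, seen_version: int) -> bool:
        if seen_version != self.version:
            return False  # reject: the state changed after the generator ran
        self.state = shadow(self.state)
        self.version += 1
        return True


def execute(coord: Coordinator, generator: Callable[[int], Shadow],
            before_admit: Optional[Callable[[int], None]] = None,
            max_attempts: int = 5) -> int:
    """Proxy loop: run the generator, submit its shadow, and on rejection
    re-execute the generator. Returns the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        state, version = coord.snapshot()
        shadow = generator(state)
        if before_admit is not None:
            before_admit(attempt)  # lets the demo inject a concurrent update
        if coord.try_admit(shadow, version):
            return attempt
    raise RuntimeError("operation was rejected on every attempt")


def accrue_interest_generator(balance: int) -> Shadow:
    interest = balance * 5 // 100
    return lambda b: b + interest


coord = Coordinator(10_000)


def concurrent_deposit(attempt: int) -> None:
    if attempt == 1:  # another client's deposit is admitted first
        coord.try_admit(lambda b: b + 2_000, coord.version)


attempts = execute(coord, accrue_interest_generator, concurrent_deposit)
```

The first attempt is rejected because a deposit was admitted after the generator read 10,000; the proxy re-runs the generator on 12,000 and the second attempt is admitted, leaving 12,600.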
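Optimistic concurrency control
- Timestamp: a logical clock of the form <<b0; b1; …; bk-1>, r>
  - bi is the local count of shadow operations initially executed by site i;
  - r is the global count of Red shadow operations.
- A token-passing scheme guarantees a global total order of Red operations
  - Sites take turns holding a single, globally unique token; only the site holding the token may increment the global count r. Each site holds the token for 1 s.
  - When a Blue operation completes, bi is incremented
  - When a Red operation completes, both bi and r are incremented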
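The update rules can be sketched as follows, assuming a round-robin ring for handing off the token and leaving out how sites merge each other's counters when remote shadow operations arrive (the slide does not describe it). `Token`, `SiteClock`, and `pass_token` are hypothetical names, and the 1 s holding period is not simulated.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """The single global token; it carries r, the global count of Red shadow ops."""
    holder: int = 0
    r: int = 0


class SiteClock:
    """Site i's view of the timestamp <<b0; ...; bk-1>, r>."""

    def __init__(self, site_id: int, num_sites: int) -> None:
        self.i = site_id
        self.b = [0] * num_sites
        self.r = 0

    def timestamp(self) -> tuple[tuple[int, ...], int]:
        return tuple(self.b), self.r

    def complete_blue(self) -> tuple[tuple[int, ...], int]:
        self.b[self.i] += 1  # Blue: increment b_i only
        return self.timestamp()

    def complete_red(self, token: Token) -> tuple[tuple[int, ...], int]:
        if token.holder != self.i:
            raise RuntimeError("a Red operation needs the token")
        self.b[self.i] += 1  # Red: increment b_i ...
        token.r += 1         # ... and the global Red count r
        self.r = token.r
        return self.timestamp()


def pass_token(token: Token, num_sites: int) -> None:
    # Round-robin hand-off (assumption); the slide says each site keeps it for 1 s.
    token.holder = (token.holder + 1) % num_sites


token = Token()
s0, s1 = SiteClock(0, 2), SiteClock(1, 2)
t1 = s0.complete_red(token)  # ((1, 0), 1)
t2 = s0.complete_blue()      # ((2, 0), 1)
pass_token(token, 2)
t3 = s1.complete_red(token)  # ((0, 1), 2)
```

Because r only advances at the token holder, Red operations receive consecutive r values and therefore a single global order.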
Experimental Setup
- Amazon EC2, using extra-large virtual machine instances located in five sites:
  - US east (UE)
  - US west (UW)
  - Ireland (IE)
  - Brazil (BR)
  - Singapore (SG)
User-observed latency
- Each user issues a single outstanding request at a time
Case Studies
- TPC-W models an online bookstore
- RUBiS emulates an online auction website modeled after eBay
- Quoddy is an open-source, Facebook-like social networking site
Single workload
Proportion of Blue and Red operations
                        Originally                   With shadow ops
App      Workload       Read-only (%)   Update (%)   Blue (%)   Red (%)
TPC-W    Shopping mix   85              15           99.2       0.8
TPC-W    Browsing mix   96              4            99.5       0.5
TPC-W    Ordering mix   63              37           93.6       6.4
RUBiS    Bidding mix    85              15           97.4       2.6
Quoddy   Mix            85              15           100        0
Mixed workload
Summary
- RedBlue consistency combines strong and eventual consistency in a single system
- Decomposing operations into generator and shadow operations expands the space of possible Blue operations
- A simple labeling rule is provably state-convergent and invariant-preserving
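Discussion
- Requires substantial manual effort
  - Decomposing operations into generator and shadow operations
  - Labeling shadow operations as Blue or Red
- Limitations of the system design
  - No fault-tolerance mechanism
  - Round-robin token scheme
- Open question
  - How are causal relationships between different sites obtained in real time?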
Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS
Wide-Area Storage
Desired Properties: ALPS
- Availability
  - All operations issued to the data store complete successfully
- Low Latency
  - Client operations complete “quickly.”
- Partition Tolerance
  - The data store continues to operate under network partitions
- Scalability
  - The data store scales out linearly
Consistency with ALPS
- Having chosen A+P, the system cannot also have C (linearizability)
- Sequential consistency also conflicts with AP
- Causal+ can be achieved together with AP
  - Causal+ = causal consistency + convergent conflict handling
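Causal
Conflicts in Causal
(Figure: with causal consistency alone, concurrent writes to the same key can leave replicas divergent, e.g., one replica holding V=3 and another V=4.)
Causal + Conflict Handling
(Figure: with convergent conflict handling, all replicas converge to the same value, V=4.)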
Earlier causal+ systems
- Bayou (1994), TACT (2000), PRACTI (2006)
Limited scalability
- All data must fit on a single machine (Bayou)
- The set of keys over which causal+ consistency is provided is still limited to what a single machine can handle (PRACTI)
Causal in COPS
- Causality is determined from
  - Versions
  - Dependencies
- Versions: different values of the same key
  - Used to reason about different values of a key
  - Each replica returns non-decreasing versions of a key
- Dependencies: relationships between multiple operations
  - y_j depends on x_i if and only if put(x_i) → put(y_j)
  - A version is written only after all of its dependencies have been written
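The rule "write a version only after all of its dependencies" is what a remote replica enforces when replicated puts arrive out of order. The sketch below is a simplified, single-process illustration: it assumes integer per-key versions and a hypothetical `Replica` class that buffers a put until every (key, version) dependency is visible locally.

```python
class Replica:
    """A remote replica that applies a replicated put only after its
    dependencies are visible locally (hypothetical, simplified)."""

    def __init__(self) -> None:
        self.store: dict[str, tuple[int, str]] = {}  # key -> (version, value)
        self.pending: list[tuple[str, str, int, tuple[tuple[str, int], ...]]] = []

    def dep_satisfied(self, key: str, version: int) -> bool:
        return key in self.store and self.store[key][0] >= version

    def receive(self, key: str, value: str, version: int,
                deps: tuple[tuple[str, int], ...]) -> None:
        self.pending.append((key, value, version, deps))
        progress = True
        while progress:  # applying one put may unblock others
            progress = False
            for put in list(self.pending):
                k, val, ver, ds = put
                if all(self.dep_satisfied(dk, dv) for dk, dv in ds):
                    if k not in self.store or self.store[k][0] < ver:
                        self.store[k] = (ver, val)  # versions never go backwards
                    self.pending.remove(put)
                    progress = True


r = Replica()
# y (version 1) was written after its client read x version 1, so it depends on x.
r.receive("y", "photo", 1, (("x", 1),))
y_visible_early = "y" in r.store  # x has not arrived yet
r.receive("x", "acl-updated", 1, ())
```

Until `x` arrives, `y` stays buffered, so no reader at this replica can see the photo without also seeing the updated ACL.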
The COPS family
- Standard COPS
- COPS-GT (get transactions)
  - Provides a superset of COPS’ functionality by also introducing support for get transactions
- COPS-CD (conflict detection)
  - COPS with conflict detection
Get operation
Put operation
COPS architecture
- Returns a consistent view of multiple key-value pairs in a single call
- Gets and puts are linearizable across the nodes in the cluster
- Operations between clusters occur asynchronously
Causal dependency
- Dependency checks use only the nearest dependencies
- Only get_trans operations use all dependencies
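Checking only the nearest dependencies is enough because dependencies are transitive: if `y1` is visible, everything `y1` depends on is already visible too. A minimal sketch computing both sets from a hypothetical graph of direct dependencies:

```python
def all_deps(version: str, direct: dict[str, set[str]]) -> set[str]:
    """Every version that `version` transitively depends on."""
    seen: set[str] = set()
    stack = list(direct.get(version, ()))
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.add(d)
            stack.extend(direct.get(d, ()))
    return seen


def nearest_deps(version: str, direct: dict[str, set[str]]) -> set[str]:
    """Dependencies that are not already implied by another dependency."""
    deps = all_deps(version, direct)
    implied: set[str] = set()
    for d in deps:
        implied |= all_deps(d, direct)
    return deps - implied


# x1 <- y1 <- z1: y1 was written after reading x1, z1 after reading y1.
direct = {"y1": {"x1"}, "z1": {"y1", "x1"}}
z_all = all_deps("z1", direct)          # {"x1", "y1"}
z_nearest = nearest_deps("z1", direct)  # {"y1"}
```

A replica applying `z1` therefore only needs to check `y1`, while a get transaction that must reason about every causally earlier version uses the full set.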
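Get_trans in COPS-GT
- Guarantees a consistent view
- Tries to return the latest versions
- Steps:
  - Read all requested keys once
  - Update the client's version information
  - Check whether the versions are mutually consistent
  - If not, re-read the inconsistent keys (by version)
  - Update the client context
  - Return the results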
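The two-round structure can be sketched as below. It assumes each read returns `(version, value, all_dependencies)`, that versions of a key are comparable integers, and that any version named by a dependency can still be fetched by version; `get_latest` and `get_by_version` are hypothetical callbacks, and updating the client context is omitted.

```python
from typing import Callable

Version = int
Deps = dict[str, Version]
Read = tuple[Version, str, Deps]  # (version, value, all dependencies)


def get_trans(keys: list[str],
              get_latest: Callable[[str], Read],
              get_by_version: Callable[[str, Version], Read]) -> dict[str, str]:
    # Round 1: read each key (these reads may interleave with concurrent writes).
    results = {k: get_latest(k) for k in keys}

    # Causally correct version of each requested key: the newest version
    # that any round-1 result depends on.
    ccv = {k: results[k][0] for k in keys}
    for _version, _value, deps in results.values():
        for dep_key, dep_version in deps.items():
            if dep_key in ccv and dep_version > ccv[dep_key]:
                ccv[dep_key] = dep_version

    # Round 2: re-read, by version, only the keys whose round-1 result was too old.
    for k in keys:
        if ccv[k] > results[k][0]:
            results[k] = get_by_version(k, ccv[k])

    return {k: results[k][1] for k in keys}


# Hypothetical data: the photo was posted after the ACL was updated to version 2.
store = {
    "acl": {1: ("friends+boss", {}), 2: ("friends", {})},
    "photo": {1: ("party.jpg", {"acl": 2})},
}


def stale_get_latest(key: str) -> Read:
    # Round 1 happens to read the ACL before its update, the photo after.
    value, deps = store[key][1]
    return 1, value, deps


def get_by_version(key: str, version: Version) -> Read:
    value, deps = store[key][version]
    return version, value, deps


view = get_trans(["acl", "photo"], stale_get_latest, get_by_version)
```

Because the photo depends on ACL version 2, the second round refetches that version, so the caller never sees the new photo together with the old ACL.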
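Garbage Collection Subsystem
- Version garbage collection
  - COPS-GT only
  - An old version is collected once it has not been read within a time threshold
- Dependency garbage collection
  - COPS-GT only
  - Dependencies are collected once the data has been replicated to all datacenters for some time
- Client metadata garbage collection
  - COPS + COPS-GT
  - Client metadata is collected once the new data has been replicated to all datacenters (signaled by a flag returned by get)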
Conflict Detection
- The default COPS system avoids conflict detection by using a last-writer-wins strategy
  - The “last writer” is determined by comparing version numbers
- COPS with conflict detection (COPS-CD)
  - The put operation takes an additional prev argument: the version number currently visible to the writer
  - Detecting a conflict
    - prev ≠ curr if and only if new and curr conflict, where curr is the version currently stored for the key
  - COPS-CD has an application-specified convergent conflict handler that is invoked when a conflict is detected.
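A minimal sketch contrasting the two strategies on two concurrent writes, assuming versions are `(Lamport clock, node id)` pairs so they are totally ordered. The handler here, which merges the two values, is a hypothetical example of a convergent handler; reusing the larger version for the merged write is a simplification.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Version = tuple[int, int]  # (Lamport clock, node id): a total order on versions


@dataclass(frozen=True)
class Write:
    version: Version
    value: str
    prev: Optional[Version] = None  # COPS-CD: version visible to the writer


def last_writer_wins(curr: Optional[Write], new: Write) -> Write:
    """Default COPS behaviour: keep whichever write has the larger version."""
    if curr is None or new.version > curr.version:
        return new
    return curr


def apply_with_detection(curr: Optional[Write], new: Write,
                         handler: Callable[[Write, Write], Write]) -> Write:
    """COPS-CD style: new conflicts with curr iff new.prev != curr.version."""
    if curr is not None and new.prev != curr.version:
        return handler(curr, new)
    return new


def merge_handler(a: Write, b: Write) -> Write:
    # Convergent: the result does not depend on the argument order.
    return Write(max(a.version, b.version), "|".join(sorted({a.value, b.value})))


w0 = Write((1, 0), "init")
wa = Write((2, 1), "a", prev=(1, 0))  # written at replica A ...
wb = Write((2, 2), "b", prev=(1, 0))  # ... concurrently with this one at B

lww_a = last_writer_wins(last_writer_wins(w0, wa), wb)
lww_b = last_writer_wins(last_writer_wins(w0, wb), wa)

cd_a = apply_with_detection(apply_with_detection(w0, wa, merge_handler), wb, merge_handler)
cd_b = apply_with_detection(apply_with_detection(w0, wb, merge_handler), wa, merge_handler)
```

Both strategies converge regardless of arrival order, but last-writer-wins silently drops `"a"`, while the conflict handler sees both writes and keeps `"a|b"`.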
Microbenchmarks
- Put_after(x) denotes a put with x dependencies
Experiment config
Dynamic Workloads (1)
Dynamic Workloads (2)
Scalability
- LOG mimics systems based on log serialization and exchange, which can only provide causal+ consistency with single-node replicas.
Summary
Two systems
- Gemini
  - RedBlue consistency: Linearizability + Eventual consistency
- COPS
  - Causal+
- Characteristics
  - Correct operation execution
  - High performance (relatively)
  - Good scalability