PROGRAMMING SUPPORT AND ADAPTIVE CHECKPOINTING FOR HIGH-THROUGHPUT DATA SERVICES WITH LOG-BASED RECOVERY

advertisement
PROGRAMMING SUPPORT AND
ADAPTIVE CHECKPOINTING FOR
HIGH-THROUGHPUT DATA SERVICES
WITH LOG-BASED RECOVERY
Jingyu Zhou
Shanghai Jiao Tong Univ
Jiesheng Wu
Microsoft
DSN 2010
Caijie Zhang
Google Inc.
Hong Tang
Yahoo!
Tao Yang
UC Santa Barbara
Backgrounds

Many large-scale data-mining and offline
applications in Google, Yahoo, Microsoft, Ask.com,
etc. require
 High
data parallelism/throughput
 Data persistence. But not so stringent availability

E.g., URL property service (UPS) at Ask.com search
offline mining platform
 Hundreds
of app. modules access UPS
Examples of high-throughput data services for web
mining/search
Internet
Web documents
Data/ info
service
Crawler
Crawler
Crawler
Document
DB
Document
DB
Document
DB
10-50 billion URLs
Data/info
service
Data mining
Data mining
job
job
…
Data mining
job
e.g. URL property
service. 100K-500K/s
Existing Approaches for Highperformance and Persistence

Database systems
 suffer
from high overhead, limits its performance while
supporting general features
 Need more machine resources

Related work and well-known techniques for high
availability
 Data
replication
 Log-based recovery
 Checkpointing
Challenges and Focus of this work

System design with careful selection and integration
of fault-tolerant techniques for high throughput
computing.
 Trade
off in availability, but allow some down time.
 Low cost: logging/checkpoint.
 Fine-grain
for minimum service disruption.
 Local data recovery. Periodic remote backup.

Programming support
 Lightweight,
services.
simplifying construction of robust data
SLACH: Selective Logging & Adaptive
CHeckpointing

Targeted data services
 Request-driven
thread model.
 In-memory objects.
 Data independence.
 Similar
to key-value stores in BigTable/Dynamo, but higher
throughput.
Architecture of SLACH
Main Techniques

Selective operation logging

Only log write operations



(oid, op_type, parameters, timestamp)
Write-ahead log, i.e., write then apply operations
Object-level checkpoint to avoid service disruptions
with adaptive load control
Ckpt objects one-by-one. Still allow concurrent access of
other objects
 Perform checkpointing when load is low to amortize cost of
checkpointing


Light weight API while supporting legacy code.
Object-level Checkpoints
Adaptive Checkpointing Control

Goal is to balance ckpt. cost and recovery speed
 Ckpt.
less frequently-> larger logs -> lengthy recovery
 Ckpt. too often -> higher overhead

Ideally,
 High
server load -> ckpt. less frequently
 Low server load -> ckpt. more frequently

Adjust between a Low Watermark (LW) & High
Watermark (HW) of service loads
 Loadcurr
= α×loadprev+(1-α)×sample
Adaptive Checkpointing Frequency

Ckpt. threshold between LB and UB
 LB,
UB are log size parameters, determined by app.
Threshold  LB  F(load)  (UB  LB),
where

SLACH Programming Support

Application developers
Call SLACH function log() to log an object operation
 Define 3 callback functions:

1) what to checkpoint (call SLACH’s ckpt() for each selected
object,
 2) recover one object from a checkpoint,
 3) replay a log operation.


SLACH
Provide functions log() and ckpt().
 Call user’s checkpoint callback fun during checkpoint.
 Call a user’s recover function during checkpoint recover.
 Call a user’s replay function when recovering from a log.

SLACH API for Applications
class SLACH::API {
public:
/* register ckpt. policy and parameters */
void register_policy(const Policy& p);
/* log one write operation */
void log(int64_t obj_id, int op, ...);
/* checkpoint one object */
void ckpt(int64_t obj_id, const void* addr, uint32_t size);
};
SLACH Interface
class SLACH::Application {
…
protected:
/* application checkpoint callback function */
virtual void ckpt_callback()=0;
/* callback of loading one object checkpoint*/
virtual void load_one_callback(int64_t obj_id, const void *addr,uint32_t
size)=0;
/* callback of replaying one operation log */
virtual void replay_one_callback(int64_t obj_id, int op, const para_vec&
args)=0;
};
An Example: Application-level code
struct Item {
double price;
int quantity;
};
class MyService : public SLACH::Application {
private:
Item obj[1000];
SLACH::API slach_; /* SLACH API */
static const int OP_PRICE=0;/* an op type */
public:
void update_price(int id, double p) {
slach_.log(id, OP_PRICE, &p, sizeof(p));
obj[id].price = p;
}
Application objects
being accessed
Log selected object
update operation
An Example: Call-back functions
SLACH calls this user function
during checkpointing.
void ckpt_callback() {
for (int i=0; i<1000 ; i++)
slach_.ckpt(i, &obj[i], sizeof(obj[i]));
}
SLACH calls this when
recovering an object from a
checkpoint .
void load_one_callback(int64_t id, const void
*p, uint32_t size) {
memcpy(&obj[id], p, size);
}
SLACH calls this when
recovering an object by log
replaying
void replay_one_callback(int64_t id, int op,
const para_vec& args) {
switch (op) {
case OP_PRICE:
obj[id].price = *(double*)args[0].second;
break;
// ...
}
SLACH Implementation and
Applications


Part of Ask.com middleware infrastructure in C++ for
data mining and search offline platform
Application samples:
UPS (URL property service) for recording property of all
URLs crawled/collected.
 HIS (Host information service) for recording property of all
hosts crawled on the web.
 20-80% of write traffic. Running on a cluster of hundreds of
machines. In production for last 3 years.


Significantly reduced development time (1-2 months vs.
few days).
Characteristics of UPS/HIS


Perfor. characteristics of UPS/HIS per partition.
Data
Max. Read
Max. Write
UPS
1.9GB
110K Req/s
56K Req/s
HIS
2.1GB
58K Req/s
16K Req/s
Parameters for adaptive ckpt. Control
UPS
HIS
0.8
α
Moving avg.
0.8
LB/UB
low/upper b.
1M-8M entries 0.3M-1.8M
LW/HW
L/H
watermark
20%-85%
35%-85%
β
Scaling
3
6
w
Sampling win.
5s
5s
Evaluation




Impact of logging overhead
System behavior during checkpointing
Effectiveness of adaptive checkpoint control
Performance comparison of hash table
implementation using SLACH and BerkeleyDB

Evaluation Setting

Benchmarks
 UPS
(URL property service)
 HIS (Host-level property service)
 Persistent Hash Table (PHT)

Metric: throughput loss percent
SuccessfulRe quests
LossPercent  (1
) 100
TotalRe quests

Hardware: a 15 node cluster, gigabit link
Selective Logging Overhead of UPS
• Base: logging is
disabled
• Log: selective
logging is enabled
Negligible impact when
server load < 40%.
System Performance During
Checkpointing (100% server load)
During ckpt, 8.9%
throughput drop
During ckpt, 57.6%
increase of response
time
Effectiveness of Adaptive Threshold Controller –
Performance Comparison in UPS
• Fixed threshold policy, 8M has lower runtime overhead – less frequent ckpt
• Adaptive approach has comparable performance as fixed policy of 8M.
Effectiveness of Threshold Controller –
Recovery Speed
• Fixed threshold -> fixed log size -> same recovery time
• Adaptive approach: small log for light load (less recovery time), large log for higher
load (more recovery time)
SLACH is better for all value sizes, because
1. BDB incurs more per-operation
overhead
2. BDB involves more disk I/Os
PHT vs. Berkeley DB
30-B value, SLACH is 5.3 times higher
SLACH ckpt has less overhead
1. BDB ckpt is not async
2. SLACH fuzzy ckpt still allow access
Conclusions

SLACH contributions

A lightweight programming framework for very highthroughput, persistent data services
Simplify application construction while meeting reliability
demands
 Selective logging to enhance performance


System design with careful integration of multiple techniques
Dynamic adjust ckpt. frequency to meet throughput demands
 Fine-grained ckpt without service disruptions


Evaluation of integrated scheme in production
applications.
Data and Failure Models

Data independence and object-oriented access
model
 Key-value
store as in Dynamo/BigTable, but with much
higher throughput demand per machine
 Each object is a continuous memory block
 Middleware

infrastructure can handle noncontiguous ones
Fail-stop
 Focus
on local recovery due to app. failures
 OS/Hardware failure can be dealt with remote ckpt.
 Implemented,
but not the scope of this paper
Download