Shared Scan Batch Scheduling in Cloud Computing

Xiaodan Wang
Randal Burns
Johns Hopkins University
Chris Olston
Anish Das Sarma
Yahoo! Research
Project Goals
Eliminate redundant data processing for concurrent workflows that access the same dataset in the Cloud
Batch MapReduce workflows to enable scan sharing
– Single pass scan of shared data segments
– Alleviate contention and improve scalability
– Utilize fewer map/reduce slots under load
Data-intensive workloads (tens of minutes to hours)
– Joins across multiple datasets
– User-specified rewards for early completion
Trade-offs between efficient resource utilization and deadlines
Data-Driven Batch Scheduling




[Figure: data access by query. Q1 reads R1, R2, R3; Q2 reads R2, R3, R4; Q3 reads R1, R2. Queries go through decomposition into sub-queries, the batch scheduler co-schedules them by sub-query against the Turbulence DB, and query results are returned.]
Throughput scales with contention (Astro. & Turbulence)
Decompose into sub-queries based on data access
Co-schedule sub-queries to amortize I/O
Evaluate data atoms based on utility metric
– Reordering based on contention vs. arrival order (CIDR’09)
– Adaptive starvation resistance
– Job-aware (queries with data dependency) (SC’10)
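The decomposition and co-scheduling steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scheduler: the query-to-region map comes from the figure on this slide, while the function names and the use of contention (number of waiting sub-queries) as the utility metric are assumptions.

```python
from collections import defaultdict

# Data regions touched by each query, as listed in the figure on this slide.
QUERIES = {
    "Q1": ["R1", "R2", "R3"],
    "Q2": ["R2", "R3", "R4"],
    "Q3": ["R1", "R2"],
}


def decompose(queries):
    """Split each query into per-region sub-queries and group them by region."""
    by_region = defaultdict(list)
    for query, regions in queries.items():
        for region in regions:
            by_region[region].append(query)
    return dict(by_region)


def batch_schedule(queries):
    """Order regions by contention, highest first (assumed utility metric).

    Ties are broken by region name so the order is deterministic.
    """
    by_region = decompose(queries)
    order = sorted(by_region, key=lambda r: (-len(by_region[r]), r))
    return [(region, by_region[region]) for region in order]
```

Calling `batch_schedule(QUERIES)` yields one read per region (four reads) that together serve all eight sub-queries, with the most contended region, R2, read first.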
Application in Cloud Computing

Fixed Cloud (fixed resources)
– Single pass scan of shared data
– Alleviate contention (utilize fewer map/reduce slots, shared loading and shuffling of data)
– Earn rewards for early completion (soft deadlines)
– Local improvement w/ simulated annealing, greedy ordering
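A rough sketch of greedy ordering followed by simulated-annealing improvement on fixed resources. The job table, the earliest-deadline greedy rule, the swap move, and the cooling parameters are all illustrative assumptions; the slide names the techniques but gives no details.

```python
import math
import random

# Hypothetical jobs: name -> (runtime, [(deadline, points), ...]).
JOBS = {
    "A": (4, [(4, 10), (8, 5)]),
    "B": (2, [(2, 8), (10, 2)]),
    "C": (3, [(6, 6)]),
}


def reward(finish, steps):
    """Points for the first soft deadline that is met, or 0 if none is."""
    for deadline, points in steps:
        if finish <= deadline:
            return points
    return 0


def total_reward(order, jobs):
    """Run jobs back to back and add up the points each one earns."""
    clock, total = 0, 0
    for name in order:
        runtime, steps = jobs[name]
        clock += runtime
        total += reward(clock, steps)
    return total


def greedy_order(jobs):
    """Greedy ordering (assumed rule): earliest first deadline runs first."""
    return sorted(jobs, key=lambda name: (jobs[name][1][0][0], name))


def anneal(order, jobs, steps=500, temp=5.0, cooling=0.99, seed=0):
    """Swap two jobs per step; accept worse orders with decaying probability."""
    rng = random.Random(seed)
    current, best = list(order), list(order)
    for _ in range(steps):
        i, j = rng.sample(range(len(current)), 2)
        candidate = list(current)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        delta = total_reward(candidate, jobs) - total_reward(current, jobs)
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
            if total_reward(current, jobs) > total_reward(best, jobs):
                best = list(current)
        temp = max(temp * cooling, 1e-9)  # keep the temperature positive
    return best
```

With the sample jobs, `greedy_order(JOBS)` gives B, A, C for 13 points. `anneal` starts from that order and returns the best order it visited, so its result is never worse than the greedy one.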
Elastic Cloud
– Machine charge = (# of machines) x (# hours)
– Speed-up factors w/ more machines (i.e. more parallelism)
– Add machines to meet soft deadlines
– Aggressive batching to minimize machine charge (efficiency)
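The elastic-cloud trade-off can be made concrete with a small calculation. The charge formula is the one on the slide; the speed-up model (a serial part plus a part that divides evenly across machines) and all names and numbers below are assumptions for illustration.

```python
def runtime_hours(serial_hours, parallel_hours, machines):
    """Assumed speed-up model: only the parallel part shrinks with machines."""
    return serial_hours + parallel_hours / machines


def machine_charge(machines, hours):
    """Machine charge = (# of machines) x (# hours), as stated on the slide."""
    return machines * hours


def cheapest_fleet(serial_hours, parallel_hours, deadline_hours, max_machines=64):
    """Return (machines, charge) with the lowest charge that meets the deadline.

    Returns None when no fleet of up to max_machines finishes in time.
    """
    best = None
    for machines in range(1, max_machines + 1):
        hours = runtime_hours(serial_hours, parallel_hours, machines)
        if hours > deadline_hours:
            continue  # too slow: this fleet misses the soft deadline
        charge = machine_charge(machines, hours)
        if best is None or charge < best[1]:
            best = (machines, charge)
    return best
```

For a hypothetical job with 1 serial hour and 9 parallel hours and a 5-hour deadline, `cheapest_fleet(1, 9, 5)` picks 3 machines at a charge of 12 machine-hours. Under this model, adding machines beyond the minimum only raises the charge, which is why machines are added only as far as the deadline requires.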
Nova Workflow Platform

What is Nova?
– Content mgmt and workflow scheduling for the Cloud
– Leverages existing resources
  – Cloud Data: HDFS/Zebra storage
  – Cloud Computing: Oozie, Pig/MR/Hadoop
  – Users define complex workflows in Oozie that consume the data
[Figure: App 1, App 2, App 3]
Oozie: workflow engine for coordinating MapReduce/Pig jobs in Hadoop (i.e. workflow DAG in which nodes are MR tasks and edges are dataflows)
Sample Pig:
    A = load 'input1' as (a, b, c);
    B = filter A by a > 5;
    store B into 'output1';
    C = group B by b;
    store C into 'output2';
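For readers unfamiliar with Pig, the sample's dataflow can be mirrored in plain Python. This is only an in-memory analogy under assumed input rows; Pig would compile the script into MapReduce jobs rather than run it like this.

```python
from collections import defaultdict

# Hypothetical rows standing in for 'input1', with fields (a, b, c).
INPUT1 = [(3, "x", 1.0), (7, "x", 2.0), (9, "y", 3.0), (6, "y", 4.0)]


def run_sample(rows):
    """Mirror the Pig sample: one load, one filter, two stored outputs."""
    a_rel = list(rows)                            # A = load 'input1' as (a, b, c);
    b_rel = [row for row in a_rel if row[0] > 5]  # B = filter A by a > 5;
    output1 = b_rel                               # store B into 'output1';
    groups = defaultdict(list)                    # C = group B by b;
    for row in b_rel:
        groups[row[1]].append(row)
    output2 = dict(groups)                        # store C into 'output2';
    return output1, output2
```

Both outputs derive from a single load of the input, which is the property that scan sharing extends across separate workflows.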
Advanced workflow: Nova
Simple workflow: Oozie
Dataflow: Pig
Processing: Hadoop MR
Storage: HDFS
Sample Nova Workflow
[Figure: workflow of Nova Data containers connected by Nova Tasks]
crawler output → crawled pages (url, content)
crawled pages → cand. entity extractor → candidate entity occurrences (url, entity string)
candidate entity occurrences + entities (entity id, entity string) from editors → join → validated entity occurrences (url, entity id)
validated entity occurrences → groupwise count → entity occurrence counts (entity id, count)
Shared Scan via Workflow Merging
[Figure: Nova Workflow 1 and Nova Workflow 2 both use data container c2s0. The Workflow Merger combines them into Nova Workflow 1.2, which scans c2s0 once. Containers shown: c1s0, c2s0, c3s0, c4s0, c5s0. Legend: Input Data, Output Data, Pig/MR Tasks.]
Sample Use Cases in Nova
– Concurrent research, production, maintenance workflows over same data
– Content enrichment workflows (e.g. dedup, clustering) over news content
– Webmap workflows consuming same URL table
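A minimal sketch of what merging buys in terms of scans. The container names come from the figure, but which containers are inputs and which are outputs cannot be recovered from it, so the wiring used in the usage note is an assumption, as are the function names.

```python
def merge_workflows(workflows):
    """Merge workflows so that each shared input container is scanned once.

    `workflows` maps a workflow name to (inputs, outputs), both lists of
    container names. Returns (scans, outputs) for the merged workflow.
    """
    scans, outputs = [], []
    for inputs, outs in workflows.values():
        for container in inputs:
            if container not in scans:  # a shared input is added only once
                scans.append(container)
        outputs.extend(outs)
    return scans, outputs


def scans_saved(workflows):
    """Number of container scans avoided by merging."""
    separate = sum(len(inputs) for inputs, _ in workflows.values())
    merged, _ = merge_workflows(workflows)
    return separate - len(merged)
```

Assuming Workflow 1 reads c1s0 and c2s0 and Workflow 2 reads c2s0, the merged workflow scans c1s0 and c2s0 once each, saving one scan of c2s0 while still producing every output of both workflows.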
Performance Impact
[Figure: merged MapReduce job. Input Data → Split(Tuple) → nested plans in Map1 … Mapn → Combine(Tuple) → Shuffle → Demux(Tuple) → nested plans in Reduce1 … Reducem → Output Data]
(1) Shared loading (network, redundant proc.)
(2) Consolidated computation (shared startup/tear down)
(3) Reducer parallelism (Max/Sum # of reducers)
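The split and demux stages in the figure can be imitated in memory. This sketch assumes two made-up nested plans and tags each map output with the index of the plan that produced it; it illustrates the idea only and is not how Pig implements its operators.

```python
from collections import defaultdict


def plan_count_by_b(row):
    """Hypothetical plan 0: count rows per value of b."""
    _a, b, _c = row
    yield b, 1


def plan_sum_c_where_a_gt_5(row):
    """Hypothetical plan 1: sum c over rows with a > 5."""
    a, _b, c = row
    if a > 5:
        yield "total", c


PLANS = [plan_count_by_b, plan_sum_c_where_a_gt_5]


def merged_map(row):
    """Split: feed one loaded tuple to every nested plan, tagging by plan."""
    for plan_id, plan in enumerate(PLANS):
        for key, value in plan(row):
            yield (plan_id, key), value


def run_merged_job(rows):
    """Scan the input once, shuffle on (plan_id, key), then demux per plan."""
    shuffled = defaultdict(list)
    for row in rows:  # single pass over the shared input
        for tagged_key, value in merged_map(row):
            shuffled[tagged_key].append(value)
    outputs = [dict() for _ in PLANS]  # one output per merged plan
    for (plan_id, key), values in shuffled.items():
        outputs[plan_id][key] = sum(values)  # both sample plans reduce by summing
    return outputs
```

Each input tuple is loaded once and routed to every plan, which corresponds to the shared loading in (1). The plan tag on the key is what lets the reduce side separate the results again.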
Completion Time by Scheduling Strategy
[Figure: completion time (ms, axis from 0 to 3,000,000) vs. # of shingling workflows (1 to 6) for Sequential-NoMerge, Concurrent-NoMerge, and Merged.]
Performance in Nova for different enrichment workflows (e.g. de-dup) on news content (SIGMOD’11)
Utilization of Grid Resources (Slot Time)
[Figure: slot time (ms, axis from 0 to 4,000,000) vs. # of shingling workflows (1 to 7) for Concurrent-NoMerge Map, Concurrent-NoMerge Reduce, Merge Map, and Merge Reduce.]
PigMix: Load Cost Savings
PigMix: Estimating Makespan
Ongoing Work

Starvation resistance
– Account for heterogeneity in workflow sizes
– Provide soft deadline guarantees
– Handling cascading failures
– Prefer jobs with high load cost (less dilation, high slot time savings, map-only jobs)
Predicting workflow runtime and frequency
– Robustness to inaccuracies in cost estimates
– Conserve or expend Cloud resources based on deadline requirements and system load
Jobs that join/scan multiple input sources
Questions?
Nova Workflow Platform

Nova features
– Abstraction for complex workflows that consume data
  – Incrementally arriving data (logs, crawls, feeds, ...)
  – Incremental processing of arriving data
    – Stateless: shingle every newly-crawled page
    – Stateful: maintain inlink counts as web grows
  – Scheduling processing steps
    – Periodic: run inlink counter once per week
    – Triggered: run inlink counter after link extractor
– Provides provenance, metadata management, incremental processing (e.g. joins), data replication, transactional guarantees
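The stateless/stateful distinction above can be sketched as follows. The word-level shingle size and the class and function names are assumptions; the slide gives only the two examples.

```python
from collections import Counter


def shingle(content, k=3):
    """Stateless step: k-word shingles of one page, independent of other pages."""
    words = content.split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


class InlinkCounter:
    """Stateful step: inlink counts carried across batches of crawled links."""

    def __init__(self):
        self.counts = Counter()

    def process_batch(self, links):
        """Fold a new batch of (source_url, target_url) links into the state."""
        for _source, target in links:
            self.counts[target] += 1
        return self.counts
```

`shingle` can run on each newly crawled page in isolation, whereas `InlinkCounter` must keep its counts between batches so that each new batch of links only adds to the existing totals.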
PigMix: Reducer Parallelism
Optimizing for Shared Scan

Define a job $J$ (e.g. MapReduce or Pig)
– Scans files $f(J) = (F_1, \ldots, F_i)$, scan time per file: $s(F_i)$
– Fixed processing cost $c(J)$
$d(J)$ defines a soft deadline for each job
– Step: $d$ defined by $n$ pairs $(t_i, p_i)$ where $0 < t_i < t_{i+1}$ and $p_i > p_{i+1}$ (a job that completes by $t_i$ is awarded $p_i$ points)
– Linear decay: enforce eventual completion w/ negative pts
Cost of shared scan for jobs $J_1$ and $J_2$
$c(J_1) + c(J_2) + \sum_{F \in f(J_1) \cup f(J_2)} s(F)$
Maximize points and minimize resources
– Local improvement w/ simulated annealing, greedy ordering
– Aggressive batching when load is high
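The cost and deadline definitions above translate directly into code. The `Job` container, the generalization of the two-job cost to a batch of any size, and the exact form of the linear decay are assumptions; the slide states only the two-job cost and the step rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    files: tuple        # f(J): files the job scans
    fixed_cost: float   # c(J): fixed processing cost
    steps: tuple = ()   # d(J): ((t_1, p_1), ..., (t_n, p_n))


def shared_scan_cost(jobs, scan_time):
    """Sum of c(J) over the batch plus s(F) for each distinct file scanned."""
    files = set()
    for job in jobs:
        files.update(job.files)
    return sum(job.fixed_cost for job in jobs) + sum(scan_time[f] for f in files)


def step_reward(job, finish_time):
    """Points p_i of the first step with finish_time <= t_i, else 0."""
    for deadline, points in job.steps:
        if finish_time <= deadline:
            return points
    return 0


def linear_reward(finish_time, initial_points, decay_per_unit):
    """Assumed linear decay: points fall steadily and may become negative."""
    return initial_points - decay_per_unit * finish_time
```

As a hypothetical example, let J1 scan F1 and F2 with c = 5, J2 scan F2 and F3 with c = 3, and the scan times be 10, 20, and 5. Running the jobs separately costs 35 + 28 = 63, while the shared scan costs 5 + 3 + 35 = 43 because F2 is read once.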
Performance Evaluation

Experimental Setup
– Nova with Shared Scan Module
– 200 node Hadoop cluster
  – 128MB HDFS block size
  – 1GB RAM per node
  – 640 mapper and 320 reducer slots
– Shingling workflow (offline content enrichment)
  – De-duplication of news
  – Filter and extract features from content
  – Cluster content by feature and pick one per cluster
  – Execution of multiple de-dup workflows using different clustering alg.
– Scheduling strategies compared
  – Sequential-NoMerge (slower, conserve Grid resources)
  – Concurrent-NoMerge (fast, elastic Grid resources)
  – Merged (fast, conserve Grid resources)
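A back-of-the-envelope sketch of why merging conserves map slots on this cluster. The block size and slot count are taken from the setup above; the one-map-task-per-block rule and the input size in the usage note are assumptions, and the extra per-task work a merged map performs is ignored.

```python
BLOCK_BYTES = 128 * 1024 * 1024  # 128MB HDFS block size (from the setup)
MAP_SLOTS = 640                  # mapper slots in the cluster (from the setup)


def ceil_div(numerator, denominator):
    """Integer ceiling division without floating point."""
    return -(-numerator // denominator)


def map_tasks(input_bytes):
    """Assume one map task per HDFS block of input."""
    return ceil_div(input_bytes, BLOCK_BYTES)


def map_waves(tasks):
    """Rounds of map tasks needed when every mapper slot is free."""
    return ceil_div(tasks, MAP_SLOTS)


def scan_tasks(input_bytes, workflows, merged):
    """Map tasks spent loading the shared input for `workflows` workflows."""
    per_scan = map_tasks(input_bytes)
    return per_scan if merged else per_scan * workflows
```

For a hypothetical 100 GiB shared input, one scan takes 800 map tasks, or 2 waves. Six unmerged workflows need 4,800 tasks (8 waves if run concurrently), while the merged workflow still needs 800.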