Work Queue: A Scalable Master/Worker Framework

Peter Bui
June 29, 2010
Master/Worker Model
• Central Master application
  o Divides work into tasks
  o Sends tasks to Workers
  o Gathers results
• Distributed collection of Workers
  o Receives input and executable files
  o Runs executable files
  o Returns output files
Work Queue versus MPI
Work Queue
– Number of workers dynamic
– Scales up to a large number of workers (100s - 1000s)
– Reliable and fault tolerant at the task level
– Allows for heterogeneous deployment environments
– Workers communicate only with Master

MPI
– Number of workers static
– Scales up to a limited number of workers (16, 32, 64)
– Reliable at application level but no fault tolerance
– Requires homogeneous deployment environment
– Workers can communicate with anyone
Success Stories
All-Pairs
Makeflow
Wavefront
SAND
Architecture (Overview)
Architecture (Master)
• Uses Work Queue library
  o Creates a Queue
  o Submits Tasks
    ▪ Command
    ▪ Input files
    ▪ Output files
  o Library keeps track of Tasks
    ▪ When a Worker is available, the library sends Tasks
  o When Tasks complete
    ▪ Retrieve output files
Architecture (Workers)
• Users start workers on any machine
• Workers contact the Master and request work
• When a Task is received, perform the computation and return results
• After a set idle timeout, quit and clean up
API Overview (Work Queue)
Simple C API
• Work Queue
  o work_queue_create(int port)
    Create a new work queue.
  o work_queue_delete(struct work_queue *q)
    Delete a work queue.
  o work_queue_empty(struct work_queue *q)
    Determine whether there are any known tasks queued, running, or waiting to be collected.
API Overview (Task)
Simple C API
• Task
  o work_queue_task_create(const char *command)
    Create a new task specification.
  o work_queue_task_delete(struct work_queue_task *t)
    Delete a task specification.
  o work_queue_task_specify_input_file(struct work_queue_task *t, const char *fname, const char *rname)
    Add input file specification.
  o work_queue_task_specify_output_file(struct work_queue_task *t, const char *rname, const char *fname)
    Add output file specification.
API Overview (Execution)
Simple C API
• Execution
  o work_queue_submit(struct work_queue *q, struct work_queue_task *t)
    Submit a task to a work queue.
  o work_queue_wait(struct work_queue *q, int timeout)
    Wait for tasks to complete.
Software Configuration
Web Information
http://cse.nd.edu/~ccl/software/installed.shtml
AFS
$ setenv PATH ~ccl/software/cctools/bin:$PATH
$ setenv PATH ~condor/software/bin:$PATH
CRC
$ module use /afs/nd.edu/user37/ccl/software/modulefiles
$ module load cctools
$ module load condor
Example 1: DConvert
• Goal: convert a set of input images to a specified format in parallel
  o Input: <format> <input_image1> <input_image2> ...
  o Output: converted images in specified format
• Skeleton:
  o ~pbui/www/scratch/workqueue-tutorial.tar.gz
DConvert (Preparation)
Setup scratch workspace
$ mkdir /tmp/$USER-scratch
$ cd /tmp/$USER-scratch
$ pwd
Copy source tarball and extract it
$ cp ~pbui/www/scratch/workqueue-tutorial.tar.gz .
$ tar xzvf workqueue-tutorial.tar.gz
$ cd workqueue-tutorial
$ ls
Open dconvert.c source file for editing
$ gedit dconvert.c &
DConvert (TODO 1, 2, and 3)
// TODO 1: include work queue header file
#include "work_queue.h"
// TODO 2: declare work queue and task structs
struct work_queue *q;
struct work_queue_task *t;
// TODO 3: create work queue using default port
q = work_queue_create(0);
DConvert (TODO 4, 5, 6)
// TODO 4: create task, specify input and output file, submit task
t = work_queue_task_create(command);
work_queue_task_specify_input_file(t, input_file, input_file);
work_queue_task_specify_output_file(t, output_file, output_file);
work_queue_submit(q, t);
// TODO 5: while work queue is not empty, wait for task, then delete returned task
while (!work_queue_empty(q)) {
t = work_queue_wait(q, 10);
if (t) work_queue_task_delete(t);
}
// TODO 6: delete work queue
work_queue_delete(q);
DConvert (Demonstration)
Build and prepare application
$ make
$ cp /usr/share/pixmaps/*.png .
Start batch of workers
$ condor_submit_workers `hostname` 9123 5
Start application
$ ./dconvert jpg *.png
Tips and Tricks (Debugging)
Debugging
• Enable cctools debugging system
  o In master application:
    ▪ debug_flags_set("wq");
    ▪ debug_flags_set("debug");
  o In workers:
    ▪ work_queue_worker -d debug -d wq <hostname> <port>
• Increase the number of workers incrementally while testing
Failed Execution
• Include executable and dependencies as input files
• Right target platform (32-bit vs 64-bit, OS, etc.)
Tips and Tricks (Tasks)
Tag Tasks
• Give a task an identifying tag so Master can keep track of it
Use input and output buffers
• work_queue_task_specify_input_buf
  o Contents of buffer will be materialized as a file
• task->output
  o Buffer that contains standard output of task
Check task results
• task->result: result of task
• task->return_status: exit code of command line at worker
Tips and Tricks (Batch)
Custom Worker Environment
• Modify batch system specific submit scripts
  o condor_submit_workers
    ▪ Set requirements
  o sge_submit_workers
    ▪ Set environment
    ▪ Set modules
Tips and Tricks (CRC)
Submit master, find host, submit workers
• qsub myscript.sh
  o myscript.sh:
      #!/bin/csh
      master
• qstat -u <afsid> | grep myscript.sh
• sge_submit_workers <hostname> <port>
Example 2: Mandelbrot Generator
• Goal: generate Mandelbrot image
  o Input: <width> <height> <xmin> <xmax> <ymin> <ymax> <max_iterations>
  o Output: Mandelbrot image in PPM format
• Skeleton:
  o ~pbui/www/scratch/workqueue-tutorial.tar.gz
Mandelbrot (Overview)
z(n+1) = z(n)^2 + c
Escape Time Algorithm
• For each pixel (r, c) in the image, calculate whether the corresponding point (x, y) escapes the boundary
• Iterative algorithm where each pixel computation is independent
Application design
• Master partitions image into tasks
• Workers compute Escape Time Algorithm on partitions
Mandelbrot (Naive Approach)
Master
• For each pixel (r, c) in image (width x height)
  o Compute corresponding x, y
  o Submit task for pixel with x, y
    ▪ Pass x, y parameters as input buffer
    ▪ Tag task with r, c values
• Wait for each task to complete:
  o Retrieve output of worker from task->output
  o Retrieve r, c from task->tag
  o Store pixel[r, c] = output
• Output pixels in PPM format
Mandelbrot (Naive Approach)
Worker
• Read in parameters from input file:
  o x0, y0, max_iterations, black_value
• Perform Mandelbrot computation as specified on Wikipedia:
  o http://en.wikipedia.org/wiki/Mandelbrot_set#For_programmers
• Output result (iterations) to standard output
Mandelbrot (Analysis)
Problem
• Processing each pixel as a single task is inefficient
  o Too fine-grained
  o Overhead of sending parameters, running tasks, and retrieving results exceeds computation time
Work Queue Golden Rule:
Computation Time > Data Transfer Time + Task Setup Overhead
Mandelbrot (Better Approach)
Send Rows
• Process groups of pixels rather than individual ones:
o Send a row and have the worker return a series of results
o Perhaps send multiple rows?
• Should reduce execution time from minutes to seconds
Mandelbrot (Demonstration)
Build application
$ make
Start batch of workers
$ condor_submit_workers `hostname` 9123 10
Start application
$ ./mandelbrot_master 512 512 -2 1 -1.5 1.5 250 > output.ppm
$ display output.ppm
Advanced Features
Fast Abort
• Allow Work Queue to pre-emptively kill slow tasks
• work_queue_activate_fast_abort(q, X)
  o X is the fast abort multiplier
  o if (runtime >= average_runtime * X) fast_abort
Scheduling
• Change how workers are selected
  o FCFS: first come, first served
  o FILES: has the most cached files
  o TIME: fastest average turnaround time
• Can be set for queue or for task
Advanced Features (More)
Automatic Master Detection
• Start master with a project name:
  o setenv WORK_QUEUE_NAME "project_name"
• Enable master auto selection mode with workers
  o work_queue_worker -a -N "project_name"
  o work_queue_pool -T condor -a -N "project_name"
• Check out master at http://chirp.cse.nd.edu
Shut down workers
• work_queue_shut_down_workers
Web Resources
Website
http://www.nd.edu/~ccl/software/workqueue/
• User manual and C API documentation
Bug Reports and Suggestions
http://www.cse.nd.edu/~ccl/software/help.shtml
Python-API
http://bitbucket.org/pbui/python-workqueue/
• Experimental Python binding