Lecture slides - Cooperative Computing Lab

advertisement
Building Scalable Scientific
Applications using Makeflow
Dinesh Rajan and Peter Sempolinski
University of Notre Dame
Cooperative Computing Lab
University of Notre Dame
http://www.nd.edu/~ccl
The Cooperative Computing Lab
• We collaborate with people who have large
large computing
scale computing
problems
in science,
scale
problems
in science,
engineering, and other fields.
• We operate computer systems on the
• We
operatecores:
computer
systems
on grids.
the
O(10,000)
clusters,
clouds,
O(10,000) cores: clusters, clouds, grids.
• We conduct computer science research in
the develop
context of
realsource
peoplesoftware
and problems.
• We
open
for large
distributed
• scale
We develop
opencomputing.
source software for large
scale distributed computing.
http://www.nd.edu/~ccl
3
Plan for Today’s Tutorial
1. Our CCTools Software
i.
Makeflow, Work Queue, Parrot, Chirp
2. Makeflow
i.
Lecture: Overview, features
ii. Tutorial: Write simple Makeflows
3. Work Queue
i.
Lecture: Overview, features
ii. Tutorial: Write simple WQ programs
Science Depends on Computing!
The Good News:
Computing is Plentiful!
6
Superclusters by the Hour
http://arstechnica.com/business/news/2011/09/30000-core-cluster-built-on-amazon-ec2-cloud.ars
9
I have a standard, debugged, trusted
application that runs on my laptop.
A toy problem completes in one hour.
A real problem will take a month (I think.)
Can I get a single result faster?
Can I get more results in the same time?
Last year,
I heard about
this grid thing.
This year,
I heard about
this cloud thing.
10
I have allocations on
clusters (unlimited) + grids (limited) + clouds ($)!
How do I run my application
on those machines?
Should I port my program to MPI or Hadoop?
Learn MPI / Hadoop
Learn C / Java
Re-architect
Re-write
Re-test
Re-debug
Re-certify
And my application looks like this…
Makeflow & Work Queue
• Easy to scale from one desktop to national scale
infrastructure.
• Harness all available resources:
desktops, clusters, clouds, grids.
• Portable across operating systems, storage
systems, batch systems.
• No special privileges required.
Makeflow
part1 part2 part3: input.data split.py
./split.py input.data
out1: part1 mysim.exe
./mysim.exe part1 >out1
out2: part2 mysim.exe
./mysim.exe part2 >out2
out3: part3 mysim.exe
./mysim.exe part3 >out3
result: out1 out2 out3 join.py
./join.py out1 out2 out3 > result
15
Work Queue Library
#include “work_queue.h”
while( not done ) {
while (more work ready) {
task = work_queue_task_create();
// add some details to the task
work_queue_submit(queue, task);
}
task = work_queue_wait(queue);
// process the completed task
}
http://www.nd.edu/~ccl/software/workqueue
16
Parrot and Chirp
Ordinary
Appl
Filesystem Interface:
open/read/write/close
Parrot Virtual File System
Local
Web
Servers
HTTP
CVMFS
CVMFS
Network
Chirp
Chirp
Server
iRODS
iRODS
Server
17
http://www.nd.edu/~ccl/software/manuals
Source code in GitHub
http://github.com/cooperative-computing-lab/cctools
Makeflow & Work Queue
• Federate/harness all available resources:
desktops, clusters, clouds, grids.
• Simple interfaces & API
• Part of CCTools software
– No special privileges required to install.
Makeflow Lecture: Outline
1. What is Makeflow?
– Portable: One Makeflow program for SGE, Condor, PBS
2. How to write an application using Makeflow?
– Simple rule-based syntax
3. How to run Makeflow?
– Features, commands, using Work Queue
An Old Idea: Makefiles
part1 part2 part3: input.data split.py
./split.py input.data
out1: part1 mysim.exe
./mysim.exe part1 >out1
out2: part2 mysim.exe
./mysim.exe part2 >out2
out3: part3 mysim.exe
./mysim.exe part3 >out3
result: out1 out2 out3 join.py
./join.py out1 out2 out3 > result
22
Makeflow Language - Rules
• Each rule specifies:
– a set of target files to
create;
– a set of source
files needed to create
them;
– a command that
generates the target files
from the source files.
part1 part2 part3: input.data split.py
./split.py input.data
out1: part1 mysim.exe
./mysim.exe part1 >out1
out2: part2 mysim.exe
./mysim.exe part2 >out2
out3: part3 mysim.exe
./mysim.exe part3 >out3
result: out1 out2 out3 join.py
./join.py out1 out2 out3 > result
out1 : part1 mysim.exe
mysim.exe part1 > out1
You must state
all the files
needed by the command.
sims.mf
out.10 : in.dat calib.dat sim.exe
sim.exe –p 10 in.data > out.10
out.20 : in.dat calib.dat sim.exe
sim.exe –p 20 in.data > out.20
out.30 : in.dat calib.dat sim.exe
sim.exe –p 30 in.data > out.30
Makeflow = Make + Workflow
http://www.nd.edu/~ccl/software/makeflow
•
•
•
•
Provides portability across batch systems.
Enable parallelism (but not too much!)
Fault tolerance at multiple scales.
Data and resource management.
Makeflow
Local
Condor
SGE
Work
Queue
Makeflow + Batch System
Makefile
XSEDE
Cluster
Private
Cluster
Campus
Condor
Pool
Public
Cloud
Provider
Makeflow
Local Files and
Programs
How to run a Makeflow
• Run a workflow local
% makeflow -T local sims.mf
• Run the workflow on SGE:
% makeflow -T sge sims.mf
• Run the workflow on Condor:
% makeflow -T condor sims.mf
• Clean up the workflow outputs:
% makeflow -c sims.mf
Makeflow Syntax Checker
• Makeflow can verify if your Makeflow file is
syntactically correct
% makeflow -k sims.mf
Makeflow: Syntax OK.
• Makeflow will point out syntax errors if any
% makeflow -k sims.mf
makeflow: out10 is defined multiple times at out.10:1 and out.10:4
Makeflow Visualization
• Makeflow can output a makeflow file as a Dot graph.
% makeflow -D dot sims.mf
digraph {
node [shape=ellipse,color = green,style = unfilled,fixedsize = false];
N2 [label="sim.exe"];
N1 [label="sim.exe"];
N0 [label="sim.exe"];
node [shape=box,color=blue,style=unfilled,fixedsize=false];
F3 [label = "out.30"];
F0 [label = "sim.exe"];
F5 [label = "out.10"];
F2 [label = "in.dat"];
F1 [label = "calib.dat"];
F4 [label = "out.20"];
..
..
Example App: Biocompute Portal
BLAST
SSAHA
SHRIMP
EST
MAKER
…
Progress
Bar
Transaction
Log
Generate Makefile
Update
Status
Run
Workflow
Make
flow
Submit
Tasks
Condor
Pool
Makeflow + Work Queue
Makeflow + Batch System
Makefile
XSEDE
Cluster
Private
Cluster
Campus
Condor
Pool
Public
Cloud
Provider
Makeflow
Local Files and
Programs
Makeflow + Work Queue
sge_submit_workers
W
XSEDE
Cluster
Makefile
Makeflow
submit
tasks
W
W
W
W
W
Private
Cluster W
Campus
Condor
Pool
W
Thousands of
Workers in a
Personal Cloud
W
W
W
W
Public
Cloud
Provider
Local Files and
Programs
condor_submit_workers
ssh
W
Advantages of Work Queue
• Scalability: Harness multiple infrastructure
simultaneously.
• Elasticity: Scale resources up & down as needed.
• Data Management: Remote data caching.
• Data Locality: Matches tasks to nodes with data.
Fault Tolerance
• MF +WQ is fault tolerant :
– If Makeflow crashes (or killed), it recovers by reading log
and continues where it left off.
– If a worker crashes, the master will detect and restart the
task elsewhere.
– Workers can be added and removed any time during
execution.
Makeflow and Work Queue
To start the Makeflow
% makeflow -T wq sims.mf
Could not create work_queue on port 9123.
% makeflow -T wq -p 0 sims.mf
Listening for workers on port 8374…
To start one worker:
% work_queue_worker ccl.cse.nd.edu 8374
Start Workers Everywhere!
Submit workers to SGE:
% sge_submit_workers ccl.cse.nd.edu 8374 25
Submit workers to Condor:
% condor_submit_workers ccl.cse.nd.edu 8374 25
Submit workers to Torque:
% torque_submit_workers ccl.cse.nd.edu 8374 25
Keeping track of port numbers
gets old fast…
Project Names
makeflow …
–a –N myproject
work_queue_worker
-a –N myproject
connect to
ccl.cse.nd.edu:4057
Makeflow
Worker
(port 4057)
advertise
query
Catalog
“myproject”
is at ccl.cse.nd.edu:4057
Makeflow with Project Names
Start Makeflow with a project name:
% makeflow -T wq -p 0 -a -N xsede-tutorial sims.mf
Listening for workers on port XYZ…
Start one worker:
% work_queue_worker -N xsede-tutorial
Start many workers:
% sge_submit_workers -N ccgrid-tutorial 5
http://www.nd.edu/~ccl/software/makeflow/
Makeflow
The Cooperative
Computing Lab
•• Portable:
One program
for clusters,
grids, clouds
We collaborate
with people
who have
large scale computing problems in science,
• Simple syntax: inputs, outputs, command
engineering, and other fields.
• Alloperate
files needed
by command
be specified
• We
computer
systemsmust
on the
O(10,000) with
cores:
clusters,
clouds, grids.
• Makeflow
Work
Queue
• We conduct computer science research in
• Federation, Elasticity, Data management
the context of real people and problems.
• Project
Names
We develop
open source software for large
scale
distributed
computing.
• Easy
to remember
locations of Makeflow masters
43
Acknowledgements
• Chris Hempel (TACC)
• David Gignac (TACC)
Go to: http://www.nd.edu/~ccl
Click on “Tutorial at XSEDE 2013”
Click on “Tutorial” under Makeflow
Download