Parallelization with the Matlab®
Distributed Computing Server
(MDCS) @ CBI cluster
Overview
• Parallelization with Matlab using the Parallel Computing Toolbox (PCT)
• Introduction to the Matlab Distributed Computing Server
• Benefits of using the MDCS
• Hardware/Software/Utilization @ CBI
• MDCS Usage Scenarios
• Hands-on Training
2
Parallelization with Matlab PCT
• The Matlab Parallel Computing Toolbox provides access to multi-core, multi-system (MDCS), and GPU parallelism.
• Many built-in Matlab functions (e.g. FFT) support parallelism transparently.
• Parallel language constructs, such as converting for loops to parfor loops (see the sketch below).
• Handles many different types of parallel software development challenges.
• MDCS allows scaling of locally developed, parallel-enabled Matlab applications.
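A minimal sketch of the for-to-parfor conversion mentioned above, assuming the Parallel Computing Toolbox and the R2012a-era matlabpool interface; the loop body is an illustrative stand-in for an expensive, independent computation.

matlabpool open local 4          % start a pool of 4 local workers

N = 1e5;
results = zeros(1, N);

parfor i = 1:N
    % Each iteration is independent, so the pool workers share the loop.
    results(i) = sqrt(i) * sin(i);   % stand-in for an expensive computation
end

matlabpool close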
3
Parallelization with Matlab PCT
• Distributed / parallel algorithm characteristics
– Memory usage & CPU usage
• Load a 4 gigabyte file into memory → calculate averages
– Communication / data I/O patterns
• Read file 1 (10 gigabytes) → run a function
• Worker B sends data to worker A → run a function → return data to worker B
– Dependencies
• Function 1 → Function 2 → Function 3
• Hardware resource contention (e.g. 16 cores each trying to read/write a set of files, bandwidth limitations on RAM)
• Managing large numbers of small files → filesystem contention
4
Parallelization with Matlab PCT
Applications have layers of parallelism: clusters, CPUs/multi-cores, and GPU cards/external accelerator cards. For an optimal solution, the application must be looked at as a whole. The Matlab PCT + MDCS framework automates much of the complexity of developing parallel and distributed applications, with scalability as the goal: use as many workers as possible in an efficient manner.
5
Parallelization with Matlab PCT &
MDCS
• CPUs / multi-cores: distributed loops (parfor); interactive development mode (matlabpool/pmode) (see the sketch below)
• MDCS cluster: scale out with the MDCS cluster in batch job submission mode; distributed arrays (spmd)
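A minimal interactive-mode sketch of the matlabpool/spmd constructs listed above, assuming a local pool; sizes and values are illustrative.

matlabpool open local 4          % interactive pool of 4 labs

spmd
    % Every lab runs this block; labindex identifies the lab (1..numlabs).
    localPart = labindex * ones(1, 3);
    fprintf('Lab %d of %d holds: %s\n', labindex, numlabs, mat2str(localPart));
end

matlabpool close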
6
MDCS Benefits
MDCS Worker Processes (a.k.a. “Labs”)
– The workers never consume regular Matlab or toolbox licenses.
– The only license an MDCS worker ever uses is an MDCS worker license (of which we have up to 64).
– Toolboxes are unlocked for an MDCS worker based on the licenses owned by the client during the job submission process.
– A wonderful parallel algorithm development environment with the superior visualization & profiling capabilities of Matlab.
– Many built-in functions are parallel enabled: fft, lu, svd, ...
– Distributed arrays allow development of data-parallel algorithms (see the sketch below).
– Enables the scaling of codes that cannot be compiled with the Matlab Compiler Toolbox.
– Allows you to go from development on a laptop directly to running on up to 64 MDCS Labs. (Some simulations can go from years of runtime to days of runtime on 64 MDCS Labs.)
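A minimal sketch of a data-parallel computation with a distributed array, assuming an open pool; the matrix size is illustrative, and svd is one of the parallel-enabled built-ins listed above.

matlabpool open local 4

A = distributed.rand(2000, 2000);   % columns of A are spread across the pool workers
s = svd(A);                         % the built-in svd operates on the distributed array
s_local = gather(s);                % collect the singular values back on the client

matlabpool close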
7
MDCS Structure
8
Hardware/Software/Utilization @ CBI
• MDCS worker processes run on 4 physical servers
• Dell PowerEdge M910: four 16-core systems, 4 x 64 GB RAM, 2 x Intel Xeon 2.26 GHz per system with 8 cores per processor
• Total of 64 cores, with 256 GB total RAM distributed among the systems
• Max 64 MDCS worker licenses available
• Subsets of MDCS workers can be created based on project needs
9
Usage scenarios
• Local system: interactive use (matlabpool / spmd / pmode / mpiprofile)
– A local system (e.g. one of the workstations @ CBI) is used as part of initial algorithm development.
• MDCS: non-interactive use, job & task based
– 2 main types: independent vs. communicating jobs
• Both types can be used with either the local profile (on a non-cluster workstation) or the MDCS profile (see the sketch below).
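A minimal sketch of switching between the local and MDCS profiles with the R2012a profile interface; the profile name 'CBI_MDCS' is a hypothetical placeholder.

c = parcluster('local');         % local profile: workers run on this workstation
% c = parcluster('CBI_MDCS');    % hypothetical MDCS profile: same code, cluster workers
disp(c)                          % shows properties such as NumWorkers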
10
MDCS Workloads
2 main types of workloads can be implemented with the MDCS:
– A job is logically decomposed into a set of tasks. The job may have 1 or more tasks, and each task may or may not have additional parallelism within it.
CASE 1: Independent
• Within a job the parallelism is fully independent; MDCS workers can be used to offload the independent work units. The code does not use parallel language features such as parfor or spmd. Note: in many cases, a parfor loop can be transformed into a set of tasks.
– createJob() + createTask(), createTask(), ... createTask()
CASE 2: Communicating
• Within a single job the parallelism is more complex, requiring the workers to communicate, or parfor, spmd, or codistributed arrays (language features from the Parallel Computing Toolbox) are used.
– createCommunicatingJob() + createTask() (see the sketch below)
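A minimal sketch of both job types against the R2012a interface, assuming a hypothetical MDCS profile named 'CBI_MDCS'; task functions and sizes are illustrative.

c = parcluster('CBI_MDCS');                       % hypothetical MDCS profile name

% CASE 1: independent job -- tasks run separately, with no inter-worker communication.
ijob = createJob(c);
for k = 1:8
    createTask(ijob, @sum, 1, {rand(1, 1000)});   % 8 independent tasks
end
submit(ijob); wait(ijob);
independentResults = fetchOutputs(ijob);
delete(ijob);

% CASE 2: communicating job -- one task runs across several labs that may
% use spmd-style features and lab-to-lab communication.
cjob = createCommunicatingJob(c, 'Type', 'SPMD');
cjob.NumWorkersRange = [4 4];
createTask(cjob, @() gplus(labindex), 1, {});     % labs cooperate to sum their indices
submit(cjob); wait(cjob);
communicatingResults = fetchOutputs(cjob);
delete(cjob);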
11
MDCS Working Environment
12
MDCS Working Environment
13
Interactive Mode Sample (parfor)
For workloads that map well onto it, parfor can yield exceptional performance improvements: from years to days, or days to hours, for certain workloads. The ideal cases are long-running jobs with little or no inter-job communication.
[Screenshot: a standard for loop vs. the parfor-enabled version on the MDCS]
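A minimal sketch along the lines of the screenshot above, assuming a matlabpool is already open; the eigenvalue workload is purely illustrative.

N = 200;
A = rand(N);
e_serial = zeros(1, N);
e_par    = zeros(1, N);

tic                               % standard for loop (serial)
for k = 1:N
    e_serial(k) = max(abs(eig(A + k*eye(N))));
end
tSerial = toc;

tic                               % same loop body, iterations shared by the pool workers
parfor k = 1:N
    e_par(k) = max(abs(eig(A + k*eye(N))));
end
tParallel = toc;

fprintf('serial: %.2f s   parfor: %.2f s\n', tSerial, tParallel);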
14
MDCS Scaling (Batch Mode)
15
MDCS Scaling (Batch Mode)
16
MDCS Scaling (Batch Mode)
17
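A minimal batch-mode submission sketch for the scaling examples above, using the R2012a-era batch interface; the profile name 'CBI_MDCS' and the script name 'mySimulation' are hypothetical placeholders.

c = parcluster('CBI_MDCS');                       % hypothetical MDCS profile
job = batch(c, 'mySimulation', 'matlabpool', 15); % run the script with a 15-worker pool

wait(job);      % block until the job finishes (optional)
diary(job);     % show the job's command-window output
load(job);      % load the script's workspace variables into the client session

delete(job);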
Summary
• Applied examples of using the MDCS in batch mode are available as part of the hands-on section, or via a consulting appointment for more in-depth MDCS usage information.
• We can allocate a subset of MDCS workers on a per-project basis.
18
Summary
• A wonderful parallel algorithm design & development environment
• Scale codes out to up to 64 Matlab MDCS workers
– Both distributed compute & distributed memory
• Minimizes standard Matlab + toolbox license usage
• Many options for approaching the parallelization of computational workloads
19
Acknowledgements
• This project received computational, research &
development, software design/development support
from the Computational System Biology
Core/Computational Biology Initiative, funded by the
National Institute on Minority Health and Health
Disparities (G12MD007591) from the National
Institutes of Health. URL: http://www.cbi.utsa.edu
20
Contact Us
http://cbi.utsa.edu
21
Appendix A
22
Local Mode: Matlab Worker
Process/Thread Structure
Parallel Toolbox constructs can be tested in local mode; the “lab” abstraction allows the actual process used for a lab to reside either locally or on a distributed server node. MPI is used for inter-process communication between “Labs” (Matlab worker processes).
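A minimal sketch of explicit lab-to-lab message passing (carried over the MPI layer described above), assuming a local pool; the payload is illustrative.

matlabpool open local 4

spmd
    if labindex == 1
        labSend(rand(1, 5), 2);        % lab 1 sends a vector to lab 2
    elseif labindex == 2
        data = labReceive(1);          % lab 2 blocks until the message arrives
        fprintf('Lab 2 received %d values\n', numel(data));
    end
    labBarrier;                        % synchronize all labs before leaving the block
end

matlabpool close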
23
Local Mode Scaling Sample (parfor)
24
Interactive Mode Sample (pmode/spmd)
Each lab handles a piece of the data, and the results are gathered on lab 1. The client session then requests the complete data set to be sent to it using lab2client.
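A minimal spmd sketch of the pattern described above; lab2client is the pmode equivalent of reading the Composite back on the client. Sizes are illustrative.

matlabpool open local 4

spmd
    x = labindex : labindex + 4;       % each lab handles a piece of the data
    xall = gcat(x, 2, 1);              % results are gathered (concatenated) on lab 1
end

clientCopy = xall{1};                  % client pulls the complete data set from lab 1
disp(clientCopy)

matlabpool close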
25
Local vs. MDCS Mode Comparison (parfor)
26
Appendix B: MDCS Access
• Access to MDCS provided via Cheetah Cluster.
– On Linux: ssh -Y username@cheetah.cbi.utsa.edu
– qlogin
– matlab &
27
Appendix B: MDCS Access
• Access to MDCS provided via Cheetah Cluster.
– On Windows: use PuTTY + Xming with X11 forwarding
– qlogin
– matlab &
28
References
[1] http://www.mathworks.com/products/parallel-computing/ ( Parallel Computing Toolbox reference )
[2] http://www.mathworks.com/help/toolbox/distcomp/f1-6010.html#brqxnfb-1 (Parallel Computing Toolbox)
[3] http://www.mathworks.com/products/parallel-computing/builtin-parallel-support.html ( Parallel Computing Toolbox )
[4] http://www.mathworks.com/products/distriben/supported/license-management.html ( MDCS License Management )
[5] http://www.mathworks.com/cmsimages/dm_interact_wl_11322.jpg ( MDCS Architecture Overview )
[6] http://www.mathworks.com/cmsimages/62006_wl_mdcs_fig1_wl.jpg ( MDCS Architecture Overview: Scalability )
[7] http://www.mathworks.com/products/parallel-computing/builtin-parallel-support.html ( Built-in MDCS support )
[8] http://www.mathworks.com/products/datasheets/pdf/matlab-distributed-computing-server.pdf ( MDCS Licensing )
[9] http://www.psc.edu/index.php/matlab ( MDCS @ PCS)
[10] http://www.mathworks.com/products/compiler/supported/compiler_support.html ( Compiler Support for MATLAB and Toolboxes )
[11] http://www.mathworks.com/support/solutions/en/data/1-2MC1RY/?solution=1-2MC1RY ( SGE Integration )
[12] http://www.mathworks.com/company/events/webinars/wbnr30965.html?id=30965&p1=70413&p2=70415 ( MDCS Administration )
[13] http://www.mathworks.com/help/toolbox/mdce/f4-10664.html ( General MDCE Workflow )
[14] http://www.mathworks.com/help/toolbox/distcomp/f3-10664.html ( Independent Jobs with MDCS )
[15] http://cac.engin.umich.edu/swafs/training/pdfs/matlab.pdf ( MDCS @ Umich )
[16] http://www.mathworks.com/products/optimization/examples.html?file=/products/demos/shipping/optim/optimparfor.html ( Optimization toolbox example )
[17] http://www.mathworks.com/products/distriben/examples.html ( MDCS Examples )
[18] http://www.mathworks.com/support/product/DM/installation/ver_current/ ( MDCS Installation Guide R2012a )
[19] http://www.psc.edu/index.php/matlab ( MDCS @ PSC )
[20] http://rcc.its.psu.edu/resources/software/dmatlab/ ( MDCS @ Penn State )
[21] http://ccr.buffalo.edu/support/software-resources/compilers-programming-languages/matlab/mdcs.html ( MDCS @ U of Buffalo)
[22] http://www.cac.cornell.edu/wiki/index.php?title=Running_MDCS_Jobs_on_the_ATLAS_cluster ( MDCS @ Cornell )
[23] http://www.mathworks.com/products/distriben/description3.html ( MDCS Licensing )
[24] http://www.mathworks.com/cmsimages/dm_interact_wl_11322.jpg ( MDCS Architecture )
29
References
[25] http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/bqxooam-1.html ( Built-in functions that work with distributed arrays )
[26] http://www.rz.rwthaachen.de/aw/cms/rz/Themen/hochleistungsrechnen/nutzung/nutzung_des_rechners_unter_windows/~sxm/MATLAB_Parallel_Computing_Toolbox/?lang=de
( MDCS @ Aachen University )
[27] http://www.mathworks.com/support/solutions/en/data/1-9D3XVH/index.html?solution=1-9D3XVH ( Compiled Matlab Applications using PCT + MDCS)
[28] http://www.hpc.maths.unsw.edu.au/tensor/matlab ( MDCS @ UNSW )
[29] http://blogs.mathworks.com/loren/2012/04/20/running-scripts-on-a-cluster-using-the-batch-command-in-parallel-computing-toolbox/ ( Batch command )
[30] http://www.rcac.purdue.edu/userinfo/resources/peregrine1/userguide.cfm#run_pbs_examples_app_matlab_licenses_strategies ( MDCS @ Purdue )
[31] http://www.mathworks.com/help/pdf_doc/distcomp/distcomp.pdf ( Parallel Computing Toolbox R2012a )
[32] http://www.nccs.nasa.gov/matlab_instructions.html ( MDCS @ Nasa )
[33] http://www.mathworks.com/help/toolbox/distcomp/rn/bs8h9g9-1.html ( PCT, MDCS R2012a interface changes )
[34] http://www.mathworks.com/help/toolbox/distcomp/createcommunicatingjob.html ( Communicating jobs )
[35] http://www.mathworks.com/products/parallel-computing/examples.html?file=/products/demos/shipping/distcomp/paralleltutorial_dividing_tasks.html (
Moving parfor loops to jobs+tasks )
[36] http://people.sc.fsu.edu/~jburkardt/presentations/fsu_2011_matlab_tasks.pdf ( MDCS @ FSU: Task based parallelism )
[37] http://www.icam.vt.edu/Computing/fdi_2012_parfor.pdf ( MDCS @ Virginia Tech: Parfor parallelism )
[38] http://www.hpc.fsu.edu/ ( MDCS @ FSU, HPC main site )
[39] http://www.mathworks.com/help/toolbox/distcomp/rn/bs8h9g9-1.html ( PCT Updates in R2012a )
[40] http://www.mathworks.com/help/distcomp/using-matlab-functions-on-codistributed-arrays.html ( Built in functions available for Co-Distributed arrays )
[41] http://scv.bu.edu/~kadin/Tutorials/PCT/matlab-pct.html ( Matlab PCT @ Boston University )
[42] http://www.circ.rochester.edu/wiki/index.php/MatlabWorkshop#Example_using_distributed_arrays_for_FFT
[43] http://www.advancedlinuxprogramming.com/alp-folder/alp-ch04-threads.pdf
[44] http://www.mathworks.com/products/distriben/parallel/accelerate.html
[45] http://www.mathworks.com/products/distriben/examples.html?file=/products/parallel-computing/includes/parallel.html
[46] http://en.wikipedia.org/wiki/Gustafson%27s_law
[47] http://www.mathworks.com/help/distcomp/index.html
[48] http://www.mathworks.com/cmsimages/43623_wl_dm_using_paralles_forloops_wl.jpg
[49] http://www.mathworks.com/help/distcomp/mpiprofile.html
30