HPC-ABDS - Community Grids Lab

HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
1st JTC 1 SGBD Meeting, SDSC San Diego, March 19 2014
Judy Qiu, Shantenu Jha (Rutgers), Geoffrey Fox
gcf@indiana.edu, http://www.infomall.org
School of Informatics and Computing, Digital Science Center, Indiana University Bloomington
Enhanced Apache Big Data Stack (ABDS)
• ~120 Capabilities
• >40 Apache projects
• Green layers (in the stack figure) have strong HPC integration opportunities
• Goal: the functionality of ABDS with the performance of HPC
Broad Layers in HPC-ABDS
• Workflow-Orchestration
• Application and Analytics
• High level Programming
• Basic Programming model and runtime
– SPMD, Streaming, MapReduce, MPI
• Inter process communication
– Collectives, point to point, publish-subscribe
• In memory databases/caches
• Object-relational mapping
• SQL and NoSQL, File management
• Data Transport
• Cluster Resource Management (Yarn, Slurm, SGE)
• File systems (HDFS, Lustre …)
• DevOps (Puppet, Chef …)
• IaaS Management from HPC to hypervisors (OpenStack)
• Cross Cutting
– Message Protocols
– Distributed Coordination
– Security & Privacy
– Monitoring
Getting High Performance on Data Analytics (e.g. Mahout, R …)
• On the systems side, we have two principles
– The Apache Big Data Stack with ~120 projects offers important, broad functionality backed by a large and active support organization
– HPC, including MPI, has striking success in delivering high performance, but with a fragile sustainability model
• There are key systems abstractions, which are levels in the HPC-ABDS software stack, where the Apache approach needs careful integration with HPC
– Resource management
– Storage
– Programming model -- horizontal scaling parallelism
– Collective and Point to Point communication
– Support of iteration
– Data interface (not just key-value)
• In application areas, we define application abstractions to support
– Graphs/network
– Geospatial
– Images etc.
4 Forms of MapReduce
(a) Map Only (Pleasingly Parallel): BLAST Analysis, Parametric sweep, Pleasingly Parallel problems
(b) Classic MapReduce: High Energy Physics (HEP) Histograms, Distributed search
(c) Iterative MapReduce: Expectation maximization, Clustering e.g. Kmeans, Linear Algebra, Page Rank
(d) Loosely Synchronous iterations: Classic MPI, PDE Solvers and particle dynamics
Forms (a)–(c) are the domain of MapReduce and its iterative extensions (Science Clouds, Giraph); form (d) is the domain of MPI. MPI is Map followed by Point to Point or Collective Communication, as in style (c) plus (d). (A minimal sketch of form (c) follows.)
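To make form (c) concrete, here is a minimal single-process sketch of an iterative MapReduce-style Kmeans driver. It is illustrative only and assumes small in-memory data; a real runtime such as Twister or Harp would distribute the map and reduce phases across workers and cache the points between iterations.

```java
import java.util.*;

// Minimal single-process sketch of form (c), Iterative MapReduce Kmeans.
// Illustration only: a distributed runtime would run "map" and "reduce"
// on workers and keep the points cached across iterations.
public class IterativeKMeansSketch {
    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1.5, 2}, {8, 8}, {8, 9}, {0.5, 1.5}, {9, 8.5}};
        double[][] centers = {{1, 1}, {8, 8}};          // initial centers
        for (int iter = 0; iter < 10; iter++) {
            // "Map": assign each point to its nearest center (key = center index)
            Map<Integer, List<double[]>> grouped = new HashMap<>();
            for (double[] p : points) {
                int best = nearest(p, centers);
                grouped.computeIfAbsent(best, k -> new ArrayList<>()).add(p);
            }
            // "Reduce": average each group's points to produce the new centers
            double[][] next = centers.clone();
            for (Map.Entry<Integer, List<double[]>> e : grouped.entrySet()) {
                next[e.getKey()] = mean(e.getValue());
            }
            centers = next;                              // feed back into the next iteration
        }
        System.out.println(Arrays.deepToString(centers));
    }

    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        double bestD = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double d = 0;
            for (int i = 0; i < p.length; i++) d += (p[i] - centers[c][i]) * (p[i] - centers[c][i]);
            if (d < bestD) { bestD = d; best = c; }
        }
        return best;
    }

    static double[] mean(List<double[]> pts) {
        double[] m = new double[pts.get(0).length];
        for (double[] p : pts) for (int i = 0; i < m.length; i++) m[i] += p[i];
        for (int i = 0; i < m.length; i++) m[i] /= pts.size();
        return m;
    }
}
```

The key point is the loop: the reduced centers feed straight back into the next map phase, which classic form (b) MapReduce would have to restart from disk each iteration.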
HPC-ABDS Hourglass
HPC ABDS System (Middleware): 120 Software Projects
System Abstractions/standards:
• Data format
• Storage
• HPC Yarn for Resource management (integrating Yarn with HPC)
• Horizontally scalable parallel programming model
• Collective and Point to Point communication
• Support of iteration
Application Abstractions/standards: Graphs, Networks, Images, Geospatial ….
High performance Applications: SPIDAL (Scalable Parallel Interoperable Data Analytics Library) or High performance Mahout, R, Matlab …
We are working on the following Use Cases with HPC-ABDS
• Use Case 10 Internet of Things: Yarn, Storm, ActiveMQ
• Use Case 19, 20 Genomics. Hadoop, Iterative MapReduce, MPI; much better analytics than Mahout
• Use Case 26 Deep Learning. High performance distributed GPU
(optimized collectives) with Python front end (planned)
• Variant of Use Case 26, 27 Image classification using Kmeans:
Iterative MapReduce
• Use Case 28 Twitter with optimized index for Hbase, Hadoop and
Iterative MapReduce
• Use Case 30 Network Science. MPI and Giraph for network
structure and dynamics (planned)
• Use Case 39 Particle Physics. Iterative MapReduce (wrote
proposal)
• Use Case 43 Radar Image Analysis. Hadoop for multiple individual images, moving to Iterative MapReduce for global integration over “all” images
• Use Case 44 Radar Images. Running on Amazon
Features of the Harp Hadoop Plug-in
• Hadoop Plugin (on Hadoop 1.2.1 and Hadoop
2.2.0)
• Hierarchical data abstraction on arrays, key-values
and graphs for easy programming expressiveness.
• Collective communication model to support various communication operations on the data abstractions (an illustrative sketch follows this list).
• Caching with buffer management for the memory allocation required by computation and communication
• BSP style parallelism
• Fault tolerance with check-pointing
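The collective communication model can be illustrated with a small sketch of the BSP-style allreduce pattern. This is not the actual Harp API; it simulates the idea with Java threads and a barrier: each "map task" contributes a partial value, all tasks synchronize at a superstep boundary, and every task then reads the same reduced result.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.DoubleAdder;

// Illustrative sketch (not the Harp API) of a BSP-style allreduce:
// each "map task" adds its partial result, all tasks hit the superstep
// barrier, then every task sees the same global sum.
public class AllreduceSketch {
    static final int TASKS = 4;
    static final DoubleAdder globalSum = new DoubleAdder();
    static final CyclicBarrier barrier = new CyclicBarrier(TASKS);

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(TASKS);
        for (int t = 0; t < TASKS; t++) {
            final int task = t;
            pool.submit(() -> {
                double partial = task + 1.0;     // this task's local partial result
                globalSum.add(partial);          // "reduce" step: combine partials
                barrier.await();                 // BSP superstep boundary
                // "broadcast" step: every task reads the same reduced value
                System.out.println("task " + task + " sees sum = " + globalSum.sum());
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

In Harp the same pattern runs across map tasks on different nodes, with the collectives implemented over the network rather than shared memory.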
Architecture
• Application layer: MapReduce Applications and Map-Collective Applications
• Framework layer: Harp plug-in alongside MapReduce V2
• Resource Manager: YARN
Performance on Madrid Cluster (8 nodes)
[Figure: K-Means Clustering, Harp vs. Hadoop on Madrid. Execution time (s) vs. problem size (100m points/500 centers, 10m/5k, 1m/50k) for 24, 48, and 96 cores; communication increases across the problem sizes while the computation is identical. Note the compute is the same in each case, as the product of centers times points is identical.]
• Mahout and Hadoop MR: slow due to MapReduce
• Python: slow as scripting
• Spark: Iterative MapReduce, but non-optimal communication
• Harp: Hadoop plug-in with ~MPI collectives
• MPI: fastest, as C not Java
Performance of MPI Kernel Operations
[Figure panels: performance of MPI send and receive operations and of the MPI allreduce operation, on Infiniband and Ethernet. Average time (us) vs. message size (0B to 4MB) for MPI.NET C# on Tempest, FastMPJ Java on FG, OMPI-nightly Java on FG, OMPI-trunk Java on FG, OMPI-trunk C on FG, and OMPI-trunk C and Java on Madrid.]
Pure Java, as in FastMPJ, is slower than Java interfacing to the C version of MPI.
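For reference, a minimal sketch of the "Java interfacing to C" path measured above, assuming the Open MPI Java bindings (mpi.jar); the exact method names here are an assumption and vary across binding versions.

```java
import mpi.MPI;

// Sketch of an MPI allreduce from Java via the Open MPI Java bindings,
// which call into the C MPI library through JNI (the "Java interfacing
// to C" case above). Binding method names are an assumption and may
// differ in other versions or in pure-Java MPI implementations.
public class JavaAllreduceSketch {
    public static void main(String[] args) throws Exception {
        MPI.Init(args);
        int rank = MPI.COMM_WORLD.getRank();

        double[] buf = new double[1024];                 // 1024 doubles = 8KB message
        java.util.Arrays.fill(buf, rank);

        long start = System.nanoTime();
        // In-place allreduce: every rank ends up with the element-wise sum
        MPI.COMM_WORLD.allReduce(buf, buf.length, MPI.DOUBLE, MPI.SUM);
        long micros = (System.nanoTime() - start) / 1000;

        if (rank == 0) {
            System.out.println("allreduce of " + buf.length + " doubles took " + micros + " us");
        }
        MPI.Finalize();
    }
}
```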
Use case 28: Truthy: Information diffusion research from Twitter Data
• Building blocks:
– Yarn
– Parallel query evaluation using Hadoop MapReduce
– Related hashtag mining algorithm using Hadoop MapReduce
– Meme daily frequency generation using MapReduce over index tables (a hypothetical sketch of this style of job follows below)
– Parallel force-directed graph layout algorithm using Twister (Harp) iterative MapReduce
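As a sketch of the MapReduce building blocks above, here is a hypothetical Hadoop job for daily meme frequency counting. It is not the actual Truthy/IndexedHBase code; it assumes each input line is a tab-separated date and meme string.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical word-count-style job for daily meme frequency (not the actual
// Truthy/IndexedHBase code). Assumes each input line is "YYYY-MM-DD<TAB>meme".
public class MemeDailyFrequency {

    public static class MemeMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text dayMeme = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\t");
            if (parts.length == 2) {
                dayMeme.set(parts[0] + "\t" + parts[1]);    // key = (day, meme)
                ctx.write(dayMeme, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) sum += c.get();
            ctx.write(key, new IntWritable(sum));           // daily frequency of the meme
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "meme daily frequency");
        job.setJarByClass(MemeDailyFrequency.class);
        job.setMapperClass(MemeMapper.class);
        job.setCombinerClass(SumReducer.class);             // merge partial counts map-side
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Reusing the reducer as a combiner merges partial counts on the map side before the shuffle, which is the main communication saving in this style of job.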
Use case 28: Truthy: Information diffusion research from Twitter Data
[Figures: two months' data loading time for varied cluster size (Hadoop-FS, not indexed); scalability of the iterative graph layout algorithm on Twister (Hadoop vs. Harp-Hadoop); Pig performance with Pig + HD1 (Hadoop) and Pig + Yarn.]
Different Kmeans Implementations
[Figure: total execution time (s) vs. number of mappers (24, 48, 96) for Hadoop, Harp, Pig HD1, and Pig Yarn at problem sizes 100m points/500 centers, 10m/5000, and 1m/50000.]
Lines of Code
                                    Java   Pig   Python/Bash   Total Lines
Pig Kmeans                          ~345   10    ~40           395
Hadoop Kmeans                       780    0     0             780
Pig IndexedHBase meme-cooccurcount  152    10    0             162
IndexedHBase meme-cooccurcount      ~434   0     28            462
DACIDR for Gene Analysis (Use Case 19,20)
• Deterministic Annealing Clustering and Interpolative
Dimension Reduction Method (DACIDR)
• Use Hadoop for pleasingly parallel applications, and Twister (being replaced by Yarn) for iterative MapReduce applications
• Sequences → Cluster Centers
• Add Existing data and find Phylogenetic Tree
[Figure: Simplified Flow Chart of DACIDR, with stages including All-Pair Sequence Alignment, Streaming, Pairwise Clustering, Multidimensional Scaling, and Visualization.]
Summarize a million Fungi Sequences: Spherical Phylogram Visualization
[Figures: Spherical Phylogram from the new MDS method visualized in PlotViz; RAxML result visualized in FigTree.]
Lessons / Insights
• Integrate (don’t compete) HPC with “Commodity Big
data” (Google to Amazon to Enterprise data Analytics)
– i.e. improve Mahout; don’t compete with it
– Use Hadoop plug-ins rather than replacing Hadoop
– Enhanced Apache Big Data Stack HPC-ABDS has 120
members – please improve!
• HPC-ABDS+ Integration areas include
– file systems,
– cluster resource management,
– file and object data management,
– inter process and thread communication,
– analytics libraries,
– workflow,
– monitoring