Applying Twister for Scientific Applications NSF Cloud PI Workshop March 17, 2011 Judy Qiu http://salsahpc.indiana.edu School of Informatics and Computing Indiana University SALSA Twister v0.9 March 15, 2011 New Infrastructure for Iterative MapReduce Programming SALSA Group • • • • • • Auto generation of partition files and configureMaps Auto configuration to setup Twister environment automatically on a cluster Concurrent file loading in Mapper configuration phase and file loading balancing Performance improvement (e.g. JVM Tuning) Scalability Bingjing Zhang, Yang Ruan, Tak-Lon Wu, Judy Qiu, Adam Hughes, Geoffrey Fox, Applying Twister to Scientific Applications, Proceedings of IEEE CloudCom 2010 Conference, Indianapolis, November 30-December 3, 2010 K-Means Clustering map map reduce Compute the distance to each data point from each cluster center and assign points to cluster centers Time for 20 iterations Compute new cluster centers User program Compute new cluster centers • Iteratively refining operation • Typical MapReduce runtimes incur extremely high overheads – New maps/reducers/vertices in every iteration – File system based communication • Long running tasks and faster communication in Twister enables it to perform close to MPI Motivation Data Deluge Experiencing in many domains MapReduce Classic Parallel Runtimes (MPI) Data Centered, QoS Efficient and Proven techniques Expand the Applicability of MapReduce to more classes of Applications Iterative MapReduce Map-Only Input map Output MapReduce More Extensions iterations Input map Input map reduce Pij reduce SALSA A Programming Model for Iterative MapReduce • Distributed data access • In-memory MapReduce • Distinction on static data and variable data (data flow vs. δ flow) • Cacheable map/reduce tasks (long running tasks) • Combine operation • Support fast intermediate data transfers Static data Configure() Iterate User Program Map(Key, Value) δ flow Reduce (Key, List<Value>) Combine (Map<Key,Value>) Close() SALSA Cloud Technologies and Their Applications SaaS Applications/ Workflow Data Mining Services in the Cloud Smith Waterman Dissimilarities, PhyloD Using DryadLINQ, Clustering, Multidimensional Scaling, Generative Topological Mapping, etc Higher Level Languages Apache PigLatin/Microsoft DryadLINQ/Google Sawzall Cloud Platform Cloud Infrastructure Apache Hadoop / Twister Microsoft Dryad / Twister Nimbus, Eucalyptus, OpenStack, OpenNebula Linux Virtual Machines Linux Virtual Machines Windows Virtual Machines Hypervisor/ Virtualization Xen, KVM Hardware Bare-metal Nodes Windows Virtual Machines MPI & Iterative MapReduce papers • • • • • • • MapReduce on MPI Torsten Hoefler, Andrew Lumsdaine and Jack Dongarra, Towards Efficient MapReduce Using MPI, Recent Advances in Parallel Virtual Machine and Message Passing Interface Lecture Notes in Computer Science, 2009, Volume 5759/2009, 240-249 MPI with generalized MapReduce Jaliya Ekanayake, Hui Li, Bingjing Zhang, Thilina Gunarathne, Seung-Hee Bae, Judy Qiu, Geoffrey Fox Twister: A Runtime for Iterative MapReduce, Proceedings of the First International Workshop on MapReduce and its Applications of ACM HPDC 2010 conference, Chicago, Illinois, June 20-25, 2010 Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski Pregel: A System for Large-Scale Graph Processing, Proceedings of the 2010 international conference on Management of data Indianapolis, Indiana, USA Pages: 135-146 2010 Yingyi Bu, Bill Howe, Magdalena Balazinska, Michael D. Ernst HaLoop: Efficient Iterative Data Processing on Large Clusters, Proceedings of the VLDB Endowment, Vol. 3, No. 1, The 36th International Conference on Very Large Data Bases, September 1317, 2010, Singapore. Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica Spark: Cluster Computing with Working Sets poster at http://radlab.cs.berkeley.edu/w/upload/9/9c/Spark-retreat-poster-s10.pdf Russel Power, Jinyang Li, Piccolo: Building Fast, Distributed Programs with Partitioned Tables, OSDI 2010, Vancouver, BC, Canada SALSA Features of Existing Architectures Google, Apache Hadoop, Sector/Sphere, Dryad/DryadLINQ (DAG based) • Programming Model (SPMD) – MapReduce (Optionally “map-only”) – Focus on Single Step MapReduce computations (DryadLINQ supports more than one stage) • Input and Output Handling – Distributed data access (HDFS in Hadoop, Sector in Sphere, and shared directories in Dryad) – Outputs normally goes to the distributed file systems • Intermediate data – Transferred via file systems (Local disk-> HTTP -> local disk in Hadoop) – Easy to support fault tolerance – Considerably high latencies • Fault Tolerance SALSA Twister Architecture Main program Broker Network Twister Driver B B B 4 Broker Connection Receive static data (1) OR Variable data (key,value) via the brokers (2) 2 1 M Read static data from local disk Twister Daemon Twister Daemon 1 Local 3 Map output goes directly to reducer R 4 Reduce output goes to local disk OR to Combiner Local • Scripts for file manipulations • Twister daemon is a process, but Map/Reduce tasks are Java Threads (Hybrid approach) SALSA Twister Programming Model Worker Nodes configureMaps(..) Local Disk configureReduce(..) Cacheable map/reduce tasks while(condition){ runMapReduce(..) May send <Key,Value> pairs directly Iterations Map() Reduce() Combine() operation updateCondition() } //end while close() User program’s process space Communications/data transfers via the pub-sub broker network Two configuration options : 1. Using local disks (only for maps) 2. Using pub-sub bus SALSA Twister API 1.configureMaps(PartitionFile partitionFile) 2.configureMaps(Value[] values) 3.configureReduce(Value[] values) 4.runMapReduce() 5.runMapReduce(KeyValue[] keyValues) 6.runMapReduceBCast(Value value) 7.map(MapOutputCollector collector, Key key, Value val) 8.reduce(ReduceOutputCollector collector, Key key,List<Value> values) 9.combine(Map<Key, Value> keyValues) SALSA Pagerank – An Iterative MapReduce Algorithm Partial Adjacency Matrix Current Page ranks (Compressed) M Partial Updates R Iterations C Partially merged Updates • Well-known pagerank algorithm [1] • Used ClueWeb09 [2] (1TB in size) from CMU • Reuse of map tasks and faster communication pays off [1] Pagerank Algorithm, http://en.wikipedia.org/wiki/PageRank [2] ClueWeb09 Data Set, http://boston.lti.cs.cmu.edu/Data/clueweb09/ SALSA Overhead OpenMPI v Twister negative overhead due to cache http://futuregrid.org 13 Dimension Reduction Algorithms • Multidimensional Scaling (MDS) [1] • Generative Topographic Mapping (GTM) [2] o Given the proximity information among points. o Optimization problem to find mapping in target dimension of the given data based on pairwise proximity information while minimize the objective function. o Objective functions: STRESS (1) or SSTRESS (2) o Find optimal K-representations for the given data (in 3D), known as K-cluster problem (NP-hard) o Original algorithm use EM method for optimization o Deterministic Annealing algorithm can be used for finding a global solution o Objective functions is to maximize loglikelihood: o Only needs pairwise distances ij between original points (typically not Euclidean) o dij(X) is Euclidean distance between mapped (3D) points [1] I. Borg and P. J. Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer, New York, NY, U.S.A., 2005. [2] C. Bishop, M. Svens´en, and C. Williams. GTM: The generative topographic mapping. Neural computation, 10(1):215–234, 1998. SALSA Multi-dimensional Scaling (EM) While(condition) { <X> = [A] [B] <C> C = CalcStress(<X>) } While(condition) { <T> = MapReduce1([B],<C>) <X> = MapReduce2([A],<T>) C = MapReduce3(<X>) } • Maps high dimensional data to lower dimensions (typically 2D or 3D) • SMACOF (Scaling by Majorizing of COmplicated Function)[1] [1] J. de Leeuw, "Applications of convex analysis to multidimensional scaling," Recent Developments in Statistics, pp. 133-145, 1977. SALSA Next Generation Sequencing Pipeline on Cloud MapReduce Pairwise clustering FASTA File N Sequences Blast block Pairings Pairwise Distance Calculation Dissimilarity Matrix N(N-1)/2 values 1 2 Clustering MPI 3MDS Visualization Visualization Plotviz Plotviz 4 5 4 • Users submit their jobs to the pipeline and the results will be shown in a visualization tool. • This chart illustrate a hybrid model with MapReduce and MPI. Twister will be an unified solution for the pipeline mode. • The components are services and so is the whole pipeline. • We could research on which stages of pipeline services are suitable for private or commercial Clouds. 16 Scale-up Sequence Clustering Model with Twister Gene Sequences (N = 1 Million) O(N2) Select Reference Pairwise Alignment & Distance Calculation Reference Sequence Set (M = 100K) Distance Matrix Reference Coordinates N-M Sequence Set (900K) Interpolative MDS with Pairwise Distance Calculation x, y, z MultiDimensional Scaling (MDS) O(N2) N-M Coordinates x, y, z Visualization O(N2) 3D Plot SALSA Current Sequence Clustering Model with MPI * MPI.NET Implementation Smith-Waterman / Needleman-Wunsch with Kimura2 / Jukes-Cantor / Percent-Identity C# Desktop Application based on VTK Pairwise Clustering Cluster Indices Gene Sequences Pairwise Alignment & Distance Calculation Visualization Distance Matrix 3D Plot Coordinates MultiDimensional Scaling Chi-Square / Deterministic Annealing MPI.NET Implementation MPI.NET Implementation * Note. The implementations of Smith-Waterman and Needleman-Wunsch algorithms are from Microsoft Biology Foundation library SALSA Twister MDS Interpolation Performance Test SALSA 300+ Students learning about Twister & Hadoop MapReduce technologies, supported by FutureGrid. July 26-30, 2010 NCSA Summer School Workshop http://salsahpc.indiana.edu/tutorial Washington University University of Minnesota Iowa IBM Almaden Research Center University of California at Los Angeles San Diego Supercomputer Center Michigan State Univ.Illinois at Chicago Notre Dame Johns Hopkins Penn State Indiana University University of Texas at El Paso University of Arkansas University of Florida SALSA http://salsahpc.indiana.edu/b534/ 22 SALSA 23 SALSA Summary MapReduce and MPI are SPMD programming model Twister extends the MapReduce to iterative algorithms Dataming in the Cloud (Data Analysis in the Cloud) Several iterative algorithms we have implemented K-Means Clustering Pagerank Matrix Multiplication Breadth First Search Multi Dimensional Scaling (MDS) Integrating a distributed file system Integrating with a high performance messaging system Programming with side effects yet support fault tolerance MapReduceRoles4Azure Several iterative algorithms we have implemented Will have prototype Twister4Azure by May 2011 SALSA Twister for Azure Scheduling Queue Worker Role Job Bulleting Board MapID ……. Status Map Workers Map 1 Map 2 Map n Reduce Workers Red 1 Map Task Table MapID ……. Status Red 2 Red n In Memory Data Cache Task Monitoring Role Monitoring 26 SALSA Sequence Assembly Performance SALSA Acknowledgements to: SALSA HPC Group Indiana University http://salsahpc.indiana.edu SALSA