Map-Reduce and Parallel Computing for Large-Scale Media Processing
Youjie Zhou

Outline
► Motivations
► Map-Reduce Framework
► Large-scale Multimedia Processing Parallelization
► Machine Learning Algorithm Transformation
► Map-Reduce Drawbacks and Variants
► Conclusions

Motivations
► Why do we need parallelization?
  "Time is money"
  ► work on pieces simultaneously
  ► divide and conquer
  Data is too huge to handle
  ► 1 trillion (10^12) unique URLs on the web in 2008
  ► CPU speed is limited

Motivations
► Why do we need parallelization?
  Ever-increasing data
  ► social networks
  ► scalability!
  "Brute force"
  ► no approximations
  ► cheap clusters vs. expensive computers

Motivations
► Why do we choose Map-Reduce?
  Popular
  ► a parallelization framework proposed by Google, and Google uses it every day
  ► Yahoo and Amazon are also involved
  Is popular good?
  ► it "hides" parallelization details from users
  ► it provides high-level operations that suit the majority of algorithms
  A good starting point for deeper research on parallelization

Map-Reduce Framework
► A simple idea inspired by functional languages (like LISP)
  map
  ► a type of iteration in which a function is successively applied to each element of a sequence
  reduce
  ► a function that combines all the elements of a sequence using a binary operation

Map-Reduce Framework
► Data representation: <key,value>
  map generates <key,value> pairs
  reduce combines the <key,value> pairs that share the same key
► A "Hello, world!" example

Map-Reduce Framework
[Diagram: the input data is divided into splits; a map task processes each split; reduce tasks combine the map outputs into the final output]

Map-Reduce Framework
► Count the appearances of each different word in a set of documents

  void map(Document)
      for each word in Document
          generate <word, 1>

  void reduce(word, CountList)
      int count = 0
      for each number in CountList
          count += number
      generate <word, count>

Map-Reduce Framework
► Different implementations
  Distributed computing
  ► each computer acts as a computing node
  ► focuses on reliability over distributed computer networks
  ► Google's clusters: closed source; GFS, a distributed file system
  ► Hadoop: open source; HDFS, the Hadoop distributed file system

Map-Reduce Framework
► Different implementations
  Multi-core computing
  ► each core acts as a computing node
  ► focuses on high-speed computing using large shared memories
  ► Phoenix++: a two-dimensional <key,value> table stored in memory, where map and reduce read and write pairs; open source, created at Stanford
  ► GPU: roughly 10x higher memory bandwidth than a CPU; 5x to 32x speedups on SVM training

Large-scale Multimedia Processing Parallelization
► Clustering
  k-means
  Spectral Clustering
► Classifier training
  SVM
► Feature extraction and indexing
  Bag-of-Features
  Text inverted indexing

Clustering
► k-means
  Basic and fundamental
  Original algorithm
  1. Pick k initial center points
  2. Iterate until convergence
     1. Assign each point to the nearest center
     2. Calculate the new centers
  Easy to parallelize!

Clustering
► k-means (a toy sketch follows this slide)
  A shared file contains the center points
  map
  1. for each point, find the nearest center
  2. generate a <key,value> pair (key: the center id; value: the current point's coordinates)
  reduce
  1. collect all points belonging to the same cluster (they share the same key)
  2. calculate their average as the new center
  Iterate
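To make the k-means slide concrete, here is a minimal sketch of the map and reduce steps in plain Python, with an in-memory dictionary standing in for the framework's shuffle phase; the function names and the toy data are illustrative and not tied to any particular Map-Reduce implementation.

  import math
  from collections import defaultdict

  def kmeans_map(point, centers):
      """Map step: emit <center_id, point> for the center nearest to this point."""
      distances = [math.dist(point, c) for c in centers]
      return distances.index(min(distances)), point

  def kmeans_reduce(center_id, points):
      """Reduce step: average the points assigned to one center to get the new center."""
      dim = len(points[0])
      return center_id, tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

  def kmeans_iteration(points, centers):
      """One Map-Reduce round: map every point, group by key (shuffle), reduce each group."""
      groups = defaultdict(list)
      for p in points:
          key, value = kmeans_map(p, centers)
          groups[key].append(value)            # stands in for the framework's shuffle/sort
      return [kmeans_reduce(k, pts)[1] for k, pts in sorted(groups.items())]

  if __name__ == "__main__":
      points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)]
      centers = [(1.0, 1.0), (4.0, 4.0)]       # initial centers (the "shared file" on the slide)
      for _ in range(10):                      # a fixed number of rounds for this toy example
          centers = kmeans_iteration(points, centers)
      print(centers)                           # roughly [(0.05, 0.1), (5.05, 4.95)]

In a real deployment the centers would live in the shared file mentioned on the slide, each map task would see only one split of the points, and the loop would stop when the centers converge.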
Clustering
► Spectral Clustering
  Normalized graph Laplacian: L = I - D^(-1/2) S D^(-1/2)
  S is huge: for 10^6 points, a dense S stored as doubles needs 8 TB
  Sparsify it!
  ► retain only S_ij where j is among the t nearest neighbors of i
  ► Locality Sensitive Hashing? it is an approximation
  ► we can calculate the t nearest neighbors directly, in parallel

Clustering
► Spectral Clustering
  Calculating the distance matrix
  ► map creates <key,value> pairs so that every n/p points share the same key (p is the number of nodes in the computer cluster)
  ► reduce collects the points with the same key, so the data is split into p parts and each part is stored on one node
  ► each node then finds, for every point in the whole data set, the t nearest neighbors among its local points

Clustering
► Spectral Clustering
  Symmetry
  ► x_j being in the t-nearest-neighbor set of x_i does not imply that x_i is in the t-nearest-neighbor set of x_j
  ► map: for each nonzero element, generate two <key,value> pairs
    first: the key is the row ID; the value is the column ID and the distance
    second: the key is the column ID; the value is the row ID and the distance
  ► reduce: use the key as the row ID and fill in the columns specified by the column IDs in the values

Classification
► SVM
  Constraint: y_i (w^T x_i + b) >= 1, which makes the margin 2 / ||w||
  Primal: argmin_w Φ(w) = (1/2) w^T w
  Dual: argmax_α W(α) = Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j K(x_i, x_j)

Classification
► SVM: SMO
  Instead of solving for all α together: coordinate ascent
  ► pick one α_i, fix the others
  ► optimize α_i

Classification
► SVM: SMO
  But for the SVM we cannot optimize only one α: fixing all the others determines it, since Σ_i α_i y_i = 0 implies α_1 = -y_1 Σ_{i=2..n} α_i y_i
  We need to optimize two α's in each iteration
  argmax_α W(α) = Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j K(x_i, x_j)
  subject to 0 <= α_i <= C and Σ_i α_i y_i = 0

Classification
► SVM
  Repeat until convergence:
  ► map: given the two chosen α's, update the optimization information
  ► reduce: find the two maximally violating α's for the next step

Feature Extraction and Indexing
► Bag-of-Features
  Pipeline: features → feature clusters → histogram
  Feature extraction
  ► map takes images in and outputs features directly
  Feature clustering
  ► clustering algorithms, like k-means

Feature Extraction and Indexing
► Bag-of-Features
  Feature quantization and histogram
  ► map: for each feature of one image, find the nearest feature cluster and generate <imageID, clusterID>
  ► reduce: receives <imageID, cluster0, cluster1, ...>; for each feature cluster, update the histogram; generate <imageID, histogram>

Feature Extraction and Indexing
► Text Inverted Indexing (a toy sketch follows this slide)
  The inverted index of a term
  ► a list of the documents containing the term
  ► each item in the document list stores statistical information: frequency, positions, field information
  map
  ► for each term in one document, generate <term, docID>
  reduce
  ► receives <term, doc0, doc1, doc2, ...>
  ► for each document, update the statistical information for that term
  ► generate <term, list>
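To make the indexing slide concrete, here is a minimal sketch of the inverted-index map and reduce steps in plain Python, again simulating the shuffle with an in-memory dictionary; the document IDs, the whitespace tokenization, and the per-document statistics (frequency and positions) are illustrative choices, not part of any specific system.

  from collections import defaultdict

  def index_map(doc_id, text):
      """Map step: for every term occurrence in one document, emit <term, (doc_id, position)>."""
      for position, term in enumerate(text.lower().split()):
          yield term, (doc_id, position)

  def index_reduce(term, postings):
      """Reduce step: build the term's document list with per-document statistics."""
      by_doc = defaultdict(list)
      for doc_id, position in postings:
          by_doc[doc_id].append(position)
      # each entry keeps the frequency and the positions, as described on the slide
      return term, {doc: {"freq": len(pos), "positions": pos} for doc, pos in by_doc.items()}

  if __name__ == "__main__":
      docs = {"d0": "map reduce map", "d1": "reduce combines pairs"}
      groups = defaultdict(list)                   # stands in for the shuffle phase
      for doc_id, text in docs.items():
          for term, posting in index_map(doc_id, text):
              groups[term].append(posting)
      index = dict(index_reduce(t, p) for t, p in groups.items())
      print(index["map"])       # {'d0': {'freq': 2, 'positions': [0, 2]}}
      print(index["reduce"])    # {'d0': {'freq': 1, 'positions': [1]}, 'd1': {'freq': 1, 'positions': [0]}}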
Machine Learning Algorithm Transformation
► How can we know whether an algorithm can be transformed into Map-Reduce fashion? And if so, how do we do it?

Machine Learning Algorithm Transformation
► Statistical Query and Summation Form
  All we want is to estimate or infer something
  ► cluster ids, labels, ...
  from sufficient statistics
  ► distances between points
  ► point positions
  The statistic computation can be divided across the data

Machine Learning Algorithm Transformation
► Linear Regression (a toy sketch follows the references at the end of the deck)
  argmin_b Error(b) = (Y - Xb)^T (Y - Xb), with solution b = (X^T X)^(-1) X^T Y
  Summation form: X^T X = Σ_i x_i x_i^T and X^T Y = Σ_i x_i y_i
  ► map computes the partial sums over its data split
  ► reduce adds the partial sums and solves for b

Machine Learning Algorithm Transformation
► Naïve Bayes
  v_MAP = argmax_{v_j ∈ V} P(v_j | a_1 ... a_n)
        = argmax_{v_j ∈ V} P(a_1 ... a_n | v_j) P(v_j) / P(a_1 ... a_n)
        = argmax_{v_j ∈ V} P(a_1 ... a_n | v_j) P(v_j)
  v_NB = argmax_{v_j ∈ V} P(v_j) Π_i P(a_i | v_j)
  ► map counts the occurrences needed to estimate P(v_j) and P(a_i | v_j) on its split; reduce gathers the counts

Machine Learning Algorithm Transformation
► Solution
  Find the statistics-calculation part
  Distribute the calculation over the data using map
  Gather and refine all the statistics in reduce

Map-Reduce Systems Drawbacks
► Batch-based system
  "Pull" model
  ► reduce must wait for unfinished maps
  ► reduce "pulls" data from map
  No direct support for iteration
► Focuses too much on distributed systems and failure tolerance
  a local computing cluster may not need them

Map-Reduce Variants
► Map-Reduce Online
  "Push" model
  ► map "pushes" data to reduce
  ► reduce can also "push" its results to the maps of the next job, building a pipeline
► Iterative Map-Reduce
  higher-level schedulers schedule the whole iteration process

Map-Reduce Variants
► A series of Map-Reduce computations? Coordinated by Map-Reduce itself? MPI? Condor?
  [Diagram: several multi-core Map-Reduce instances combined into a larger system]

Conclusions
► A good parallelization framework
  Schedules jobs automatically
  Failure tolerance
  Distributed computing support
  High-level abstraction
  ► easy to port algorithms onto it
► Too "industry"
  why do we need a large distributed system?
  why do we need so much data safety?

References
[1] Map-Reduce for Machine Learning on Multicore
[2] A Map Reduce Framework for Programming Graphics Processors
[3] MapReduce: Distributed Computing for Machine Learning
[4] Evaluating MapReduce for Multi-core and Multiprocessor Systems
[5] Phoenix Rebirth: Scalable MapReduce on a Large-Scale Shared-Memory System
[6] Phoenix++: Modular MapReduce for Shared-Memory Systems
[7] Web-Scale Computer Vision Using MapReduce for Multimedia Data Mining
[8] MapReduce Indexing Strategies: Studying Scalability and Efficiency
[9] Batch Text Similarity Search with MapReduce
[10] Twister: A Runtime for Iterative MapReduce
[11] MapReduce Online
[12] Fast Training of Support Vector Machines Using Sequential Minimal Optimization
[13] Social Content Matching in MapReduce
[14] Large-Scale Multimedia Semantic Concept Modeling Using Robust Subspace Bagging and MapReduce
[15] Parallel Spectral Clustering in Distributed Systems

Thanks!
Q&A
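For the linear-regression slide earlier in the deck, here is a toy sketch of the summation form in Python with NumPy: each "map" call computes the partial sums Σ x_i x_i^T and Σ x_i y_i over one split, and the "reduce" step adds the partials and solves the normal equations. The split count, the data, and the function names are illustrative assumptions, not part of the original material.

  import numpy as np

  def lr_map(x_split, y_split):
      """Map step: partial sums sum(x_i x_i^T) and sum(x_i y_i) on one data split."""
      return x_split.T @ x_split, x_split.T @ y_split

  def lr_reduce(partials):
      """Reduce step: add the partial sums from all splits and solve the normal equations."""
      xtx = sum(p[0] for p in partials)
      xty = sum(p[1] for p in partials)
      return np.linalg.solve(xtx, xty)              # b = (X^T X)^(-1) X^T Y

  if __name__ == "__main__":
      rng = np.random.default_rng(0)
      X = rng.normal(size=(1000, 3))
      true_b = np.array([1.0, -2.0, 0.5])
      y = X @ true_b + rng.normal(scale=0.01, size=1000)
      splits = np.array_split(np.arange(len(X)), 4)  # pretend each split lives on its own node
      partials = [lr_map(X[idx], y[idx]) for idx in splits]
      print(lr_reduce(partials))                     # approximately [ 1.  -2.   0.5]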