SINGA: Putting Deep Learning into the Hands of Multimedia Users
http://singa.apache.org/
Wei Wang, Gang Chen, Tien Tuan Anh Dinh, Jinyang Gao, Beng Chin Ooi, Kian-Lee Tan, and Sheng Wang

Outline
• Introduction
  • Multimedia data and applications
• Motivations
  • Deep learning models and training, and design principles
• SINGA
  • Usability
  • Scalability
  • Implementation
• Experiments

Introduction
Deep learning has been noted for its effectiveness for multimedia applications. A wave of deep-learning startups targeting multimedia data has already been acquired or deployed across domains such as social media, e-commerce and health-care:
• Audio: VocalIQ (acquired by Apple)
• Image/video: Madbits (acquired by Twitter), Perceptio (acquired by Apple), LookFlow (acquired by Yahoo! Flickr), Deepomatic (e-commerce product search), Descartes Labs (satellite images), Clarifai (tagging)
• Text: AlchemyAPI (acquired by IBM), Semantria (NLP tasks in >10 languages), Idibon, ParallelDots

Motivations: Model Categories
• Feedforward models (CNN, MLP, auto-encoder): image/video classification (Krizhevsky, Sutskever, and Hinton, 2012; Szegedy et al., 2014; Simonyan and Zisserman, 2014a)
• Energy models (DBN, RBM, DBM): speech recognition (Dahl et al., 2012)
• Recurrent neural networks (RNN, LSTM, GRU): natural language processing (Mikolov et al., 2010; Cho et al., 2014)
Design Goal I (Usability): easy to implement various models.

Motivations: Training Process
• The training process updates the model parameters to minimize the prediction error.
• Training algorithm: mini-batch Stochastic Gradient Descent (SGD), with gradients computed by Back-Propagation (BP) or Contrastive Divergence (CD); a minimal sketch of the SGD loop follows below.
• Training time = (time per SGD iteration) x (number of SGD iterations).
• Training large models over large datasets takes a long time, e.g., 2 weeks for training OverFeat (Sermanet et al.), as reported by Intel (https://software.intel.com/sites/default/files/managed/74/15/SPCS008.pdf).
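The sketch below illustrates the mini-batch SGD loop and the training-time decomposition described above. It is a generic illustration, not SINGA code; the Batch and Model types and the function names are hypothetical.

```cpp
#include <vector>

struct Batch { /* one mini-batch of training examples and labels */ };

class Model {
 public:
  // Forward + backward pass (e.g., BP or CD); fills parameter gradients, returns the loss.
  virtual float ForwardBackward(const Batch& batch) = 0;
  // Apply one SGD step: p -= learning_rate * gradient(p) for every parameter p.
  virtual void UpdateParams(float learning_rate) = 0;
  virtual ~Model() = default;
};

// Mini-batch SGD. Total training time =
//   (time per SGD iteration) x (number of SGD iterations).
void TrainSGD(Model& model, const std::vector<Batch>& batches,
              int num_epochs, float learning_rate) {
  for (int epoch = 0; epoch < num_epochs; ++epoch) {
    for (const Batch& batch : batches) {   // one SGD iteration per mini-batch
      model.ForwardBackward(batch);        // compute loss and parameter gradients
      model.UpdateParams(learning_rate);   // take a gradient step
    }
  }
}
```

Distributed training, discussed next, attacks one of the two factors: synchronous frameworks shorten the time per iteration, while asynchronous frameworks reduce the number of iterations each machine has to run.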
Motivations: Distributed Training Frameworks
• Synchronous training (Google Sandblaster, Dean et al., 2012; Baidu AllReduce, Wu et al., 2015)
  • Reduces the time per iteration
  • Scalable for a single node with multiple GPUs
  • Cannot scale to a large cluster
• Asynchronous training (Google Downpour, Dean et al., 2012; Hogwild!, Recht et al., 2011)
  • Reduces the number of iterations per machine
  • Scalable for a big cluster of commodity (CPU) machines
  • Not stable
• Hybrid frameworks combine the two.
Design Goal II (Scalability): not just flexible, but also efficient and adaptive to run different training frameworks.

SINGA: A Distributed Deep Learning Platform

Usability: Abstraction
The main abstractions are NeuralNet, Layer and TrainOneBatch; training repeatedly calls TrainOneBatch over the NeuralNet until the stop condition is met.

    class Layer {
      vector<Blob> data, grad;
      vector<Param*> param;
      ...
      void Setup(LayerProto& conf, vector<Layer*> src);
      void ComputeFeature(int flag, vector<Layer*> src);
      void ComputeGradient(int flag, vector<Layer*> src);
    };
    Driver::RegisterLayer<FooLayer>("Foo"); // register new layers

Layer categories:
• Input layers: load raw data (and labels)
• Output layers: output features (and prediction results)
• Neuron layers: transform features, e.g., convolution and pooling
• Loss layers: measure the training loss, e.g., cross-entropy loss
• Connection layers: connect layers when the neural net is partitioned

Usability: Neural Net Representation
[Figure: a NeuralNet connects input, hidden and loss layers (with labels) and represents feedforward models (e.g., CNN), RNN and RBM in one abstraction.]

Usability: TrainOneBatch
TrainOneBatch runs one training iteration over the NeuralNet, e.g., Back-Propagation (BP) for feedforward models (e.g., CNN) and RNNs, or Contrastive Divergence (CD) for RBMs. Users just need to override the TrainOneBatch function to implement other algorithms.

Scalability: Partitioning for Distributed Training
NeuralNet partitioning:
1. Partition layers into different subsets.
2. Partition each single layer on the batch dimension.
3. Partition each single layer on the feature dimension.
4. Hybrid partitioning strategy of 1, 2 and 3.
Users just need to CONFIGURE the partitioning scheme; SINGA takes care of the real work (e.g., slicing and connecting layers) across workers.

Scalability: Training Framework
[Figure: cluster topologies of workers and servers. Legend: worker, server, node, group, neural net, inter-node communication.]
• Parameters are maintained by server groups; workers compute parameter gradients over the training data.
• Synchronous training cannot scale to a large group size.
• Inter-node communication is the bottleneck.
• By configuring the cluster topology, SINGA is able to configure most known frameworks (sync and async), as the sketch and figure below show.
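To make the topology idea concrete, here is a rough sketch of how group counts and server placement map onto the four frameworks shown in the figure below, following the rule that workers within a group train synchronously while different groups train asynchronously. The struct fields, the function and the exact classification rules are illustrative assumptions for this summary, not SINGA's actual configuration interface.

```cpp
#include <string>

// Illustrative topology knobs (hypothetical names, not SINGA's configuration fields).
struct ClusterTopology {
  int nworker_groups;            // groups run asynchronously with respect to each other
  int nworkers_per_group;        // workers inside one group run synchronously
  int nserver_groups;            // each server group keeps a replica of the parameters
  bool servers_on_worker_nodes;  // servers co-located with workers vs. dedicated nodes
};

// Rough mapping from a topology to the well-known frameworks in the figure below.
std::string DescribeFramework(const ClusterTopology& t) {
  if (t.nworker_groups == 1) {
    // One worker group: fully synchronous training.
    return t.servers_on_worker_nodes ? "AllReduce-style (servers share worker nodes)"
                                     : "Sandblaster-style (dedicated parameter servers)";
  }
  // Multiple worker groups: asynchronous training.
  return (t.nserver_groups == 1)
             ? "Downpour-style (one shared server group)"
             : "distributed Hogwild-style (per-node parameter replicas)";
}
```

A hybrid framework then falls out naturally: several synchronous groups, each shortening the time per iteration, trained asynchronously against shared or replicated servers.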
[Figure: (a) Sandblaster, (b) AllReduce, (c) Downpour, (d) Distributed Hogwild.]

Implementation: SINGA Software Stack
[Figure: software stack. Applications (CNN, RBM, RNN) call Driver::Train(); the main thread runs Stub::Run(); worker threads loop "while(not stop): Worker::TrainOneBatch()"; server threads loop "while(not stop): Server::Update()". Optional components: Zookeeper, Mesos, Docker, HDFS, DiskFile, remote nodes. Supported platforms: Ubuntu, CentOS, MacOS.]

Deep Learning as a Service (DLaaS)
SINGA's Rafiki exposes an API for third-party apps (web apps, mobile, ...) and a GUI for developers (through the browser). HTTP requests go to the Rafiki Server, which handles user, job, model and node management, routing (load balancing) and a database; Rafiki Agents on each node drive SINGA instances through Timon (a C++ wrapper) and use a file storage system (e.g., HDFS). Goals:
1. To improve the usability of SINGA.
2. To "level" the playing field by taking care of complex system plumbing work, its reliability, efficiency and scalability.

Comparison: Features of the Systems
Comparison with other open source projects (MXNet on 28/09/15), covering deep learning models, distributed training frameworks, hardware, cloud software and language bindings:

Feature                    | SINGA       | Caffe | CXXNET | cuda-convnet | H2O
Feed-forward (CNN)         | ✔           | ✔     | ✔      | ✔            | MLP
Energy model (RBM)         | ✔           | x     | x      | x            | x
Recurrent networks (RNN)   | ✔           | ✔     | x      | x            | x
Synchronous training       | ✔           | ✔     | ✔      | ✔            | ✔
Asynchronous training      | ✔           | ✔     | x      | x            | x
Hybrid training            | ✔           | x     | x      | x            | x
CPU                        | ✔           | ✔     | ✔      | x            | ✔
GPU                        | V0.2.0      | ✔     | ✔      | ✔            | x
HDFS                       | ✔           | x     | x      | x            | ✔
Resource management        | ✔           | x     | x      | x            | ✔
Virtualization             | ✔           | x     | x      | x            | ✔
Python (P), Matlab (M), R  | ongoing (P) | P+M   | P      | P            | P+R

Experiment --- Usability
Used SINGA to train three known models and verify the published results:
• RBM and deep auto-encoders: Hinton, G. E. and Salakhutdinov, R. R.: Reducing the dimensionality of data with neural networks. Science, Vol. 313, No. 5786, pp. 504-507, 28 July 2006.
• Deep multi-modal neural network (CNN + MLP):
  W. Wang, X. Yang, B. C. Ooi, D. Zhang, Y. Zhuang: Effective Deep Learning Based Multi-Modal Retrieval. VLDB Journal, special issue of VLDB'14 best papers, 2015.
  W. Wang, B. C. Ooi, X. Yang, D. Zhang, Y. Zhuang: Effective Multi-Modal Retrieval based on Stacked Auto-Encoders. Int'l Conference on Very Large Data Bases (VLDB), 2014.
• RNN language model: Mikolov Tomáš, Karafiát Martin, Burget Lukáš, Černocký Jan, Khudanpur Sanjeev: Recurrent neural network based language model. INTERSPEECH 2010, Makuhari, Chiba, JP.
[Figure: comparison of SINGA and the RNNLM toolkit; y-axis: PPL (perplexity) value from 0 to 160, x-axis: number of training epochs from 0 to 8.]
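Perplexity (PPL) here is the standard language-model metric: the exponential of the average per-word negative log-likelihood. The helper below is a generic illustration of that formula, not code taken from SINGA or the RNNLM toolkit.

```cpp
#include <cmath>
#include <vector>

// PPL = exp( -(1/N) * sum_i log p(w_i | context) ), where p(w_i | context) is the
// probability the language model assigns to the i-th test word. Lower is better.
double Perplexity(const std::vector<double>& word_probs) {
  double sum_log_prob = 0.0;
  for (double p : word_probs) sum_log_prob += std::log(p);
  return std::exp(-sum_log_prob / static_cast<double>(word_probs.size()));
}
```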
Experiment --- Efficiency and Scalability
Train a DCNN over CIFAR10 (https://code.google.com/p/cuda-convnet), synchronous training:
• Single node: 4 NUMA nodes (Intel Xeon 7540, 2.0 GHz), 6 cores per node with hyper-threading enabled, 500 GB memory; Caffe on a GTX 970 GPU for comparison.
• Cluster: 32 nodes, each with a quad-core Intel Xeon 3.1 GHz CPU and 8 GB memory, connected by a 1 Gbps switch; 4 workers per node.
[Figure: synchronous training performance on the single node and the cluster.]

Experiment --- Scalability
Train a DCNN over CIFAR10 (https://code.google.com/p/cuda-convnet), asynchronous training:
[Figure: asynchronous training performance of Caffe and SINGA on the single node and the cluster.]

Conclusions
• Programming model, abstraction and system architecture
  • Easy to implement different models
  • Flexible and efficient to run different frameworks
• Experiments
  • Trained models from different categories
  • Scalability tests for different training frameworks
• SINGA
  • Usable, extensible, efficient and scalable
  • Apache SINGA v0.1.0 has been released
  • V0.2.0 (with GPU-CPU, DLaaS and more features) out next month
  • Being used for healthcare analytics, product search, ...