How Microsoft Made Deep Learning Red-Hot in the IT Industry
Zhijie Yan, Microsoft Research Asia
USTC visit, May 6, 2014

Self Introduction
@MSRA鄢志杰 (Zhijie Yan)
– 996: studied at USTC from 1999 to 2008
– Graduate student: studied in the iFlytek speech lab from 2003 to 2008, supervised by Prof. Renhua Wang
– Intern: worked at MSR Asia from 2005 to 2006
– Visiting scholar: visited Georgia Tech in 2007
– FTE: has worked at MSR Asia since 2008
Research interests
– Speech, deep learning, large-scale machine learning

In Today's Talk
– Deep learning has become very hot in the past few years
– How Microsoft made deep learning hot in the IT industry
– Deep learning basics
– Why Microsoft can turn all these ideas into reality
– Further reading materials

How Hot is Deep Learning
"This announcement comes on the heels of a $600,000 gift Google awarded Professor Hinton's research group to support further work in the area of neural nets." – U. of T. website
(Several further "How Hot is Deep Learning" slides contain figures only.)

Microsoft Made Deep Learning Hot in the IT Industry
– Initial attempts by the University of Toronto showed promising results using DL for speech recognition on the TIMIT phone recognition task
– Prof. Hinton's student visited MSR as an intern; good results were obtained on the Microsoft Bing voice search task
– MSR Asia and Redmond collaborated and obtained amazing results on the Switchboard task, which shocked the whole industry
(Figure slide: *figure borrowed from MSR principal researcher Li DENG.)

Microsoft Made Deep Learning Hot in the IT Industry
– Followed by others, and the results were confirmed on various speech recognition tasks: Google / IBM / Apple / Nuance / Baidu (百度) / iFlytek (讯飞)
– Continuously advanced by MSR and others
– Expanding to solve more and more problems: image processing, natural language processing, search, …

Deep Learning From Speech to Image
– ILSVRC-2012 competition on ImageNet
– Classification task: classify an image into 1 of the 1,000 classes, scored on your 5 best guesses (top-5 error)
(Example images: lifeboat, airliner, school bus.)

Institution               Error rate (%)
University of Amsterdam   29.6
XRCE/INRIA                27.1
Oxford                    27.0
ISI                       26.2
SuperVision               16.4

Deep Learning Basics
– Deep learning = deep neural networks: a multi-layer perceptron (MLP) with a deep structure (many hidden layers)
(Figure: a shallow MLP with input layer, weights W0, one hidden layer, weights W1, output layer; next to a deep MLP with input layer, weights W0, three hidden layers, weights W1, W2, W3, output layer.)

Deep Learning Basics
– Sounds not new at all? Sounds familiar, like something you learned in class?
– Things that have not changed over the years: network topology / activation functions / …; backpropagation (BP)
– Things that changed recently: big data; general-purpose computing on graphics processing units (GPGPU); "a bag of tricks" accumulated over the years

E.g., Deep Neural Network for Speech Recognition
Three key components that make DNN-HMM work (a minimal code sketch follows below):
– Many layers of nonlinear feature transformation
– Tied triphones as the basic units for HMM states
– A long window of frames
*figure borrowed from MSR senior researcher Dong YU
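To make the "deep MLP trained with backpropagation" picture concrete, here is a minimal, self-contained numpy sketch: several sigmoid hidden layers over a long window of stacked frames, a softmax output standing in for tied-triphone states, and plain stochastic gradient descent. All layer sizes, the sigmoid activations, and the toy random data are illustrative assumptions, not values from the talk.

```python
# A toy sketch of a deep MLP acoustic model; NOT the MSR system.
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: 11 stacked frames of 40-dim features in,
# 3 sigmoid hidden layers of 512 units, 100 tied-state posteriors out.
layer_sizes = [11 * 40, 512, 512, 512, 100]
weights = [rng.normal(0.0, 0.1, (m, n))
           for m, n in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    """Forward pass; returns the activations of every layer."""
    acts = [x]
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = acts[-1] @ W + b
        if i < len(weights) - 1:
            acts.append(sigmoid(z))                       # hidden layer
        else:
            e = np.exp(z - z.max(axis=1, keepdims=True))  # output: softmax
            acts.append(e / e.sum(axis=1, keepdims=True))
    return acts

def backprop_step(x, labels, lr=0.1):
    """One backpropagation / SGD step with cross-entropy loss."""
    acts = forward(x)
    batch = len(x)
    delta = acts[-1].copy()
    delta[np.arange(batch), labels] -= 1.0   # dLoss/dz at the softmax output
    for i in reversed(range(len(weights))):
        grad_W = acts[i].T @ delta / batch
        grad_b = delta.mean(axis=0)
        if i > 0:                            # push delta through the sigmoid
            delta = (delta @ weights[i].T) * acts[i] * (1.0 - acts[i])
        weights[i] -= lr * grad_W
        biases[i] -= lr * grad_b

# Toy run on random "frames" and random state labels, just to show mechanics.
x = rng.normal(size=(32, layer_sizes[0]))
y = rng.integers(0, layer_sizes[-1], size=32)
for _ in range(10):
    backprop_step(x, y)
```

Production systems of the era added the "bag of tricks" mentioned above (pre-training, learning-rate schedules, GPU kernels, and so on); none of that is shown here.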
E.g., Deep Neural Network for Image Classification
– The ILSVRC-2012 winning solution
*figure copied from Krizhevsky, et al., "ImageNet Classification with Deep Convolutional Neural Networks"

Scale Out Deep Learning
– Training speed was a major problem for DL
– A speech recognition model trained on 1,800 hours of data (~650,000,000 vector frames) takes 2 weeks using 1 GPU
– An image classification model trained on ~1,000,000 images takes 1 week using 2 GPUs*
– How do we scale out if 10x or 100x training data becomes available?
*Krizhevsky, et al., "ImageNet Classification with Deep Convolutional Neural Networks"

DNN-GMM-HMM
Joint work with USTC-MSRA Ph.D. program student Jian XU (许健, 0510)
The "DNN-GMM-HMM" approach for speech recognition*:
– DNN as a hierarchical nonlinear feature extractor, trained on a subset of the training data
– GMM-HMM as the acoustic model, trained on the full data
*Z.-J. Yan, Q. Huo, and J. Xu, "A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR"

DNN-GMM-HMM
GMM-HMM modeling of DNN-derived features: combine the best of both worlds (a minimal code sketch of this pipeline appears after the last slide below)
DNN-derived features → PCA → HLDA → tied-state WE-RDLT → MMI sequence training → CMLLR unsupervised adaptation

Experimental Results
300hr DNN (18k states, 7 hidden layers) + 2,000hr GMM-HMM (18k states)*
Training time reduced from 2 weeks to 3-5 days

System                Word Error Rate (%)
DNN-HMM (CE)          15.4
DNN-GMM-HMM (RDLT)    14.7
DNN-GMM-HMM (MMI)     13.8  (10% relative WERR)
DNN-GMM-HMM (UA)      13.1  (15% relative WERR)

*Z.-J. Yan, Q. Huo, and J. Xu, "A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR"

A New Optimization Method
Joint work with USTC-MSRA Ph.D. program student Kai Chen (陈凯, 0700)
– Using 20 GPUs, the time needed to train a 1,800-hour acoustic model is cut from 2 weeks to 12 hours, without accuracy loss
– The magic is to be published
– We believe the scalability issue in DNN training for speech recognition is now solved!

Why Microsoft Can Do All These Good Things
Research
– Bridge the gap between academia and industry via our intern and visiting scholar programs
– Scale out from toy problems to real-world, industry-scale applications
Product teams
– Solve practical issues and deploy technologies to serve users worldwide via our services
All together
– We continuously improve our work toward larger scale and higher accuracy, and tackle more challenging tasks
Finally
– We have big data + a world-leading computational infrastructure

If You Want to Know More About Deep Learning
– Neural networks for machine learning: https://class.coursera.org/neuralnets-2012-001
– Prof. Hinton's homepage: http://www.cs.toronto.edu/~hinton/
– DeepLearning.net: http://deeplearning.net/
– Open-source Kaldi (speech): http://kaldi.sourceforge.net/
– cuda-convnet (image): http://code.google.com/p/cuda-convnet/

Thanks!
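As referenced on the DNN-GMM-HMM slide, here is a minimal sketch of the two-stage idea: hidden-layer activations of a trained DNN (stood in for here by a randomly initialized one) serve as DNN-derived features, PCA decorrelates and reduces them, and a diagonal-covariance GMM models the result. The dimensions, the toy data, and the omission of the HLDA, WE-RDLT, MMI, and CMLLR stages are all simplifying assumptions, not details from the cited paper.

```python
# A toy sketch of the DNN-GMM feature pipeline; NOT the published system.
import numpy as np

rng = np.random.default_rng(1)

def dnn_features(x, hidden_weights):
    """Stand-in for a trained DNN: return last hidden-layer activations."""
    h = x
    for W in hidden_weights:
        h = 1.0 / (1.0 + np.exp(-(h @ W)))  # sigmoid hidden layers
    return h

def pca(feats, out_dim):
    """Project features onto the top principal components."""
    mu = feats.mean(axis=0)
    centered = feats - mu
    cov = centered.T @ centered / len(feats)
    vals, vecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    proj = vecs[:, ::-1][:, :out_dim]       # keep the leading components
    return centered @ proj

def fit_diag_gmm(x, n_comp=4, iters=20):
    """A few EM iterations for a diagonal-covariance GMM."""
    n, d = x.shape
    means = x[rng.choice(n, n_comp, replace=False)]
    var = np.full((n_comp, d), x.var(axis=0))
    w = np.full(n_comp, 1.0 / n_comp)
    for _ in range(iters):
        # E-step: per-frame responsibilities of each Gaussian (log domain)
        logp = (-0.5 * (((x[:, None, :] - means) ** 2) / var
                        + np.log(2 * np.pi * var)).sum(axis=2)
                + np.log(w))
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = resp.sum(axis=0)
        w = nk / n
        means = (resp.T @ x) / nk[:, None]
        var = (resp.T @ (x ** 2)) / nk[:, None] - means ** 2 + 1e-6
    return w, means, var

# Toy run: random frames through a random "trained" DNN, then PCA + GMM.
frames = rng.normal(size=(500, 39))         # assumed 39-dim input features
hidden = [rng.normal(0, 0.3, (39, 64)), rng.normal(0, 0.3, (64, 64))]
feats = dnn_features(frames, hidden)
reduced = pca(feats, out_dim=16)
gmm = fit_diag_gmm(reduced)
```

In the real system, one GMM per tied HMM state would be trained on the full data set; this sketch only shows the feature-extraction-plus-GMM mechanics that let the expensive DNN be trained on a subset of the data.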