PRLab TUDelft NL

What is the source of this magic?
Deep net has more parameters than the shallow net;
Deep net is deeper than the shallow net;
Nets without convolution can’t learn what nets with convolution can learn;
Current learning algorithms and regularization procedures work better with deep architectures than with shallow architectures;
All or some of the above;
None of the above?

Do Deep Nets Really Need to be Deep?
Lei Jimmy Ba (University of Toronto), Rich Caruana (Microsoft Research)
arXiv:1312.6184