Deep Learning Tutorial
Mitesh M. Khapra
IBM Research India
(Ideas and material borrowed from
Richard Socher’s tutorial @ ML Summer School 2014
Yoshua Bengio’s tutorial @ ML Summer School 2014
& Hugo Larochelle’s lecture videos & slides)
Roadmap
• What are Deep Neural Networks?
• Why should I be interested in Deep Learning?
• How do I train a Deep Neural Network?
• Where can I find additional material?
A typical machine learning example

Feature extraction: number of positive words, number of negative words, length of review, author name, bag of words, etc.

Data (feature vectors) and labels:
x_1 = (1, 0, 0, 1, 0, 1), y_1 = 1
x_2 = (0, 0, 1, 1, 0, 1), y_2 = 0
x_3 = (1, 0, 1, 1, 0, 1), y_3 = 1
x_4 = (0, 0, 1, 0, 1, 1), y_4 = 0

A model, for example f_w(x) = sigmoid(w^T x): learn w such that f_w(x_i) is as close to y_i as possible.
So, where does deep learning fit in?
• Machine Learning
– hand crafted features
– optimize weights to improve prediction
• Representation Learning
– automatically learn features
• Deep Learning
– automatically learn multiple levels of features
From Richard Socher’s tutorial @ ML Summer School, Lisbon
The basic building block: the single artificial neuron

a(x) = b + \sum_i w_i x_i
h(x) = sigmoid(b + \sum_i w_i x_i)

[Figure: a single artificial neuron with inputs x_1, x_2, x_3, weights w_1, w_2, w_3 and bias b]

w_i = weight, b = bias, a(x) = pre-activation, h(x) = activation

Goal: given n (x, y) pairs, find w, b such that h(x_i) is as close to y_i as possible.
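Below is a minimal NumPy sketch of this computation; the input and parameter values are made up for illustration and are not from the tutorial.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def neuron(x, w, b):
    """Single artificial neuron: pre-activation a(x) = b + sum_i w_i x_i,
    activation h(x) = sigmoid(a(x))."""
    a = b + np.dot(w, x)
    return sigmoid(a)

# Illustrative values (three inputs, as in the diagram).
x = np.array([1.0, 0.0, 1.0])
w = np.array([0.5, -1.0, 2.0])
b = -0.5
print(neuron(x, w, b))  # h(x), a value in (0, 1); here sigmoid(2.0) is about 0.88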
Okay, so what can I use it for?

• For binary classification problems, by treating h(x) as p(y = 1 | x)
• Works when the data is linearly separable
(image from Hugo Larochelle’s slides)

Example: x = features from a given review, y = positive/negative; if h(x) > 0.5, predict positive.
Example: x = features from a given review, y = male author/female author; if h(x) > 0.5, predict male author.
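As a rough sketch of the review example (the feature vectors are the toy ones from the earlier slide; the weights are hand-picked for illustration, not taken from the tutorial), the activation is read as p(y = 1 | x) and thresholded at 0.5:

import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Toy review feature vectors and labels (1 = positive, 0 = negative).
X = np.array([[1, 0, 0, 1, 0, 1],
              [0, 0, 1, 1, 0, 1],
              [1, 0, 1, 1, 0, 1],
              [0, 0, 1, 0, 1, 1]], dtype=float)
y = np.array([1, 0, 1, 0])

# Hand-picked parameters that happen to separate this toy data.
w = np.array([2.0, 0.0, -0.5, 0.5, -2.0, 0.0])
b = -0.5

p = sigmoid(X @ w + b)           # interpreted as p(y = 1 | x) per review
print(np.round(p, 2))            # [0.88 0.38 0.82 0.05]
print((p > 0.5).astype(int), y)  # predictions match the labels here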
What are its limitations?

• Fails when the data is not linearly separable…
(images from Hugo Larochelle’s slides)
• …unless the input is suitably transformed

For XOR: x = (x_1, x_2) is not linearly separable, but x' = (AND(x_1, NOT x_2), AND(NOT x_1, x_2)) is.
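A quick numerical check of the XOR case (the AND-based transform follows the standard construction; the exact overbar placement is hard to read from the extracted slide text):

import numpy as np

# The four XOR inputs and their labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR(x1, x2): not linearly separable in (x1, x2)

# Transformed input: x' = (AND(x1, NOT x2), AND(NOT x1, x2)).
X_prime = np.stack([X[:, 0] & (1 - X[:, 1]),
                    (1 - X[:, 0]) & X[:, 1]], axis=1)
print(X_prime)  # [[0 0] [0 1] [1 0] [0 0]]

# In the transformed space a single linear threshold now works:
pred = (X_prime.sum(axis=1) > 0.5).astype(int)
print(pred, y)  # predictions equal the XOR labels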
A neural network for XOR

Wait… are you telling me that I will always have to meditate on the data and then decide the transformation/network?

No, definitely not. The XOR example is only meant to give the intuition. The key takeaway is that by adding more layers you can make the data separable.

[Figure: a multi-layered neural network over inputs x_1, x_2, with hidden units computing AND(x_1, NOT x_2) and AND(NOT x_1, x_2)]

Let’s spend some more time understanding this…
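One concrete (and entirely illustrative) instance of such a network: a two-layer net whose sigmoid units approximate AND and OR gates. The weights below are hand-picked and are not the ones drawn in the lecture figures.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hidden layer: h1 approximates AND(x1, NOT x2), h2 approximates AND(NOT x1, x2).
W1 = np.array([[ 20.0, -20.0],
               [-20.0,  20.0]])
b1 = np.array([-10.0, -10.0])

# Output layer: z approximates OR(h1, h2).
w2 = np.array([20.0, 20.0])
b2 = -10.0

def xor_net(x):
    h = sigmoid(W1 @ x + b1)      # hidden layer activations
    return sigmoid(w2 @ h + b2)   # output activation

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, round(float(xor_net(np.array(x, dtype=float))), 3))
# Prints values close to 0, 1, 1, 0 (the XOR truth table).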
Capacity of a multi-layer network
(graphs from Pascal Vincent’s slides)

[Figure: a two-layer network with inputs x_1, x_2, hidden units y_1, y_2 (weights W^(1)) and output z_1 (weights W^(2)); the figure shows specific edge weights such as 0.5, 0.7, −0.4, −1.5, −1 and bias inputs of 1]
Capacity of a multi-layer network
(image from Pascal Vincent’s slides)
Capacity of a multi-layer network

In particular, we can find a separator for the XOR problem.
(images from Pascal Vincent’s slides)

Universal Approximation Theorem (Hornik, 1991):
• “a single hidden layer neural network with a linear output unit can approximate any continuous function arbitrarily well, given enough hidden units”
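The theorem is an existence statement, but a small experiment gives the flavour. The sketch below (illustrative, not from the tutorial) uses random sigmoid hidden units and fits only the linear output layer by least squares; with enough hidden units the approximation error on a smooth target becomes small.

import numpy as np

rng = np.random.default_rng(0)

def target(t):
    return np.sin(3 * t) + 0.5 * t            # a continuous function on [-2, 2]

x = np.linspace(-2, 2, 200)[:, None]           # 200 one-dimensional inputs
n_hidden = 100

W = rng.normal(scale=3.0, size=(1, n_hidden))  # random input-to-hidden weights
b = rng.normal(scale=3.0, size=n_hidden)       # random hidden biases
H = 1.0 / (1.0 + np.exp(-(x @ W + b)))         # sigmoid hidden activations

v, *_ = np.linalg.lstsq(H, target(x[:, 0]), rcond=None)  # linear output unit
approx = H @ v

print("max abs error:", float(np.abs(approx - target(x[:, 0])).max()))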
Let’s take a minute here…

If “a single hidden layer neural network” is enough, then why go deeper?

[Figure: hand-crafted feature representations, e.g. x' = (AND(x_1, NOT x_2), AND(NOT x_1, x_2)), contrasted with feature representations learned automatically by the hidden layers of a network over inputs x_1, x_2]
Multiple layers = multiple levels of features

But why would I be interested in learning multiple levels of representations?

Let’s see where the motivation comes from…

[Figure: a deep network over inputs x_1, x_2, x_3 with stacked layers W^(1), W^(2), W^(3), W^(4) producing output y]
The brain analogy
(idea from Hugo Larochelle’s slides)

[Figure: successive layer representations, with parts such as nose, mouth, and eyes at one level combining into a face at a higher level (Layer 1, Layer 2, Layer 3 representations)]
YAWN!!! Enough with the brain tampering. Just tell me: why should I be interested in deep learning?
(“Show me the money”)
Used in a wide variety of applications
(from Y. Bengio’s MLSS 2014 slides)
Industrial-Scale Success Stories

• Speech Recognition
• Object Recognition
• Face Recognition
• Cross-Language Learning
• Machine Translation
• Text Analytics

Dramatic improvements reported in some cases.
Disclaimer: Some nodes and edges may be missing due to limited public knowledge.
Some more success stories
(from Y. Bengio’s MLSS 2014 slides)
Let me see if I understand this correctly…

• Speech Recognition, Machine Translation, etc. are more than 50 years old
• Single artificial neurons have been around for more than 50 years

[Figure: a single artificial neuron and a deep neural network over inputs x_1, x_2, x_3, annotated with the question “50+ years?”]

No, even deep neural networks have been around for many, many years, but prior to 2006 training deep nets was unsuccessful.
So what has changed since 2006?
(from Y. Bengio’s MLSS 2014 slides)

• New methods for unsupervised pre-training have been developed
• More efficient parameter estimation methods
• Better understanding of model regularization
• Faster machines and more data help DL more than other algorithms
Recap: the single artificial neuron

a(x) = b + \sum_i w_i x_i
h(x) = sigmoid(b + \sum_i w_i x_i)

w_i = weight, b = bias, a(x) = pre-activation, h(x) = activation

Goal: given n (x, y) pairs, find w, b such that h(x_i) is as close to y_i as possible.
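As a bridge to the training lecture that follows, here is a minimal sketch (illustrative, not from the slides) of fitting w and b by gradient descent on the logistic (cross-entropy) loss for the toy review data used earlier:

import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Toy review data from the earlier slides.
X = np.array([[1, 0, 0, 1, 0, 1],
              [0, 0, 1, 1, 0, 1],
              [1, 0, 1, 1, 0, 1],
              [0, 0, 1, 0, 1, 1]], dtype=float)
y = np.array([1.0, 0.0, 1.0, 0.0])

w = np.zeros(X.shape[1])
b = 0.0
lr = 0.5

for step in range(500):
    h = sigmoid(X @ w + b)          # h(x_i) for every example
    grad = h - y                    # derivative of cross-entropy w.r.t. pre-activation
    w -= lr * X.T @ grad / len(y)   # gradient step on the weights
    b -= lr * grad.mean()           # gradient step on the bias

print(np.round(sigmoid(X @ w + b), 2))  # close to the labels [1, 0, 1, 0]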
Switching to slides corresponding to lecture 2 from Hugo Larochelle’s course http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html
Some pointers to additional material
• http://deeplearning.net/
• http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html