Classification of Handwritten Digits Using an Artificial Neural Network
A Tale of Ten Digits
Michael Mason
Colorado School of Mines
April 20, 2015

Outline
■ The Problem
■ The Solution(s)
■ The Problem Remembered

A Solution - The ANN
Results
Group            Method               Test Classification Error
Belongie et al.  K-NN                 0.63%
Meier et al.     Committee of 25 NN   0.39%
Ranzato et al.   Large CNN            0.39%
LeCun et al.     Virtual SVM          0.80%

Table 1: Selection of Results

Description

Logistic Classifier
■ Determines the probability of each digit for a given input.
■ The input is mapped by the softmax function (with weight matrix W and bias vector b) to a probability distribution:

    P(Y = i | x, W, b) = e^{W_i x + b_i} / Σ_j e^{W_j x + b_j}        (1)

■ The digit i with the highest probability is the output of the classifier.

Operating on raw input data, a logistic classifier can achieve ~93% classification accuracy.
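Below is a minimal NumPy sketch of equation (1). The 784-element input (a flattened 28x28 MNIST image), the parameter shapes, and the `predict_digit` helper are illustrative assumptions, not the code behind the talk.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a vector of scores."""
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_digit(x, W, b):
    """Return the most probable digit and the full distribution for a
    flattened 28x28 image x; W has shape (10, 784), b has shape (10,)."""
    p = softmax(W @ x + b)      # P(Y = i | x, W, b) for i = 0..9, equation (1)
    return int(np.argmax(p)), p

# Illustrative usage with random, untrained parameters.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(10, 784))
b = np.zeros(10)
x = rng.random(784)             # stand-in for a normalized MNIST image
digit, probs = predict_digit(x, W, b)
```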

Figure 1: Learned projection to zero-digit
Figure 2: Learned projection to one-digit

Hidden Layer
A one-hidden-layer NN is a universal function approximator and can be formally expressed as a function f : R^{input size} → R^{output size}:

    f(x) = G(b^(2) + W^(2) s(b^(1) + W^(1) x))        (2)

■ Added complexity allows for general function fitting
■ Utilizes a nonlinear "activation function" s to transform inputs
■ Can be chained for increased complexity
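A short NumPy sketch of equation (2). Taking s = tanh and G = softmax, and the 784-150-10 layer sizes, are assumptions made for illustration (the 150 hidden units match the settings reported later).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mlp_forward(x, W1, b1, W2, b2):
    """Equation (2): f(x) = G(b2 + W2 s(b1 + W1 x)), with s = tanh and G = softmax."""
    hidden = np.tanh(W1 @ x + b1)       # s(b1 + W1 x)
    return softmax(W2 @ hidden + b2)    # G(b2 + W2 hidden)

# Illustrative shapes: 784 inputs, 150 hidden units, 10 output classes.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.01, size=(150, 784)), np.zeros(150)
W2, b2 = rng.normal(scale=0.01, size=(10, 150)), np.zeros(10)
probs = mlp_forward(rng.random(784), W1, b1, W2, b2)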

Back Propagation
■ Maximize P(Y = i | x, W, b) = L(x, W, b; Y = i), referred to as the likelihood
■ Use stochastic gradient descent to alter W and b, as well as internal values
■ Optimize the output layer first and propagate backwards
■ Rate of gradient traversal is dictated by a learning rate
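In practice the likelihood is maximized by minimizing the negative log-likelihood. The sketch below performs one stochastic-gradient update of the output (softmax) layer; the closed-form gradient p - onehot(y) with respect to the logits is standard, while the shapes, learning rate, and function name are assumptions. Backpropagation would then pass W2.T @ (p - onehot(y)) down to update the hidden-layer parameters.

```python
import numpy as np

def sgd_step_output_layer(x_hidden, y, W2, b2, learning_rate=0.02):
    """One SGD update of the softmax output layer on a single example.

    x_hidden : hidden-layer activations, shape (150,)
    y        : true digit label, 0..9
    """
    logits = W2 @ x_hidden + b2
    p = np.exp(logits - logits.max())
    p /= p.sum()                          # softmax probabilities
    grad_logits = p.copy()
    grad_logits[y] -= 1.0                 # dNLL/dlogits = p - onehot(y)
    W2 -= learning_rate * np.outer(grad_logits, x_hidden)
    b2 -= learning_rate * grad_logits
    return W2, b2
```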

Known Issues
■ Prone to over-fitting
■ Can get stuck in local minima
■ Does not respect the structure of the data

Implementation
Utilized Theano:
■ Theano allows for manipulation of symbolic variables
■ Many useful, optimized functions for machine learning, such as softmax, gradient, etc.
■ Graphics card utilization through CUDA
Tutorial at: http://deeplearning.net/tutorial/
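A minimal Theano sketch in the spirit of the linked tutorial: the softmax classifier of equation (1) is built from symbolic variables and Theano derives the gradients automatically. The variable names, sizes, and learning rate here are illustrative, not the exact code behind the reported results.

```python
import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')                       # minibatch of flattened images
y = T.ivector('y')                      # integer digit labels

W = theano.shared(np.zeros((784, 10), dtype=theano.config.floatX), name='W')
b = theano.shared(np.zeros(10, dtype=theano.config.floatX), name='b')

p_y_given_x = T.nnet.softmax(T.dot(x, W) + b)                    # equation (1)
nll = -T.mean(T.log(p_y_given_x)[T.arange(y.shape[0]), y])       # negative log-likelihood

g_W, g_b = T.grad(nll, [W, b])                                   # symbolic gradients
learning_rate = 0.02
train_step = theano.function(
    inputs=[x, y],
    outputs=nll,
    updates=[(W, W - learning_rate * g_W), (b, b - learning_rate * g_b)],
)
```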

My Results
Specifics:
■ 150 hidden-layer nodes
■ Initial learning rate of 0.02, decaying
■ Batch size of 15
Achieved a classification accuracy of 98.1%.
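A rough sketch of the training loop these settings imply (minibatches of 15 with a decaying learning rate starting at 0.02). The 1/(1 + decay·epoch) schedule, the epoch count, and the generic `train_step` callback are assumptions, since the slides do not specify them.

```python
import numpy as np

def train(train_x, train_y, train_step, n_epochs=50, batch_size=15,
          base_lr=0.02, decay=0.01):
    """Shuffle the training set each epoch and apply `train_step` (any
    per-minibatch SGD update) with a slowly decaying learning rate."""
    n_batches = len(train_x) // batch_size
    for epoch in range(n_epochs):
        lr = base_lr / (1.0 + decay * epoch)     # assumed decay schedule
        order = np.random.permutation(len(train_x))
        for i in range(n_batches):
            idx = order[i * batch_size:(i + 1) * batch_size]
            train_step(train_x[idx], train_y[idx], lr)
```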

Limitations
Figure 3: Mystery Digit (it's an eight)
■ Difficult to ask computers to recognize what humans can't (without more information)
■ Lack of context clues

Figure 4: Note number of chainz

(Immediate) Future Work
To Do - Increased Robustness:
■ Deepen the network
■ Implement noise & dropout (see the sketch below)
■ Improve on SGD
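Dropout randomly zeroes hidden activations during training so the network cannot rely on any single unit. Below is a minimal NumPy sketch of the idea; the 0.5 drop probability and the inverted-dropout rescaling are common defaults, not choices stated in the talk.

```python
import numpy as np

def dropout(activations, p_drop=0.5, rng=None):
    """Inverted dropout: zero each unit with probability p_drop and rescale
    the survivors so the expected activation is unchanged at test time."""
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

# During training: hidden = dropout(np.tanh(W1 @ x + b1))
# At test time the layer is used as-is, with no mask.
```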

Works Cited
1. D. Ciresan, U. Meier, and J. Schmidhuber, "Multi-column deep neural networks for image classification," in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pp. 3642-3649. IEEE, 2012.
2. R. Ghosh and M. Ghosh, "An Intelligent Offline Handwriting Recognition System Using Evolutionary Neural Learning Algorithm and Rule Based Over Segmented Data Points," Journal of Research and Practice in Information Technology, Vol. 37, No. 1, pp. 73-87, Feb. 2005.
3. Y. LeCun, L. Bottou, G. Orr, and K. Muller, "Efficient BackProp," in G. Orr and K. Muller (Eds.), Neural Networks: Tricks of the Trade, Springer, 1998.
4. D. W. Opitz and J. W. Shavlik, "Generating accurate and diverse members of a neural network ensemble," in Advances in Neural Information Processing Systems, MIT Press, 1996, pp. 535-541.

Thank you for your time.
Questions?