
Submitted by:
Ankit Bhutani
(Y9227094)
Supervised by:
Prof. Amitabha Mukerjee
Prof. K S Venkatesh


AUTO-ASSOCIATIVE NEURAL NETWORKS
OUTPUT SIMILAR TO INPUT



BOTTLENECK CONSTRAINT
LINEAR ACTIVATION – PCA [Baldi et al.,
1989]
NON-LINEAR PCA [Kramer, 1991] – 5-LAYERED NETWORK
 ALTERNATE SIGMOID AND LINEAR ACTIVATION
 EXTRACTS NON-LINEAR FACTORS
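A minimal sketch of the forward pass of such a 5-layered network (numpy; the layer sizes and random weights are illustrative assumptions, and biases are omitted):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Illustrative sizes: 10-D input, 6-unit mapping layers,
# 2-unit linear bottleneck (the extracted non-linear factors).
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(10, 6)), rng.normal(size=(6, 2))
W3, W4 = rng.normal(size=(2, 6)), rng.normal(size=(6, 10))

def forward(x):
    h1 = sigmoid(x @ W1)   # mapping layer (sigmoid)
    z = h1 @ W2            # bottleneck (linear): non-linear PCA scores
    h2 = sigmoid(z @ W3)   # demapping layer (sigmoid)
    return h2 @ W4         # output (linear): reconstruction of x

x = rng.normal(size=(1, 10))
x_hat = forward(x)         # after training, x_hat should approximate x
```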




ABILITY TO LEARN HIGHLY COMPLEX
FUNCTIONS
TACKLE THE NON-LINEAR STRUCTURE OF
UNDERLYING DATA
HIERARCHICAL REPRESENTATION
RESULTS FROM CIRCUIT THEORY – A SINGLE-LAYERED NETWORK WOULD NEED AN EXPONENTIALLY HIGH NUMBER OF HIDDEN UNITS

DIFFICULTY IN TRAINING DEEP NETWORKS
 NON-CONVEX NATURE OF OPTIMIZATION
 GETS STUCK IN LOCAL MINIMA
 VANISHING GRADIENTS DURING BACKPROPAGATION
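A toy illustration of the vanishing-gradient point: the sigmoid derivative is at most 0.25, so backpropagated gradients shrink roughly geometrically with depth (illustrative sketch, not code from the thesis):

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
# Derivative of the sigmoid at pre-activation a; its maximum is 0.25.
dsig = lambda a: sigmoid(a) * (1.0 - sigmoid(a))

grad = 1.0
for layer in range(20):   # 20 stacked sigmoid layers
    grad *= dsig(0.0)     # best case: derivative = 0.25 at a = 0
print(grad)               # 0.25**20 ~ 9.1e-13 - the gradient has vanished
```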

SOLUTION
 "INITIAL WEIGHTS MUST BE CLOSE TO A GOOD SOLUTION" – [Hinton et al., 2006]
 GENERATIVE PRE-TRAINING FOLLOWED BY
FINE-TUNING

PRE-TRAINING
 INCREMENTAL LAYER-WISE TRAINING
 EACH LAYER ONLY TRIES TO REPRODUCE THE HIDDEN LAYER ACTIVATIONS OF THE PREVIOUS LAYER
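A hedged sketch of the layer-wise loop; train_shallow_ae is a hypothetical helper that fits a one-hidden-layer autoencoder and returns its weights and hidden activations:

```python
def pretrain(data, hidden_sizes, train_shallow_ae):
    """Greedy layer-wise pre-training (sketch). Each stage fits a shallow
    autoencoder that reproduces the previous layer's codes; the resulting
    weights later initialize the deep autoencoder."""
    weights, codes = [], data
    for n_hidden in hidden_sizes:
        W, codes = train_shallow_ae(codes, n_hidden)
        weights.append(W)
    return weights
```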


FINE-TUNING
 INITIALIZE THE AUTOENCODER WITH WEIGHTS LEARNT BY PRE-TRAINING
 PERFORM BACKPROPAGATION AS USUAL
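A minimal sketch of this initialization, assuming the decoder starts from the transposed encoder weights (the tied-weight scheme of [Hinton & Salakhutdinov, 2006]); the fine-tuning backpropagation itself is not shown:

```python
def build_deep_autoencoder(weights):
    """Unroll pre-trained weights into encoder + decoder. The decoder is
    initialized with the transposes; the whole stack is then fine-tuned
    end-to-end with ordinary backpropagation (routine not shown)."""
    encoder = list(weights)
    decoder = [W.T for W in reversed(weights)]
    return encoder + decoder
```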

STOCHASTIC – RESTRICTED BOLTZMANN MACHINES (RBMs)
 HIDDEN LAYER ACTIVATIONS (0-1) ARE USED TO MAKE A PROBABILISTIC DECISION OF SETTING EACH UNIT TO 0 OR 1
 MODEL LEARNS THE JOINT PROBABILITY OF TWO BINARY DISTRIBUTIONS – ONE OVER THE INPUT LAYER AND THE OTHER OVER THE HIDDEN LAYER
 EXACT METHODS – COMPUTATIONALLY
INTRACTABLE
 NUMERICAL APPROXIMATION - CONTRASTIVE
DIVERGENCE
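A minimal numpy sketch of one CD-1 update for a binary-binary RBM (biases omitted; a simplified illustration, not the thesis code):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def cd1_update(W, v0, lr=0.1):
    """One contrastive-divergence (CD-1) step.
    W: (n_visible, n_hidden) weights; v0: (batch, n_visible) binary data."""
    # Positive phase: probabilistic 0/1 decision for hidden units.
    ph0 = sigmoid(v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to the visible layer and up again.
    pv1 = sigmoid(h0 @ W.T)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W)
    # Approximate log-likelihood gradient: data term minus model term.
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / v0.shape[0]
    return W
```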

DETERMINISTIC – SHALLOW AUTOENCODERS
 HIDDEN LAYER ACTIVATIONS (0-1) ARE DIRECTLY USED AS INPUT TO THE NEXT LAYER
 TRAINED BY BACKPROPAGATION
 DENOISING AUTOENCODERS
 CONTRACTIVE AUTOENCODERS
 SPARSE AUTOENCODERS
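As an illustration of the deterministic variants, a hedged numpy sketch of one training step for a shallow denoising autoencoder (squared-error loss, biases omitted; a simplification of [Vincent et al., 2008]):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def denoising_ae_step(W1, W2, x, corruption=0.3, lr=0.1):
    """One gradient step: corrupt the input, then learn to
    reconstruct the CLEAN input from the corrupted version."""
    x_tilde = x * (rng.random(x.shape) >= corruption)  # mask-out noise
    h = sigmoid(x_tilde @ W1)      # hidden activations in (0, 1)
    z = h @ W2                     # reconstruction
    err = z - x                    # error against the clean input
    dW2 = h.T @ err
    dW1 = x_tilde.T @ ((err @ W2.T) * h * (1 - h))  # backpropagation
    return W1 - lr * dW1, W2 - lr * dW2
```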
TASK \ MODEL   RBM                          SHALLOW AE
CLASSIFIER     [Hinton et al., 2006] and    Investigated by [Bengio et al., 2007],
               many others since then       [Ranzato et al., 2007], [Vincent et al.,
                                            2008], [Rifai et al., 2011], etc.
DEEP AE        [Hinton & Salakhutdinov,     No significant results reported in
               2006]                        literature – Gap

MNIST

Big and Small Digits

Square & Room

2d Robot Arm

3d Robot Arm

Libraries used
 Numpy, Scipy
 Theano – takes care of parallelization

GPU Specifications
 Memory – 256 MB
 Frequency – 33 MHz
 Number of Cores – 240
 Tesla C1060
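For illustration, a minimal Theano fragment of the kind these experiments rely on: the expression graph is compiled once, and with device=gpu the compiled function runs on the GPU (sizes and learning rate are illustrative):

```python
import numpy as np
import theano
import theano.tensor as T

# Shared weights live on the device; Theano handles the parallelization.
W = theano.shared(np.random.randn(784, 100).astype('float32'), name='W')
x = T.matrix('x')
h = T.nnet.sigmoid(T.dot(x, W))           # hidden activations
cost = T.mean((T.dot(h, W.T) - x) ** 2)   # tied-weight reconstruction error
grad = T.grad(cost, W)                    # symbolic differentiation
train = theano.function([x], cost, updates=[(W, W - 0.1 * grad)])
```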

REVERSE CROSS-ENTROPY

X – Original input
Z – Output
Θ – Parameters (weights and biases)
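The equation itself did not survive extraction; for outputs in (0, 1), the standard cross-entropy reconstruction error takes the form below (assumed reconstruction of the slide's formula; the "reverse" variant swaps the roles of X and Z):

```latex
% Assumed form: cross-entropy between input x and output z = f_theta(x).
L(x, z; \theta) = -\sum_{k} \left[ x_k \log z_k + (1 - x_k) \log (1 - z_k) \right]
```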



RESULTS FROM PRELIMINARY
EXPERIMENTS

TIME TAKEN FOR TRAINING

CONTRACTIVE AUTOENCODERS TAKE VERY LONG TO TRAIN – THE JACOBIAN PENALTY ADDS SUBSTANTIAL COST TO EVERY UPDATE

EXPERIMENT USING SPARSE
REPRESENTATIONS
 STRATEGY A – BOTTLENECK
 STRATEGY B – SPARSITY + BOTTLENECK
 STRATEGY C – NO CONSTRAINT + BOTTLENECK
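The slide does not spell out the sparsity penalty; a common choice, shown here as an assumption, is the KL-divergence penalty between a target activation rate and each hidden unit's mean activation:

```python
import numpy as np

def kl_sparsity_penalty(H, rho=0.05):
    """KL-divergence sparsity penalty (assumed form, not named on the slide).
    H: (batch, n_hidden) activations; rho: target activation rate."""
    rho_hat = np.clip(H.mean(axis=0), 1e-8, 1 - 1e-8)  # per-unit mean activity
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
```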

MOMENTUM
 INCORPORATING THE PREVIOUS UPDATE
 CANCELS OUT COMPONENTS IN OPPOSITE
DIRECTIONS – PREVENTS OSCILLATION
 ADDS UP COMPONENTS IN SAME DIRECTION –
SPEEDS UP TRAINING
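A minimal sketch of the classical momentum update (coefficient values are illustrative, not the thesis settings):

```python
def momentum_update(w, grad, velocity, lr=0.1, mu=0.9):
    """Classical momentum: fold the previous update into the current one.
    Opposing gradient components cancel (less oscillation); consistent
    components accumulate (faster training)."""
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity
```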

WEIGHT DECAY
 REGULARIZATION
 PREVENTS OVER-FITTING
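Folded into the same update, weight decay is one extra L2 term (decay coefficient illustrative):

```python
def sgd_step(w, grad, velocity, lr=0.1, mu=0.9, decay=1e-4):
    """Momentum update plus L2 weight decay: the decay term steadily
    shrinks weights toward zero, regularizing against over-fitting."""
    velocity = mu * velocity - lr * (grad + decay * w)
    return w + velocity, velocity
```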

USING ALTERNATE LAYER SPARSITY WITH
MOMENTUM & WEIGHT DECAY YIELDS
BEST RESULTS

MOTIVATION