Manhattan Rule Training for Memristive Crossbar Circuit Pattern Classifiers

Elham Zamanidoost, Farnood M. Bayat, Dmitri Strukov
Electrical and Computer Engineering Department
University of California Santa Barbara
Santa Barbara, CA, USA
{elham, farnoodmb, strukov}@ece.ucsb.edu

Irina Kataeva
Advanced Research Division
Denso Corporation
Komenoki-cho, Nisshin, Japan
irina_kataeva@denso.co.jp
Abstract—We investigated batch and stochastic Manhattan Rule algorithms for training multilayer perceptron classifiers implemented with memristive crossbar circuits. In Manhattan Rule training, the weights are updated using only the sign information of the classical backpropagation algorithm. The main advantage of Manhattan Rule is its simplicity, which leads to a more compact hardware implementation and faster training. Additionally, in the case of stochastic training, Manhattan Rule allows performing all weight updates in parallel, which further speeds up the training procedure. The tradeoff for this simplicity is slightly worse classification performance. For example, simulation results showed that the classification fidelity on the Proben1 benchmark for a memristor-based implementation trained with batch Manhattan Rule was comparable to that of the classical backpropagation algorithm, and about 2.8 percent worse than the best reported results.
Keywords—Crossbar memory, Memristor, Artificial neural
network, Training algorithm, Pattern classification.
I. INTRODUCTION
Artificial neural networks are a biologically inspired computing paradigm suitable for a variety of applications. To approach the energy efficiency of biological neural networks in information processing, specialized hardware must be
developed [1]–[5]. Crossbar-based hybrid circuits [6]–[10], and
in particular those of CrossNets variety [10], [11], are
identified as one of the most promising solutions, because such
circuits can provide high integration density of artificial
synapses and high connectivity between artificial neurons,
which are two major challenges for efficient artificial neural
network hardware implementation. The disadvantages of
crossbar circuits are certain restrictions on how the signals can
be applied to the artificial synapses, which, in turn, impose
limitations on training algorithms.
The main contribution of this paper is the development of a
crossbar circuit compatible training approach for multilayer
perceptron (MLP) networks. The performance of the proposed
training approach was simulated using a standard benchmark suite for a specific memristive crossbar network, which has recently been utilized in a successful experimental demonstration of a small-scale pattern classifier [1, 5]. While the considered MLPs are small and hardly practical, their key features, e.g.
network topology and training algorithms, are similar to those
of more practical networks such as deep learning convolutional
neural networks [12].
This work is supported by AFOSR under MURI grant FA9550-12-1-0038 and by Denso Corporation, Japan. We also acknowledge support from the Center for Scientific Computing from the CNSI, MRL: an NSF MRSEC (DMR-1121053), and NSF CNS-0960316.
To the best of our knowledge, the presented work is novel
and Manhattan Rule training has never been investigated
before in the context of crossbar circuit hardware. The majority
of previous work is devoted to unsupervised spiking networks
[4], [7], [8]. The most relevant studies are, perhaps, the experimental demonstration of a single-layer perceptron [1, 5],
and simulations of small-scale [6], [9], and larger-scale pattern
classifiers [13]. In the next section, we provide brief
background information on the considered neural network, its
crossbar circuit implementation, and memristor models used
for simulation studies in this paper.
II. BACKGROUND
A. Multi-Layer Perceptron
In its simplest form, a feedforward neural network can be
represented by a directed acyclic graph (Fig. 1a) in which
neurons and synapses are nodes and edges of a graph,
respectively. Each neuron applies a certain transfer function to
the sum of its inputs and then passes information forward to the
next layer of neurons. A synapse multiplies its weight W with
the output of a pre-synaptic neuron and passes the resulting
product to the input of the post-synaptic neuron.
Mathematically, the operation within a single layer of the
network can be formulated as
fi = tanh(βui), with ui = Σj Wij Xj,   (1)
where ui and fi are the input and output of the i-th post-synaptic neuron, respectively, Xj is the output of the j-th pre-synaptic neuron, and Wij is the synaptic weight between the j-th pre-synaptic and i-th post-synaptic neurons. Each neuron is assumed to have a tanh activation function with slope β. For the first layer of the network, the |X| ≤ 1 values correspond to the applied input pattern.

Fig. 1. Feedforward artificial neural network: (a) abstracted graph representation of one layer with three input and two output neurons, (b) its crossbar circuit mapping, and (c, d) memristor-based crossbar implementation.
Feedforward neural networks, and in particular multilayer perceptrons, which are based on such networks, allow performing pattern classification, i.e. mapping input patterns to certain classes. The classification is considered successful if the specific output neuron corresponding to the applied pattern produces the largest value. Such operation is achieved by properly setting the weights in the network, which, in the most general case, cannot be calculated analytically but rather are found via some optimization procedure, e.g. the backpropagation training algorithm in MLP networks [14].
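To make the forward pass (1) and this winner-take-all decision concrete, below is a minimal software sketch under illustrative assumptions: the layer sizes mimic the Cancer1 network described later, and all variable and function names are ours rather than part of the hardware description.

```python
import numpy as np

def layer_forward(W, x, beta=1.0):
    """One network layer per Eq. (1): u_i = sum_j W_ij * x_j, f_i = tanh(beta * u_i)."""
    u = W @ x
    return np.tanh(beta * u)

# Hypothetical two-layer perceptron with Cancer1-like sizes: 9 inputs -> 4 hidden -> 2 outputs
rng = np.random.default_rng(0)
W_hidden = rng.uniform(-1.0, 1.0, size=(4, 9))
W_output = rng.uniform(-1.0, 1.0, size=(2, 4))

x = rng.uniform(-1.0, 1.0, size=9)        # input pattern with |X| <= 1
f_hidden = layer_forward(W_hidden, x)
f_output = layer_forward(W_output, f_hidden)

# Classification is successful if the output neuron assigned to the pattern's class is the largest
predicted_class = int(np.argmax(f_output))
```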
Backpropagation training can be implemented in batch or stochastic mode. For stochastic (sometimes called online) training, weights are adjusted immediately after application of a single pattern from the training set. In the first step of this algorithm, a randomly chosen pattern n from the training set is applied to the network and the output is calculated according to (1). In the second step, the synaptic weights are adjusted according to

ΔWij(n) = -α δi(n) Xj(n),   (2)

where α is the learning rate and δi is the local (backpropagated) error of the corresponding post-synaptic neuron. δi is calculated first for the output neurons, for which it is equal to the product of the derivative of the neuron output with respect to its input and the difference between the actual output f and the desired value of the output f(g), i.e.

δi(n) = [fi(g)(n) - fi(n)] · df/du|u=ui(n).   (3)

The error is then propagated backward (i.e. from the output to the input layer) using the recurrence relation

δjpre(n) = df/du|u=uj(n) · Σi δipost(n) Wij(n).   (4)

(Additional superscripts are added to distinguish between pre- and post-synaptic variables when describing operation within the network layer.)

The application of all patterns from a training set constitutes one epoch of training, with multiple epochs typically required for successful training. In the simplest version of the batch backpropagation algorithm, the synaptic weights are adjusted by

ΔWij' = Σn ΔWij(n),   (5)

only at the end of each epoch, i.e. after all training patterns are applied to the network.

Reaching a perfect mapping during training is not guaranteed. In addition, classification performance is typically measured on a separate set of test patterns, which are not used in the training process. Therefore, classifier performance is characterized by the misclassification rate (MCR), i.e. the percentage of input patterns which are classified incorrectly. The other important metric is training time, which characterizes how quickly the training algorithm converges.

B. MLP Implementation with Crossbar Circuits

The MLP structure maps naturally to the crossbar array circuit (Fig. 1b). In particular, X and f are physically implemented with voltages |V| ≤ Vread, while the neuron's input u is implemented with a current I. Synapses are implemented with crosspoint devices whose conductance G is proportional to the synaptic weight W. Because weight values can be negative, while physical conductance is strictly positive, one solution is to represent each synaptic weight with a pair of crosspoint devices (Fig. 1c), which are denoted as G+ and G-, i.e.

Wij ≡ Gij+ - Gij-.   (6)

In such a configuration, the neuron receives two currents, one from the crossbar line with weights G+ and another from the line with weights G-, so that negative weights are implemented via the subtraction of these two currents inside the neuron (Fig. 1d). With Gmax and Gmin being the maximum and the minimum conductances of the crosspoint devices, respectively, the effective weight ranges from -(Gmax - Gmin) to (Gmax - Gmin).

Assuming virtually-grounded inputs of the post-synaptic neuron, the input current I is equal to the product GV. The current difference is then converted to voltage via an operational amplifier with feedback resistor R and then applied to a saturating operational amplifier to approximate the hyperbolic tangent activation function [1, 5, 11], i.e. implementing (1) on the physical level:

Vipost = Vread tanh[R(Ii+ - Ii-)],   (7a)
Ii = Ii+ - Ii- = Σj (Gij+ - Gij-) Vjpre.   (7b)

In general, a crossbar classifier can be trained ex-situ or in-situ. In the first method, the neural network is first implemented in software and the proper weights are calculated by simulating the training process. The calculated weights are then imported into hardware, which is somewhat similar to the write operation for conventional memory. The main difference, however, is that the imported values are analog (or multi-bit), which dramatically increases the complexity and time of the write operation. Alternatively, in the in-situ approach, training is implemented directly in the hardware. In this case, weights are physically adjusted in hardware during training as described by (2) or (5).

Both ex-situ and in-situ training approaches have recently been demonstrated experimentally for memristive crossbar circuits [1, 5]. The advantage of the ex-situ method is that any (i.e. state-of-the-art) training algorithm that results in the best classification performance can be implemented in software without incurring much overhead in hardware. In-situ training, however, automatically adjusts to any hardware variations, which are unavoidable in analog circuits. Note that obtaining and supplying detailed information of a circuit's defects and variations to the software-implemented network for ex-situ training is hardly practical for large-scale systems. Because of this issue, the particular focus of this paper is on in-situ training.
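As a rough software illustration of the differential mapping (6) and the physical dot product (7), here is a minimal sketch. The numeric values (Vread, R, and the conductance range) are taken from the simulation setup described later in Sec. IV, and the function name is ours.

```python
import numpy as np

V_READ = 0.5                      # V, bound on read voltages |V|
R_FB = 2.27e3                     # Ohm, feedback resistor of the current-to-voltage converter
G_MIN, G_MAX = 0.01e-3, 1.0e-3    # S, conductance range of the crosspoint devices

def crossbar_layer(G_plus, G_minus, V_pre):
    """Eq. (7): I_i = sum_j (G+_ij - G-_ij) V_pre_j ; V_post_i = Vread * tanh(R * I_i)."""
    I = (G_plus - G_minus) @ V_pre
    return V_READ * np.tanh(R_FB * I)

rng = np.random.default_rng(1)
G_plus = rng.uniform(G_MIN, G_MAX, size=(4, 9))
G_minus = rng.uniform(G_MIN, G_MAX, size=(4, 9))

W_effective = G_plus - G_minus               # Eq. (6); spans -(Gmax - Gmin) to +(Gmax - Gmin)
V_pre = rng.uniform(-V_READ, V_READ, size=9) # pre-synaptic voltages
V_post = crossbar_layer(G_plus, G_minus, V_pre)
```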
C. Memristive Devices

In general, different types of two-terminal resistive switching devices [15] can be integrated into crossbar array circuits to implement a pattern classifier. In this paper, a particular type of crosspoint device, the Pt/TiO2-x/Pt memristor, for which an accurate device model is available [16], has been investigated. A typical switching I-V curve for such devices is shown in Fig. 2. The device conductance can be gradually decreased (reset) by applying positive voltages above VresetTH = 1.3 V and gradually increased (set) with negative voltages below VsetTH = -0.9 V. The rate of conductance change for both switching transitions increases with the applied voltage (Fig. 3). The conductance remains unchanged when small voltages, i.e. |V| ≤ Vread = 0.5 V for the considered devices, are applied to the device. Therefore, we assumed that relatively large voltages (exceeding the set or reset threshold) were applied to adjust synaptic weights during the training process. Alternatively, smaller (read) voltages, which do not modify synaptic conductances, were assumed to be used for the calculation of the network output during training and/or operation of the classifier.

Fig. 2. Simulated and experimentally measured I-V switching characteristics for a Pt/TiO2-x/Pt memristor for an applied voltage sweep shown in the inset [16].

Fig. 3. Simulated switching dynamics for reset and set transitions for the considered memristors, in particular, showing the absolute change in conductance as a function of initial conductance, for several values of write voltages (Vset = -0.9 V to -1.7 V and Vreset = 1.3 V to 2.6 V, incremented in 0.1 V steps). The conductances are measured at 0.5 V (i.e. at read bias).

III. IN-SITU MANHATTAN RULE TRAINING

For in-situ training to be practical, its area and time overhead should be minimized. A straightforward implementation of the backpropagation algorithm in memristive hardware does not seem to be practical, because each weight must be modified by a unique amount according to (2) or (5). Such analog adjustment of weights is possible (e.g. using a feedback write algorithm [17]) and could be reasonable for small circuits [1], but would certainly be too slow for the desired large-scale circuits, especially taking into account that a large number of epochs is typically required to perform training [12].

Fortunately, there are some useful variations of the backpropagation algorithm which allow much more efficient implementation of training in the considered memristive crossbar networks. Here, we consider one such example, Manhattan Rule training [18], which is a coarse-grain variation of the backpropagation algorithm. In Manhattan Rule, only the sign information of the weight adjustment is utilized, so that the weight updates for (2) and (5) become

ΔWijM(n) = sgn[ΔWij(n)]   (8)

and

ΔWijM' = sgn[ΔWij'].   (9)

The main appeal of such a training algorithm is that all weights are updated by the same amount, which simplifies the weight update operation and creates an opportunity for efficient implementation of in-situ training in hardware. Fig. 4 shows one instance of such an implementation, which we considered in this paper.

In particular, let us consider a small portion of the crossbar consisting of 4×2 effective weights, or equivalently 4×4
differential weights (Fig. 4c). According to (2), for stochastic
training, the sign of the weight update depends on peripheral
values of local error δ (associated with horizontal crossbar
lines on Fig. 4c) and input X (associated with vertical lines).
There are four possible combinations of signs of δ and X and,
therefore, adjustments of all weights can be performed in four
steps with each step corresponding to a particular combination
of signs. For example, Fig. 4c shows weight update for a
specific case δ1 < 0, δ2 > 0 and X1 > 0, X2 < 0, X3 < 0, X4 > 0.
(Note that with considered differential weight implementation,
both positive and negative synapses are adjusted during the
weight update, with the latter always updated in the opposite
direction.)
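A minimal sketch of this four-phase grouping is given below, assuming the stochastic update (2) and its sign-only version (8). The conductance increment dG and the array sizes are illustrative, and only the sign bookkeeping is shown (no device model).

```python
import numpy as np

def manhattan_phases(delta, x, dG=1e-5):
    """Group Manhattan Rule updates sgn(-delta_i * x_j) into the four (sgn delta, sgn x) phases."""
    phases = []
    for s_d in (-1, +1):                 # sign of the local error delta on a horizontal line
        for s_x in (-1, +1):             # sign of the input X on a vertical line
            mask = np.outer(np.sign(delta) == s_d, np.sign(x) == s_x)  # devices addressed in this phase
            dW = -s_d * s_x * dG         # sign of Eq. (2)/(8) for this sign combination
            phases.append((mask, +dW, -dW))   # G+ changes by +dW, G- by -dW (opposite direction)
    return phases

# Example of Fig. 4c: delta1 < 0, delta2 > 0 and X1 > 0, X2 < 0, X3 < 0, X4 > 0
delta = np.array([-0.3, +0.2])
x = np.array([+0.4, -0.1, -0.5, +0.2])
for mask, dG_plus, dG_minus in manhattan_phases(delta, x):
    pass  # in hardware, all devices selected by `mask` receive their pulses simultaneously in this phase
```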
Because all updates have the same magnitude, all the
weights sharing the same sign of δ and X in each step could be
updated simultaneously. To implement this parallel update,
each crossbar line receives a specific voltage pulse sequence
shown on Fig. 4a. In any particular step of such sequence,
only one specific set of memristors, which are located at the
crossbar lines with the same signs of δ and X, receive large
enough voltage bias of a proper polarity to modify their
conductances (Fig. 4b). The remaining memristors, which are
not supposed to be modified during the same step, receive
voltages below corresponding switching thresholds, which is
ensured by using
VresetTH ≤ Vreset < 2VresetTH and VsetTH ≥ Vset > 2VsetTH.   (10)
The hardware implementation of Manhattan Rule training
is quite straightforward and involves application of pulse
sequences s1 and s2 to the vertical crossbar lines with X < 0
and X > 0, respectively, and pulse sequences s3 and s4 to the
horizontal crossbar lines with δ < 0 and δ > 0.
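The sketch below illustrates the half-biasing logic for one such step (shown here as a reset-type phase), assuming pulse amplitudes that satisfy (10) and the switching thresholds from Sec. II.C. The polarity convention is illustrative; the code only checks which devices see a super-threshold voltage and does not model the conductance change itself.

```python
import numpy as np

V_SET_TH, V_RESET_TH = -0.9, 1.3    # V, switching thresholds of the Pt/TiO2-x/Pt devices
V_SET, V_RESET = -1.5, 1.5          # V, example write amplitudes satisfying Eq. (10)

def devices_switched(col_selected, row_selected):
    """Selected columns get +V/2, selected rows get -V/2; only fully selected devices cross a threshold."""
    v_col = np.where(col_selected, V_RESET / 2, 0.0)     # voltages on vertical lines in this step
    v_row = np.where(row_selected, -V_RESET / 2, 0.0)    # voltages on horizontal lines in this step
    v_device = v_col[None, :] - v_row[:, None]           # voltage drop across each memristor
    return (v_device >= V_RESET_TH) | (v_device <= V_SET_TH)

cols = np.array([True, False, False, True])   # e.g. vertical lines whose X has the selected sign
rows = np.array([True, False])                # e.g. horizontal lines whose delta has the selected sign
switched = devices_switched(cols, rows)       # True only where the two half-biases add up
```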
In batch Manhattan Rule training, the weight updates are
no longer correlated with peripheral error and input values
(Fig. 4d). In this case, memristors can be updated in parallel
for two crossbar lines (which form a differential pair) using
the scheme proposed for stochastic training. Multiple pairs of
crossbar lines, however, are updated sequentially in this case
(Figs. 4e and 4f).
IV. SIMULATION RESULTS

The considered training approach was evaluated on three
datasets - Cancer1, Diabetes1 and Thyroid1 from Proben1
benchmark [19]. Each dataset is implemented with a two-layer
differential-weight MLP network with 4 hidden neurons. There
are 9, 8 and 21 input neurons, and 2, 2, and 3 output neurons
for Cancer1, Diabetes1, and Thyroid1 datasets, respectively.
The total number of patterns in the training set was 350, 384, and 3600 for the Cancer1, Diabetes1, and Thyroid1 datasets, respectively.
Several cases of weight updates were considered. In all
simulations, conductances were initialized randomly between
Gmin = 0.01 mS and Gmax = 1 mS, and clipped at Gmin and Gmax
during training. Also, R = 2.27 kΩ and target output values
were V(g) = ±0.29 V, which correspond to the recommended
sigmoid function from [14] for the considered Vread and range
of conductances. The benchmark inputs were scaled to [-Vread,
Vread] range. All computations were performed using 32-bit
floating point precision arithmetic.
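For reference, a small sketch of this setup (random conductance initialization, clipping during training, and input scaling) is shown below; the helper names are ours, and the per-feature min-max scaling is one plausible reading of the benchmark preprocessing.

```python
import numpy as np

G_MIN, G_MAX = 0.01e-3, 1.0e-3    # S, conductance range
V_READ = 0.5                      # V
V_TARGET = 0.29                   # V, target output values are +/- 0.29 V

rng = np.random.default_rng(42)

def init_conductances(n_post, n_pre):
    """Random initialization of the differential conductance pairs between Gmin and Gmax."""
    G_plus = rng.uniform(G_MIN, G_MAX, size=(n_post, n_pre))
    G_minus = rng.uniform(G_MIN, G_MAX, size=(n_post, n_pre))
    return G_plus, G_minus

def clip_conductances(G):
    """Conductances are clipped at Gmin and Gmax during training."""
    return np.clip(G, G_MIN, G_MAX)

def scale_inputs(X):
    """Scale each benchmark input feature to the [-Vread, Vread] range (one plausible convention)."""
    X = np.asarray(X, dtype=np.float32)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (2.0 * (X - lo) / span - 1.0) * V_READ
```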
In the first (“ideal”) case, weights were updated according
to (8) and (9) without using the device model. Table I shows
the best classification performance achieved within 4000
epochs and the corresponding number of epochs required to
achieve best performance, with both values averaged over 1500
runs. For comparison, this table also shows simulation results
for the conventional backpropagation algorithm and some of
the best reported results.

TABLE I. CLASSIFICATION PERFORMANCE FOR IDEAL NETWORK

            Batch Manhattan Rule        Batch Backpropagation       Best reported
Dataset     Avg. MCR   Avg. # epochs    Avg. MCR   Avg. # epochs    MCR [19]
Cancer1     .0288      800              .0291      1995             .0114
Diabetes1   .2641      474              .2726      2120             .2500
Thyroid1    .0753      305              .0774      1716             .0200

Fig. 4. Manhattan Rule training implementation: (a) 4-step pulse sequences which are applied to the crossbar lines and (b) corresponding voltage biases across the memristors (with respect to the bottom terminal) as a result of an application of the pulse sequence. (c) A specific example of a desired weight update for stochastic training in a 4×4 memristive crossbar circuit and its corresponding implementation. (d) A specific example of a desired weight update for batch training, and (e, f) its corresponding implementations. On panel (d), red and green backgrounds correspond to negative and positive updates, respectively.
In the remaining studies, weight updates were performed
using the memristor device model described in Sec. II.C. The
best classification performance results were chosen within
1500 and 300 epochs of training for batch and stochastic
algorithms, respectively. Fig. 5 shows simulation results for
batch training using various pairs of Vset and Vreset voltages
satisfying (10). The performance results are summarized in
Table II.
The Manhattan Rule training was also simulated for a
more realistic case with added defects and variations to the
memristive crossbar network. Fig. 6 shows simulation results
with a fraction of randomly chosen memristors stuck in either
high conductive state Gmax (stuck-on-close) or low conductive
state Gmin (stuck-on-open). In particular, defective memristors
are assumed to be equally split between stuck-on-close and
stuck-on-open, so that, e.g., the defect fraction of 0.2
corresponds to 10% stuck-on-open and 10% stuck-on-close devices.

Fig. 5. Misclassification rate for batch training as a function of Vset and Vreset. The results are averaged over 500 runs.

TABLE II. CLASSIFICATION PERFORMANCE FOR MEMRISTIVE CROSSBAR CIRCUITS (500 RUNS)

Batch Manhattan Rule
Dataset     Avg. MCR   Avg. # of epochs   Optimal Vset / Vreset   Avg. # of updates
Cancer1     .0268      580                -1.7 V / 1.3 V          2.32 × 10^3
Diabetes1   .2647      388                -1.1 V / 2.5 V          1.55 × 10^3
Thyroid1    .0765      460                -1.7 V / 1.3 V          1.84 × 10^3

Stochastic Manhattan Rule
Dataset     Avg. MCR   Avg. # of epochs   Optimal Vset / Vreset   Avg. # of updates
Cancer1     .066       35                 -1.5 V / 1.5 V          12.5 × 10^3
Diabetes1   .344       26                 -0.9 V / 1.3 V          9.98 × 10^3
Thyroid1    .0730      10                 -1.7 V / 2.1 V          3.6 × 10^4

Fig. 7 shows simulation results for memristive crossbar circuits with device-to-device switching threshold variations. Such variations were simulated by assuming that the Vset and Vreset of each device were normally distributed, with mean values corresponding to the optimal ones reported in Table II.
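A sketch of how these non-idealities can be injected into a simulated crossbar, following the conventions above (a defect fraction split equally between stuck-on-open and stuck-on-close devices, and per-device normally distributed thresholds), is given below. The mean threshold values used here are the Cancer1 batch optima from Table II, and the function names are ours.

```python
import numpy as np

G_MIN, G_MAX = 0.01e-3, 1.0e-3
rng = np.random.default_rng(7)

def inject_stuck_defects(G, defect_fraction):
    """Pin a random fraction of devices to Gmin (stuck-on-open) or Gmax (stuck-on-close), split equally."""
    stuck = rng.random(G.shape) < defect_fraction
    stuck_close = stuck & (rng.random(G.shape) < 0.5)
    stuck_open = stuck & ~stuck_close
    G = G.copy()
    G[stuck_open] = G_MIN
    G[stuck_close] = G_MAX
    return G, stuck

def sample_thresholds(shape, v_set_mean, v_reset_mean, sigma):
    """Device-to-device variations: normally distributed Vset and Vreset for each device."""
    v_set = rng.normal(v_set_mean, sigma, size=shape)
    v_reset = rng.normal(v_reset_mean, sigma, size=shape)
    return v_set, v_reset

G = rng.uniform(G_MIN, G_MAX, size=(4, 9))
G_defective, stuck_mask = inject_stuck_defects(G, defect_fraction=0.2)   # 10% open + 10% close
v_set, v_reset = sample_thresholds(G.shape, v_set_mean=-1.7, v_reset_mean=1.3, sigma=0.2)
```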
V. DISCUSSION AND SUMMARY
Simulation results summarized in Table I and II show that
classification fidelity for batch Manhattan Rule training is
comparable to that of conventional backpropagation training
and somewhat worse as compared to the best reported
performance. Moreover, classification fidelity remained the
same (or even slightly improved) when performing simulations
with realistic device models. The slight improvement in
performance could be explained by more optimal training
conditions, i.e. the optimal choice of reset and set voltages,
which effectively defines the learning rate for the simulated in-situ training.
Stochastic Manhattan Rule training requires fewer epochs
to converge, though its classification performance was
significantly worse as compared to batch training. A similar
outcome is quite typical for classical backpropagation training
[14]. The additional coarsening of the weight update in the Manhattan Rule algorithm seems to be the reason behind the further increase in the performance gap between the two modes of training.
As Figs. 6 and 7 show, the considered network is quite
robust to the variations in device switching dynamics and
stuck-on defects. The main effect of device-to-device
variations is on convergence speed. For example, the number
3
.0730
10
3.6 × 10
3
3
of training epochs to reach the classification fidelity of the variation-free network increased by at least 10%, 40%, and 32% for Cancer1, Diabetes1, and Thyroid1, respectively, while the classification performance degraded rather gracefully with added variations.
Because in stochastic training weights are updated for each
applied pattern, it is useful to estimate training time in terms of
elementary weight updates, rather than epochs. Assuming that
application of the four-step pulse sequence is one elementary
update, the training time for the stochastic algorithm is a
product of the number of patterns in the training set and the
total number of epochs. Here we neglect other computations
during training such as described by (3) and (4) and assume
that the weights can be updated in parallel in different MLP
layers (even though error is back-propagated sequentially).
For batch training, the weights are updated only once per epoch; however, because of the sequential update, the training
time for a particular crossbar layer is the product of the number
of post-synaptic neurons and the total number of epochs. Table
II provides training time expressed in elementary updates.
Clearly, batch training is the fastest when taking into account
implementation details. It is unclear though if this will hold for
more practical circuits with much larger crossbar arrays.
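As an illustrative cross-check of these counts against the Cancer1 entries of Table II (taking the larger, 4-neuron hidden layer as the bottleneck for the sequential batch update; the epoch counts are averages, so the products are approximate):

```python
# Cancer1, batch Manhattan Rule: ~580 epochs x 4 post-synaptic (hidden) neurons updated sequentially
batch_updates = 580 * 4            # = 2320, i.e. ~2.32 x 10^3 elementary updates (Table II)
# Cancer1, stochastic Manhattan Rule: ~35 epochs x 350 training patterns, one parallel update per pattern
stochastic_updates = 35 * 350      # = 12250, i.e. ~12.5 x 10^3 elementary updates (Table II)
```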
Since in this paper we only focused on the weight update
implementation, let us briefly discuss area overhead of other
operations during training. It should be noted first that the
most computationally expensive operation for error
backpropagation is vector δ by matrix W computation (4). Such
operation can easily be performed without much additional
overhead using the same crossbar array hardware but with
reverse direction of computation. Other operations, e.g.
derivative calculations in (3) and (4), are performed at the
periphery of the array, and hence their relative contribution to
the total area is expected to shrink as the crossbar arrays get
larger (which will happen for more practical applications). The
most challenging operation in batch training is calculation and
storing of temporal weight increments which must be
performed for each weight of the array. If the network does not
have to be retrained frequently, one solution would be to
implement this operation off-chip. Investigation of better solutions, which would, e.g., combine the small overhead of stochastic Manhattan Rule training with the high classification performance of batch training, is our immediate goal.
In summary, we proposed a training approach based on
Manhattan Rule algorithm for multilayer perceptron networks
implemented with memristive crossbar circuits. The
classification performance of the proposed approach was
evaluated on Proben1 benchmark for batch and stochastic
modes of training and compared with state-of-the-art results.
We found that batch training results in better classification
performance and potentially faster tuning time among the two,
though at the price of significantly higher implementation
overhead.
Fig. 6. Classification performance for batch training with stuck-on-open and stuck-on-close devices. For all panels, the right vertical axis shows the percentage increase in the number of cases that did not converge to an acceptable solution, namely when the MCR remained above 0.1, 0.4, and 0.3 for Cancer1, Diabetes1, and Thyroid1, respectively, within 1500 training epochs. The results are averaged over 1500 runs.

Fig. 7. Classification performance as a function of the standard deviation in set and reset switching threshold voltages. The results are averaged over 1500 runs.

ACKNOWLEDGMENT

The authors would like to acknowledge helpful discussions with F. Alibart, O. Bichler, C. Gamrat, K. K. Likharev, G. Snider, and D. Querlioz.

REFERENCES

[1] F. Alibart, E. Zamanidoost, and D. B. Strukov, "Pattern classification by memristive crossbar circuits using ex-situ and in-situ training", Nature Communications, vol. 4, Jun. 2013.
[2] C. Farabet, B. Martini, B. Corda, P. Akselrod, E. Culurciello, and Y. LeCun, "NeuFlow: A runtime reconfigurable dataflow processor for vision", in 2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2011, pp. 109–116.
[3] J. Lu, S. Young, I. Arel, and J. Holleman, "A 1TOPS/W analog deep machine-learning engine with floating-gate storage in 0.13 um CMOS", in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014 IEEE International, 2014, pp. 504–505.
[4] P. Merolla, J. Arthur, F. Akopyan, N. Imam, R. Manohar, and D. S. Modha, "A digital neurosynaptic core using embedded crossbar memory with 45pJ per spike in 45nm", in 2011 IEEE Custom Integrated Circuits Conference (CICC), 2011, pp. 1–4.
[5] M. Prezioso, F. Merrikh-Bayat, B. Hoskins, G. Adam, K. K. Likharev, and D. B. Strukov, "Training and operation of integrated neuromorphic network based on metal-oxide memristors", available online at http://arxiv.org/abs/1412.0611.
[6] D. Chabi, D. Querlioz, W. Zhao, and J.-O. Klein, "Robust learning approach for neuro-inspired nanoscale crossbar architecture", J. Emerg. Technol. Comput. Syst., vol. 10, no. 1, pp. 5:1–5:20, Jan. 2014.
[7] Y. Kim, Y. Zhang, and P. Li, "A digital neuromorphic VLSI architecture with memristor crossbar synaptic array for machine learning", in SOC Conference (SOCC), 2012 IEEE International, 2012, pp. 328–333.
[8] D. Querlioz, O. Bichler, and C. Gamrat, "Simulation of a memristor-based spiking neural network immune to device variations", in The 2011 International Joint Conference on Neural Networks (IJCNN), 2011, pp. 1775–1781.
[9] C. Yakopcic and T. M. Taha, "Energy efficient perceptron pattern recognition using segmented memristor crossbar arrays", in The 2013 International Joint Conference on Neural Networks (IJCNN), 2013, pp. 1–8.
[10] K. K. Likharev, "Hybrid CMOS/nanoelectronic circuits: Opportunities and challenges", Journal of Nanoelectronics and Optoelectronics, vol. 3, no. 3, pp. 203–230, Dec. 2008.
[11] K. K. Likharev, "CrossNets: Neuromorphic hybrid CMOS/nanoelectronic networks", Science of Advanced Materials, vol. 3, no. 3, pp. 322–331, Jun. 2011.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks", in NIPS'12, 2012, pp. 1097–1105.
[13] J. H. Lee and K. K. Likharev, "Defect-tolerant nanoelectronic pattern classifiers", Int. J. Circ. Theor. Appl., vol. 35, no. 3, pp. 239–264, May 2007.
[14] Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, "Efficient BackProp", in Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K.-R. Müller, Eds. Springer Berlin Heidelberg, 2012, pp. 9–48.
[15] D. B. Strukov and H. Kohlstedt, "Resistive switching phenomena in thin films: Materials, devices, and applications", MRS Bulletin, vol. 37, no. 2, pp. 108–114, Feb. 2012.
[16] F. Merrikh-Bayat, B. Hoskins, and D. B. Strukov, "Phenomenological modeling of memristive devices", Appl. Phys. A, vol. 118, no. 3, pp. 779–786, Jan. 2015.
[17] F. Alibart, L. Gao, B. D. Hoskins, and D. B. Strukov, "High precision tuning of state for memristive devices by adaptable variation-tolerant algorithm", Nanotechnology, vol. 23, no. 7, p. 075201, Feb. 2012.
[18] W. Schiffmann, M. Joost, and R. Werner, "Optimization of the backpropagation algorithm for training multilayer perceptrons", Technical Report, University of Koblenz, 1994.
[19] L. Prechelt, "PROBEN1: A set of benchmarks and benchmarking rules for neural network training algorithms", Technical Report 21/94, Fakultät für Informatik, Universität Karlsruhe, 1994.