Test the response of your Hebb net and Perceptron on the following noisy version (Exercise 2.6(d), p. 98).
ADAPTIVE LINEAR NEURON
Typically uses bipolar (1, -1) activations for its input signals and its target output
The weights are adjustable, and it has a bias whose activation is always 1
Architecture of an ADALINE: input units X1, …, Xn and a bias unit (activation 1) feed a single output unit Y through adjustable weights w_1, …, w_n and bias b.
In general, an ADALINE can be trained using the delta rule, also known as the least mean squares (LMS) or Widrow-Hoff rule
The delta rule can also be used for single-layer nets with several output units; the ADALINE is a special case with only one output unit
The activation of the unit is its net input (the identity function is used as activation function during training)
The learning rule minimizes the mean squared error between the activation and the target value
This allows the net to continue learning on all training patterns, even after the correct output value is generated
After training, if the net is being used for pattern classification in which the desired output is either a +1 or a -1, a threshold function is applied to the net input to obtain the activation:
If net_input ≥ 0 then activation = 1, else activation = -1
Step 0: Initialize all weights and bias (small random values are usually used).
Set learning rate α (0 < α ≤ 1).
Step 1: While stopping condition is false, do Steps 2-6.
Step 2: For each bipolar training pair s:t, do Steps 3-5.
Step 3: Set activations for input units: x_i = s_i, i = 1, …, n.
Step 4: Compute net input to output unit: y_in = b + Σ x_i w_i.
Step 5: Update weights and bias, i = 1, …, n:
w_i(new) = w_i(old) + α(t - y_in)x_i
b(new) = b(old) + α(t - y_in)
Step 6: Test stopping condition:
If the largest weight change that occurred in Step 2 is smaller than a specified tolerance, then stop; otherwise continue.
It is common to take a small value for α initially, e.g. α = 0.1.
If α is too large, the learning process will not converge; if α is too small, learning will be extremely slow.
A practical range is 0.1 ≤ nα ≤ 1.0, where n is the number of input units.
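As a concrete illustration, here is a minimal Python sketch of the training algorithm above. The function name train_adaline, the tolerance and max_epochs parameters, and the random initialization range are illustrative assumptions, not taken from the text.

```python
import random

def train_adaline(patterns, alpha=0.1, tolerance=1e-4, max_epochs=1000):
    """patterns: list of (inputs, target) pairs with bipolar inputs and targets."""
    n = len(patterns[0][0])
    # Step 0: small random initial weights and bias; learning rate alpha.
    w = [random.uniform(-0.5, 0.5) for _ in range(n)]
    b = random.uniform(-0.5, 0.5)
    for _ in range(max_epochs):                               # Step 1
        largest_change = 0.0
        for s, t in patterns:                                 # Step 2
            y_in = b + sum(si * wi for si, wi in zip(s, w))   # Steps 3-4
            delta = alpha * (t - y_in)                        # Step 5: delta (LMS) rule
            for i in range(n):
                w[i] += delta * s[i]
                largest_change = max(largest_change, abs(delta * s[i]))
            b += delta
            largest_change = max(largest_change, abs(delta))
        if largest_change < tolerance:                        # Step 6: stopping condition
            break
    return w, b

# Example: AND with bipolar inputs and targets (see the examples later in this section).
and_pairs = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
print(train_adaline(and_pairs))   # weights tend toward about (0.5, 0.5), bias toward about -0.5
```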
After training, an ADALINE unit can be used to classify input patterns. If the target values are bivalent (binary or bipolar), a step function can be applied as the activation function for the output unit.
Step 0: Initialize all weights (using the weights from training).
Step 1: For each bipolar input vector x, do Steps 2-4.
Step 2: Set activations for input units to x.
Step 3: Compute net input to output unit: y_in = b + Σ x_i w_i.
Step 4: Apply the activation function:
y = f(y_in) = 1 if y_in ≥ 0; -1 if y_in < 0.
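A corresponding sketch of this application procedure (again with illustrative names), thresholding the net input to get a bipolar classification:

```python
def adaline_classify(x, w, b):
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))   # Step 3: net input
    return 1 if y_in >= 0 else -1                     # Step 4: threshold

# With the binary-input AND weights derived in the example below (w1 = w2 = 1, bias = -3/2):
print([adaline_classify(x, [1, 1], -1.5) for x in [(1, 1), (1, 0), (0, 1), (0, 0)]])
# -> [1, -1, -1, -1]
```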
ADALINE for AND function: binary input, bipolar targets
(x1 x2 t)
(1 1 1)
(1 0 -1)
(0 1 -1)
(0 0 -1)
The delta rule in ADALINE is designed to find weights that minimize the total error
E = Σ (p = 1 to 4) [x1(p) w1 + x2(p) w2 + w0 - t(p)]²
where x1(p) w1 + x2(p) w2 + w0 is the net input to the output unit for pattern p and t(p) is the associated target for pattern p.
ADALINE for AND function: binary input, bipolar targets (continued)
The weights that minimize this error are w1 = 1, w2 = 1, w0 = -3/2.
The separating line is x1 + x2 - 3/2 = 0.
ADALINE for AND function: bipolar input, bipolar targets
(x1 x2 t)
(1 1 1)
(1 -1 -1)
(-1 1 -1)
(-1 -1 -1)
The delta rule in ADALINE is designed to find weights that minimize the total error
E = Σ (p = 1 to 4) [x1(p) w1 + x2(p) w2 + w0 - t(p)]²
where x1(p) w1 + x2(p) w2 + w0 is the net input to the output unit for pattern p and t(p) is the associated target for pattern p.
ADALINE for AND function: bipolar input, bipolar targets (continued)
The weights that minimize this error are w1 = 1/2, w2 = 1/2, w0 = -1/2.
The separating line is 1/2 x1 + 1/2 x2 - 1/2 = 0.
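As a quick sanity check (a sketch; the helper name total_error is assumed, not from the text), the stated weights can be plugged into the error expression E above and compared with another weight set:

```python
def total_error(patterns, w1, w2, w0):
    # E = sum over the 4 patterns of (net input - target)^2
    return sum((w0 + x1 * w1 + x2 * w2 - t) ** 2 for (x1, x2, t) in patterns)

and_bipolar = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, -1)]
print(total_error(and_bipolar, 0.5, 0.5, -0.5))   # 1.0 -- the minimum value of E
print(total_error(and_bipolar, 1.0, 1.0, -1.0))   # 4.0 -- larger, as expected
```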
Example 3: ADALINE for AND NOT function: bipolar input, bipolar targets
Example 4: ADALINE for OR function: bipolar input, bipolar targets
The delta rule changes the weights so as to minimize the difference between the net input to the output unit and the target value.
It does this by reducing the error for each training pattern, one pattern at a time.
The delta rule for the I-th weight (for each pattern) is
Δw_I = α(t - y_in)x_I
The squared error for a particular training pattern is
E = (t - y_in)².
E is a function of all the weights w_i, i = 1, …, n.
The gradient of E is the vector consisting of the partial derivatives of E with respect to each of the weights
The gradient gives the direction of most rapid increase in E
The opposite direction gives the most rapid decrease in the error
The error can be reduced by adjusting the weight w_I in the direction of -∂E/∂w_I.
Since y_in = b + Σ x_i w_i,
∂E/∂w_I = -2(t - y_in) ∂y_in/∂w_I = -2(t - y_in)x_I.
The local error will be reduced most rapidly by adjusting the weights according to the delta rule:
Δw_I = α(t - y_in)x_I
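The derivative above can be checked numerically. The following sketch (variable names and the sample pattern are illustrative choices) compares the analytic gradient -2(t - y_in)x_I with a finite-difference estimate:

```python
def squared_error(w, b, x, t):
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    return (t - y_in) ** 2

w, b, x, t = [0.3, -0.2], 0.1, (1, -1), 1.0
y_in = b + sum(xi * wi for xi, wi in zip(x, w))

I, eps = 0, 1e-6
analytic = -2 * (t - y_in) * x[I]                 # -2(t - y_in) x_I from the derivation
w_shift = list(w); w_shift[I] += eps
numeric = (squared_error(w_shift, b, x, t) - squared_error(w, b, x, t)) / eps
print(analytic, numeric)                          # both close to -0.8 for these values
```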
For several output units, the delta rule for the weight from the I-th input unit to the J-th output unit (for each pattern) is
Δw_IJ = α(t_J - y_in_J)x_I
The squared error for a particular training pattern is
E = Σ (j = 1 to m) (t_j - y_in_j)².
E is a function of all the weights w_ij, i = 1, …, n; j = 1, …, m.
The error can be reduced by adjusting the weight w_IJ in the direction of -∂E/∂w_IJ:
∂E/∂w_IJ = ∂/∂w_IJ Σ (j = 1 to m) (t_j - y_in_j)² = ∂/∂w_IJ (t_J - y_in_J)²,
since y_in_J is the only net input that depends on w_IJ.
The derivation continues on p. 88 of the textbook.
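A minimal sketch of one delta-rule pass for a single-layer net with several output units, following the rule above (the function name delta_update and the argument layout are assumptions):

```python
def delta_update(x, t, w, b, alpha=0.1):
    """x: inputs (length n); t: targets (length m); w[i][j]: weight from input
    unit I = i to output unit J = j; b: biases (length m). Updates w, b in place."""
    n, m = len(x), len(t)
    for j in range(m):
        y_in_j = b[j] + sum(x[i] * w[i][j] for i in range(n))
        step = alpha * (t[j] - y_in_j)
        b[j] += step
        for i in range(n):
            w[i][j] += step * x[i]     # delta w_IJ = alpha (t_J - y_in_J) x_I
```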
MANY ADAPTIVE LINEAR NEURONS (MADALINE)
Architecture of a MADALINE with two hidden ADALINEs and one output ADALINE: input units X1 and X2 feed hidden units Z1 and Z2 through weights w11, w12, w21, w22 and biases b1, b2; Z1 and Z2 feed the output unit Y through weights v1, v2 and bias b3.
The derivation of the delta rule for several output units shows that the training process is unchanged when several ADALINEs are combined.
The outputs z1 and z2 of the two hidden ADALINEs are determined by the signals they receive from the input units X1 and X2.
Each output signal is the result of applying a threshold function to the unit's net input.
y is a non-linear function of the input vector (x1, x2).
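A small sketch of this forward pass (names are illustrative):

```python
def f(x):                                    # bipolar threshold function
    return 1 if x >= 0 else -1

def madaline_output(x1, x2, w11, w21, b1, w12, w22, b2, v1, v2, b3):
    z1 = f(b1 + x1 * w11 + x2 * w21)         # hidden ADALINE Z1
    z2 = f(b2 + x1 * w12 + x2 * w22)         # hidden ADALINE Z2
    return f(b3 + z1 * v1 + z2 * v2)         # output ADALINE Y: non-linear in (x1, x2)
```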
Why do we need hidden units?
The hidden units Z1 and Z2 give the net computational capabilities not found in single-layer nets, but they also complicate the training process.
Two training algorithms:
MRI (MADALINE Rule I) – only the weights for the hidden ADALINEs are adjusted; the weights for the output unit are fixed.
MRII (MADALINE Rule II) – provides methods for adjusting all the weights in the net.
The weights v1 and v2 and the bias b3 that feed into the output unit Y are determined so that the response of Y is 1 if the signal it receives from either Z1 or Z2 (or both) is 1, and is -1 if both Z1 and Z2 send a signal of -1. The unit Y thus performs the logic function OR on the signals it receives from Z1 and Z2.
Set v1 = ½, v2 = ½ and b3 = ½ (see Example 2.19, the OR function).
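As a quick check of these values: if z1 = 1 and z2 = -1, then y_in = ½ + ½(1) + ½(-1) = ½ ≥ 0, so y = 1; if z1 = z2 = -1, then y_in = ½ - ½ - ½ = -½ < 0, so y = -1. Y therefore responds with 1 unless both hidden units send -1, which is exactly the OR function.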
MRI training example (bipolar inputs and targets):
x1 x2 t
1 1 -1
1 -1 1
-1 1 1
-1 -1 -1
Set α = 0.5.
Initial weights and biases:
Weights into Z1: w11 = .05, w21 = .2, b1 = .3
Weights into Z2: w12 = .1, w22 = .2, b2 = .15
Weights into Y: v1 = .5, v2 = .5, b3 = .5
(v1 = ½, v2 = ½ and b3 = ½ as in Example 2.19, the OR function; these are not adjusted by MRI.)
Step 0: Initialize weights: set v1, v2 and b3 as described above (these stay fixed); initialize the weights and biases of the hidden ADALINEs to small random values.
Set learning rate α (0 < α ≤ 1).
Step 1: While stopping condition is false, do Steps 2-8.
Step 2: For each bipolar training pair s:t, do Steps 3-7.
Step 3: Set activations for input units: x_i = s_i.
Step 4: Compute net input to each hidden ADALINE unit:
z_in1 = b1 + x1 w11 + x2 w21
z_in2 = b2 + x1 w12 + x2 w22
Step 5: Determine output of each hidden ADALINE:
z1 = f(z_in1)
z2 = f(z_in2)
Step 6: Determine output of net:
y_in = b3 + z1 v1 + z2 v2
y = f(y_in)
where f is the bipolar threshold function: f(x) = 1 if x ≥ 0; f(x) = -1 if x < 0.
Step 7: Update weights and bias if an error occurred for this pattern.
If t = y, no weight updates are performed; otherwise:
If t = 1, then update the weights on Z_J, the unit whose net input is closest to 0:
w_iJ(new) = w_iJ(old) + α(1 - z_in_J)x_i
b_J(new) = b_J(old) + α(1 - z_in_J)
If t = -1, then update the weights on all units Z_k that have positive net input:
w_ik(new) = w_ik(old) + α(-1 - z_in_k)x_i
b_k(new) = b_k(old) + α(-1 - z_in_k)
Step 8: Test stopping condition:
If the weight changes have stopped (or reached an acceptable level), or if a specified maximum number of weight-update iterations (Step 2) have been performed, then stop; otherwise continue.
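A minimal Python sketch of the MRI algorithm above for the two-hidden-unit MADALINE, started from the example's initial weights. Function and variable names, the epoch cap, and the stopping check are illustrative assumptions, not from the text.

```python
def f(x):                                     # bipolar threshold
    return 1 if x >= 0 else -1

def train_mri(patterns, w, b, v, b3, alpha=0.5, max_epochs=100):
    """patterns: list of ((x1, x2), t); w[j] = [w_1j, w_2j] into hidden unit Zj;
    b[j] = bias of Zj; v, b3 = fixed weights and bias into the output unit Y."""
    for _ in range(max_epochs):                                               # Step 1
        any_update = False
        for x, t in patterns:                                                 # Steps 2-3
            z_in = [b[j] + x[0] * w[j][0] + x[1] * w[j][1] for j in range(2)] # Step 4
            z = [f(zi) for zi in z_in]                                        # Step 5
            y = f(b3 + z[0] * v[0] + z[1] * v[1])                             # Step 6
            if t == y:                                                        # Step 7: no error
                continue
            any_update = True
            if t == 1:
                # update only the hidden unit whose net input is closest to 0
                j = min(range(2), key=lambda k: abs(z_in[k]))
                for i in range(2):
                    w[j][i] += alpha * (1 - z_in[j]) * x[i]
                b[j] += alpha * (1 - z_in[j])
            else:
                # t = -1: update every hidden unit with positive net input
                for j in range(2):
                    if z_in[j] > 0:
                        for i in range(2):
                            w[j][i] += alpha * (-1 - z_in[j]) * x[i]
                        b[j] += alpha * (-1 - z_in[j])
        if not any_update:                                                    # Step 8
            break
    return w, b

# The example above: bipolar training patterns, alpha = 0.5, initial weights from the table.
patterns = [((1, 1), -1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), -1)]
w = [[.05, .2], [.1, .2]]      # [w11, w21] into Z1 and [w12, w22] into Z2
b = [.3, .15]
print(train_mri(patterns, w, b, v=(.5, .5), b3=.5))
```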