Basic Mathematics for Understanding Artificial Neural Networks

Properties of ANNs are best described using the mathematics of linear algebra. This topic is an introduction to linear algebra essential for understanding ANN models.

Reference:

Rumelhart, D., et al., "Parallel Distributed Processing", Vol. 1, Ch. 9.

Or any book on linear algebra

Vectors

The workings of ANN models correspond to operations on vectors.

We adopt a more general and abstract view of vectors than the familiar spatial representation of arrows in space.

Most commonly, a vector is an ordered list of numbers.

Useful for representing patterns with attributes or components (vector elements).

Eg, the age, height and weight of a person form a 3D vector:

Joe: (45, 67, 180)

Vectors up to the 3rd dimension can be visualised graphically in space, with each axis representing one component.

In ANN models, the following are represented using vectors:

Pattern of numbers representing activation levels of neurons

Set of weights on the input lines to a neuron

Set of inputs to a neuron


Basic operations on vectors

Multiplication by Scalars (single real numbers)

A vector can be multiplied by a scalar by multiplying each of its components by the scalar.

Eg,

5 × (3, 1, 2) = (15, 5, 10)

Scalar multiplication lengthens or shortens a vector while keeping its direction unchanged (a negative scalar reverses the direction).

Two vectors that are scalar multiples of each other are said to be collinear.

Addition of vectors

Two or more vectors can be added by adding their corresponding components.

They must have the same number of components.

Vector addition is associative (the vectors can be grouped in any manner):

(A + B) + C = A + (B + C)

and also commutative (order is not important):

A + B = B + A
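As an illustration, here is a minimal Python sketch of these two operations (the function names are our own, for illustration only):

    def scale(c, v):
        # Multiply each component of vector v by the scalar c.
        return [c * x for x in v]

    def add(u, v):
        # Add two vectors component by component;
        # they must have the same number of components.
        assert len(u) == len(v)
        return [a + b for a, b in zip(u, v)]

    print(scale(5, [3, 1, 2]))  # [15, 5, 10]
    print(add([1, 2], [3, 2]))  # [4, 4]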


Linear combination of vectors

Given the vectors v1 = (1, 2), v2 = (3, 2), and u = (9, 10):

Can scalars c1 and c2 be found such that u = c1 v1 + c2 v2?

If so, then u is said to be a linear combination of vectors v1 and v2.

(Here the answer is yes: c1 = 3 and c2 = 2, since 3(1, 2) + 2(3, 2) = (9, 10).)
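Finding such coefficients amounts to solving a small system of linear equations. A sketch using NumPy (assumed available; not part of the original notes):

    import numpy as np

    # The columns of V are the vectors v1 and v2.
    V = np.array([[1, 3],
                  [2, 2]])
    u = np.array([9, 10])

    # Solve V c = u for the coefficients c = (c1, c2).
    c = np.linalg.solve(V, u)
    print(c)  # [3. 2.], i.e. u = 3 v1 + 2 v2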

In general, given a set v1, v2, v3, . . . , vn of vectors, a vector v is said to be a linear combination of the vectors vi if scalars c1, c2, . . . , cn can be found such that

v = c1 v1 + c2 v2 + . . . + cn vn

The set of all linear combinations of the vectors vi is called the set spanned by the vi.

Eg, the three vectors (1, 0, 0), (0, 1, 0) and (0, 0, 1) span all of 3D space, since any vector (a, b, c) can be written as the linear combination

a (1, 0, 0) + b (0, 1, 0) + c (0, 0, 1)

The three vectors (1, 0, 0), (0, 1, 0) and (0, 0, 1) are referred to as the standard basis for 3D space.

The coefficients a, b and c are referred to as coordinates.



Linear independence and n-dimensional space

We have seen (in the example above) that a set of three vectors spans 3D space.

Can we generalise to say that any set of n vectors spans an n-dimensional space?

Not for all vectors; in particular, not for vectors that lie in the same plane.

If in a set of n vectors at least one of them can be written as a linear combination of the others, the set is said to be linearly dependent; otherwise the set is linearly independent.

An n-dimensional space is defined to be the set of vectors spanned by a set of n linearly independent vectors.

We can now define the basis vectors of any n-dimensional space as the n linearly independent vectors whose linear combinations give all vectors in that space.

Can you verify this for the basis vectors of 3D space mentioned earlier?

Are the following vectors linearly independent?

a) (1, 1) and (1, 2)

b) (1, 1), (1, 2) and (3, 1)

Answers: a) independent; b) dependent (any three vectors in 2D space are linearly dependent).
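One way to check independence numerically is to compare the rank of the matrix formed from the vectors with the number of vectors; a sketch using NumPy:

    import numpy as np

    def independent(vectors):
        # The vectors are linearly independent iff the matrix
        # having them as rows has rank equal to their number.
        A = np.array(vectors)
        return np.linalg.matrix_rank(A) == len(vectors)

    print(independent([[1, 1], [1, 2]]))          # True  (a)
    print(independent([[1, 1], [1, 2], [3, 1]]))  # False (b)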


Vector inner products

The inner (or dot) product of two vectors is the sum of the products of the vector components.

Given two vectors with components u = (ux, uy, uz) and v = (vx, vy, vz), the inner product of u and v is defined to be

u . v = ux vx + uy vy + uz vz

An alternative definition of the dot product is

u . v = |u| |v| cos θ

where θ is the angle between the two vectors.

The inner product of two vectors results in a scalar.

For example, given two vectors A and B with components

Ax = 2, Ay = 3, Az = 0;  Bx = 1, By = 0, Bz = 2;

A . B = Ax Bx + Ay By + Az Bz = 2·1 + 3·0 + 0·2 = 2
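The same computation as a small Python sketch:

    def dot(u, v):
        # Inner (dot) product: sum of products of corresponding components.
        return sum(a * b for a, b in zip(u, v))

    A = [2, 3, 0]
    B = [1, 0, 2]
    print(dot(A, B))  # 2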


Angle between vectors

The 2nd definition of the inner product also provides a method for finding the angle between two vectors:

u . v = |u| |v| cos θ, so

cos θ = (u . v) / (|u| |v|)

When θ is close to zero the inner product has a large value, which is maximum for θ = 0.

The product is zero when θ = 90°, and minimum when θ = 180°.

When the inner product of two vectors is zero (θ = 90°), the vectors are said to be orthogonal.

The basis vectors of 3D space are an example of a set of orthogonal vectors.

Projection of a vector

The projection x of a vector v on a second vector w is a measure of how much v points in the direction of w:

x = |v| cos θ

The projection is zero if the two vectors are orthogonal.

The Cauchy-Schwarz inequality

|v . w| ≤ |v| |w|

gives an upper bound on the inner product.
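A sketch computing the angle between two vectors and the projection of one on another, using the formulas above:

    import math

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def norm(v):
        # Euclidean length |v|.
        return math.sqrt(dot(v, v))

    def angle_degrees(u, v):
        # From cos(theta) = u.v / (|u| |v|).
        return math.degrees(math.acos(dot(u, v) / (norm(u) * norm(v))))

    def projection(v, w):
        # Length of the projection of v on w: |v| cos(theta) = v.w / |w|.
        return dot(v, w) / norm(w)

    print(angle_degrees([1, 0, 0], [0, 1, 0]))  # 90.0 (orthogonal)
    print(projection([2, 3, 0], [1, 0, 0]))     # 2.0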


Use of vectors in modelling an artificial neuron

[Fig. A processing unit receiving input from n input units, with input activations v1, v2, . . . , vn and weights w1, w2, . . . , wn on the input lines, producing output activation u.]

u    activation of the output unit (a scalar)
vi   activation of the ith input unit, the ith component of the n-dimensional vector v
wi   weight associated with the ith input line, the ith component of the vector w

Operation of the neuron:

The activation of each input unit is multiplied by the weight on its link, and these products are added to give the activation of the output unit:

u = w1 v1 + w2 v2 + . . . + wn vn

This can be represented as u = w . v

In other words, the activation of the output unit is the inner product of its weight vector with the vector of input activations.
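A minimal sketch of such a neuron (the weight and input values are illustrative):

    def neuron(w, v):
        # Output activation u = w . v, the inner product
        # of the weight vector with the input activations.
        return sum(wi * vi for wi, vi in zip(w, v))

    w = [0.5, -1.0, 2.0]  # weight vector
    v = [1.0, 0.0, 3.0]   # input activations
    print(neuron(w, v))   # 6.5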


Matrices

Multiplication of an input vector by a weight matrix is one of the key operations performed by ANNs.

Matrices are used as mapping functions from one vector space to another.

An m × n matrix is a 2D array of real numbers with m rows and n columns.

The element in the ith row and jth column is denoted as aij.

M = [ a11  a12  ...  a1n
      a21  a22  ...  a2n
      ...  ...  ...  ...
      am1  am2  ...  amn ]

For example,

A = [ 1  0
      0  1 ]

is a 2 × 2 matrix, and

B = [ 2  3  1
      1  4  0 ]

is a 2 × 3 matrix.


Square matrix

A square matrix has the same number of rows and columns. For example,

M = [ 2  1  0
      1  0  1
      2  1  0 ]

is a 3 × 3 square matrix.

Diagonal matrix

A diagonal matrix is a square matrix that has 0's for all elements except those along its main diagonal.

Eg,

M = [ 2  0  0
      0  7  0
      0  0  4 ]

Addition of matrices

Similar to vectors, matrices can be added by adding corresponding elements.

They must have the same number of rows and columns.


Multiplication of a vector by a matrix

A vector v can be multiplied by a matrix W to produce a new vector u .

Eg,

u = Wv = [ 3  4  5 ] [ 1 ]   [ 3·1 + 4·0 + 5·2 ]   [ 13 ]
         [ 1  0  1 ] [ 0 ] = [ 1·1 + 0·0 + 1·2 ] = [  3 ]
                     [ 2 ]

The rows of a matrix can be treated as row vectors.

The components of u are the inner products of v with the row vectors of W.

The dimensionality of v must equal the number of columns of W for the inner products to be possible.
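The same example using NumPy:

    import numpy as np

    W = np.array([[3, 4, 5],
                  [1, 0, 1]])
    v = np.array([1, 0, 2])

    # Each component of u is the inner product of a row of W with v.
    u = W @ v
    print(u)  # [13  3]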


How matrices can be used to analyse ANN models

Suppose there are n input neurons, each connected to m output neurons.

Activations of the output neurons are denoted by u1, u2, . . . , um.

Each output unit has its own weight vector wi.

[Fig. Input units with activations v1, v2, . . . , vn connected to output units with activations u1, u2, . . . , um; the weight matrix W maps the input vector v to the output vector u.]

The activation of an output unit is given by the inner product of its weight vector with the input vector, ui = wi . v.

If W is the matrix whose row vectors are the vectors wi, and u is the vector whose components are the ui, then using the rule for matrix multiplication, u = Wv.
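A sketch of this construction, stacking the weight vectors as the rows of W (the values are illustrative):

    import numpy as np

    w1 = np.array([0.5, -1.0, 2.0])  # weight vector of output unit 1
    w2 = np.array([1.0,  0.0, 1.0])  # weight vector of output unit 2
    W = np.vstack([w1, w2])          # row i of W is wi

    v = np.array([1.0, 0.0, 3.0])    # input activations
    u = W @ v                        # u[i] = wi . v
    print(u)                         # [6.5 4. ]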


Linear and Nonlinear Systems

Suppose a function f represents a system in which for each input x to the system, the output y is given by y = f(x).

The function f is said to be linear if for any inputs x1 and x2, and any real number c, the following two equations hold:

f(cx) = c f(x)     (1)

That is, if we multiply the input by some constant, then the output is multiplied by the same constant: output is proportional to input.

f(x1 + x2) = f(x1) + f(x2)     (2)

That is, if we know how the system responds separately to each of the inputs, then we can add the outputs produced separately to obtain the response to the sum of the inputs.

The above properties do not hold for nonlinear systems.

We know that for vector-matrix multiplication

W(av) = a(Wv)

W(u + v) = Wu + Wv

So a system in which the output is obtained from the input by matrix multiplication is also a linear system.

In other words, if a function that maps from one vector space to another vector space is linear, then it can be represented by matrix multiplication.
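A quick numerical check of the two linearity properties, with an arbitrary matrix and vectors:

    import numpy as np

    W = np.array([[3, 4, 5],
                  [1, 0, 1]])
    u = np.array([1.0, 2.0, 3.0])
    v = np.array([1.0, 0.0, 2.0])
    a = 2.5

    print(np.allclose(W @ (a * v), a * (W @ v)))    # True: W(av) = a(Wv)
    print(np.allclose(W @ (u + v), W @ u + W @ v))  # True: W(u+v) = Wu + Wv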


Nonlinear Systems

An ANN with one output unit computes the inner product of the weight vector and the input vector to obtain the activation of the output unit.

This is a linear system because of the linearity of the inner product.

We can set a fixed number as a threshold for the output unit such that if the product is greater than the threshold, the unit’s output is 1, otherwise it outputs a 0.

The use of a threshold is common in using the neuron to classify a pattern as belonging to one class or another.

Thresholding is used to transform the activation of a neuron into its output in a nonlinear way.
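A sketch of such a thresholded unit, with illustrative threshold values:

    def threshold_unit(w, v, theta):
        # Output 1 if the activation (inner product) exceeds
        # the threshold theta, and 0 otherwise.
        activation = sum(wi * vi for wi, vi in zip(w, v))
        return 1 if activation > theta else 0

    w = [0.5, -1.0, 2.0]
    v = [1.0, 0.0, 3.0]                     # activation = 6.5
    print(threshold_unit(w, v, theta=5.0))  # 1
    print(threshold_unit(w, v, theta=7.0))  # 0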


It is also possible to have a probabilistic threshold, where the likelihood of the output being zero is a probabilistic function of how close the activation is to the threshold.

In biological systems it is often found that the system does not respond until the input stimulus reaches a threshold, and then responds in a linear manner, as shown in the diagram below:

[Fig. Output vs activation: no response below the threshold, then a linear response.]

All physical systems have a dynamic range.

The output keeps increasing linearly until it reaches a maximum response level M.

The response does not exceed this level, as shown below:

[Fig. Output vs activation: the linear response saturates at the maximum level M.]

In some systems, the output appears to approach the maximum response level M in an asymptotic way.

The response curve has an ‘S’ shape and is called a sigmoid.

Sigmoid transfer functions are widely used for computing the outputs of artificial neurons.
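A sketch of the logistic sigmoid, one common choice of such a transfer function:

    import math

    def sigmoid(x):
        # S-shaped curve approaching 0 and 1 asymptotically.
        return 1.0 / (1.0 + math.exp(-x))

    for x in (-5.0, 0.0, 5.0):
        print(x, round(sigmoid(x), 4))  # 0.0067, 0.5, 0.9933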
