Online Dynamic Value System for Machine Learning


Fourth International Symposium on Neural Networks (ISNN)

June 3-7, 2007, Nanjing, China


Haibo He, Stevens Institute of Technology

Janusz A. Starzyk, Ohio University

Outline

Introduction;

Online curve fitting principles;

Network architecture and operation;

Simulation analysis;

Conclusion and future research;


Introduction: Why is a value system important?

From traditional AI to the embodied intelligence:

Rat neurons can fly an F-22 jet (picture source: www.space.com)

[Figure: interaction loop between the intelligent machine and the environment. The machine receives state s_t and reward r_t and responds with action a_t; the environment then returns state s_{t+1} and reward r_{t+1}.]

* Make value judgments according to received information;

* Develop sensory-motor coordination to actively interact with the environment;

* Develop an internal value system and apply it to decision making;


Introduction: What is the value signal?

Different applications define the value signal differently; here we define it as the expected reward or desired objective for the machine's action.

Motivation: Goal-driven learning

To provide a mechanism that lets intelligent machines dynamically estimate the value function in reinforcement learning (distinguish "good" from "bad"), thereby guiding the machines to adjust their actions to achieve the goal.

Source: Biologically inspired robot at CWRU http://biorobots.cwru.edu/


Introduction: Self-Organizing Learning Array (SOLAR)

Characteristics:

* Self-organization

* Sparse and local interconnections

* Dynamically reconfigurable

* Online data-driven learning

[Figure: a SOLAR neuron with inputs from the system clock, its nearest-neighbour neuron, remote neurons, and other neurons. II: information index; ID: information deficiency.]


How can a value system help here?

A supervisor is not always available in the learning environment

Uncertain external environment (no prior knowledge)

A supervisor is not always necessary in the learning environment

* Example: how learning happens in a one-year-old baby

Source: Sociable humanoid robots: Kismet at MIT Artificial Intelligence Lab


The challenges

Unstructured environment/uncertain information

Limited availability of information;

Information ambiguity and redundancy;

High dimensionality of the data set;

Time variability of the information;


Online dynamic curve fitting

Consider dynamic adjustment of the fit function described by a linear combination of the selected base functions:

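$$Y = a_1 \varphi_1 + a_2 \varphi_2 + \dots + a_q \varphi_q$$

$$[a_1 \;\; a_2 \;\; \dots \;\; a_q]^T = (\Phi^T \Phi)^{-1} \Phi^T Y$$

$$
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_q \end{bmatrix}
=
\begin{bmatrix}
\sum_{i=1}^{n} \varphi_{1i}\varphi_{1i} & \sum_{i=1}^{n} \varphi_{1i}\varphi_{2i} & \cdots & \sum_{i=1}^{n} \varphi_{1i}\varphi_{qi} \\
\sum_{i=1}^{n} \varphi_{1i}\varphi_{2i} & \sum_{i=1}^{n} \varphi_{2i}\varphi_{2i} & \cdots & \sum_{i=1}^{n} \varphi_{2i}\varphi_{qi} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{n} \varphi_{1i}\varphi_{qi} & \sum_{i=1}^{n} \varphi_{2i}\varphi_{qi} & \cdots & \sum_{i=1}^{n} \varphi_{qi}\varphi_{qi}
\end{bmatrix}^{-1}
\begin{bmatrix}
\sum_{i=1}^{n} \varphi_{1i} Y_i \\ \sum_{i=1}^{n} \varphi_{2i} Y_i \\ \vdots \\ \sum_{i=1}^{n} \varphi_{qi} Y_i
\end{bmatrix}
$$

Storage requirements:

$$s = \frac{q(q+1)}{2} + q$$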
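The online fit can be illustrated with a minimal Python sketch. It assumes the running sums of basis-function products and of basis-function-times-target products are the only state kept, so no past samples are stored. The class name `OnlineLeastSquares`, the quadratic basis, and the test curve are hypothetical choices made for illustration, not taken from the slides.

```python
import numpy as np


class OnlineLeastSquares:
    """Online fit of Y = a_1*phi_1 + ... + a_q*phi_q from running sums."""

    def __init__(self, basis):
        # basis: list of q callables phi_j(x); the choice of basis is an assumption.
        self.basis = basis
        q = len(basis)
        self.gram = np.zeros((q, q))  # running sums of phi_j * phi_k
        self.moment = np.zeros(q)     # running sums of phi_j * Y

    def features(self, x):
        return np.array([phi(x) for phi in self.basis], dtype=float)

    def update(self, x, y):
        """Fold one new sample into the running sums."""
        phi = self.features(x)
        self.gram += np.outer(phi, phi)
        self.moment += phi * y

    def coefficients(self):
        # lstsq also copes with a singular matrix while fewer than q samples exist.
        return np.linalg.lstsq(self.gram, self.moment, rcond=None)[0]

    def predict(self, x):
        return float(self.features(x) @ self.coefficients())

    def storage(self):
        """Distinct values to keep: upper triangle of the symmetric matrix plus the vector."""
        q = len(self.basis)
        return q * (q + 1) // 2 + q


# Hypothetical quadratic basis: 1, x, x^2.
fit = OnlineLeastSquares([lambda x: 1.0, lambda x: x, lambda x: x * x])
for x in np.linspace(-1.0, 1.0, 21):
    fit.update(x, 2.0 + 3.0 * x - x * x)  # noise-free samples of a known curve
coeffs = fit.coefficients()
```

For simplicity the sketch stores the full q-by-q matrix; `storage()` reports how many of those values are distinct, which for q = 3 gives 3(3+1)/2 + 3 = 9.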


Three-curve fitting versus single-curve fitting

[Figure: two panels plotting value against data dimension, comparing a single fitted curve with the upper, neutral, and lower curves. Points A and B are marked in both panels.]

Three-curve fitting:

* Neutral curve: a least-squares fit (LSF) to all the data samples in the space.

* Upper curve: fits only the data points that lie above the neutral curve.

* Lower curve: fits only the data points that lie below the neutral curve.
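A batch version of the three curves can be sketched as follows. This is an illustration only: it assumes polynomial curves fitted with `numpy.polyfit`, and the function name `three_curve_fit`, the degree, and the sample data are hypothetical.

```python
import numpy as np


def three_curve_fit(x, y, degree=1):
    """Return (upper, neutral, lower) polynomial coefficients, highest power first."""
    # Neutral curve: least-squares fit to all samples.
    neutral = np.polyfit(x, y, degree)
    residual = y - np.polyval(neutral, x)
    above = residual > 0
    below = residual < 0
    # Upper and lower curves: fit only the points on that side of the neutral curve.
    # Each side needs at least degree + 1 points for the fit to be determined.
    upper = np.polyfit(x[above], y[above], degree)
    lower = np.polyfit(x[below], y[below], degree)
    return upper, neutral, lower


# Illustrative data: a line with offsets alternating between +1 and -1.
x = np.arange(10, dtype=float)
offsets = np.where(np.arange(10) % 2 == 0, 1.0, -1.0)
y = 2.0 * x + offsets
upper, neutral, lower = three_curve_fit(x, y)
```

With this data the upper and lower curves recover the two offset lines, so the gap between them reflects how widely the samples spread around the neutral fit.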


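Differential-based voting:

$$d_{1i} = v_{ni} - v_{ui}, \qquad d_{2i} = v_{ni} - v_{li}$$

where $d_i$ is the combined differential obtained from $d_{1i}$ and $d_{2i}$.

Decision integration:

$$w_i = \frac{1}{d_i}, \qquad v_{vote} = \frac{\sum_{i=1}^{k} v_{ni}\, w_i}{\sum_{i=1}^{k} w_i}$$

[Figure: value plotted against input. For one input data point, the upper, neutral, and lower curves give V_ui, V_ni, and V_li; d_1i and d_2i are the gaps between the neutral curve and the upper and lower curves.]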
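The weighted vote can be sketched in Python. The rule that combines d_1i and d_2i into d_i is an assumption here (the mean of their magnitudes); the function name `differential_vote`, the `eps` guard against division by zero, and the sample values are also hypothetical.

```python
import numpy as np


def differential_vote(v_upper, v_neutral, v_lower, eps=1e-12):
    """Combine k neutral-curve values using weights w_i = 1 / d_i."""
    v_upper = np.asarray(v_upper, dtype=float)
    v_neutral = np.asarray(v_neutral, dtype=float)
    v_lower = np.asarray(v_lower, dtype=float)
    d1 = v_neutral - v_upper
    d2 = v_neutral - v_lower
    # Assumption: d_i is the mean magnitude of the two gaps.
    d = (np.abs(d1) + np.abs(d2)) / 2.0
    w = 1.0 / np.maximum(d, eps)  # eps avoids dividing by zero when the curves coincide
    v_vote = float(np.sum(v_neutral * w) / np.sum(w))
    return v_vote, w


# Three hypothetical processing elements; the third has the narrowest band.
v_vote, w = differential_vote(
    v_upper=[1.5, 3.0, 2.2],
    v_neutral=[1.0, 2.0, 2.0],
    v_lower=[0.5, 1.0, 1.8],
)
```

Under this assumption, an element whose upper and lower curves lie close to its neutral curve receives a larger weight and so pulls the vote toward its own value.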


Implementation of three-curve fitting (TCF)

[Figure: value plotted against data dimension, showing a newly received point (V_true) and the neutral-curve value V_ni at the same input. The neutral and upper curves are drawn before the new point is received and after modification; the lower curve is kept unchanged.]

Pseudo code:

{ New data sample comes;
  Modify the neutral curve;
  Difference = v_ni - v_true;
  If (Difference >= 0)
    { Modify the lower curve;
      Keep the upper curve unchanged; }
  else
    { Modify the upper curve;
      Keep the lower curve unchanged; }
  end }
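The update rule in the pseudo code can be sketched in Python. This is a minimal illustration that assumes each curve is a straight line maintained from running sums; the class names `RunningFit` and `ThreeCurveFit` and the sample stream are hypothetical.

```python
import numpy as np


class RunningFit:
    """Least-squares line y = a0 + a1*x maintained from running sums."""

    def __init__(self):
        self.gram = np.zeros((2, 2))
        self.moment = np.zeros(2)
        self.count = 0

    def update(self, x, y):
        phi = np.array([1.0, x])
        self.gram += np.outer(phi, phi)
        self.moment += phi * y
        self.count += 1

    def predict(self, x):
        coeffs = np.linalg.lstsq(self.gram, self.moment, rcond=None)[0]
        return float(coeffs[0] + coeffs[1] * x)


class ThreeCurveFit:
    def __init__(self):
        self.upper = RunningFit()
        self.neutral = RunningFit()
        self.lower = RunningFit()

    def update(self, x, v_true):
        self.neutral.update(x, v_true)  # modify the neutral curve
        difference = self.neutral.predict(x) - v_true
        if difference >= 0:
            self.lower.update(x, v_true)  # upper curve stays unchanged
        else:
            self.upper.update(x, v_true)  # lower curve stays unchanged


# Illustrative stream: a line with offsets alternating between +1 and -1.
tcf = ThreeCurveFit()
for i in range(100):
    x = 0.1 * i
    tcf.update(x, 2.0 * x + (1.0 if i % 2 == 0 else -1.0))
```

The first two samples lie on the neutral curve almost exactly, so which side they fall on is arbitrary; every later sample is routed to the upper or lower curve by the sign of the difference.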


Value system architecture

A pipelined dynamic architecture:

[Figure: layered network of processing elements (PEs). Data samples and the value signal enter through channels that reach all the processing elements in each layer. Data PEs form the DPN and information PEs form the IPN; the two are linked by bidirectional signal channels and a communication channel. Data PE i supplies a fitted value V_ni and a weight W_i; the information PEs accumulate the sums Σ w_i v_i and Σ w_i layer by layer (v_1, w_1, ..., v_l, w_l) to produce the final value.]


Inside a value system

[Figure: a processing element receives the value signal, Input 1, and Input 2. An input space transform function feeds a curve fitting block. The fitted value goes to differential voting; the transform function output goes to another PE's input.]


Simulation analysis

Financial data analysis: bank prime loan rate prediction

Data sets are available from: www.forecasts.org

Input:

* Monthly bank prime loan rate;
* Discount rate;
* Federal funds rate;
* Ten-year treasury constant maturity rate;

Output:

* Next month's bank prime loan rate

Training period:

* January 1995 to December 2000

Testing period:

* February 2001 to September 2002

"Market is unpredictable":

* Random Walk Hypothesis;
* Efficient Market Hypothesis;
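The pairing of each month's inputs with the following month's prime rate can be sketched as below. The function names `make_supervised` and `split_by_index`, the column order, and the placeholder numbers are assumptions for illustration; the numbers are not real rates.

```python
import numpy as np


def make_supervised(rates, prime_column=0):
    """Pair each month's indicators with the next month's prime rate.

    rates: array of shape (months, indicators), one row per month in time order.
    prime_column: index of the bank prime loan rate column (an assumption).
    """
    rates = np.asarray(rates, dtype=float)
    inputs = rates[:-1]                # months 0 .. n-2
    targets = rates[1:, prime_column]  # prime rate of months 1 .. n-1
    return inputs, targets


def split_by_index(inputs, targets, n_train):
    """Chronological split: the first n_train pairs train, the rest test."""
    train = (inputs[:n_train], targets[:n_train])
    test = (inputs[n_train:], targets[n_train:])
    return train, test


# Placeholder table of 6 months by 4 indicators (NOT real rates).
placeholder = np.arange(24, dtype=float).reshape(6, 4)
inputs, targets = make_supervised(placeholder)
(train_x, train_y), (test_x, test_y) = split_by_index(inputs, targets, n_train=3)
```

Splitting by position rather than at random keeps every test month later than every training month, which matches a setup where training and testing cover consecutive periods.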


Prediction results

[Figure: bank prime loan rate prediction by the value system (February 2001 to September 2002)]


Result comparison: MSE error

[Figure: bar chart "Performance comparison" showing MSE (vertical axis, 0 to 0.6) for learning accuracy and prediction accuracy. Methods compared:]

* Hybrid iterative evolutionary fuzzy neural network in [8]
* Genetic fuzzy neural learning algorithm in [9]
* Proposed value system


Conclusion and future research

* Provides a mechanism that lets intelligent machines dynamically estimate the value function;
* Dynamic, online, data-driven learning;
* No backpropagation required;
* Three-curve fitting method;
* General framework for different implementations


Future research

* Dynamic self-reconfiguration;
* Investigate different input transformations and base functions;
* Hardware implementation;
* Facilitate goal-driven learning;
* Integration with reinforcement learning within a realistic environment;

A promising future?

Ray Kurzweil predicted:

We achieve one Human Brain capability for $1,000 around the year 2023, for one cent around the year 2037;

We achieve one Human Race capability for $1,000 around the year 2049, for one cent around the year 2059.

From "The Law of Accelerating Returns" by Ray Kurzweil (source: www.kurzweilai.net)
