Why BICA is Necessary for AGI
Biologically Inspired Cognitive Architecture
Alexei Samsonovich (George Mason University)
Questions and Answers

Q: Why is BICA necessary for achieving AGI?
A: Because we need a humanlike universal learner.

Q: What kind of BICA?
A: One that describes human cognition and learning at a higher symbolic level.

Q: What are the minimal starting requirements, i.e., the "critical mass"?
A: The "critical mass" includes human-like mental states that can act on each other.
A few words about GMU-BICA
Mental states in GMU-BICA

A mental state in GMU-BICA includes:
- Contents of awareness, represented by schemas
- A token representing an instance of the Self who is aware (labeled I-Now, I-Next, etc.)

[Diagram: Episodic memory holds frozen mental states of the Self (I-Past-1 through I-Past-4). Working memory holds active mental states of the Self (I-Goal, I-Meta, I-Past, I-Imagine, I-Previous, I-Now, I-Next), with contents such as scenarios, past experience, analyses, intermediate goal situations, prospective memories, stimulus satisfaction, ideas, visual input, intents, scheduled actions, and expectations.]
Mental state dynamics in working memory of GMU-BICA: an example

[Diagram: Working memory contains mental states such as I-Now, I-Next, He-Now, and He-Next, each holding tokens for self and other (me, me', he, he') bound by schemas (S). The states are connected to input/output and to a semantic memory containing schemas S, R, P, Q.]
Examples of types of mental states in GMU-BICA (a possible snapshot of working memory)
I-Goal
I-Meta-1
I-Alt-Goal
I-Imagined-2
I-Meta-2
I-Imagined-1
I-Imagined-3
I-Subgoal
I-Next
I-Previous
I-Now
I-Next-Next
I-False-Belief
I-Past
I-Past-Revised
She-Past-Prev
I-Detail-1
She-Past
I-Detail-2
He-Now
I-Feel
He-Now-I-Now
Models that we need to integrate
Self-regulated learning (SRL) model of problem solving

Model of meta-cognition (Cox & Raja, 2007): three levels, the Ground Level, the Object Level, and the Meta-Level. Perception and Doing link the Ground Level to the Object Level (Reasoning); Monitoring and Control link the Object Level to the Meta-Level (Metareasoning).

SRL in problem solving (based on Zimmerman & Kitsantas, 2006):

1. Forethought
   - Task analysis: identifying data, goals and barriers; selecting strategies and planning
   - Goal orientation: assessing expected self-efficacy; predicting outcomes and confidence; understanding values of outcomes; setting attitudes toward (sub)goals
2. Performance (action selection)
   - Self-observation: introspective self-monitoring; metacognitive self-analysis
   - Self-control: self-instruction and attention control; self-imagery, self-experimentation
3. Self-reflection
   - Self-judgment: standard-based self-evaluation
   - Self-reaction: self-adaptation, conflict resolution; causal self-attribution of outcomes; self-rewarding, updating self-image

"…there is a need to build a unified model of metacognition and self-regulated learning that incorporates key aspects of existing models, assumptions, processes, mechanisms, and phases" (Azevedo and Witherspoon, AAAI BICA-2008)
Result: A Mental-state model of SRL
[Figure: the three SRL phases (Forethought, Performance, Reflection) implemented by a chain of mental states. I-Meta and I-Meta-Next monitor the phases; I-Now, I-Next, and I-Next-Next carry the task analysis, self-beliefs, self-control, self-observation, self-evaluation, and self-reactions; I-Goal, I-Detail-1, and I-Detail-2 hold the homework task, the selected strategic steps, their enactment, and the result validation.]

Worked example. HW problem: solve ax+b = c for x.

- Task analysis: identify the goal (solve for x, i.e., obtain a formula x = …) and select strategic steps (a plan): isolate x using the subtraction property, then the division property.
- Self-beliefs: self-efficacy, goal orientation, intrinsic interest.
- Performance: enact the selected steps to solve the problem, self-recording each step using a worksheet:
  ax+b = c     | -b
  ax = c-b     | /a
  x = (c-b)/a
- Self-judgment: compare the result x = (c-b)/a to the standard (a template: x = …, with no x in the r.h.s.). There is a match.
- Self-reaction: if the standard is met, the skill is mastered, self-reward, exit; if not, attribute the failure to ineffective strategy selection, loop reentry.
(Samsonovich, De Jong & Kitsantas, to appear in International Journal of Machine Consciousness, 1, June 2009)
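The phase loop above can be sketched as a small program. This is a minimal sketch in Python with assumed coefficients; it is my illustration of the forethought/performance/self-reflection cycle, not the GMU-BICA implementation, and all function names are invented.

```python
def forethought():
    """Task analysis: identify the goal (a formula x = ...) and plan steps."""
    return ["subtract b from both sides", "divide both sides by a"]

def performance(a, b, c):
    """Enact the selected steps, self-recording each one on a 'worksheet'."""
    worksheet = [f"{a}x + {b} = {c}"]
    rhs = c - b                      # | -b  (subtraction property)
    worksheet.append(f"{a}x = {rhs}")
    x = rhs / a                      # | /a  (division property)
    worksheet.append(f"x = {x}")
    return x, worksheet

def self_reflection(x, a, b, c):
    """Self-judgment: compare the result against the standard a*x + b == c."""
    return abs(a * x + b - c) < 1e-9

a, b, c = 2.0, 3.0, 11.0
plan = forethought()
x, worksheet = performance(a, b, c)
if self_reflection(x, a, b, c):
    print("met standard, skill mastered:", worksheet[-1])   # exit the loop
else:
    print("did not meet standard: loop reentry")            # re-plan
```

On failure the program would re-enter forethought with a revised plan, mirroring the "loop reentry" branch on the slide.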
Final take-home message
- How to build a universal learner?
- We need to bootstrap from a "critical mass".
- How to build a "critical mass" (supposing we know what it is)?

There are at least three approaches to building a "critical mass":

1. Incremental bottom-up engineering: without a good stimulus, this will take forever.
2. A brittle rapid prototype demo: a useless toy (BICA Phase I).
3. An SRL assistant (finessing the lower levels by students!): a feasible and practically useful stepping stone.

Watch for the AAAI 2009 Fall Symposia (BICA, SRL-metacog).

Thank you.
End of Talk 1 / Beginning of Talk 2
A Cognitive Map of Natural Language
Alexei Samsonovich (George Mason University)
Theory
Introducing two notions of a semantic cognitive map (SCM):

- A "strong" SCM with a dissimilarity metric: A is closer to B than to C if and only if A is more similar to B than to C.
- A "weak" SCM that captures both synonym and antonym relations: A and B are synonyms, A and C are antonyms; unrelated pairs are left unconstrained.
Background: Method of building an SCM

1. Represent symbols (words, documents, etc.) as vectors in R^n.
2. Optimize the vector coordinates to minimize an energy function H.
3. Do a truncated SVD of the resultant distribution.

Candidate energy functions (x, y ∈ Q are the vectors in R^n; S is the set of synonym pairs; A is the set of antonym pairs; x·y is the dot product; c is a constant):

(a) H = Σ_{(x,y)∈A} x·y - Σ_{(x,y)∈S} x·y + Σ_{x∈Q} |x|^4
(b) H = Σ_{(x,y)∈S} |x-y|^2 + Σ_{(x,y)∈A} x·y + Σ_{x∈Q} |x|^4
(c) H = Σ_{(x,y)∈S} |x-y|^2 - Σ_{(x,y)∈A} |x-y|^2 + Σ_{x∈Q} (|x|^4 - |x|^2)
(d) H = Σ_{(x,y)∈S} |x-y|^2 + Σ_{(x,y)∈Q} exp(-|x-y|^2/c)

(Samsonovich & Ascoli, Proceedings of AGI-2007)
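Step 3 of the method, the truncated SVD of the optimized distribution, can be sketched as follows. This is a minimal sketch on random stand-in vectors; in the actual pipeline the input would be the optimized word vectors from step 2, and the counts (500 words, 10 dimensions, 3 retained components) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))   # stand-in for 500 optimized word vectors in R^10
X -= X.mean(axis=0)              # center the distribution before the SVD

# Truncated SVD: keep only the first k principal components.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
pcs = U[:, :k] * s[:k]           # coordinates of each word on PC#1..PC#3

print(pcs.shape)                 # (500, 3)
```

Because the singular values are returned in decreasing order, PC#1 captures the most variance of the distribution, matching the later slides where PC#1 (valence) dominates.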
Example: color map
- Sample N = 10,000 points on a sphere (A).
- Declare some pairs of points 'synonyms' (some of those that are close to each other).
- Declare some other pairs of points 'antonyms' (some of those that are separated far apart).
- Assign random coordinates to the points in 10-dimensional space (B).
- Apply an optimization procedure to the set of 10,000 random vectors in order to minimize the energy function H = Σ_{(x,y)∈A} x·y - Σ_{(x,y)∈S} x·y + Σ_{x∈Q} |x|^4.
- The result is the reconstructed spatial distribution of colors (C).
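The optimization step can be sketched as plain gradient descent on the dot-product energy with quartic confinement. This is a reduced sketch assuming N = 1,000 points (instead of 10,000), an arbitrary learning rate, and arbitrary closeness thresholds; it is my illustration, not the original procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy color-map experiment: sample points on a sphere, tag near pairs as
# 'synonyms' and far pairs as 'antonyms' (assumed thresholds).
N, DIM = 1000, 10
pts = rng.normal(size=(N, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

i, j = rng.integers(0, N, size=(2, 20000))
d = np.linalg.norm(pts[i] - pts[j], axis=1)
S = np.stack([i[d < 0.3], j[d < 0.3]], axis=1)   # close pairs -> synonyms
A = np.stack([i[d > 1.9], j[d > 1.9]], axis=1)   # far pairs  -> antonyms

# Gradient descent on H = sum_A x.y - sum_S x.y + sum_Q |x|^4,
# starting from random 10-D coordinates.
x = 0.1 * rng.normal(size=(N, DIM))
lr = 1e-3
for _ in range(200):
    grad = 4.0 * np.sum(x**2, axis=1, keepdims=True) * x  # d|x|^4/dx
    np.add.at(grad, A[:, 0], x[A[:, 1]])    # antonym term pushes pairs apart
    np.add.at(grad, A[:, 1], x[A[:, 0]])
    np.add.at(grad, S[:, 0], -x[S[:, 1]])   # synonym term pulls pairs together
    np.add.at(grad, S[:, 1], -x[S[:, 0]])
    x -= lr * grad

def cos_mean(P):
    """Mean cosine between the optimized vectors of the listed pairs."""
    u, v = x[P[:, 0]], x[P[:, 1]]
    return float(np.mean(np.sum(u * v, axis=1) /
                         (np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))))

print(cos_mean(S), cos_mean(A))  # synonyms should end up more aligned than antonyms
```

The quartic term keeps the vector norms bounded while the pair terms align synonyms and anti-align antonyms, which is what lets the spherical geometry be recovered.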
Geometric properties of the reconstructed color map are robust
with respect to variation of model parameters
Results for Synonym-Antonym
Dictionaries
Optimization results

Sorted words and antonym pairs (MS Word English).

Individual words at the extremes of each principal component:
- PC#1 (valence), positive: clear, well, accept, praise, support, good, right, respect, increase, improve; negative: decline, poor, stop, uncertain, fail, reject, sad, deny, vague, bad
- PC#2 (arousal), exciting/tough: stiff, tough, hard, heavy, serious, extreme, deep, loud, tense, intense; calming/easy: calm, soft, relaxed, mild, easy, gentle, modest, quiet, calmly, easygoing
- PC#3 (dominance), close/dominate: close, final, detain, restraint, confine, swallow, restrain, local, wait, compact; open/free: release, go, fire, free, freedom, independent, new, expose, far, brief

Antonym pairs at the extremes of each principal component:
- PC#1: accept, clear, good, support, praise, well, respect, continue, happy, right vs. decline, lose, poor, neglect, criticize, badly, deny, stop, sad, wrong
- PC#2: stiff, hard, fierce, tough, serious, tense, severe, strict, loud, heavy vs. relaxed, soft, calm, easy, mild, relax, gentle, easygoing, quiet, insignificant
- PC#3: restrain, close, stay, restraint, restricted, cushion, take somebody on, experienced, hold back, final vs. release, open, go, freedom, free, expose, fire, new, let go, first
Geometric characteristics of the SCM
(MS Word English)
Semantic characteristics of the SCM
Synonym pairs and antonym pairs, if mixed together, can be separated with 99% accuracy based on the angle between the vectors: an acute angle indicates synonyms, an obtuse angle indicates antonyms.

The semantics of the first 3 dimensions are more general than any words, yet clearly identifiable:
- PC#1: success, positive, clear, makes good sense
- PC#2: exciting, does not go easy
- PC#3: beginning, source, origin, release, liberation, exposure
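The acute/obtuse test amounts to checking the sign of the cosine between two word vectors. A minimal sketch; the 3-D coordinates below are made up for illustration (in the real map they come from the dictionary optimization), and only the sign test itself reflects the slide.

```python
import numpy as np

# Hypothetical 3-D SCM coordinates (invented for illustration).
scm = {
    "good": np.array([1.9, 0.2, 0.1]),
    "well": np.array([1.7, -0.1, 0.3]),
    "bad":  np.array([-1.8, 0.4, 0.0]),
}

def pair_relation(a, b):
    """Acute angle (cosine > 0) -> synonyms; obtuse (cosine < 0) -> antonyms."""
    cos = np.dot(scm[a], scm[b]) / (np.linalg.norm(scm[a]) * np.linalg.norm(scm[b]))
    return "synonyms" if cos > 0 else "antonyms"

print(pair_relation("good", "well"))  # synonyms
print(pair_relation("good", "bad"))   # antonyms
```

The 99% figure on the slide comes from applying exactly this kind of threshold to all dictionary pairs at once.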
PC-by-PC correlation across languages and datasets
WN
PC1
English PC2
PC3
PC4
MS
PC1
French PC2
PC3
PC4
MS
PC1
German PC2
PC3
PC4
MS
PC1
Spanish PC2
PC3
PC4
ANEW D1
D2
D3
PC1
0.73
-0.23
0.12
0.029
0.74
-0.01
-0.034
0.056
0.73
-0.081
0.049
-0.089
0.67
-0.20
0.17
0.0014
0.80
0.052
0.0085
MS English
PC2
PC3
0.20
-0.06
0.64
0.18
-0.13
0.57
-0.022 0.001
0.0057 0.0004
0.41
0.24
-0.33
0.37
0.066
-0.0058
0.037
0.025
0.21
0.16
-0.16
0.26
0.007
0.014
0.037
-0.046
0.45
-0.13
-0.056 0.46
0.33
0.19
-0.19
0.20
0.39
0.26
0.22
0.094
PC4
-0.031
0.22
0.13
0.30
0.034
0.14
0.0097
0.021
0.056
0.097
0.029
0.026
-0.014
0.14
0.066
0.18
0.21
0.22
-0.22
Canonical Corr. Coef.
0.78
0.72
0.63
0.52
0.75
0.54
0.49
0.27
0.78
0.57
0.46
0.24
0.71
0.62
0.60
0.45
0.83
0.55
0.37
24
Clustering of words in the first SCM
dimension: WordNet and ANEW vs. MS Word
Applications
Examples of “semantic twisting”
Sentiment analysis: 7 utterances automatically allocated on the SCM

1. Please, chill out and be quiet. I am bored and want you to relax. Sit back and listen to me.
2. Excuse me, sorry, but I cannot follow you and am falling asleep. Can we pause? I've got tired and need a break.
3. I hate you, stupid idiot! You irritate me! Get disappeared, or I will hit you!
4. What you are telling me is terrible. I am very upset and curious: what's next?
5. Wow, this is really exciting! You are very smart and brilliant, aren't you?
6. I like very much every word that you say. Please, please, continue. I feel like I am falling in love with you.
7. We have finally found the solution. It looks easy after we found it. I feel completely satisfied and free to go home.

(Samsonovich & Ascoli, in Proc. of AAAI 2008 Workshop on Preference Handling)
Sentiment analysis: mapping movie reviews as 'bags of words'

- Acquired 40+ reviews for each of three movies: Iron Man, Superhero and Prom Night, from the site www.mrqe.com.
- For each review, computed the average map coordinate of all identified indexed words and phrases.
- RESULT: statistics for PC#1 are consistent with the grades given to the movies in the reviews.

Iron Man: (1.95, 0.52), Superhero: (1.49, 0.36), Prom Night: (1.17, 0.42).
All differences are significant except PC#2 of Superhero vs. Prom Night.
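The per-review averaging can be sketched in a few lines. The mini-lexicon of coordinates below is hypothetical (real coordinates come from the optimized map), and the tokenization is a simplification of the indexing used on the slide.

```python
import numpy as np

# Hypothetical mini-lexicon of SCM coordinates (PC#1 = valence, PC#2 = arousal).
scm = {
    "good": (1.8, 0.2), "exciting": (1.1, 1.5), "brilliant": (2.0, 0.9),
    "bad": (-1.7, 0.3), "boring": (-1.2, -1.1), "terrible": (-1.9, 0.8),
}

def map_review(text):
    """Bag-of-words mapping: average the coordinates of all indexed words."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    vecs = [scm[w] for w in words if w in scm]
    return tuple(np.mean(vecs, axis=0)) if vecs else None

pc1, pc2 = map_review("An exciting, brilliant film that is never boring.")
print(pc1 > 0)  # a positive-valence review
```

Averaging over all indexed words is what makes the review's PC#1 coordinate track its overall grade, as reported for the three movies above.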
CONCLUSIONS

- The weak SCM is low-dimensional, yet distinguishes almost all synonym-antonym pairs.
- Therefore, the SCM can be used as a metric system for semantics (at least for the most general part of semantics).
- SCM dimensions have clearly identifiable semantics that make sense in virtually all domains of knowledge.
- The SCM can be used to guide the process of thinking in symbolic cognitive architectures.
- The map semantics and geometric characteristics are consistent across corpora and across languages.
- Other potential applications include sentiment analysis, semantic twisting, document search, and validation of translation.

Credits to Giorgio A. Ascoli, Rebecca F. Goldin, Thomas T. Sheehan

Thank you.