Novamente: An Integrative Architecture for General Intelligence Moshe Looks , Ben Goertzel

advertisement
Novamente: An Integrative Architecture for General Intelligence
Moshe Looks1, Ben Goertzel2 and Cassio Pennachin2
1. Object Sciences Corporation
6359 Walker Lane Suite 100
Alexandria, VA 22310
mlooks@objectsciences.com
2. Novamente LLC
11160 Veirs Mill Road, L-15 Suite 161
Wheaton, MD 20902
ben@novamente.net, cassio@novamente.net
Abstract
The Novamente AI Engine is briefly reviewed. The overall
architecture is unique, drawing on system-theoretic ideas
regarding complex mental dynamics and associated emergent patterns. We describe how these are facilitated by a
novel knowledge representation which allows diverse cognitive processes to interact effectively. We then elaborate the
two primary cognitive algorithms used to construct these
processes: probabilistic term logic (PTL), and the Bayesian
Optimization Algorithm (BOA). PTL is a highly flexible inference framework, applicable to domains involving uncertain, dynamic data, and autonomous agents in complex environments. BOA is a population-based optimization algorithm which can incorporate prior knowledge. While originally designed to operate on bit strings, our extended version also learns programs and predicates with variable
length and tree-like structure, used to represent actions, perceptions, and internal state. We detail some of the specific
dynamics and structures we expect to emerge through the
interaction of the cognitive processes, outline our approach
to training the system through experiential interactive learning, and conclude with a description of some recent results
obtained with our partial implementation, including practical work in bioinformatics, natural language processing, and
knowledge discovery.
Introduction and Motivation
The primary motivation behind the Novamente AI Engine
is to build a system that can achieve complex goals in
complex environments, a synopsis of the definition of intelligence given in (Goertzel 1993). The emphasis is on the
plurality of goals and environments. A chess-playing program is not a general intelligence, nor is a data mining engine, nor is a program that can cleverly manipulate a researcher-constructed microworld. A general intelligence
must be able to carry out a variety of different tasks in a
variety of different contexts, generalizing knowledge
between contexts and building up a context and task independent pragmatic understanding of itself and the world.
First among the tenets underlying the design is an understanding of mind as the interpenetration of a physical sys-
tem with an abstract set of patterns, where a pattern is
quantified in terms of algorithmic information theory (Goertzel 1997, Chaitin 1987). In essence, a pattern in an entity is an abstract program that is smaller than the entity,
and can rapidly compute the entity or its approximation.
For instance, a pattern in a drawing of a sine curve might
be a program that can compute the curve from a formula.
The understanding of mind as pattern ties in naturally
with the interpretation of intelligence, at the most abstract
level, as a problem of finding compact programs that encapsulate patterns in the environment, in the system itself,
and in behavior. This concept was first seriously elaborated
in Solomonoff’s work on the theory of induction (Solomonoff 1964, Solomonoff 1978), and has been developed
more rigorously and completely in Hutter’s recent work
(Hutter 2000), which integrates a body of prior work on
algorithmic information theory and statistical decision theory to formalize the concept of general intelligence. Novamente can be proven to be arbitrarily intelligent according to Hutter’s definition, if given sufficient computational
resources. Baum has expounded the cognitive science implications of this perspective on intelligence (Baum 2004).
Another tenet of the Novamente approach is the realization that intelligence most naturally emerges through situated and social experience. Abstract thoughts and representations are facilitated through the recognition and manipulation of patterns in environments with which a system has
sensorimotor interaction; see for example (Boroditsky and
Ramscar 2002). This interaction, embodied in the right
cognitive architecture, leads to autonomy, experiential interactive learning, and goal-oriented self-modification; a
mind continually adapts based on what it learns from its
environment and the entities it interacts with.
The final tenet is a view of the internal organization of a
mind as a collection of semi-autonomous agents embedded
in a common substrate (Goertzel 1993). In this vein, Novamente is less extreme than some alternative approaches
such as Minsky’s Society of Mind (Minsky 1986), where
agents are largely independent. Novamente is based on the
idea that minds are self-organizing systems of agents,
which interact with some degree of individual freedom, but
are also constrained by an overall architecture involving a
degree of inbuilt executive control, which nudges the selforganizing dynamics towards emergent hierarchical organization.
These abstract principles are coherently unified in a philosophy of cognition called the psynet model (Goertzel,
1997), which is foundational to Novamente, and provides a
moderately detailed theory of the emergent structures and
dynamics in intelligent systems. In the model, mental functions such as perception, action, reasoning and procedure
learning are described in terms of interactions between
agents. Any mind, at a given interval of time, is assumed to
have a particular goal system, which may be expressed
explicitly and/or implicitly. Thus, the dynamics of a cognitive system are understood to be governed by two main
forces: self-organization and goal-oriented behavior. More
specifically, several primary dynamical principles are posited, including:
Association. Patterns, when given attention, spread some
of this attention to other patterns that they have previously
been associated with in some way. Furthermore, there is
Peirce’s “law of mind” (Peirce 1892), which could be
paraphrased in modern terms as stating that the mind is an
associative memory network, whose dynamics dictate that
every idea in the memory is an active agent, continually
acting on those ideas with which the memory associates it.
Differential attention allocation. Patterns that have been
valuable for goal-achievement are given more attention,
and are encouraged to participate in giving rise to new patterns.
Pattern creation. Patterns that have been valuable for
goal-achievement are mutated and combined with each
other to yield new patterns.
Credit Assignment. Habitual patterns in the system that
are found valuable for goal-achievement are explicitly reinforced and made more habitual.
Furthermore, the network of patterns in the system must
give rise to the following large-scale emergent structures:
Hierarchical network. Patterns are habitually in relations
of control over other patterns that represent more specialized aspects of themselves.
Heterarchical network. The system retains a memory of
which patterns have previously been associated with each
other in any way.
Dual network. Hierarchical and heterarchical structures
are combined, with the dynamics of the two structures
working together harmoniously.
Self structure. A portion of the network of patterns forms
into an approximate (fractal) image of the overall network
of patterns.
Structures and Algorithms
The psynet model does not tell you how to build a mind,
only, in general terms, what a mind should be like. It
would be possible to create many different AI designs
based loosely on the psynet model; one example of this is
the Webmind AI Engine developed in the late 1990’s (Go-
ertzel et al. 2000, Goertzel 2002). Novamente, as a specific
system inspired by the psynet model, owes many of its
details to the limitations imposed by contemporary hardware performance and software design methodologies.
Furthermore, Novamente is intended to utilize a minimal
number of different knowledge representation structures
and cognitive algorithms.
Regarding knowledge representation, we have chosen an
intermediate-level atom network representation which
somewhat resembles classic semantic networks but has
dynamic aspects that are more similar to neural networks.
This enables a breadth of cognitive dynamics, but in a way
that utilizes drastically less memory and processing than a
more low-level, neural network style approach. The details
of the representation have been designed for compatibility
with the system’s cognitive algorithms.
Regarding cognition, we have reduced the set of fundamental algorithms to two: Probabilistic Term Logic (PTL)
and the Bayesian Optimization Algorithm (BOA). The
former deals with the local creation of pieces of new
knowledge from existing pieces of knowledge; the latter is
more oriented towards global optimization, and creates
new knowledge by integrating large amounts of existing
knowledge. These two algorithms themselves interact in
several ways, representing the necessary interdependence
of local and holistic cognition.
Having reduced the basic knowledge representations and
cognitive algorithms to this minimal core, the diverse functional specializations required for pragmatic general intelligence are provided by the introduction of a number of
node and link types in the atom network, and a high-level
architecture consisting of a number of functionally specialized lobes each deploying the same structures and algorithms for particular purposes (see Figure 1 below).
Knowledge Representation
Knowledge representation in Novamente involves two levels, the explicit and the emergent. This section focuses on
the explicit level; the emergent level involves selforganizing structures called maps, and will be discussed
later, after the fundamental cognitive dynamics have been
introduced.
Explicit knowledge representation in Novamente involves discrete units (atoms) of several types: nodes, links,
and containers, which are ordered or unordered collections
of atoms. Each atom is associated with a truth value, indicating, roughly, the degree to which it correctly describes
the world. Novamente has been designed with several different types of truth values in mind; the simplest of these
consists of a pair of value denoting probability and weight
of evidence. All atoms also have an associated attention
value, indicating how much computational effort should be
expended on them. These contain two values, specifying
short and long term importance levels.
Novamente node types include tokens which derive their
meaning via interrelationships with other nodes, nodes
representing perceptual inputs into the system (e.g., pixels,
Figure 1. Each component is a Lobe, which contains multiple atom types and mind agents. Lobes may span multiple machines, and are controlled by schemata which may be adapted/replaced by new ones learned by Schema Learning, as decided
by the Schema Learning Controller. The diagram shows a configuration with a single interaction channel, that contains sensors, actuators and linguistic input; real deployments may contain multiple channels, with different properties.
points in time, etc.), nodes representing moments and
intervals of time, and procedures (described below). Links
represent relationships between atoms, such as fuzzy set
membership, probabilistic logical relationships, implication, hypotheticality, and context. The particular types and
subtypes used, and the justifications for their inclusion, are
omitted for brevity.
Procedures in Novamente are objects which produce an
output, possibly based on a sequence of atoms as input.
They may contain generalized combinator trees. These are
computer programs written in sloppy combinatory logic, a
language that we have developed specifically to meet the
needs of tightly integrated inference and learning.
Combinatory logic (CL) is a simple yet Turing-complete
computational system (Curry and Feys 1958). The basic
units are combinators, which are higher order functions
that always produce new higher order functions when applied. Beyond combinators, our language contains numbers, arithmetic operators, looping constructs, and conditionals. Procedures may also contain embedded references
to other procedures. Two unique features of this language
that are advantageous for our purposes are that programs
can be expressed as binary trees where the program elements are contained in the leaves, and that variables are not
necessary (though they may be introduced where useful).
Furthermore, we have generalized the evaluation system of
combinatory logic so there are no type restrictions on programs, allowing them to be easily modified and evolved by
Novamente’s cognitive processes (hence sloppy). A full
exposition of the language is omitted for brevity; see
(Looks, Goertzel, and Pennachin 2004).
Schemata and predicates are procedures that output at-
oms and truth values, respectively. Special-purpose predicates, instead of containing combinator trees, represent
specific queries that report to the Novamente system some
fact about its own state. Predicates may also be designated
as goal nodes, in which case the system orients towards
making them true.
Cognitive Algorithms
Novamente cognitive processes make use of two main algorithms, Probabilistic Term Logic (PTL), and the Bayesian Optimization Algorithm (BOA), described below.
Probabilistic Term Logic
PTL is a highly flexible inference framework, applicable to
many different situations, including inference involving
uncertain, dynamic data and/or data of mixed type, and
inference involving autonomous agents in complex environments. It was designed specifically for use in Novamente, yet also has applicability beyond the Novamente
framework; see (Goertzel et al. 2004) for a full exposition.
The goals motivating the development of PTL were the
desire to have an inference system that:
• Operates consistently with probability theory, when deployed within any local context (which may be adaptively identified).
• Deals rapidly (but not always perfectly accurately) with
large quantities of data, yet allows arbitrarily accurate
and careful reasoning on smaller amounts of information, when this is deemed appropriate.
• Deals well with the fact that different beliefs and ideas
are bolstered by different amounts of evidence (for an
explanation of how traditional probabilistic models fail
here, see Wang 1993).
• Enables robust, flexible analogical inference between
different domains of knowledge (Indurkhya 1992)
• Does not require a globally consistent probability model
of the world, but is able to create locally consistent models of local contexts, and maintain a dynamically-almostconsistent overall world-model, dealing gracefully with
inconsistencies as they occur.
• Encompasses both abstract, precise mathematical reasoning and more speculative hypothetical, inductive, and/or
analogical reasoning.
• Encompasses the inference of both declarative and procedural knowledge.
• Deals with inconsistent initial premises by dynamically
iterating into a condition of "reasonable almostconsistency and reasonable compatibility with the premises", thus, for example, perceiving sensory reality in a
way compatible with conceptual understanding, in the
manner portrayed by Gestalt psychology (Kohler, 1993)
and developed in the contemporary neural network literature, see (Haikonen 2003).
• Makes most humanly simple inferences appear brief,
compact and simple. For a sustained argument that term
logic exceeds predicate logic in this regard, see (Sommers and Englebretsen, 2000).
One difference between PTL and standard probabilistic
frameworks is that PTL deals with multivariable truth values. Its minimal truth value object has two components:
strength and weight of evidence. Alternately, it can use
probability distributions (or discrete approximations
thereof) as truth values. This, along with the fact that PTL
PTL does not assume that all probabilities are estimated
from the same sample space, makes a large difference in
the handling of various realistic inference situations.
Another difference is PTL’s awareness of context. The
context used by PTL can be universal (everything the system has ever seen), local (only the information directly
involved in a given inference), or many levels between.
This provides a way of toggling between more rigorous
and more speculative inference, and also a way of making
inference consistent within a given context even when a
system’s overall knowledge base is not entirely consistent.
First-order PTL deals with probabilistic inference on
(asymmetric) inheritance and (symmetric) similarity relationships, where different Novamente link types are used
to represent intensional versus extensional relationships
(Wang, 1995). The inference rules here are deduction
(AB, BC |- A C), inversion (Bayes rule), similarityto-inheritance-conversion, and revision (which merges
different estimates of the truth value of the same atom).
Each inference rules comes with its own quantitative truth
value formula, derived using probability theory and related
considerations. Analogical inference is obtained as a combination of deductive and inversive inference, and via their
effects on map dynamics (described later).
Higher-order PTL deals with inference on links that
point to links rather than nodes, and on predicates and
schemata. The truth value functions here are the same as in
first-order PTL, but the interpretations of the functions are
different. Variable-free inference using combinators and
inference using explicit variables and quantifiers are both
supported; the two styles may be freely intermingled.
The Bayesian Optimization Algorithm
BOA is a population-based optimization algorithm that
works on bit strings. BOA significantly outperforms the
genetic algorithm on a range of simple optimization problems by maintaining a centralized probabilistic model of
the population it is evolving (Pelikan, Goldberg, and
Cantú-Paz 1999; Pelikan 2002). When a candidate population has been evaluated for fitness, BOA seeks to uncover
dependencies between the variables that characterize
“good” candidate solutions (e.g., the correct value for position 5 in the genome depends on the value at position 7),
and adds them to its model. In this way, it is hoped that
BOA will explicitly discover and utilize probabilistic
“building blocks”, which are then used to generate new
candidate solutions to populate the next generation. The
basic algorithm is as follows (adapted from Pelikan 2002):
(1)
(2)
(3)
(4)
Generate a random initial population P(0).
Use the best instances in P(t) to learn a model M(t).
Generate a new set of instances O(t) from M(t).
Create P(t+1) by merging O(t) and P(t) according to
some criteria.
(5) Iterate steps (2) through (4) until termination criteria
are satisfied.
This algorithm tends to preserve good collections of
variable assignments throughout the evolution, and can
explore new areas of the search space in a more directed
and focused way than GA/GP, while retaining the positive
traits of a population-based optimization algorithm (diversity of candidate solutions and non-local search).
For Novamente, we have extended BOA to evolve programs, written in our sloppy combinatory logic representation, rather than bit strings (Looks, Goertzel, and Pennachin 2004). Previous work by Ocenasek (Ocenasek
2002) has extended binary BOA to fixed-length strings
with non-binary discrete and continuous variables. There is
a fundamental difference between learning fixed length
strings and programs. In the former, one is evolving individuals with a fixed level of complexity and functionality.
In order to evolve non-trivial program trees however, one
must rely on incremental progress; a simpler program accretes complexity over time until it is correct. While BOA
as described above is effective at optimizing and preserving small components, it does not innately lead to the addi-
tion of new components. In order to remedy this, we have
added probabilistic variables to the instance generation
process that, when activated, have effects similar to crossover in genetic programming; see (Looks, Goertzel, and
Pennachin 2004) for details and examples.
We have also begun using BOA to discover surprising
patterns in large bodies of data, using an approach we call
pattern mining. In this application, BOA is given a number
of predefined predicates, such as isTall(X),
loves(X,Y), isMale(X), etc. Patterns are logical
combinations of predicates, e.g., isTall(X) AND
isMale(X). Based on the overall philosophy behind Novamente, patterns are evaluated based on “interestingness”
which is composed of two factors: pattern-intensity, and
novelty relative to the system’s current knowledge base.
Pattern-intensity refers to how well a pattern compresses
regularities in the system’s knowledge; this can be quantified as the difference between the actual frequency of the
expression, and the frequency that would be computed
assuming probabilistic independence. In a domain consisting of random men and women, the example given above
would be intense, because tallness correlates with maleness. If this correlation were not known to the system nor
easily derivable by the system, then it would be significant
and acceptably novel, thus being considered “interesting”.
This pattern mining approach might run into scalability
issues, as the predicate space tends to be very large. We
can resolve this problem by encoding the fact that varying
degrees of similarity exist between predicates. When similarity is meaningfully quantified, an important cognitive
mechanism used in creative thought called slippage comes
into play, where ideas are transformed by substituting one
concept for another in an intelligent, context-dependant
fashion (Hofstadter 1986). We incorporate prior similarity
information by embedding predicates in real spaces, so that
similar predicates are embedded close to each other. When
a refinement of this approach is used, BOA can construct
new procedures that make use of existing ones, leading to
hierarchical design.
A number of factors make our modified BOA variant
advantageous for use within Novamente. The centralized
probabilistic model used can be constructed with the aid of
PTL inference and prior knowledge, allowing them to be
used in instance generation. Contrariwise, fitness evaluation of instances generated by BOA with a particular
model can be used to revise existing knowledge in the system, and infer new knowledge. As with most evolutionary
techniques, BOA can also be used to perform a number of
learning tasks inside Novamente, such as categorization,
unsupervised clustering of atoms, and function learning, by
using appropriate fitness functions.
Cognitive Processes
This section presents the most important cognitive processes that take place in Novamente. All of the cognitive
processes described below are encapsulated within one or
more mind agents; software objects which dynamically
update the atoms in the system on an ongoing basis.
In the present Novamente code, mind agents are coded
directly in C++. However, our intention is to ultimately
replace these C++ objects with schemata written in sloppy
combinatory logic, which will enable Novamente to reason
about and modify its own cognitive mechanisms, thus
allowing thoroughly self-modifying and self-improving
general intelligence. The execution of this plan awaits only
the implementation of some optimizations in the schema
execution framework.
Inference
The most direct application of PTL is to seek interrelations
between sets of important, apparently interrelated atoms,
leading to the creation of links and predicates joining important atoms. As a compact way of storing/computing
when entities are interrelated, dimensional embeddings is
used. Additionally, important predicates are evaluated, and
links are constructed between them. This process is the
primary creator of relations between atoms in Novamente.
Learning of Cognitive Schemata
A critical issue here is inference control, which is provided
by a combination of hard-coded algorithms and learned
programs called cognitive schemata. A key milestone in
Novamente intelligence will occur when BOA+PTL learning is able to learn cognitive schemata of complexity equal
to that of Novamente mind agents. At this point, Novamente will be able to improve its own cognitive processes, which should lead to a phase of exponentially increasing intelligence.
Pattern Mining and Concept Creation
Pattern mining is performed on all of the atoms in the system, and new predicates are created encapsulating these
patterns. It is also used to create new schemata through the
combination of existing ones. Pattern mining can also be
restricted to create new nodes representing logical combinations of existing nodes.
Goal-Directed Behavior
Schemata that are expected to achieve current goals are
executed, the connection between the schemata and the
goals having been found through inference. Furthermore,
inference is used to search for schemata that will achieve
particular goals in important contexts, including, as mentioned above, abstract cognitive control schemata.
System Maintenance
As time passes, system status and goal satisfaction are reevaluated, new percepts are fed into the system, and inference is used to perform context-dependant time-decay on
knowledge. Atoms with the lowest long term importance
are deleted from the system. System parameters are adjusted to optimize the system’s behavior and prevent its
degradation.
Attention Allocation
Emergent Structures
Novamente, at any given time, will possess a huge number
of atoms. In order to cope with restrictions on computational resources, it is critical that the system be able to intelligently focus its attention on the subset of atoms that is
most important at the moment.
Attention allocation is done through the application of
inference and predicate learning to activity tables, which
record when the different cognitive processes are applied
and to which atoms. The goal is to intelligently update the
short and long term importance levels as time passes; the
former in order to move from focusing on one set of atoms
to focusing on another set, and the latter to assign credit
based on the utility of carrying out the cognitive processes
on different atoms, and of determining which schemata
lead to goal fulfillment in different contexts.
Short and long term importance are updated using the
same criterion but on different time scales, based on the
distinction between what Novamente is thinking about and
what Novamente considers important over time.
Importance enters the system via atoms associated with the
perception of events in the outside world, and atoms expected to lead to goal fulfillment. Included in the latter
category is the meta-level goal of adding valuable knowledge to the system; if thinking about something has been
useful in the past, it may be useful to think about it in the
future. Importance is differentially spread along links via
inference, based on the link type.
An important aspect of credit assignment is dealing with
false causality; for example, roosters often crow prior to
the sun rising. If the system observes this, and wants to
cause the sun to rise, it may well cause a rooster to crow.
But once this fails, or if the system holds background
knowledge indicating that roosters crowing is not likely to
cause the sun to rise, then this will be invoked by inference
to discount the strength of the implication between roostercrowing to sun-rising, so that it will not be strong enough
to guide schema execution.
The entire Novamente design is structured in order to give
rise implicitly to the advanced representations and behaviors described in the next section. However, there are several processes which are explicitly geared towards their
creation. As a form of unsupervised learning, clustering is
performed on the atoms in the system with BOA. Inference
and pattern mining are performed on the activity table to
find complex recurrent patterns of activity, which and then
use them to guide future activity, and sometimes create
predicates embodying them. These recurrent patterns of
activity are called maps, and are the primary high-level
structures described in the next section.
Perception and Action
In Novamente, perception and action are fully integrated
with the rest of cognition. The system will initially be provided with modality-appropriate predicates and schemata,
such as low-level visual processing and basic movement
commands. These will be written in sloppy combinatory
logic, so that they can be modified and augmented via
BOA and PTL through experiential interactive learning,
described later.
Since perceptual processing will occur within the same
regime as the rest of the system, focus can iteratively shift
between perception and cognition, leading to a balance of
bottom-up and top-down constraints in tasks such as object
recognition, as described in Gestalt psychology (Kohler
1992, Haikonen 2003). For ambiguous percepts such as the
Necker cube (the two-dimensional shadow of a wire-frame
cube), this process need not converge.
Emergent Representation and Dynamics
Perhaps the subtlest aspect of the Novamente design is the
interaction whereby attention allocation dynamics are used
to drive inference and learning, which feed back fluidly to
direct attention allocation in turn. This allows the definition of maps; sets of Atoms that tend to be activated together, or tend to be activated according to a certain pattern, such as an oscillation or a strange attractor.
Generally speaking, the same types of knowledge are
represented by individual atoms, as well as by large maps
of atoms (some atoms gain meaning only via their coordinated activity involving other atoms). This is complementary to different tasks; atom level representation is crisp,
while map level representation is more flexible and nonbrittle. Below, we describe a few of the map types that can
emerge in Novamente.
A static map, the simplest kind, is a collection of atoms
that are strongly interconnected with Hebbian links; such
atoms in such a map will clearly tend to be acted on simultaneously, due to the spreading of short term importance.
Schema maps, or distributed schemata, represent complex
behaviors, and consist primarily of schemata. For example,
when executing a complex motor movement, a sequence of
schemata may be executed, with the precise timing and
parameters depending on input from perception and
internal state, both of which affect the attention allocation
process. A memory map consists largely of nodes denoting
specific entities, and the relationships between them.
Many of the processes implemented explicitly on the
atom level can emerge implicitly on the map level. For
example, the collection of links between the individual
atoms of two maps can be seen as a higher level structure
linking the two maps together, and will manifest many of
the same dynamics that occur along single links, in a more
flexible, complex, and context-aware fashion.
Focused attention in inference is particularly decisive for
map formation; it causes relatively small maps to form, as
well as hierarchical maps of a certain nature. If a certain set
of nodes has been held within focused attention, meaning
that many predicates embodying combinations of them
have been constructed, evaluated, and reasoned on, this
leads to the construction of Hebbian links between them,
forming a map. Focused attention in inference allows the
system to minimize the use of independence assumptions,
thus improving the accuracy of PTL. This process is quite
expensive, and scales exponentially, so focused attention
cannot hold too many items within it at one time.
While the maps formed via focused attention will generally be relatively small, their members may be nodes
grouping other nodes together, which may themselves be
involved in maps. In this way, hierarchical maps may form
via clusters of clusters. Since the clusters on each level
may be interconnected, this is a manifestation of the dual
network structure mentioned earlier.
Experiential Interactive Learning
In practice an intelligent system is not just a thinking machine, it is a control system embedded in a world, using
cognition to carry out real-world tasks that it judges important. Not only are perception and action intrinsically critical, they may also serve as a foundation for more abstract
abilities (e.g., Boroditsky and Ramscar 2002), including
communication, control of abstract cognition, and selfmodification.
Experiential interactive learning gives a mind a rich and
workable understanding of certain basic concepts that are
indispensable for making sense of itself and the world.
This understanding is expressed not as a short list of basic
facts about the world, but rather a rich network of relationships involving a short list of basic concepts. In Novamente, this corresponds to flexible concept maps
centered around a key collection of nodes.
The notion of these basic concepts has a long history;
the form that had the most impact on the Novamente design that of Wierzbicka, who attempted the first enumeration of these “semantic universals” (Wierzbicka 1972),
which by now number around forty (Wierzbicka 1996).
Without going so far as to accept that this list constitutes a
fixed and rigid universal ontology, one can posit a loosely
defined set of semantic primitives as being foundational to
understanding the world, notions such as I, you, when, because, after/before, etc. All of the primitives are grounded
in Novamente’s built-in structures and dynamics; e.g., because can be grounded in causality and implication links,
want in goals nodes, and so forth. Experiential interactive
learning allows higher level representations and mechanisms to develop around these impoverished mappings.
Through observing itself interact with an environment, a
Novamente system will build up a complex system of internal maps describing its own behavior in various contexts, and the behavior of other systems in the environment, including other minds. This will lead to the formation of a “self-system” potentially including multiple subselves (Rowan 1990), as well as the possibility of goaldirected self-modification, in which a Novamente system
modifies its own cognitive algorithms and ultimately its
own data structures in order to allow it to better achieve its
goals. This sort of experience-driven cognitive selfmodification is the crux of Novamente general intelligence.
Human Language Processing
Human language processing presents a uniquely difficult
problem for Novamente and other artificial systems. In
principle, it is not necessarily solvable by the creation of
learning and memory mechanisms that operate with human-level effectiveness. Learning human language without
a human embodiment or evolutionary heritage is a much
harder problem than learning human language in the possession of such endowments. For communication between
different Novamente systems, a special language will be
utilized, which is radically different from any human language, lacking a linear ordering.
Providing Novamente with understanding of human
language is done through a hybrid approach which combines experiential interactive learning, and a conventional
computational-linguistics based architecture implemented
within the Novamente framework. In this approach, syntax
processing is carried out using the link parsing framework
developed by (Sleator and Temperley 1991), and the output of link parsing is mapped into Novamente nodes and
links using special schemata called “semantic algorithms.”
Semantic disambiguation and reference resolution (Manning and Schuetze 1999) are carried out via PTL inference.
At present this Novamente-based NLP framework is capable of dealing with a wide variety of English sentences,
and its comprehension is enhanced via an interactive user
interface which allows users to view Novamente’s interpretation of their input and rephrase their language or correct Novamente’s interpretation as necessary.
Practical Applications
The Novamente design outline in the preceding sections of
this paper is embodied in a C++ implementation that is
approximately 50% complete. A number of performance
issues, such as effectively swapping atoms between disk
and memory, and distributed processing, create additional
complications that we have omitted for brevity. PTL and
BOA have both been implemented and tested successfully;
and much of the natural language framework described
above has been completed.
In parallel with the development of Novamente to
achieve general intelligence, the system has been utilized
more narrowly as an “AI toolkit” in the construction of
practical commercial software applications, for example:
Bioinformatics. The Biomind Analyzer, developed by
Biomind LLC together with Novamente LLC, is an enterprise system for intelligent analysis of microarray gene
expression data. BOA is used to uncover interesting patterns in labeled datasets, and also to learn classification
models. PTL is used for the integration of background information from a number of heterogeneous sources of biological knowledge, covering gene and protein function,
research papers, gene sequence alignment, protein interactions, and pathways. This allows the Biomind Analyzer to
augment the datasets it analyzes with background features
corresponding to gene or protein categories, participation
in pathways, etc. Inference is then used to create new relations between the genes and the functional categories provided by the background sources, effectively suggesting
function assignments to genes with unknown roles.
Human Language and Knowledge Management. The
Knowledge Mining Initiative, developed by Object Sciences Corp. together with Novamente LLC, utilizes Novamente’s cognitive algorithms in combination with its
NLP framework. Knowledge is entered via an interactive
interface, which allows users to review and revise the system’s understanding of that knowledge. A knowledge base
is thus produced, which is augmented by reasoning, and
may be queried in English or a special formal language.
BOA Pattern Mining is used to spontaneously create queries that are judged interesting.
Acknowledgements
The authors have been partially supported by Object Sciences Corp., Biomind LLC, and Vetta Technologies Ltda.
during the course of this research.
References
Baum, E. B. 2004. What is Thought? Cambridge, MA: MIT
Press.
Boroditsky, L., and Ramscar, M. 2002. The Roles of Body and
Mind in Abstract Thought. Psychological Science 13(2):185-188.
Indurkhya, B. 1992. Metaphor and Cognition: An Interactionist
Approach. New York, NY: Kluwer Academic.
Kohler, W. 1992. Gestalt Psychology. New York:, NY: Liveright.
Looks, M., Goertzel, B., and Pennachin, C. 2004. Learning Computer Programs with the Bayesian Optimization Algorithm.
Forthcoming.
Manning, C. and Schuetze, H. 1999. Foundations of Statistical
Natural Language Processing. Cambride, MA: MIT Press.
Minsky, M. 1986. The Society of Mind. New York, NY: Simon
and Schuster.
Ocenasek, J. 2002. Parallel Estimation of Distribution Algorithms. PhD. thesis, Faculty of Information Technology, Brno
University of Technology.
Peirce, C. S. 1892. The Law of Mind. Reprinted in Hartshorne,
C., Weiss, P., and Burks, A.W. eds. 1980. The Collected Papers
of Charles Sanders Peirce, Cambridge, MA: Harvard University
Press, 6.102-6.163.
Pelikan, M., Goldberg, D. E., and Cantú-Paz, E. 1999. BOA: The
Bayesian Optimization Algorithm. Technical Report, IlliGAL
Report No. 99003, Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign.
Pelikan, M. 2002. Bayesian Optimization Algorithm: From Single
Level to Hierarchy. Ph.D. thesis, Department of Computer Science, University of Illinois at Urbana-Champaign.
Rowan, J. 1990. Subpersonalities, London: Routledge Press.
Chaitin, G. 1987. Algoritmic Information Theory. New York, NY:
Cambridge University Press.
Sleator, D., and Temperley, D. 1991. Parsing English with a Link
Grammar. Technical Report, CMU-CS-91-196, Dept. of Computer Science, Carnegie Mellon University.
Curry, H. B., and Feys, R. 1958. Combinatory Logic. Amsterdam,
Holland: North-Holland.
Solomonoff, R. J. 1964. A Formal Theory of Inductive Inference,
Part 1 and 2. Inform. Contr. 7:1-22, 224-254.
Goertzel, B. 1993. The Structure of Intelligence. New York, NY:
Springer-Verlag.
Solomonoff, R. J. 1978. Complexity-based induction systems:
Comparisons and Convergence Theorems. IEEE Trans. Inform.
Theory IT-24:4:422-432.
Goertzel, B. 1997. From Complexity to Creativity. New York,
NY: Plenum Press.
Goertzel, B. 2002. Creating Internet Intelligence. New York, NY:
Plenum Press.
Goertzel, B., Iklé, M., Freire, I. L., Pressing, J., and Pennachin. C.
2004. Probabilistic Term Logic. Forthcoming.
Goertzel, B., Silverman, K., Hartley C., Bugaj, S., and Ross, M.
2000. The Baby Webmind Project. In Proceedings of AISB 00,
London: SSAISB.
Haikonen, P. 2003. The Cognitive Approach to Conscious Machines. Exeter, UK: Imprint Academic.
Hofstadter, D. 1986. Metamagical Themas: Questing for Essence
of Mind and Pattern. New York, NY: Bantam Books..
Hutter, M. 2000. A Theory of Universal Artificial Intelligence
Based on Algorithmic Complexity. Technical Report, ISDIA-1400, ISDIA.
Sommers, F., and Englebretsen, G. 2000. An Invitation to Formal
Reasoning. Aldershot, UK: Ashgate Publishing Company.
Wang, P. 1993. Belief Revision in Probability Theory. In Proceedings of the Ninth Conference of Uncertainty in Artificial
Intelligence, 519-526. San Mateo, CA: Morgan Kaufmann.
Wang, P. 1995. Non-Axiomatic Reasoning System - Exploring
the Essence of Intelligence. Ph.D. thesis, Department of Computer Science, Indiana University.
Wierzbicka, A. 1972. Semantic Primitives. Frankfurt: Athenäum.
Wierzbicka, A. 1996. Semantics, Primes and Universals. Oxford:
Oxford University Press.
Download